PDF to text converter: extract and explore the text data in PDF documents without Adobe Acrobat.

If the PDF contains text content, the PDF text extractor is well suited to the job. If the PDF contains only images (for example, scanned pages), you will need an OCR tool to recognize the text in them.
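One rough way to tell the two cases apart, offered as a heuristic assumption rather than a feature of the SDK: a page for which the extractor returns no text (empty or whitespace only) is probably image-only and is a candidate for OCR. The helper below is hypothetical (`PdfTextCheck` and `FindPagesWithoutText` are illustrative names). It takes the page-extraction call as a `Func<int, string>`, so it does not depend on any SDK type.

```csharp
using System;
using System.Collections.Generic;

public static class PdfTextCheck
{
    // Hypothetical helper: returns the indices of pages whose extracted text
    // is null, empty, or whitespace only. Such pages are likely image-only
    // (for example, scanned) and may need OCR instead of text extraction.
    public static List<int> FindPagesWithoutText(Func<int, string> extractPage, int pageCount)
    {
        var pagesWithoutText = new List<int>();
        for (int i = 0; i < pageCount; i++)
        {
            if (string.IsNullOrWhiteSpace(extractPage(i)))
            {
                pagesWithoutText.Add(i);
            }
        }
        return pagesWithoutText;
    }
}
```

With the extractor used later in this guide, it could be called as `PdfTextCheck.FindPagesWithoutText(extractor.ExtractTextFromPage, extractor.PageCount)`. A page that mixes text with scanned images would not be flagged, so treat the result as a hint only.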

In this C# guide, we show how to extract text from PDF files, or convert PDF files to text files, in an ASP.NET code-behind file (aspx.cs). The text extraction feature is easy to integrate into web applications hosted on Windows servers.

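The "KeepTableLikeStyle" property controls the layout of the output. When it is set to true, the extracted text keeps the page layout, reading from top to bottom and from left to right, so structures such as tables, lists, and multi-column text stay arranged as they appear on the page.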


Extract text content from each PDF page in ASP.NET
// Initialize a PDF extractor by reading a PDF file stored on the server
// (Server.MapPath resolves the path relative to the web application root)
PdfTextExtractor extractor = new PdfTextExtractor(Server.MapPath("~/sample.pdf"));

// Set layout of output text.
extractor.KeepTableLikeStyle = true;

for (int i = 0; i < extractor.PageCount; i++)
{
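    // Extract text from each page of the PDF
    string text = extractor.ExtractTextFromPage(i);
    // Console output is not visible in a web page, so write to the HTTP response;
    // HtmlEncode stops characters such as < and & from being read as markup
    Response.Write("<pre>" + System.Web.HttpUtility.HtmlEncode(text) + "</pre>");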
}

The text extracted from a PDF page is returned as a single plain-text string: all styling is removed, and no distinction is made between titles, paragraphs, lists, forms, and tables.
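Converting a whole PDF into one text file only requires joining these per-page strings and saving them. A minimal sketch, assuming pages should simply be separated by a line break: the hypothetical `PdfToTextFile.Save` helper below takes the page-extraction call as a `Func<int, string>`, collects every page, and writes the result with `File.WriteAllText`.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfToTextFile
{
    // Hypothetical helper: extracts every page through extractPage, joins the
    // pages with a line break, and writes them to outputPath as UTF-8 without a BOM.
    public static void Save(Func<int, string> extractPage, int pageCount, string outputPath)
    {
        var pages = new string[pageCount];
        for (int i = 0; i < pageCount; i++)
        {
            pages[i] = extractPage(i);
        }
        File.WriteAllText(outputPath, string.Join(Environment.NewLine, pages), new UTF8Encoding(false));
    }
}
```

In the code-behind this could be called as `PdfToTextFile.Save(extractor.ExtractTextFromPage, extractor.PageCount, Server.MapPath("~/sample.txt"));`, provided the application pool identity has write access to that folder. A line break loses the page boundaries; a form feed (`'\f'`) is an alternative separator if they need to be recovered later.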

Notice: the trial version of the PDF SDK can only extract text from the first 3 pages of a document.