A PDF-to-text converter for extracting text data from PDF files without having to install any software.

The Portable Document Format (PDF) is designed for end-use files: documents that will be viewed and printed but not substantially modified. Even so, you may need to extract the text from a PDF file for use elsewhere.

XsPDF text extractor is designed to extract text from Adobe PDF files for use in other applications. To extract text, the PDF file must actually contain text and not just images of text. Otherwise, you may need the PDF OCR tool, which can recognize text in PDF files and images.
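One rough way to tell whether a PDF falls into the image-only category is to check whether extraction yields nothing but whitespace. The following is a minimal sketch, assuming the text of each page has already been obtained from some extractor. `PdfTextCheck` and `LooksImageOnly` are hypothetical names used for illustration and are not part of the XsPDF SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PdfTextCheck
{
    // Hypothetical helper (not part of XsPDF): returns true when every page
    // produced only empty or whitespace text, which suggests a scanned,
    // image-only PDF that needs OCR instead of plain text extraction.
    // An empty sequence (no pages) also returns true.
    public static bool LooksImageOnly(IEnumerable<string> pageTexts)
    {
        if (pageTexts == null) throw new ArgumentNullException(nameof(pageTexts));
        return pageTexts.All(text => string.IsNullOrWhiteSpace(text));
    }
}
```

This is only a heuristic: a PDF that mixes scanned pages with text pages would not be flagged, so a per-page check may be more useful in that case.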

In this C# guide, we will show how you can easily extract text from PDF files or convert PDF files to text files.


Extract text content from each PDF page using C#.
// Read a local PDF file from disk
PdfTextExtractor extractor = new PdfTextExtractor("sample.pdf");

// Set the layout of the output text. If true, the text extracted from a PDF page keeps its position,
// from top to bottom and from left to right, for content such as tables, lists, and multi-column layouts.
extractor.KeepTableLikeStyle = true;

for (int i = 0; i < extractor.PageCount; i++)
{
    // Extract text from each page of PDF
    string text = extractor.ExtractTextFromPage(i);
    Console.WriteLine(text);
}

All the text extracted from a PDF page is combined into a single string. Styling and layout are removed, and titles, paragraphs, lists, forms, and tables are not distinguished from one another.
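The loop above prints each page to the console. To end up with a text file instead, the per-page strings can be written out with a `StreamWriter`. The sketch below assumes the page text comes from a delegate, so it does not depend on any particular SDK. `PdfTextSaver`, `SaveAsTextFile`, and the page marker line are hypothetical choices made for illustration, not part of XsPDF.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfTextSaver
{
    // Hypothetical helper (not part of XsPDF). extractPage is any function
    // that returns the text of a zero-based page index, for example
    // i => extractor.ExtractTextFromPage(i) with the extractor shown above.
    public static void SaveAsTextFile(int pageCount, Func<int, string> extractPage, string outputPath)
    {
        if (extractPage == null) throw new ArgumentNullException(nameof(extractPage));

        // Overwrite any existing file and write UTF-8 text.
        using (var writer = new StreamWriter(outputPath, false, Encoding.UTF8))
        {
            for (int i = 0; i < pageCount; i++)
            {
                // The page marker is an illustrative choice so pages stay distinguishable.
                writer.WriteLine("--- Page " + (i + 1) + " ---");
                writer.WriteLine(extractPage(i) ?? string.Empty);
            }
        }
    }
}
```

With the extractor from the example, a call might look like `PdfTextSaver.SaveAsTextFile(extractor.PageCount, i => extractor.ExtractTextFromPage(i), "sample.txt");`.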

Notice: If you use the trial version of the PDF SDK, you can extract text from only the first 3 pages.
