I used a pdftotext function to convert a list of PDF files to text. In most cases the conversion works normally, but for a few files the result is a blank text file. Note that these PDF documents are not encrypted, signed, or read-only.
This article on XsPDF.com may be what you are looking for: PdfToText C#
Answers
In practice, converting PDF to text is not as simple as it sounds. In many tests, some conversions did not work at all and, just as you describe, produced a blank text file. What is the reason?
Converting PDF to text can be done in two ways: with OCR technology, or with a pdftotext tool.
With OCR, both the plain text and its position on the page can be obtained. However, the text in a PDF is spread throughout the document, and even a single letter can vary in appearance: it may be italic, underlined, or placed at a different position. OCR does not always pick up on these variations, which can confuse it, so recognition accuracy is often lower than expected.
In these cases, I would recommend trying a pdftotext tool such as the XsPDF text extractor. With it you can get the text content of the entire PDF document, or just a part of it.
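As a rough illustration of extracting a whole document or just a page range, and of spotting the files that come out blank, here is a minimal sketch. It does not use the XsPDF API, which is not shown in this thread. Instead it assumes the open-source Poppler `pdftotext` command-line tool is installed and on the PATH, using its `-f` and `-l` options for the first and last page. The function names `build_pdftotext_cmd`, `is_blank`, and `convert_all` are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_pdftotext_cmd(pdf: Path, txt: Path,
                        first_page: Optional[int] = None,
                        last_page: Optional[int] = None) -> List[str]:
    """Build a Poppler `pdftotext` command line (assumed tool, not XsPDF).

    With no page arguments the whole document is converted;
    -f / -l restrict the conversion to a page range.
    """
    cmd = ["pdftotext"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    return cmd + [str(pdf), str(txt)]


def is_blank(text: str) -> bool:
    """True if the text contains only whitespace (including form feeds)."""
    return text.strip() == ""


def convert_all(pdfs: List[Path], out_dir: Path) -> List[Path]:
    """Convert each PDF to text and return the PDFs whose output is blank."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    blank = []
    for pdf in pdfs:
        txt = out_dir / (pdf.stem + ".txt")
        # check=True raises if pdftotext exits with an error
        subprocess.run(build_pdftotext_cmd(pdf, txt), check=True)
        if is_blank(txt.read_text(encoding="utf-8", errors="replace")):
            blank.append(pdf)
    return blank
```

Calling `convert_all(list(Path("pdfs").glob("*.pdf")), Path("out"))` (both directory names are placeholders) would return the list of files that produced blank text, so they can be inspected separately or passed to OCR.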