I need to read part of a PDF file in my C# program. Most of my PDF files are large.
My initial idea was to read the PDF and write its content to a .txt file. In addition, I want to control the size of the file that is written. For example, I would stop reading the PDF and save the text file once it reaches 10 MB.
I do not know whether this idea is feasible. If it is, how can I implement it in a C# project?
This article on XsPDF.com is what you are looking for: Reading text from PDF file in C#
Answers
Briefly, you should first convert the PDF to a text file. Many toolkits on the market support this, such as XsPDF, which is the one I use.
You can follow these steps to control the size of the output text file:
- Load the PDF document and get its page count;
- Convert the PDF to text one page at a time, checking the size of the output text file after each page;
- Once the text file reaches 10 MB, stop writing to it and continue the conversion in a new output file.
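The size-checking part of these steps can be sketched without any particular PDF library. This is only a rough sketch under assumptions: `CappedTextWriter`, `WritePages`, and the `outputPrefix` naming scheme are hypothetical names, and `pageTexts` stands in for whatever per-page extraction call your PDF toolkit provides (that call is not shown here, since it depends on the library).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: not part of XsPDF or any other PDF library.
public static class CappedTextWriter
{
    // Writes each page's text to files named <outputPrefix>_1.txt, <outputPrefix>_2.txt, ...
    // and starts a new file whenever the next page would push the current one past maxBytes.
    // Returns the paths of the files that were written.
    public static List<string> WritePages(
        IEnumerable<string> pageTexts, string outputPrefix, long maxBytes)
    {
        var files = new List<string>();
        var encoding = new UTF8Encoding(false); // UTF-8 without a byte order mark
        StreamWriter writer = null;
        long bytesInCurrentFile = 0;

        try
        {
            foreach (string text in pageTexts)
            {
                string chunk = text + Environment.NewLine;
                long chunkBytes = encoding.GetByteCount(chunk);

                // Open the first file, or roll over when this page would exceed the cap.
                bool wouldOverflow = bytesInCurrentFile > 0
                    && bytesInCurrentFile + chunkBytes > maxBytes;
                if (writer == null || wouldOverflow)
                {
                    writer?.Dispose();
                    string path = $"{outputPrefix}_{files.Count + 1}.txt";
                    writer = new StreamWriter(path, false, encoding);
                    files.Add(path);
                    bytesInCurrentFile = 0;
                }

                writer.Write(chunk);
                bytesInCurrentFile += chunkBytes;
            }
        }
        finally
        {
            writer?.Dispose();
        }

        return files;
    }
}
```

For a 10 MB limit you would pass `10L * 1024 * 1024` as `maxBytes`. If `pageTexts` is a lazy sequence (for example an iterator that extracts one page per step), pages are only read as they are written, so large PDFs are never fully loaded as text. To stop at the limit instead of continuing in a new file, break out of the loop where the sketch rolls over. Note that a single page larger than `maxBytes` still ends up in one file that exceeds the limit.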
You can use a PDF library to achieve this. I recently used XsPDF's library to read PDF pages and successfully extracted the page content. If you want to use it for commercial purposes, you need to purchase a product license.