The OCR (Optical Character Recognition) scanner converts images to text in .NET applications. It extracts text from images (JPEG, BMP, TIFF, GIF, PNG) quickly and with high accuracy. The OCR reader can analyze and recognize 100+ languages and fonts, including all Western languages and CJK (Chinese, Japanese and Korean). Converting scanned images into editable text is now straightforward.

OCR is critical for converting scanned documents into machine-encoded, computer-readable text. To a computer, a scanned image or document is just a picture, so you need OCR technology to recognize the characters in the scanned document and transfer the encoded text to your PC or program.

This online guide shows you how to convert an image to text using C#.


// Please note:
// If you target the x64 platform, copy "XsOCR_Tesseract.dll" and "XsOCR_Lept.dll"
// from the x64 folder into the same folder as "XsOCR.dll".
// Otherwise, copy them from the x86 folder.

// Create an OCR Engine instance
OCREngine engine = new OCREngine();
// Set the absolute path of the tessdata folder
engine.DataPath = "F:/tessdata/";
// Set the language of the text you want to recognize
engine.Language = "eng";
// Recognize text from an image file
string text = engine.DoOCR("F:/sample.jpg");

System.Console.WriteLine(text);
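
The platform note at the top of the sample can also be handled in code instead of by hand. The following is a minimal sketch, not part of the SDK: the class name `NativeOcrFiles`, its method names, and the idea of an `sdkRoot` folder containing `x64` and `x86` subfolders are assumptions for illustration. Only the two DLL file names come from the note above.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the OCR SDK): copies the native DLLs
// that match the process bitness next to "XsOCR.dll".
static class NativeOcrFiles
{
    // File names taken from the platform note; everything else is assumed.
    static readonly string[] NativeDlls = { "XsOCR_Tesseract.dll", "XsOCR_Lept.dll" };

    // Returns "x64" for a 64-bit process and "x86" otherwise.
    public static string PlatformFolder(bool is64BitProcess)
    {
        return is64BitProcess ? "x64" : "x86";
    }

    // Copies each native DLL from <sdkRoot>/<x64|x86> into targetDir,
    // overwriting any copy that is already there.
    public static void CopyNativeDlls(string sdkRoot, string targetDir, bool is64BitProcess)
    {
        string sourceDir = Path.Combine(sdkRoot, PlatformFolder(is64BitProcess));
        foreach (string name in NativeDlls)
        {
            File.Copy(Path.Combine(sourceDir, name), Path.Combine(targetDir, name), true);
        }
    }
}
```

A possible call, made once before creating the OCREngine, is `NativeOcrFiles.CopyNativeDlls(sdkRoot, AppDomain.CurrentDomain.BaseDirectory, Environment.Is64BitProcess);`, where `sdkRoot` is wherever you unpacked the SDK. This assumes "XsOCR.dll" sits in the application's base directory and that the process may write there.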

Find and download all OCR language data from this page.
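
After downloading language data, it can help to confirm that the file for the chosen language is actually in the tessdata folder before running recognition. The sketch below assumes the data files follow the usual Tesseract naming, `<language>.traineddata` (for example `eng.traineddata`). That naming is not stated in this guide, and the class `TessdataCheck` and its methods are hypothetical.

```csharp
using System.IO;

// Hypothetical helper (not part of the OCR SDK): checks that language data
// is present, assuming Tesseract-style "<language>.traineddata" file names.
static class TessdataCheck
{
    // Builds the expected path of the data file for one language code.
    public static string TrainedDataPath(string dataPath, string language)
    {
        return Path.Combine(dataPath, language + ".traineddata");
    }

    // Returns true if the expected data file exists in the tessdata folder.
    public static bool HasLanguage(string dataPath, string language)
    {
        return File.Exists(TrainedDataPath(dataPath, language));
    }
}
```

With the values from the sample above, `TessdataCheck.HasLanguage("F:/tessdata/", "eng")` would be checked before calling `engine.DoOCR(...)`, so a missing file can be reported clearly.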

Notice - If you use the trial version of the OCR SDK, the first character of the result is the symbol "?".
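
When evaluating the trial version, you may want to drop that marker before displaying or comparing results. The helper below is a small sketch with a hypothetical name, `TrialOutput.StripTrialMarker`. The notice does not say whether "?" replaces the first recognized character or is placed in front of it, so the helper only removes one leading "?". It would also remove a genuine question mark at the start of the text.

```csharp
using System;

// Hypothetical helper (not part of the OCR SDK): removes one leading '?'
// from a recognition result produced by the trial version.
static class TrialOutput
{
    // Returns the text without its first character when that character is '?';
    // null, empty, and all other strings are returned unchanged.
    public static string StripTrialMarker(string text)
    {
        if (!string.IsNullOrEmpty(text) && text[0] == '?')
        {
            return text.Substring(1);
        }
        return text;
    }
}
```

In the sample above this would be applied to the result of `engine.DoOCR(...)` before printing it.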