Recognizing text from multi-page TIFF images is as easy as looping through the image pages, running OCR on each one, and appending the result to a text file.
XsOCR provides developers with a comprehensive Optical Character Recognition SDK that is fully developed, highly accurate, and easy to work with in C#.NET, VB.NET, ASP.NET web, and .NET WinForms development environments. This online tutorial covers the high-level OCR toolkit in C#. With this C# imaging OCR SDK, users can extract text from multi-page TIFF images: the OCR control detects and recognizes text on every page of the TIFF file.
This online guide shows you how to extract text from a multi-page TIFF document using C#.
// Please note:
// If you choose the x64 platform, please copy "XsOCR_Tesseract.dll" and "XsOCR_Lept.dll"
// from the x64 folder to the same folder that contains "XsOCR.dll".
// Otherwise, please copy them from the x86 folder.
// For example, if "XsOCR.dll" is in "/bin/", then "XsOCR_Tesseract.dll" and "XsOCR_Lept.dll"
// need to be copied to "/bin/" as well.

// Needed for List<string>
using System.Collections.Generic;

// Create an OCR engine instance
OCREngine engine = new OCREngine();

// Set the absolute path of the tessdata folder
engine.DataPath = "F:/tessdata/";

// Set the target text language you want to recognize
engine.Language = "eng";

// Extract text from every page of the multi-page TIFF
List<string> pages = engine.DoOCRFromMultiPageTiff("F:/sample.tif");

foreach (string page in pages)
{
    // Show the text from each TIFF page
    System.Console.WriteLine(page);
}
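The sample above prints each page to the console, while the introduction mentions appending the results to a text file. The following is a minimal sketch of that step, using only standard .NET file APIs. The class name OcrTextWriter, the method AppendPages, and the page header format are hypothetical and are not part of XsOCR; the sketch simply assumes the recognized pages are available as a sequence of strings, such as the list returned by DoOCRFromMultiPageTiff.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the XsOCR SDK.
public static class OcrTextWriter
{
    // Appends the text of each page to the file at outputPath.
    // Each page is preceded by a simple header line (an illustrative choice).
    // The file is created if it does not exist yet.
    public static void AppendPages(IEnumerable<string> pages, string outputPath)
    {
        int pageNumber = 1;
        foreach (string page in pages)
        {
            string block = "--- Page " + pageNumber + " ---" + Environment.NewLine
                         + page + Environment.NewLine;
            File.AppendAllText(outputPath, block);
            pageNumber++;
        }
    }
}
```

With the pages list from the sample above, a call such as OcrTextWriter.AppendPages(pages, "F:/sample.txt") would write one block per TIFF page; the output path here is only an example. Because the helper appends, running it twice against the same file adds the pages a second time, so delete the file first if a fresh result is wanted.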
Find and download all OCR language data from this page.