Developers can recognize text in multiple languages with the OCR SDK for .NET in C#, including English, Latin, French, Chinese, Japanese (kanji), Spanish, German, Hindi, Greek, Italian, Russian, Turkish, Dutch, Portuguese, Thai and Korean.
This online guide shows how to recognize two languages, English and German, in a single pass using C# code. Before you start, make sure the "eng.traineddata" and "deu.traineddata" language files are in the "tessdata" folder.
// Please note:
// If you choose the x64 platform, copy "XsOCR_Tesseract.dll" and "XsOCR_Lept.dll"
// from the x64 folder to the same folder as "XsOCR.dll".
// Otherwise, copy them from the x86 folder.
// For example, if "XsOCR.dll" is in "/bin/", then "XsOCR_Tesseract.dll" and
// "XsOCR_Lept.dll" need to be copied to "/bin/".

// Create an OCR Engine instance
OCREngine engine = new OCREngine();

// Set the absolute path of tessdata
engine.DataPath = "F:/tessdata/";

// Set the languages you want to detect and analyze, joined with '+'
engine.Language = "eng+deu";

// Recognize text from image file
string text = engine.DoOCR("F:/sample.jpg");
System.Console.WriteLine(text);
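Because recognition depends on each language's ".traineddata" file being present in the "tessdata" folder, it can help to check for the files before calling the engine. The following is a minimal sketch, not part of the OCR SDK: the class OcrLanguageHelper and its two methods are hypothetical names introduced here for illustration. It uses only standard .NET file APIs and assumes the '+' separator shown in "eng+deu" applies the same way to any list of language codes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; it is not part of the OCR SDK.
public static class OcrLanguageHelper
{
    // Joins language codes with '+', for example ("eng", "deu") becomes "eng+deu".
    public static string BuildLanguageString(params string[] codes)
    {
        if (codes == null || codes.Length == 0)
            throw new ArgumentException("At least one language code is required.", nameof(codes));
        return string.Join("+", codes);
    }

    // Returns the codes whose "<code>.traineddata" file is missing from dataPath.
    public static List<string> FindMissingLanguageFiles(string dataPath, params string[] codes)
    {
        var missing = new List<string>();
        foreach (string code in codes)
        {
            string file = Path.Combine(dataPath, code + ".traineddata");
            if (!File.Exists(file))
                missing.Add(code);
        }
        return missing;
    }
}
```

With this helper, you could call OcrLanguageHelper.FindMissingLanguageFiles("F:/tessdata/", "eng", "deu") and report any missing codes before OCR starts, then assign OcrLanguageHelper.BuildLanguageString("eng", "deu") to engine.Language.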
Find and download all OCR language data from this page.