Multi-language Support
Supports over 100 languages and scripts, and can process documents that combine several languages.
Highly Accurate Text Recognition
Uses an LSTM-based neural network recognizer to improve accuracy, with the best results on clean, high-quality images.
Flexible Output Formats
Generates output in plain text, hOCR (HTML-based OCR), PDF, and TSV formats, enabling versatile downstream processing.
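As a rough illustration of downstream processing, the sketch below filters words out of TSV output by confidence. It assumes the twelve-column TSV layout produced by Tesseract-style engines (level, page_num, ..., conf, text); the sample data and the `confident_words` helper are hypothetical and written only for this example.

```python
import csv
import io

# Illustrative sample only, not real engine output. The column names are an
# assumption based on the Tesseract-style TSV layout.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "4\t1\t1\t1\t1\t0\t10\t12\t200\t30\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t10\t12\t90\t30\t96.5\tHello\n"
    "5\t1\t1\t1\t1\t2\t110\t12\t100\t30\t41.0\tw0rld\n"
)


def confident_words(tsv_text, min_conf=60.0):
    """Return recognized words whose confidence is at least min_conf."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        # Layout rows (pages, blocks, lines) carry no text and a conf of -1.
        text = (row["text"] or "").strip()
        if text and float(row["conf"]) >= min_conf:
            words.append(text)
    return words
```

Calling `confident_words(SAMPLE_TSV)` keeps only the words at or above the threshold; lowering `min_conf` trades precision for coverage.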
Open Source and Extensible
Fully open-source under the Apache 2.0 license, allowing developers to customize, extend, and integrate the engine into their own projects.
Command Line and API Access
Offers command-line tools for batch processing and a native C/C++ API, with wrappers available for integration with languages such as Python and Java.
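A minimal sketch of driving the command-line tool from Python is shown below. It assumes the engine is invoked as `tesseract <image> <output base> -l <languages> [format]`, with language codes joined by `+`; the helper names `build_ocr_command` and `run_ocr` are hypothetical, and the file paths are placeholders.

```python
import shutil
import subprocess


def build_ocr_command(image_path, output_base, languages=("eng",), output_format="txt"):
    """Build an argv list for a Tesseract-style CLI call (assumed tool)."""
    if not languages:
        raise ValueError("at least one language code is required")
    command = ["tesseract", image_path, output_base, "-l", "+".join(languages)]
    # Plain text is the default output, so only other formats are named.
    if output_format != "txt":
        command.append(output_format)
    return command


def run_ocr(image_path, output_base, **options):
    """Run the command if the binary is on PATH; return True if it ran."""
    if shutil.which("tesseract") is None:
        return False
    subprocess.run(build_ocr_command(image_path, output_base, **options), check=True)
    return True
```

For example, `build_ocr_command("scan.png", "out", languages=("eng", "deu"), output_format="tsv")` yields an argument list requesting English and German recognition with TSV output. Building the list separately from running it keeps the command easy to inspect and log in batch jobs.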
Support for Image Preprocessing
Includes tools and recommendations for image preprocessing such as binarization and deskewing to improve OCR results.
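To make the binarization step concrete, here is a small pure-Python sketch of Otsu thresholding, a common global binarization method. It is an illustration under stated assumptions, not a description of what the engine does internally; the `otsu_threshold` and `binarize` helpers are hypothetical and operate on grayscale images given as nested lists of 0-255 integers.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of 0-255 grayscale values."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0

    for level, count in enumerate(histogram):
        weight_bg += count
        sum_bg += level * count
        if weight_bg == 0:
            continue  # no pixels at or below this level yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background class
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance; the threshold maximizing it separates best.
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(image):
    """Map each pixel to 0 (at or below threshold) or 255 (above it)."""
    threshold = otsu_threshold([p for row in image for p in row])
    return [[0 if p <= threshold else 255 for p in row] for row in image]
```

In practice an imaging library would handle this far more efficiently; the sketch only shows the idea of choosing a threshold that best separates dark text from a light background.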