Key Features

What you can do

🌐

Multi-language Support

Supports more than 100 languages and scripts, so documents in multiple languages can be processed with the same engine.

🧠

Highly Accurate Text Recognition

Uses LSTM neural networks to improve recognition accuracy, especially on clean, high-quality images.

📚

Flexible Output Formats

Generates output in plain text, hOCR (HTML-based OCR), PDF, and TSV formats, enabling versatile downstream processing.
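
As an illustration of downstream processing, the sketch below filters a TSV result by word confidence. It assumes the TSV columns follow the layout Tesseract emits (`level` through `text`, with `conf` as a 0-100 confidence and `-1` on structural rows). The sample data and the `confident_words` helper are hypothetical and not part of the engine.

```python
import csv
import io

# Hypothetical sample; the column names are assumed to match a Tesseract-style TSV.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96.5\tHello\n"
    "5\t1\t1\t1\t1\t2\t104\t92\t70\t24\t41.0\tw0rld\n"
)


def confident_words(tsv_text, min_conf=60.0):
    """Return (text, confidence) pairs for word rows at or above min_conf."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row["text"] or "").strip()
        if not text:
            # Structural rows (page, block, line) carry no text.
            continue
        conf = float(row["conf"])
        if conf >= min_conf:
            words.append((text, conf))
    return words


print(confident_words(SAMPLE_TSV))  # [('Hello', 96.5)]
```

Lowering `min_conf` keeps more words at the cost of more recognition errors; a suitable cutoff depends on the documents being processed.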

💻

Open Source and Extensible

Fully open-source under the Apache 2.0 license, allowing developers to customize, extend, and integrate the engine into their own projects.

💻

Command Line and API Access

Offers command-line tools for batch processing and APIs for integration with programming languages such as Python, Java, and C++.
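
A minimal sketch of driving the command-line tool from Python follows. This page does not name the executable, so the example assumes a Tesseract-style invocation (`tesseract <image> <output base> -l <languages> <format>`). The `build_ocr_command` helper and the file names are hypothetical.

```python
import shlex

# Output formats assumed to be selectable by name, as with Tesseract config files.
ALLOWED_FORMATS = {"txt", "hocr", "pdf", "tsv"}


def build_ocr_command(image_path, output_base, languages=("eng",),
                      output_format="txt", binary="tesseract"):
    """Build an argument list for a Tesseract-style OCR command line."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    if not languages:
        raise ValueError("at least one language code is required")
    # Multiple languages are joined with '+', e.g. eng+deu.
    return [binary, image_path, output_base,
            "-l", "+".join(languages), output_format]


cmd = build_ocr_command("scan.png", "scan", languages=("eng", "deu"),
                        output_format="tsv")
print(shlex.join(cmd))  # tesseract scan.png scan -l eng+deu tsv
```

The list can be passed to `subprocess.run(cmd, check=True)` once the binary and the required language data are installed. Building the command as a list rather than a single string avoids shell-quoting problems with unusual file names.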

⚙️

Support for Image Preprocessing

Includes tools and recommendations for image preprocessing such as binarization and deskewing to improve OCR results.
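
To make the binarization step concrete, here is a small pure-Python sketch of Otsu thresholding, one common way to binarize a grayscale image. It is an illustration only: it is not necessarily the method the engine applies internally, the function names are hypothetical, and deskewing is not covered.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    Pixels <= t form one class and pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(rows, threshold=None):
    """Map a grayscale image (list of rows of 0..255 ints) to 0 or 255."""
    if threshold is None:
        threshold = otsu_threshold([p for row in rows for p in row])
    return [[255 if p > threshold else 0 for p in row] for row in rows]


image = [[20, 30, 200], [25, 210, 220]]
print(otsu_threshold([p for row in image for p in row]))  # 30
print(binarize(image))  # [[0, 0, 255], [0, 255, 255]]
```

In practice an image library would do this far faster; the point here is only to show what binarization does to pixel values before recognition.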