Dear Tesseract Community,
We run a high-volume, multi-engine OCR pipeline that includes Tesseract 4 (LSTM, Latin), a custom Tesseract 3 (outline-based) model for specific cases, and newer OCR models running in a low-resource serverless environment.
We wanted to share some brief internal results that may be useful to the community.
Key points

Head-to-head comparisons: Direct word-level comparisons show a meaningful share of cases where the Tesseract LSTM model is correct while the other engines are not.
This complementary behavior means the Tesseract LSTM model still adds significant value in an ensemble, despite being an older engine.
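To make the "complementary behavior" claim concrete, a word-level comparison can count the cases where exactly one engine matches the ground truth while all others miss. The sketch below is illustrative only; the engine names and sample data are hypothetical, not from our actual pipeline.

```python
# Hypothetical sketch: counting words each engine gets uniquely right.
# Engine names ("t4_lstm", "t3_legacy", "modern") and data are illustrative.

def exclusive_wins(predictions, truth):
    """Count words an engine gets right while every other engine is wrong."""
    engines = list(predictions)
    wins = {e: 0 for e in engines}
    for i, gt in enumerate(truth):
        correct = [e for e in engines if predictions[e][i] == gt]
        if len(correct) == 1:          # exactly one engine is right
            wins[correct[0]] += 1
    return wins

preds = {
    "t4_lstm":   ["invoice", "total", "4711"],
    "t3_legacy": ["invoice", "tota1", "4711"],
    "modern":    ["invo1ce", "tota1", "4711"],
}
truth = ["invoice", "total", "4711"]
print(exclusive_wins(preds, truth))  # t4_lstm is uniquely correct on "total"
```

A nonzero exclusive-win count for an older engine is exactly the signal that it still pays for its seat in the ensemble.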

Thank you for your follow-up and interest in the technical details.
Regarding the -1 values for T3 and T4: -1 is a sentinel, not an accuracy score.
To keep our single-word accuracy analysis clean, we assigned this value whenever an engine returned multiple words for a single-word source image. This let us separate segmentation discrepancies from standard character-recognition scores. Given the scale of our dataset, grouping these instances under a single label still provides a complete picture of each engine's behavior.
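The -1 sentinel described above can be sketched as follows. This is a minimal, hypothetical illustration: the scoring function name and the per-character accuracy metric are assumptions for the example, not our exact evaluation code.

```python
# Hypothetical sketch of the -1 sentinel: when an engine returns more than
# one word for a single-word image, record -1 instead of an accuracy score,
# keeping segmentation failures out of the character-recognition statistics.

def score_single_word(ocr_output, ground_truth):
    tokens = ocr_output.split()
    if len(tokens) != 1:
        return -1  # segmentation mismatch: excluded from accuracy averages
    word = tokens[0]
    # simple positional per-character accuracy over the ground-truth length
    matches = sum(a == b for a, b in zip(word, ground_truth))
    return matches / len(ground_truth)

print(score_single_word("inv oice", "invoice"))  # -1: two tokens returned
print(score_single_word("invoice", "invoice"))   # 1.0: exact match
```

Downstream, scores equal to -1 are simply filtered out before computing mean accuracy, so segmentation behavior can be reported as a separate rate.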
I hope this clarifies the anomaly. Please let me know if you have any further questions.
Best regards,
Jakub Hybl