Tesseract unstable font property prediction


Kehinde Adeoya

Sep 2, 2022, 4:33:46 AM
to tesseract-ocr
Tesseract 3.0.5
TessData 3.0.4
Tesseract 5Java binding.

I am using Tesseract 3.0.5 in a project, and it works very well. I was able to train new fonts in different languages. Lately, however, I noticed that its predictions change when the same code is run multiple times on the same image. For example, when I query the font properties of the text in an image, I get these properties: font name, bold, italic, monospace, serif, and underline. Running it multiple times on the same image produces different results for the same text.

The text in the image should return this result (font name, bold, italic, monospace, serif, underline):
Ubuntu, FALSE, FALSE, FALSE, FALSE, FALSE, which I count as a PASS. Subsequent runs, however, produce different results for the same text in the same image.

Run          Font name      Bold   Italic  Monospace  Serif  Underline  Result
First run    Ubuntu         FALSE  FALSE   FALSE      FALSE  FALSE      PASS
Second run   Ubuntu-Italic  FALSE  TRUE    FALSE      FALSE  FALSE      FAIL
Third run    Ubuntu-Bold    TRUE   FALSE   FALSE      FALSE  FALSE      FAIL
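
To make the comparison concrete, here is a minimal sketch of how the Result column can be computed and how instability across runs can be detected. It does not call Tesseract at all: the `FontProps` record, the `verdict` and `isStable` helpers, and the class name are hypothetical names introduced only for illustration, and the values are copied from the three runs above. It assumes Java 16 or later for records.

```java
import java.util.List;

public class FontStabilityCheck {
    // Hypothetical container for the six font properties listed in this post.
    record FontProps(String fontName, boolean bold, boolean italic,
                     boolean monospace, boolean serif, boolean underline) {}

    // PASS when a run matches the expected properties exactly, FAIL otherwise.
    static String verdict(FontProps expected, FontProps actual) {
        return expected.equals(actual) ? "PASS" : "FAIL";
    }

    // True when every run reported identical properties.
    static boolean isStable(List<FontProps> runs) {
        return runs.stream().distinct().count() <= 1;
    }

    public static void main(String[] args) {
        FontProps expected =
            new FontProps("Ubuntu", false, false, false, false, false);

        // Values taken from the three runs reported above.
        List<FontProps> runs = List.of(
            new FontProps("Ubuntu", false, false, false, false, false),
            new FontProps("Ubuntu-Italic", false, true, false, false, false),
            new FontProps("Ubuntu-Bold", true, false, false, false, false));

        for (int i = 0; i < runs.size(); i++) {
            System.out.println("Run " + (i + 1) + ": "
                + verdict(expected, runs.get(i)));
        }
        System.out.println("Stable across runs: " + isStable(runs));
    }
}
```

With the values above, only the first run matches the expected properties, and `isStable` reports that the runs disagree.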

Are there settings that make the font property prediction more stable and specific, so that it does not change on every new run?


Screenshot 2022-09-02 at 10.31.32 AM.png