Tesseract unstable font property prediction


Kehinde Adeoya


Sep 2, 2022, 4:33:46 AM

to tesseract-ocr

Tesseract 3.0.5

TessData 3.0.4

Tesseract 5 Java binding.

I am using Tesseract 3.0.5 in a project, and it works brilliantly. I was able to train new fonts in different languages. Lately, though, I noticed that its output is not consistent: running the same code several times on the same image gives different results. For example, when I query the font properties of the text in an image, I get the font name, bold, italic, monospace, serif, and underline. Running this multiple times on the same image produces different values for these properties.

The text in the image should return this result:

Ubuntu, FALSE, FALSE, FALSE, FALSE, FALSE (PASS)

However, subsequent runs produce different results for the same text in the same image.

Run      Font name      Bold   Italic  Monospace  Serif  Underline  Result
First    Ubuntu         FALSE  FALSE   FALSE      FALSE  FALSE      PASS
Second   Ubuntu-Italic  FALSE  TRUE    FALSE      FALSE  FALSE      FAIL
Third    Ubuntu-Bold    TRUE   FALSE   FALSE      FALSE  FALSE      FAIL

Are there any settings that make the font property prediction stable and specific, so that it does not change on every run?
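To make the instability easier to reproduce and report, here is a minimal Java sketch of a harness that repeats a font-property query and counts how often each distinct result comes back. It does not call Tesseract itself: the `FontProps` record, the `tally` and `isStable` helpers, and the `Supplier` argument are all hypothetical names, and the supplier is a placeholder for whatever call in the Java binding returns the font attributes. The `main` method simply replays the three results listed above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class FontStabilityCheck {

    /** Font properties as listed in the post; this record is a hypothetical holder (Java 16+). */
    public record FontProps(String fontName, boolean bold, boolean italic,
                            boolean monospace, boolean serif, boolean underline) {}

    /** Calls ocrCall `runs` times and counts how often each distinct result appears. */
    public static Map<FontProps, Integer> tally(Supplier<FontProps> ocrCall, int runs) {
        Map<FontProps, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < runs; i++) {
            counts.merge(ocrCall.get(), 1, Integer::sum);
        }
        return counts;
    }

    /** The prediction is stable only if every run returned the same properties. */
    public static boolean isStable(Map<FontProps, Integer> counts) {
        return counts.size() == 1;
    }

    public static void main(String[] args) {
        // Stand-in for the real OCR call: replays the three results from the post.
        List<FontProps> reported = List.of(
                new FontProps("Ubuntu", false, false, false, false, false),
                new FontProps("Ubuntu-Italic", false, true, false, false, false),
                new FontProps("Ubuntu-Bold", true, false, false, false, false));
        int[] next = {0};
        Map<FontProps, Integer> counts =
                tally(() -> reported.get(next[0]++ % reported.size()), 3);

        counts.forEach((props, n) -> System.out.println(n + "x " + props));
        System.out.println("stable: " + isStable(counts));
    }
}
```

Replacing the stand-in supplier with the real query and raising the run count would show whether the variation is random or follows a pattern, for example whether only the first run after initialization differs.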

Screenshot 2022-09-02 at 10.31.32 AM.png
