Tesseract improve prediction accuracy


Kehinde Adeoya

Dec 2, 2022, 9:27:09 AM
to tesseract-ocr
Environment
  • Tesseract Version: 5.0.1-1.5.7, Tessdata: 3.04, Langdata: 3.04
  • Platform: 21.5.0 Darwin Kernel Version 21.5.0: root:xnu-8020.121.3~4/RELEASE_X86_64 x86_64 i386 Darwin
Current Behavior:

Tesseract is unable to differentiate between font weights. I have trained a font, and the project uses it at weights from 100 and 200 up to 900. Is there any provision for getting the font weight as an attribute? Tesseract only reports whether the text is bold, so there is no way to check the actual weight.

[Image: Passport]

Secondly, Tesseract's predictions seem unstable. I have followed all the recommendations for improving accuracy, yet the results are still inconsistent. The image above is a prime example: sometimes Tesseract reports it as bold, which is correct, and on the next run it may report it as a normal font. The font weight is 700, which corresponds to bold. I have run the same test case more than 10 times, and the result can be something like bold=6, normal font=4.
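
To put a number on the inconsistency, here is a minimal sketch of how the repeated runs could be tallied. The names `tally_predictions` and `agreement_rate` are hypothetical helpers, not part of Tesseract or any wrapper, and the `predict` callable is a stand-in that simply replays the 6 bold / 4 normal split described above. In a real test it would be replaced by whatever call returns the bold/normal label for the image.

```python
from collections import Counter
from typing import Callable, Hashable


def tally_predictions(predict: Callable[[], Hashable], runs: int = 10) -> Counter:
    """Call predict() `runs` times and count how often each label comes back."""
    return Counter(predict() for _ in range(runs))


def agreement_rate(counts: Counter) -> float:
    """Fraction of runs that returned the most common label (1.0 = fully consistent)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no runs recorded")
    return counts.most_common(1)[0][1] / total


# Stand-in for the real OCR call (assumption): replays the outcome reported above.
_replayed = iter(["bold"] * 6 + ["normal"] * 4)

counts = tally_predictions(lambda: next(_replayed), runs=10)
print(counts)                  # Counter({'bold': 6, 'normal': 4})
print(agreement_rate(counts))  # 0.6
```

An agreement rate of 1.0 would mean every run returned the same label. Logging this figure alongside the exact input file and settings used for each run may help narrow down whether the variation comes from the engine or from differences in the input between runs.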

Expected Behavior:

Predictions should be consistent across runs, and Tesseract should differentiate between font weights.
