Digits only for tesseract4

Declic73

Sep 4, 2017, 2:54:28 PM
to tesseract-ocr
Hello,

I am trying Tesseract 4 for a project that reads digits on different surfaces.

Currently I invoke tesseract with the following options:
--oem 1 -l eng -c tessedit_char_whitelist=0123456789 digits
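
For reference, here is how the options above might slot into a complete command line when driven from a script. This is only a minimal sketch, assuming the usual `tesseract <image> <outputbase> [options] [configfile]` argument order; the helper names `build_tesseract_args` and `run_tesseract` and the file name `meter.png` are hypothetical, and `stdout` as the output base simply prints the text instead of writing a file.

```python
import subprocess


def build_tesseract_args(image_path, whitelist="0123456789"):
    """Return the argv list for the invocation described in the post.

    image_path is a placeholder for your own input file.
    """
    return [
        "tesseract",
        image_path,
        "stdout",            # output base: print recognized text to stdout
        "--oem", "1",        # LSTM engine, as in the post
        "-l", "eng",
        "-c", f"tessedit_char_whitelist={whitelist}",
        "digits",            # the "digits" config file
    ]


def run_tesseract(image_path):
    """Run tesseract (must be installed and on PATH) and return its text output."""
    result = subprocess.run(
        build_tesseract_args(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file name; only prints the command, does not run it.
    print(" ".join(build_tesseract_args("meter.png")))
```

Keeping the argument list in one function makes it easy to swap `--oem` values or the whitelist when comparing engines.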

When using the "best" tessdata (from https://github.com/tesseract-ocr/tessdata/tree/master/best), it only works in --oem 1 mode and returns all kinds of characters; it completely ignores the character whitelist and/or the digits config.

How can I get the best out of Tesseract for digits only?

How can I use the so-called "best" traineddata in digits-only mode?

Is there traineddata somewhere that focuses on digits only (trained on different fonts)?

Is my setup, Tesseract 4 with the options above, the best way to run on digits?

Thank you very much for the help.

Declic

ShreeDevi Kumar

Sep 4, 2017, 9:59:27 PM
to tesser...@googlegroups.com
Tesseract 4 does not honor the whitelist, the digits config, etc. Use an older version such as 3.02 or 3.04.
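
If staying on Tesseract 4 is a requirement, one possible workaround (not something suggested in this thread, just a sketch) is to filter the recognized text after the fact. The function name `keep_digits` is hypothetical. Note that this does not improve recognition: it only discards non-digit characters, so a digit misread as a letter is lost rather than corrected.

```python
def keep_digits(ocr_text, allowed="0123456789"):
    """Filter OCR output down to the allowed characters, line by line.

    Lines that contain no allowed characters are skipped entirely.
    """
    cleaned = []
    for line in ocr_text.splitlines():
        digits = "".join(ch for ch in line if ch in allowed)
        if digits:
            cleaned.append(digits)
    return cleaned


if __name__ == "__main__":
    # Made-up sample text standing in for raw Tesseract output.
    sample = "Total: 12a4\n--\n 007 "
    print(keep_digits(sample))
```

Running the same images through an older 3.x release, as recommended above, remains the way to get a whitelist that the engine itself enforces.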
