Improving accuracy on recognition Tesseract 4.3.1

Nenad Kocev

Feb 24, 2019, 12:21:17 PM
to tesseract-ocr
Hello, I recently discovered Tesseract and I've been using it to extract digits from images through the tess4j library. With the settings posted below I get around 85% recognition accuracy.
Is there a way to get 100% accuracy? I have attached an example image. Other images differ only in the number of digits they contain and may also include special characters such as ",+-". Thanks for your help.

Settings:
tesseract.setPageSegMode(7); // PSM 7: treat the image as a single text line
tesseract.setTessVariable("tessedit_char_whitelist", ",+-0123456789");
// variable names must not contain a trailing space, or the setting is not matched
tesseract.setTessVariable("load_system_dawg", "false");
tesseract.setTessVariable("load_freq_dawg", "false");

3057539.png

Quan Nguyen

Feb 24, 2019, 7:06:27 PM
to tesseract-ocr
The whitelist feature is currently not working in Tesseract 4.0.0.
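If the whitelist is ignored by the engine, one possible workaround is to filter the recognized text afterwards. The sketch below is only an illustration: the class and method names (OcrWhitelistFilter, keepAllowed) are made up for this example and are not part of tess4j or Tesseract. Note that dropping characters after the fact is weaker than a real whitelist, because a misread such as the letter O for the digit 0 is removed rather than corrected.

```java
final class OcrWhitelistFilter {

    // Returns ocrText with every character that is not listed in `allowed` removed.
    // This only post-processes the result; it does not influence recognition itself.
    static String keepAllowed(String ocrText, String allowed) {
        StringBuilder kept = new StringBuilder(ocrText.length());
        for (int i = 0; i < ocrText.length(); i++) {
            char c = ocrText.charAt(i);
            if (allowed.indexOf(c) >= 0) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // Hypothetical raw OCR output with a misread letter and trailing whitespace.
        String raw = "3O57,539 \n";
        System.out.println(keepAllowed(raw, ",+-0123456789"));
    }
}
```

The string passed as `allowed` is the same character set used for tessedit_char_whitelist in the original settings, so the filter could be applied directly to whatever the OCR call returns.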

Alberto Andreotti

Feb 24, 2019, 7:38:55 PM
to tesser...@googlegroups.com
Hello,

You can try the OCR preprocessing in Spark NLP if you are on Python or Scala.
Try the scaling option in particular.

Alberto.
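
For readers staying in Java with tess4j, the same idea (enlarging the image before recognition) can be sketched with the standard java.awt classes. This is not Spark NLP's scaling option, whose details are not shown in this thread; the class name OcrImageScaler and the choice of bicubic interpolation are assumptions made for illustration, and a suitable scale factor has to be found by experiment.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

final class OcrImageScaler {

    // Returns a copy of src resized by `factor` (e.g. 2.0 doubles width and height).
    static BufferedImage scale(BufferedImage src, double factor) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Bicubic interpolation is an assumed choice; it tends to keep glyph edges smooth.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

The scaled BufferedImage can then be handed to the OCR call in place of the original image.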

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/be275b8f-1c58-4793-b2c3-545bc2e5ac74%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

易鑫

Mar 15, 2019, 5:48:05 AM
to tesseract-ocr
The latest Tesseract version is 4.0.0. How did you get version 4.3.1?

Quan Nguyen

Mar 16, 2019, 4:45:49 AM
to tesseract-ocr
He was referring to the tess4j version.