# Test 1: No options
$ tesseract cropped.tif stdout
Page 1
Empty page!!
Empty page!!
# Test 2: Setting psm gave better results, but the output still contains a lot of junk
$ tesseract cropped.tif stdout -psm 11
Page 1
14-15
..................
10-11
113-14
_ I.
i
# Test 3: Setting psm and whitelisting
# ./config/digits file
tessedit_char_whitelist 0123456789
$ tesseract cropped.tif stdout -psm 11 ./config/digits
Page 1
14 15
10 11
113 14
3
As you can see, I got the best results when I whitelisted only the digits 0-9 (Test 3). However, the output is still not perfect: it misses the 18, which is probably the most critical value for my application.
I also tried tweaking other command-line parameters (e.g. those listed at http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version), but none of them gave better results.
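To compare parameter combinations without eyeballing each run, a small check along these lines might help. It is only a sketch: `has_token` is a hypothetical helper name, not part of tesseract, and the sample text is the Test 3 output pasted above rather than a live run.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): succeeds if the expected
# value appears as a whole whitespace-separated token in the text on stdin.
has_token() {
    # Put every token on its own line, then look for an exact line match.
    tr -s ' \t' '\n\n' | grep -qxF -- "$1"
}

# Output of Test 3 as pasted above, used here in place of a live run.
test3_output='Page 1
14 15
10 11
113 14
3'

# Report which of the values I care about were recognized.
for expected in 14 18; do
    if printf '%s\n' "$test3_output" | has_token "$expected"; then
        echo "$expected: found"
    else
        echo "$expected: missing"
    fi
done
```

For a live run, the same check could be fed directly from the command used in Test 3, for example `tesseract cropped.tif stdout -psm 11 ./config/digits | has_token 18`.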
Are there any other suggested configuration parameters I can play with to increase accuracy?
Thanks.