Inaccurate output from Windows Screen Shot PNG OCR


Testing Windows Screenshots

Jul 15, 2018, 5:57:48 AM
to tesseract-ocr
I'm using TessBaseAPI to OCR a PNG of a Windows screenshot.

Issue 1) A list of numbers is read inaccurately
7254134516423432  1324152643132424   324326178176892

Text out => 72541345 16423432  1324152643132424   324326178176892

Tesseract puts a space between the 5 and the 1 in the first number.


Issue 2) Missing numbers

If there are two Notepad windows in the screenshot, it fails to read the second list and returns only fragments.


The input file is a high-quality PNG. I increased its resolution to 300%.

I tried setting textord_space_size_is_variable to 1, and I tried setting tosp_min_sane_kn_sp to different values, but neither made a difference.


How can I get the best results?


Thanks


Testing Windows Screenshots

Jul 15, 2018, 9:03:02 AM
to tesseract-ocr
Had a breakthrough thanks to John's reply: "GIMP is your friend."



Note:
In GIMP's Scale Image dialog, under Quality, Interpolation must be set to Linear. Cubic gave only 60% of the text. With Linear, results are now 95%, and the bottom row of numbers is appearing.
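
For anyone who wants to script the scaling step instead of doing it by hand in GIMP, here is a rough sketch using the Pillow library. It assumes Pillow's BILINEAR filter is a close counterpart to GIMP's Linear interpolation (the two are not guaranteed to produce identical pixels), and the function name and file paths are placeholders I made up.

```python
from PIL import Image


def upscale_for_ocr(img, factor=3):
    """Return a copy of img enlarged by `factor` using linear interpolation.

    factor=3 mirrors the 300% scaling mentioned in the thread.
    """
    new_size = (img.width * factor, img.height * factor)
    # Image.BILINEAR is Pillow's linear filter; assumed here to be
    # comparable to GIMP's "Linear" setting, not identical to it.
    return img.resize(new_size, resample=Image.BILINEAR)


if __name__ == "__main__":
    # "screenshot.png" is a placeholder path, not a file from the thread.
    shot = Image.open("screenshot.png")
    upscale_for_ocr(shot).save("screenshot_3x.png")
```

The enlarged PNG can then be passed to Tesseract in the same way as before.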


But there is still a space between the 5 and the 1.
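
Until the spacing itself is fixed, one possible workaround is to clean up the text after OCR. The sketch below is only a heuristic and not a Tesseract feature. It assumes, as in the sample output above, that real gaps between numbers come out as two or more spaces while the spurious split is a single space. The function name is made up.

```python
import re


def merge_split_numbers(line):
    """Rejoin digit runs separated by exactly one space, then split the line.

    Assumption: a lone space between two digits is an OCR artifact, and
    genuine column gaps are at least two spaces wide.
    """
    # Remove a space only when it sits directly between two digits, so
    # runs of two or more spaces are left untouched.
    joined = re.sub(r"(?<=\d) (?=\d)", "", line)
    return joined.split()


ocr_line = "72541345 16423432  1324152643132424   324326178176892"
print(merge_split_numbers(ocr_line))
# ['7254134516423432', '1324152643132424', '324326178176892']
```

This would wrongly merge two real numbers that are separated by only one space, so it only fits layouts like the one in this screenshot.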

