Last line is poorly recognized

Дарья Арбузова

Oct 9, 2014, 5:45:17 AM
to tesser...@googlegroups.com
Hello!

I'm surprised I couldn't google this problem.
I'm using tesseract to recognize a TIFF file, which is an image of a .doc file, but it turns out that a line with a lot of empty space before or after it is a pain: it's recognized in all caps and with poor quality.
Maybe the problem is in how the line gets located (segmented)?
Any thoughts?

Thank you!

Daria

Дарья Арбузова

Oct 16, 2014, 8:08:19 AM
to tesser...@googlegroups.com
Maybe I should specify a different pagesegmode?
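
For anyone who wants to try the page segmentation idea, here is a minimal sketch that only builds the tesseract command line for a couple of modes so the outputs can be compared. The function name `build_tesseract_command` and the file names are hypothetical. The sketch assumes the 3.x command-line tool, where the option is spelled `-psm` (newer releases spell it `--psm`), and that the Russian language data (`rus`) is installed. Whether a different mode actually helps with the isolated last line is untested here.

```python
import shlex

# Page segmentation mode numbers as listed in tesseract's help output.
PSM_SINGLE_BLOCK = 6  # assume a single uniform block of text
PSM_SINGLE_LINE = 7   # treat the image as a single text line


def build_tesseract_command(image_path, output_base, lang="rus",
                            psm=PSM_SINGLE_BLOCK, psm_flag="-psm"):
    """Return the argument list for one tesseract run.

    psm_flag is an assumption about the installed version:
    "-psm" for 3.x, "--psm" for later releases.
    """
    return ["tesseract", image_path, output_base,
            "-l", lang, psm_flag, str(psm)]


if __name__ == "__main__":
    # "page.tif" and the output names are placeholders, not real files.
    for mode in (PSM_SINGLE_BLOCK, PSM_SINGLE_LINE):
        cmd = build_tesseract_command("page.tif", "page_psm%d" % mode, psm=mode)
        # Print a copy-pasteable shell command instead of running it.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Mode 7 only makes sense on an image cropped down to the problem line; mode 6 can be tried on the whole page. The list returned by the function can be passed to `subprocess.run` once tesseract is on the PATH.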

On Thursday, October 9, 2014 at 13:45:17 UTC+4, Дарья Арбузова wrote:

Zunair Fayaz

Oct 16, 2014, 11:07:49 AM
to tesser...@googlegroups.com
Try exporting your doc at 300 dpi or more.

Дарья Арбузова

Oct 23, 2014, 4:26:58 AM
to tesser...@googlegroups.com
Zunair, I already used 600 dpi.
I'm now wondering whether it happens only with Russian text.

On Thursday, October 16, 2014 at 19:07:49 UTC+4, Zunair Fayaz wrote:

bulk...@gmail.com

Oct 23, 2014, 10:50:00 PM
to tesser...@googlegroups.com
Tesseract really struggles with Russian :)