Help needed: 12xT not recognized

Nicolas Nickisch

Jun 7, 2015, 1:10:38 PM

to tesser...@googlegroups.com

I am trying to use tesseract 3.03 to OCR scanned pages.

In many cases one scan job contains several jobs, which are separated by feeding a special separator page between them.

This page contains only 12 "T" characters at the top left of the page (and a second line, upside down, at the bottom right).

I have tried a lot, but tesseract seems to ignore this text completely, even though the scan looks great. The output for that page is completely empty! The rest of the OCRed text looks good.

The idea is not mine, but I have to use this kind of separation.

Is there something I can do to improve recognition of this special text?

Nicolas Nickisch

Павел Щербаков

Jun 8, 2015, 2:26:55 AM

to tesser...@googlegroups.com

Can you provide an image that you're trying to recognize?

Also, is the language English? Do you use the shell application or the API? Do you use any special whitelists, configs, or psm settings?
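To make those questions concrete, here is a minimal sketch of how language, page segmentation mode, and a character whitelist might be passed to the shell application, and how the resulting text could be checked for the separator line. The helper names, the file names, the choice of `-psm 6`, and the `min_run` tolerance are all assumptions for illustration and not settings confirmed in this thread. The single-dash `-psm` flag is the 3.x spelling as far as I know, so check `tesseract --help` on your build.

```python
import re
import subprocess


def build_tesseract_command(image_path, output_base, lang="deu", psm=6, whitelist="T"):
    """Assemble a tesseract 3.x command line (hypothetical helper, not part of tesseract)."""
    return [
        "tesseract", image_path, output_base,
        "-l", lang,                                    # OCR language
        "-psm", str(psm),                              # page segmentation mode (assumed value)
        "-c", "tessedit_char_whitelist=" + whitelist,  # restrict recognized characters
    ]


def looks_like_separator(text, min_run=10):
    """Return True if the text contains at least min_run consecutive 'T's, ignoring whitespace."""
    compact = re.sub(r"\s+", "", text)
    return re.search("T{%d,}" % min_run, compact) is not None


if __name__ == "__main__":
    # "separator.png" is a placeholder file name; tesseract writes separator.txt.
    subprocess.run(build_tesseract_command("separator.png", "separator"), check=True)
    with open("separator.txt", encoding="utf-8") as handle:
        print(looks_like_separator(handle.read()))
```

Allowing a run slightly shorter than 12 is a deliberate choice here, so that one or two dropped characters do not cause a separator page to be missed.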

Nicolas Nickisch

Jun 9, 2015, 4:42:08 PM

to tesser...@googlegroups.com

The "usable" data is normally in German.

I have to generate a test job with nonsense data first, which will take some time.
