OCR missing single characters - why?


Bernhard Gramberg

Feb 28, 2017, 11:11:32 AM
to tesseract-ocr
Hi,

I am running OCR on a special image
that is assembled from several pictures.

I put separators between them so that the real content is picked up better.

In the middle there are single characters (in this example 5 and 7)
that are not recognized. (It makes no difference if they are letters instead.)

This is on Windows, with version 3.5 as well as 4.0:
the short digits / letters (same problem) are not recognized.

Any idea what I can do to get the 5 and 7 detected?

Yours, Bernhard
screen-vertikal.png
screen-vertikal.txt

ShreeDevi Kumar

Mar 1, 2017, 1:47:29 AM
to tesser...@googlegroups.com
Try with --psm 6.

Here is the output I got using the English traineddata on version 4.0 (via gImageReader):

‘IO

EEEEE FFFFF DDDDD

7

EEEEE FFFFF DDDDD

5

EEEEE FFFFF DDDDD

12

EEEEE FFFFF DDDDD

EEEEE FFFFF DDDDD

EEEEE FFFFF DDDDD

Wlie viele verschiedene Plagen gab es in

Agypten, bevor Moses das israelische

Volk befreite (2, Buch Mose)?

EEEEE FFFFF DDDDD



---------------

tesseract ./screen-vertikal.png ./screen-vertikal --oem 1 --psm 6 -l deu

Output file attached.
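
If you want to script this call, here is a minimal Python sketch. It only wraps the command line shown above; the helper names `build_tesseract_command` and `run_ocr` are made up for this example, and it assumes the `tesseract` executable is on your PATH and writes its result to `<output base>.txt`.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, psm=6, oem=1, lang="deu"):
    """Return the argument list for the tesseract call from this thread."""
    return [
        "tesseract", str(image), str(output_base),
        "--oem", str(oem),   # OCR engine mode, as used in the command above
        "--psm", str(psm),   # page segmentation mode; 6 is the suggestion here
        "-l", lang,          # language of the traineddata to use
    ]


def run_ocr(image, output_base, **options):
    """Run tesseract and return the text, or None if the CLI is not installed."""
    if shutil.which("tesseract") is None:
        return None
    subprocess.run(build_tesseract_command(image, output_base, **options), check=True)
    # tesseract appends ".txt" to the output base name
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `run_ocr("screen-vertikal.png", "screen-vertikal")` should then be equivalent to the command above.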



screen-vertikal.txt

Bernhard Gramberg

Mar 1, 2017, 10:39:52 AM
to tesseract-ocr
Hi, this helped me a lot.

I experimented a little; the important parameter was --psm 6.

 Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.
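
To see which of these modes picks up the isolated characters, one option is to run the same image through a few of them and compare the text. The following is only a rough sketch: the function names and the choice of modes 4, 6 and 11 are assumptions for illustration. It relies on tesseract printing the result when the output base is given as `stdout`.

```python
import subprocess

# A few modes from the list above that seem worth comparing (arbitrary choice).
CANDIDATE_PSMS = {
    4: "single column of text of variable sizes",
    6: "single uniform block of text",
    11: "sparse text",
}


def ocr_to_stdout_command(image, psm, lang="deu"):
    """Build a tesseract call that prints the text instead of writing a file."""
    return ["tesseract", str(image), "stdout", "--psm", str(psm), "-l", lang]


def compare_psms(image, psms=CANDIDATE_PSMS, runner=subprocess.run):
    """Return {psm: recognized text} for each requested segmentation mode."""
    results = {}
    for psm in psms:
        proc = runner(
            ocr_to_stdout_command(image, psm),
            capture_output=True,
            encoding="utf-8",
        )
        results[psm] = proc.stdout
    return results
```

For example, `compare_psms("screen-vertikal.png")` returns one text per mode, so you can check in which of them the 5 and the 7 show up.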

Bernhard Gramberg

Mar 1, 2017, 11:00:32 AM
to tesseract-ocr
I tested it on other problem cases,

and it now works perfectly.

Thanks again for the super fast answer.

Yours, Bernhard from Germany
(currently kiting in Tenerife)


On Tuesday, February 28, 2017 at 17:11:32 UTC+1, Bernhard Gramberg wrote:
s1.png
s2.png
s3.png