improving boxes


Mike

May 10, 2014, 1:03:42 PM
to tesser...@googlegroups.com
Hello,

I'm working on a mobile app that uses the Tesseract library for OCR. I trained Tesseract on my own fonts, but the results are still very unstable. When I debug the results, the library seems to recognize letters correctly whenever the boxes are found correctly. In many cases, however, the boxes are wrong.

For preprocessing I'm using adaptive thresholding, which deals with noise pretty well.

The common problems with the boxes are:
1) one character detected as two, or vice versa
2) very long but narrow boxes spanning a few lines
3) boxes not detected at all

How can I improve box detection? Can I constrain the box sizes or aspect ratio?

Any suggestions are appreciated.
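
One way to picture the size and ratio constraint asked about above is a post-filter on boxes that have already been extracted. This is only a rough sketch: the function name `filter_boxes`, the `(left, top, right, bottom)` tuple layout, and every threshold are assumptions made for illustration, not Tesseract settings.

```python
from statistics import median

def filter_boxes(boxes, min_rel_height=0.5, max_rel_height=1.8, max_aspect=1.5):
    """Keep boxes whose height is close to the median and that are not too wide.

    boxes: list of (left, top, right, bottom) tuples in pixels.
    All thresholds are illustrative assumptions, not Tesseract parameters.
    """
    if not boxes:
        return []
    med = median(bottom - top for (_, top, _, bottom) in boxes)
    kept = []
    for (left, top, right, bottom) in boxes:
        width, height = right - left, bottom - top
        if width <= 0 or height <= 0:
            continue  # degenerate box
        if not (min_rel_height * med <= height <= max_rel_height * med):
            continue  # e.g. a tall box spanning several text lines
        if width / height > max_aspect:
            continue  # e.g. two characters merged into one wide box
        kept.append((left, top, right, bottom))
    return kept

boxes = [(0, 0, 20, 40), (25, 0, 45, 42), (50, 0, 60, 130), (70, 0, 150, 40)]
print(filter_boxes(boxes))
```

A filter like this only flags or drops suspicious boxes; it does not repair the segmentation that produced them.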


Mike

zdenko podobny

May 11, 2014, 11:12:41 AM
to tesser...@googlegroups.com
You did not provide an example image, which does not help ;-).
Did you try the solutions suggested on the wiki or the forum for improving images (this has been discussed here a few times)?

Zdenko



Mike

May 14, 2014, 5:51:33 PM
to tesser...@googlegroups.com
Hey Zdenko, thanks for your response! 

Sorry I didn't show any examples. Here are the images:




I applied a few preprocessing steps:
1) The DPI is as high as possible (letters are about 30-50 pixels high).
2) Adaptive thresholding removes most of the noise, and it works quite well.
3) The image is framed with a white rectangle.
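
For readers unfamiliar with steps 2 and 3, here is a minimal pure-Python sketch of mean-based adaptive thresholding followed by white framing. The thread does not say which thresholding method or library was actually used, so the local-mean rule, the function names, and the `window`, `offset`, and `margin` values are all assumptions.

```python
def adaptive_threshold(gray, window=15, offset=10):
    """Binarize a grayscale image given as a list of rows (values 0-255).

    A pixel becomes black (0) when it is darker than the mean of its
    window x window neighbourhood minus `offset`; otherwise white (255).
    """
    h, w = len(gray), len(gray[0])
    # Integral image with an extra zero row and column for fast window sums.
    integ = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            integ[y + 1][x + 1] = integ[y][x + 1] + row_sum
    half = window // 2
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            total = integ[y1][x1] - integ[y0][x1] - integ[y1][x0] + integ[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            if gray[y][x] < mean - offset:
                out[y][x] = 0
    return out

def add_white_border(img, margin=10):
    """Frame the image with `margin` white pixels on every side."""
    width = len(img[0]) + 2 * margin
    top = [[255] * width for _ in range(margin)]
    bottom = [[255] * width for _ in range(margin)]
    middle = [[255] * margin + list(row) + [255] * margin for row in img]
    return top + middle + bottom
```

Because the threshold follows the local mean, a slow brightness gradient across the photo does not turn whole regions black the way a single global threshold can.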

I didn't do:
1) deskewing: the image is sometimes not perfectly horizontal (but it's just a couple of degrees off)
2) any morphological filters such as erosion or dilation: in most cases they made the results worse
3) any other image processing (blurring, enhancing, smoothing, etc.)
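
Since deskewing is the step that was skipped, the following sketch shows one common way to estimate a skew of a few degrees from a binarized image: try a range of candidate angles and keep the one whose sheared row histogram is most sharply peaked. The function name `estimate_skew_degrees`, the search range, and the step size are assumptions; nothing in the thread indicates this particular method was tried.

```python
import math

def estimate_skew_degrees(binary, max_angle=5.0, step=0.25):
    """Estimate the slope of the text lines in degrees.

    binary: list of rows where 0 means ink and anything else is background.
    The result is positive when lines drop toward the right in image
    coordinates (y grows downward). Rotating the image by the opposite
    angle would level the text.
    """
    ink = [(x, y) for y, row in enumerate(binary)
           for x, value in enumerate(row) if value == 0]
    best_angle, best_score = 0.0, -1.0
    steps = int(round(2 * max_angle / step))
    for i in range(steps + 1):
        angle = -max_angle + i * step
        slope = math.tan(math.radians(angle))
        counts = {}
        for x, y in ink:
            # Shear each ink pixel vertically; when the candidate angle
            # matches the skew, each text line collapses into few rows.
            row = int(round(y - x * slope))
            counts[row] = counts.get(row, 0) + 1
        # Sum of squares rewards histograms concentrated in few rows.
        score = sum(c * c for c in counts.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

The search is brute force, which is acceptable for a narrow range such as a couple of degrees either way.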


I'm not sure whether any other ideas were proposed. What puzzles me is why the boxes are placed well sometimes and just plain awfully at other times. The biggest problem, as you can see, is two lines being taken as one. I also tried a version without adaptive thresholding, but the problem stays the same.
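
For the "two lines taken as one" problem, one possible workaround (not something proposed in this thread) is to split the binarized image into text lines yourself before recognition, using a horizontal projection profile. The sketch below assumes the image is already roughly level and that lines are separated by at least `min_gap` blank rows; `split_into_lines` and `min_gap` are hypothetical names.

```python
def split_into_lines(binary, min_gap=2):
    """Return (top, bottom) row ranges of text lines, with bottom exclusive.

    binary: list of rows where 0 means ink and anything else is background.
    A line ends once `min_gap` consecutive rows contain no ink.
    """
    has_ink = [any(value == 0 for value in row) for row in binary]
    lines, start, gap = [], None, 0
    for y, ink in enumerate(has_ink):
        if ink:
            if start is None:
                start = y  # first inked row of a new line
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                lines.append((start, y - gap + 1))
                start, gap = None, 0
    if start is not None:
        # The image ended while a line was still open.
        lines.append((start, len(has_ink) - gap))
    return lines
```

Each returned range can then be cropped and recognized as a single line. Skewed or touching lines leave no blank rows between them, so this would only help after deskewing.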

zdenko podobny

May 15, 2014, 3:15:52 AM
to tesser...@googlegroups.com
Can you provide the images without the drawn boxes?

Zdenko


Mike

May 22, 2014, 5:22:05 PM
to tesser...@googlegroups.com
Sure, a little late, but here are the images without boxes. I'm sending both versions: thresholded and not.


The first one (003.png) is a special case because of the dots, which are insanely difficult to remove.

Thanks a lot!

Mike