wrong recognition of small lines or points

Dec 14, 2015, 8:42:29 AM

to tesseract-ocr

Hello everybody,

I am testing Tesseract to recognize the characters in the attached picture.

I created a traineddata file with a small number of characters.

My problem is that Tesseract also recognizes as characters the small lines to the left of the first 0 and under the J. Specifically, the recognized text is F0002HNJH2UF.

How can I avoid this? Is it possible to set a minimum character size?

Thank you in advance.

text0_F0002HNJH2UF.png

Mar 4, 2016, 1:17:03 AM

to tesseract-ocr, filr...@gmail.com

There are definitely Tesseract API config variables for that:

textord_heavy_nr = 0 (0 default, 1 is very aggressive)

textord_max_noise_size
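For reference, here is a minimal sketch of how these two variables could be set from the command line, assuming a Tesseract build whose CLI accepts `-c NAME=VALUE`. The helper names `tesseract_command` and `run_tesseract` are made up for illustration, and the value 10 for textord_max_noise_size is only a guess to experiment with, not a recommended setting.

```python
import subprocess


def tesseract_command(image_path, variables):
    """Build a tesseract CLI call that sets each config variable via -c NAME=VALUE."""
    # "stdout" as the output base makes tesseract print the text instead of
    # writing a .txt file.
    cmd = ["tesseract", image_path, "stdout"]
    for name, value in variables.items():
        cmd += ["-c", "{}={}".format(name, value)]
    return cmd


def run_tesseract(cmd):
    """Run the command and return the recognized text (needs tesseract on PATH)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Illustrative values only: 1 switches on the aggressive noise removal,
# 10 is an arbitrary noise size to try.
cmd = tesseract_command(
    "text0_F0002HNJH2UF.png",
    {"textord_heavy_nr": 1, "textord_max_noise_size": 10},
)
```

Passing `cmd` to `run_tesseract` would perform the actual OCR. With a custom traineddata file you would also need to add the `-l` option with your own language name.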

However, I would simply use OpenCV to remove any blob whose height is below a threshold you choose.

Mar 4, 2016, 2:25:06 PM

to tesseract-ocr, filr...@gmail.com

Which OpenCV function would you use to do that?

Mar 5, 2016, 10:40:34 AM

to tesseract-ocr, filr...@gmail.com

Some links:

I wasn't entirely impressed by the bounding box method of contour removal, but I did have success with findContours.

Just filter out the contours you want to lose (in your case by height, I would say) and replace their black pixels with white.

Looking at that text, I would also consider applying some morphological operations to make the characters a bit stronger.

I hope this helps.
