tesseract-ocr ignores I and 0


Vladimir Radnovic

May 28, 2014, 12:43:57 PM
to tesser...@googlegroups.com
I have a problem with I and 0 (zero), as you can see in the picture.

The segmentation clean filter ignores 0 (zero) and I.

Can someone help me figure out what the problem is?

thanks
Vladimir


tessearc-ocr-problem.png

zdenko podobny

May 30, 2014, 3:45:49 AM
to tesser...@googlegroups.com
If you send a screenshot like this, I guess nobody will be interested in even testing your problem...
Send the final image you are trying to OCR, describe how you are trying to OCR it, etc.

Zdenko



Vladimir Radnovic

May 30, 2014, 5:00:00 AM
to tesser...@googlegroups.com
Hi Zdenko,
The real image is attached...

I posted the screenshot just to show what the problem is.

I made traineddata for Latin (Croatian, Bosnian and Serbian letters): Š Đ Ž Č Ć
But I have a problem with I and 0 (Serbian number plates have a special zero, as you can see in the image).

Can you send me your e-mail address in a private message?

Thanks
Vladimir
kiki.jpg

zdenko podobny

May 30, 2014, 4:13:03 PM
to tesser...@googlegroups.com
  1. You have twice posted the original (not pre-processed) images. Please post the image you are actually trying to OCR. It is pretty boring and discouraging to get output like yours.
  2. I just cropped the plate in an image editor, covered the state symbol, and ran (tesseract 3.03):
    tesseract fiat_.png - -psm 7 -l srp \
      -c tessedit_char_whitelist=KIČ0123456789 2>/dev/null
    The output was:
    KI 513 IČ
    As you can see, only the special 0 is a problem once you remove the noise.
Unfortunately, tesseract is not able to "add" an additional symbol to existing trained data, so you need to play a little with training... Maybe you can try to train only digits, but not "one by one". Try to make groups of digits such as "013  513 310", as they appear on the plate. Be careful that there are no columns (vertical spaces) in the training image. I have had bad experiences with such images.
If you create such a training (e.g. srp_digits.traineddata), try this command to see if it helps:
tesseract fiat_.png - -psm 7 -l srp+srp_digits \
  -c tessedit_char_whitelist=KIČ0123456789
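
The digit-group training text suggested above could be produced with a short script. This is only a rough sketch, not part of tesseract: the function name `make_training_lines`, its parameters, and the choice of mixing 3- and 4-digit groups are all assumptions. Mixing group lengths is just one guess at how to keep the gaps between groups from lining up into vertical columns on the rendered training image.

```python
import random


def make_training_lines(num_lines=20, groups_per_line=3,
                        group_lengths=(3, 4), seed=0):
    """Return lines of space-separated digit groups, e.g. "013 5134 310".

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the text is reproducible
    lines = []
    for _ in range(num_lines):
        groups = []
        for _ in range(groups_per_line):
            # Vary the group length so spaces do not align across lines.
            length = rng.choice(group_lengths)
            groups.append("".join(rng.choice("0123456789")
                                  for _ in range(length)))
        lines.append(" ".join(groups))
    return lines


if __name__ == "__main__":
    # Print the text; redirect it to a file and render it with the plate font.
    print("\n".join(make_training_lines()))
```

The printed text would still have to be rendered to an image with the plate font and run through the normal training steps before it could be used as srp_digits.traineddata.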

And please share your experience ;-)

Zdenko


fiat_.png

Vladimir Radnovic

May 31, 2014, 4:57:24 AM
to tesser...@googlegroups.com
Hi Zdenko,
I tried to train; attached are the text file I used (tablice.txt) and srb.traineddata.

I made a new "font" that is 100% like the one on the plate... but in segmentation I have a problem: it ignores I and 0, like in my first post.

What am I doing wrong?

Is this type of picture OK, or do I need to remove all noise from the picture? (pictures attached)

Thanks
Vladimir
srb.traineddata
plate (18).jpg
plate (19).jpg
plate (21).jpg
plate (22).jpg
plate (23).jpg
plate (24).jpg
tablice.txt
plate (2).jpg
plate (4).jpg
plate (10).jpg
plate (12).jpg
plate (13).jpg
plate (16).jpg
plate (17).jpg