Slashes in numerical date not recognized


Michael Beauregard

Jan 14, 2014, 3:30:40 PM
to tesser...@googlegroups.com
Hey everyone,

I'm struggling to get an image with a date to be recognized correctly and would like some advice if possible. 

The image contains the text "1946/05/29". With the following command:

tesseract date.png date.out -psm 6

it is recognized as:

$ cat date.out.txt
1 946I05I29

I can deal with the unwanted space character easily enough, but I don't know what to do about the capital 'I' being recognized instead of the forward slash '/'. Interestingly, when I enumerated the symbols through the ResultIterator and ChoiceIterator to see what Tesseract was matching, I found that the forward slash isn't even considered:
 
Result: I                                     
choice: I=99.000870                           
choice: l=96.095596                           
choice: !=89.777245                           
choice: i=84.559441                           

I would have expected one of the choices to be '/', but it wasn't.

Any help would be greatly appreciated.

Thanks,

Michael
Attachment: date.png

Ian Carroll

Aug 18, 2015, 11:39:38 AM
to tesseract-ocr
Michael,

Any chance you solved this (old) problem? I'm encountering the same issue and haven't found a fix yet.

Thanks,
Ian

Michael Beauregard

Aug 18, 2015, 1:00:28 PM
to tesser...@googlegroups.com
I don't think I ever found a solution to this, but it was so long ago I don't remember for sure. The project ended not long after posting the question and so I never had the chance to follow up.


Will Hansen

Oct 25, 2015, 12:42:20 PM
to tesseract-ocr
Ian,

Did you ever have any luck solving this? I'm dealing with the same issue. I can't get Tesseract to recognize the slash in a date like MM/DD no matter what I try!

Will

Supriya Das

Oct 29, 2015, 5:48:51 AM
to tesseract-ocr, mic...@insightfulminds.com

Hello everybody,

You can train Tesseract on this kind of "/" and use the resulting trained data file to solve this problem,
or
you can apply a post-processing algorithm to the recognized text.
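
For the post-processing route, here is a rough sketch in Python. It is not anything Tesseract provides; it assumes you already know the field holds a numeric date in YYYY/MM/DD form, as in Michael's "1946/05/29" example. The name fix_date and the list of look-alike characters are made up for illustration: I, l, ! and i are the choices reported earlier in the thread, and | is an extra guess.

```python
import re
from datetime import datetime

# Characters that may stand in for '/': I, l, !, i were the choices reported
# in the thread; '|' is an additional assumption.
SEP = r"[/Il!i|]"

# Assumed layout: four-digit year, two-digit month, two-digit day.
DATE_PATTERN = re.compile(rf"(\d{{4}}){SEP}(\d{{2}}){SEP}(\d{{2}})")


def fix_date(raw):
    """Return the OCR text as YYYY/MM/DD, or None if it does not look like a date."""
    # Drop stray whitespace such as the space in "1 946".
    text = re.sub(r"\s+", "", raw)
    match = DATE_PATTERN.fullmatch(text)
    if match is None:
        return None
    candidate = "/".join(match.groups())
    # Reject impossible dates (month 13, day 32, ...) rather than guessing.
    try:
        datetime.strptime(candidate, "%Y/%m/%d")
    except ValueError:
        return None
    return candidate


print(fix_date("1 946I05I29"))  # prints 1946/05/29
```

This only helps when the expected layout is known up front. A different format such as MM/DD would need its own pattern, and text that does not match is returned as None instead of being altered.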