Recognizing "..."

Christen Møller

Jan 6, 2015, 2:27:24 PM
to tesser...@googlegroups.com
Hi

I have a problem with Tesseract - it simply ignores three dots in sequence: "...".

So "It's ..." becomes "It's" and "9,1 ... 9,3 ... 9,7" becomes "9,19,39,7". That leaves a lot of manual correction work!

Does anybody know how to make Tesseract recognize "..."?

Best regards

Christen Møller

Allistair C

Jan 7, 2015, 4:54:55 PM
to tesser...@googlegroups.com
The "..." is formally called an "ellipsis". Googling turned up nothing useful, except that somebody has tried using OpenCV object/feature detection to look for it. The only way I can imagine getting Tesseract to recognise an ellipsis is to train it with the three full stops inside a single box in the box file, or something like that. But I'm not sure.
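
To make the box file idea concrete, here is a rough Python sketch that takes box file lines (one glyph per line, in the form "glyph left bottom right top page") and merges every run of three consecutive "." boxes on the same page into one "..." box. The function names and the coordinates in the sample are made up for illustration, and I have not tested whether training on such a box file actually makes Tesseract output the ellipsis.

```python
from typing import List, Tuple

# One box file entry: glyph, left, bottom, right, top, page
Box = Tuple[str, int, int, int, int, int]


def parse_box_line(line: str) -> Box:
    """Parse one box file line of the form 'glyph left bottom right top page'."""
    glyph, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return glyph, int(left), int(bottom), int(right), int(top), int(page)


def merge_ellipsis_boxes(boxes: List[Box]) -> List[Box]:
    """Replace each run of three consecutive '.' boxes with a single '...' box."""
    merged: List[Box] = []
    i = 0
    while i < len(boxes):
        run = boxes[i:i + 3]
        same_page = len({b[5] for b in run}) == 1
        if len(run) == 3 and same_page and all(b[0] == "." for b in run):
            # The merged box is the bounding rectangle of the three dots.
            merged.append((
                "...",
                min(b[1] for b in run),
                min(b[2] for b in run),
                max(b[3] for b in run),
                max(b[4] for b in run),
                run[0][5],
            ))
            i += 3
        else:
            merged.append(boxes[i])
            i += 1
    return merged


def format_box(box: Box) -> str:
    """Turn a box tuple back into a box file line."""
    return " ".join(str(field) for field in box)


if __name__ == "__main__":
    # Hypothetical sample: an 's', three separate dots, then a '9'.
    sample = [
        "s 10 10 20 30 0",
        ". 25 10 28 13 0",
        ". 32 10 35 13 0",
        ". 39 10 42 13 0",
        "9 50 10 60 30 0",
    ]
    for box in merge_ellipsis_boxes([parse_box_line(l) for l in sample]):
        print(format_box(box))
```

A run of more than three dots would be merged three at a time with the remainder left as single dots, so dotted leader lines would need separate handling.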