http:II11111111111111111111111111111111111
1111111111111111111.coml"
You can resolve the ambiguity using the unicharambigs file; for details, see my SO answer to your SO question.
Stef
The only difference between the files is the border around them.
In my eng.unicharambigs file I have added the following lines:
3 : I I 3 : / / 1
3 : / I 3 : / / 1
3 : I / 3 : / / 1
5 . c o m l 5 . c o m / 1
3 : / l 3 : / / 1
3 : l / 3 : / / 1
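To make the intent of these rules explicit, here is a minimal Python sketch that parses the lines above and applies them to a plain string. It assumes the layout is: a count, the characters to match, a count, the replacement characters, and a type flag, where 1 means the replacement is mandatory. The helper names `parse_rule` and `apply_rules` are made up for this illustration. It only shows what the rules should do to the text; it is not how Tesseract applies them internally during recognition.

```python
# Illustration only: plain-text emulation of the rules listed above.
# parse_rule and apply_rules are hypothetical helpers, not part of Tesseract.

RULES_TEXT = """\
3 : I I 3 : / / 1
3 : / I 3 : / / 1
3 : I / 3 : / / 1
5 . c o m l 5 . c o m / 1
3 : / l 3 : / / 1
3 : l / 3 : / / 1
"""


def parse_rule(line):
    """Split one rule line into (wrong, right, rule_type)."""
    tokens = line.split()  # accepts tabs or spaces between fields
    n = int(tokens[0])
    wrong = "".join(tokens[1:1 + n])
    m = int(tokens[1 + n])
    right = "".join(tokens[2 + n:2 + n + m])
    rule_type = int(tokens[2 + n + m])
    return wrong, right, rule_type


def apply_rules(text, rules):
    """Apply every mandatory (type 1) rule as a literal string replacement."""
    for wrong, right, rule_type in rules:
        if rule_type == 1:
            text = text.replace(wrong, right)
    return text


rules = [parse_rule(line) for line in RULES_TEXT.splitlines() if line.strip()]

# Shortened version of the misread URL from this thread.
print(apply_rules("http:II1111.coml", rules))  # prints http://1111.com/
```

In an actual unicharambigs file the fields are, as far as I understand the format, separated by tabs, so the spacing shown in this post may not match the file byte for byte.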
When I run Tesseract on the file without spacing, I get the following output:
http:II11111111111111111111111111111111111111111
1111111111111111111.com/
When I run Tesseract on the file with spacing, I get the correct output:
http://11111111111111111111111111111111111111111
1111111111111111111.com/
Another example of spacing (or something else?) making a difference:
Smaller border:
Larger border:
Both of these files have spacing around the text, with the first image having less. (The font is also very slightly different between the two images.)
Running Tesseract on the first file gives a nearly correct result: http://alphaGl.com/primenumbershittingbearl (except for 6 -> G and the last / becoming l).
On the second image, I get the output http://alpha61.comIprimenumbershittingbearl. It seems as if the unicharambigs file is ignored for the .com/ case: the substitution is not performed as specified.
Is there anything you can think of to fix this problem?