I have noticed that OCR results are better when underlining is removed
by preprocessing before OCR is attempted. Could you try an experiment
where you manually remove the underlining from the images using Paint
or something similar? (If you need info on how to automate removal of
underlining, post about that. If anybody in the forum has ideas about
this, please post those. I am interested in ideas myself.)
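As one idea for automating this, here is a minimal sketch. It assumes the image has already been binarized into rows of 0 (background) and 1 (ink), and that any horizontal run of ink longer than the widest letter stroke is an underline. The function name and the `min_run` threshold are hypothetical, and I haven't tested this approach against Tesseract.

```python
def remove_underlines(image, min_run=30):
    """Return a copy of a binarized image (list of rows, 1 = ink, 0 = background)
    with every horizontal ink run of length >= min_run set to background.

    Assumption: such long runs are underlines. min_run is a hypothetical
    threshold and should be larger than the widest stroke of any letter.
    """
    cleaned = [row[:] for row in image]  # leave the caller's image untouched
    for row in cleaned:
        start = None  # index where the current ink run began, if any
        # Iterate one past the end so that a run touching the right edge is closed.
        for x in range(len(row) + 1):
            ink = x < len(row) and row[x] == 1
            if ink and start is None:
                start = x
            elif not ink and start is not None:
                if x - start >= min_run:
                    # Long run: treat it as underline and erase it.
                    for i in range(start, x):
                        row[i] = 0
                start = None
    return cleaned
```

One caveat: descenders (g, p, y) that touch the underline will lose the pixels where they overlap it. Doing the erasing by hand in Paint, as suggested above, avoids that. Large fonts can also have horizontal strokes longer than `min_run`.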
Also, URLs in web pages are usually what the Tesseract FAQ calls
"screen text". If you have not already handled the small-font
issue, I recommend resizing your image so that lowercase letters (such
as 'x') are about 20 to 30 pixels high.
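The resize factor is simple arithmetic. This sketch assumes you have measured the x-height of the text in pixels yourself. It aims for 25 px, the middle of the 20 to 30 range. The function name is hypothetical.

```python
def rescale_for_ocr(width, height, x_height_px, target_x_height=25):
    """Return (factor, new_width, new_height) that would bring a measured
    x-height (height of a lowercase 'x', in pixels) up to target_x_height.

    Assumption: x_height_px was measured by hand on the source image.
    """
    if x_height_px <= 0:
        raise ValueError("x_height_px must be positive")
    factor = target_x_height / x_height_px
    return factor, round(width * factor), round(height * factor)
```

For example, a 600 x 40 screenshot whose 'x' is 5 pixels high would need a factor of 5, giving a 3000 x 200 image. You can then do the actual resize in any image editor, or with a library such as Pillow's `Image.resize`, before passing the image to Tesseract.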
On Nov 20, 6:10 am, maxm007 <max.hilla...@gmail.com> wrote:
> Hi,
>
> I'm researching whether it is possible to use OCR to gather web
> addresses from images. I've tried a tesseract online service and some
> others and it seems OCR doesn't like web addresses.
>
> Is it at all possible with current OCR technology to recognize the
> following url from an image:
> http://www.google.co.uk/search?source=ig&hl=en&rlz=&=&q=test&btnG=Goo...