OCR of Screenshots

Quan Nguyen

Aug 30, 2010, 6:46:30 PM
to tesseract-ocr
I understand the resolutions of screenshots are typically inadequate
for OCR, but besides rescaling to a higher resolution, say, 300 DPI,
what other preprocessing operations may be needed on the images to
yield optimal OCR results?

Thanks.

Ian Ozsvald (A.I. Cookbook)

Aug 31, 2010, 1:17:56 PM
to tesser...@googlegroups.com
Hi Quan.

I've used tesseract to OCR frames from 640x480 screencast videos, and
it generally worked fine:
http://ianozsvald.com/2010/05/17/extracting-keyword-text-from-screencasts-with-ocr/

What problems are you seeing when you try tesseract?

Ian.

> --
> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
> To post to this group, send email to tesser...@googlegroups.com.
> To unsubscribe from this group, send email to tesseract-oc...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.
>
>

--
Ian Ozsvald (A.I. researcher, screencaster)
i...@IanOzsvald.com

http://IanOzsvald.com
http://MorConsulting.com/
http://blog.AICookbook.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald

Quan Nguyen

Aug 31, 2010, 10:26:29 PM
to tesseract-ocr
Hi Ian,

I'm implementing a feature in my program to enable OCR of screenshots.
The results have been generally better after the captured images were
rescaled from 96 DPI to 300 DPI. I was wondering if other simple
manipulations could be done programmatically to the images to produce
even better results.

The types of the screenshots are either 32bppArgb or 24bppRgb. Would
changing to grayscale or stripping the Alpha help?

Quan
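
To make the alpha and grayscale question concrete, here is a minimal pure-Python sketch. It is not anything tesseract does internally. It assumes the pixels arrive as rows of (A, R, G, B) tuples, composites them over a white background (an assumption; use whatever matches the captured window), and converts to 8-bit gray with the common BT.601 luma weights. The function name is hypothetical.

```python
def flatten_argb_to_gray(pixels, background=(255, 255, 255)):
    """Composite (A, R, G, B) pixels over a solid background, return 8-bit gray rows."""
    bg_r, bg_g, bg_b = background
    gray_rows = []
    for row in pixels:
        gray_row = []
        for a, r, g, b in row:
            alpha = a / 255.0
            # Blend each channel with the background so transparent areas take its color.
            r = alpha * r + (1.0 - alpha) * bg_r
            g = alpha * g + (1.0 - alpha) * bg_g
            b = alpha * b + (1.0 - alpha) * bg_b
            # BT.601 luma weights; clamp to guard against float rounding.
            luma = 0.299 * r + 0.587 * g + 0.114 * b
            gray_row.append(max(0, min(255, int(round(luma)))))
        gray_rows.append(gray_row)
    return gray_rows


# One opaque black pixel, one fully transparent pixel, one opaque red pixel.
print(flatten_argb_to_gray([[(255, 0, 0, 0), (0, 0, 0, 0), (255, 255, 0, 0)]]))
```

For a 24bppRgb capture the same function applies if every pixel is given an alpha of 255, in which case the compositing step is a no-op.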

Ian Ozsvald (A.I. Cookbook)

Sep 1, 2010, 4:56:56 AM
to tesser...@googlegroups.com
Re. your questions - I don't know :-(

For my videos I took 640x480 FLV screencasts (from ShowMeDo.com -
pretty high quality videos with hardly any artefacts) and I ran
tesseract 2 on the colour screengrabs without rescaling.

What resolution are you capturing at?

If the fonts are small you might want to manually try to sharpen the
image, in case anti-aliasing/smoothing is blending adjacent characters
into one another? You could visually confirm if this looks to be the
case.

Maybe you could upload a sample screengrab and explain what it gets
right and which errors it gets (maybe by drawing on the image)?

i.
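
The sharpening suggestion above can be tried with a small convolution. This is only a sketch: it assumes an 8-bit grayscale image stored as a list of rows, uses a common 3x3 sharpening kernel (centre weight 5, the four direct neighbours at -1), and replicates edge pixels at the border. Neither the kernel nor the function name comes from the thread.

```python
def sharpen(gray):
    """Apply a 3x3 sharpening kernel to an 8-bit grayscale image (list of rows)."""
    height, width = len(gray), len(gray[0])

    def px(y, x):
        # Replicate the nearest edge pixel when the kernel reaches past the border.
        return gray[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    out = []
    for y in range(height):
        row = []
        for x in range(width):
            value = (5 * px(y, x)
                     - px(y - 1, x) - px(y + 1, x)
                     - px(y, x - 1) - px(y, x + 1))
            row.append(min(255, max(0, value)))  # clamp to the 8-bit range
        out.append(row)
    return out


# A soft edge (100 -> 150 -> 200) gains contrast on both sides of the transition.
print(sharpen([[100, 100, 150, 200, 200]]))
```

Flat regions pass through unchanged, so the effect is concentrated on the anti-aliased edges between characters. Whether that actually helps recognition would still need checking on real screenshots.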

Quan Nguyen

Sep 1, 2010, 8:22:03 PM
to tesseract-ocr
They are just basic 96-DPI window screen prints, such as those
generated by the Print Screen key. I've implemented the function by
merely rescaling the images to 300 DPI, but I wonder whether any
further improvement is possible with only minor preprocessing. Any
tips would be appreciated; otherwise, I'm OK with the improved results
from the rescaling.

Thanks for your response.

Ian Ozsvald (A.I. Cookbook)

Sep 2, 2010, 5:46:35 AM
to tesser...@googlegroups.com
Offering tips would be easier if we knew what kind of errors you're
getting. Personally I'd be interested, as I'd like to improve my
screencast-reader at some point.
i.

SteveP

Sep 7, 2010, 4:30:33 PM
to tesseract-ocr
Hi Quan,
There is more than one way to scale as you may know. I have seen
OCR fail in some cases depending on how you scale. I have a front end
I use for my software that calls tesseract. I ended up providing
options for scaling and options for converting from 24-bit color to
gray or black and white.

Let me start with some simple answers, though. Scaling with
interpolation seems to work best most of the time. Converting to
gray-scale seems to work most of the time. (I read that Ray Smith did
not design tesseract for color screen images, so I really have not
experimented with leaving things in color.) I do not think tesseract
pays attention to the Alpha channel, since alpha does not matter when
a single image sits by itself. (Converting to gray-scale does not
work in general if the text is rendered with ClearType or sub-pixel
rendering. If anybody figures out a good approach for OCR of
ClearType, I would appreciate getting an email since I don't read a
lot of the posts. Post your answer too.)

I think the scaling software at the leptonica web site is good. I
have had some trouble with the method in Windows that uses
createGraphics and drawImage. (Someone I worked with used the Windows
method on a blank image and got non-blank OCR results because the
Windows method seemed to me to introduce a row of black around a
couple of the edges. That's how it appeared to me, but it is possible
I did something wrong.)

Relative to scaling, I made a post in August about using
nearest-neighbor scaling when the characters are close together. This
is because scaling with interpolation without sharpening tends to blur
the edges of text characters. Leptonica has code for sharpening, I
believe, but I have not used it yet. Scaling by a factor of 2 without
interpolation and then by a variable factor with interpolation to the
needed size is a simple way to get some sharpening and some separation
between characters.
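
The two-stage idea (double without interpolation, then interpolate to the final size) might look like the sketch below. It assumes an 8-bit grayscale image as a list of rows and uses bilinear interpolation for the second stage, which is an assumption: the post does not say which interpolation was used, and a real front end would call leptonica or another imaging library instead. All function names are hypothetical.

```python
def scale_nearest_2x(gray):
    """Double the image by pixel replication (no interpolation)."""
    out = []
    for row in gray:
        doubled = [v for v in row for _ in range(2)]
        out.append(doubled)
        out.append(list(doubled))
    return out


def scale_bilinear(gray, new_w, new_h):
    """Resize to new_w x new_h with bilinear interpolation."""
    h, w = len(gray), len(gray[0])
    out = []
    for j in range(new_h):
        # Map the output pixel centre back into source coordinates, clamped to the image.
        sy = min(max((j + 0.5) * h / new_h - 0.5, 0.0), h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for i in range(new_w):
            sx = min(max((i + 0.5) * w / new_w - 0.5, 0.0), w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = gray[y0][x0] * (1 - fx) + gray[y0][x1] * fx
            bottom = gray[y1][x0] * (1 - fx) + gray[y1][x1] * fx
            row.append(int(round(top * (1 - fy) + bottom * fy)))
        out.append(row)
    return out


def scale_two_stage(gray, factor):
    """Scale by `factor` overall: 2x replication first, then bilinear to the final size."""
    h, w = len(gray), len(gray[0])
    doubled = scale_nearest_2x(gray)
    return scale_bilinear(doubled, int(round(w * factor)), int(round(h * factor)))


result = scale_two_stage([[0, 255], [255, 0]], 3)
print(len(result), len(result[0]))
```

Because the first stage keeps hard pixel boundaries, the interpolation in the second stage has less room to smear neighbouring characters together than a single interpolated resize of the original.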

haratron

Sep 8, 2010, 7:56:35 AM
to tesser...@googlegroups.com
I'm also interested in this topic.

I have a couple of questions:
1. How can I calculate the ideal image size (300dpi?) to feed to
tesseract? I mean, how do I identify how much scaling the image needs,
before the OCR procedure.
2. I'm currently using ImageMagick's convert program for scaling and
converting to grayscale. Would it make a difference if I used
leptonica instead?
3. Do the bits of color matter? Is there an optimal color depth?
4. Does the OCR work best when ClearType is enabled or disabled?

SteveP

Sep 9, 2010, 7:14:22 PM
to tesseract-ocr
"Ideal" may be hard to define for image size. The wiki (I believe)
says the lower case letters (for English) should be at least 20 to 30
pixels in height. By default, I scale everything by a factor of 3.
If your screen is set to 96 dpi resolution, 300 dpi would be about a
factor of 3. If your font size is large enough, then sometimes you
can get better results without scaling, since scaling often blurs the
image a little.
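
One way to turn the letter-height guideline into a scale factor is sketched here. It assumes you can measure the height of lower-case letters in pixels yourself (how to do that is outside this sketch). The 20-pixel minimum comes from the guideline above; the 25-pixel target is an arbitrary midpoint of the 20 to 30 range, and the function name is hypothetical.

```python
def scale_for_letter_height(measured_px, min_px=20, target_px=25):
    """Return a scale factor that brings lower-case letters to about target_px tall.

    Returns 1.0 when the letters are already at least min_px tall, since
    scaling can blur text that is large enough as it is.
    """
    if measured_px <= 0:
        raise ValueError("measured_px must be positive")
    if measured_px >= min_px:
        return 1.0
    return target_px / measured_px


print(scale_for_letter_height(8))   # small UI font: needs enlarging
print(scale_for_letter_height(24))  # already large enough: leave as is
```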

What I said about leptonica is for software developers building a
front end to tesseract. If you are using ImageMagick, I suspect that
is fine.

I think 8-bit per color is standard for tesseract if you are not doing
black and white.

ClearType is an implementation of sub-pixel rendering, which is
designed for an LCD screen with the red, green and blue sub-pixels in
separate locations. Printers and scanners and OCR typically are not
oriented to sub-pixels. I think OCR accuracy is better with sub-pixel
rendering disabled.
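
A crude check for whether a screenshot was captured with sub-pixel rendering enabled is sketched below. It relies on the observation that ordinary anti-aliasing of black text on white yields neutral gray edge pixels, while sub-pixel rendering yields colored fringes. It assumes rows of (R, G, B) tuples; the spread threshold of 40 is an arbitrary assumption, and colored UI elements or colored text will also raise the ratio, so treat the result as a hint only. The function name is hypothetical.

```python
def color_fringe_ratio(rgb_pixels, spread=40):
    """Return the fraction of pixels whose R, G, B values differ by more than `spread`."""
    total = 0
    fringed = 0
    for row in rgb_pixels:
        for r, g, b in row:
            total += 1
            if max(r, g, b) - min(r, g, b) > spread:
                fringed += 1
    return fringed / total if total else 0.0


neutral = [[(0, 0, 0), (128, 128, 128)], [(255, 255, 255), (200, 200, 200)]]
fringed = [[(0, 0, 0), (200, 120, 90)], [(255, 255, 255), (200, 200, 200)]]
print(color_fringe_ratio(neutral), color_fringe_ratio(fringed))
```

Run on a crop that should contain only black-on-white text, a clearly non-zero ratio suggests the capture machine had ClearType (or similar) switched on.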

Quan Nguyen

Sep 9, 2010, 10:22:18 PM
to tesseract-ocr
Thanks, Steve, for all the valuable info. I've used bicubic
interpolation in scaling the screenshots and been able to achieve
acceptable results. The scale factor I used was 300 divided by the
image's resolution. If sharpening and keeping the bit depth to 8
improve the recognition rates further, then I will definitely consider
using them in future attempts.

Regards.
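
The scale-factor arithmetic described above (300 divided by the image's resolution) works out as in this small sketch. It only computes the factor and the target dimensions; the resampling itself (bicubic in this case) is left to whatever imaging library is in use. The function name and the rounding to whole pixels are assumptions.

```python
def target_size(width, height, source_dpi, target_dpi=300):
    """Return (factor, new_width, new_height) for rescaling to target_dpi."""
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    factor = target_dpi / source_dpi
    return factor, int(round(width * factor)), int(round(height * factor))


# A 1024x768 capture at 96 DPI.
print(target_size(1024, 768, 96))
```

For a 96-DPI capture the factor is 3.125, which is the "about a factor of 3" mentioned earlier in the thread.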