Help extracting text from images.


newbie

Jan 7, 2015, 2:26:53 PM
to tesser...@googlegroups.com
I am using tess4j, a Java wrapper around Tesseract. Here are the images and results. The intent is to extract VIP2500 (the model number) from the image. Any help is appreciated.

Attached are the original PNG file (ArrisVIP2500.png), a binarized file (ArrisVIP2500_bin.TIF) and a zoomed and cropped file (ArrisVIP2500_cropped.png).

ArrisVIP2500.png

é ATE-T U-verse

rowan 0
/


ArrisVIP2500_bin.TIF

AT&T U-verse

rowan <3 3
/ --

vxvzsoo ‘Q’


ArrisVIP2500_cropped.png

ATE-T U-verse

rowsn Q 

VIPZSOO ‘e’

This looks the closest to VIP2500; I need to get tess4j to recognize the digits. That said, this might not be a realistic scenario, as someone or something needs to zoom and crop the image beforehand (preprocessing).

ArrisVIP2500.png
ArrisVIP2500_cropped.png
ArrisVIP2500_bin.TIF

Allistair

Jan 7, 2015, 4:39:39 PM
to tesser...@googlegroups.com
A common technique is to pre-process your input image. 

Resizing produced good results. I also use psm 6 for these types of images, where text appears in various locations.

In this case I first used your cropped image:

tesseract ArrisVIP2500_cropped.png out -l eng -psm 6 config

and got:

AT&T U verse
rowsn
O F3.
vrrzsoo ’e'

Then I resampled your image to 2000px wide:

tesseract ArrisVIP2500_cropped_2000.png out2000 -l eng -psm 6 config 

and got:

AT&T U verse
POWER © " ‘|
/ ‘j""'j"’..
VIP2500 '%’

Cheers




Allistair

Jan 7, 2015, 4:44:47 PM
to tesser...@googlegroups.com
I also meant to ask whether your use case allows for cropping. If you know you will have a certain format of image, cropping an area and resampling should be easy. You could also do some preprocessing that looks for certain icons in your image to get some context as to where the model number is likely to be (see feature matching on Open CV). However, I would need to know more about your use case.

That said, resampling your full image to 3000px wide yielded a result with a full model number but the more you can crop the area the better the result:

AT&T U verse ‘ §
LINK HD nzc ,
rowzn Q I ‘ .» . ‘ nsuu 4 0|: > I
/ sj J \
VIP2500 °%' 7 A R R I s

newbie

Jan 7, 2015, 5:35:47 PM
to tesser...@googlegroups.com
Thanks Allistair, my lucky day as you have responded to both my queries. Let me try to address your questions below and then go ahead with a few of my own :-)

I also meant to ask whether your use case allows for cropping. If you know you will have a certain format of image, cropping an area and resampling should be easy.
Basically the image will be a user-generated image, more like the first PNG file, but we could ask the user to zoom in on the model number if that would help us identify it. We could do anything with the image (cropping, resampling, etc.). But the problem is that the model number probably will not be located in the same place on all equipment.

2. Preprocessing - as it should be done programmatically, would I be using OpenCV in conjunction with Tesseract? I did not see much in Tesseract for image processing (I could be totally off).
3. "I also use psm 6 for these types of image with various text locations."
   What is this?

Another thing I can probably come up with is all the model numbers, or images of all potential equipment, so I have a repository to match against. Would that help in any way?

Thanks again for taking the time to respond. Appreciate it.

Allistair

Jan 7, 2015, 5:58:15 PM
to tesser...@googlegroups.com
1. In the case where you do not know where the model number will be your options will be to ask the provider of the image to crop (as you've already identified and will likely be the most reliable) or other techniques, e.g. it could be that you know ahead of time the format of model numbers, e.g. as a regular expression - it could be all your model numbers are in a similar format of (ABC|DEF|FOO|BAR) followed by 5 numbers \d{5}. So long as your input image is large (300dpi) and you use psm 6 then you can perform some regex routine on the Tesseract output to look for the most likely match. Now, the issue with this comes when there is a lot of "noise" returned by Tesseract - this can easily result in a false positive, so again you are much better off trying to minimise noise by locating the model number and removing surrounding noise like other text or details of the hardware. Depending on how your user provides the image you can still make this usable, e.g. if it's an online image upload you can provide a nice JavaScript cropping tool for instance. I'm not sure what your precise flow is, but you get the point I'm sure.
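The regex idea above can be sketched in plain Java. This is only an illustration: the prefixes and the five-digit rule are the hypothetical format from the paragraph above, not a real model-number scheme, and the class and method names are made up for the example. Word boundaries are added so that a longer run of letters or digits around a match is not accepted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ModelNumberFinder {
    // Hypothetical format from the post: one of four prefixes followed by exactly 5 digits.
    private static final Pattern MODEL =
            Pattern.compile("\\b(?:ABC|DEF|FOO|BAR)\\d{5}\\b");

    /** Returns every substring of the OCR text that matches the assumed model-number format. */
    static List<String> findCandidates(String ocrText) {
        List<String> candidates = new ArrayList<>();
        Matcher m = MODEL.matcher(ocrText);
        while (m.find()) {
            candidates.add(m.group());
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Made-up OCR output with noise around the model number.
        String ocrText = "AT&T U verse\nPOWER \" |\nFOO12345 '%'\n";
        System.out.println(findCandidates(ocrText));
    }
}
```

If more than one candidate comes back, that is a sign the image is noisy enough to produce false positives, which is the case the paragraph above warns about.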

2. You don't do preprocessing with Tesseract. It has some basic stuff built-in but that's it. In my case I ended up using Open CV to apply various blur (gaussian), thresholding (adaptive and Otsu) as well as "opening and closing" morphology filters etc. before sending this image off to Tesseract. With the image already pre-processed Tesseract realises it does not need to do much - you can see this by using the config option tessedit_write_images T to compare your input image to what Tesseract uses internally.
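To give a feel for what one of these preprocessing steps does, here is a minimal plain-Java sketch of Otsu thresholding on 8-bit grayscale values. It is not the Open CV code used in the post; it only illustrates the technique, and the class and method names are assumptions made for the example. Blur, adaptive thresholding and morphology are left out.

```java
class OtsuThreshold {
    /** Computes Otsu's threshold for 8-bit grayscale pixel values (0..255). */
    static int compute(int[] gray) {
        int[] hist = new int[256];
        for (int v : gray) hist[v]++;

        long total = gray.length;
        long sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (long) i * hist[i];

        long weightBg = 0;          // number of pixels at or below the candidate threshold
        long sumBg = 0;             // sum of their gray values
        double bestVariance = -1;
        int bestThreshold = 0;

        for (int t = 0; t < 256; t++) {
            weightBg += hist[t];
            if (weightBg == 0) continue;        // no dark pixels yet
            long weightFg = total - weightBg;
            if (weightFg == 0) break;           // no bright pixels left
            sumBg += (long) t * hist[t];
            double meanBg = (double) sumBg / weightBg;
            double meanFg = (double) (sumAll - sumBg) / weightFg;
            // Between-class variance (up to a constant factor); Otsu picks its maximum.
            double between = (double) weightBg * weightFg * (meanBg - meanFg) * (meanBg - meanFg);
            if (between > bestVariance) {
                bestVariance = between;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }

    /** Marks each pixel as white (true) if it is brighter than the Otsu threshold. */
    static boolean[] binarize(int[] gray) {
        int threshold = compute(gray);
        boolean[] white = new boolean[gray.length];
        for (int i = 0; i < gray.length; i++) white[i] = gray[i] > threshold;
        return white;
    }
}
```

In practice a library routine such as the Open CV thresholding mentioned above is the better choice; the sketch is only meant to show that the threshold is derived from the image histogram rather than fixed by hand.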


http://docs.opencv.org/doc/tutorials/imgproc/opening_closing_hats/opening_closing_hats.html

3. Page segmentation mode. If you run the "tesseract" command line you will see there are these options:

pagesegmode values are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
-l lang and/or -psm pagesegmode must occur before anyconfigfile.

These tell Tesseract the kind of page layout it is dealing with. Remember that with most PSMs Tesseract assumes it is reading a "document" and not a real-world object. PSM 6 performs best from all my research into real-world OCR with varying text fonts/sizes/locations.
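For anyone driving the command line from Java rather than through a wrapper, the sketch below assembles the same arguments used earlier in the thread, with `-l` and `-psm` placed before any config file as the usage text requires. The class and method names are hypothetical, and the 0..10 range check simply mirrors the list above. The `-psm` spelling is what this thread's Tesseract version uses; newer Tesseract releases spell the option `--psm`, so adjust for your install.

```java
import java.util.ArrayList;
import java.util.List;

class TesseractCommand {
    /** Builds the argument list for the tesseract CLI in the order used in this thread. */
    static List<String> build(String imagePath, String outputBase, String lang, int psm) {
        if (psm < 0 || psm > 10) {
            throw new IllegalArgumentException("psm must be between 0 and 10");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(imagePath);
        cmd.add(outputBase);
        cmd.add("-l");
        cmd.add(lang);
        cmd.add("-psm");                 // assumption: the older single-dash spelling from this thread
        cmd.add(String.valueOf(psm));
        return cmd;                      // a config file name, if any, would be appended after this
    }

    public static void main(String[] args) {
        List<String> cmd = build("ArrisVIP2500_cropped.png", "out", "eng", 6);
        System.out.println(String.join(" ", cmd));
        // The list could be handed to new ProcessBuilder(cmd) to actually run Tesseract.
    }
}
```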

4. Your idea to build a database of model numbers as photos and then use object detection can work, yes, using either template matching or feature detection. This gets tricky I'm afraid, but I found in my research that it's possible, and it can even accommodate various lighting and angles to a degree.



Cheers

newbie

Jan 7, 2015, 6:02:09 PM
to tesser...@googlegroups.com
Sorry for the barrage here.
The interesting thing is you mentioned feature matching with OpenCV (I don't know anything at all about it). The one thing is that I can have a repository of these images with me, and I need to match a user-generated image against one of them.

A little background might help. I can have (or come up with) a repository of all the equipment images. A tech might head to the field and take a picture on his mobile device, and I need to match the tech's picture against my repository and come up with the model number.

Is this easier with OCR or with feature matching in OpenCV?

Thanks

Allistair C

Jan 7, 2015, 6:12:05 PM
to tesser...@googlegroups.com
It sort of depends on your hardware and how similar or different they are. Reliable feature matching works on distinct features (so there need to be enough points of interest (edges usually) that cover text, buttons, other bits and pieces). If, for example, all your hardware was the same as the example you originally posted and only the model number was changing then this would be an issue most likely as the feature matching may match several targets. 

Also you mention the tech takes a picture on mobile. Does that need to be looked up immediately? The issue is that feature matching is CPU heavy and can take time on mobile and is a function of the photo resolution. Luckily, feature matching appears to work better on lower resolution images and most of the time works in black and white. Then there is the potential number of hardware items you are trying to match. The most advanced mobile augmented reality products (Metaio, Vuforia) that use feature matching only allow up to 100 targets to be "tracked" or "looked for" at a time - every piece of hardware you are looking for needs to be compared to the live input camera view (or photo) and this is the part that hits the CPU hard. If however there was an option to offload the image(s) to a backend cloud server for feature match or if the tech did not need an instant or any kind of result in the field, then you are in a better situation as you can stand up serious computing power.

It's not easy to recommend one or the other without all the facts - as you begin to mention new things like mobile and techs in the field, this changes things :) For instance I also used mobile - an Android tablet, with Open CV and Tesseract OCR - the combination worked in the field - the tech can position the camera face-on to the model number and take a close photo. You could even provide a mini App for your techs that has a basic cropping tool. The technique I used was to show the camera view in my app with a little white transparent box over the camera view that allowed the user to position the text to fit that white box. Then, when the photo was taken I simply cropped that white box coordinate rectangle and I had a perfect match. This was easy vs. feature matching :)
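The "white box" crop can be sketched in a few lines of desktop Java. This is not the Android code from the post: it uses `java.awt`, which Android does not provide, and the class name and the idea of passing the guide box as a `Rectangle` in photo coordinates are assumptions for the example. The box is clipped to the photo first so an overlay that runs past the edge does not cause an exception.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

class GuideBoxCrop {
    /** Returns the part of the photo that lies inside the on-screen guide box. */
    static BufferedImage crop(BufferedImage photo, Rectangle box) {
        Rectangle bounds = new Rectangle(0, 0, photo.getWidth(), photo.getHeight());
        Rectangle clipped = box.intersection(bounds);   // keep only the part inside the photo
        if (clipped.isEmpty()) {
            throw new IllegalArgumentException("guide box lies outside the photo");
        }
        // getSubimage returns a view that shares pixel data with the original photo.
        return photo.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height);
    }
}
```

The guide box on screen is usually drawn in preview coordinates, so it has to be scaled to the resolution of the captured photo before it is passed in.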

newbie

Jan 8, 2015, 10:35:13 AM
to tesser...@googlegroups.com
Allistair,
            Thanks for taking the time to respond. Do you know how to use psm 6 in tess4j (it's probably an argument to the constructor; I need to look up the source code)? I have not found any examples of it being used by googling. I tried to resample the cropped image to 3000px (horizontally, using Paint) like you suggested and ran it through tess4j, and it still did not recognize my model number. It gave me an output of "VIPZSOO". So I guess running it with psm 6 is the key. Also, can you send me the image that was produced after you resampled it to 3000px, so that I know my resampling is right?

I also like your idea of providing the white box in the camera view and using it as the input for cropping. I can certainly do that.
I am glad we discussed feature matching - that seems more like object recognition than text recognition, so it is probably far-fetched for my case. I had used camFlow (an app) to see if it would recognize my equipment images and it always came back with "Black media player". So they are probably using OpenCV feature matching.

Thanks again and appreciate your taking time to respond.

newbie

Jan 8, 2015, 10:53:44 AM
to tesser...@googlegroups.com
Here's my resampled image using paint for reference.
ArrisVIP2500_resampled.png

Allistair

Jan 8, 2015, 11:06:33 AM
to tesser...@googlegroups.com

newbie

Jan 8, 2015, 11:24:52 AM
to tesser...@googlegroups.com
It worked, YAY! You have all my gratitude! OK, now I need to know how you did the resampling. I thought you said you took the cropped image and resampled it, but this seems like the original PNG file (ArrisVIP2500.png) resampled. Let me know how you went about resampling and how I can achieve it programmatically.

Thanks

Allistair

Jan 8, 2015, 11:39:55 AM
to tesser...@googlegroups.com
OK good. 

I got it working by both resampling (upscaling) the cropped version and the full image.

If you are using the "white box" approach so that you have a crop area (best method) then you just need to upscale that. 

There are many ways to resize an image up - you can find that easily with Google. I used Open CV for Android and the cvResize function for example. There are libraries for doing this in Java, .NET, Python etc.. just look around.
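As one example of doing this in plain Java, the sketch below upscales an image to a target width while keeping the aspect ratio, using `Graphics2D` with bicubic interpolation. It is an alternative to the Open CV call mentioned above, not the same code, and the class and method names are made up for the example.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

class Upscaler {
    /** Resizes src to targetWidth pixels wide, keeping the aspect ratio. */
    static BufferedImage scaleToWidth(BufferedImage src, int targetWidth) {
        if (targetWidth <= 0) {
            throw new IllegalArgumentException("targetWidth must be positive");
        }
        int targetHeight = Math.max(1,
                (int) Math.round((double) src.getHeight() * targetWidth / src.getWidth()));
        BufferedImage dst = new BufferedImage(targetWidth, targetHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Bicubic interpolation gives smoother edges than the default when enlarging.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, targetWidth, targetHeight, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

Files can be loaded and saved around this with `javax.imageio.ImageIO.read` and `ImageIO.write`; the resized file is then what gets passed to the OCR step.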

Cheers

newbie

Jan 8, 2015, 4:34:54 PM
to tesser...@googlegroups.com
Thanks Allistair. I have it working. The problem is, if I use the same "mantra" of resampling for other images, it's not working. I have this cropped image (attached, which is also upscaled to 3000 pixels width-wise), and it's coming out as VIPZZSO. I probably need to sharpen this. I have set it to very sharp in the preprocessing program I am using, but in vain.

Any directions for general preprocessing?
myImage.png

Allistair

Jan 8, 2015, 5:08:46 PM
to tesser...@googlegroups.com
What did the non-upscaled version look like? This looks far too blurred, which is why it's struggling. It might be that your upscaling is too much: it should be a ratio of the original size of the cropped image to make it 300dpi, rather than always 3000px.

Cheers

newbie

Jan 13, 2015, 4:45:22 PM
to tesser...@googlegroups.com
Allistair,
            Sorry for coming back to you on this again. When I upscaled the original picture to 3000px (like you suggested), OCR could read it. But the resolution of the upscaled picture still seems to be 96 dpi. How did you arrive at the target of 3000px? Did you have a particular formula?

I came up with the exact pixel size to upscale to using Paint, by trial and error, so that the OCR could read it. But I am trying to find a common factor or formula for upscaling any of my images so that the OCR reads them with good accuracy.

I guess, in other words, I didn't understand the line below from your earlier post. Any effort at elaborating on it is highly appreciated.
"it should be a ratio of the original size of the cropped image to make it 300dpi"

Thanks

Allistair

Jan 13, 2015, 5:39:02 PM
to tesser...@googlegroups.com
I wasn't using a formula, just demonstrating that your original text was too small. Here is some advice from the FAQ (https://code.google.com/p/tesseract-ocr/wiki/FAQ)

Is there a Minimum Text Size? (It won't read screen text!)
---
There is a minimum text size for reasonable accuracy. You have to consider resolution as well as point size. Accuracy drops off below 10pt x 300dpi, rapidly below 8pt x 300dpi. A quick check is to count the pixels of the x-height of your characters. (X-height is the height of the lower case x.) At 10pt x 300dpi x-heights are typically about 20 pixels, although this can vary dramatically from font to font. Below an x-height of 10 pixels, you have very little chance of accurate results, and below about 8 pixels, most of the text will be "noise removed".

newbie

Jan 14, 2015, 1:08:29 PM
to tesser...@googlegroups.com
The problem is that all the images I have seem to be 96 dpi. I don't know what the height in points (corresponding to >10pt at 300 dpi) would need to be for the OCR to detect the text accurately. Even the upscaled images are 96 dpi. Any solutions?

Allistair

Jan 14, 2015, 3:26:16 PM
to tesser...@googlegroups.com
dpi means nothing for digital images. Digital images are pixels. The dpi setting for a digital image is used when printing to paper. What Tesseract wants is for the text in the image to not be smaller than a certain size. So for a digital image with text, the text within that image needs to be a certain minimal size and not the whole image. The difficulty you are going to have if your use case has variable image sizes (perhaps because taken at different distances or different camera resolutions) is how to normalise these so that the text within them is minimally such that an 'x' character is, say, 100px high. So it's not about changing all your images to 300dpi, that is a red herring - it's about ensuring the text within your images is sufficiently large for Tesseract to work with and that's a problem you'll need to figure out either programmatically or process-driven on capture of the raw image.
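One way to turn this into a formula is to work from the pixel height of the text rather than from DPI. The sketch below assumes the x-height of the text has already been measured in pixels (by hand or by some other routine, which is not shown) and computes the factor needed to reach a chosen target x-height. The names are hypothetical, and never scaling below 1.0 is a design choice of this sketch, not something stated in the thread.

```java
class TextSizeNormalizer {
    /** Factor that brings a measured x-height (px) up to the target x-height (px); never below 1.0. */
    static double scaleFactor(double measuredXHeightPx, double targetXHeightPx) {
        if (measuredXHeightPx <= 0 || targetXHeightPx <= 0) {
            throw new IllegalArgumentException("x-heights must be positive");
        }
        return Math.max(1.0, targetXHeightPx / measuredXHeightPx);
    }

    /** Width in pixels of the image after applying the factor. */
    static int scaledWidth(int widthPx, double factor) {
        return (int) Math.round(widthPx * factor);
    }

    public static void main(String[] args) {
        // Example: text measured at 8 px x-height, aiming for the roughly 20 px quoted from the FAQ.
        double factor = scaleFactor(8, 20);
        System.out.println(factor + " -> new width " + scaledWidth(400, factor));
    }
}
```

The target itself is a judgment call: the FAQ excerpt quoted earlier mentions about 20 pixels as typical for 10pt at 300dpi, while the post above suggests a larger figure.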
