There are several ways to improve this particular case.
- You could preprocess the image to suppress this kind of noise. (Look for the opening and closing operators.)
- There is a Tesseract parameter that sets the minimum size of a blob: measure the "noise", add a few pixels to be safe, and let Tesseract filter it out.
- You could do the blob-size filtering yourself.
Note that characters like {, . '} may get deleted too.
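To illustrate the third option, here is a minimal sketch of blob-size filtering in plain Python. It assumes the image is already binarized into a list of rows of 0/1 values; the function name `remove_small_blobs` and the `min_size` threshold are made up for this example and are not part of Tesseract.

```python
from collections import deque


def remove_small_blobs(image, min_size):
    """Return a copy of a binary image with small connected blobs erased.

    image: list of rows, each a list of 0 (background) or 1 (ink).
    min_size: blobs with fewer pixels than this are treated as noise.
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill one 8-connected blob and collect its pixels.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Erase the blob if it is too small to be a real character.
            if len(blob) < min_size:
                for by, bx in blob:
                    result[by][bx] = 0
    return result


# One isolated noise pixel (top left) and one 6-pixel "character".
page = [
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
]
cleaned = remove_small_blobs(page, min_size=3)
```

As said above, small real characters such as a period or an apostrophe fall under the same threshold and are erased together with the noise.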
On Tuesday, 28 May 2013 at 19:38:14 UTC+2, Dmitry Katsubo wrote:
Dear Tesseract community,
I would love to hear somebody's advice on how to reduce noise in the following example (part of the original image):
For this image the library returns the text <13'0>, with the apostrophe triggered by noise. It would be fantastic if this noise could be suppressed by means of Tesseract. Perhaps I should look in the direction of image pre-processing, such as unpaper, as suggested here? Following another post, I set the "textord_heavy_nr" setting to "1", with no visible effect. If anyone can suggest further options to play with, I would appreciate it.
In my case I am ready to sacrifice "real" characters from the set {, . '}, i.e. if they are not recognized it's not a big deal. I don't think blacklisting them completely is right, because in general it would be a plus if they are recognized correctly.
Thanks in advance.
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
The parameter I meant is "textord_max_noise_size"; it defines the maximum size of noise in pixels. You could also try the one you found in the list, "textord_heavy_nr".
"Opening and closing operators" are morphological operators. I searched Wikipedia for a nice example, but the English version is only a stub.
In your case the opening operation is the way to go. Many image processing frameworks include morphological operations. If your software does not provide an opening operator, look for erosion and dilation (opening is just an erosion followed by a dilation).
I made a quick example in GIMP. The picture "before.png" shows my object (the circle) with some noise I want to remove. I executed the erosion operation on this picture with a proper filter mask. The result is in the picture "after erosion.png". The circle has changed in size (and shape). As the last step I executed the dilation operation in GIMP. The resulting image "after dilation.png" shows only the circle.
Depending on your objects and noise, you need to choose a proper filter mask for these operations. The opening will change the shape of your characters slightly.
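The erosion-then-dilation sequence described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the image is a binary list of rows, the filter mask is a solid square of side `size`, and pixels outside the image count as background. The function names are invented for this example and do not belong to any library.

```python
def _apply(image, size, combine):
    """Slide a size x size square mask over a binary image."""
    height, width = len(image), len(image[0])
    r = size // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Pixels outside the image are treated as background.
            window = (
                image[ny][nx] == 1 if 0 <= ny < height and 0 <= nx < width else False
                for ny in range(y - r, y + r + 1)
                for nx in range(x - r, x + r + 1)
            )
            out[y][x] = 1 if combine(window) else 0
    return out


def erode(image, size=3):
    """Keep a pixel only if every pixel under the mask is ink."""
    return _apply(image, size, all)


def dilate(image, size=3):
    """Set a pixel if any pixel under the mask is ink."""
    return _apply(image, size, any)


def opening(image, size=3):
    """Opening is an erosion followed by a dilation with the same mask."""
    return dilate(erode(image, size), size)


# A 3x3 object plus two single-pixel specks of noise.
noisy = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 1],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
opened = opening(noisy, size=3)
```

In this toy case the object has exactly the shape of the mask, so it comes back unchanged while the specks disappear. Real characters have thin strokes and curves, so a mask that is too large will distort them or remove them altogether.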
-- With best regards, Dmitry
convert image.tiff -write MPR:source -morphology close rectangle:3x4 -clip-mask MPR:source -morphology erode:8 square +clip-mask image-close.tif
It looks like I need to pipe images through ImageMagick, but I can't decide when it is really necessary. Perhaps I can run Tesseract twice: the first time to determine the confidence level, and then clean up and recognize again (if needed).
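The two-pass idea could look roughly like the sketch below. It shows only the control flow: `recognize` and `clean` are hypothetical callables that you would supply yourself (for example a wrapper around Tesseract that returns text plus a confidence value, and a wrapper around the ImageMagick command), and the threshold of 80 is an arbitrary assumption, not a recommended value.

```python
def recognize_with_optional_cleanup(image, recognize, clean, min_confidence=80.0):
    """Run OCR once, and clean up and retry only if confidence is low.

    recognize: hypothetical callable, image -> (text, confidence).
    clean: hypothetical callable, image -> cleaned image.
    Returns (text, confidence, used_cleanup).
    """
    text, confidence = recognize(image)
    if confidence >= min_confidence:
        # First pass is good enough, skip the expensive cleanup.
        return text, confidence, False
    cleaned_text, cleaned_confidence = recognize(clean(image))
    # Keep the second result only if the cleanup actually helped.
    if cleaned_confidence > confidence:
        return cleaned_text, cleaned_confidence, True
    return text, confidence, False


# Stand-ins for the real OCR and cleanup steps, for demonstration only.
def fake_recognize(image):
    return ("13'0", 55.0) if image == "noisy" else ("130", 92.0)


def fake_clean(image):
    return "cleaned"


result = recognize_with_optional_cleanup("noisy", fake_recognize, fake_clean)
```

Comparing the two confidence values guards against a cleanup that damages real characters and makes the result worse.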
Regarding open and close operators:
First, look at pixDilate and pixErode, and for a real example see http://www.imagemagick.org/Usage/morphology/#erode
I think that this code snippet says it all (open is erode and dilate):

    PIX *
    pixOpen(PIX *pixd,
            PIX *pixs,
            SEL *sel)
    {
    PIX *pixt;

        PROCNAME("pixOpen");

        if ((pixd = processMorphArgs2(pixd, pixs, sel)) == NULL)
            return (PIX *)ERROR_PTR("pixd not returned", procName, pixd);

        if ((pixt = pixErode(NULL, pixs, sel)) == NULL)
            return (PIX *)ERROR_PTR("pixt not made", procName, pixd);
        pixDilate(pixd, pixt, sel);
        pixDestroy(&pixt);

        return pixd;
    }

Cheers,
Jozef