You should run text detection before passing images to Tesseract.
Text detection is the process of locating the image regions that
contain text. Tesseract does not do this for you: even if an image
contains no text, it will still treat it as an image of text.
Before recognition, Tess applies a so-called binarization algorithm,
which converts an RGB image to a monochrome one (black for text and
white for background). For your sample image, the Otsu binarization
used in Tesseract (http://en.wikipedia.org/wiki/Otsu%27s_method) would
most likely produce a number of slanted vertical strokes resembling
backslashes, and the recognizer then classifies them as such.
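To illustrate why binarization alone cannot tell text from noise, here
is a minimal pure-Python sketch of Otsu's method on a flat list of
8-bit gray levels. It is not Tesseract's actual code; the function
names otsu_threshold and binarize are made up for this example. The
point is that Otsu always returns some threshold, so even a textless,
low-contrast patch is split into "ink" and "background".

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    `pixels` is a flat list of ints in the range 0..255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of gray levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break             # bright class empty, nothing left to split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Map dark pixels (<= t) to 0 (black) and the rest to 255 (white)."""
    return [0 if p <= t else 255 for p in pixels]


# A low-contrast patch with no text still gets split in two.
noise = [100, 110, 120, 130]
t = otsu_threshold(noise)
black_white = binarize(noise, t)
```

Here half of a nearly uniform gray patch ends up black, which is the
kind of input the recognizer then tries to read as characters.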
"textord_heavy_nr" and some other variables control size-based noise
removal, but they work satisfactorily only when there is a significant
body of good text surrounded by some amount of noise. In your image
everything is noise, so they won't help.
Therefore you need to extend your pre-processing so that Tess is fed
only images that actually contain text. The decision can be based on
contrast estimation, a distinctive color distribution, etc.
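As a rough sketch of such a contrast-based decision (my own
illustration, not something Tesseract provides), the check below
rejects a grayscale region when its contrast is too low or when too
much of it would count as "ink". The function name may_contain_text
and all threshold values are hypothetical and would need tuning on
real data. It also assumes dark text on a light background.

```python
def may_contain_text(pixels, min_contrast=60, min_ink=0.01, max_ink=0.5):
    """Heuristic pre-filter for a flat list of 8-bit gray levels.

    Returns False when the region is unlikely to hold dark-on-light text.
    All default thresholds are illustrative guesses, not tuned values.
    """
    ordered = sorted(pixels)
    n = len(ordered)
    # Use the 5th and 95th percentiles so a few outlier pixels
    # do not dominate the contrast estimate.
    dark = ordered[int(0.05 * (n - 1))]
    bright = ordered[int(0.95 * (n - 1))]

    contrast = bright - dark
    if contrast < min_contrast:
        return False  # nearly uniform region, most likely no text

    # Fraction of pixels darker than the midpoint between the two levels.
    midpoint = (dark + bright) / 2
    ink_fraction = sum(1 for p in pixels if p < midpoint) / n
    return min_ink <= ink_fraction <= max_ink


flat_noise = [120, 125, 130, 128] * 25   # low-contrast texture
page_like = [255] * 90 + [20] * 10       # mostly white, some dark strokes
mostly_dark = [20] * 90 + [255] * 10     # dark blob, too much "ink"

results = [may_contain_text(r) for r in (flat_noise, page_like, mostly_dark)]
```

Only regions that pass such a filter would be handed to Tess. A real
pipeline would probably combine it with the color-distribution and
edge-based cues mentioned above.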
HTH
Warm regards,
Dmitry Silaev
Here are some articles I picked out when I was self-studying this
field of document image processing. There may be newer ones by now,
but these will give you the basics. Apologies, I have no time to
provide direct references and author names; I have only listed my
file system directory on this topic. You can Google the exact article
titles to find links.
1990 Scale-Space and Edge Detection Using Anisotropic Diffusion.pdf
1998 Edge detection and ridge detection with automatic scale selection.pdf
2001 Edge-Based Method for Text Detection from Complex Document Images.pdf
2001 TEXT EXTRACTION FROM GREY SCALE PAGE IMAGES BY SIMPLE EDGE DETECTORS.pdf
2002 Gaussian-Based Edge-Detection Methods - A Survey.pdf
2003 Fast Computation of Scale Normalised Gaussian Receptive Fields.pdf
2003 Real-time scale selection in hybrid multi-scale representations.pdf
2003 Recognition of text in 3-D scenes.pdf
2004 A method for ridge extraction.pdf
2004 A Review of Vessel Extraction Techniques and Algorithms.pdf
2004 Distinctive Image Features from Scale-Invariant Keypoints.pdf
2004 Scene Text Extraction in Natural Scene Images using Hierarchical Feature Combining and Verification.PDF
2004 Text Detection from Natural Scene Images - Towards a System for Visually Impaired Persons.PDF
2005 A novel approach for text detection in images using structural features.pdf
2005 Color Text Extraction from Camera-based Images - the Impact of the Choice of the Clustering Distance.PDF
2005 Improved Text-Detection Methods for a Camera-based Text Reading System for Blind Persons.PDF
2005 Text Extraction from Gray Scale Historical Document Images Using Adaptive Local Connectivity Map.pdf
2006 Multiscale Edge-Based Text Extraction from Complex Images.PDF
2006 Spatial and Color Spaces Combination for Natural Scene Text Extraction.PDF
2008 A double-threshold image binarization method based on edge detector.PDF
HTH
Warm regards,
Dmitry Silaev