Send your sample images to get more practical advice.
Warm regards,
Dmitri Silaev
www.CustomOCR.com
Thanks for your response. I figured some kind of custom segmentation
was going to be required. Any suggestions you can offer would be
appreciated. I was thinking of using some tools from OpenCV or
something similar, but I'm not really sure where to read up on
segmentation approaches.
Here's a sample image:
This is not actually an image I have worked with. It's just a
representative sample pulled at random from a web image search, since
my sample image contains proprietary information that I can't share.
Actual resolution is in the 14,000 x 10,000 range.
-Walter
I think it's worth taking a look at the OCRopus project
(http://code.google.com/p/ocropus/). As far as I know, it offers good
segmentation for this type of image. At the time of my
investigation, it was based on Thomas Breuel's work on the
whitespace cover approach, particularly his "Two Geometric Algorithms
for Layout Analysis" (2002), and maybe also his "Layout Analysis based on
Text Line Segment Hypotheses" (2003). So you could even implement these
approaches yourself from these articles.
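
To give a feel for whitespace-driven segmentation, here is a minimal
Python sketch. Note that this is NOT Breuel's whitespace cover
algorithm and not what OCRopus does internally; it is a much simpler
projection-profile split that cuts a binarized page wherever it finds
fully empty rows and then fully empty columns. The function names, the
list-of-lists image format (1 = ink, 0 = background) and the min_gap
parameter are my own assumptions for illustration.

```python
def split_on_whitespace(lines, min_gap=2):
    """Return (start, end) index pairs for runs of inked lines.

    `lines` is a sequence of rows (or columns) of 0/1 pixels. Two runs
    are separated only when at least `min_gap` consecutive lines
    between them are completely empty. `end` is exclusive.
    """
    blocks = []
    start = None      # first inked line of the current block
    last_ink = None   # most recent inked line seen
    gap = 0           # empty lines seen since the last inked line
    for i, line in enumerate(lines):
        if any(line):
            if start is None:
                start = i
            elif gap >= min_gap:
                # The gap was wide enough: close the block, open a new one.
                blocks.append((start, last_ink + 1))
                start = i
            last_ink = i
            gap = 0
        else:
            gap += 1
    if start is not None:
        blocks.append((start, last_ink + 1))
    return blocks


def segment(image, min_gap=2):
    """Split a binary image into (top, left, bottom, right) regions.

    Rows are split first, then each horizontal band is split by columns.
    This is a single pass, not a full recursive X-Y cut.
    """
    regions = []
    for top, bottom in split_on_whitespace(image, min_gap):
        band = image[top:bottom]
        columns = list(zip(*band))  # transpose the band to scan columns
        for left, right in split_on_whitespace(columns, min_gap):
            regions.append((top, left, bottom, right))
    return regions


# Hypothetical 6x8 page: two blocks side by side, then one block below.
page = [
    [1, 1, 0, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
regions = segment(page)
```

On a real scan you would binarize and probably downsample first (for
example with OpenCV), and a 14,000 x 10,000 image would call for array
operations rather than Python lists. Each resulting region could then
be cropped and passed to the OCR engine separately.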
HTH
Warm regards,
Dmitri Silaev
www.CustomOCR.com