I'm trying out Tesseract 2.01. I have a document with two columns of
text. The quality of Tesseract's recognition is very good, but the
columns get mixed together because Tesseract recognizes the characters
line by line. So I would like to define two separate zones that are
recognized one after the other. I tried a TIFF image and a "zone file"
that I found on the UNLV site, but this does not work. My command line
looks like this:
tesseract in.tif out.txt -l deu in.zone
in.tif is not compressed.
When I debug this, the program exits at line 234 in variables.cpp while
trying to read_variables.
Can anyone help?
Does anyone have a working pair of TIFF file and configuration file for
recognizing parts of a document?
thx in advance,
Chris from Aachen, Germany
Thanks for your answer.
I tried OCR with in.uzn and it worked very well. Thanks.
But what makes a zone file correctly formatted? I can't find any
documentation. Do you know whether there is any?
Thanks again in advance, Chris
On 12 Oct., 18:47, "Ray Smith" <theraysm...@gmail.com> wrote:
> If you have made a correctly formatted UNLV zone file, then you should name
> it in.uzn and use this command line:
> tesseract in.tif out.txt -l deu
> The in.uzn file will be found based on the name of the input tif file.
> Ray.
>
> On 10/12/07, g...@jetsoftdev.com <ScanH...@gmail.com> wrote:
>
> > Tess does not support multiple columns at this point. You can write
> > zoning software yourself and then use the DLL interface to recognize
> > those parts of the page.
>
> > On Oct 12, 3:35 am, Reimmann <christ...@reimmann.de> wrote:
> > > Hi,
>
> > > I'm trying out Tesseract 2.01. I have a document that two columns of
> > > text, the quality of Tesseract's recognition is very good, but the
> > > columns are mixed, because tesseract recognizes the characters line by
> > > line. So, I like to have two different zones, that are recognized one
> > > after the other. I have tried out a tiff-image and a "zone-file" that
> > > I found on the UNLV site, but this does not work. My command-line
> > > looks like that:
>
> > > tesseract in.tif out.txt -l deu in.zone
>
> > > in.tif is not compressed.
>
> > > When I debug this, the program exits at line 234 in variables.cpp when
> > > trying to read_variables.
>
> > > Can anyone help ?
>
> > > Has anyone a useful pair of tiff-file and configuration-file for
> > > recognizing parts of a document ?
>
> > > thx in advance,
>
> > > Chris from Aachen, Germany
Each line of the zone file describes one zone, in this order:
x y width height description
[1] https://code.google.com/p/tesseract-ocr/source/browse/trunk/ccstruct/blread.cpp?r=1064#54
[2] https://code.google.com/p/isri-ocr-evaluation-tools/downloads/detail?name=zset.4B.tar.gz&can=2&q=
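For a two-column page like the one discussed in this thread, the zone file can be generated instead of typed by hand. The Python sketch below is only an illustration. It assumes that each line uses the field order given above (x y width height description), that the values are pixels with the origin at the top-left corner of the image, and that "Text" is acceptable as the description. The function names and the gutter parameter are made up for this example and are not part of Tesseract.

```python
from pathlib import Path


def two_column_zones(page_width, page_height, gutter=0):
    """Build zone lines for a left and a right column.

    Assumed line format: "x y width height description", in pixels,
    measured from the top-left corner of the image.
    """
    # Each column gets half of the page width that is left after
    # reserving the (hypothetical) gutter between the columns.
    column_width = (page_width - gutter) // 2
    zones = [
        # Left column comes first so that it is recognized first.
        (0, 0, column_width, page_height, "Text"),
        # Right column is aligned to the right edge of the page.
        (page_width - column_width, 0, column_width, page_height, "Text"),
    ]
    return [" ".join(str(field) for field in zone) for zone in zones]


def write_uzn(image_path, lines):
    """Write the zone lines next to the image, e.g. in.tif -> in.uzn."""
    uzn_path = Path(image_path).with_suffix(".uzn")
    uzn_path.write_text("\n".join(lines) + "\n")
    return uzn_path
```

For a 2000 x 3000 pixel in.tif, write_uzn("in.tif", two_column_zones(2000, 3000, gutter=100)) would create in.uzn beside the image, and the command line from Ray's reply (tesseract in.tif out.txt -l deu) should then pick it up. Check the coordinates against your own scan before relying on them.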