Multi-column text


Bonny

Sep 8, 2011, 9:48:57 AM
to tesser...@googlegroups.com
Hello..

I have multi-column text, but tess just merges the left and right columns into one.
Is there some way to tell tess where the columns are?
I can output the coordinates of the columns, but I don't know how to pass them to tess.
I.e., is it possible to put 'bounding box' coordinates on the command line? (WinXP)

Thanks..

P.S.
If it's possible, it's probably through a command file. Is there any documentation of the command file at all?

Sven Pedersen

Sep 8, 2011, 10:34:20 AM
to tesser...@googlegroups.com
Hi Bonny,
The other open source/free OCR product OCRopus can handle layout. It
used to work with Tesseract, but I think it is totally independent
now...
--Sven



Slavko Kocjancic

Sep 8, 2011, 1:06:23 PM
to tesser...@googlegroups.com
OCRopus seems to be Linux-only. I found no WinXP executable.


Quan Nguyen

Sep 8, 2011, 8:06:56 PM
to tesseract-ocr
Tess does not accept coordinate arguments on the command line, but it does
recognize multi-column documents beginning with version 3.0. You can
pass coordinates programmatically. Try VietOCR for your documents.

http://vietocr.sf.net
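Passing coordinates programmatically, as Quan suggests, amounts to handing tess one rectangle per column. A minimal Python sketch, assuming you already know the x-positions of the gutters between columns; `column_rects` is a hypothetical helper, not part of Tesseract. It only computes the (left, top, width, height) rectangles, which is the argument order of `TessBaseAPI::SetRectangle` in Tesseract's C++ API; the library calls themselves are shown only as comments.

```python
from typing import List, Tuple

# (left, top, width, height): the argument order of TessBaseAPI::SetRectangle
Rect = Tuple[int, int, int, int]


def column_rects(page_width: int, page_height: int, gaps: List[int]) -> List[Rect]:
    """Split a page into full-height column rectangles.

    `gaps` holds the x-coordinates (pixels) of the gutters between columns,
    e.g. [1240] for a two-column page. Hypothetical helper, not a Tesseract API.
    """
    edges = [0] + sorted(gaps) + [page_width]
    rects: List[Rect] = []
    for left, right in zip(edges, edges[1:]):
        # Reject empty columns and gutters that lie outside the page.
        if not 0 <= left < right <= page_width:
            raise ValueError(f"bad column edges: {left}, {right}")
        rects.append((left, 0, right - left, page_height))
    return rects


if __name__ == "__main__":
    # A 2480 x 3508 px page (A4 at 300 dpi) split down the middle.
    for left, top, width, height in column_rects(2480, 3508, [1240]):
        # C++ equivalent for each column:
        #   api->SetRectangle(left, top, width, height);
        #   char* text = api->GetUTF8Text();
        print(left, top, width, height)
```

Recognizing each rectangle separately keeps the columns apart regardless of how the automatic layout analysis treats the page.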

Slavko Kocjancic

Sep 9, 2011, 2:20:14 AM
to tesser...@googlegroups.com
On 9.9.2011 2:06, Quan Nguyen wrote:

> You can pass coordinates programmatically.

Is it not possible to pass coordinates (varfile, configfile??)

I have the precompiled tess 3.00.1, but it doesn't recognize two columns. Here is an
example file. I use hOCR output.

twocol.tif
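Since the output here is hOCR, one way to check whether tess separated the columns is to look at the content areas it reports. A minimal Python sketch, assuming the output marks text blocks with the hOCR class `ocr_carea` and stores their geometry as `bbox x0 y0 x1 y1` in the `title` attribute, as the hOCR format defines (older Tesseract releases may differ). `content_areas` and `looks_multi_column` are hypothetical helpers. Two areas sitting side by side suggest the columns were found; one full-width area suggests they were merged.

```python
from html.parser import HTMLParser
from typing import List, Tuple

BBox = Tuple[int, ...]  # (x0, y0, x1, y1)


class CareaCollector(HTMLParser):
    """Collect the bbox of every element whose class list contains 'ocr_carea'."""

    def __init__(self) -> None:
        super().__init__()
        self.boxes: List[BBox] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "ocr_carea" not in (attr.get("class") or "").split():
            return
        # hOCR keeps properties in `title`, separated by ';', e.g. "bbox 10 20 300 400".
        for prop in (attr.get("title") or "").split(";"):
            parts = prop.split()
            if len(parts) == 5 and parts[0] == "bbox":
                self.boxes.append(tuple(int(p) for p in parts[1:]))


def content_areas(hocr: str) -> List[BBox]:
    parser = CareaCollector()
    parser.feed(hocr)
    parser.close()
    return parser.boxes


def looks_multi_column(boxes: List[BBox]) -> bool:
    """True if any two areas do not overlap horizontally but do overlap vertically."""
    def side_by_side(a: BBox, b: BBox) -> bool:
        return (a[2] <= b[0] or b[2] <= a[0]) and a[1] < b[3] and b[1] < a[3]
    return any(side_by_side(a, b) for i, a in enumerate(boxes) for b in boxes[i + 1:])


if __name__ == "__main__":
    sample = ('<div class="ocr_page" title="bbox 0 0 2480 3508">'
              '<div class="ocr_carea" title="bbox 100 100 1200 3400"></div>'
              '<div class="ocr_carea" title="bbox 1280 100 2380 3400"></div></div>')
    print(looks_multi_column(content_areas(sample)))
```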

Bonny

Sep 9, 2011, 2:42:18 AM
to tesser...@googlegroups.com
Huh..

No attachments allowed.
In the meantime I tried VietOCR, but it doesn't recognize two columns either.

Slavko Kocjancic

Sep 9, 2011, 3:48:12 AM
to tesser...@googlegroups.com
Hello....

After some digging I found out why it didn't work...
In the download section there is the main download, tesseract-ocr-setup-3.00.exe,
and the bug fixes in tesseract-3.00.1.exe.zip.

I read somewhere that the -v switch (-v = version) was added in version 3. So if we execute tesseract -v with a version before 3.0, an error is displayed; with 3.0 or later, the version is printed.

I tested 3.00 and got an error, and multi-column text doesn't work.
I tested 3.00.1 and got an error, and multi-column text doesn't work either.

So I can only conclude that the posted binaries weren't version 3 but 2.xx???
I tried to build from source and, unexpectedly, it built successfully on the first try.

Now tesseract -v shows version 3.01, and I tested multi-column text and it works!

So the Windows binaries are mislabeled or outdated. As far as I know, multi-column support is a feature of 3.00.

Or did I mess up something else too?!?
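The `tesseract -v` check described above can be scripted. A minimal Python sketch, assuming the version line looks like `tesseract 3.01` (some builds print `tesseract v5...`); because releases differ in whether they print to stdout or stderr, both streams are read. `parse_version` and `installed_version` are hypothetical helpers. As zdenko's reply in this thread points out, a failing `-v` does not by itself prove a build predates 3.0, so treat the result as a hint only.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_version(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text such as 'tesseract 3.01'; None if absent."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)", output, re.IGNORECASE)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_version(exe: str = "tesseract") -> Optional[Tuple[int, int]]:
    """Run `<exe> -v` and parse the result; None if not found or unrecognized."""
    if shutil.which(exe) is None:
        return None
    result = subprocess.run([exe, "-v"], capture_output=True, text=True)
    # Read both streams: where the version goes differs between releases.
    return parse_version(result.stdout + result.stderr)


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("tesseract -v failed or printed no recognizable version")
    else:
        print("tesseract (major, minor):", version)
```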

zdenko podobny

Sep 9, 2011, 4:55:21 AM
to tesser...@googlegroups.com
On Fri, Sep 9, 2011 at 9:48 AM, Slavko Kocjancic <esl...@gmail.com> wrote:

> So I can only conclude that the posted binaries weren't version 3 but 2.xx???

The posted files are 3.00, but there was a bug in the initialization of the segmentation mode.
See http://code.google.com/p/tesseract-ocr/issues/detail?id=518. So the feature was there, but it was difficult to use for non-programmers.

> So the Windows binaries are mislabeled or outdated. As far as I know, multi-column support is a feature of 3.00.

It is dangerous to draw conclusions without knowledge ;-)
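What zdenko describes comes down to the page segmentation mode (PSM): columns are only detected with fully automatic page segmentation, mode 3 (`PSM_AUTO` in the C++ API). A minimal Python sketch that builds a command line requesting it, assuming the option is spelled `-psm` in 3.01 to 3.05 and `--psm` from 4.0 on, that 3.00's command line offers no such option, and that the stock `hocr` config file is installed. `build_command` is a hypothetical helper.

```python
from typing import List, Tuple


def build_command(image: str, outbase: str, version: Tuple[int, int],
                  psm: int = 3, hocr: bool = True) -> List[str]:
    """Build a tesseract command line requesting page segmentation mode `psm`.

    Mode 3 is fully automatic page segmentation, the mode that looks for
    columns. Assumed spelling: `-psm` for 3.01-3.05, `--psm` for 4.0+;
    for 3.00 the option is left out.
    """
    cmd = ["tesseract", image, outbase]
    if version >= (4, 0):
        cmd += ["--psm", str(psm)]
    elif version >= (3, 1):
        cmd += ["-psm", str(psm)]
    if hocr:
        # Config file names go after the options; assumes the stock "hocr" config.
        cmd.append("hocr")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command("twocol.tif", "twocol", (3, 1))))
```

The returned list can be passed to `subprocess.run` as-is. From C++, the corresponding call is `api.SetPageSegMode(tesseract::PSM_AUTO)` before recognition.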


Slavko Kocjancic

Sep 9, 2011, 5:28:37 AM
to tesser...@googlegroups.com
On 9.9.2011 10:55, zdenko podobny wrote:

> It is dangerous to draw conclusions without knowledge ;-)

I want to know, but I can't find any help files. So I ask 'dumb' questions.. :(

Quan Nguyen

Sep 9, 2011, 10:16:05 AM
to tesseract-ocr
Please try the latest beta versions, which incorporate the PSM fix.

Slavko Kocjancic

Sep 9, 2011, 12:00:11 PM
to tesser...@googlegroups.com
On 9.9.2011 16:16, Quan Nguyen wrote:

> Please try the latest beta versions, which incorporate the PSM fix.

Where is it?
It's not on http://code.google.com/p/tesseract-ocr/downloads/list.
And I have Windows....

Sven Pedersen

Sep 9, 2011, 12:42:38 PM
to tesser...@googlegroups.com
I think he means the latest VietOCR. He is its main developer.
-Sven


