Line of equals symbols not recognized

colbec

Aug 20, 2010, 7:53:30 AM
to tesseract-ocr
I'm using tesseract 3.00 on openSUSE 11.2, run from the command line as
tesseract file.tif file

In an image that contains a line of '=' signs, recognition is much
worse than when that line is removed, e.g.:

line 1 and stuff
=======================
line 3 and stuff

Line 1 is recognized, but then either lines 2 and 3 are both missing,
or line 2 is missing and line 3 is garbled.
If the image contains only lines 1 and 3, recognition is almost
perfect.

Since the "=" character appears to be in the trained charset, what
kind of error does this represent for tesseract?
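
One way to check whether the '=' line is what hurts recognition, without retyping the page, is to blank out the rule-like rows before running OCR. The following is a minimal sketch of that idea, not anything tesseract does internally. It assumes the page has already been loaded as a binary grid (1 = ink, 0 = background); the function names and the 0.6 fill threshold are hypothetical and would need tuning on a real scan, especially if the '=' line is much shorter than the page width.

```python
def find_rule_rows(bitmap, min_fill=0.6):
    """Return indices of pixel rows whose ink fraction is at least min_fill.

    Assumption: the bars of a long run of '=' cover most of a row,
    while ordinary text rows are much sparser.
    """
    rows = []
    for y, row in enumerate(bitmap):
        if row and sum(row) / len(row) >= min_fill:
            rows.append(y)
    return rows


def blank_rule_rows(bitmap, min_fill=0.6):
    """Return a copy of bitmap with the rule-like rows cleared to background."""
    rule_rows = set(find_rule_rows(bitmap, min_fill))
    return [
        [0] * len(row) if y in rule_rows else list(row)
        for y, row in enumerate(bitmap)
    ]


# Tiny hand-made stand-in for a scanned page (hypothetical data).
page = [
    [1, 0, 1, 1, 0, 0, 1, 0, 1, 0],  # sparse, text-like row
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],  # upper bars of "==="
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],  # lower bars of "==="
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0, 1, 1, 0],  # sparse, text-like row
]

cleaned = blank_rule_rows(page)
print(find_rule_rows(page))  # [2, 4]
```

If OCR on the cleaned image matches the result for lines 1 and 3 alone, that would point to the '=' line itself rather than to something else in the scan.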

Jimmy O'Regan

Aug 20, 2010, 2:02:05 PM
to tesser...@googlegroups.com

At a guess (without a sample image, that's the best you can
expect), I would say that the line of equals signs is being treated as
noise.

--
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

Colin Beckingham

Aug 20, 2010, 2:12:14 PM
to tesser...@googlegroups.com, Jimmy O'Regan
On 08/20/2010 02:02 PM, Jimmy O'Regan wrote:
> On 20 August 2010 12:53, colbec<col...@start.ca> wrote:
>> Using tesseract 3.00 on Opensuse 11.2. From CLI as in
>> tesseract file.tif file
>>
>> In an image that contains a line of '=' signs the recognition is much
>> worse than if these lines are removed, eg:
>>
>> line 1 and stuff
>> =======================
>> line 3 and stuff
>>
>> line 1 will be recognized, but the second and third lines will be
>> either missing or line 2 missing and line 3 garbled.
>> If the file contains lines 1 and 3 only, the recognition is almost
>> perfect.
>>
>> Since the "=" character appears to be in the trained charset, what
>> kind of error does this represent for tesseract?
>
> At a guess - without providing a sample image, that's the best you can
> expect - I would say that the line of equals is being treated as
> noise.
>

I'm sorry I can't provide the original image; it is copyrighted and I
don't have permission to reproduce it. The closest I can get is the
three-line example in my original post.
