Lattices in Tesseract


Mayce Al

Nov 14, 2011, 9:17:37 AM
to tesser...@googlegroups.com
Hi Guys,

Does anyone know how to get the OCR lattices out of Tesseract?

The top-level command line provides only the OCR result as a string.

Has anyone tried this before with Tesseract or ABBYY?

I appreciate any support.
Cheers, Mayce

Dmitri Silaev

Nov 14, 2011, 12:52:19 PM
to tesser...@googlegroups.com
Try ResultIterator (programming required) or the hOCR output format
(enabled with a config file switch).
Information on both methods can be found using the forum search.

Warm regards,
Dmitri Silaev
www.CustomOCR.com
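
For anyone trying the hOCR route, here is a minimal sketch of pulling per-word text and bounding boxes out of hOCR output using only the Python standard library. The sample markup is hypothetical: the exact class names (`ocr_word` or `ocrx_word`) and whether an `x_wconf` confidence value is present depend on the Tesseract version, so treat both as assumptions. hOCR itself is typically produced by passing the `hocr` config name on the command line.

```python
import re
from html.parser import HTMLParser

# Hypothetical hOCR fragment for illustration only; real output depends on
# the Tesseract version and configuration.
SAMPLE_HOCR = """
<span class='ocr_line' title='bbox 36 90 210 120'>
  <span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
  <span class='ocrx_word' title='bbox 110 92 210 118; x_wconf 71'>wor1d</span>
</span>
"""


class HocrWordParser(HTMLParser):
    """Collect text, bounding box and confidence for each word-level span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word being read, or None outside a word span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") in ("ocr_word", "ocrx_word"):
            title = attrs.get("title") or ""
            bbox = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
            conf = re.search(r"x_wconf (\d+)", title)
            self._current = {
                "text": "",
                # None when the field is missing from the title attribute
                "bbox": tuple(int(v) for v in bbox.groups()) if bbox else None,
                "conf": int(conf.group(1)) if conf else None,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self.words.append(self._current)
            self._current = None


def parse_hocr_words(hocr):
    """Return a list of dicts, one per recognized word."""
    parser = HocrWordParser()
    parser.feed(hocr)
    return parser.words


if __name__ == "__main__":
    for word in parse_hocr_words(SAMPLE_HOCR):
        print(word["text"], word["bbox"], word["conf"])
```

Note that this only recovers the single best string per word, which is what the hOCR output discussed here contains; it is not a lattice.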


Mayce Al

Nov 15, 2011, 6:04:29 AM
to tesser...@googlegroups.com
Thanks a lot, Dmitri. I will look there.
hOCR provides a single output string for each word in a line.
Can it also provide multiple OCR candidates for each word?
Best regards,
Mayce
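
To make the question concrete: a lattice, in the general sense used for OCR and speech recognition, is a structure of competing hypotheses with scores, from which several candidate strings can be read off instead of one. The sketch below uses a simplified form (one list of alternatives per character position) with invented characters and probabilities. It is only an illustration of the idea and does not reflect the internal structures or API of Tesseract or ABBYY.

```python
import heapq
import itertools
import math

# Hypothetical alternatives for a three-character word. Each position lists
# (character, probability) pairs; the values are invented for illustration.
LATTICE = [
    [("c", 0.7), ("e", 0.3)],
    [("a", 0.9), ("o", 0.1)],
    [("t", 0.6), ("l", 0.4)],
]


def n_best(lattice, n):
    """Return the n highest-scoring strings with their path probabilities."""
    paths = []
    for combo in itertools.product(*lattice):
        text = "".join(ch for ch, _ in combo)
        prob = math.prod(p for _, p in combo)
        paths.append((prob, text))
    # Exhaustive enumeration is fine for a toy example; a real lattice would
    # need dynamic programming or beam search instead.
    return [(text, prob) for prob, text in heapq.nlargest(n, paths)]


if __name__ == "__main__":
    for text, prob in n_best(LATTICE, 3):
        print(f"{text}\t{prob:.3f}")
```

A single-string OCR result corresponds to keeping only the first entry of this list; asking for the lattice means asking for the scored alternatives as well.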

Nikolay

Nov 21, 2011, 4:29:30 AM
to tesseract-ocr
Hi Mayce,

My name is Nikolay Khlebinsky. I work at ABBYY and want to comment on
the feature you are asking about.
From what I see, you want to retrieve the coordinates of
recognized text. We have a solution for this in our ABBYY FineReader
Engine (http://www.abbyy.com/ocr_sdk/). Its API provides detailed
information about the recognized text, including its coordinates,
recognition confidence level, alternative (less confident) recognition
results, etc.

These advanced document analysis features are usually used for
converting semi-structured documents, such as invoices, payment
drafts, bills, waybills, business cards, agreements, health claim
forms, resumes, etc. They are designed to accurately locate all the
text on these documents, including characters and numbers, even if
this information is located within stamps, pictures, logos or
small-text areas.

All of that is thoroughly covered in our well-composed developer
guide, and we have friendly worldwide customer support to answer
questions like this. Feel free to contact me if you have anything else
to ask.

Best regards and good luck with your project,
Nikolay Khlebinsky
Nikol...@abbyy.com

William Tozier

Dec 12, 2011, 3:26:44 PM
to tesser...@googlegroups.com

Nikolay,

I hope I speak for other members of the community here when I remind you that we are speaking about the free, open source software package Tesseract.

At least a few of us are very familiar with your products, but have found the closed platform, poor support response time, and onerous license terms to be problems for our particular interests and goals.

If you would like to contribute your expertise or advice on improving the tesseract software, or even suggest features and use cases that can foster further development, please do so. Please do not market your company's products here, though.

Best,
Bill Tozier

Jimmy O'Regan

Dec 12, 2011, 10:23:16 PM
to tesser...@googlegroups.com
On 12 December 2011 20:26, William Tozier <vag...@gmail.com> wrote:
> I hope I speak for other members of the community here when I remind you that we are speaking about the free, open source software package Tesseract.

The original question also asked about FineReader, so it's not quite
so crass (though there has been at least one posting that was).

I find it hard to believe that Nikolay really represents ABBYY,
because surely he would have been able to ask a co-worker what a
lattice is. I would sooner think that he is actually engaged in some
relatively advanced trolling, by making out ABBYY employees to be
utterly inept.

--
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you
