A basic question


Christoph

May 13, 2010, 12:06:41 PM
to ocropus
Hi,

I am new to the OCRopus project, so I have a basic question. What
are the major benefits of using OCRopus rather than just Tesseract if
I only want to train the OCR engine and use the resulting data to
recognize text in image files that have already been preprocessed
(binarization, segmentation, ...), leaving aside postprocessing such
as semantic analysis?


Tom

May 13, 2010, 3:21:29 PM
to ocropus
The answer to that question depends on several factors. Tesseract is
fairly mature and works on arbitrary binary documents. Before 3.0,
Tesseract didn't work well on isolated lines, but it also didn't have
much in the way of layout analysis. Tesseract 3.0 offers layout
analysis, a neural network recognizer, and improved language modeling.

OCRopus has not had a stable release yet. Its layout analysis is
probably better than Tesseract's. Its text recognition isn't as good
as Tesseract's yet, but it's rapidly improving. OCRopus also contains
a whole range of new technologies for page segmentation,
preprocessing, and language modeling. Our long-term plan is to make
Tesseract available through OCRopus as well, once the 3.0 release and
APIs are stable.

OCRopus has largely moved to Python now, which has sped up
development and makes it easier to create custom solutions.

The upshot is: both solutions are going to be a lot of work, and they
both have their limitations. If Tesseract gets your job done, just
use it for the time being.

Tom
