Why is Tesseract so much more popular than Ocropus?


maxim...@gmail.com

Jul 16, 2015, 5:01:22 PM
to tesser...@googlegroups.com
Stupidly simple question, but I can't find any straightforward explanation online. My best guess is that the research project in Germany has now come to a close, but since Ocropus is much more recent and includes features like handwriting recognition, it is kind of surprising. Any and all insight or discussion welcome!

P.S. I'm measuring popularity by frequency of discussion online.

gtess...@gmail.com

Jul 17, 2015, 3:24:00 AM
to tesser...@googlegroups.com
I use Microsoft Windows. Ocropus does not support Microsoft Windows.

Jeff Breidenbach

Jul 17, 2015, 11:41:33 PM
to tesser...@googlegroups.com
Tesseract is more complete in terms of 'throw me an arbitrary document image and produce something useful'.

maxim...@gmail.com

Jul 19, 2015, 4:09:03 PM
to tesser...@googlegroups.com
This seems like a good explanation, based on everything I've learned over the last few days.

Tom Morris

Jul 20, 2015, 3:40:44 PM
to tesser...@googlegroups.com
Jeff's answer is probably the most important explanation, but some other reasons include:
- Tess supports more languages
- Tess is older
- Tess has a bigger, better-developed community (partly because of all the other reasons)
- Tess has higher performance (in terms of resource utilization, last time I checked)

Ocropus is/was pretty much a one-man project and was, as I understand it, designed to support its author's research. It also went through a significant rewrite as a result of a change in implementation strategy, and that discontinuity probably didn't help things. Because it's more modern and was designed as a toolkit to support research, it might lead to better OCR in the future, but it would still have a hard time competing with the "unreasonable effectiveness of data" that Google can bring to bear with its large training corpora.

Tom

Jim O'Regan

Jul 20, 2015, 6:10:21 PM
to tesser...@googlegroups.com
On 20 July 2015 at 20:40, Tom Morris <tfmo...@gmail.com> wrote:
> Jeff's answer is probably the most important explanation, but some other
> reasons include:
> - Tess supports more languages
> - Tess is older
> - Tess has a bigger more well developed community (partly because of all the
> other reasons)
> - Tess is higher performance (from a resource utilization point of view,
> last time I checked)
>
> Ocropus is/was pretty much a one-man project and was, as I understand it,
> designed to support his research.

That seems to be true at the moment, but there were a few people
working on it at different stages.

> It also went through a significant
> rewrite as a result of a change in implementation strategy and that
> discontinuity probably didn't help things.

Several rewrites. One of them was based around line-oriented FSTs, and
interestingly enough, similar work was done with Tesseract
(https://github.com/tesseract-ocr/tesseract/pull/29). The current
version of OCRopus uses LSTM, which is one of the currently
fashionable types of neural network.
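For readers unfamiliar with LSTMs, here is a minimal sketch of a single-unit LSTM cell stepping over a sequence. It is a generic textbook formulation, not OCRopus's actual code; the function names (`lstm_step`, `run_lstm`) and the scalar weight dictionaries are hypothetical and chosen only for illustration. The key idea is the gated cell state `c`, which lets the network carry information across many time steps (for OCR, across the columns of a text-line image).

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One time step of a hypothetical single-unit LSTM cell.

    x, h_prev, c_prev are floats. W, U, b are dicts keyed by gate name
    ('i', 'f', 'o', 'g') holding the scalar input weight, recurrent weight,
    and bias for that gate. Illustrative only, not OCRopus's API.
    """
    def pre(gate):
        # Pre-activation: weighted input + weighted previous output + bias.
        return W[gate] * x + U[gate] * h_prev + b[gate]

    i = sigmoid(pre('i'))    # input gate: how much new information to write
    f = sigmoid(pre('f'))    # forget gate: how much old cell state to keep
    o = sigmoid(pre('o'))    # output gate: how much of the cell to expose
    g = math.tanh(pre('g'))  # candidate value to write into the cell
    c = f * c_prev + i * g   # updated cell state
    h = o * math.tanh(c)     # updated hidden state (the cell's output)
    return h, c


def run_lstm(xs, W, U, b):
    """Run the cell over a sequence, starting from zero state."""
    h, c = 0.0, 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
        outputs.append(h)
    return outputs


if __name__ == "__main__":
    gates = ('i', 'f', 'o', 'g')
    W = {k: 0.5 for k in gates}
    U = {k: 0.1 for k in gates}
    b = {k: 0.0 for k in gates}
    print(run_lstm([1.0, 0.0, -1.0], W, U, b))
```

A real text-line recognizer would use many units with vector inputs (for example, one column of pixel features per time step), and often runs LSTMs in both directions; the scalar version above just keeps the gating arithmetic easy to follow.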

--
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you