Will Ocropus match EverNote's OCR?

anelson

Aug 17, 2008, 3:40:22 PM
to ocropus
Lately I've been fascinated with the OCR capabilities of EverNote
(evernote.com), a combination desktop software and web service
solution which, among other things, automatically detects and indexes
textual information in images added to your notebook. So, for
example, I could take a photo of a biz card, a takeout menu, or (on
really good days) a handwritten note, then search my notebook for one
or more words in the photo and EverNote will return the photo and
highlight the area where the text appears. In my tests, it's not
perfect but does a remarkably good job in most cases.

I've downloaded and built Ocropus 0.2 on ubuntu64 (incidentally, build
and installation were no-brainers; good job on the docs), ran it on
some of the same test files I used with EverNote, and found the
results to be, well, disappointing. My tests and results were:

* PNG screen shot of Firefox window displaying Ocropus command line
docs - EverNote indexed nearly the whole thing flawlessly. Ocropus
returned mostly garbage.
* 7MP digital photo of an 8.5x11 letter in a Courier-like font -
EverNote was maybe 95% there, while Ocropus was maybe 50%
(incidentally, they both had trouble dealing with underlined text).
* 7MP digital photo of a business card - EverNote found one of two
occurrences of my test search term, while Ocropus did not detect any
text correctly.
* 7MP digital photo of a whiteboard with some handwriting of varying
degrees of legibility, some diagrams, etc. - EverNote had a lot of
trouble with this, though it did recognize some of the more legible
handwriting, while Ocropus (not surprisingly) didn't get anything out
of it.

I know Ocropus is not designed for handwriting recognition, but my
understanding of the Ocropus design goals is that the other test cases
are within the range of its target applications. Are my tests/
expectations unreasonable? Is it just too early to try this sort of
complex test? Can I vary any parameters to improve on the above
performance?

I've been eagerly awaiting an open-source OCR solution for years, and
I'm glad someone's finally stepped up. Please don't take any of the
above as criticism; I just want to better understand what to
reasonably expect from Ocropus.

Thanks,

Adam

Thomas Breuel

Aug 18, 2008, 11:51:28 AM
to ocr...@googlegroups.com
> I know Ocropus is not designed for handwriting recognition, but my
> understanding of the Ocropus design goals is that the other test cases
> are within the range of its target applications. Are my tests/
> expectations unreasonable?

Well, yes.  OCRopus is currently being developed for high-throughput book capture.  That means that its parameters are set, and its models are trained, for 200-400 dpi dewarped page images.  Feeding it camera-captured images is sort of like feeding Perl code to a Ruby interpreter.

Camera-based text recognition and search is just a different problem.  IUPR has built a number of camera-based capture and recognition systems, using parts of OCRopus.  See here, for example:

http://ipet.iupr.org/demos.html#DIVER

I don't see much future for special services like Evernote; basically, searching images by text content is simply one of many search modalities for image databases.  You'll almost certainly be able to search services like Flickr by text in the future.
 
> Can I vary any parameters to improve on the above performance?

You can improve performance on these kinds of images by changing the top level scripts.  You have more control, though, than simply changing parameters: you can add or change preprocessing steps, change which recognizer and which segmenter to use, which language model to use, and you can train shape and language models.

> * PNG screen shot of Firefox window

This violates Tesseract's resolution assumptions.  A simple workaround is to upsample the image so that the text is at about 300 dpi.  We have HMM-based low-resolution text recognizers to handle this case better, but haven't integrated them.
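
A minimal sketch of the upsampling workaround, in plain Python so it runs without any imaging library. The function names are hypothetical and are not part of OCRopus or Tesseract, the image is assumed to be a grayscale list of rows, and the source resolution has to be supplied by the caller (a screenshot is often assumed to be around 96 dpi, which would give a factor of roughly 3). Bilinear interpolation is just one reasonable choice here.

```python
def upsample_factor(source_dpi, target_dpi=300):
    """Scale factor needed to bring an image from source_dpi to target_dpi."""
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    return target_dpi / source_dpi


def upsample_bilinear(pixels, factor):
    """Upsample a grayscale image (list of rows of 0-255 ints) by `factor`
    using bilinear interpolation."""
    src_h, src_w = len(pixels), len(pixels[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    out = []
    for y in range(dst_h):
        # Map the destination pixel centre back into source coordinates,
        # clamping so that border pixels are replicated at the edges.
        sy = min(max((y + 0.5) / factor - 0.5, 0.0), src_h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        fy = sy - y0
        row = []
        for x in range(dst_w):
            sx = min(max((x + 0.5) / factor - 0.5, 0.0), src_w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            fx = sx - x0
            # Blend the four neighbouring source pixels.
            top = pixels[y0][x0] * (1 - fx) + pixels[y0][x1] * fx
            bottom = pixels[y1][x0] * (1 - fx) + pixels[y1][x1] * fx
            row.append(round(top * (1 - fy) + bottom * fy))
        out.append(row)
    return out
```

In practice an imaging library would do the resampling much faster; the point is only that the image is enlarged before it reaches the recognizer.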


> * 7MP digital photo of an 8.5x11 letter in a Courier-like font -

This may require document dewarping; there is no dewarping code in OCRopus yet.  We have a lot of document image dewarping code and hopefully will be able to integrate that.  It also requires intensity normalization.
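
As a rough illustration of what intensity normalization can mean for a camera image, here is a sketch of one simple technique: dividing each pixel by the brightest value in its neighbourhood, taken as an estimate of the local paper brightness. This is an assumption chosen for illustration, not necessarily the method used by OCRopus or IUPR; the function name and the `radius` parameter are hypothetical.

```python
def normalize_intensity(pixels, radius=1):
    """Flatten uneven lighting in a grayscale image (rows of 0-255 ints).

    Each pixel is divided by the brightest value in its neighbourhood,
    which serves as an estimate of the local paper brightness.
    """
    h, w = len(pixels), len(pixels[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the (2*radius+1)^2 window, clipped at the image border.
            window = [
                pixels[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            background = max(window)
            if background == 0:
                # Entirely black neighbourhood: leave the pixel dark.
                row.append(0)
            else:
                row.append(round(255 * pixels[y][x] / background))
        out.append(row)
    return out
```

With this scheme, ink on a brightly lit part of the page and ink on a shadowed part end up at similar gray levels, which makes a single binarization threshold workable. The window has to be larger than the stroke width for the background estimate to be meaningful.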


> * 7MP digital photo of a business card

Again, you need to dewarp and adjust the resolution to 300 dpi.  You also need to create new language models ("dictionaries") for high performance.
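
To give a feel for why a domain-specific dictionary helps, here is a small sketch that snaps raw OCR tokens to the closest entry in a word list using Python's standard `difflib`. This is post-correction outside the recognizer, not how OCRopus language models are actually built or plugged in; the function name, the cutoff value, and the sample words are all made up for illustration.

```python
import difflib


def snap_to_dictionary(tokens, dictionary, cutoff=0.75):
    """Replace each OCR token with its closest dictionary word, if any.

    Matching is case-insensitive; tokens with no match at or above
    `cutoff` similarity are passed through unchanged.
    """
    lookup = {word.lower(): word for word in dictionary}
    candidates = list(lookup)
    corrected = []
    for token in tokens:
        match = difflib.get_close_matches(
            token.lower(), candidates, n=1, cutoff=cutoff
        )
        corrected.append(lookup[match[0]] if match else token)
    return corrected


if __name__ == "__main__":
    words = ["Software", "Engineer", "Boulevard"]
    print(snap_to_dictionary(["Sottware", "Enginecr", "xyz"], words))
```

A business card vocabulary (job titles, street suffixes, company names) is small enough that even this crude matching can repair many single-character errors; a language model inside the recognizer can do better because it sees the alternatives for each character.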


> * 7MP digital photo of a whiteboard with some handwriting

This requires training a handwriting recognition model, plus dewarping and intensity normalization.

* [Evernote-like recognition]

Generally speaking, an approach that works pretty well for these kinds of problems given a recognizer like OCRopus is to try to divide the input into "words" and then feed every "word" to one of the OCR engines (Tesseract, bpnet, etc.). 
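
The word-by-word approach described above could be sketched as follows. This assumes the image has already been binarized and cut into single text lines (a list of rows with 1 meaning ink), splits a line into "words" wherever there is a wide enough run of blank columns, and hands each crop to a recognizer. The `recognizer` callable is a placeholder for whichever engine is used; none of these function names come from OCRopus.

```python
def split_into_words(line, min_gap=3):
    """Return (start, end) column ranges of "words" in a binarized text line.

    `line` is a list of rows of 0/1 values, 1 meaning ink. Ink columns
    separated by fewer than `min_gap` blank columns belong to one word.
    """
    width = len(line[0])
    has_ink = [any(row[x] for row in line) for x in range(width)]
    words = []
    start = None
    last_ink = None
    for x, ink in enumerate(has_ink):
        if not ink:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # The blank run is wide enough: close the current word.
            words.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        words.append((start, last_ink + 1))
    return words


def recognize_words(line, recognizer, min_gap=3):
    """Crop each word and pass it to `recognizer` (a hypothetical callable
    that takes a cropped image and returns text)."""
    results = []
    for start, end in split_into_words(line, min_gap):
        crop = [row[start:end] for row in line]
        results.append(((start, end), recognizer(crop)))
    return results
```

Keeping the column range next to each result is what makes Evernote-style highlighting possible: a search hit can be mapped back to the region of the image it came from.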

> I've been eagerly awaiting an open-source OCR solution for years

There really is no such thing as a single "OCR solution".  OCR is dozens of different problems and tradeoffs between accuracy, throughput, functionality, generality, and development cost.   OCRopus will be a toolbox out of which you can build many different kinds of OCR engines fairly easily. 

Right now, we're going for high accuracy, high-throughput recognition and layout analysis on 300dpi scanned books.   With the modules in OCRopus, you can already build decent Evernote-like services if you plug them together right and tune the parameters.  In the future, we'll be adding more modules for camera-based OCR.

Tom