Lately I've been fascinated with the OCR capabilities of EverNote
(evernote.com), a combination desktop software and web service
solution which, among other things, automatically detects and indexes
textual information in images added to your notebook. So, for
example, I could take a photo of a biz card, a takeout menu, or (on
really good days) a handwritten note, then search my notebook for one
or more words in the photo and EverNote will return the photo and
highlight the area where the text appears. In my tests, it's not
perfect but does a remarkably good job in most cases.
I downloaded and built Ocropus 0.2 on 64-bit Ubuntu (incidentally, build
and installation were no-brainers; good job on the docs), ran it on
some of the same test files I used with EverNote, and found the
results to be, well, disappointing. My tests and results were:
* PNG screenshot of a Firefox window displaying the Ocropus command-line
docs - EverNote indexed nearly the whole thing flawlessly. Ocropus
returned mostly garbage.
* 7MP digital photo of an 8.5x11 letter in a Courier-like font -
EverNote was maybe 95% there, while Ocropus was maybe 50%
(incidentally, they both had trouble dealing with underlined text)
* 7MP digital photo of a business card - EverNote found one of two
occurrences of my test search term, while Ocropus did not detect any
text correctly
* 7MP digital photo of a whiteboard with some handwriting of varying
degrees of legibility, some diagrams, etc. - EverNote had a lot of
trouble with this, though it did recognize some of the more legible
handwriting, while Ocropus (not surprisingly) didn't get anything out
of it.
I know Ocropus is not designed for handwriting recognition, but my
understanding of the Ocropus design goals is that the other test cases
are within the range of its target applications. Are my tests or
expectations unreasonable? Is it just too early to try this sort of
complex test? Can I vary any parameters to improve on the above
performance?
I've been eagerly awaiting an open-source OCR solution for years, and
I'm glad someone's finally stepped up. Please don't take any of the
above as criticism; I just want to better understand what to
reasonably expect from Ocropus.
Thanks,
Adam