On 04/01/12 20:14, Robert T wrote:
> I'm developing an open source Android app that uses Tesseract 3.01
> for OCR by passing Tesseract images captured by a phone or tablet
> camera.
>
> The OCR is working adequately for small segments of text--like a
> few words--but uneven illumination seems to lower the recognition
> quality with larger text input. Because the input comes from the
> device camera, there's a lot of shadows and glare.
Hi, I had the same problem: I photographed all the pages of a book
and ran into the same trouble with shadows and glare.
I tried out the textcleaner script (it runs on top of ImageMagick):
http://www.fmwconcepts.com/imagemagick/textcleaner/index.php
and that worked well. I just ran
"./textcleaner badimg.jpg goodimg.jpg"
with the default options, and that was it :D
For auto-recognition, deskewing, etc. I ran unpaper.
This made it possible to OCR 1500 pages without much trouble :D
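In case it helps, here is a rough sketch of how the per-page steps could be looped over a whole directory. It is not the exact script I used. The function name, the DRY_RUN variable, the directory layout and the "-clean" suffix are all made up for illustration; it assumes textcleaner sits in the current directory, the tesseract command is on your PATH, and your Tesseract build can read JPEGs. The unpaper step is left out.

```shell
#!/bin/sh
# Sketch only: clean each page image with textcleaner, then OCR it.
# Assumptions (not from the original post): pages are *.jpg files,
# ./textcleaner is in the current directory, tesseract is on PATH.
# Set DRY_RUN=echo to print the commands instead of running them.

clean_and_ocr() {
    src_dir=$1    # directory holding the photographed pages
    out_dir=$2    # directory for cleaned images and OCR text
    mkdir -p "$out_dir"
    for img in "$src_dir"/*.jpg; do
        # If nothing matched, the pattern stays literal; skip it.
        [ -e "$img" ] || continue
        base=$(basename "$img" .jpg)
        cleaned="$out_dir/$base-clean.jpg"
        # Same call as above: input image, output image, default options.
        $DRY_RUN ./textcleaner "$img" "$cleaned" || {
            echo "textcleaner failed on $img" >&2
            continue
        }
        # tesseract appends .txt to the output base name itself.
        $DRY_RUN tesseract "$cleaned" "$out_dir/$base" ||
            echo "tesseract failed on $img" >&2
    done
}
```

Something like "DRY_RUN=echo; clean_and_ocr pages out" prints the commands first, so you can check the file names before letting it loose on a whole book.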
my 2 cents :)
best
Arno