Hello,
I am using Tesseract to output text files from scanned documents.
All of the images contain typed text and are fairly clean. So far Tesseract's accuracy has been pretty good and I am quite content.
However, Tesseract doesn't seem to preserve line breaks in its output, and I was wondering whether there is an option for this.
At first I thought this was not possible, but searching online turns up topics (such as: http://code.google.com/p/tesseract-ocr/issues/detail?id=575) which seem to show that it should be possible.
Is there a parameter that should be included on the command line?
I am using Windows 7, cmd.exe.
Thanks in advance,
R
BTW, I would recommend adding http://tesseract-ocr.googlecode.com/svn-history/trunk/doc/tesseract.1.html to the wiki page. It took me a long time to find this page (it's hidden in the FAQ), and it provides some helpful information about the parameters.
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en