--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/2bc3c616-d82f-4056-8f99-0ed4029fb880%40googlegroups.com.
If you have any suggestions on how to split input images into individual text lines, I would appreciate it. I am able to use Python and OpenCV, but I don't have a lot of experience with either. I can read publications if necessary.
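A common starting point for splitting a page into text lines is a horizontal projection profile. You count the dark pixels in each row and cut wherever that count drops to (near) zero. Below is a minimal NumPy sketch. It assumes a deskewed grayscale page with dark text on a light background and at least one mostly empty row between lines. The function names `split_lines`/`crop_lines` and their thresholds are hypothetical and illustrative only. They are not part of Tesseract or OpenCV.

```python
import numpy as np


def split_lines(gray, ink_threshold=128, min_ink=1, min_height=3):
    """Return (top, bottom) row ranges of text lines in a grayscale page.

    Hypothetical helper, horizontal-projection sketch: assumes dark text on
    a light, deskewed background with a (near) ink-free row between lines.
    """
    # Binarize: True where a pixel is darker than the threshold (ink).
    ink = np.asarray(gray) < ink_threshold
    # Horizontal projection profile: number of ink pixels in each row.
    profile = ink.sum(axis=1)
    # A row belongs to a text line if it has at least `min_ink` ink pixels.
    has_text = profile >= min_ink

    lines = []
    start = None
    for y, flag in enumerate(has_text):
        if flag and start is None:
            start = y  # first row of a new line
        elif not flag and start is not None:
            if y - start >= min_height:  # drop specks thinner than min_height
                lines.append((start, y))
            start = None
    # Close a line that runs to the bottom edge of the image.
    if start is not None and len(has_text) - start >= min_height:
        lines.append((start, len(has_text)))
    return lines


def crop_lines(gray, lines, margin=2):
    """Cut each (top, bottom) band out of the page, with a little padding."""
    gray = np.asarray(gray)
    height = gray.shape[0]
    return [gray[max(0, t - margin):min(height, b + margin)] for t, b in lines]
```

With OpenCV, the page could be loaded as `gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)`, and each crop could be saved with `cv2.imwrite`. Each crop can then be passed to Tesseract on its own with `--psm 7`, which treats the image as a single text line. When line spacing is very narrow, ascenders and descenders may touch, so the profile may never reach zero between lines. Raising `min_ink` so that rows with a few stray ink pixels still count as gaps is one thing to try. Skewed scans would need deskewing first.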
I'm using Tesseract 5.0.0-alpha from UB Mannheim (Windows 10) to process pages from a directory. The line spacing is very narrow. In my project, increasing the line spacing improves recognition accuracy. I believe that splitting the input image into separate lines of text would improve the results in my case.
=== Original ===
FLOYD. THOMAS J.—La.1,°07; (1°07).
ao LOWNDES = (b’64)-~Ala.2,°90:

=== Spaced ===
FLOYD, THOMAS J.—La.1,"07; (1°07).
HENDRICK. LOWNDES (b’64)-—~Ala.2,°90:
(1°90).

In the original example, the name HENDRICK is missing, and the third line is missing as well.
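The "Spaced" result suggests a second option besides OCRing each line separately. You can rebuild the page with extra blank rows between the detected lines and run Tesseract on the whole page as before. Below is a minimal NumPy sketch. It assumes the line bands are already known as (top, bottom) row ranges, for example from a projection profile, and that the background is white. `respace_lines` and its `gap` parameter are hypothetical names.

```python
import numpy as np


def respace_lines(gray, lines, gap=20, background=255):
    """Rebuild a page with `gap` blank rows around each (top, bottom) band.

    Hypothetical helper: assumes a grayscale page with a light background
    and line bands given as row ranges in top-to-bottom order.
    """
    gray = np.asarray(gray)
    width = gray.shape[1]
    # One strip of background-colored rows, reused as every separator.
    blank = np.full((gap, width), background, dtype=gray.dtype)
    parts = [blank]
    for top, bottom in lines:
        parts.append(gray[top:bottom])  # copy the text band unchanged
        parts.append(blank)             # then insert the extra spacing
    return np.vstack(parts)
```

With OpenCV, the result could be written out with `cv2.imwrite("spaced.png", out)` and processed like any other page. The default gap of 20 rows is arbitrary. How much spacing actually helps is something to measure on your own pages.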