Please file an issue on the GitHub repo with these files so that the developers can look at it.
However, for your app, add a whitespace margin to your images as part of preprocessing, since any fix may take a while.
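The preprocessing step suggested above can be sketched as follows. This is a minimal illustration, not code from the Tesseract project: it assumes an 8-bit grayscale image held as a list of pixel rows (255 = white), and the function name `add_white_margin` and the 10-pixel default are hypothetical placeholders, not values recommended in this thread. In a real app an imaging library would normally do this, for example Pillow's `ImageOps.expand(img, border=10, fill="white")`.

```python
# Minimal sketch (hypothetical helper): pad a grayscale image with a white
# border before handing it to Tesseract. Assumes 8-bit pixels where 255 is white.
WHITE = 255


def add_white_margin(pixels, margin=10, fill=WHITE):
    """Return a new image with `margin` pixels of `fill` on every side."""
    if margin < 0:
        raise ValueError("margin must be non-negative")
    width = len(pixels[0]) if pixels else 0
    padded_width = width + 2 * margin
    # Solid rows of fill for the top and bottom margins.
    top = [[fill] * padded_width for _ in range(margin)]
    bottom = [[fill] * padded_width for _ in range(margin)]
    # Each original row gets fill pixels added on the left and right.
    middle = [[fill] * margin + list(row) + [fill] * margin for row in pixels]
    return top + middle + bottom


if __name__ == "__main__":
    tiny = [[0, 0], [0, 255]]  # 2x2 image with some dark "text" pixels
    for row in add_white_margin(tiny, margin=1):
        print(row)
```

The original pixel data is copied rather than modified in place, so the unpadded image stays available if you want to compare OCR results with and without the margin.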
- sent from my phone. excuse the brevity.
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/40a4828d-9a46-4e36-9b22-8b925f39a046%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
This page only shows the same list I've seen many times before, without any explanation:
What does it mean when it says "script detection"?
I tried OSD and it did not automatically correct an incorrect rotation (90 degrees off).
I think I understand what "Automatic page segmentation" may mean, but with / without OSD? Kinda need a full explanation.
"Vertically aligned text"??? I guess I'll try #4: "Assume a single column of text of variable sizes". That best describes what I have, but the default seemed to work in limited testing of my one- and two-liners.
The wiki also has a Wayback Machine link to a bug saying that adding whitespace helps. (Is that still a current bug? etc.)
thanks
scott
On Thursday, April 21, 2016 at 4:21:47 AM UTC-7, zdenop wrote:
Please read the wiki https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#page-segmentation-method
Zdenko
I just did more testing.
My one-word or single-character image works with -psm 7 and -psm 8.
My two- or three-line text image works with the default of -psm 3, as well as -psm 4.
They both seem to work with -psm 6.
I may have to go with 6, even though my three-line test with different font sizes should be done with 4 based on its description.
I feel it's a bug that 3 and 4 can't reliably handle simpler content. To get the most out of Tesseract, I must analyze the segmentation?!
That is why I had to go through the trouble of compiling Leptonica: so that Tesseract is smart enough that I don't have to re-invent the wheel.
It seems that it's failing at the segmentation stage. If it finds nothing, it could try again automatically with a more primitive setting. That is way more efficient than my process spawning tesseract twice as often.
thanks
scott
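The retry Scott describes can be approximated on the client side. The sketch below is an illustration under stated assumptions, not an existing Tesseract feature: it runs the CLI once per page segmentation mode and stops at the first mode that returns non-empty text. The helper names, the fallback order `(3, 6, 7)` (default, then single uniform block, then single line, loosely following the test results above), and the use of `stdout` as the output base are assumptions; the flag is `-psm` in Tesseract 3.x as used in this thread, but `--psm` in 4.x and later.

```python
import subprocess
import sys
from typing import Callable, Iterable, Optional, Tuple

# Assumption: Tesseract 3.x spells the option "-psm N" (as in this thread);
# 4.x and later expect "--psm N", so adjust PSM_FLAG for your build.
PSM_FLAG = "-psm"


def run_tesseract(image_path: str, psm: int) -> str:
    """Run the tesseract CLI once and return the recognized text.

    Passing "stdout" as the output base makes tesseract print the text
    instead of writing a .txt file (assumes a build that supports this).
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout", PSM_FLAG, str(psm)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def ocr_with_fallback(
    image_path: str,
    psm_modes: Iterable[int] = (3, 6, 7),  # hypothetical fallback order
    run: Callable[[str, int], str] = run_tesseract,
) -> Optional[Tuple[int, str]]:
    """Try each page segmentation mode in order.

    Returns (psm, text) for the first mode whose output contains
    non-whitespace text, or None if every mode comes back empty.
    """
    for psm in psm_modes:
        text = run(image_path, psm)
        if text.strip():
            return psm, text
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    found = ocr_with_fallback(sys.argv[1])
    if found is None:
        print("no text found with any mode")
    else:
        mode, text = found
        print(f"psm {mode}: {text.strip()}")
```

Because this lives outside Tesseract, it still spawns one process per attempt, which is exactly the overhead Scott wants to avoid; the `run` parameter exists only so the loop can be tested without a Tesseract install. Doing the retry inside Tesseract, as he suggests, would reuse the already-loaded image and language data.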