--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/b97b440c-3ecd-4cf5-9bad-f94a98b54654%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
See https://github.com/tesseract-ocr/tesseract/wiki/APIExample for an example of using tesseract in a program.

The training tutorial you refer to is old. See tesstrain.sh for creating synthetic training data.
On 10-Jan-2018 2:54 PM, "saumitra mallick" <saumitr...@gmail.com> wrote:
Hello all,

I'm working on a similar project; in my case I'm reading bank statements. I noticed the following:

1. When you have a single line of text, tesseract performs much better.
2. I'm using OpenCV to cut individual cells from a table (you always know the order of the cells, since you cut them yourself).
3. Once you have the data in individual cells (image files), single-line data gives much more accurate results than multiline data. (Has anyone tried LSTM? Instead of reading the full text, maybe cut the individual cells down to individual lines and use line recognition with tesseract? Please let me know the results.)

I need help with:
- How do I use tesseract in my C++ code? For the time being I'm using tesseract from the command line.
- Please post a sample program for me which does the following:
  - makes tesseract read an image
  - generates text output from it and writes it to a file

If you guys are facing bumps in generating traineddata, this post might help.

Please let me know if anyone is interested in sharing knowledge with me about the same. Contact me at saumitr...@gmail.com

Best Regards
Saumitra Mallick
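The idea in point 3 of cutting a cell down to individual lines can be sketched with a horizontal projection profile: count the ink pixels in each row and treat every run of non-empty rows as one text line. This is only a sketch, not what anyone in this thread is known to use. It assumes the cell has already been binarized (for example with OpenCV) into rows of 0/1 pixels; the names BinaryImage and find_text_lines are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed input: a binarized cell image, one inner vector per pixel row,
// nonzero = ink (text), 0 = background.
using BinaryImage = std::vector<std::vector<std::uint8_t>>;

// Returns the inclusive [first_row, last_row] range of every run of
// consecutive rows that contain at least `min_ink` ink pixels.
std::vector<std::pair<std::size_t, std::size_t>>
find_text_lines(const BinaryImage& img, std::size_t min_ink = 1) {
    std::vector<std::pair<std::size_t, std::size_t>> lines;
    bool in_line = false;
    std::size_t start = 0;

    for (std::size_t y = 0; y < img.size(); ++y) {
        // Horizontal projection: number of ink pixels in this row.
        std::size_t ink = 0;
        for (std::uint8_t px : img[y]) {
            if (px != 0) ++ink;
        }

        const bool has_text = ink >= min_ink;
        if (has_text && !in_line) {
            in_line = true;   // a new text line starts here
            start = y;
        } else if (!has_text && in_line) {
            in_line = false;  // the previous row was the last of the line
            lines.emplace_back(start, y - 1);
        }
    }
    // Close a line that runs to the bottom edge of the image.
    if (in_line) lines.emplace_back(start, img.size() - 1);
    return lines;
}
```

Each returned row range can then be cropped out of the cell and passed to tesseract as a single-line image. Raising min_ink above 1 is one way to ignore rows that contain only a few specks of noise.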
I am trying to solve a similar problem, that of reading forms. Tesseract 4 is doing well but is DROPPING lots of words within boxes. I thought this problem of dropping words only existed with Indic languages, but here I am having this issue for English too!
I tried to fool around with some parameters, but the handful I tried didn't lead to *any* change in the output.
@Shree: Can you please suggest something, since you too faced this issue earlier with another language?