spacing limitation


Илья

May 2, 2011, 10:22:25 AM
to tesser...@googlegroups.com
Hi all!

First, I want to say thanks, because Tesseract is the best open-source
OCR engine that I could find!

I am new to Tesseract, but I badly need to train it from real scans.
I have looked at other options, but for some serious reasons I cannot
print my texts from a PC with the characters spaced out.

I would like more detailed information about this limitation. Is
anyone working on resolving it? What is the status? How long (and what
else) will I have to wait for a complete solution?

Best regards,
Ilia.



Ray Smith

May 5, 2011, 11:29:05 PM
to tesser...@googlegroups.com
With 3.01 (in svn) you can try to train it with real data.

As of 3.01, you no longer need to space out the training text.

If you are prepared to create character level ground truth you don't need to bootstrap, or if you can bootstrap you can use word or line-level ground truth. This feature is too new to have documentation, but you can look at the code in applybox.cpp to find how to format the box file, and TessBaseAPI::Recognize to find which control flags to set to activate word/line-level training.
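For readers who have not seen a box file, here is a rough sketch of reading character-level ground truth. It assumes the commonly documented 3.0x character-level layout, one symbol per line as `symbol left bottom right top page`, with pixel coordinates measured from the bottom-left corner of the image. That layout is an assumption here, not something stated in this thread; the word/line-level layout is not covered, so check applybox.cpp for the authoritative format. The names `Box` and `parse_box_line` are made up for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """One character-level box entry (hypothetical helper type)."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Parse one line of an assumed 'symbol left bottom right top [page]' box file.

    The page field is treated as optional and defaults to 0 when absent.
    """
    fields = line.split()
    if len(fields) == 6:
        symbol, left, bottom, right, top, page = fields
    elif len(fields) == 5:
        symbol, left, bottom, right, top = fields
        page = "0"
    else:
        raise ValueError(f"unexpected box line: {line!r}")
    return Box(symbol, int(left), int(bottom), int(right), int(top), int(page))


def top_left_y(box: Box, image_height: int) -> int:
    """Convert the box's top edge to a top-left-origin y coordinate."""
    return image_height - box.top


if __name__ == "__main__":
    box = parse_box_line("T 36 92 53 115 0")
    print(box)
    print(top_left_y(box, image_height=200))
```

The second helper is only there because most image tools count rows from the top, while the assumed box coordinates count from the bottom.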

Ray.

Илья

May 8, 2011, 3:19:17 PM
to tesser...@googlegroups.com
Hello.

Thanks a lot!

I installed 3.01 and tried it, but I ran into a training error. I
reported it as issue #488.

( http://code.google.com/p/tesseract-ocr/issues/detail?id=488 )

Best regards,
Ilia.


On Thu, 05/05/2011 at 20:29 -0700, Ray Smith wrote:
