WordStr box file format?

Adam Funk

Oct 24, 2019, 5:30:39 AM
to tesser...@googlegroups.com
Hi,

I'm a bit confused by some of the comments in the tesseract
documentation, issues, and wiki about the addition of line-by-line
training to tesseract 4. Is the attached box file valid for training
tesseract 4.0.0?

(I know that unicharset_extractor does not support WordStr yet, but I
have found a way to get around that by recycling the unicharset from the
standard English model.)

Thanks,
Adam
20190930-125337-000000001.box

Shree Devi Kumar

Oct 24, 2019, 9:58:43 AM
to tesseract-ocr
Looks ok. The dimensions need to match the bounding box in your tif.
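
A rough way to check that point is sketched below. It assumes each WordStr line has the form "WordStr <left> <bottom> <right> <top> <page> #<text>", with coordinates in pixels measured from the bottom-left corner of the image. The names parse_wordstr_line and check_box are made up for illustration, and the image width and height are passed in by hand rather than read from the tif.

```python
from typing import List, NamedTuple


class WordStrBox(NamedTuple):
    left: int
    bottom: int
    right: int
    top: int
    page: int
    text: str


def parse_wordstr_line(line: str) -> WordStrBox:
    """Parse one 'WordStr <left> <bottom> <right> <top> <page> #<text>' line.

    The layout is an assumption based on the usual description of the format.
    """
    # Split on the first '#' only, so a '#' inside the transcription survives.
    head, sep, text = line.rstrip("\n").partition("#")
    fields = head.split()
    if not sep or len(fields) != 6 or fields[0] != "WordStr":
        raise ValueError(f"not a WordStr line: {line!r}")
    left, bottom, right, top, page = (int(f) for f in fields[1:])
    return WordStrBox(left, bottom, right, top, page, text)


def check_box(box: WordStrBox, width: int, height: int) -> List[str]:
    """Return a list of problems; an empty list means the box fits the image."""
    problems = []
    if not (0 <= box.left < box.right <= width):
        problems.append(f"x range {box.left}..{box.right} outside 0..{width}")
    if not (0 <= box.bottom < box.top <= height):
        problems.append(f"y range {box.bottom}..{box.top} outside 0..{height}")
    if not box.text.strip():
        problems.append("empty transcription")
    return problems


# Hypothetical example values, not taken from the attached box file.
example = parse_wordstr_line("WordStr 0 0 1234 48 0 #The quick brown fox")
print(check_box(example, width=1234, height=48))
```

Only lines starting with "WordStr" are meant to be passed to the parser; any other line in the box file is left unchecked here.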

You can extract unicharset from the training text also.
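
Before doing that, it can help to see which characters the transcriptions actually contain. The sketch below is only an inspection aid under the same assumption about the WordStr line layout (text after the first '#'); wordstr_texts and char_counts are illustrative names, and the output is not a unicharset file, which is still unicharset_extractor's job.

```python
from collections import Counter
from typing import Iterable, List


def wordstr_texts(lines: Iterable[str]) -> List[str]:
    """Collect the transcription part of every WordStr line."""
    texts = []
    for line in lines:
        if line.startswith("WordStr "):
            # Everything after the first '#' is taken as the line's text.
            _, sep, text = line.rstrip("\n").partition("#")
            if sep:
                texts.append(text)
    return texts


def char_counts(texts: Iterable[str]) -> Counter:
    """Count how often each character occurs across all transcriptions."""
    return Counter(ch for text in texts for ch in text)


# Hypothetical box file content, for illustration only.
sample = [
    "WordStr 0 0 10 5 0 #ab a\n",
    "\t 10 0 14 5 0\n",
]
texts = wordstr_texts(sample)
for ch, n in sorted(char_counts(texts).items()):
    print(repr(ch), n)
```

The collected texts could also be written one per line to a plain text file if a training text is needed.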

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/c92aef13-060d-a6c9-560a-029f9700f1b1%40sheffield.ac.uk.

J Adam Funk

Oct 27, 2019, 12:41:09 PM
to tesseract-ocr
Thanks!