How to train with Tesseract 4.00

yang3...@gmail.com

Jun 3, 2018, 6:29:01 AM
to tesseract-ocr
I have read that in version 4.00, a box file only needs to cover a whole text line instead of individual characters.

So I made a box file like this (the text reads "If it exists, find the value of the real number λ;"):

若存在,试求出实数λ的值; 0 0 256 48 0
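For reference, each line of a Tesseract box file gives the text followed by five integers: the left, bottom, right and top edges of the bounding box (measured from the bottom-left corner of the image) and the zero-based page number. The `parse_box_line` helper below is hypothetical and not part of Tesseract; it is a minimal sketch that splits the line above into those fields, treating everything before the last five fields as the text:

```shell
#!/bin/sh
# Hypothetical helper: split one box-file line into text and coordinates.
# Box format: <text> <left> <bottom> <right> <top> <page>,
# with the origin at the bottom-left corner of the image.
parse_box_line() {
  printf '%s\n' "$1" | awk '{
    n = NF
    text = $1
    # Everything before the last five fields is treated as the text.
    for (i = 2; i <= n - 5; i++) text = text " " $i
    printf "text=%s left=%s bottom=%s right=%s top=%s page=%s width=%d height=%d\n",
           text, $(n-4), $(n-3), $(n-2), $(n-1), $n, $(n-2) - $(n-4), $(n-1) - $(n-3)
  }'
}

parse_box_line '若存在,试求出实数λ的值; 0 0 256 48 0'
```

Here the single box spans 256 by 48 pixels on page 0.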

How do I train with it?

Or is it the same as in version 3?

tesseract chi_my.font.exp0.tif chi_my.font.exp0 nobatch box.train

Or is there a better method?

Thanks! 

ShreeDevi Kumar

Jun 3, 2018, 8:46:11 AM
to tesser...@googlegroups.com
If you want to train using fonts, use tesstrain.sh. See the wiki pages regarding training.

If you want to use scanned images, then see https://github.com/OCR-D/ocrd-train, which uses line images and their ground-truth transcriptions to create box files and .lstmf files and then runs the training.
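ocrd-train pairs every line image with a plain-text transcription of that line. A minimal sketch for checking that pairing before running `make training`, assuming `.tif` line images and transcriptions named `<basename>.gt.txt` in the same folder (the folder name and exact conventions are in the repository's README; `missing_transcriptions` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: list line images that have no matching transcription.
# Assumption: images are *.tif and each transcription is <basename>.gt.txt
# in the same folder (check the ocrd-train README for the exact layout).
missing_transcriptions() {
  dir="${1%/}"                      # drop one trailing slash, if any
  for img in "$dir"/*.tif; do
    [ -e "$img" ] || continue       # the glob matched no files
    gt="${img%.tif}.gt.txt"
    [ -f "$gt" ] || printf 'missing: %s\n' "$gt"
  done
}
```

Call it as `missing_transcriptions path/to/ground-truth`; no output means every image has a transcription.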

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f65b5c86-e921-455d-9076-c2ff230dac5b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Emiliano Isaza Villamizar

Jul 20, 2018, 2:06:57 AM
to tesseract-ocr
Hi Shree,

I've been trying to use this repo, but I keep getting the following error whenever I run any of its make targets:

 make training
combine_tessdata -u /home/tulip/Documents/Em/OCR/OCRtraining/ocrd-train/usr/share/tessdata /foo.traineddata  /home/tulip/Documents/Em/OCR/OCRtraining/ocrd-train/usr/share/tessdata /foo.
Failed to read /home/tulip/Documents/Em/OCR/OCRtraining/ocrd-train/usr/share/tessdata
Makefile:97: recipe for target 'data/unicharset' failed
make: *** [data/unicharset] Error 1

Thank you!

Shree Devi Kumar

Jul 20, 2018, 4:37:10 AM
to tesser...@googlegroups.com

for ocr-d related questions.

Lorenzo Bolzani

Jul 20, 2018, 6:30:00 AM
to tesser...@googlegroups.com

You have a problem with your path configuration. Check the error message:

Failed to read /home/tulip/Documents/Em/OCR/OCRtraining/ocrd-train/usr/share/tessdata

The path does not make sense. Also check the command line:

combine_tessdata -u /home/tulip/Documents/Em/OCR/OCRtraining/ocrd-train/usr/share/tessdata /foo.traineddata  /home/tulip/Documents/Em/OCR/OCRtraining/ocrd-train/usr/share/tessdata /foo.

You probably also have a blank (a space) after "/usr/share/tessdata".
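Lorenzo's guess can be reproduced in plain shell. When a directory variable ends in a blank and is expanded unquoted, as make does when it pastes variables into recipe lines, the shell splits one path into two arguments, which matches the "tessdata /foo.traineddata" in the command above. In GNU make such a blank can come from spaces before a `#` comment on the assignment line, and `$(strip ...)` removes it. A minimal sketch, where `TESSDATA`, `MODEL` and `count_args` are hypothetical names and values chosen for illustration:

```shell
#!/bin/sh
# Sketch: a trailing blank in a directory variable turns one path into two arguments.
# TESSDATA and MODEL are hypothetical example values, not read from any Makefile.
count_args() { echo "$#"; }

TESSDATA="/usr/share/tessdata "    # note the trailing blank
MODEL=foo

# Unquoted, as in a make recipe: word splitting yields
# "/usr/share/tessdata" and "/foo.traineddata"
count_args $TESSDATA/$MODEL.traineddata        # prints 2

# Strip trailing whitespace before building the path
TESSDATA_CLEAN=$(printf '%s' "$TESSDATA" | sed 's/[[:space:]]*$//')
count_args $TESSDATA_CLEAN/$MODEL.traineddata  # prints 1
```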


Bye

Lorenzo

Emiliano Isaza Villamizar

Jul 23, 2018, 11:36:23 AM
to tesseract-ocr
I just did that.

Thank you!

Emiliano Isaza Villamizar

Jul 23, 2018, 11:37:48 AM
to tesseract-ocr
But I still don't know why this happens; I haven't modified anything in the Makefile! What would I need to change?



Lorenzo Bolzani

Jul 23, 2018, 6:57:19 PM
to tesser...@googlegroups.com
The TESSDATA_PREFIX maybe?
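One way to test this suggestion is to check that the directory you expect to be used actually contains the model file. A minimal sketch, assuming the file is named `<model>.traineddata` inside the directory passed as the first argument; `check_traineddata` is a hypothetical helper, and `eng` and the fallback `/usr/share/tessdata` are only example values:

```shell
#!/bin/sh
# Hypothetical helper: report whether <dir>/<model>.traineddata exists and is readable.
check_traineddata() {
  dir="${1%/}"                      # drop one trailing slash, if any
  model="$2"
  file="$dir/$model.traineddata"
  if [ -r "$file" ]; then
    echo "ok: $file"
  else
    echo "cannot read: $file"
    return 1
  fi
}

# Example: check the directory named by TESSDATA_PREFIX (or an example fallback).
check_traineddata "${TESSDATA_PREFIX:-/usr/share/tessdata}" eng \
  || echo "hint: check which tessdata folder the Makefile and TESSDATA_PREFIX refer to"
```

Because the path is quoted here, a stray blank in `TESSDATA_PREFIX` shows up in the message (for example `cannot read: /usr/share/tessdata /eng.traineddata`) instead of being split into two arguments.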