--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/bc664de6-5386-45b3-ae4d-70ac5338938c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Training Tesseract 4.0 from images is not officially supported. Several people have had success doing LSTM training with box/tiff pairs, but it requires hacks or programming on their part to create 4.0.0 compatible box files.
tesstrain.sh creates box/tiff files in the /tmp directory; these are used to create the lstmf files for LSTM training. tesstrain.sh can create either a 3.0x compatible traineddata or a 4.0.0 compatible starter traineddata, depending on the options chosen. For 4.0.0, this starter traineddata, along with the lstmf files, is used for LSTM training.
The format of traineddata files is different for 3.0x and 4.0.0. For the different components of a traineddata file, see [link]. For creating 4.0 compatible box files, see [link].
Please note that all of these are unsupported options.
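As a rough illustration of the font-based route described above, here is a minimal Python sketch that assembles a tesstrain.sh command line for producing lstmf files and a 4.0.0 starter traineddata. The flag names are the ones I recall from the 4.0 training documentation, so please check them against the script in your installation. The helper name and all directory paths are hypothetical placeholders.

```python
import shlex


def tesstrain_linedata_cmd(lang, fonts_dir, langdata_dir, tessdata_dir, output_dir):
    """Build (but do not run) a tesstrain.sh command for LSTM line data.

    Flag names are assumed from the 4.0 training docs; verify them locally.
    """
    return [
        "tesstrain.sh",
        "--lang", lang,
        "--linedata_only",              # assumed: produce lstmf line data for LSTM training
        "--noextract_font_properties",  # assumed: skip 3.0x-style font property extraction
        "--fonts_dir", fonts_dir,
        "--langdata_dir", langdata_dir,
        "--tessdata_dir", tessdata_dir,
        "--output_dir", output_dir,
    ]


# Hypothetical paths, for illustration only.
cmd = tesstrain_linedata_cmd(
    "eng", "/usr/share/fonts", "./langdata", "./tessdata", "./engtrain"
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The command is only printed here; to run it, you could pass the list to subprocess.run once the paths point at real directories.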
ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Fri, Apr 13, 2018 at 12:09 PM, <denni...@berkeley.edu> wrote:
Hi all,
I read in a different post that training Tesseract 4.0 from images is not supported. Is this true? I have been able to train Tesseract 4.0 successfully so far using font data. When using tesstrain.sh, the script creates a number of files, including an lstmf file alongside the usual traineddata file (and some others, like unicharset). I was wondering whether the image and box file procedure described in the Tesseract 3.0 training instructions can be used to create these training files for Tesseract 4.0. The Tesseract 3.0 instructions already produce a traineddata file; how can I generate the lstmf file (and the others), if that is possible?
Thank you,
Dennis
Training from single-line images and their ground truth is now possible using the Makefile in the tesstrain repo.
The above link has a good explanation.
The only change I would suggest is to download tessdata_best/eng.traineddata (or another language, as needed) individually with wget to use as the start model, rather than cloning the whole repo, which is a few gigabytes of data.
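For anyone who prefers to script the download, here is a minimal Python sketch of the same idea: fetch a single traineddata file instead of cloning the repo. The base URL (including the branch name "main") is my assumption about where tessdata_best files are served on GitHub, and the helper names are hypothetical.

```python
import os
import urllib.request

# Assumed location of the tessdata_best files; adjust the branch if it differs.
BASE_URL = "https://github.com/tesseract-ocr/tessdata_best/raw/main"


def startmodel_url(lang, base=BASE_URL):
    """Return the assumed download URL for one language's traineddata."""
    return f"{base}/{lang}.traineddata"


def fetch_startmodel(lang, dest_dir="."):
    """Download a single traineddata file into dest_dir and return its path."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, f"{lang}.traineddata")
    urllib.request.urlretrieve(startmodel_url(lang), dest)
    return dest
```

Calling fetch_startmodel("eng", "tessdata_best") would download only eng.traineddata, which is far smaller than the full repository.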