--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/97e29010-f602-42e9-b3b8-121fb151a49e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi everybody!
I'm trying this tool, https://github.com/OCR-D/ocrd-train/, but without success so far. Tesseract and Leptonica are installed by the scripts.
Inspired by the test set provided in that repo, I created pairs of [*.tif, *.gt.txt] with binarized chars and TTFs from two fonts (1869 text lines in total). You can see an example of my set in the attachment, which also contains the files created by the training process.
My guess is that something is wrong with my data. Sometimes I see the "char train" value increasing instead of decreasing, and the final error rate is still too high (about 60%).
This new LSTM training process is driving me crazy! I would appreciate it if anyone with experience could take a look at my data set.
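Since the suspicion above is that the data itself is at fault, a quick consistency check of the [*.tif, *.gt.txt] pairs can rule out the simplest problems. The following is only a minimal sketch: the function name `check_ground_truth` is hypothetical, and the rule that each `.gt.txt` should hold exactly one non-empty line is an assumption based on the one-line-per-image layout described in this thread.

```python
from pathlib import Path


def check_ground_truth(gt_dir):
    """Return a list of problems found in a folder of *.tif / *.gt.txt pairs."""
    gt_dir = Path(gt_dir)
    problems = []

    # Strip the full suffix so "a.tif" and "a.gt.txt" both map to the stem "a".
    images = {p.name[: -len(".tif")] for p in gt_dir.glob("*.tif")}
    texts = {p.name[: -len(".gt.txt")] for p in gt_dir.glob("*.gt.txt")}

    for stem in sorted(images - texts):
        problems.append(f"{stem}.tif has no matching .gt.txt")
    for stem in sorted(texts - images):
        problems.append(f"{stem}.gt.txt has no matching .tif")

    # Assumption: one line image corresponds to exactly one line of text.
    for stem in sorted(images & texts):
        text = (gt_dir / f"{stem}.gt.txt").read_text(encoding="utf-8")
        lines = [line for line in text.splitlines() if line.strip()]
        if len(lines) != 1:
            problems.append(
                f"{stem}.gt.txt holds {len(lines)} non-empty lines, expected 1"
            )
    return problems


if __name__ == "__main__":
    import sys

    for problem in check_ground_truth(sys.argv[1]):
        print(problem)
```

An empty result only means the pairing is consistent; it says nothing about whether the transcriptions match the images.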
Have a look at this thread:
It's easier than it seems: with 4.0 you do not need per-character boxes, just one box per line (which ocrd-train generates automatically). If your text is already split into lines, you do not have to do anything more. The unicharset and lstmf files are also created by ocrd-train.
Feel free to ask if you get stuck. I have this working now, but it's a bumpy road (lots of "assertion failed" and segmentation fault errors if you miss something).
Bye
Lorenzo
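To make "one box per line" concrete, here is a rough sketch of what such a line-level box file might look like. It is an assumption about the layout, not a copy of the ocrd-train code: every character is given the same box covering the whole line image, and a tab entry marks the end of the line. The function name `line_box` is hypothetical, and the image width and height are passed in rather than read from the file.

```python
import unicodedata


def line_box(text, width, height):
    """Build box-file rows for one text line whose image is width x height pixels."""
    rows = []
    cluster = ""
    for char in text:
        # A combining mark stays attached to the character before it.
        if cluster and not unicodedata.combining(char):
            rows.append(f"{cluster} 0 0 {width} {height} 0")
            cluster = ""
        cluster += char
    if cluster:
        rows.append(f"{cluster} 0 0 {width} {height} 0")
    # Assumed end-of-line marker: a tab with a box just outside the image.
    rows.append(f"\t {width} {height} {width + 1} {height + 1} 0")
    return rows


if __name__ == "__main__":
    print("\n".join(line_box("2018", 120, 32)))
```

The point is only that no per-character coordinates have to be drawn by hand; the transcription and the image size are enough.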
2018-07-17 15:03 GMT+02:00 Ramakant Kushwaha <ramakant...@gmail.com>:
Hi,
Recently I have been trying to retrain Tesseract 4.0 to recognise handwritten digits. I am following the official page but finding it very difficult. It would be great if someone could elaborate on the steps below:
- Prepare training text. (I am using jTessBoxEditor for creating box files.)
- Render text to image + box file. (Or create hand-made box files for existing image data.)
- Make unicharset file. (Can be partially specified, i.e. created manually.) (I do not know how to do this.)
- Make a starter traineddata from the unicharset and optional dictionary data.
- Run tesseract to process image + box file to make training data set.
- Run training on training data set.
- Combine data files.
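The steps listed above map roughly onto the Tesseract 4.0 command-line tools. The sketch below only builds and prints the commands; it does not run them. The function name `training_commands`, all paths, and the network specification are illustrative assumptions, and the flag names reflect my understanding of the 4.0 tools, so check each tool's `--help` output before relying on them.

```python
import shlex

# Example network spec, assumed here for illustration only.
NET_SPEC = "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 O1c1]"


def training_commands(lang, base, work_dir, langdata_dir, text_file, font,
                      fonts_dir, max_iterations=10000):
    """Return one argv list per training step, in the order of the list above."""
    unicharset = f"{work_dir}/unicharset"
    starter = f"{work_dir}/{lang}/{lang}.traineddata"
    checkpoint_prefix = f"{work_dir}/checkpoints/{lang}"
    return [
        # Render the training text to an image plus box file.
        ["text2image", "--text", text_file, "--outputbase", base,
         "--font", font, "--fonts_dir", fonts_dir],
        # Make the unicharset from the box file.
        ["unicharset_extractor", "--output_unicharset", unicharset,
         "--norm_mode", "1", f"{base}.box"],
        # Make a starter traineddata from the unicharset.
        ["combine_lang_model", "--input_unicharset", unicharset,
         "--script_dir", langdata_dir, "--output_dir", work_dir,
         "--lang", lang],
        # Process image + box file into an .lstmf training file.
        ["tesseract", f"{base}.tif", base, "--psm", "6", "lstm.train"],
        # Train; list.train is assumed to list the .lstmf files, one per line.
        ["lstmtraining", "--traineddata", starter, "--net_spec", NET_SPEC,
         "--model_output", checkpoint_prefix,
         "--train_listfile", f"{work_dir}/list.train",
         "--max_iterations", str(max_iterations)],
        # Combine the checkpoint and starter data into the final traineddata.
        ["lstmtraining", "--stop_training",
         "--continue_from", f"{checkpoint_prefix}_checkpoint",
         "--traineddata", starter,
         "--model_output", f"{work_dir}/{lang}.traineddata"],
    ]


if __name__ == "__main__":
    for cmd in training_commands("digits", "train/digits.exp0", "work",
                                 "langdata", "digits.txt", "Some Font",
                                 "fonts"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

If the images already exist as single lines with transcriptions, the rendering step is not needed, as noted earlier in the thread.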
I have already used Tesseract 4.0 for training on handwritten digits. The steps are as follows:
1. The best way is to use some handwritten fonts from Google or anywhere else.
2. Create a text corpus containing only the digits 0-9, generated with a random function, and use the "tesstrain.sh" script with that corpus to generate the starter traineddata.
3. Use the starter traineddata to generate the final traineddata after LSTM training.
If you want a detailed description, I can supply you with complete documentation of the steps.
Chandra Churh Chatterjee
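The digits-only corpus mentioned in step 2 could be produced along these lines. This is a minimal sketch under my own assumptions: the function name `digit_corpus`, the group lengths, and the number of groups per line are arbitrary choices for illustration, not values from this thread.

```python
import random


def digit_corpus(n_lines, min_groups=3, max_groups=8, max_len=6, seed=0):
    """Return n_lines of space-separated random digit groups as one string."""
    rng = random.Random(seed)  # fixed seed so the corpus is reproducible
    lines = []
    for _ in range(n_lines):
        groups = []
        for _ in range(rng.randint(min_groups, max_groups)):
            length = rng.randint(1, max_len)
            groups.append("".join(rng.choice("0123456789") for _ in range(length)))
        lines.append(" ".join(groups))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The output file name is a placeholder.
    with open("digits.training_text", "w", encoding="utf-8") as handle:
        handle.write(digit_corpus(1000))
```

Varying the group length keeps the rendered lines from all looking alike, which seemed a sensible default for a corpus with only ten symbols.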
Thanks @Chandra. I am a beginner at this, so please help me with the complete documentation.
On Thu, Jul 19, 2018 at 3:38 PM, chandra churh chatterjee <chandrachurh...@gmail.com> wrote: