I think jTessBoxEditor 2.0 has been updated to include Tesseract 4.00dev.
Could anybody confirm this? I am not getting better results for Arabic with it.
Ibr,
You are incorrect in your description of LSTM training.
What you are doing will use the ara.traineddata provided in the repo, so there will be no change in the output.
Once the .lstmf files are created, you have to run lstmtraining, which can run for days or weeks before it gives you a good result.
Please read about LSTM training on the wiki.
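For anyone following along, here is a rough sketch of what "run lstmtraining" could look like when fine-tuning the official Arabic model. It is not taken from this thread: the flag names follow the Tesseract 4.0 training documentation and may differ in the 4.00alpha builds discussed here, and every path, the list file name ara.training_files.txt, and the iteration count are placeholder assumptions. By default the script only prints the commands (DRY_RUN=1), so check them against the wiki before running anything.

```shell
#!/bin/sh
# Sketch only: print (or run) the steps for fine-tuning an existing LSTM model.
# All paths and the iteration count below are illustrative assumptions.

LANG_CODE="${LANG_CODE:-ara}"
TESSDATA="${TESSDATA:-./tessdata}"   # folder holding the official ara.traineddata (assumed)
TRAIN_DIR="${TRAIN_DIR:-./train}"    # folder holding the .lstmf files and their list file (assumed)
OUT_DIR="${OUT_DIR:-./output}"       # checkpoints and the final model go here (assumed)
MAX_ITER="${MAX_ITER:-400}"          # placeholder; a real run needs far more iterations
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print the commands, anything else = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

finetune() {
  # 1. Extract the LSTM network from the official traineddata.
  run combine_tessdata -e "$TESSDATA/$LANG_CODE.traineddata" "$OUT_DIR/$LANG_CODE.lstm"

  # 2. Continue training that network on the new .lstmf files.
  run lstmtraining \
    --continue_from "$OUT_DIR/$LANG_CODE.lstm" \
    --traineddata "$TESSDATA/$LANG_CODE.traineddata" \
    --train_listfile "$TRAIN_DIR/$LANG_CODE.training_files.txt" \
    --model_output "$OUT_DIR/finetuned" \
    --max_iterations "$MAX_ITER"

  # 3. Convert the last checkpoint into a traineddata file that tesseract can load.
  run lstmtraining --stop_training \
    --continue_from "$OUT_DIR/finetuned_checkpoint" \
    --traineddata "$TESSDATA/$LANG_CODE.traineddata" \
    --model_output "$OUT_DIR/$LANG_CODE.traineddata"
}

finetune
```

Only after step 3 is there a new ara.traineddata to copy into the tessdata folder. Running tesseract with the unmodified official file just reproduces the existing results.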
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/1c842b1e-1dc1-418b-a5b7-368c11e7dfa5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
On May 4, 2017 2:58 PM, "Ibr" <ibr.h...@gmail.com> wrote:
If you are referring to Tesseract 4.00alpha with Leptonica 1.74.1, and you compiled them correctly and got the binaries that you need for training on .lstmf files, then I recommend following the suggestion made by the Tesseract devs: once you have created an .lstmf file for a certain font (one that can be used for Arabic writing), get the official ara.traineddata file from GitHub, paste it into the tessdata folder, put the .lstmf file in the tesseract folder, and run the command
tesseract text_image result_text -l ara --oem 1
Which Arabic characters exactly are you trying to improve the accuracy for?
On Saturday, April 8, 2017 at 11:52:25 AM UTC+3, Ahmad Moawad wrote:
Hello All,
I want to train Tesseract 4.0 for the Arabic language. The results of this version are great but still need some tuning, so I got jTessBoxEditor 2.0 beta.
I tried to correct the wrong characters and build ara.traineddata. After copying ara.traineddata to /usr/share/tesseract-ocr/4.00/tessdata, I got random characters when I ran tesseract on the image.
Are there any suggestions on how to train for version 4.0? I already know that cube from the last 3.0x version is not included in the 4.0 LSTM engine. Or should I wait until Ray makes another updated ara.traineddata?
Thanks.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7bf66a4e-f85f-4b87-bf82-5688cb2cac8a%40googlegroups.com.