--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscribe@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/676b01e6-139a-4691-9841-78c2a4943b7e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
You could also try the Han traineddata files, which cover both English and chi_sim. They may have better support for "*".
On 11-Oct-2017 9:08 PM, "ShreeDevi Kumar" <shree...@gmail.com> wrote:
Please file this as an issue in tessdata_fast so that Ray can include it in the next training. You can also try the plus-minus fine-tune training to see if that helps.
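A rough sketch of what the plus-minus fine-tune run might look like, written as a small Python helper that only builds the two command lines (extract the LSTM model, then continue training with the extended character set). The `combine_tessdata -e` and `lstmtraining` options follow the Tesseract 4.0 training documentation as I understand it; every path, the `chi_sim` file names, and the iteration count are placeholders, and the helper name is made up for illustration. Note that this assumes you start from a float model that can be trained further and that you have already built a new traineddata whose unicharset contains "*".

```python
import shlex
from pathlib import Path


def plus_minus_commands(old_traineddata, new_traineddata, train_listfile,
                        work_dir, max_iterations=3600):
    """Build (but do not run) the commands for plus-minus fine-tuning.

    All arguments are hypothetical paths chosen by the caller.
    """
    work = Path(work_dir)
    lstm_model = work / "chi_sim.lstm"  # placeholder name for the extracted model

    # Step 1: extract the LSTM component from the existing traineddata.
    extract = ["combine_tessdata", "-e", str(old_traineddata), str(lstm_model)]

    # Step 2: continue training from that model with the new character set.
    train = [
        "lstmtraining",
        "--model_output", str(work / "plusminus"),
        "--continue_from", str(lstm_model),
        "--traineddata", str(new_traineddata),
        "--old_traineddata", str(old_traineddata),
        "--train_listfile", str(train_listfile),
        "--max_iterations", str(max_iterations),
    ]
    return [extract, train]


if __name__ == "__main__":
    for cmd in plus_minus_commands("best/chi_sim.traineddata",
                                   "new/chi_sim.traineddata",
                                   "train/chi_sim.training_files.txt",
                                   "work"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the paths before running anything; each list can then be passed to `subprocess.run(cmd, check=True)`.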
On 11-Oct-2017 8:18 PM, <superw...@gmail.com> wrote:
We have an application that uses Tesseract 3.05 to recognize Chinese document images. The results are good, but performance is quite slow. We discovered the "chi_sim_fast" trained data for Tesseract 4.0, which is indeed much faster and slightly more accurate. However, version 3.05 recognizes the "*" character, while 4.0 recognizes it as a Chinese character instead. Is it possible to add "*" to the set of recognized characters and continue using the "chi_sim_fast" data? An example image is attached: the desired output is "年*十", while the actual result is "年友十". I have tried adding "-c tessedit_char_whitelist=*" to the command, but with no luck. Does anyone have an idea about this case, or will I need to retrain a data set of my own? Thank you!
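For anyone who wants to reproduce this, the invocation described above can be sketched roughly as follows. The image name, output base, and helper function are placeholders, and the language name is simply the one used in this message (it has to match the traineddata file name in your tessdata directory). Passing the arguments as a list avoids the shell expanding the "*".

```python
import shlex


def tesseract_command(image, output_base, lang="chi_sim_fast", whitelist=None):
    """Build a tesseract command line; file names and lang are placeholders."""
    cmd = ["tesseract", image, output_base, "-l", lang]
    if whitelist:
        # -c sets a config variable, here the character whitelist.
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


if __name__ == "__main__":
    cmd = tesseract_command("sample.png", "out", whitelist="*")
    # Show a shell-safe version of the command (the "*" gets quoted).
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```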