Śerena Kovitch
ŁAGUNA EVREIST
Äna Optici
Orğu Moninck
(I don't have to recognize words)
Latin.traineddata (fast, integer) does well with the diacritics, but it contains a lot of characters I don't need, such as digits, %, ﹕, ﹖, ﹗, ﹙, ﹚, ﹛, ﹜, ﹝, ﹞, ﹟, ﹠, ﹡, ﹢, ﹣, ﹤, ﹥, ﹦, ﹨, ﹩, ﹪, ﹫, and many more. As a result, Latin.traineddata is too slow.
So I thought I would take eng.traineddata (the best, float model for LSTM) and fine-tune it for the diacritics. But there are almost 400 characters with diacritics, so I don't know whether fine-tuning for that many characters is a good idea.
I tried it anyway, but the quality is very poor.
I trained with eng.training_text (an English text of 72 lines), to which I added all the diacritics several times. The character error rate reported by lstmeval is around 0.1. I ran a test on 80 documents (each document contains one name), and only 30 names were read correctly. The recognition time is similar to Latin.traineddata.
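To check how often each diacritic actually occurs in a training text, something like the following minimal sketch could be used. It only counts non-ASCII letters in a string with the Python standard library; the function names `diacritic_counts` and `rare_characters` and the `min_count` threshold are hypothetical choices for illustration, not part of Tesseract or its training tools.

```python
from collections import Counter


def diacritic_counts(text):
    """Count every non-ASCII alphabetic character in the text."""
    counts = Counter()
    for ch in text:
        # Treat any letter outside ASCII as a "diacritic" for this check.
        if ch.isalpha() and ord(ch) > 127:
            counts[ch] += 1
    return counts


def rare_characters(text, min_count=10):
    """Return the counted characters that occur fewer than min_count times.

    min_count is an arbitrary illustrative threshold (an assumption).
    """
    counts = diacritic_counts(text)
    return sorted(ch for ch, n in counts.items() if n < min_count)


# The four sample names from this post, used as a tiny stand-in text.
sample = "Śerena Kovitch\nŁAGUNA EVREIST\nÄna Optici\nOrğu Moninck\n"
print(diacritic_counts(sample))
print(rare_characters(sample))
```

Run on the real training text instead of `sample`, this would list the characters that appear only a handful of times, which may be the ones the fine-tuned model has seen too rarely.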
What can I do to get a model that is as good as Latin.traineddata on diacritics but much faster at OCR?
Thank you.
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/b9ddf333-1229-45d3-9a02-809973294a47%40googlegroups.com.