--
You received this message because you are subscribed to the Google Groups "tesseract-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-de...@googlegroups.com.
To post to this group, send email to tesser...@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-dev.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-dev/02e6d16d-8b71-44ba-a2f9-bb150b807e41%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
The code to generate and test the Ancient Greek OCR training data is in several small git repositories. It is all free software under the Apache License 2.0.
Ray,
1. I will be happy to test the Devanagari-based languages as well as other Indic ones, if there is some objective way of measuring accuracy for them. Is there a test suite or a recommended method for this?
2. Also, I noticed that there is a directory for Persian langdata but no traineddata for it.
3. It would be helpful to have a page that links to external (non-Google) traineddata files, e.g. grc, per, gle_uncial etc.
4. Is there a recommended method for naming language-script combinations? For example, Sindhi can be written in the Devanagari and Persian scripts, so should the traineddata be snd_deva and snd_per?
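On point 1: one commonly used objective measure for OCR output is the character error rate, i.e. the edit distance between the ground-truth text and the OCR text divided by the length of the ground truth. The sketch below is only an illustration of that general idea, not a test suite that anyone in this thread has confirmed; the function names are made up for the example. Note that it counts Unicode code points, which for Devanagari is not the same as counting user-perceived characters (a consonant plus a vowel sign is two code points).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the reference length (hypothetical helper)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Here `reference` would be the hand-checked ground truth for a page and `hypothesis` the text produced by the OCR run being evaluated.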
If you can share the format for test data, I can try to provide you with files for other Indian languages, especially Devanagari-based ones.
Alternatively, you can suggest whether there is a way to get access to Google's internal tool for this.
All these responses look correct to me.
The actual errors are:
grc and fil shouldn't be in the valid language codes list, as they are the wrong variant of ISO 639.
That would also reduce the risk of accidentally overwriting Nick White's grc.traineddata in the future.
*_frak is Fraktur variant of language
equ is Math / equation detection module
osd is Orientation and script detection module
zxx No linguistic content; Not applicable
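As a rough illustration of the naming conventions listed above, the following sketch maps a traineddata base name to a description. The function name and the returned wording are assumptions made for this example; only the `_frak` suffix and the `equ` and `osd` names come from the list above.

```python
# Special (non-language) names taken from the list above.
SPECIAL_MODULES = {
    "equ": "math / equation detection module",
    "osd": "orientation and script detection module",
}

FRAKTUR_SUFFIX = "_frak"


def describe_traineddata(name: str) -> str:
    """Describe a traineddata base name (hypothetical helper)."""
    if name in SPECIAL_MODULES:
        return SPECIAL_MODULES[name]
    if name.endswith(FRAKTUR_SUFFIX) and len(name) > len(FRAKTUR_SUFFIX):
        base = name[: -len(FRAKTUR_SUFFIX)]
        return f"Fraktur variant of {base}"
    return f"language {name}"
```

For instance, `describe_traineddata("deu_frak")` reports a Fraktur variant of `deu`, while `describe_traineddata("osd")` reports the orientation and script detection module.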