Hi Devi,
Unfortunately, you are slightly misinformed as well.
The trained data file for the Serbian language that is currently in Tesseract's repository contains LATIN characters.
What I made is a corpus of trained data that recognizes Serbian Cyrillic characters.
A good summary and explanation of what Serbian Cyrillic is can be found here (Wikipedia article). Please pay attention to the section "Modern alphabet" in the Wikipedia article.
What the current version of Tesseract's srp.traineddata can recognize are the letters in the column labelled "Latin" (see the Wikipedia article).
I would like to submit a file with trained data that will make Tesseract recognize the letters in the column "Cyrillic" (again, see the Wikipedia article).
Again, I did not get a clear answer to my question: how do I submit this file to Tesseract's repository?
Shall I assume that I need to open an issue and submit the trained data there? Please clarify.
Regards,
Zoltan