Creating a new language pack for Javanese Script

Christopher Imantaka Halim

Apr 22, 2018, 4:46:06 PM
to tesseract-ocr
Hi,

I want to develop an OCR for Javanese Script / Aksara.
https://en.wikipedia.org/wiki/Javanese_script

I plan to use Tesseract 4.0.
I've read the wiki, but I'm still confused.

What do I need to prepare to start a bare-minimum training process for Tesseract 4.0?
In another thread, someone said that training from image files is not supported yet.
I also found out that box file/TIFF pairs are not supported either.
(I did try making one box file using this online tool: https://pp19dd.com/tesseract-ocr-chopper/)

Is there an example of the training "inputs" somewhere in the GitHub projects?

Sorry if this is a stupid question, I'm a newbie. :)

Thanks in advance.

shree

Apr 23, 2018, 4:06:48 AM
to tesseract-ocr