Hi!
I'm Kasun, an undergraduate currently involved in some OCR-based research in Sri Lanka. I have been trying to train Tesseract comprehensively for the Sinhala language[1]. I'm using the unicharambigs file to work around some of the problems I've run into during training. I would be grateful if somebody could help me sort out the problems I'm currently facing.
The first problem is that I don't really understand how the optional field in unicharambigs works.
The second problem is that I can get either one of the following two unicharambigs rules to work, but not both together.
2 ෙ ක 1 කෙ 1 ( ෙ ක - U+0DD9 U+0D9A කෙ - U+0D9A U+0DD9 )
3 ෙ ක ා් 1 කෝ 1 (ෙ ක ා් - U+0DD9 U+0D9A U+0DCF U+0DCA කෝ - U+0D9A U+0DDD )
I understand that if the first rule is invoked, the second rule becomes dormant, because the code points U+0DD9 U+0D9A switch places. But if I replace the second rule with
2 කෙ ා් 1 කෝ 1 ( කෙ ා් - U+0D9A U+0DD9 U+0DCF U+0DCA කෝ- U+0D9A U+0DDD )
it won't work either.
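In case it helps anyone reproduce this, here is a minimal Python sketch I would use to double-check which code points each rule actually contains. It assumes the fields are whitespace-separated in the order shown above (source count, source unichars, target count, target unichars, last field); `parse_ambig_line` and `codepoints` are hypothetical helper names of my own, not part of Tesseract.

```python
def codepoints(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


def parse_ambig_line(line):
    """Split one rule into its fields (assumed layout, see above)."""
    # Assumed layout: <n_src> <src unichars...> <n_dst> <dst unichars...> <type>
    tokens = line.split()
    n_src = int(tokens[0])
    src = tokens[1:1 + n_src]
    n_dst = int(tokens[1 + n_src])
    dst_start = 2 + n_src
    dst = tokens[dst_start:dst_start + n_dst]
    rule_type = int(tokens[dst_start + n_dst])
    return {
        "source": [codepoints(t) for t in src],
        "target": [codepoints(t) for t in dst],
        "type": rule_type,
    }


# First rule from above, written with escapes so the code points are explicit.
rule = parse_ambig_line("2 \u0DD9 \u0D9A 1 \u0D9A\u0DD9 1")
print(rule["source"], "->", rule["target"], "type", rule["type"])
```

Running every line of the file through this makes it easy to compare what is really stored against the code points I listed in parentheses.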
Any help in this regard is highly appreciated! Thanks in advance.
References: