Arabic Text Sort Left to Right


Ishak DÖLEK

Nov 23, 2019, 2:58:37 PM
to tesser...@googlegroups.com
Hi,
I am creating a traineddata file for an Arabic font.
I prepared the ara.training_text file to generate synthetic data.
I created the image and box files with text2image.
Then I created the .lstmf files.
I started training.
During training, the text lines shown in the log are ordered from left to right. Is that normal?

GROUND  TRUTH : هجبرع ردراو ىراثآ ردراو ىقارم هعبتت ردناقشلاچ رد ىكذ ردشمتيا تئشن ند هيبرح بتكم ىغيدلوا ىلشيريول
ALIGNED TRUTH : هجبرع ردراو ىراثآ ردراو ىقارم هعبتت ردناقشلاچ رد ىكذ ردشمتيا تئشن ند هيبرح بتكم ىغيدلوا ىلشيريول
BEST OCR TEXT : هجبرع ردراو ىراثآ ردراو ىقارم هعبگ ردناقشلاچ رد ىكن ردشمتيا تثشن ند هيبرح بتكم ىغيدلوا ىلشيريول

Or do I need to reorder each line of the training text from left to right myself before training?

Thanks in advance

Shree Devi Kumar

Nov 23, 2019, 11:35:52 PM
to tesseract-ocr
Training for all languages, including RTL languages, is done in LTR order.
See https://github.com/tesseract-ocr/tesseract/issues/2082 and other related issues on GitHub.
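
To read such log lines in their usual right-to-left order while inspecting training output, something like the following rough Python sketch may help. It is only an illustration: the helper name reverse_codepoints is hypothetical and not part of Tesseract, and it assumes the logged line is a plain code-point reversal of the original text. It is a naive reversal, so digits, Latin words, and combining marks would end up in the wrong order.

```python
def reverse_codepoints(line: str) -> str:
    """Naively reverse the code points of one line of text.

    Hypothetical helper for reading log output only. It does not
    implement the Unicode bidirectional algorithm, so embedded
    numbers, Latin text, and combining marks are reversed too.
    """
    return line[::-1]


if __name__ == "__main__":
    # First word of the GROUND TRUTH line from the log above.
    logged_word = "هجبرع"
    readable = reverse_codepoints(logged_word)
    print(readable)

    # Reversing twice gives back the logged form unchanged.
    assert reverse_codepoints(readable) == logged_word
```

This is meant for eyeballing the log, not for preprocessing ara.training_text.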


