-l eng+urd not working


Shubham Gupta

Aug 30, 2019, 1:56:35 AM
to tesser...@googlegroups.com
Hi All

I have a question: my image contains both Urdu and English text. I passed eng+urd to the -l parameter, but the output is garbled and incorrect. Can anyone help me fix this, or is anyone else facing the same problem?

I have attached the image below.


Thanks and Regards
Shubham
mix.txt
mix.jpg

Shree Devi Kumar

Aug 30, 2019, 2:55:23 AM
to tesseract-ocr
Try urd+eng to give precedence to Urdu.
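
For reference, a minimal sketch of how the language order ends up on the command line. The helper name tesseract_command and the output base mix_urd_eng are made up for illustration; only the tesseract executable, the -l option, and the image name mix.jpg come from this thread. Whether the reordering actually improves the result depends on the image.

```python
from typing import List


def tesseract_command(image: str, output_base: str, languages: List[str]) -> List[str]:
    """Build a tesseract CLI call (hypothetical helper, not part of Tesseract).

    The language codes are joined with '+' in the order given, so the
    first entry is the one listed first in the -l argument.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    return ["tesseract", image, output_base, "-l", "+".join(languages)]


# Urdu first, then English, as suggested above.
cmd = tesseract_command("mix.jpg", "mix_urd_eng", ["urd", "eng"])
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) on a machine where tesseract and both language models are installed; the recognized text is then written to mix_urd_eng.txt.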



Shubham Gupta

Aug 30, 2019, 4:30:42 AM
to tesser...@googlegroups.com
I have already tried that combination.
I used the following combinations:
1) urd
2) eng
3) urd+eng
4) eng+urd

None of the combinations above gave me meaningful output.
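
The four runs above can be reproduced in one go with a small script along these lines. This is only a sketch: the function names build_commands and run_all and the output naming scheme are assumptions for illustration, and it presumes tesseract is on PATH with the urd and eng models installed.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Dict, List

# The four -l values tried in this post.
COMBINATIONS = ["urd", "eng", "urd+eng", "eng+urd"]


def build_commands(image: str, combos: List[str]) -> Dict[str, List[str]]:
    """Map each -l value to a tesseract call writing to '<image stem>_<combo>'."""
    stem = Path(image).stem
    return {c: ["tesseract", image, f"{stem}_{c}", "-l", c] for c in combos}


def run_all(image: str, combos: List[str] = COMBINATIONS) -> Dict[str, str]:
    """Run every combination and return the recognized text per -l value."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    results: Dict[str, str] = {}
    for combo, cmd in build_commands(image, combos).items():
        subprocess.run(cmd, check=True)
        # tesseract appends '.txt' to the output base for plain-text output.
        results[combo] = Path(cmd[2] + ".txt").read_text(encoding="utf-8")
    return results
```

Calling run_all("mix.jpg") returns a dictionary keyed by the -l value, which makes it easier to compare the outputs side by side.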

Can someone suggest a new approach for this? Could I create a hybrid model by building training data that contains both Urdu and English and training Tesseract on it? Would that be a good approach?

Thanks
Shubham

mix_only_Eng.txt
mix.jpg
mix_urd+eng.txt
mix_eng+urd.txt
mix_only_urd.txt

Shree Devi Kumar

Aug 30, 2019, 4:34:36 AM
to tesseract-ocr
Which traineddata are you using: tessdata, tessdata_best, or tessdata_fast?

Do the accuracy problems occur only with the mixed-language image, or also with Urdu-only images?
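
One way to be sure which model set is in use is to point tesseract at a specific directory with --tessdata-dir. The sketch below only builds the commands; the helper name and the /opt/... paths are hypothetical placeholders for wherever the three repositories are checked out locally.

```python
from typing import List, Optional


def tesseract_command_with_models(
    image: str, output_base: str, lang: str, tessdata_dir: Optional[str] = None
) -> List[str]:
    """Build a tesseract call, optionally pinning the traineddata directory."""
    cmd = ["tesseract", image, output_base, "-l", lang]
    if tessdata_dir is not None:
        # --tessdata-dir selects the directory the .traineddata files are read from.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


# Hypothetical local checkouts of the three model repositories.
MODEL_DIRS = {
    "tessdata": "/opt/tessdata",
    "best": "/opt/tessdata_best",
    "fast": "/opt/tessdata_fast",
}

for name, path in MODEL_DIRS.items():
    print(" ".join(tesseract_command_with_models("mix.jpg", f"mix_{name}", "urd+eng", path)))
```

Running the same image against each directory shows whether the problem is specific to one model set.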

Shubham Gupta

Aug 30, 2019, 4:51:30 AM
to tesser...@googlegroups.com
I am using tessdata_best.
There are no problems when the image contains a single language; Tesseract works fine on those.

Note: The main problem occurs when the same line in the image contains two languages.
