Reading a multi-language (Arabic and English) image


thinker

Jun 27, 2014, 4:48:52 AM
to tesser...@googlegroups.com
Reading an image that contains more than one language (Arabic and English) with the -l ara+eng option gives garbage output.

How can I read an image that has both Arabic and English content?

Nick White

Jun 27, 2014, 12:50:25 PM
to tesser...@googlegroups.com
On Fri, Jun 27, 2014 at 01:48:52AM -0700, thinker wrote:
> reading image with multiple language (arabic and english) by using -l
> ara+eng option gives garbage output.

There are currently a couple of bugs with combining Arabic and
English together, so it isn't working. I'd recommend you add any
extra information you have to those bugs, to help the issues be
resolved sooner:

https://code.google.com/p/tesseract-ocr/issues/detail?id=899
https://code.google.com/p/tesseract-ocr/issues/detail?id=1220

Nick

Janusz S. Bien

Jun 27, 2014, 12:54:43 PM
to tesser...@googlegroups.com
Quote/Cytat - Nick White <nick....@durham.ac.uk> (Fri 27 Jun 2014
06:49:46 PM CEST):
In the meantime you can try to merge the results of the separate runs
for each language. You will find hocr-merge at

https://bitbucket.org/jwilk/marasca-wbl

in misc/xhocr/
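
For the "separate runs" part, here is a minimal sketch in Python, assuming the tesseract binary is on your PATH and the ara and eng traineddata files are installed. The function names and the output naming scheme are made up for illustration. It only produces one hOCR file per language; the merge step is left to hocr-merge, whose command-line options are not shown here because I have not checked them.

```python
import subprocess
from pathlib import Path


def tesseract_hocr_command(image, lang, out_dir="."):
    """Build the command line for one single-language hOCR run.

    The "<image stem>-<lang>" output name is an assumption for this sketch.
    """
    image = Path(image)
    # Tesseract appends the file extension itself, so pass only the base name.
    out_base = Path(out_dir) / f"{image.stem}-{lang}"
    # The trailing "hocr" is the stock config file that switches on hOCR output.
    return ["tesseract", str(image), str(out_base), "-l", lang, "hocr"]


def run_per_language(image, langs=("ara", "eng"), out_dir="."):
    """Run Tesseract once per language and return the commands that were run."""
    commands = [tesseract_hocr_command(image, lang, out_dir) for lang in langs]
    for cmd in commands:
        # check=True raises CalledProcessError if Tesseract exits with an error.
        subprocess.run(cmd, check=True)
    return commands
```

Calling run_per_language("page.png") (a placeholder file name) runs Tesseract twice, once with -l ara and once with -l eng. The extension of the output files depends on the Tesseract version (as far as I know, .html in older releases and .hocr in newer ones), so check what actually gets written before passing the two files to hocr-merge.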

Best regards

Janusz
--
Prof. dr hab. Janusz S. Bień - Uniwersytet Warszawski (Katedra
Lingwistyki Formalnej)
Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/

thinker

Jun 30, 2014, 4:47:25 AM
to tesser...@googlegroups.com
Thanks for your reply.