Arabic support

Mohamed Magdy

Sep 20, 2007, 4:52:48 AM
to tesser...@googlegroups.com
Hello

This follows up on the question I asked at the bottom of
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract

I tried to train Tesseract for Arabic, using only one training image.
The output was gibberish, but it did contain the first two characters
of the first word: بس.

One problem I found: when I made the box file, it did not include all
the words in the picture, only these:

بسم الله الرحمن الرحيم

نعيب الزمان والعيب فينا وما

(from the training image and text linked below).
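
For anyone wanting to check the same thing, here is a rough sketch of how the box file could be compared against the training text. It assumes each box-file line has the form "glyph left bottom right top" (extra fields, if any, are ignored). The function names parse_box_lines and missing_glyphs are made up for this illustration and are not part of Tesseract.

```python
from collections import Counter


def parse_box_lines(box_text):
    """Parse box-file text into (glyph, left, bottom, right, top) tuples.

    Assumes one box per line: glyph followed by four integer coordinates.
    """
    boxes = []
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        glyph = parts[0]
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append((glyph, left, bottom, right, top))
    return boxes


def missing_glyphs(training_text, boxes):
    """Count the characters of the training text that have no box.

    Whitespace is ignored. A box labelled with several code points
    counts once for each of them.
    """
    expected = Counter(ch for ch in training_text if not ch.isspace())
    found = Counter()
    for glyph, *_ in boxes:
        found.update(glyph)
    return expected - found  # keeps only characters that are short
```

Running missing_glyphs on the contents of train1.txt and the parsed fontfile.box would list which characters never received a box, which should show how much of the image the box file actually covers.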

Another thing: two characters each had three lines in the box file. I
merged them the same way two lines are merged. Is that correct?
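
To make the merge concrete, here is a minimal sketch of what I understand merging to mean: replace the separate boxes of one character with a single box that encloses all of them. It assumes coordinates in the order left, bottom, right, top with the origin at the bottom-left of the image. merge_boxes is a hypothetical helper, not a Tesseract tool, and the coordinates in the example are invented.

```python
def merge_boxes(glyph, boxes):
    """Return one box enclosing every (left, bottom, right, top) box given.

    The result is labelled with `glyph`, the full character the pieces
    belong to.
    """
    if not boxes:
        raise ValueError("need at least one box to merge")
    lefts, bottoms, rights, tops = zip(*boxes)
    return (glyph, min(lefts), min(bottoms), max(rights), max(tops))


# Example with invented coordinates: a base shape plus two separate
# dot groups above it, merged into one box for the character.
pieces = [(10, 20, 18, 30), (12, 32, 16, 36), (11, 38, 17, 42)]
merged = merge_boxes("ش", pieces)
```

The same function works for two pieces or three, so a character split into three lines would be handled exactly like one split into two.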

If it isn't entirely hopeless, I will try again.

Thanks.

Mohamed

Files:

http://delicieux.info/fulllog.txt <- log

http://delicieux.info/train1.txt <- training text

http://delicieux.info/fontfile.tif <- training image

http://delicieux.info/fontfile.box <- box file

http://delicieux.info/output.txt <- output of tesseract fontfile.tif
output -l ara
