Build from source failed to recognize Arabic


Essam Zaky

Apr 6, 2017, 2:17:44 PM
to tesseract-ocr
Hi all,

I built Tesseract and the training tools from source on Windows with VS2015.

Recognizing an English page succeeds,
but recognizing an Arabic page fails:

C:\Users\emz\tesseract\build\bin\Debug>tesseract eurotext.tif eurotext -l eng
Tesseract Open Source OCR Engine v4.00.00dev with Leptonica
Page 1

C:\Users\emz\tesseract\build\bin\Debug>tesseract sample1.tif sample1 -l ara
Error: LSTM requested, but not present!! Loading tesseract.
tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file C:\Users\emz\tesseract\classify\adaptmatch.cpp, line 537

What could be the reason for this error?


Best regards
Essam

universal reseller

Apr 6, 2017, 2:22:04 PM
to tesser...@googlegroups.com
Please send the output of the following command, run from the command line:
tesseract --list-langs

Essam Zaky

Apr 6, 2017, 3:01:38 PM
to tesseract-ocr
Hi @.peiman,
thanks for the reply.

I found the problem.

I had installed an old v4 build from the DanBolomBerg site,

and TESSDATA_PREFIX was still referring to the old version's data, with cube.

I have now updated TESSDATA_PREFIX in the system environment to point to the newly downloaded data, and it's working.

Thanks again
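
For anyone hitting the same assert, a quick way to see which data directory Tesseract is being pointed at is to inspect TESSDATA_PREFIX directly. The following is only a minimal sketch in Python, not part of Tesseract: the helper names `list_traineddata` and `cube_files` are made up for illustration, and it assumes that older cube-based data shows up as loose files named like `ara.cube.*` next to the `.traineddata` files. It looks both in the prefix directory itself and in a `tessdata` subdirectory, since which of the two is expected has differed between versions.

```python
import os
from pathlib import Path


def _candidate_dirs(prefix):
    """Directories to inspect: the prefix itself and its 'tessdata' subfolder."""
    root = Path(prefix)
    return [d for d in (root, root / "tessdata") if d.is_dir()]


def list_traineddata(prefix):
    """Return sorted language codes of the *.traineddata files under prefix."""
    langs = set()
    for directory in _candidate_dirs(prefix):
        for path in directory.glob("*.traineddata"):
            langs.add(path.stem)  # "ara.traineddata" -> "ara"
    return sorted(langs)


def cube_files(prefix):
    """Return sorted names of files that look like cube data (assumed *.cube.*)."""
    names = set()
    for directory in _candidate_dirs(prefix):
        for path in directory.glob("*.cube.*"):
            names.add(path.name)
    return sorted(names)


if __name__ == "__main__":
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix is None:
        print("TESSDATA_PREFIX is not set")
    else:
        print("TESSDATA_PREFIX =", prefix)
        print("traineddata found:", list_traineddata(prefix))
        print("cube-style files (hint of older data):", cube_files(prefix))
```

If the printed directory is not the one holding the newly downloaded data, or cube-style files appear in it, the variable is probably still pointing at an older install.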

universal reseller

Apr 6, 2017, 3:25:53 PM
to tesser...@googlegroups.com
What accuracy are you getting in the results?

Essam Zaky

Apr 8, 2017, 11:00:32 AM
to tesseract-ocr
For the sample images I used:

The accuracy for English is good,

but for Arabic, cube is still better than the current LSTM.