Can't encode transcription


Genet Gessessew

Sep 26, 2023, 5:25:40 AM
to tesseract-ocr
I am new to Tesseract and have tried to train a Tesseract model for the Amharic language.

Once training starts, it never stops, and the output looks like this:
Can't encode transcription: 'ህ' in language '' Encoding of string failed! Failure bytes: ffffffe1 ffffff8d ffffffad


Is anybody aware of this problem, and how can I fine-tune amh.traineddata? I followed this tutorial: GitHub - livezingy/tesstrain-win: Train Tesseract LSTM with make on Windows

Des Bw

Sep 26, 2023, 11:35:38 AM
to tesseract-ocr
I am also training for Amharic. 
I am pretty sure you are using Windows. I had exactly the same problem there. I think it is related to Unicode handling, but I was not able to solve the issue. I have now installed Ubuntu alongside Windows, and everything works fine.
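
One way to see which character the encoder actually rejected is to decode the "Failure bytes" from the error message. The sketch below is only a minimal illustration: it assumes the values are sign-extended UTF-8 bytes (which the repeated ffffff prefix suggests), and the helper name decode_failure_bytes is made up for this example, not part of Tesseract.

```python
def decode_failure_bytes(report: str) -> str:
    """Decode hex tokens such as 'ffffffe1 ffffff8d ffffffad' into text.

    Assumption: each token is a signed byte printed as 32-bit hex,
    so only the low 8 bits carry the real UTF-8 byte value.
    """
    raw = bytes(int(token, 16) & 0xFF for token in report.split())
    return raw.decode("utf-8")


ch = decode_failure_bytes("ffffffe1 ffffff8d ffffffad")
# Print the character together with its Unicode code point.
print(ch, f"U+{ord(ch):04X}")
```

Under that assumption the bytes e1 8d ad decode to U+136D, which is not the same code point as the 'ህ' (U+1205) shown in the message, so it may be worth checking whether the ground-truth text contains other characters that the model's character set does not cover.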

Des Bw

Sep 26, 2023, 11:38:15 AM
to tesseract-ocr
Are you planning to fine-tune for a specific font, or do you want to improve the overall accuracy of the best model?