Can't encode transcription

Genet Gessessew

Sep 26, 2023, 5:25:40 AM

to tesseract-ocr

I am new to Tesseract and have tried to train a Tesseract model for the Amharic language. Once training starts like this, it never stops:

Can't encode transcription: 'ህ' in language '' Encoding of string failed! Failure bytes: ffffffe1 ffffff8d ffffffad
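A small diagnostic that may help narrow this down: the "Failure bytes" appear to be single bytes printed as sign-extended 32-bit values (ffffffe1 is the byte 0xe1), so masking each to its low 8 bits and decoding as UTF-8 shows which code point Tesseract actually choked on. This is a minimal sketch using only the Python standard library; the function names are hypothetical and not part of Tesseract or tesstrain.

```python
import unicodedata


def decode_failure_bytes(failure: str) -> str:
    """Decode the 'Failure bytes' hex dump from the error message.

    Each token looks like a sign-extended byte (e.g. 'ffffffe1'), so keep
    only the low 8 bits of each value before decoding the result as UTF-8.
    """
    raw = bytes(int(tok, 16) & 0xFF for tok in failure.split())
    return raw.decode("utf-8")


def describe(text: str) -> list[str]:
    # One "U+XXXX NAME" line per code point; unnamed ones get a placeholder.
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


if __name__ == "__main__":
    decoded = decode_failure_bytes("ffffffe1 ffffff8d ffffffad")
    for line in describe(decoded):
        print(line)
```

For the bytes in the message above, this decodes to the single code point U+136D, which can then be looked up in the ground-truth text and in the unicharset being trained against.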

Is anybody aware of this problem, and how can I fine-tune amh.traineddata? I have followed this tutorial: GitHub - livezingy/tesstrain-win: Train Tesseract LSTM with make on Windows

Des Bw

Sep 26, 2023, 11:35:38 AM

to tesseract-ocr

I am also training for Amharic.

I am pretty sure you are using Windows. I had exactly the same problem there. I think it is related to Unicode, but I was not able to solve the issue. I have now installed Ubuntu alongside Windows, and everything works fine.
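If the problem is Unicode-related on Windows, one plausible cause (an assumption, not confirmed in this thread) is ground-truth text files that were saved in a legacy code page or with a UTF-8 byte-order mark. The sketch below scans a directory of `*.gt.txt` files, reports files that are not clean UTF-8, and counts every non-whitespace character so unexpected ones stand out. The default path `data/amh-ground-truth` is a hypothetical tesstrain-style location; pass your own directory as the first argument.

```python
import sys
import unicodedata
from collections import Counter
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def check_ground_truth(gt_dir: Path) -> tuple[list[str], Counter]:
    """Return (problem reports, character counts) for all *.gt.txt files.

    Flags files that start with a UTF-8 BOM or that are not valid UTF-8.
    Files that fail to decode are skipped when counting characters.
    """
    problems: list[str] = []
    chars: Counter = Counter()
    for path in sorted(gt_dir.glob("*.gt.txt")):
        data = path.read_bytes()
        if data.startswith(UTF8_BOM):
            problems.append(f"{path.name}: starts with a UTF-8 BOM")
            data = data[len(UTF8_BOM):]
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError as exc:
            problems.append(
                f"{path.name}: not valid UTF-8 ({exc.reason} at byte {exc.start})"
            )
            continue
        chars.update(ch for ch in text if not ch.isspace())
    return problems, chars


if __name__ == "__main__":
    # Hypothetical default directory; override it on the command line.
    gt_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("data/amh-ground-truth")
    problems, chars = check_ground_truth(gt_dir)
    for line in problems:
        print(line)
    for ch, count in sorted(chars.items()):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')} x{count}")
```

The character list can then be compared by hand against the characters the start model was trained on; any character that appears only in your ground truth is a candidate for the "Can't encode transcription" message.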

Des Bw

Sep 26, 2023, 11:38:15 AM

to tesseract-ocr

Are you planning to fine-tune for a specific font, or do you want to improve the overall accuracy of the best model?