Trying to combine files to form a single traineddata; getting errors in the output


Marco Vong

Apr 15, 2017, 2:59:11 AM
to tesseract-ocr
I'm trying to train Tesseract with handwritten fonts. For convenience, I want to combine all 10 fonts under one name.
However, after following all the instructions, when I run tesseract with -l num1, errors occur and the output txt is blank.

Error: unichar 8 in normproto file is not in unichar set.
Error: unichar q in normproto file is not in unichar set.
Error: unichar 1 in normproto file is not in unichar set.
Error: unichar + in normproto file is not in unichar set.
Error: unichar I in normproto file is not in unichar set.
Error: unichar L in normproto file is not in unichar set.
Error: unichar i in normproto file is not in unichar set.
Error: unichar J in normproto file is not in unichar set.
Error: unichar 2 in normproto file is not in unichar set.
Error: unichar 3 in normproto file is not in unichar set.
Error: unichar 5 in normproto file is not in unichar set.
Error: unichar 4 in normproto file is not in unichar set.
Error: unichar ¢ in normproto file is not in unichar set.
Error: unichar é in normproto file is not in unichar set.
Error: unichar S in normproto file is not in unichar set.
Error: unichar 6 in normproto file is not in unichar set.
Error: unichar 7 in normproto file is not in unichar set.
Error: unichar B in normproto file is not in unichar set.
Error: unichar 9 in normproto file is not in unichar set.
Error: unichar fi in normproto file is not in unichar set.
Error: unichar a in normproto file is not in unichar set.
Error: unichar ? in normproto file is not in unichar set.

Is there a way I can solve this problem? These are my box files:
num1.font0.exp0.box
num1.font9.exp0.box
num1.font1.exp0.box
num1.font2.exp0.box
num1.font3.exp0.box
num1.font4.exp0.box
num1.font5.exp0.box
num1.font6.exp0.box
num1.font7.exp0.box
num1.font8.exp0.box
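One way to narrow this down is to check whether every character that appears in the box files also appears in the unicharset used for training. The Python sketch below is a hypothetical diagnostic, not part of Tesseract: it assumes the Tesseract 3.x box format (unichar, then left, bottom, right, top, page, separated by spaces) and a unicharset file whose first line is the entry count, with each entry line starting with the unichar ("NULL" standing for the space). The helper names and the default paths "unicharset" and "num1.font*.exp0.box" are assumptions; adjust them to your setup.

```python
import glob
import sys


def box_chars(lines):
    """Return the set of unichars named in a Tesseract 3.x box file.

    Each box line is assumed to be "<unichar> <left> <bottom> <right> <top> <page>";
    rsplit keeps multi-character unichars such as "fi" intact.
    """
    chars = set()
    for line in lines:
        parts = line.rstrip("\r\n").rsplit(" ", 5)
        if len(parts) == 6:
            chars.add(parts[0])
    return chars


def unicharset_chars(lines):
    """Return the set of unichars listed in a unicharset file.

    Assumes the first line is the entry count and each entry line starts with
    the unichar followed by a space; "NULL" is treated as the space character.
    """
    it = iter(lines)
    count = int(next(it).strip())
    chars = set()
    for _, line in zip(range(count), it):
        token = line.rstrip("\r\n").split(" ", 1)[0]
        chars.add(" " if token == "NULL" else token)
    return chars


def missing_from_unicharset(box_line_groups, unicharset_lines):
    """Sorted unichars that occur in any box file but not in the unicharset."""
    boxed = set()
    for lines in box_line_groups:
        boxed |= box_chars(lines)
    return sorted(boxed - unicharset_chars(unicharset_lines))


def main(unicharset_path="unicharset", pattern="num1.font*.exp0.box"):
    # Hypothetical default paths; pass your own as command-line arguments.
    box_paths = sorted(glob.glob(pattern))
    groups = []
    for path in box_paths:
        with open(path, encoding="utf-8") as f:
            groups.append(f.readlines())
    with open(unicharset_path, encoding="utf-8") as f:
        missing = missing_from_unicharset(groups, f.readlines())
    print(f"checked {len(box_paths)} box file(s) against {unicharset_path}")
    for ch in missing:
        print(f"missing from unicharset: {ch!r}")


if __name__ == "__main__":
    main(*sys.argv[1:3])
```

If this reports the same characters as the normproto errors, the unicharset was most likely not built from all ten box files (or a different unicharset ended up in the combined traineddata).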

Alain Ghawi

Apr 21, 2017, 2:43:20 AM
to tesseract-ocr
Hello,

I opened several of your box files, and we can clearly see that each one contains the same letter! For example, in the first box file every entry starts with 0, and in the second box file every entry starts with 1. Therefore, I think there is a problem with your box files. Secondly, the error says that characters in the normproto file are not in the unicharset. Did you run unicharset_extractor properly, making sure the glyph metrics are right?
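To check the "same letter in every box" observation across all ten files at once, a short script can count how many boxes carry each label and flag files where every box has the same one. This is a hypothetical sketch, not a Tesseract tool: it assumes the Tesseract 3.x box line format (unichar, left, bottom, right, top, page), and the function names and the "num1.font*.exp0.box" pattern are illustrative.

```python
import glob
import sys
from collections import Counter


def box_char_counts(lines):
    """Count how many boxes each unichar has in one Tesseract 3.x box file.

    Box lines are assumed to be "<unichar> <left> <bottom> <right> <top> <page>".
    """
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\r\n").rsplit(" ", 5)
        if len(parts) == 6:
            counts[parts[0]] += 1
    return counts


def looks_like_single_letter(counts):
    """True if a file with more than one box labels every box with the same unichar."""
    return sum(counts.values()) > 1 and len(counts) == 1


def main(pattern="num1.font*.exp0.box"):
    # Hypothetical default pattern; pass your own as a command-line argument.
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            counts = box_char_counts(f)
        flag = "  <-- every box has the same label" if looks_like_single_letter(counts) else ""
        top = ", ".join(f"{ch!r}x{n}" for ch, n in counts.most_common(5))
        print(f"{path}: {sum(counts.values())} boxes, {len(counts)} distinct [{top}]{flag}")


if __name__ == "__main__":
    main(*sys.argv[1:2])
```

A flagged file is not necessarily wrong (a training page may genuinely contain one glyph), but if the page image shows many different characters, the box labels need correcting before the unicharset and prototypes are regenerated.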