I'm training Tesseract on Windows for a new font, and everything went well until the set_unicharset_properties step:
set_unicharset_properties -U .\unicharset -O .\unicharset2 -F "C:\Windows\Fonts\Roman.tff" --script_dir='C:\Program Files (x86)\Tesseract-OCR\training'
Loaded unicharset of size 7 from file .\unicharset
Setting unichar properties
Other case c of C is not in unicharset
Other case f of F is not in unicharset
Setting script properties
Failed to load script unicharset from:C:\Program Files (x86)\Tesseract-OCR\training/Latin.unicharset
Warning: properties incomplete for index 3 = C
Warning: properties incomplete for index 4 = 0
Warning: properties incomplete for index 5 = 1
Warning: properties incomplete for index 6 = F
Writing unicharset to file .\unicharset2
I've verified that Latin.unicharset is in the right directory.
I'm fairly sure the problem is at the end of this line:
Failed to load script unicharset from:C:\Program Files (x86)\Tesseract-OCR\training/Latin.unicharset
The training tool appends a "/" instead of a "\".
I looked at unicharset_training_utils.cpp: at line 166, the "/" is added regardless of whether the tool is running on Windows or Linux.
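Windows file APIs generally accept "/" as a path separator alongside "\", so a mixed path like this one may not be what makes the load fail. Here is a minimal Python sketch for checking that. The helper names are hypothetical, and the concatenation (directory + "/" + script name + ".unicharset") is inferred from the error message, not copied from unicharset_training_utils.cpp:

```python
import os
from pathlib import PureWindowsPath


def script_unicharset_path(script_dir: str, script_name: str) -> str:
    # Hypothetical helper: rebuilds the path the way the error message suggests,
    # i.e. directory + "/" + script name + ".unicharset".
    return script_dir + "/" + script_name + ".unicharset"


def normalized_windows_path(path: str) -> str:
    # PureWindowsPath treats "/" and "\" as equivalent separators,
    # so the mixed-separator path comes back with backslashes only.
    return str(PureWindowsPath(path))


if __name__ == "__main__":
    raw = script_unicharset_path(r"C:\Program Files (x86)\Tesseract-OCR\training", "Latin")
    print(raw)
    print(normalized_windows_path(raw))
    # Only meaningful on the Windows machine running the training tools:
    # reports whether the mixed-separator path resolves to an existing file.
    print("exists:", os.path.isfile(raw))
```

If this prints `exists: True` on the Windows machine, the "/" is probably not what stops the file from loading, and the cause likely lies elsewhere.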
Is there a way on Windows to make it load Latin.unicharset despite this "/"?
If not, what is the easiest workaround?
For reference, my unicharset2 file looks like this:
7
NULL 0 Common 0
Joined 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a
|Broken|0|1 f 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken
C 5 0,255,0,255,0,0,0,0,0,0 Latin 3 0 3 C # C [43 ]A
0 8 0,255,0,255,0,0,0,0,0,0 Common 4 2 4 0 # 0 [30 ]0
...
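When reading the listing above, note that the second column of each entry is a hexadecimal bitmask of character properties. Below is a small Python sketch that decodes it. It assumes the bit layout from Tesseract's unicharset.cpp (alpha 0x1, lower 0x2, upper 0x4, digit 0x8, punctuation 0x10), and the function names are hypothetical:

```python
from typing import List, Tuple

# Property bit masks for the second column of a unicharset entry.
# ASSUMPTION: values follow Tesseract's unicharset.cpp (ISALPHA_MASK etc.).
PROPERTY_BITS = [
    (0x1, "alpha"),
    (0x2, "lower"),
    (0x4, "upper"),
    (0x8, "digit"),
    (0x10, "punctuation"),
]


def decode_properties(hex_field: str) -> List[str]:
    """Turn the hexadecimal properties field (e.g. "5") into property names."""
    value = int(hex_field, 16)
    return [name for mask, name in PROPERTY_BITS if value & mask]


def describe_line(line: str) -> Tuple[str, List[str]]:
    """Return (unichar, properties) for one entry line of a unicharset file."""
    fields = line.split()
    return fields[0], decode_properties(fields[1])


if __name__ == "__main__":
    sample = [
        "NULL 0 Common 0",
        "C 5 0,255,0,255,0,0,0,0,0,0 Latin 3 0 3 C # C [43 ]A",
        "0 8 0,255,0,255,0,0,0,0,0,0 Common 4 2 4 0 # 0 [30 ]0",
    ]
    for entry in sample:
        unichar, props = describe_line(entry)
        print(unichar, props)
```

With that layout, "C" (5) decodes to alpha + upper and "0" (8) to digit.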