So far, everything is OK. I can see that the eng.strangelabelmachinefont.exp0.box file was created, with this content:
8
NULL 0 Common 0
Joined 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a
|Broken|0|1 15 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken
1 8 0,255,0,255,0,0,0,0,0,0 Common 3 2 3 1 # 1 [31 ]0
8 8 0,255,0,255,0,0,0,0,0,0 Common 4 2 4 8 # 8 [38 ]0
3 8 0,255,0,255,0,0,0,0,0,0 Common 5 2 5 3 # 3 [33 ]0
5 8 0,255,0,255,0,0,0,0,0,0 Common 6 2 6 5 # 5 [35 ]0
0 8 0,255,0,255,0,0,0,0,0,0 Common 7 2 7 0 # 0 [30 ]0
Reading unicharset ...
Bad format in tr file, reading fontname, unichar
Bad box coordinates in boxfile string! 0 Common 0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 15 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 3 2 3 1 # 1 [31 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 4 2 4 8 # 8 [38 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 5 2 5 3 # 3 [33 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 6 2 6 5 # 5 [35 ]0
Bad format in tr file, reading box coords
Bad box coordinates in boxfile string! 8 0,255,0,255,0,0,0,0,0,0 Common 7 2 7 0 # 0 [30 ]0
Bad format in tr file, reading box coords
Building master shape table
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0 1 2 3 4
Stopped with 0 merged, min dist 0.263473
Master shape_table:Number of shapes = 5 max unichars = 1 number with multiple unichars = 0
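The "Bad box coordinates" messages above suggest the parser is being handed lines that do not follow the box layout, which as far as I understand is `<symbol> <left> <bottom> <right> <top> <page>`. A small sanity check like the sketch below could show which lines of a file deviate from that layout. The helper names `is_box_line` and `find_bad_lines` are made up for this example and are not part of Tesseract, and the check assumes the symbol itself contains no whitespace.

```python
def is_box_line(line):
    """Return True if the line looks like '<symbol> <left> <bottom> <right> <top> <page>'."""
    parts = line.split()
    if len(parts) != 6:
        return False
    try:
        # Everything after the symbol must be an integer.
        left, bottom, right, top, page = (int(p) for p in parts[1:])
    except ValueError:
        return False
    # Coordinates must describe a non-inverted rectangle on a valid page.
    return 0 <= left <= right and 0 <= bottom <= top and page >= 0


def find_bad_lines(lines):
    """Return (line_number, text) pairs for non-empty lines that do not match the box layout."""
    return [(i, ln) for i, ln in enumerate(lines, start=1)
            if ln.strip() and not is_box_line(ln)]


if __name__ == "__main__":
    sample = ["8", "NULL 0 Common 0", "5 12 30 40 58 0"]
    for number, text in find_bad_lines(sample):
        print(f"line {number} is not in box format: {text!r}")
```

Run against the content listed earlier, every line would be flagged, since those lines look like unicharset entries rather than box entries.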
pytesseract.image_to_string(Image.open(image_path), lang="eng", config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
But my goal is to create a dataset that recognizes digits in situations like this one, for example:
I also tried several algorithms to remove these horizontal lines, but the results were no better, so it seems better to create a custom dataset.
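One simple approach of this kind is run-length filtering: on a binarized image, any horizontal run of ink pixels longer than a digit stroke could plausibly be is treated as part of a line and erased. The sketch below is a pure-Python illustration under that assumption. The name `remove_horizontal_lines` and the `min_run` threshold are hypothetical, and the image is assumed to be a list of rows containing 0 (background) and 1 (ink).

```python
def remove_horizontal_lines(image, min_run):
    """Return a copy of a binary image with horizontal ink runs of length >= min_run cleared."""
    cleaned = [row[:] for row in image]  # do not modify the caller's image
    for row in cleaned:
        width = len(row)
        x = 0
        while x < width:
            if row[x] == 1:
                start = x
                # Walk to the end of the current run of ink pixels.
                while x < width and row[x] == 1:
                    x += 1
                # Long runs are assumed to belong to a line, not a digit.
                if x - start >= min_run:
                    for i in range(start, x):
                        row[i] = 0
            else:
                x += 1
    return cleaned


if __name__ == "__main__":
    image = [
        [0, 1, 0, 0, 0],
        [1, 1, 1, 1, 1],  # horizontal line crossing a vertical stroke
        [0, 1, 0, 0, 0],
    ]
    for row in remove_horizontal_lines(image, min_run=4):
        print(row)
```

Note that the vertical stroke loses its pixel where the line crossed it, which is the kind of damage to the digits that makes this approach unreliable.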
Does anyone have any suggestions? Is this a problem with my version of Tesseract, or do I have to do something manually with the unicharset file?
Thanks.