Myanmar.unicharset file in langdata repo is incomplete.


Pndaza

Jun 22, 2019, 8:03:53 AM
to tesseract-ocr
When running combine_lang_model, the following warnings occur.

Warning: properties incomplete for index 6 = ိ
Warning: properties incomplete for index 11 = ်
Warning: properties incomplete for index 22 = ြ
Warning: properties incomplete for index 39 = ွ
Warning: properties incomplete for index 42 = ့
Warning: properties incomplete for index 45 = ဲ
Warning: properties incomplete for index 57 = ှ
Warning: properties incomplete for index 58 = ံ
Warning: properties incomplete for index 64 = ီ
Warning: properties incomplete for index 84 = ဩ
Warning: properties incomplete for index 105 = ဪ

I checked Myanmar.unicharset: ြ, ဩ and ဪ are missing.
I added an entry for ဩ and its warning disappeared.
The other characters are present in Myanmar.unicharset.

The following comment is from unicharset.h:
// Returns true if any of the top/bottom/width/bearing/advance ranges/stats is empty.

ိ 0 58,59,255,255,211,215,0,0,0,0 Myanmar 44 17 44 ိ # ိ [102d ]
ီ 0 58,59,255,255,211,215,0,0,0,0 Myanmar 45 17 45 ီ # ီ [102e ]
ဲ 0 58,59,255,255,211,215,0,0,0,0 Myanmar 49 17 49 ဲ # ဲ [1032 ]
ံ 0 58,59,255,255,211,215,0,0,0,0 Myanmar 50 17 50 ံ # ံ [1036 ]
့ 0 0,0,255,255,211,215,0,0,0,0 Myanmar 51 17 51 ့ # ့ [1037 ]
 
The bearing and advance values for those are zero,
so I looked at Devanagari.unicharset:

 ं 0 62,76,194,242,81,178,0,27,0,77 Devanagari 2 17 2 ं # ं [902 ]

and I changed the entry for ိ to:
ိ 0 58,59,255,255,211,215,1,27,1,27 Myanmar 44 17 44 ိ # ိ [102d ]
No more warning.

  • min_bearing, max_bearing: how far from the usual start position does the leftmost part of the character begin.

  • min_advance, max_advance: how far from the printer’s cell left do we advance to begin the next character.
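To see which entries share the all-zero pattern shown above, here is a minimal Python sketch that splits the ten comma-separated numbers in a unicharset line and checks the last four. It is only an illustration, not the check Tesseract itself performs. The names `GlyphMetrics`, `parse_metrics` and `bearing_and_advance_are_zero` are hypothetical, and the names of the first six fields are an assumption; only the bearing and advance names come from the documentation quoted above.

```python
from typing import NamedTuple


class GlyphMetrics(NamedTuple):
    # Names of the first six fields are assumed for illustration;
    # bearing and advance follow the documentation quoted in this post.
    min_bottom: float
    max_bottom: float
    min_top: float
    max_top: float
    min_width: float
    max_width: float
    min_bearing: float
    max_bearing: float
    min_advance: float
    max_advance: float


def parse_metrics(line: str) -> GlyphMetrics:
    """Parse the metrics column (third whitespace-separated field) of one line."""
    fields = line.split()
    values = [float(v) for v in fields[2].split(",")]
    if len(values) != 10:
        raise ValueError(f"expected 10 metric values, got {len(values)}")
    return GlyphMetrics(*values)


def bearing_and_advance_are_zero(m: GlyphMetrics) -> bool:
    """True when all four bearing/advance values are zero, as in the lines above."""
    return (m.min_bearing, m.max_bearing, m.min_advance, m.max_advance) == (0, 0, 0, 0)


if __name__ == "__main__":
    original = "\u102d 0 58,59,255,255,211,215,0,0,0,0 Myanmar 44 17 44 \u102d"
    edited = "\u102d 0 58,59,255,255,211,215,1,27,1,27 Myanmar 44 17 44 \u102d"
    for label, line in (("original", original), ("edited", edited)):
        print(label, bearing_and_advance_are_zero(parse_metrics(line)))
```

Running it over every line of Myanmar.unicharset (skipping the first line, which holds the entry count) would list the characters that still have zero bearing and advance.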

I read the documentation, but I can't understand how to calculate these values.

Please help me.





Bûn-lī Tshuà

Aug 16, 2019, 4:54:47 AM
to tesseract-ocr
According to #2481, the glyph metrics aren't used in LSTM training.

You can just ignore the warnings.



On Saturday, June 22, 2019 at 8:03:53 PM UTC+8, Pndaza wrote: