Albrecht Hilker
Jul 7, 2014, 8:55:37 PM
to tesser...@googlegroups.com
The manual "Training Tesseract 3" says:
> Tesseract needs to know about different shapes of the same character by having different fonts separated explicitly.
> This used to be limited to 32 fonts, but the limit has been raised to 64.
> It is set by the constant MAX_NUM_CONFIGS defined in intproto.h.
> Note that runtime is heavily dependent on the number of fonts provided, and training more than 32 will result in a significant slow-down.
I analyzed the number of fonts in eng.traineddata and was very surprised to find that 358 fonts have been trained!
get_fontinfo_table().size() returns 358!
Can anybody explain this contradiction to me?
Fonts in eng.traineddata:
AR_PL_UKai_CN,
AR_PL_UKai_Patched,
AR_PL_UKai_TW,
AR_PL_UMing_CN_Light,
AR_PL_UMing_Patched_Light,
AR_PL_UMing_TW_MBE_Light,
Aboriginal_Sans,
Aboriginal_Sans_Bold_Italic,
Aboriginal_Sans_Italic,
Aboriginal_Serif,
Aboriginal_Serif_Bold,
Aboriginal_Serif_Bold_Italic,
Aboriginal_Serif_Italic,
Abyssinica_SIL,
AlArabiya,
AlBattar,
AlHor,
AlManzomah,
AlMohanad,
Andale_Mono,
Ani,
AnjaliOldLipi,
Arab,
Arial,
Arial_Black,
Arial_Bold,
Arial_Bold_Italic,
Arial_Italic,
BPG_Chveulebrivi,
BPG_Chveulebrivi_Bold,
BPG_Courier,
BPG_Courier_Bold,
BPG_Elite,
BPG_Elite_Bold,
BPG_Glaho,
BPG_Glaho_Bold,
BPG_Rioni,
BPG_Rioni_Bold,
BPG_Unicode_Standard,
Baekmuk_Batang,
Baekmuk_Batang_Patched,
Baekmuk_Dotum,
Baekmuk_Gulim,
Baekmuk_Headline,
Bangla,
Bitstream_Vera_Sans,
Bitstream_Vera_Sans_Bold,
Bitstream_Vera_Sans_Bold_Oblique,
Bitstream_Vera_Sans_Mono,
Bitstream_Vera_Sans_Mono_Bold,
Bitstream_Vera_Sans_Mono_Bold_Oblique,
Bitstream_Vera_Sans_Mono_Oblique,
Bitstream_Vera_Sans_Mono_Roman,
Bitstream_Vera_Sans_Oblique,
Bitstream_Vera_Sans_Roman,
Bitstream_Vera_Serif,
Bitstream_Vera_Serif_Bold,
Bitstream_Vera_Serif_Roman,
CaslonishFraxx,
Century_Schoolbook_L,
Century_Schoolbook_L_Bold,
Century_Schoolbook_L_Bold_Italic,
Century_Schoolbook_L_Italic,
Century_Schoolbook_L_Roman,
Chandas,
Cloister_Black_Light,
Comic_Sans_MS,
Comic_Sans_MS_Bold,
Cortoba,
Courier_New,
Courier_New_Bold,
Courier_New_Bold_Italic,
Courier_New_Italic,
DejaVu_Sans,
DejaVu_Sans_Bold,
DejaVu_Sans_Bold_Oblique,
DejaVu_Sans_Condensed,
DejaVu_Sans_Condensed_Bold,
DejaVu_Sans_Condensed_Bold_Oblique,
DejaVu_Sans_Condensed_Oblique,
DejaVu_Sans_Mono,
DejaVu_Sans_Mono_Bold,
DejaVu_Sans_Mono_Bold_Oblique,
DejaVu_Sans_Mono_Oblique,
DejaVu_Sans_Oblique,
DejaVu_Sans_Ultra-Light,
DejaVu_Serif,
DejaVu_Serif_Bold,
DejaVu_Serif_Bold_Italic,
DejaVu_Serif_Bold_Oblique,
DejaVu_Serif_Bold_Semi-Condensed,
DejaVu_Serif_Condensed_Bold,
DejaVu_Serif_Condensed_Bold_Italic,
DejaVu_Serif_Condensed_Italic,
DejaVu_Serif_Italic,
DejaVu_Serif_Oblique,
DejaVu_Serif_Semi-Condensed,
Dimnah,
Dustismo,
Dustismo_Roman,
Dustismo_Roman_Bold,
Dustismo_Roman_Italic,
Dustismo_Roman_Italic_Bold,
Dyuthi,
East_Syriac_Adiabene,
East_Syriac_Ctesiphon,
Electron,
Estrangelo_Antioch,
Estrangelo_Edessa,
Estrangelo_Midyat,
Estrangelo_Nisibin,
Estrangelo_Quenneshrin,
Estrangelo_Talada,
Estrangelo_TurAbdin,
FreeMono,
FreeMono_Bold,
FreeMono_Bold_Italic,
FreeMono_Bold_Oblique,
FreeMono_Italic,
FreeMono_Oblique,
FreeSans,
FreeSans_Bold,
FreeSans_Bold_Oblique,
FreeSans_Oblique,
FreeSerif,
FreeSerif_Bold,
FreeSerif_Bold_Italic,
FreeSerif_Italic,
Furat,
Garuda,
Garuda_Bold,
Garuda_Bold_Oblique,
Garuda_Oblique,
GentiumAlt,
GentiumAlt_Italic,
Georgia,
Georgia_Bold,
Georgia_Bold_Italic,
Georgia_Italic,
Granada,
Graph,
Hani,
Haramain,
Hor,
IPAGothic,
IPAMincho,
IPAPGothic,
IPAPMincho,
IPAUIGothic,
Impact,
Impact_Condensed,
Jamrul,
Jamrul_Semi-Expanded,
Japan,
Jet,
Kalimati,
Kalyani,
Kayrawan,
Kedage,
Kedage_Bold,
Kedage_Bold_Italic,
Kedage_Italic,
Khalid,
Khmer_OS,
Khmer_OS_Battambang,
Khmer_OS_Bokor,
Khmer_OS_Content,
Khmer_OS_Fasthand,
Khmer_OS_Freehand,
Khmer_OS_Metal_Chrieng,
Khmer_OS_Muol,
Khmer_OS_Muol_Light,
Khmer_OS_Muol_Pali,
Khmer_OS_Siemreap,
Khmer_OS_System,
Kochi_Gothic,
Kochi_Mincho,
LKLUG,
Lateef,
Likhan,
Linux_Biolinum_O,
Linux_Biolinum_O_Bold,
Linux_Libertine_O,
Linux_Libertine_O_Bold,
Linux_Libertine_O_Bold_Italic,
Linux_Libertine_O_C,
Linux_Libertine_O_Italic,
Lohit_Assamese,
Lohit_Bengali,
Lohit_Gujarati,
Lohit_Hindi,
Lohit_Malayalam,
Lohit_Oriya,
Lohit_Punjabi,
Lohit_Tamil,
Lohit_Telugu,
Loma,
Loma_Bold,
Loma_Bold_Oblique,
Loma_Oblique,
Lucida_Bright,
Lucida_Bright_Italic,
Lucida_Bright_Semi-Bold,
Lucida_Bright_Semi-Bold_Italic,
Lucida_Sans,
Lucida_Sans_Oblique,
Lucida_Sans_Semi-Bold,
Lucida_Sans_Semi-Bold_Oblique,
Lucida_Sans_Typewriter,
Lucida_Sans_Typewriter_Bold,
Lucida_Sans_Typewriter_Bold_Oblique,
Mallige,
Mallige_Bold,
Mallige_Bold_Italic,
Mallige_Italic,
Mashq,
Meera,
Metal,
Mitra_Mono,
Monapo,
Mukti_Narrow,
Mukti_Narrow_Bold,
Nada,
Nagham,
Nice,
Norasi,
Norasi_Bold,
Norasi_Bold_Italic,
Norasi_Bold_Oblique,
Norasi_Italic,
Norasi_Oblique,
OpenSymbol,
Ostorah,
Padauk,
Padauk_Bold,
Petra,
Phetsarath_OT,
Pothana2000,
Proclamate_Light,
Purisa_Light,
Rachana,
Rachana_w01,
RaghuMalayalam,
Rehan,
Rekha,
Saab,
Salem,
Samanata,
Samyak_Gujarati,
Samyak_Oriya,
Sazanami_Gothic,
Sazanami_Mincho,
Scheherazade,
Serto_Batnan,
Serto_Batnan_Bold,
Serto_Jerusalem,
Serto_Jerusalem_Bold,
Serto_Jerusalem_Italic,
Serto_Kharput,
Serto_Malankara,
Serto_Mardin,
Serto_Mardin_Bold,
Serto_Urhoy,
Serto_Urhoy_Bold,
Shado,
Sharjah,
TAMu_Kadambri,
TAMu_Kalyani,
TAMu_Maduram,
TSCu_Comic,
TSCu_Paranar,
TSCu_Paranar_Bold,
TSCu_Paranar_Italic,
TSCu_Times,
TakaoExGothic,
TakaoExMincho,
TakaoGothic,
TakaoMincho,
TakaoPGothic,
TakaoPMincho,
Tarablus,
Tholoth,
Tibetan_Machine_Uni,
Times_New_Roman,
Times_New_Roman_Bold,
Times_New_Roman_Bold_Italic,
Times_New_Roman_Italic,
TlwgMono,
TlwgMono_Bold,
TlwgMono_Bold_Oblique,
TlwgMono_Oblique,
TlwgTypewriter,
TlwgTypewriter_Bold,
TlwgTypewriter_Bold_Oblique,
TlwgTypewriter_Oblique,
Trebuchet_MS,
Trebuchet_MS_Bold,
Trebuchet_MS_Bold_Italic,
Trebuchet_MS_Italic,
URW_Bookman_L,
URW_Bookman_L_Bold,
URW_Bookman_L_Bold_Italic,
URW_Bookman_L_Italic,
URW_Bookman_L_Light_Italic,
UmePlus_Gothic,
UmePlus_P_Gothic,
UnBatang,
UnBatang_Bold,
UnDotum,
UnDotum_Bold,
UnifrakturMaguntia,
Unikurd_Web,
Uttara,
VL_Gothic,
VL_PGothic,
Vemana2000,
Verdana,
Verdana_Bold,
Verdana_Bold_Italic,
Verdana_Italic,
Walbaum-Fraktur,
Webdings,
WenQuanYi_Zen_Hei,
Wyld,
Wyld_Italic,
aakar,
batang,
chandas1-1,
chandas1-2,
cheluvi,
dotum,
gargi,
gulim,
hline,
ipag,
ipagp,
ipagui,
ipam,
ipamp,
kalimati,
kochi-gothic,
kochi-gothic-subst,
kochi-mincho,
kochi-mincho-subst,
lklug,
lohit_bn,
lohit_gu,
lohit_hi,
lohit_ml,
lohit_or,
lohit_pa,
lohit_ta,
lohit_te,
monapo,
ori1Uni,
padmaa,
padmaa_Bold,
suruma
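
For anyone who wants to repeat the comparison, here is a minimal Python sketch of how a font dump like the one above could be counted and checked against the limit quoted from the manual. It is only an illustration: the helper names `parse_font_names` and `exceeds_documented_limit` are made up for this post, the value 64 is simply taken from the manual quote at face value, and the short `sample` string stands in for the full list.

```python
# Limit quoted from the manual (MAX_NUM_CONFIGS in intproto.h), taken at face value.
DOCUMENTED_FONT_LIMIT = 64


def parse_font_names(dump):
    """Split a comma/newline separated dump into a list of non-empty font names."""
    parts = dump.replace("\n", ",").split(",")
    return [part.strip() for part in parts if part.strip()]


def exceeds_documented_limit(names, limit=DOCUMENTED_FONT_LIMIT):
    """Return True if the number of distinct font names is above the limit."""
    return len(set(names)) > limit


# Hypothetical short excerpt; paste the full list here to check the real count.
sample = """AR_PL_UKai_CN,
Arial,
Arial_Bold,
Verdana,
"""

names = parse_font_names(sample)
print(len(names), "fonts; above documented limit:", exceeds_documented_limit(names))
```

Run on the complete list, the count should match what get_fontinfo_table().size() reports, which makes it easy to confirm that the number really is far above 64.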