Help setting params to distinguish close-together characters

Adriel Matei

Jun 25, 2024, 1:46:07 PM
to tesseract-ocr
Hi!

I am trying to use tesseract to programmatically read numbers. The program works fine, except that it frequently makes mistakes on the `74` sequence: it misses the `7` entirely. I think this is caused by the font placing the two characters extremely close together (see the attachment for reference).

Are there any parameters I could set to improve the situation? I am already using a character whitelist, and have tried with the numeric mode turned both on and off.
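For reference, the whitelist and page-segmentation settings mentioned above can be passed directly to the `tesseract` command line. The sketch below is Python calling that CLI through `subprocess`. The image name comes from the attachment. `--psm 7` (treat the image as a single text line) and the digits-only whitelist are assumptions, because the post doesn't show the actual code. `classify_bln_numeric_mode` is my guess at the parameter behind "numeric mode". It belongs to the legacy classifier, so with the default LSTM engine it may have no effect.

```python
import os
import shutil
import subprocess


def build_tesseract_cmd(image_path, psm=7, whitelist="0123456789", extra_params=None):
    """Build a tesseract CLI call that prints the recognized text to stdout.

    psm=7 ("single text line") and the digits-only whitelist are assumptions
    about the score image, not settings confirmed in the thread.
    """
    cmd = ["tesseract", image_path, "stdout", "--psm", str(psm)]
    params = {"tessedit_char_whitelist": whitelist}
    if extra_params:
        params.update(extra_params)
    # Each parameter is passed as its own "-c name=value" pair.
    for name, value in params.items():
        cmd += ["-c", f"{name}={value}"]
    return cmd


def read_digits(image_path, **kwargs):
    """Run tesseract and return its output with surrounding whitespace removed."""
    result = subprocess.run(
        build_tesseract_cmd(image_path, **kwargs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__" and shutil.which("tesseract") and os.path.exists("score.png"):
    print(read_digits("score.png"))
    # Same call with the legacy numeric-mode flag on (may be ignored by the LSTM engine).
    print(read_digits("score.png", extra_params={"classify_bln_numeric_mode": 1}))
```

Comparing the two printed lines on an image that contains `74` is a quick way to check whether either setting changes anything.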

Thanks in advance!
score.png

Tom Morris

Jun 26, 2024, 2:00:29 PM
to tesseract-ocr
Dealing with a font that is so heavily kerned (kerning is the technical term for adjusting the spacing between particular pairs of characters) is going to be difficult, I suspect.

One possibility might be to train the 74 combination as, effectively, a ligature and recognize it as a single symbol, but I have no idea a) whether you can invest this level of effort or b) whether it will work.

Tom

P.S. There are also a number of kerning-related parameters, but I've never played with them:

$ tesseract --print-parameters | grep -E "kern|kn"
tosp_redo_kern_limit 10 No.samples reqd to reestimate for row
tosp_old_to_constrain_sp_kn 0 Constrain relative values of inter and intra-word gaps for old_to_method.
tosp_only_small_gaps_for_kern 0 Better guess
tosp_fuzzy_limit_all 1 Don't restrict kn->sp fuzzy limit to tables
tosp_rule_9_test_punct 0 Don't chng kn to space next to punct
tosp_flip_fuzz_kn_to_sp 1 Default flip
tosp_flip_fuzz_sp_to_kn 1 Default flip
tosp_old_sp_kn_th_factor 2 Factor for defining space threshold in terms of space and kern sizes
tosp_threshold_bias1 0 how far between kern and space?
tosp_threshold_bias2 0 how far between kern and space?
tosp_gap_factor 0.83 gap ratio to flip sp->kern
tosp_kern_gap_factor1 2 gap ratio to flip kern->sp
tosp_kern_gap_factor2 1.3 gap ratio to flip kern->sp
tosp_kern_gap_factor3 2.5 gap ratio to flip kern->sp
tosp_enough_small_gaps 0.65 Fract of kerns reqd for isolated row stats
tosp_table_kn_sp_ratio 2.25 Min difference of kn & sp in table
tosp_table_fuzzy_kn_sp_ratio 3 Fuzzy if less than this
tosp_fuzzy_kn_fraction 0.5 New fuzzy kn alg
tosp_min_sane_kn_sp 1.5 Don't trust spaces less than this time kn
tosp_init_guess_kn_mult 2.2 Thresh guess - mult kn by this
tosp_max_sane_kn_thresh 5 Multiplier on kn to limit thresh
tosp_flip_caution 0 Don't autoflip kn to sp when large separation
tosp_large_kerning 0.19 Limit use of xht gap with large kns
tosp_dont_fool_with_small_kerns -1 Limit use of xht gap with odd small kns
tosp_silly_kn_sp_gap 0.2 Don't let sp minus kn get too small
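To experiment with these, each one can be overridden with `-c name=value`. The sketch below sweeps one of them (`tosp_kern_gap_factor1`, picked arbitrarily) over a few guessed values and reports which runs reproduce a known expected string. Judging by the descriptions above, the `tosp_` parameters mostly decide whether a gap counts as a kern or a word space, so they may well not bring back a dropped `7`. Treat this as a way to test that, not as a fix. The value list, `--psm 7`, and the expected text `"74"` are all placeholders.

```python
import os
import shutil
import subprocess

# Illustrative choice of parameter and values; not tuned or recommended settings.
PARAM = "tosp_kern_gap_factor1"
VALUES = [1.0, 1.5, 2.0, 3.0, 4.0]


def sweep_cmds(image_path, param, values, whitelist="0123456789"):
    """Return (value, command) pairs, one tesseract call per candidate value."""
    base = ["tesseract", image_path, "stdout", "--psm", "7",
            "-c", f"tessedit_char_whitelist={whitelist}"]
    return [(v, base + ["-c", f"{param}={v}"]) for v in values]


def run_sweep(image_path, param, values, expected):
    """Run every command and print whether its output equals `expected`."""
    for value, cmd in sweep_cmds(image_path, param, values):
        out = subprocess.run(cmd, capture_output=True, text=True).stdout.strip()
        status = "match" if out == expected else "mismatch"
        print(f"{param}={value}: {out!r} ({status})")


if __name__ == "__main__" and shutil.which("tesseract") and os.path.exists("score.png"):
    # Hypothetical expected value: replace with the true text of the test image.
    run_sweep("score.png", PARAM, VALUES, expected="74")
```

The same loop works for any other parameter in the list; only `PARAM` and `VALUES` change.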