Hi, I'm using Tesseract to recognize small fragments of text like these (actual images I'm using):



Numbers are fixed length (7 digits) and letters are always 2 uppercase characters. I'm using a whitelist (a different one depending on whether the fragment is text or digits, which I know in advance), and it works reasonably well. The size of these fragments is fixed; I rescale them to the same height (54 pixels, though I could change it or add some borders). They are extracted from smartphone pictures, so the original resolution varies a lot.
I'm using lang "eng+ita" because it gives better results that way. I'm also using user patterns, but they are not helping much. I access the API through the tesserocr Python bindings.
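For context, my per-fragment setup looks roughly like this (a simplified sketch, not my exact code; in batch mode I keep one PyTessBaseAPI instance alive across images instead of creating one per call):

```python
# Sketch of the per-fragment setup described above. The guarded import
# and the whitelist_for() helper are just for illustration; the API
# calls are tesserocr's (PyTessBaseAPI, SetVariable, GetUTF8Text, ...).
try:
    from tesserocr import PyTessBaseAPI, PSM  # pip install tesserocr
except ImportError:  # tesserocr/tesseract may not be installed
    PyTessBaseAPI = PSM = None

def whitelist_for(kind):
    """Pick the whitelist: fragments are 7 digits or 2 uppercase letters."""
    if kind == "digits":
        return "0123456789"
    return "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def recognize_fragment(path, kind):
    # Each fragment is a single line of text, hence PSM.SINGLE_LINE.
    with PyTessBaseAPI(lang="eng+ita", psm=PSM.SINGLE_LINE) as api:
        api.SetVariable("tessedit_char_whitelist", whitelist_for(kind))
        api.SetImageFile(path)
        return api.GetUTF8Text().strip(), api.AllWordConfidences()
```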
I think there are many parameters I could fine-tune, but the few I tried (load_system_dawg, load_freq_dawg, textord_min_linesize) did not improve the results (a very small textord_min_linesize=0.2 made them worse, so I know the values are being applied). I've read the FAQ and the docs, but there are really too many parameters to understand what to change and how.
In particular, my current problem is adaptive learning: when I process a large batch of pictures, the result for a fragment depends on the other fragments. Fragments that are perfectly readable and correctly classified when processed individually give different, wrong results when processed in a batch (I mean reusing the same API instance for multiple images). I tried to disable it, but it looks like it cannot be disabled when using multiple languages(?).
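This is what I tried, based on my reading of the tesserocr docs (as I understand it, init-only parameters like classify_enable_learning have to go through InitFull's variables dict rather than SetVariable; I may well be using it wrong):

```python
# My attempt to disable adaptive learning (no visible effect with
# "eng+ita"). classify_enable_learning is init-only, so I pass it at
# Init time via InitFull's variables dict, per the tesserocr docs.
try:
    from tesserocr import PyTessBaseAPI
except ImportError:  # tesserocr/tesseract may not be installed
    PyTessBaseAPI = None

DISABLE_LEARNING = {"classify_enable_learning": "0"}

def make_api(lang="eng+ita"):
    api = PyTessBaseAPI(init=False)  # defer init so we can use InitFull
    api.InitFull(lang=lang, variables=DISABLE_LEARNING)
    return api
```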
If I use only "ita" (and no whitelist, no learning) the first image in this post is recognized as (text [confidence]):
('5748788\n\n', [81])
('5748788\n\n', [81])
('5748788\n\n', [81])
('5748788\n\n', [81])
With learning (multiple calls, no whitelist, lang: ita):
('5748788\n\n', [81])
('5748788\n\n', [81])
('5748788\n\n', [90])
('5748788\n\n', [90])
so learning raises the confidence (I don't know how much the confidence value matters in practice). It looks like learning is doing something good even with no whitelist (I could still use the whitelist, just to be safe, but the starting point looks better).
I'm wondering if I can do some kind of "warmup" with learning enabled and then turn it off (I'll try this today). But how many samples would I need? It also seems a little hacky.
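I also noticed ClearAdaptiveClassifier() in the base API; if I understand it correctly, calling it between images should make each result independent of the rest of the batch, without re-initialising the API instance. Something like this sketch:

```python
# Sketch: one API instance for the whole batch, but the adaptive
# classifier is reset after every image, so each result should match
# the single-image run (assuming ClearAdaptiveClassifier does what
# its name suggests -- I haven't verified this yet).
try:
    from tesserocr import PyTessBaseAPI, PSM
except ImportError:  # tesserocr/tesseract may not be installed
    PyTessBaseAPI = PSM = None

def recognize_batch(paths):
    results = []
    with PyTessBaseAPI(lang="eng+ita", psm=PSM.SINGLE_LINE) as api:
        for path in paths:
            api.SetImageFile(path)
            results.append(api.GetUTF8Text().strip())
            api.ClearAdaptiveClassifier()  # forget per-batch adaptation
    return results
```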
Or maybe there is some way to print debug information from the learning part, to see which parameters change, and then set them manually later (I tried a few debug params but got no output).
Or maybe it is fairly easy to manually find good parameters for this kind of regular text and get close to 90 confidence.
On the "AT" fragment I get 89 confidence, which seems quite low for such simple, clean text.
What I need are (good) consistent results in all situations for the same image. What do you think?
Thanks, bye
Lorenzo