After 3 days in the Tesseract code (urgh), here is Tessnet2 version 2.03.2.
The corrections deal with the following problems:
- Confidence was not very useful; the value was strange. This has been
corrected by setting the variable tessedit_write_ratings=true. After
many tests I found this mode gives the most accurate confidence. Values
range from 0 (perfect) to 255 (reject). When the value goes over 160, it
really means the OCR was bad.
- Calling DoOCR twice did not give the same result. It was, as
expected, a problem with global variables. The problem is almost
fixed; sometimes it still doesn’t work, but right now I can’t find what is not
- Tesseract variables are now exposed, along with a GetVariableList()
method. Interesting variables include tessedit_char_whitelist and
tessedit_char_blacklist, to be set before calling Tessnet2.Init().
- The misspelled Width variable of Word (thanks Lothar) has been corrected.
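
To make the confidence scale from the first item concrete, here is a minimal sketch in Python. It is not Tessnet2 code (Tessnet2 is a .NET library), and the names is_reliable, split_words and BAD_THRESHOLD are hypothetical. The (text, confidence) pairs simply stand in for whatever word list your OCR call gives you.

```python
from typing import Iterable, List, Tuple

PERFECT = 0          # best value on the scale described above
REJECT = 255         # worst value on the scale described above
BAD_THRESHOLD = 160  # above this, the OCR result is considered bad


def is_reliable(confidence: int, threshold: int = BAD_THRESHOLD) -> bool:
    """Return True when the confidence is at or below the threshold.

    Lower is better on this scale: 0 is perfect, 255 is a reject.
    """
    if not PERFECT <= confidence <= REJECT:
        raise ValueError(f"confidence {confidence} is outside 0..255")
    return confidence <= threshold


def split_words(
    words: Iterable[Tuple[str, int]],
) -> Tuple[List[str], List[str]]:
    """Split (text, confidence) pairs into reliable and doubtful words."""
    good: List[str] = []
    bad: List[str] = []
    for text, confidence in words:
        # Route each word by its confidence value.
        (good if is_reliable(confidence) else bad).append(text)
    return good, bad
```

For example, split_words([("Hello", 12), ("w0rld", 201)]) puts "Hello" in the reliable list and "w0rld" in the doubtful one. The 160 cut-off is only the rule of thumb given above, so you may want to tune it for your own images.
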
I didn’t implement a character array with confidence info, simply
because all characters in a word have the same confidence value.
Internally, Tesseract builds words and creates characters from these