Hi all,
I'm trying to set up tesseract to scan German documents. So far everything works just fine, except that tesseract won't recognize the character "§". This is rather frustrating, since the documents in question are mostly legal texts, where "§" is used a lot. It denotes an article or section and is not uncommon at all.
I tried to add it as a user-pattern or user-word, without success. I then looked through the files on GitHub in tesseract-ocr/langdata/tree/master/deu, and it seems the § is neither in the desired_characters file nor anywhere in deu.wordlist.
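In case anyone wants to repeat that check, here is a minimal Python sketch of how it could be scripted. It assumes a local clone of the langdata repository; the two paths and the helper name files_containing are only illustrative, not anything that ships with tesseract.

```python
from pathlib import Path


def files_containing(char, paths):
    """Return the paths (as strings) whose UTF-8 text contains `char`."""
    hits = []
    for p in paths:
        # errors="replace" keeps the scan going if a file has stray bytes
        text = Path(p).read_text(encoding="utf-8", errors="replace")
        if char in text:
            hits.append(str(p))
    return hits


if __name__ == "__main__":
    # Assumed locations inside a local clone of tesseract-ocr/langdata
    candidates = [
        "langdata/deu/desired_characters",
        "langdata/deu/deu.wordlist",
    ]
    existing = [p for p in candidates if Path(p).is_file()]
    print(files_containing("§", existing))
```

An empty list would match what I saw by eye, i.e. the character does not occur in either file.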
Does that mean that tesseract does not try to find a § in the documents at all? If so, is there a way to add the character to the language data without completely retraining tesseract? I'm not sure I could do a full training myself.
Thanks