Hi,
I started using Tesseract 4.0.0 (with LSTM) recently and it works amazingly well, much better than Affinity and Nuance, which were bundled with my ScanSnap scanner.
However, with certain documents, the digit “4” is occasionally recognized as a “9”. Example below.
I’ve read in the Wiki that you can fine-tune for certain characters.
Is this the way to go here, or is there an easier or better approach? At first glance, fine-tuning seemed a bit complicated to me, so before I wrap my head around it, I’d appreciate it if you could give me some guidance on whether this is the right thing to do here.
This is my command line:
tesseract "$1" "${1%.*}" -l best/deu --tessdata-dir "$DIR/tessdata" --psm 11 --oem 1 txt pdf
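In case the output base name is unclear: `${1%.*}` strips the last extension from the input path, so the txt and pdf results land next to the input file. A minimal sketch of just that expansion, where the file name is made up for illustration and is not one of my real scans:

```shell
# Hypothetical input path, used only to illustrate the expansion
input="scans/invoice.tif"

# ${var%.*} removes the shortest trailing match of ".*", i.e. the last extension
outbase="${input%.*}"

echo "$outbase"
```

With that output base, the `txt` and `pdf` configs at the end of the command produce `scans/invoice.txt` and `scans/invoice.pdf`.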
The documents are 1200 DPI (monochrome).
Thanks,
Aaron