Hi all,
Fast forward a couple of weeks, and I remember that I hate programming GUIs, and hate coordinating backend processes even more.
Also, I'm not a Python programmer by trade (though I do love it for scripting), so I need some help.
First, you need Tesseract installed; it performs the actual OCR and writes the hOCR output.
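Because everything depends on Tesseract, a quick check at startup can fail with a clear message instead of a confusing subprocess error later. This is a minimal sketch: `find_tesseract` and `tesseract_version` are hypothetical helper names, while `shutil.which` and `tesseract --version` are real.

```python
import shutil
import subprocess


def find_tesseract():
    """Return the path to the tesseract executable, or None if it isn't on PATH."""
    return shutil.which("tesseract")


def tesseract_version():
    """Return the first line of `tesseract --version`, or raise if Tesseract is missing."""
    exe = find_tesseract()
    if exe is None:
        raise RuntimeError("tesseract not found on PATH; install it first")
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True)
    # Some older Tesseract builds print the version to stderr instead of stdout.
    return (result.stdout or result.stderr).splitlines()[0]
```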
Running hocreditor.py opens a small window where you choose a PDF file and an output directory. Clicking the Run Hocr button does work: it creates a new directory inside the chosen output directory and writes a PNG image and an hOCR file for each page.
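For reference, the per-page pipeline described above can be sketched as two external commands. The post doesn't say how hocreditor.py renders PDF pages, so this assumes Poppler's `pdftoppm` (`-png -r DPI input.pdf prefix` writes `prefix-N.png` per page); `tesseract IMAGE OUTBASE hocr` is Tesseract's own hOCR mode and writes `OUTBASE.hocr` in recent versions. The function names and the `page` prefix are hypothetical.

```python
import subprocess
from pathlib import Path


def pdf_to_pngs_cmd(pdf, out_dir, dpi=300):
    """Command that renders every PDF page to out_dir/page-N.png (assumes Poppler's pdftoppm)."""
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf), str(Path(out_dir) / "page")]


def tesseract_hocr_cmd(png):
    """Command that OCRs one PNG and writes an .hocr file next to it."""
    png = Path(png)
    return ["tesseract", str(png), str(png.with_suffix("")), "hocr"]


def run_pipeline(pdf, out_root):
    """Create out_root/<pdf name>/ and fill it with one PNG and one hOCR file per page."""
    out_dir = Path(out_root) / Path(pdf).stem
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdf_to_pngs_cmd(pdf, out_dir), check=True)  # blocks until all pages are rendered
    for png in sorted(out_dir.glob("page-*.png")):
        subprocess.run(tesseract_hocr_cmd(png), check=True)  # blocks until this page is OCR'd
    return out_dir
```

Both `subprocess.run` calls block until the tool finishes, which is why calling something like `run_pipeline` directly from a button handler freezes the window until every page is done.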
My goal was (and still is) to then work with the PNG and hOCR files to correct bad OCR and map words or areas to columns for the required CSV.
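As a starting point for the column mapping, the hOCR files are (X)HTML where each word is a `span` with class `ocrx_word` and a `title` containing `bbox x0 y0 x1 y1`. The sketch below uses only the standard library: it collects words per line and puts each word in a column according to the horizontal centre of its box. The column `boundaries` (x pixel positions between columns) are a hypothetical input; in an editor they would presumably come from areas the user marks on the PNG. Treating one hOCR line as one CSV row is also an assumption.

```python
import bisect
import csv
import io
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")
LINE_CLASSES = {"ocr_line", "ocr_caption", "ocr_header", "ocr_textfloat"}


class HocrWordParser(HTMLParser):
    """Collect (line_no, text, bbox) for every ocrx_word span in an hOCR page."""

    def __init__(self):
        super().__init__()
        self.words = []    # list of (line_no, text, (x0, y0, x1, y1))
        self._line = -1    # index of the current text line
        self._bbox = None  # bbox of the word span we are inside, if any
        self._depth = 0    # <span> nesting depth inside the current word
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:  # inside a word: only track nested spans
            if tag == "span":
                self._depth += 1
            return
        a = dict(attrs)
        classes = set((a.get("class") or "").split())
        if LINE_CLASSES & classes:
            self._line += 1
        elif tag == "span" and "ocrx_word" in classes:
            m = BBOX_RE.search(a.get("title") or "")
            if m:
                self._bbox = tuple(int(v) for v in m.groups())
                self._depth = 1
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._bbox is not None and tag == "span":
            self._depth -= 1
            if self._depth == 0:  # closing tag of the word span itself
                self.words.append((self._line, "".join(self._text).strip(), self._bbox))
                self._bbox = None


def hocr_to_csv(hocr_text, boundaries):
    """Return CSV text with one row per hOCR line and len(boundaries) + 1 columns."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    ncols = len(boundaries) + 1
    rows = {}
    for line, text, (x0, _, x1, _) in parser.words:
        col = bisect.bisect_right(boundaries, (x0 + x1) / 2)  # column containing the word centre
        rows.setdefault(line, [[] for _ in range(ncols)])[col].append(text)
    out = io.StringIO()
    writer = csv.writer(out)
    for line in sorted(rows):
        writer.writerow([" ".join(cell) for cell in rows[line]])
    return out.getvalue()
```

Corrections made in the editor could then be applied to `parser.words` before the CSV is written, since each word keeps its bounding box for drawing on the PNG.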
But I can't keep the back-end processing from holding the GUI hostage.
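The usual way out is to keep slow work off the GUI thread: run it on a worker thread, have the worker post messages to a thread-safe `queue.Queue`, and let the GUI check that queue periodically with `after()` instead of waiting for the work to finish. The post doesn't say which toolkit hocreditor.py uses, so this sketch assumes tkinter. `BackgroundJob`, `slow_work` and the `("progress" | "done" | "error", payload)` message tuples are hypothetical, and the sleep loop stands in for the real pdftoppm/tesseract calls. Only the main thread touches widgets, which is the safe pattern for tkinter.

```python
import queue
import threading


class BackgroundJob:
    """Run work(report) on a worker thread; report((kind, payload)) may be called from it."""

    def __init__(self, work):
        self._work = work
        self._messages = queue.Queue()  # thread-safe hand-off from worker to GUI
        self._thread = None

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        try:
            self._work(self._messages.put)
            self._messages.put(("done", None))
        except Exception as exc:  # hand errors back to the GUI instead of losing them
            self._messages.put(("error", exc))

    def poll(self):
        """Return every message queued so far, without blocking."""
        out = []
        while True:
            try:
                out.append(self._messages.get_nowait())
            except queue.Empty:
                return out


def main():
    import time
    import tkinter as tk  # imported here so BackgroundJob is usable without a display

    root = tk.Tk()
    status = tk.Label(root, text="Idle")
    status.pack()

    def slow_work(report):
        # Hypothetical stand-in for the real per-page pdftoppm/tesseract calls.
        for page in range(1, 4):
            time.sleep(1)
            report(("progress", f"page {page} done"))

    def check(job):
        for kind, payload in job.poll():
            if kind == "progress":
                status.config(text=payload)
            else:  # "done" or "error": stop polling and re-enable the button
                status.config(text="Finished" if kind == "done" else f"Failed: {payload}")
                button.config(state="normal")
                return
        root.after(100, check, job)  # look again in 100 ms; the event loop keeps running

    def run_clicked():
        button.config(state="disabled")  # prevent starting two jobs at once
        job = BackgroundJob(slow_work)
        job.start()
        root.after(100, check, job)

    button = tk.Button(root, text="Run Hocr", command=run_clicked)
    button.pack()
    root.mainloop()
```

Calling `main()` opens a window whose Run Hocr button starts the job; the window stays responsive while it runs. An exception raised in the worker comes back as an `("error", exc)` message rather than disappearing silently with the thread.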
As I said, I'm not a Python programmer, so I'd appreciate any suggestions, even if the answer is that I started out all wrong. All feedback welcome.
cheers,
jared