I'd like to pull the confidences from the character recognition engine, ideally at the character level, though the word level would be sufficient. Looking at the hOCR specification (https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview), it seems this information should be embedded in the hOCR output. However, the hOCR output from the most recent version of ocropus (0.7) doesn't appear to contain it; the only data I see are the bbox coordinates. Are there any command-line switches that add further information to the hOCR output, specifically the confidence measures? Or can this only be achieved programmatically? If so, has anyone had any luck patching the code to provide this feature?
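
To make the goal concrete, here is a rough sketch of what I'd like to be able to do on the consuming side. It assumes the output marks words with the `ocrx_word` class and carries the `x_wconf` property that the hOCR spec defines for word-level confidence. The sample markup is hand-written for illustration (it is not real ocropus output), and the helper names `parse_title` and `word_confidences` are made up.

```python
from html.parser import HTMLParser

# Hand-written sample, NOT real ocropus output: the first word carries
# x_wconf, the second has only a bbox (which is all I currently get).
SAMPLE = """<span class='ocr_line' title='bbox 10 20 200 40'>
<span class='ocrx_word' title='bbox 10 20 50 40; x_wconf 93'>Hello</span>
<span class='ocrx_word' title='bbox 60 20 110 40'>world</span>
</span>"""


def parse_title(title):
    """Split an hOCR title attribute into {property name: [values]}."""
    props = {}
    for part in title.split(";"):
        fields = part.split()
        if fields:
            props[fields[0]] = fields[1:]
    return props


class _WordParser(HTMLParser):
    """Collects (text, confidence) for each ocrx_word span.

    Assumes word spans contain plain text only (no nested spans).
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._props = None  # properties of the word span being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self._props = parse_title(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._props is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._props is not None:
            conf = self._props.get("x_wconf")
            value = float(conf[0]) if conf else None  # None: no confidence
            self.words.append(("".join(self._text).strip(), value))
            self._props = None


def word_confidences(hocr):
    """Return a list of (word, x_wconf or None) from an hOCR string."""
    parser = _WordParser()
    parser.feed(hocr)
    return parser.words


print(word_confidences(SAMPLE))
```

With the output I get today, every word would come back with `None`, as the second word in the sample does.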
-Elliot