Stefan Heise
Jun 5, 2012, 2:36:10 PM
to ocr...@googlegroups.com
I'm trying to get started with OCRopus and am finding it very cumbersome. Which of you are actually using OCRopus productively, and how did you learn it?
This is where I currently stand:
After a bit of research and a bugfix, I managed to install OCRopus 0.5 and complete a test run without a fatal error. The result is unusable, though, so now I need to get into the details of training and configuration. The first thing I want to improve is the binarization, which came out way too light: there was basically nothing left to recognize in the binarized image. "ocropus-preproc -h" lists some parameters to tweak: ground truth extension, zoom, character component size, halftone removal, deskewing, sigma and k value. The issue is that I don't really know what any of these parameters mean exactly, or how to use them sensibly. Sure, there is Google and Wikipedia, and I have actually watched all the YouTube videos available, but at the end of the day I could not find any concrete steps for improving my binarization results. I tried some guessed values for sigma and k, but that apparently had no effect whatsoever. What I, and apparently other newbie users around here, really need is a manual-like introduction to the whole system, like: "A ground truth is defined as abc, while a ground truth extension is xyz. ... Parameter x needs to be a value between y and z, lower x means ... higher x means..."
I feel like there must be an OCRopus bootcamp somewhere, maybe a lecture or a manual that I completely missed in my search and that enabled all the other users to make productive use of OCRopus. I'm a computer scientist and a somewhat experienced software developer, so I can handle technical language and am a quick learner. I'd even be willing to pay someone to teach me (within reasonable limits), or to write such a manual in return. Can anyone point me to the right resources, or is personal training in OCRopus usage (maybe remote) available?