How to Read Prepared/Generated Forms (known layout) and Obtain Check-Box Data

kingIZZZY

Oct 7, 2014, 1:34:23 PM
to ocr...@googlegroups.com
BH

Hello,

I am an experienced programmer, but an absolute newbie to OCR, document analysis, and computer optical recognition in general.

Desired Effect (workflow I'm trying to program)
  • Dynamically generate a form intended to be printed and filled out IRL
  • Scan the completed form and obtain its data

Type of Data
  • Highest Priority: Check boxes, filled in by pen / pencil / marker etc., marked with check-mark, X, diagonal strike, etc.
  • Optional: Written numbers, circled options

Theoretical Coding Solution
  • When generating a form, store layout / coordinate information of form elements
  • Place recognizable anchors (rotated 'L's or '+' symbols) at the corners of the printed page to define a known rectangular area
  • Print a bar-code or numeric identifier at pre-defined coordinates in the rectangle area
  • Obtain data out of form elements using layout/format information & coordinates previously stored for this identified form
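The first and last steps above (storing the layout at generation time, then looking it up by form identifier) could be sketched roughly as follows. This is only an illustration: the `CheckBox` fields, the millimetre units measured from the top-left anchor, the in-memory `store` dict, and the ID "FORM-0001" are all assumptions, not part of any library.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CheckBox:
    """One check box on the generated form (hypothetical structure)."""
    name: str
    x: float  # left edge, in mm from the top-left anchor (assumed unit)
    y: float  # top edge, in mm from the top-left anchor
    w: float  # width in mm
    h: float  # height in mm


def save_layout(store, form_id, boxes):
    """Serialize the layout to JSON and file it under the form identifier."""
    store[form_id] = json.dumps([asdict(b) for b in boxes])


def load_layout(store, form_id):
    """Recover the layout for the identifier read from the scanned page."""
    return [CheckBox(**d) for d in json.loads(store[form_id])]


# A plain dict stands in for a real database or file store.
store = {}
save_layout(store, "FORM-0001", [
    CheckBox("agree", 20.0, 35.0, 5.0, 5.0),
    CheckBox("newsletter", 20.0, 45.0, 5.0, 5.0),
])
boxes = load_layout(store, "FORM-0001")
```

Keeping coordinates relative to an anchor, rather than to the paper edge, means the stored layout does not depend on where the scanner happened to place the page.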

Bottom line: Is this possible? How to do this? What do I need to learn in order to get to a point where I know how to use OCRopus (or other libraries) to achieve these results?




Related Links (describe some technical aspects & bits of theoretical solutions, but no practical road-map of how to actualize this)

Rick Leir

Nov 4, 2014, 8:20:30 AM
to ocr...@googlegroups.com
One problem is 'registration', where you shift the scanned image left/right and up/down until it matches the original.  I suspect OCRopus layout analysis could help with this. Does the scanned image need to be scaled?
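To make the registration idea concrete, here is a minimal sketch that assumes two anchors have already been located in the scan (finding them is a separate problem not shown here). It fits a similarity transform (shift, uniform scale, rotation) from layout coordinates to scan pixels using complex arithmetic. The function names and the sample coordinates are made up for illustration.

```python
import cmath
import math


def similarity_from_anchors(src_a, src_b, dst_a, dst_b):
    """Fit w = a*z + b mapping two layout points onto two scan points.

    Points are (x, y) tuples treated as complex numbers; `a` encodes
    scale and rotation, `b` encodes the shift.
    """
    z1, z2 = complex(*src_a), complex(*src_b)
    w1, w2 = complex(*dst_a), complex(*dst_b)
    a = (w2 - w1) / (z2 - z1)
    b = w1 - a * z1
    return a, b


def apply_transform(transform, point):
    """Map one layout point into scan pixel coordinates."""
    a, b = transform
    w = a * complex(*point) + b
    return (w.real, w.imag)


# Assumed example: anchors at (0, 0) and (200, 280) mm in the layout were
# found at (50, 40) and (2050, 2840) pixels in the scan.
t = similarity_from_anchors((0, 0), (200, 280), (50, 40), (2050, 2840))
scale = abs(t[0])                             # pixels per mm
angle = math.degrees(cmath.phase(t[0]))       # rotation of the page
corner = apply_transform(t, (20, 35))         # a box corner, in pixels
```

Two anchors are enough only if the scan is not skewed or stretched unevenly; with three or four anchors a full affine or perspective fit would be the more robust choice.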

If you do not care whether the marks are checkmarks or X's, then this is no longer an OCR project: you only need to detect whether each box contains ink. You may do well to script something using PerlMagick or a GIMP script.
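As a rough illustration of how little is needed once the kind of mark does not matter, the sketch below counts dark pixels inside a box region of a grayscale image held as a list of rows. It is written in plain Python rather than PerlMagick or GIMP, and the threshold of 128 and the 5% minimum fraction are guesses that would need tuning on real scans.

```python
def dark_fraction(image, x0, y0, x1, y1, threshold=128):
    """Fraction of pixels darker than `threshold` in the region.

    `image` is a list of rows of grayscale values (0 = black, 255 = white);
    the region covers columns x0..x1-1 and rows y0..y1-1.
    """
    total = 0
    dark = 0
    for row in image[y0:y1]:
        for value in row[x0:x1]:
            total += 1
            if value < threshold:
                dark += 1
    return dark / total if total else 0.0


def is_checked(image, box, min_fraction=0.05):
    """Treat the box as marked if enough of its interior is dark."""
    return dark_fraction(image, *box) >= min_fraction


# Synthetic 20x20 white page with a diagonal pen stroke in one box.
page = [[255] * 20 for _ in range(20)]
for i in range(5, 15):
    page[i][i] = 0

marked = is_checked(page, (5, 5, 15, 15))   # box containing the stroke
empty = is_checked(page, (0, 0, 5, 5))      # untouched box
```

In practice the region should sit slightly inside the printed box outline, otherwise the border itself counts as ink.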

You said 'written numbers'. That is a difficult OCR problem, unless you can capture stylus strokes as they do on a smartphone screen.