form writer experiment

Chris Blow

Nov 25, 2009, 10:00:19 PM

to talkin...@googlegroups.com

I quickly made a simple form based on PFIF using markup and stylesheets:

http://www.flickr.com/photos/unthinkingly/4134367591/ ... based on PFIF schema: http://zesty.ca/pfif/

It's sort of OCR friendly because the form fields are grids. I would remove the grid and the print style for browser based data entry.

This is based on Hayesha's feedback. I just wanted to see if I could create a nice print form in markup and get it reliably into print, ideally PDF. (Not yet: it looks terrible in print so far!) The application is a tiny Sinatra app, which you can run by typing `ruby talkingpapers/app.rb` -- docs are at http://www.sinatrarb.com/

My secondary goal was to have a flexible DSL (HAML) for the form language, which I think is a reasonable advantage over PDF coordinate drawing. For example:

http://github.com/unthinkingly/talkingpapers/blob/master/sinatra/views/index.haml

I've only put a couple of hours into it so far in order to test the idea, but we could run the entire app on this. Sinatra's quite well suited for USB-stick web apps, from what I can tell.

Let me know what you think about that too!

c

Michal Migurski

Nov 26, 2009, 1:52:37 AM

to talkin...@googlegroups.com

That's pretty cool. I like the spirit of the Sinatra library - Ruby's
definitely not been my cup of tea (residual bad aftertaste from Rails)
but the way it's run from a single command is quite nice. The idea
seems like it could be implemented equally well in Python, right? When
I try to run it, though, I get this:

> % ruby app.rb
> /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require': no such file to load -- sinatra (LoadError)
>     from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rubygems/custom_require.rb:31:in `require'
>     from app.rb:3

I assume I'm missing some external library, right?

Looking at the HAML file, I'm wondering how you'd express precise page
coordinates for finding form elements in a scan? If I'm reading it
correctly, it seems to be a kind of non-HTML HTML, or a template for
outputting HTML, is that right?

I just signed up to the list, by the way - mostly I've been exchanging
mails with Chris, John, and Robert. Quite a lot of them actually. My
code progress is here:
http://github.com/migurski/talkingpapers

...and my print / scan progress, showing a scan with extracted form
fields, is here:
http://things.teczno.com/talking-papers/2009-11-22/results.html

My own efforts in form writing have used the PHP library FPDF, which
is dumb but quite good at putting boxes onto paper. My tendency would
be to keep the interchange and domain formats (e.g., PFIF) separate
from the representation formats, so that the actual Talking Papers
bits only really understood uniquely-identifiable boxes on paper, and
let something else handle the underlying semantics.
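
As a rough illustration of what "uniquely-identifiable boxes on paper" could mean in data terms, here is a minimal Python sketch. The `Box` class, its field names, and the choice of PDF points as the unit are all assumptions made for this example, not anything taken from the Talking Papers code. It also assumes the scan has already been straightened and cropped to the page edges.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72  # PDF user-space units per inch


@dataclass(frozen=True)
class Box:
    """A printed box, identified only by an opaque id (hypothetical structure)."""
    box_id: str    # unique id; carries no domain meaning such as a PFIF field
    page: int      # 1-based page number
    x: float       # left edge, in points from the page's left edge
    y: float       # top edge, in points from the page's top edge
    width: float   # in points
    height: float  # in points

    def to_pixels(self, dpi):
        """Return (x, y, width, height) in pixels for a scan at the given DPI."""
        scale = dpi / POINTS_PER_INCH
        return tuple(round(v * scale) for v in (self.x, self.y, self.width, self.height))


box = Box("box-001", page=1, x=72, y=144, width=216, height=36)
print(box.to_pixels(300))
```

Nothing in this structure knows what the box means; mapping `box_id` back to a form field would be someone else's job.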

-mike.

----------------------------------------------------------------
michal migurski- mi...@stamen.com
415.558.1610

Matt Berg

Nov 26, 2009, 2:05:30 AM

to talkingpapers

This all looks really cool.

There is also a great python library called reportlab that we've been using quite a bit with RapidSMS. It has support for plugin graphics, etc.

While I realize it isn't critical, I'd vote strongly for trying to keep everything in one language. If there are some core things that don't change, then mixing languages is not a huge deal. Keeping what the average coder who wants to design a new form has to touch down to a single language would greatly simplify things.

http://www.reportlab.org/
http://www.reportlab.org/samples/
http://www.reportlab.org/samples/medical-forms-printing/

Thanks,

Matt

--
ICT Coordinator
Millennium Villages Project
Columbia University, NYC
W: +1.212.854.7993
M: +1.646.463.1273
V: +1.312.239.0169
Skype: mlberg

Matt Berg

Nov 26, 2009, 2:11:38 AM

to talkingpapers

BTW - Great work guys! This is amazing early progress.

I think we'll need to work out how to get entries on a single row. This would be really good for mass registrations.

I.e.:

BC = barcode

BC Last Name BC First Name BC Gender BC DOB etc

2009/11/26 Matt Berg <mlb...@gmail.com>

This is getting pretty awesome. We definitely should incorporate this into our stuff.

Matt Berg

Nov 26, 2009, 2:13:40 AM

to talkingpapers

Sorry, another good example of what I mean:

http://www.reportlab.org/samples/advanced-tables/

Chris Blow

Nov 26, 2009, 2:40:29 AM

to talkin...@googlegroups.com

> I assume I'm missing some external library, right?

yes:

port install rubygems

gem install sinatra

gem install haml

or something similar to that. See http://www.sinatrarb.com/documentation.html

PS: If you edit the stylesheet, you will also need to run `cd ~/git/talkingpapers/sinatra/views/ && compass -w` to kick-start the SASS renderer.

Sinatra is super fun! You might also like: http://monkrb.com/

> My own efforts in form writing have used the PHP library FPDF, which is dumb but quite good at putting boxes onto paper.

I really, really like the simplicity of that, but keep in mind that we do need to build the web based form also, for data entry in the browser.

HAML

HAML renders into HTML. I also use SASS, which compiles into CSS. They are both really great abstractions of some creaky languages. I use these together with a CSS grid framework called Susy, which could do an extremely dense grid.

Here's an example CSS grid using a different framework: http://www.flickr.com/photos/unthinkingly/4134844401/

layout

Really I suspect that we will never really have total control over the layout. So my hope is that we can do this without precise positioning, just bigger forms, and barcodes on each question. Does that not seem feasible?

This is why I only did one row per question in my mockup. Each row is terminated by the barcode. There is no barcode at the top because each barcode can contain the survey id as part of the question id. Does that make sense?
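
A minimal Python sketch of that barcode idea, assuming a made-up payload format of `survey_id/question_id`. The separator, the function names, and the sample ids are hypothetical and only illustrate how one barcode per row could carry both ids.

```python
from typing import NamedTuple

SEPARATOR = "/"  # assumed delimiter between the two ids; not from any spec


class QuestionCode(NamedTuple):
    survey_id: str
    question_id: str


def encode_question_code(survey_id, question_id):
    """Build the text payload to put in a row's barcode."""
    for part in (survey_id, question_id):
        if not part or SEPARATOR in part:
            raise ValueError(f"invalid id: {part!r}")
    return f"{survey_id}{SEPARATOR}{question_id}"


def decode_question_code(payload):
    """Split a scanned barcode payload back into survey and question ids."""
    survey_id, sep, question_id = payload.partition(SEPARATOR)
    if not sep or not survey_id or not question_id:
        raise ValueError(f"not a question code: {payload!r}")
    return QuestionCode(survey_id, question_id)


payload = encode_question_code("pfif-2009", "last_name")
print(payload)
print(decode_question_code(payload))
```

Because every row names its own survey, a reader could in principle identify a stray or partial page without a separate page-level code.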

My tendency would be to keep the interchange and domain formats (e.g., PFIF) separate from the representation formats, so that the actual Talking Papers bits only really understood uniquely-identifiable boxes on paper, and let something else handle the underlying semantics.

I agree -- just using PFIF because I needed a schema to represent.

PFIF was used to enter 90,000 records after Katrina!

cheers

c

Michal Migurski

Nov 26, 2009, 1:24:43 PM

to talkin...@googlegroups.com

Thanks Matt!

On Nov 25, 2009, at 11:05 PM, Matt Berg wrote:

> While I realize it isn't critical, I'd vote strongly for trying to
> keep everything in one language. If there are some core things that
> don't change then mixing languages is not a huge deal. Keeping what
> the avg. coder who wants to design a new form to a single language
> would greatly simplify things.

I mostly agree, though I think we could also benefit from some
internal separation.

The most language-inflexible part of the application is currently the
reader. It's written in Python, and shells out to two additional
binaries, the C-based SIFT and the Java-based QR decode. I make heavy
use of NumPy in there to keep the inner loops tight, and it *could* be
rewritten, but it'd be a lot of work.

I don't think it's necessary to pull the rest of the app along in
Python, though. I have been using PHP for the other bits and imagined
that we might simply shell out to Python when we had to read a form,
for example. Meanwhile, Chris is making a lot of excellent progress in
Ruby with a form generator.

I'll write more about this in a response to his most recent mail.

ReportLab, by the way, looks pretty good. Aaron Cope used it for a lot
of map experimentation last year, and I think he was pretty happy with
it.

-mike.

Michal Migurski

Nov 26, 2009, 2:56:41 PM

to talkin...@googlegroups.com

On Nov 25, 2009, at 11:40 PM, Chris Blow wrote:

>> I assume I'm missing some external library, right?
>

> yes:
> port install rubygems
> gem install sinatra
> gem install haml

It worked! Thanks.

>> My own efforts in form writing have used the PHP library FPDF,
>> which is dumb but quite good at putting boxes onto paper.
>

> I really, really like the simplicity of that, but keep in mind that
> we do need to build the web based form also, for data entry in the
> browser.

Yeah, agree....

(more below)

>> layout
>
> Really I suspect that we will never really have total control over
> the layout. So my hope is that we can do this without precise
> positioning, just bigger forms, and barcodes on each question. Does
> that not seem feasible?

It introduces a harder problem: finding bits of form and code in a
scan without quite knowing where to look for them.

How about this?

There's a regular, web-based part of the site (on a stick, or the web,
whatever) that publishes plain old HTML form fields. HTML5 would be
nice because it has expanded "type" attributes like dates and numbers,
but whatever. Those forms can be filled in and submitted in the
regular browser way, and the script behind them handles all the
semantic bits - PFIF, etc. All the domain specificity lives behind
this form.

Meanwhile, the paper part of Talking Papers consumes this form and
gets it onto paper using whatever method seems appropriate. I'm
pulling for the FPDF way, mostly because it's pretty much done and
pretty much works and I understand it the best, but we could equally
well do reportlab, HAML, or something else. This part of the
application doesn't have any idea about PFIF or RDF triples or
anything, it only knows about the URL of the form, the collection of
form fields that live inside it, and the printed position of each form
field on a piece of paper. When the paper forms are filled-out,
scanned, and Turked or HCR'd, the resulting values are simply POSTed
back to the original form, just like if a human had done it. Again, no
semantic awareness, just taking written characters from a piece of
paper and putting them into an HTTP POST request.
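
To make the last step concrete, here is a minimal Python sketch of turning recognized values into the same request a browser would send. The function name, the example URL, and the field names are assumptions for illustration; only the standard library's `urllib` is used, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(form_url, values):
    """Build (but do not send) a browser-style POST for the given form values."""
    # urlencode percent-escapes everything, so the result is plain ASCII.
    body = urlencode(values).encode("ascii")
    return Request(
        form_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical values read off a scanned page, keyed by form field name.
request = build_form_post(
    "http://example.org/person",
    {"first_name": "Ada", "last_name": "Lovelace"},
)
print(request.get_method(), request.full_url)
print(request.data)
```

Passing the result to `urllib.request.urlopen` would submit it. The paper side never needs to know what `first_name` means to the application behind the form.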

The main advantage here is that it keeps one kind of difficult problem
(schemas, databases, applications) completely separate from another
kind of difficult problem (print, layout, paper, scanning, HCR,
submission). It means that when some other disaster relief group wants
to use Talking Papers we can just tell them to publish a useful HTML
form and then plug it into the paper-based front end.

It's like with Walking Papers - there's no point in that application
where I address OSM's data itself in any meaningful way, I just make
sure that the right geographic areas are covered and that Potlatch
(the Flash-based editor) shows a useful background image. Some Germans
were able to write a plugin for the Java-based editor JOSM based on
the service with zero input from me and only the information in a page
like this to go on: http://walking-papers.org/scan.php?id=th3dwzgk . I
liked that.

>> My tendency would be to keep the interchange and domain formats
>> (e.g., PFIF) separate from the representation formats, so that the
>> actual Talking Papers bits only really understood uniquely-
>> identifiable boxes on paper, and let something else handle the
>> underlying semantics.
>

> I agree -- just using PFIF because I needed a schema to represent.
>
> PFIF was used to enter 90,000 records after Katrina!

PFIF sounds like a total winner for the missing persons use case.

-mike.

Chris Blow

Nov 30, 2009, 9:20:04 PM

to talkin...@googlegroups.com

> It introduces a harder problem: finding bits of form and code in a
> scan without quite knowing where to look for them.

I see -- probably also we have a concern here with multiple pages.
Hadn't thought of that.

Does walkingpapers do multipage?

> How about this?
>
> There's a regular, web-based part of the site (on a stick, or the
> web, whatever) that publishes plain old HTML form fields. HTML5
> would be nice because it has expanded "type" attributes like dates
> and numbers, but whatever. Those forms can be filled in and
> submitted in the regular browser way, and the script behind them
> handles all the semantic bits - PFIF, etc. All the domain
> specificity lives behind this form.

sounds great -- totally believe in separation of concerns and this
seems like a natural cleaving point.

so, three modules?

1/ talkingpapers-reader (consumes scanned paper and yields all the
data in a series of POSTs)
2/ talkingpapers-writer (consumes web forms and produces print-ready
forms)
3/ talkingpapers-web (manages frontend UI for scanning and survey
creation)

... presumably you can make reader/writer mutually compatible and I
can get the web-based form builder to kick out forms that meet
your spec.

> Meanwhile, the paper part of Talking Papers consumes this form and
> gets it onto paper using whatever method seems appropriate. I'm
> pulling for the FPDF way, mostly because it's pretty much done and
> pretty much works and I understand it the best, but we could equally
> well do reportlab, HAML, or something else.

I have been thinking about this a bit, and I think I now see some of
the benefits of using custom form-drawing code -- most importantly,
tight control over the rendering of elements across all platforms.

My only concern: To me it seems cumbersome to render a well-known
markup language like HTML into a custom markup language.

But I am excited to see you do it! :) Presumably you will parse the
document and look for html elements, which then represent those
elements in the PDF?
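
One way that parsing step might look, as a minimal Python sketch using the standard library's `html.parser`. The class name, the helper function, and the sample form are hypothetical; the writer discussed in this thread may work quite differently.

```python
from html.parser import HTMLParser

SKIPPED_INPUT_TYPES = {"submit", "button", "reset", "hidden"}


class FormFieldCollector(HTMLParser):
    """Collect the form's action URL and its named, fillable fields."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = []  # list of (name, kind) in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
        elif tag in ("input", "select", "textarea"):
            name = attrs.get("name")
            # An <input> with no type attribute defaults to a text field.
            kind = (attrs.get("type") or "text") if tag == "input" else tag
            if name and kind not in SKIPPED_INPUT_TYPES:
                self.fields.append((name, kind))


def collect_form_fields(html):
    """Return (action, fields) for the form found in an HTML string."""
    collector = FormFieldCollector()
    collector.feed(html)
    return collector.action, collector.fields


SAMPLE = """
<form action="/person" method="post">
  <input type="text" name="first_name">
  <input type="date" name="date_of_birth">
  <textarea name="notes"></textarea>
  <input type="submit" value="Save">
</form>
"""
print(collect_form_fields(SAMPLE))
```

Each collected field could then be handed to whatever draws the boxes, with the field's `kind` deciding how large a box it gets.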

c
