Tech piece

Matt Berg

Nov 19, 2009, 6:34:11 PM

to Talking Papers

Robert,

This is awesome.

Can you point me towards some of the code/technology that Sahana and
others are using for this?

We probably want to create a wiki/resource page with the different
code libraries that can tackle some of these different pieces.

One thing I'm wondering early on is whether we should go with OCR (certainly
would be nice) or output Scantron-type forms where you fill in
boxes. I imagine we'll probably want to do both.

I'm super interested in this for childcount.

Every 90 days, most countries now do massive malnutrition screening
drives where they take the MUAC and weight of kids under five. I'm
imagining a scenario where via SMS and manual data entry we create
living registries of the kids. Then at each screening we just output
the list of the kids. Once they come for the screening, they
fill in the MUAC, weight and oedema check, then we scan everything
at the end. This would save us an SMS per child.

Matt

Chris Blow

Nov 19, 2009, 7:27:47 PM

to Talking Papers

So excellent to hear your use case, Matt!

I really would like to get a series of specific use cases lined up. I
can visualize them lightly, then we can use them to guide our design
and code.

And I can't believe I hadn't remembered Scantron sheets. Must be a
repressed memory!

Cheers to the new list.

c
chris blow
http://unthinkigly.com

Matt Berg

Nov 19, 2009, 7:29:43 PM

to talkin...@googlegroups.com

Chris,

Thanks! I'll try to spec out the use case a bit better in the near future.

Really am curious to see if anyone has cobbled any of these parts together yet.

Matt

--
ICT Coordinator
Millennium Villages Project
Columbia University, NYC
W: +1.212.854.7993
M: +1.646.463.1273
V: +1.312.239.0169
Skype: mlberg

Matt Berg

Nov 19, 2009, 7:30:25 PM

to talkin...@googlegroups.com

Chris - FYI, your URL in the footer is broken.

Robert Kirkpatrick

Nov 19, 2009, 8:22:33 PM

to Talking Papers

Matt,

> Can you point me towards some of the code/technology that Sahana and
> others are using for this?

I've invited Chamindra to this group. Hopefully he'll point the way
for us.

> We probably want to create a wiki/resource page with the different
> code libraries that can tackle some of these different pieces.

Absolutely. Chris has already stood up a repository on github, which
you can get to via http://www.talking-papers.org. You'll see that
Mike Migurski already checked in a chunk of Python. I suspect you
won't object to our starting to prototype in Python. ;-)

> One thing I'm wondering early on is whether we should go with OCR (certainly
> would be nice) or output Scantron-type forms where you fill in
> boxes. I imagine we'll probably want to do both.

I think we need a lot of flexibility. Note also that Mike is
apparently looking at Crowdflower for OCR, which is pretty cool. I
don't know how pricing works, though.

> I'm super interested in this for childcount.

That would be a great way to pilot this. Also, I got pinged today by
Edouard Legoupil, who does GIS for UNHCR in Hungary. He's very keen
to pilot this in HCR camps around the world. I've invited him into
this group. Hopefully he can tell us more about his requirements.

> Every 90 days, most countries now do massive malnutrition screening
> drives where they take the MUAC and weight of kids under five. I'm
> imagining a scenario where via SMS and manual data entry we create
> living registries of the kids. Then at each screening we just output
> the list of the kids. Once they come for the screening, they
> fill in the MUAC, weight and oedema check, then we scan everything
> at the end. This would save us an SMS per child.

Heck yes that is a GREAT use case, and identifying such cost savings
will make a big difference.

Interestingly, I've received several comments that Talking Papers
might be perceived as competing with heavily-funded handheld data
collection programs. That perception needs to be nipped in the bud.
Whenever and wherever hi-tech is viable, sustainable and appropriate,
it should be used. Talking Papers is about making better use of paper-
based processes that *can't* be upgraded yet.

> Matt

Matt Berg

Nov 19, 2009, 8:24:54 PM

to talkin...@googlegroups.com

There is always a role for paper. It's the best portable way to view data.

I think there are tremendous gains to be had in making paper more effective by marrying it with technology.

Matt Berg

Nov 19, 2009, 8:25:12 PM

to talkin...@googlegroups.com

Don't see any code up there yet.

Robert Kirkpatrick

Nov 19, 2009, 8:28:55 PM

to Talking Papers

Sorry -- try here:

http://github.com/migurski/talkingpapers

On Nov 19, 5:25 pm, Matt Berg <mlb...@gmail.com> wrote:
> Don't see any code up there yet.

Mikel

Nov 19, 2009, 10:33:04 PM

to Talking Papers

Scantron! Totally forgotten. I know exactly why we've repressed the
memory, too much pain!

Talking Papers == Free and Open Scantron ;)

John Crowley

Nov 19, 2009, 11:08:12 PM

to talkin...@googlegroups.com

I told Jennifer Leaning @ HHI about this new project today. She is thrilled and already has some use cases for it. Likewise, I am thrilled. I've just been overwhelmed with deliverables. More next week from me.

- John

Jason Adams

Nov 21, 2009, 1:01:19 AM

to Talking Papers

I've never worked in the field, but wouldn't scantron be a real pain?
I could barely stand filling in the bubbles for my name on tests. Or
are we talking multiple choice type questions?

Chris Blow

Nov 21, 2009, 1:24:00 AM

to talkin...@googlegroups.com

I think multiple choice

( ) water
( ) food
( ) electricity

and likert scales

( ) not at all
( ) a small amount
( ) a large amount

would be a good first pass ... but what do I know -- can we get a
bunch of examples of real-world forms used in these contexts?

Robert can you post the entire Sahana survey that you used to make the
mockup?

cheers
c

Matt Berg

Nov 21, 2009, 3:06:20 PM

to talkingpapers

I think there is some value.

Say you had a grid of 0-9. So I'd fill in three cells for 1 5 0 for example.

I'll do a mockup of what I need as a reference.
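
For reference, a minimal Python sketch of how a filled-in digit grid like the one described above could be decoded once the marks have been detected. The grid layout (ten rows for the digits 0-9, one column per digit position) and the names `decode_digit_grid` and `make_grid` are assumptions for illustration, not anything agreed on this list.

```python
from typing import List, Optional


def decode_digit_grid(grid: List[List[bool]]) -> Optional[int]:
    """Decode a bubble grid into an integer.

    grid[row][col] is True when the bubble for digit `row` (0-9) is
    filled in at digit position `col` (leftmost position first).
    Returns None if any column has zero or several marks, so the
    sheet can be flagged for manual review instead of guessed at.
    """
    if len(grid) != 10:
        raise ValueError("expected ten rows, one per digit 0-9")
    n_cols = len(grid[0])
    if n_cols == 0:
        return None
    digits = []
    for col in range(n_cols):
        marked = [row for row in range(10) if grid[row][col]]
        if len(marked) != 1:
            return None  # blank or ambiguous column
        digits.append(marked[0])
    return int("".join(str(d) for d in digits))


def make_grid(value: str) -> List[List[bool]]:
    """Build the grid a respondent would produce for `value` (demo helper)."""
    return [[int(ch) == row for ch in value] for row in range(10)]


# Three filled cells encode 1 5 0, as in the example above.
print(decode_digit_grid(make_grid("150")))
```

Returning `None` for a blank or doubly-marked column is a design choice in this sketch: it pushes doubtful sheets to a human rather than silently recording a wrong measurement.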

Matt


Robert Kirkpatrick

Nov 21, 2009, 9:55:21 PM

to talkin...@googlegroups.com

Chris,

Here is a description of what Sahana is doing. Let me see if I can get Chamindra in on this conversation to share more with us.

http://www.sahana.lk/wiki/doku.php/dev:sahana_xform

Chris Blow

Nov 21, 2009, 10:58:53 PM

to talkin...@googlegroups.com

Thanks Robert, this is great to see. Really excited to learn more about how they have used this in the field. I have been thinking more about how to do this based on Mike's great work so far.

Here is how I understand the basic arc:

- build a data model that represents a form (with a web app, the "form writer")

- use the customized model to generate form markup (in xhtml)

- use stylesheets to create a print-ready page

- forms are completed in the field.

- forms are scanned (by the "form reader")

- data is inferred via OCR

- model is inferred via QR code

- differing models are merged intelligently

- the data and models are corrected on a web interface (the "form fixer")

- the merged, corrected data is presented and visualized (?)
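
As a rough illustration of the first two steps in the arc above (data model, then generated markup), here is a minimal Python sketch. The `Field` and `Form` classes and the `render_xhtml` function are hypothetical names, not code from the talkingpapers repository, and the markup is only one plausible shape for a printable multiple-choice form.

```python
from dataclasses import dataclass, field
from typing import List
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Field:
    """One question on the form (hypothetical model)."""
    name: str
    label: str
    choices: List[str] = field(default_factory=list)  # empty means free text


@dataclass
class Form:
    form_id: str  # the identifier a QR code on the page could carry
    title: str
    fields: List[Field]


def render_xhtml(form: Form) -> str:
    """Generate XHTML markup from the model; a print stylesheet styles it."""
    parts = [f'<div class="form" id={quoteattr(form.form_id)}>',
             f"  <h1>{escape(form.title)}</h1>"]
    for fld in form.fields:
        parts.append(f'  <div class="field" id={quoteattr(fld.name)}>')
        parts.append(f"    <p>{escape(fld.label)}</p>")
        if fld.choices:
            parts.append("    <ul>")
            for choice in fld.choices:
                parts.append(
                    f'      <li><span class="bubble"/> {escape(choice)}</li>')
            parts.append("    </ul>")
        else:
            parts.append('    <div class="write-in"/>')
        parts.append("  </div>")
    parts.append("</div>")
    return "\n".join(parts)


needs = Form("needs-v1", "Needs survey", [
    Field("need", "Most urgent need", ["water", "food", "electricity"]),
    Field("notes", "Notes"),
])
print(render_xhtml(needs))
```

Keeping the model separate from the markup is what would let the same form be rendered once for the web and once for print, with only the stylesheet changing.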

thoughts?

c

Robert Kirkpatrick

Nov 22, 2009, 5:56:14 AM

to Talking Papers

Chris,

In terms of the form creation side from a user perspective, I think in
many cases we want users to have a simple experience where they are
creating fields by dragging controls onto a blank page and configuring
them. That is, they are creating the schema and the UI as a single
process. The resulting form might have a "Display in Web View" mode,
in which multiple-choice fields are displayed as droplists, and a
"Display for Printing" mode in which the same field is displayed
differently -- God forbid, even as a set of Scantron-like bubbles.
Perhaps as Edouard suggests, we could eventually have a Wizard-based
interaction for beginners, as well as an expert mode that lets you
define the schema first and then create multiple layouts, localized
versions, etc. Under the hood, meanwhile, yes, the process is as you
describe: define the schema, generate the markup, use a stylesheet to
create a PDF.

On the form processing side, yes, the reason I favor something like
RDF/Turtle is that I am keen to support flexible schema evolution,
where fields are getting added and dropped over time and additions just
get merged in with each subsequent dataset.
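
To make the schema-evolution idea concrete, a minimal Python sketch follows. It uses plain dictionaries rather than RDF/Turtle, and `merge_datasets` is a hypothetical helper: the merged schema is the union of every field seen so far, and records collected before a field existed simply get `None` for it.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def merge_datasets(datasets: List[List[Record]]) -> Tuple[List[str], List[Record]]:
    """Merge datasets whose fields were added or dropped over time."""
    schema: List[str] = []
    for dataset in datasets:
        for record in dataset:
            for name in record:
                if name not in schema:
                    schema.append(name)  # keep first-seen order
    merged = [{name: record.get(name) for name in schema}
              for dataset in datasets for record in dataset]
    return schema, merged


# The second screening round added an "oedema" field.
round_one = [{"child": "A", "muac": "12.5"}]
round_two = [{"child": "B", "muac": "11.0", "oedema": "no"}]
schema, rows = merge_datasets([round_one, round_two])
print(schema)
print(rows[0])
```

This sketch only covers additive changes; conflicting definitions of the same field name would need a resolution policy on top of it.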

Hayesha Somarathne

Nov 22, 2009, 11:51:21 PM

to talkin...@googlegroups.com

Hi all,

I'm Hayesha Somarathne and I have been involved with the development of the Sahana XForms. I would like to be part of this community to share my experience of the work carried out during the past couple of years.

I strongly believe that this forum is the best place to get more feedback on the work I have done and to identify the areas I can work on to extend its functionality, making it more user-friendly and easier to integrate into any system (especially web applications). At the moment the form generation works fine within the Sahana application, but unfortunately I'm not in a position to provide you a demo version where you can test how it works. I'll try to make one available ASAP.

On Sun, Nov 22, 2009 at 8:25 AM, Robert Kirkpatrick <rgkirk...@gmail.com> wrote:

> Chris,
>
> Here is a description of what Sahana is doing. Let me see if I can get Chamindra in on this conversation to share more with us.
>
> http://www.sahana.lk/wiki/doku.php/dev:sahana_xform

We have only managed to finish part of this (the form generation). Our overall plan was to come up with a mechanism to generate the form automatically from the selected web form with a single mouse click and get it printed for data collection. Once data is collected, the scanned form will be passed through an HCR (Handwritten Character Recognition) module, which writes the collected data to an XML file. The next step is to use this XML file to update the system automatically through a browser-based plugin.

At the moment we have a working model for the automatic form generation, but it needs further refinement in how exactly the specific layouts on the form should be organized, such as:

- Small text field areas
- Larger text field areas
- Date capturing fields (DD/MM/YYYY or MM/DD/YYYY or YYYY/MM/DD etc.)
- Other specific fields that require variable-length spaces
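
The date-format item deserves a small illustration: the same handwritten digits parse to different dates depending on which layout the form assumed, so the layout has to travel with the form. A minimal Python sketch using only the standard library; the `FORMATS` mapping and `parse_form_date` are hypothetical names, not part of the Sahana code.

```python
from datetime import date, datetime

# Layout labels as they might be printed on the form, mapped to strptime codes.
FORMATS = {
    "DD/MM/YYYY": "%d/%m/%Y",
    "MM/DD/YYYY": "%m/%d/%Y",
    "YYYY/MM/DD": "%Y/%m/%d",
}


def parse_form_date(text: str, layout: str) -> date:
    """Parse a recognized date string using the layout printed on the form."""
    return datetime.strptime(text, FORMATS[layout]).date()


print(parse_form_date("03/04/2009", "DD/MM/YYYY"))  # 3 April 2009
print(parse_form_date("03/04/2009", "MM/DD/YYYY"))  # 4 March 2009
```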

We have also developed a module (the HCR module) to extract the characters from the scanned form, but it needs a lot of improvement.

The browser-based plugin is still at the conceptual level, but it was the best approach we managed to identify. We can get more feedback on this from Chamindra, who gave me this idea and encouraged me to take up this work.

Thank you.

--
හයේෂ සෝමරත්න (Hayesha Somarathne)
http://sahana.lk
http://thoughtsandideas.wordpress.com

Chris Blow

Nov 23, 2009, 12:20:05 AM

to talkin...@googlegroups.com

Hayesha,

A walk-through of the Sahana experience would be invaluable. Please let us know how we can help.

best,

c

Hayesha Somarathne

Dec 7, 2009, 12:45:53 AM

to talkin...@googlegroups.com

Hi All,

> At the moment the form generation works fine within the Sahana application but unfortunately I'm not in a position to provide you a demo version where you can test how it works, but I'll try to make it available ASAP.

I have set up a demo which you can have a look at here. Please note that I have used a development instance of Sahana (apologies for that), so you might sometimes have difficulty accessing some pages, and some error messages might appear. Since this functionality is still at an early stage, we haven't integrated it into our stable releases yet.

Please follow these steps to see how XForm works:

- Go to DISASTER VICTIM REGISTRY >> Add Disaster Victim >> Add New Individuals.
- Next, go to the footer section, where you'll see a link called XForm.
- Click the XForm link; the page is then formatted as a printable form.
- As usual, go to File > Print to print the page.
- Click refresh (F5) to reload the original page.

Apologies for not replying more often; I have been busy at the office. I appreciate your patience on this.

Thank you in advance.

Regards.

Chris Blow

Dec 7, 2009, 12:58:00 AM

to talkin...@googlegroups.com

Fantastic, thanks Hayesha!

I will check it out.

c

Chris Blow

Dec 9, 2009, 10:22:59 PM

to talkin...@googlegroups.com

Hayesha and Praneeth

This is great -- is this using sahanapy? I would love to understand more about how the xforms transition happens -- it looks quite good. Have you had success with those registration marks at the top? Perhaps we should make the talkingpapers-web module part of sahanapy?

I would appreciate any advice from the sahana team on how this project generally fits into your efforts, and how you would advise our development path.

Based on Hayesha's survey of existing OCR approaches, the handwriting recognition part sounds like it is going to be harder than I thought. What else can we experiment with besides CellWriter? Is there an acknowledged benchmark? Presumably writing our own handwriting recognition would be a waste of time? Notably, if we are going to need to do handwriting training then we also have a significant burden during deployment ...

oh and also: Does anyone have any thoughts on my last email, particularly the breakdown of three modules (reader/writer/web)?

chris


Chris Blow

Dec 9, 2009, 10:36:19 PM

to talkin...@googlegroups.com

Also, Praneeth, can I get the source code for the relevant work? I do not see a link in your repo.

Thanks!

c

Chris

Hayesha Somarathne

Dec 10, 2009, 12:21:24 AM

to talkin...@googlegroups.com

Hi Chris and all,

On Thu, Dec 10, 2009 at 8:52 AM, Chris Blow <cgb...@gmail.com> wrote:

> Hayesha and Praneeth
>
> This is great -- is this using sahanapy? I would love to understand more about how the xforms transition happens -- it looks quite good. Have you had success with those registration marks at the top? Perhaps we should make the talkingpapers-web module part of sahanapy?

Sahana was initially developed in PHP, and later part of our community adopted Python, so now we have two versions of Sahana. The initial version (PHP) is more feature-rich than the Python one (SahanaPy). I started developing XForm against the initial Sahana version, and I haven't tested it with SahanaPy. (I'm not sure it's right to separate them as PHP and Python versions, since both try to cater for a common need, each using the features of its own technology.)

Since the current XForm functionality depends only on the XHTML markup of the page, we can very easily adapt it for use with SahanaPy as well, although I haven't tested that yet. I'll elaborate on the XHTML markup the page needs for form generation in a coming post (hopefully by this weekend). Please bear with the inconvenience.

The OCR module we developed uses the registration marks on the form to identify the correct layout of the form.

> I would appreciate any advice from the sahana team on how this project generally fits into your efforts, and how you would advise our development path.
>
> Based on Hayesha's survey of existing OCR approaches, the handwriting recognition part sounds like it is going to be harder than I thought. What else can we experiment with besides CellWriter? Is there an acknowledged benchmark? Presumably writing our own handwriting recognition would be a waste of time? Notably, if we are going to need to do handwriting training then we also have a significant burden during deployment ...

Yes, based on my experience so far with OCR, it's best to adopt an existing solution for handwritten character recognition: it saves a lot of the time and effort of developing our own module, with which it would be hard to achieve the expected results. As Praneeth mentioned, we can adapt a tool like CellWriter or Tesseract for this.

> oh and also: Does anyone have any thoughts on my last email, particularly the breakdown of three modules (reader/writer/web)?
>
> Here is how I understand the basic arc:
>
> - build a data model that represents a form (with a web app, the "form writer")
>
> - use the customized model to generate form markup (in xhtml)
>
> - use stylesheets to create a print-ready page
>
> - forms are completed in the field.
>
> - forms are scanned (by the "form reader")
>
> - data is inferred via OCR
>
> - model is inferred via QR code
>
> - differing models are merged intelligently
>
> - the data and models are corrected on a web interface (the "form fixer")
>
> - the merged, corrected data is presented and visualized (?)

This sounds good and it provides a good foundation on how we can modularize each functionality.

Regards

Praneeth Bodduluri

Dec 10, 2009, 12:27:13 AM

to talkin...@googlegroups.com

Hayesha's work is based on print CSS with Sahana (the PHP version). My work is for SahanaPy, written in Python. I have used ReportLab to generate the forms directly from the database schema. The code does not have a UI front end for generating forms as such, since it was put together last week.

You can find my code at : https://code.launchpad.net/~lifeeth/sahana/sahanapy-trunk

The relevant code can be found at : http://bazaar.launchpad.net/~lifeeth/sahana/sahanapy-trunk/annotate/head%3A/controllers/ocr.py

To run the code you can download : http://trunkdemo.sahanapy.org:8000/init/static/web2py.zip
and execute : python web2py.py (you will need reportlab installed with your python for form generation)

I have setup a demo at : http://trunkdemo.sahanapy.org:8000

To login to the demo you can use the username : te...@example.com and password: letmein

You can generate the forms on the demo by going to :
http://trunkdemo.sahanapy.org:8000/init/ocr/create/or_organisation

or_organisation can be replaced with any of the other tables in the database.

--
Praneeth

Hayesha Somarathne

Dec 13, 2009, 4:46:25 AM

to talkin...@googlegroups.com

Hi all,

> This is great -- is this using sahanapy? I would love to understand more about how the xforms transition happens -- it looks quite good. Have you had success with those registration marks at the top? Perhaps we should make the talkingpapers-web module part of sahanapy?
>
> I would appreciate any advice from the sahana team on how this project generally fits into your efforts, and how you would advise our development path.
>
> Based on Hayesha's survey of existing OCR approaches, the handwriting recognition part sounds like it is going to be harder than I thought. What else can we experiment with besides CellWriter? Is there an acknowledged benchmark? Presumably writing our own handwriting recognition would be a waste of time? Notably, if we are going to need to do handwriting training then we also have a significant burden during deployment ...
>
> oh and also: Does anyone have any thoughts on my last email, particularly the breakdown of three modules (reader/writer/web)?

Our plan (Chamindra's and mine) for the XForm module was to come up with functionality that would:

- Integrate easily and allow custom layouts to be selected for labels (e.g. date formats, lengthy descriptions, choice boxes, etc.)
- Reduce the dependency on the available technologies (especially server-side technologies) and on system-specific functionality
- Be extendable to any web application in the future, just by linking it as a library and invoking the functionality with a single click

To meet these goals I chose JavaScript and CSS for the development. Please navigate here to find more information about how I have structured the phases of development.

Regards

Robert Kirkpatrick

Dec 15, 2009, 7:28:39 PM

to Talking Papers

Chris et al.,

The approach you have suggested seems to me to be the right one. The
key aspect for me is the bit about "differing models are merged
intelligently". One option we might consider is to use the Mesh4X format
for schema encoding in the QR code. Mesh4X (http://www.mesh4x.org), as
you may recall, is a FeedSync format that encodes schema in RDF. When
two Mesh4X-based data sets are subjected to a merge event, changes to
both data and schema are merged. Where conflicts in either are
detected, a number of resolution options are possible.

Interestingly, while I was in Amman, I had several discussions
regarding the data scrubber application, and feedback was
overwhelmingly positive. Several conference participants asked to be
granted early access to it once we have a beta version.

Would you be able to find the time to put together a storyboard on the
workflow below?

Robert

====================
