Safe, Small, and Interesting Tasks


Luther Tychonievich

Sep 29, 2014, 9:46:52 AM
to root...@googlegroups.com
All the conversation on tsc-public (http://fhiso.org/mailman/listinfo/tsc-public_fhiso.org) has been fascinating to follow, but I've also recently been thinking about an issue that is not information-standard related.

FamilySearch Indexing has been very successful (about 5 records indexed per second so far this year) without the personal identity and story motivators that I often hear mentioned by genealogists and family historians.  In thinking about its success, I have considered the following characteristics:
  1. It is safe.  Multiple people index each record; you can check a batch back in undone if it confuses you; and so on.  There is no user-perceived risk.

  2. Each task is small.  You can do meaningful work in minutes.

  3. It is interesting.  It takes some puzzling to do it; it isn't rote.

  4. It isn't frustrating.  Handwriting may be ugly and hard to read, but there aren't any brick walls.

  5. It isn't personal.  These aren't *your* people, so if a particular batch is too hard, you can return it un-indexed and grab another batch instead without any sense of personal loss.

  6. It sets clear expectations.  The tool tells you exactly what it wants to know.
Am I missing anything important to its success?


I propose the acronym SSIT for these kinds of Safe, Small, and Interesting Task presentations of genealogical work.  "Gamification" is sometimes used to suggest SSITs, but that term is also used for a lot of other ideas (achievement badges, flashy interfaces, and so on), so I think we need a term for the game-like tasks themselves.  If there is another term already in use, I'd be interested to learn it.

What other parts of family history could be presented as a set of SSITs?  Could we
  • Get the rest of the information in each source from the indexers?  I was indexing some obituaries the other day and most of them had information that I couldn't record within Indexing, both about the deceased (occupations, military engagements, etc.) and about other people.  How do we enable extracting this while still keeping the expectations clear?

  • Get useful judgments on pair-wise person matching?  "Consider these two people. Do you think they are the same individual?"   How could we make the basis for judgment clear enough that a novice could do it well?

  • Get useful judgments on three-way person matching?  "Suppose that person A and person B are distinct individuals.  Is person C more likely to be a match for A, a match for B, or a third individual?"  I suspect that three-way matching would be less error-prone than pair-wise matching because "more similar" is an easier and less ambiguous judgment than "similar enough."  (A rough sketch of how such judgments might be tallied follows this list.)

  • Refine citations?  "Someone claimed this newspaper article was evidence for 'Mark Twain' being a pseudonym, not a name.  Please identify where that claim is made in the article."  (Do we even want this kind of data?)

  • Do some kind of quality review?
What else could we break down into SSITs?
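
To make the three-way matching idea concrete, here is a rough Python sketch of how many volunteers' answers to the same triad might be tallied before anything is surfaced to a researcher.  Everything in it (TriadJudgment, tally_triads, the record IDs) is invented for illustration; it is not any existing system's API.

    from collections import Counter
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TriadJudgment:
        """One volunteer's answer: is C more likely A, B, or a third person?"""
        a_id: str
        b_id: str
        c_id: str
        answer: str  # "A", "B", or "NEITHER"

    def tally_triads(judgments):
        """Group judgments by (A, B, C) and count the votes for each answer,
        so downstream code can require, say, a 2/3 majority before it turns
        the triad into a match hint."""
        tallies = {}
        for j in judgments:
            key = (j.a_id, j.b_id, j.c_id)
            tallies.setdefault(key, Counter())[j.answer] += 1
        return tallies

    # Example: three volunteers look at the same triad.
    votes = [
        TriadJudgment("A1", "B1", "C1", "A"),
        TriadJudgment("A1", "B1", "C1", "A"),
        TriadJudgment("A1", "B1", "C1", "NEITHER"),
    ]
    print(tally_triads(votes))
    # {('A1', 'B1', 'C1'): Counter({'A': 2, 'NEITHER': 1})}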


One more question for this post: do we know what to do with the output of new kinds of SSITs?  Indexing currently uses its output to populate a search engine, which is safe because it doesn't mess up anyone's tree.  Do we similarly separate all SSIT output and present it as hints to the big-picture researchers?  Do we let decisions made in SSITs change a shared tree directly?  Do we aggregate the SSIT results into a separate myopic-view-generated conclusion? 
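
If we did keep SSIT output separate and feed it back as hints, a minimal shape for such a hint might look like the following.  This is purely a sketch with invented names (SsitHint and its fields), not a proposal for any particular system.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class SsitHint:
        """A crowd-generated suggestion kept outside the shared tree until a
        researcher reviews it."""
        task_type: str        # e.g. "triad-match" or "extra-obituary-field"
        subject_ids: tuple    # the record or person IDs the hint is about
        suggestion: str       # the conclusion the SSIT volunteers converged on
        agreement: float      # fraction of volunteers who agreed
        status: str = "pending"  # a researcher can set "accepted" or "rejected"
        created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    hint = SsitHint("triad-match", ("A1", "C1"),
                    "C1 appears to be the same person as A1", 0.67)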

—Luther

Tony Proctor

Sep 29, 2014, 10:05:36 AM
to root...@googlegroups.com
I'm not sure that any of these are simple enough for an SSIT, Luther. For instance, associating people (if it's going to be useful) could take ages to assess; doing it solely on the basis of name/age/etc. can only produce a hint, and may lead the lazy astray.
 
What about adding annotation to bulk OCR? I'm thinking here about, say, newspaper digitisation. It's usually treated as amorphous text, but if some mark-up were added to indicate references to people, places, dates, etc., it would greatly increase its usefulness and make up for the poor search capabilities we're generally offered. Given the right tools, this could be made into a simple task.
 
What I'm suggesting here is adding what STEMMA terms "shallow semantics": it would not draw a conclusion about who or what is referenced, or about an actual date value; it would simply record that the selected text is such a reference.
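
Just to illustrate the idea (this is a sketch only, not STEMMA's actual notation), shallow semantics could be captured as stand-off annotations that record where a reference occurs and what kind of thing it refers to, without saying who, where, or when:

    from dataclasses import dataclass

    @dataclass
    class SpanAnnotation:
        """Marks that characters [start, end) of the OCR text are a reference
        of the given kind; deliberately says nothing about which person,
        place, or date is meant."""
        start: int
        end: int
        kind: str  # "person", "place", or "date"

    ocr_text = "Mr. J. Smith of Dublin died on 3rd March 1881 aged 54 years."

    annotations = [
        SpanAnnotation(0, 12, "person"),   # "Mr. J. Smith"
        SpanAnnotation(16, 22, "place"),   # "Dublin"
        SpanAnnotation(31, 45, "date"),    # "3rd March 1881"
    ]

    for a in annotations:
        print(a.kind, "->", ocr_text[a.start:a.end])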
 
    Tony

Jeremy Foote

Sep 29, 2014, 12:38:39 PM
to root...@googlegroups.com
Luther,

You may be interested in the literature around "crowdsourcing" (e.g., http://con.sagepub.com/content/14/1/75.short) and particularly the work of Michael Bernstein (http://hci.stanford.edu/msb/).

I think you are exactly right that these sorts of projects succeed largely on the novelty/interest/enjoyment of the activity, as well as on the size of the smallest useful contribution (the smaller the better).

In addition, it's important to have opportunities for growth and change, and ideally to form a community with other advanced users (see http://dl.acm.org/citation.cfm?id=1099205). Having a project whose ideals and goals one supports is also important for long-term participation (http://dl.acm.org/citation.cfm?id=1297798).

I think that crowdsourcing principles are already guiding a few different efforts. Transcription is one (with FamilySearch Indexing, Ancestry.com, and BillionGraves/FindAGrave as good examples), but I think FindARecord's research opportunities feature is another great one - https://www.findarecord.com/research/familysearch/opportunities. Their tool identifies the sorts of simple, useful, and interesting tasks that are involved in building a joint family tree.

I think Tony's suggestion is a great one. These systems are most interesting, and generally most useful, when paired with output from a program; in Tony's example, the contributed annotations could be used to train a better entity extractor.
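
As a rough sketch of that pairing (assuming the annotations arrive as character-offset spans; the format and function name here are invented), crowd-marked spans could be converted into token-level labels that a standard entity extractor could be trained on:

    def spans_to_iob(text, spans):
        """spans: list of (start, end, kind) character offsets over `text`.
        Returns (token, IOB-label) pairs suitable as training data."""
        labels = []
        pos = 0
        for token in text.split():
            start = text.index(token, pos)
            end = start + len(token)
            pos = end
            tag = "O"
            for s, e, kind in spans:
                if start >= s and end <= e:
                    tag = ("B-" if start == s else "I-") + kind.upper()
                    break
            labels.append((token, tag))
        return labels

    print(spans_to_iob("J. Smith of Dublin", [(0, 8, "person"), (12, 18, "place")]))
    # [('J.', 'B-PERSON'), ('Smith', 'I-PERSON'), ('of', 'O'), ('Dublin', 'B-PLACE')]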

One other task that comes to mind is a tool that identifies "orphaned" individuals. In FS Tree, for example, you see entries with only a first name and a birth year. Many of these could be deleted or merged, since in reality they don't identify any one individual.
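
A sketch of what that tool might look for, using made-up field names rather than the real FS Tree data model:

    def find_orphan_candidates(people):
        """Flag entries that are both sparse (little identifying data) and
        disconnected (no relationships), as candidates for review."""
        candidates = []
        for p in people:
            sparse = not p.get("surname") and not p.get("death_year")
            disconnected = (not p.get("parents") and not p.get("spouses")
                            and not p.get("children"))
            if sparse and disconnected:
                candidates.append(p["id"])
        return candidates

    people = [
        {"id": "P1", "given": "Mary", "birth_year": 1843},
        {"id": "P2", "given": "John", "surname": "Carter", "birth_year": 1850,
         "spouses": ["P3"]},
    ]
    print(find_orphan_candidates(people))  # ['P1']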

I'm happy to provide further scientific literature or examples of successful crowdsourcing projects, but this email is long enough already.

Best,
Jeremy

Dallan Quass

Sep 29, 2014, 1:33:59 PM
to root...@googlegroups.com
SSITs are a great way of thinking about the problem. +1 for FindARecord's research opportunities, though I believe those are interesting only for your own tree.

Ben Brumfield

Sep 29, 2014, 2:10:50 PM
to root...@googlegroups.com
I like the SSIT acronym, and haven't seen a similar formulation in the crowdsourcing literature I'm familiar with.

Generally the folks I talk to in cultural-heritage or natural-science crowdsourcing talk about the granularity of the task, or the granularity of the data.  Fine-grained data (e.g. census entries) lends itself well both to gamification and to algorithmic quality control (by comparing double- or triple-keyed versions of the same datum).
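
As a sketch of that kind of quality control (simplified, with invented field names and a crude normalization step), field-level reconciliation of double- or triple-keyed data can be as simple as a per-field majority vote:

    from collections import Counter

    def reconcile(keyings):
        """keyings: list of dicts, one per volunteer, keyed by field name.
        Keep a field's value only when a strict majority of keyings agree
        (after trivial normalization); otherwise flag it for arbitration."""
        result, disputed = {}, []
        for field_name in keyings[0]:
            votes = Counter(k.get(field_name, "").strip().lower() for k in keyings)
            value, count = votes.most_common(1)[0]
            if count * 2 > len(keyings):
                result[field_name] = value
            else:
                disputed.append(field_name)
        return result, disputed

    a = {"surname": "Tychonievich", "age": "34"}
    b = {"surname": "Tychonievich", "age": "84"}
    c = {"surname": "Tychonevich",  "age": "34"}
    print(reconcile([a, b, c]))
    # ({'surname': 'tychonievich', 'age': '34'}, [])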

However, the desire to leverage the benefits of SSITs can bias the materials a project chooses and its user-engagement strategies.  Once you get to a paragraph of text (or even a smaller unit like a sentence or address), the variations between different transcriptions of the same text become so large that it is very difficult to perform algorithmic comparisons.  This tends to drive attention away from longer textual materials, or sometimes to encourage SSIT approaches for inappropriate materials.  (I remember the early FS Indexing work on Freedmen's Bureau letters, which asked volunteers to tag only the three most common proper names within the text.)

The intrinsic motivation of a user following the narrative as they transcribe can be shattered if a project forces them out of the flow of a text in order to keep them moving to the next unit of work.  To see that (plus a bit of a user rebellion against statistical comparison for quality control), take a look at the "Why is there no back button?" thread on the Operation War Diary forums: http://talk.operationwardiary.org/#/boards/BWD000000g/discussions/DWD000003l

Obviously I'm somewhat biased: my own FromThePage tool supports the editing approach over the indexing approach, so it's headed farther and farther away from SSITs.  We are using them as fundamental data structures in FreeREG2, although I'd like to think we remain cognizant of volunteers' desire to 'own' a whole parish register and their wish to review and revise their work.

Ben


Luther Tychonievich

Oct 1, 2014, 2:36:40 PM
to root...@googlegroups.com
Thanks for the feedback, all!


On Mon, Sep 29, 2014 at 2:10 PM, Ben Brumfield <benw...@gmail.com> wrote:
Once you get to a paragraph of text (or even a smaller unit like a sentence or address), the variations between different transcriptions of the same text become so large that it is very difficult to perform algorithmic comparisons.

I am surprised by this.  I naïvely assumed that a word-diff of such transcripts would show just a few points of disagreement that could be handled with simple case-by-case majority voting.  I suppose I should have assumed it was more difficult…
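
For what it's worth, here is roughly what I had pictured, sketched with Python's difflib; it handles small word-level disagreements fine, which I now gather is exactly the part that breaks down once volunteers normalize or restructure the text:

    import difflib

    def word_disagreements(t1, t2):
        """Align two transcripts word-by-word and return the points where
        they disagree, ready for a per-point majority vote."""
        w1, w2 = t1.split(), t2.split()
        sm = difflib.SequenceMatcher(a=w1, b=w2)
        return [(w1[i1:i2], w2[j1:j2])
                for op, i1, i2, j1, j2 in sm.get_opcodes() if op != "equal"]

    t1 = "John Smyth, labourer, of 12 High Street"
    t2 = "John Smith, laborer, of 12 High St."
    print(word_disagreements(t1, t2))
    # [(['Smyth,', 'labourer,'], ['Smith,', 'laborer,']), (['Street'], ['St.'])]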
 
[…] or sometimes encourage SSIT approaches for inappropriate materials.

This is a problem, certainly.  Even when indexing US population censuses I noticed the tool skipped several columns of data: too much focus on "small", perhaps?

Are there numbers out there for how many people you lose from crowd-sourced tasks as the individual tasks get more involved?

[…] take a look at the "Why is there no back button?" thread […]

Fascinating read.  It is easy when thinking about technology to forget how complex humans are.  Thanks for the link.
 


On Mon, Sep 29, 2014 at 10:38 AM, Jeremy Foote <jdfo...@gmail.com> wrote:

Luther,

You may be interested in the literature around "crowdsourcing" (e.g., http://con.sagepub.com/content/14/1/75.short) and particularly the work of Michael Bernstein (http://hci.stanford.edu/msb/).

I've skimmed it before, but it hasn't been one of my main interests.  It was a pretty young field when I last looked at it.  Are there conclusions in the field yet (that is, principles that have been confirmed in several studies by several independent researchers)?
  
In addition, it's important to have opportunities for growth and change, and ideally to form a community with other advanced users (see http://dl.acm.org/citation.cfm?id=1099205). Having a project whose ideals and goals one supports is also important for long-term participation - http://dl.acm.org/citation.cfm?id=1297798

I agree, these are nice, yet I don't see them much in FS Indexing; beyond sometimes being asked to complete or rectify other indexers' work, I am not aware of any opportunities for growth and change. Do you think FS Indexing would be more effective if it had more of this?  Are we going to see it drop off as all the current users grow bored?
 
I'm happy to provide further scientific literature or examples of successful crowdsourcing projects, but this email is long enough already.

A link to a good survey paper would be nice.  

I'd also be interested in examples of *unsuccessful* crowdsourcing. What makes these efforts fail?
 
On Mon, Sep 29, 2014 at 9:05 AM, Tony Proctor <to...@proctor.net> wrote:
I'm not sure that any of these are simple enough for an SSIT, Luther. For instance, associating people (if it's going to be useful) could take ages to assess; doing it solely on the basis of name/age/etc. can only produce a hint, and may lead the lazy astray.

There certainly are some associations that are difficult and require careful thought from a large number of angles, but there seem to be a lot that are pretty straightforward too (or maybe it just seems that way to me because of the uncommon surnames in my family…).

I wonder whether we could present enough three-way comparisons to enough people to get good relative similarity measures, from which we could offer higher-quality suggestions to researchers.
 

Ben Brumfield

Oct 1, 2014, 4:48:01 PM
to root...@googlegroups.com
On Wed, Oct 1, 2014 at 1:36 PM, Luther Tychonievich <tychon...@gmail.com> wrote:
On Mon, Sep 29, 2014 at 2:10 PM, Ben Brumfield <benw...@gmail.com> wrote:
Once you get to a paragraph of text (or even a smaller unit like a sentence or address), the variations between different transcriptions of the same text become so large that it is very difficult to perform algorithmic comparisons.

I am surprised by this.  I naïvely assumed that a word-diff of such transcripts would show just a few points of disagreement that could be handled with simple case-by-case majority voting.  I suppose I should have assumed it was more difficult…
There are really two challenges to matching contributions: determining which data to compare, and determining how to compare them.

You can see an example I prepared for a summer workshop on crowdsourcing at http://www.miaridge.com/hilt-summer-school-crowdsourcing-cultural-heritage/ (download the Thursday slide zip file and look at slides 6-10 of "HILT Crowdsourcing 13 Thursday 1045.pdf").
 
If you'd like to play around with reconciling real data, there's a set produced for a citizen science hackathon I participated in last year here: https://github.com/idigbio-citsci-hackathon/ReconciliationUI/tree/master/db/notes_from_nature_exports  The primary variation seems to come from how much normalization a user does to the data they see.

Regardless, determining which variations are meaningful and which are accidental, and then choosing which of the loosely-matching (thus "correct") contributions to use for the reconciled result, is pretty hard.
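
To give a flavour of why (the abbreviation table and function here are invented examples, not anything we actually ship), even deciding that two contributions "loosely match" usually means canonicalizing away the accidental variation first:

    import re
    import unicodedata

    # Example abbreviation expansions a project might maintain.
    ABBREV = {"st": "street", "co": "county", "jan": "january"}

    def canonical(value):
        """Strip variation that is probably accidental: case, punctuation,
        Unicode form, and common abbreviations."""
        value = unicodedata.normalize("NFKC", value).lower()
        value = re.sub(r"[^\w\s]", " ", value)             # drop punctuation
        words = [ABBREV.get(w, w) for w in value.split()]  # expand abbreviations
        return " ".join(words)

    print(canonical("12 High St.") == canonical("12 high Street"))  # True: accidental
    print(canonical("12 High St.") == canonical("14 High Street"))  # False: meaningful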


Are there numbers out there for how many people you lose from crowd-sourced tasks as the individual tasks get more involved?

I'm not aware of any numbers on drop-off rates.  I do know that every crowdsourcing project I'm aware of (and I've spoken to the people behind dozens) sees a power law distribution in which the majority of the work is performed by a small group of super-volunteers.  These volunteers are highly motivated and care passionately about quality, frequently creating and advocating for community standards if they're working in a vacuum.

Here's a good example, by a New Zealander who's one of the top contributors to the Smithsonian Institution's Transcription Center: http://siobhanleachman.tumblr.com/post/96042084076/transcribing-tips-from-a-fellow-volunpeer

I suspect that such power users, who care deeply about the accuracy of their favorite leisure activity, are not dissuaded by its complexity.

That said, task complexity may reduce the number of people participating in a crowdsourcing project.  For organizations that are more focused on outreach or public education than they are on productivity or quality, that could be a big problem.  How does a project value 100 pages transcribed by 1 volunteer vs. 10 volunteers transcribing only one page apiece?  (cf. the anecdote at the very end of my ALA talk this summer: http://manuscripttranscription.blogspot.com/2014/07/collaborative-digitization-at-ala-2014.html)  My gut feeling is that FS Indexing has a substantial element of outreach and participation to it, so it may not be typical of other volunteer crowdsourcing projects.

Regarding your comments on Jeremy's response, you and Jeremy both might be interested in _Crowdsourcing our Cultural Heritage_, a collection edited by Mia Ridge, which was published this week: http://www.ashgate.com/isbn/9781472410221

I reviewed the book (and plan to write chapter-by-chapter summaries/reviews for my blog) and can say that it is very good indeed.  I was most surprised by the findings of Waisda? that crowd-contributed video tags were qualitatively different from tags used by professional archivists, and proved to be empirically more useful to researchers.

The introduction: http://www.ashgate.com/pdf/SamplePages/Crowdsourcing-our-Cultural-Heritage-Intro.pdf

Ben

Jeremy Foote

Oct 6, 2014, 4:40:16 PM
to root...@googlegroups.com
Ben has given you some great resources. Here are just a couple more.

Mako Hill has studied some projects that were similar to Wikipedia but didn't make it:
http://mako.cc/academic/hill-almost_wikipedia-DRAFT.pdf

Most of the literature acknowledges that most projects are not successful (in the sense that they don't become large projects; e.g., the median number of participants in a GitHub project is 1), but then moves on to discuss the successful ones. This paper by Mako is an exception.

From a broad, theoretical, economic perspective on peer production, both Eric von Hippel and Yochai Benkler have written a lot about these ideas (Benkler's Wealth of Networks is a great, if overly long, extension of this paper: http://www.benkler.org/CoasesPenguin.html).

Best,
Jeremy
