Islandora for crowd-sourced TEI transcription of historical documents?


Conal Tuohy

Nov 23, 2010, 1:28:08 AM
to islandora
I am evaluating software for a crowd-sourced transcription system for
historical records.

I've heard that it is possible to easily set up an online transcription
environment in which people can transcribe the text of images and
(crucially) TEI-encode the transcript, but I haven't seen any
documentation of the feature beyond a simple mention that it's
possible, and I haven't seen any website which actually seems to
provide this feature.

Does anyone have a pointer to Islandora websites which do this? Can
anyone provide me with more information about the functionality of
Islandora in this area?

Regards

Conal Tuohy

ppound

Nov 23, 2010, 8:58:01 AM
to islandora, dmo...@upei.ca
Currently we are using TEI on the Islandlives site. If you go to this
url http://www.islandlives.ca/fedora/ilives_book_viewer/ilives:85365
you will be viewing the first page of a book. If you click the T
button near the top you will get a text view beside the image view.
This text is TEI encoded. If you then click -> a few times to get
to a page that has some meaningful text, you can see the TEI. So that's
one view of the TEI. If you have an account and are logged in, you
have access to another viewer which shows the image on the left
and allows you to modify the TEI on the right.

I have cc'd Donald Moses at UPEI as he may be able to demo the TEI
editor section or possibly give you a temporary account.

Thanks,
Paul

Don

Nov 23, 2010, 2:13:21 PM
to islandora
Hi Conal:

Let me know if you'd like to get a demo of the service. I can Skype
with you and share my screen if that helps.

We generate our TEI (primarily structural) in an automated way by
transforming our encoded OCR output (ABBYY XML) into TEI. We do
automated named entity recognition using GATE, though that is
somewhat imperfect, which is one of the reasons why we developed a
basic TEI editor as part of the tool chain. As a final step we take
the book TEI and split it into individual TEI documents so each page
object has its associated TEI datastream.
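
For illustration only, the final splitting step could look something
like the sketch below. This is not the actual Islandora/UPEI code. It
assumes a simplified book TEI in which every <pb n="..."/> page break
is a direct child of <body>; the function name split_book_into_pages
is made up for this example.

```python
import copy
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def split_book_into_pages(book_xml):
    """Return {page number: standalone TEI string} for a book-level TEI string.

    Assumption: each <pb n="..."/> is a direct child of <body>. Content
    before the first <pb> is ignored. Real TEI often nests <pb> deeper,
    which this sketch does not handle.
    """
    root = ET.fromstring(book_xml)
    body = root.find(f".//{{{TEI_NS}}}body")

    # Group each <pb> and the siblings that follow it, up to the next <pb>.
    groups = {}
    current = None
    for child in body:
        if child.tag == f"{{{TEI_NS}}}pb":
            current = groups.setdefault(child.get("n"), [])
        if current is not None:
            current.append(child)

    # Wrap every group in a minimal TEI skeleton of its own.
    pages = {}
    for number, elements in groups.items():
        page_root = ET.Element(f"{{{TEI_NS}}}TEI")
        text = ET.SubElement(page_root, f"{{{TEI_NS}}}text")
        page_body = ET.SubElement(text, f"{{{TEI_NS}}}body")
        for element in elements:
            page_body.append(copy.deepcopy(element))
        pages[number] = ET.tostring(page_root, encoding="unicode")
    return pages
```

Each value in the returned dictionary is a string that could then be
stored as the TEI datastream of the matching page object.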

The editor is a Drupal module that works with the Islandora module and
Fedora.

Here's a book in the generic page viewer with the page image (a
JPEG 2000) on the left and the TEI datastream transformed into basic HTML
on the right.
https://docs.google.com/leaf?id=0B9ExopvhuLhLOGYzMjBjYmMtMDcyNS00NTBlLTk3YTktNmZhNjZkM2MyMTE3&sort=name&layout=list&num=50

Here's the same book as viewed in a basic WYSIWYG interface which
indicates encoded elements with different highlight colors,
e.g. placeName, persName, orgName, date, etc.
https://docs.google.com/leaf?id=0B9ExopvhuLhLMDFlYTU0YmEtMjhmZC00ODc3LThkNjgtYTQwZDBkMDdkMDc4&sort=name&layout=list&num=50
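
As a rough idea of how a page's TEI can be turned into HTML with
per-element highlighting, here is a minimal sketch. It is not the
editor's actual code: the function name tei_to_html and the "tei-..."
CSS class names are assumptions made for this example, and only the
four element names mentioned above are highlighted.

```python
import html
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Elements to wrap in a <span> so a stylesheet can color them
# (the class naming scheme is hypothetical).
HIGHLIGHTED = {"persName", "placeName", "orgName", "date"}


def tei_to_html(node):
    """Recursively convert a TEI element into a basic HTML string."""
    name = node.tag.replace(TEI_NS, "")
    inner = html.escape(node.text or "")
    for child in node:
        # Text that follows a child element lives in child.tail.
        inner += tei_to_html(child) + html.escape(child.tail or "")
    if name in HIGHLIGHTED:
        return f'<span class="tei-{name}">{inner}</span>'
    if name == "p":
        return f"<p>{inner}</p>"
    return inner  # any other element: keep its text, drop the tag
```

A stylesheet rule per class (for example a background color for
.tei-persName) would then give each kind of element its own highlight.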

By highlighting text or an existing encoded element you can add/remove/
edit the element.
https://docs.google.com/leaf?id=0B9ExopvhuLhLZmQ1MTU0OWItZDdhYi00OTYyLWI5OWQtMDRmYzUzZTFiMDgw&sort=name&layout=list&num=50

The user can also display the raw TEI and edit it in place.
https://docs.google.com/leaf?id=0B9ExopvhuLhLN2Y0YTczMmUtNjBjMy00ZWJkLWE1ZTYtMTc0NGJkN2E0NDk4&sort=name&layout=list&num=50

The interface buttons are really poor ... we know what they mean, but
for a crowd-sourced tool you'd definitely want them to be more
meaningful.

If you've got any questions, let me know.
Don
