oai-ore experimental implementation in Chronicling America

Ed Summers

May 22, 2009, 10:47:42 AM

to oai...@googlegroups.com

Just a quick note to let you all know that there is an experimental
implementation of oai-ore running up at the Library of Congress in the
Chronicling America application [1]. Chronicling America is the web
view on data collected for the National Digital Newspaper Program
(NDNP). NDNP is a 20-year joint project of the National Endowment for
the Humanities and the Library of Congress to digitize and aggregate
historic newspapers in the United States. Right now there are close to
a million digitized newspaper pages available, along with information
about 140,000 newspaper titles.

This experimental implementation is really just us dipping our toes
into the world of linked data and using the oai-ore vocabulary to
express various nested aggregations of objects: newspaper titles,
issues, pages and batches of data sent from awardees. Since we were
playing in the linked data space we chose to use rdf/xml directly
instead of atom, but this is a moving target.

To give you a practical example of what's there, here are some HTML
pages from which you ought to be able to follow your nose to the
resource map using auto-discovery:

Title: San Francisco Call [2]
Issue: San Francisco Call, 1895-03-05 [3]
Page: San Francisco Call, 1895-03-05, page sequence 1 [4]
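
As a rough illustration of the auto-discovery step, here is a minimal
Python sketch that scans an HTML page for link elements pointing at a
resource map. It assumes the page advertises the map with
rel="resourcemap", as the ORE discovery document describes; the actual
markup on the site may differ. The sample HTML and the example.org URLs
are made up for the example, and find_resource_maps is a hypothetical
helper name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceMapLinkFinder(HTMLParser):
    """Collects href values of <link> elements whose rel includes 'resourcemap'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resource_maps = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "resourcemap" in rels and href:
            # resolve relative hrefs against the page URL
            self.resource_maps.append(urljoin(self.base_url, href))


def find_resource_maps(html, base_url):
    """Return absolute URLs of resource maps advertised in the HTML (hypothetical helper)."""
    finder = ResourceMapLinkFinder(base_url)
    finder.feed(html)
    return finder.resource_maps


# Made-up page markup, only for illustration (assumption, not copied from the site).
SAMPLE_HTML = """<html><head>
<title>Example page</title>
<link rel="stylesheet" href="/static/site.css">
<link rel="resourcemap" type="application/rdf+xml" href="/example/item.rdf">
</head><body></body></html>"""

maps = find_resource_maps(SAMPLE_HTML, "http://example.org/example/item/")
print(maps)
```

To try this against a live page you would fetch the HTML first (for
example with urllib.request.urlopen) and pass the page URL as base_url.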

When you drill down to the page aggregation you'll see that it
aggregates resources like the pdf for the page, an ocr xml file, an
ocr text file, a thumbnail, and a jpeg2000 file.
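
To make the page aggregation a bit more concrete, here is a small
Python sketch that pulls the aggregated resource URIs out of an rdf/xml
resource map. It assumes the map uses ore:aggregates with an
rdf:resource attribute, which is one common serialization; a real RDF
parser would be the more robust choice since rdf/xml allows other
shapes. The sample document and its example.org URIs are invented for
the example, and aggregated_resources is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE_NS = "http://www.openarchives.org/ore/terms/"


def aggregated_resources(rdf_xml):
    """Return the URIs named by ore:aggregates elements (hypothetical helper).

    Only handles the rdf:resource attribute form, not nested descriptions.
    """
    root = ET.fromstring(rdf_xml)
    uris = []
    for element in root.iter("{%s}aggregates" % ORE_NS):
        uri = element.get("{%s}resource" % RDF_NS)
        if uri:
            uris.append(uri)
    return uris


# Invented resource map, only for illustration (assumption, not the site's output).
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ore="http://www.openarchives.org/ore/terms/">
  <rdf:Description rdf:about="http://example.org/page/1#page">
    <ore:aggregates rdf:resource="http://example.org/page/1.pdf"/>
    <ore:aggregates rdf:resource="http://example.org/page/1/ocr.xml"/>
    <ore:aggregates rdf:resource="http://example.org/page/1/ocr.txt"/>
    <ore:aggregates rdf:resource="http://example.org/page/1/thumbnail.jpg"/>
    <ore:aggregates rdf:resource="http://example.org/page/1.jp2"/>
  </rdf:Description>
</rdf:RDF>"""

resources = aggregated_resources(SAMPLE_RDF)
for uri in resources:
    print(uri)
```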

I imagine there are some glitches so please be gentle, but I would be
interested in any feedback you have. Also please feel free to fire up
your oai-ore bots.

//Ed

[1] http://chroniclingamerica.loc.gov
[2] http://chroniclingamerica.loc.gov/lccn/sn85066387/
[3] http://chroniclingamerica.loc.gov/lccn/sn85066387/1895-03-05/ed-1/
[4] http://chroniclingamerica.loc.gov/lccn/sn85066387/1895-03-05/ed-1/seq-1/