OpenLibrary data refresh 3 - Hobbit Edition
Ian Davis  
From: Ian Davis <m...@iandavis.com>
Date: Mon, 4 May 2009 21:05:24 +0100
Local: Mon, May 4 2009 4:05 pm
Subject: [ol] OpenLibrary data refresh 3 - Hobbit Edition

Hi all,

I'm uploading a refresh of the open library data that includes two main
changes: linkage to the new LCSH dataset and a first crude stab at
FRBRisation.

To provide the subject linkage I downloaded the dump of RDF from
http://id.loc.gov/authorities/ and parsed it to create a local database
mapping skos:prefLabel to URI (using gdbm). Then I modified my json2rdf.py
script to use this database and look up the subjects.
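
A minimal sketch of this kind of label-to-URI lookup, assuming the dump is read as N-Triples and that keys are case-folded before matching. The function names, the normalisation step and the example.org URIs in the usage note are illustrative assumptions, not the contents of json2rdf.py.

```python
import re

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Matches an N-Triples statement of the form:
#   <subject> <skos:prefLabel> "literal"...
TRIPLE = re.compile(
    r'^<([^>]+)>\s+<' + re.escape(SKOS_PREF_LABEL) + r'>\s+"((?:[^"\\]|\\.)*)"'
)


def normalise(label):
    # Assumption: fold case and collapse whitespace before matching.
    return " ".join(label.lower().split()).encode("utf-8")


def build_index(lines, db):
    """Store label -> URI for every skos:prefLabel triple in `lines`."""
    for line in lines:
        match = TRIPLE.match(line)
        if match:
            uri, label = match.groups()
            db[normalise(label)] = uri.encode("utf-8")


def lookup(db, subject):
    """Return the URI for a subject heading, or None if it is not indexed."""
    key = normalise(subject)
    return db[key].decode("utf-8") if key in db else None


# Usage with an on-disk database (file names are hypothetical):
#   import dbm
#   with dbm.open("lcsh_labels", "c") as db:
#       with open("lcsh.nt", encoding="utf-8") as dump:
#           build_index(dump, db)
#       print(lookup(db, "Fantasy fiction"))
```

Because `db` only needs to behave like a mapping from bytes to bytes, a plain dict works for small experiments and a `dbm` file (gdbm where available) works for the full dump.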

An example of an item where all the subjects matched:

<http://ol.dataincubator.org/works/6102>

This worked less well:

<http://ol.dataincubator.org/works/5991>

And this didn't match any:

<http://ol.dataincubator.org/works/62514>

I haven't done any in-depth analysis of whether these are subjects that
should have matched or if they are simply not available in LCSH.

For the FRBRisation I combined two approaches. For each edition record I
created a Work resource and one Manifestation for each ISBN listed in the
record. I remembered that Dublin Core has isVersionOf/hasVersion properties,
which seemed to fit the semantics I wanted, so I used them to link Works
directly to Manifestations, bypassing any Expression layer. The results of
this approach were moderately successful, as can be seen from this work, which
has 5 manifestations.

<http://ol.dataincubator.org/works/61705>
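
The Work-to-Manifestation linkage described above could be sketched roughly as follows. The works URI pattern follows the examples in this message, but the manifestation URI pattern and the function name are hypothetical, and the output is N-Triples purely for illustration.

```python
DCT = "http://purl.org/dc/terms/"
BASE = "http://ol.dataincubator.org/"


def edition_to_triples(work_id, isbns):
    """Yield N-Triples linking one Work to one Manifestation per ISBN."""
    work = f"<{BASE}works/{work_id}>"
    for isbn in isbns:
        # Hypothetical URI pattern for manifestations.
        manifestation = f"<{BASE}manifestations/{isbn}>"
        # Link in both directions, with no Expression layer in between.
        yield f"{manifestation} <{DCT}isVersionOf> {work} ."
        yield f"{work} <{DCT}hasVersion> {manifestation} ."
```

Each ISBN yields two triples, one per direction, so a record listing five ISBNs produces a work with five manifestations.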

The main problem is that there is no distinguishing information between the
versions. To address this I grabbed a dump of ThingISBN (which is free for
non-commercial use) and parsed it to create a lookup database of ISBN ->
work id. I then used that to link Manifestations to Works, only creating a
new work resource where there was no work id in ThingISBN.
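
One way the ThingISBN fallback could work is sketched below. It assumes the lookup is already loaded as a mapping of ISBN to work id. The key prefixes, and the choice to share one newly minted work among the unmatched ISBNs of a single edition record, are assumptions made for this example.

```python
import itertools


def assign_works(editions, isbn_to_work):
    """Map each ISBN to a work key, minting a new work only when needed.

    `editions` is an iterable of ISBN lists, one list per edition record.
    `isbn_to_work` is the ISBN -> work id lookup built from ThingISBN.
    """
    fresh = itertools.count(1)
    assignments = {}
    for isbns in editions:
        local_work = None  # minted lazily, shared within one edition record
        for isbn in isbns:
            work_id = isbn_to_work.get(isbn)
            if work_id is not None:
                # Known to ThingISBN: reuse its work id.
                assignments[isbn] = f"thingisbn-{work_id}"
            else:
                # Unknown: fall back to a work local to this record.
                if local_work is None:
                    local_work = f"local-{next(fresh)}"
                assignments[isbn] = local_work
    return assignments
```

Prefixing the keys keeps ThingISBN work ids and locally minted ids from colliding.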

The coverage of ThingISBN turns out to be pretty good. Here's a work with 3
Manifestations:

<http://ol.dataincubator.org/works/60798>

Since I'm only working with 1% of the total open library corpus I wanted to
increase the number of records that were likely to be related. I grepped the
entire corpus for hobbit and tolkien and ran those records through the
converter.
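
A rough Python equivalent of that selection step, assuming the corpus is stored as one JSON record per line and that the match is case-insensitive (neither detail is stated above):

```python
import re

KEYWORDS = re.compile(r"hobbit|tolkien", re.IGNORECASE)


def select_records(lines):
    """Yield the raw record lines that mention any of the keywords."""
    for line in lines:
        if KEYWORDS.search(line):
            yield line
```

Matching on the raw line rather than on parsed fields is crude but cheap, and it also catches records where the keyword appears only in an author or series field.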

Here's The Hobbit, which looks pretty good:

<http://ol.dataincubator.org/works/59650>

Here's another work that represents The Hobbit:

<http://ol.dataincubator.org/works/152602>

And the annotated Hobbit:

<http://ol.dataincubator.org/works/152611>

I also made some minor formatting changes for skos:prefLabel, which is now
composed of title_prefix, title and subtitle. I also removed the
dct:isPartOf properties that I added last time - that was my
misunderstanding of the VoID convention on datasets. I did a tiny bit of
work on formatting series titles but it needs a whole lot more work - I
think a lot of information has been lost by open library in their conversion
from MARC.
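
The label composition mentioned above could look something like this. The field names title_prefix, title and subtitle come from the records, while the separators (a space after the prefix, a colon before the subtitle) are assumptions made for this sketch.

```python
def pref_label(record):
    """Build a display label from title_prefix, title and subtitle."""
    parts = [record.get("title_prefix"), record.get("title")]
    # Skip missing or empty parts and trim stray whitespace.
    label = " ".join(p.strip() for p in parts if p and p.strip())
    subtitle = (record.get("subtitle") or "").strip()
    if subtitle:
        label = f"{label}: {subtitle}" if label else subtitle
    return label
```

A record with only a title yields just that title, so records without a prefix or subtitle are unaffected.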

Ian


 
Bruce D’Arcus  
From: Bruce D’Arcus <bdar...@gmail.com>
Date: Tue, 5 May 2009 10:13:01 -0700 (PDT)
Local: Tues, May 5 2009 1:13 pm
Subject: Re: OpenLibrary data refresh 3 - Hobbit Edition

On May 4, 4:05 pm, Ian Davis <m...@iandavis.com> wrote:

> I'm uploading a refresh of the open library data that includes two main
> changes: linkage to the new LCSH dataset and a first crude stab at
> FRBRisation.

Nice Ian. Just a couple of minor comments:

1) as w/periodicals, would be nice to have publishers as URIs if
possible; maybe http://publishers.dataincurbator.org?

2) the versions/manifestations; do you have the input data to be able
to assign them a bibo type as well?

Bruce


 
Ian Davis  
From: Ian Davis <m...@iandavis.com>
Date: Tue, 5 May 2009 20:54:21 +0100
Local: Tues, May 5 2009 3:54 pm
Subject: Re: OpenLibrary data refresh 3 - Hobbit Edition

On Tue, May 5, 2009 at 6:13 PM, Bruce D’Arcus <bdar...@gmail.com> wrote:
> On May 4, 4:05 pm, Ian Davis <m...@iandavis.com> wrote:

> > I'm uploading a refresh of the open library data that includes two main
> > changes: linkage to the new LCSH dataset and a first crude stab at
> > FRBRisation.

> Nice Ian. Just a couple of minor comments:

Thanks :)

> 1) as w/periodicals, would be nice to have publishers as URIs if
> possible; maybe http://publishers.dataincurbator.org?

Yes, that's a good idea. Apart from the publisher data in open library and
periodicals sets is there a good database we could target for conversion?

> 2) the versions/manifestations; do you have the input data to be able
> to assign them a bibo type as well?

The records contain a format indicator which could give hints. There is a
list of values I have come across so far at the end of
<http://code.google.com/p/dataincubator/wiki/OpenLibrary>. It's not clear to
me how to map them to specific classes - I need help here.

 Ian


 
Ross Singer  
From: Ross Singer <rossfsin...@gmail.com>
Date: Tue, 5 May 2009 21:37:47 -0400
Local: Tues, May 5 2009 9:37 pm
Subject: Re: OpenLibrary data refresh 3 - Hobbit Edition

On Tue, May 5, 2009 at 3:54 PM, Ian Davis <m...@iandavis.com> wrote:
> Yes, that's a good idea. Apart from the publisher data in open library and
> periodicals sets is there a good database we could target for conversion?

We could probably get a good set of journal publisher data from the
CUFTS KB [1] or the old jake data (although it's now quite out of
date).  Not sure of the licensing.

I assume the OL is already loaded with the LC bib records in archive.org?

-Ross.
1. http://cufts2.lib.sfu.ca/knowledgebase/


 
Leigh Dodds  
From: Leigh Dodds <leigh.do...@talis.com>
Date: Wed, 6 May 2009 11:10:48 +0100
Local: Wed, May 6 2009 6:10 am
Subject: Re: OpenLibrary data refresh 3 - Hobbit Edition

Hi Ross,

The CUFTS KB looks really useful, plenty of journals and titles in there.

Cheers,

L.

2009/5/6 Ross Singer <rossfsin...@gmail.com>

--
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com

 
Tom Pasley  
From: Tom Pasley <tom.pas...@gmail.com>
Date: Thu, 7 May 2009 11:40:19 +1200
Local: Wed, May 6 2009 7:40 pm
Subject: Re: OpenLibrary data refresh 3 - Hobbit Edition

Ross,

Thanks for reminding me of CUFTS... your point about archive.org is well
made.

Another place to watch (even though its content is on archive.org too) is
the Biodiversity Heritage Library, which is getting a helping hand from our
friends at OCLC: http://twitter.com/chrisfreeland/statuses/1668746642

Biodiversity Heritage Library is a good source for scientific articles, etc.
if that's your area... mostly historic stuff, but useful for entomologists,
etc...

cheers,

Tom


 
Ian Davis  
From: Ian Davis <m...@iandavis.com>
Date: Thu, 7 May 2009 01:16:20 +0100
Local: Wed, May 6 2009 8:16 pm
Subject: Re: OpenLibrary data refresh 3 - Hobbit Edition

On Wed, May 6, 2009 at 2:37 AM, Ross Singer <rossfsin...@gmail.com> wrote:

> I assume the OL is already loaded with the LC bib records in archive.org?

AFAIK it contains the LC data - it certainly looks like LC data in a lot of
the records (to my untrained eye and through the JSON filter).

 