DigitalNZ metadata as RDF


Conal Tuohy

Jul 20, 2017, 12:11:16 AM
to DigitalNZ
I have been working for a while on a project, called "oceania.digital", to convert metadata from DigitalNZ (and also the National Library of Australia's "Trove" service, as well as some other sources) into a web of Linked Data. It's taken a long time to work through a bunch of problems (more so with Trove than with DigitalNZ).

But finally, over just the last couple of weeks I have successfully downloaded about 3.8M item records from DigitalNZ, converted those records into RDF graphs, and stored them in a SPARQL graph store. That's everything but the newspapers (for now).

Currently the mapping to RDF is super-simplistic; a DigitalNZ "field" is mapped directly to an RDF property. In other words, the DigitalNZ schema effectively defines a "DigitalNZ" ontology. One interesting result already is that the set of fields in the DigitalNZ API includes several which are not documented publicly. 

Fields whose name ends with "-url" are converted to RDF object properties (in some cases the value is tweaked to make it a valid URI). Other fields become integer or dateTime properties if the DigitalNZ API flags them as such; otherwise they are strings. The source code for this conversion is here:
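
Separately from the project's own source, the mapping rule described above could be sketched roughly as follows. This is an illustration only, not the actual conversion code: the `DNZ` namespace, the `field_to_triple` helper, the `api_type` flag values and the percent-encoding used to "tweak" URLs are all assumptions made for the example.

```python
from urllib.parse import quote

# Hypothetical namespace for the implied "DigitalNZ" ontology (not given in the post).
DNZ = "http://example.org/digitalnz/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def field_to_triple(subject_iri, name, value, api_type=None):
    """Map one DigitalNZ field to one N-Triples line.

    api_type stands in for whatever type flag the API reports;
    the values "integer" and "datetime" are assumed for this sketch.
    """
    predicate = f"<{DNZ}{name}>"
    if name.endswith("-url"):
        # Object property: percent-encode characters that are not legal in a URI.
        obj = "<" + quote(value.strip(), safe=":/?#[]@!$&'()*+,;=%") + ">"
    elif api_type == "integer":
        obj = f'"{int(value)}"^^<{XSD}integer>'
    elif api_type == "datetime":
        obj = f'"{escape_literal(value)}"^^<{XSD}dateTime>'
    else:
        # Everything else is kept as a plain string literal.
        obj = f'"{escape_literal(str(value))}"'
    return f"<{subject_iri}> {predicate} {obj} ."
```

For instance, a field named `thumbnail-url` (an illustrative name) whose value contains a space would come out as an IRI with the space encoded as `%20`, while an untyped `title` field would come out as a quoted string.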

Next steps are to analyse the dataset and work out a better mapping to some well-known ontology. I believe that, although DigitalNZ have mapped a huge number of data sources to their schema, the descriptive vocabularies used are still rather specific to each provider. I expect there'll be a need for a lot of mapping (e.g. skos:closeMatch) to make those vocabularies actually align.

The project website is here: http://oceania.digital/ though no actual functionality is available yet. In particular, you can't browse or search the data (I will get to that next week, with luck). 

However, there is a backup of the RDF available for download at http://oceania.digital/backups/ if anyone is interested to have a look. The backup is a gzipped nquads file (produced by Fuseki's backup operation, and suitable for restoring using Jena tdbloader). The file is 1.1GB to download and about 14.5GB when uncompressed.
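
For anyone who would rather peek at the backup without loading it into a triple store, here is a rough sketch that streams a gzipped N-Quads file and tallies how often each predicate occurs. It assumes one statement per line, as N-Quads requires, and the `count_predicates` helper and its `limit` parameter are made up for this example.

```python
import gzip
from collections import Counter


def count_predicates(path, limit=None):
    """Count predicate IRIs in a gzipped N-Quads file, reading line by line.

    limit (optional) stops after that many lines, which is handy for a quick
    sample of a file that is many gigabytes when uncompressed.
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as lines:
        for number, line in enumerate(lines):
            if limit is not None and number >= limit:
                break
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            # Subject and predicate contain no whitespace, so the second
            # whitespace-separated token is always the predicate.
            parts = line.split(None, 2)
            if len(parts) == 3:
                counts[parts[1]] += 1
    return counts
```

Calling `count_predicates("backup.nq.gz", limit=1_000_000).most_common(20)` (the file name is a placeholder) would list the twenty most frequent properties in the first million lines, which is one quick way to see which fields actually occur in the data.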

If you're interested in following the progress of this project ― perhaps even chipping in with some help! ― then you may want to follow the twitter account @OceaniaDigital. https://twitter.com/OceaniaDigital/status/887878851799293952


Con

Stuart A. Yeates

Jul 20, 2017, 5:16:10 AM
to digi...@googlegroups.com
I think a really interesting follow-up question is: how many of the DigitalNZ contributors already have RDF representations that we can link to trivially?

Clearly Wikimedia Commons and Wikipedia do; anyone else?

cheers
stuart

--
...let us be heard from red core to black sky


Conal Tuohy

Jul 20, 2017, 7:33:07 AM
to digi...@googlegroups.com
Thanks Stuart, that's an excellent question. I believe Auckland Museum do too.

