NLA Integration Planning

Greg Pendlebury

Mar 8, 2012, 9:06:11 PM

to redbox-repo

Hey all,

With the Community Day coming up next week and the impending release of ReDBox v1.5 I wanted to post these notes to the list for public dispersal now that I have collated them into something (hopefully) coherent.

I want to thank the NLA rep who was very patient and helpful in all the back and forth that went on putting this together. Where appropriate I have some excerpts from their email in red below (I haven't named names, simply because I didn't ask permission... there's no secret sauce in this email). Nick Nicholas started the dialogue when he was at ANDS, and I have one quote from him in blue. In both cases I think it is worth noting that those emails (and my quotes) are simply a reflection of the way things are now (or a few weeks ago) and historically. Given the level of interest in this area from anyone dealing with ANDS there may well be changes in the future that no one can predict.

Below is what I expect to be the most achievable working solution we will have in ReDBox / Mint for NLA integration, including non-system business processes. This has been bounced off the NLA and falls in line with their expectations and current practice as well. I would love to see more automation available for some steps, but it is my outsider's understanding that the NLA is not resourced to undertake work in this area for now: there are no ongoing development projects within the NLA for this system, only ANDS-funded development on our side... and we would be the only users asking for the interface anyway (as in all the various metadata stores, whatever software is used).

1) Mint records that are ready to publish are generated as basic (ie. stub) EAC-CPF records as a datastream of the Mint object (alongside the existing marc/oai_dc/rif datastreams). They will be using an identifier decided by your local curation model (eg. Handle or 'Local') and contain enough data to reasonably identify the person.

2) They are also exposed in a specific OAI-PMH feed (portal based) from your Mint server for the NLA to harvest. The NLA "... can schedule harvests daily if required or less frequently (e.g. monthly) or whatever time frame suits the contributor". So we can hopefully set this to something with minimal lag (like hourly?). The NLA have specifically noted that they "... don't offer a way for people to push their data to us (eg SRU Update)". I think that sort of interface would help tremendously if anyone had the resources to collaborate with the NLA on development (assuming the NLA is even in a position to do so)... just tea leaf gazing on my part though.

3) The institution then needs to use the Trove Identities Manager (TIM) tool provided by the NLA according to their normal practices to get their identities into Trove (production). Once in Trove they will have been allocated new national level IDs (or merged with existing ones)... this is what you are using TIM to decide on (my words... and understanding). This will be the most manual step in the process and the key bottleneck if you have plans on mass publication (Peter?). If you release small sets of parties at a time (or even individuals) this may well only have minimal impact... aside from the new tool.

4) Mint will have a housekeeping task running that polls Trove's SRU interfaces looking for the IDs we sent them. Once they are found we can retrieve the record and its newly assigned NLA ID. This will be stored in Mint. The NLA have very specifically noted that "All identifiers are equal" and give no particular weight to one type of ID (and there are several). I think however at this stage we will be targeting IDs of the form "http://nla.gov.au/nla.party-XYZ" since they are both unique and externally resolvable. Just my opinion... any arguments for a particular choice?

5) Mint records that have an NLA ID will render much richer EAC-CPF renditions and be exposed to NLA once again. With the NLA ID inside this metadata the ongoing process of maintaining national level records will be much easier (hopefully automated). Whether this means no work in TIM, or greatly reduced work in TIM is unknown (by me). The NLA have indicated that "[t]he match rate will be very good if the nla.party id was in an institutions / contributors records".

6) When ReDBox publishes records to RDA it will be using the NLA ID that Mint has on file for creating linked data. Mint will no longer publish parties to RDA if they have been sent to NLA (presumably this means every party, so no more will be going to RDA at all).

7) When RDA observes NLA IDs in relationships "... ANDS will fetch the corresponding record from the NLA in RIF-CS overnight, and ingest it into RDA", thus closing the loop.

8) Not intended for current work, but we have also identified a way to poll the history of an identity via SRU, so if other institutions provided relevant information on identities we care about in Mint we can choose to take action in future development work, i.e. cross-institutional work and/or career movements over time, etc. It may be more appropriate just to leave this at the national level. Using the ID suggested above should mean that 'clicks' on a party always resolve to Trove, so Mint would be nothing more than a management platform for your institutional 'shard' of that identity.
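
To make the harvest and polling steps a little more concrete, here is a rough Python sketch of the two requests involved: the OAI-PMH harvest of the Mint feed and the SRU lookup against Trove. It is not the actual Mint implementation. Both endpoint URLs, the "eac-cpf" metadata prefix, the "rec.identifier" index name and the assumed shape of the "XYZ" suffix are placeholders for illustration only; the real values would need to be confirmed with the NLA and your own Mint configuration.

```python
import re
from urllib.parse import urlencode

# Hypothetical endpoints: neither URL comes from the notes above.
MINT_OAI_ENDPOINT = "http://mint.example.edu/mint/nla/feed/oai"
TROVE_SRU_ENDPOINT = "http://sru.example.org/people"

# IDs of the form "http://nla.gov.au/nla.party-XYZ". The shape of the
# "XYZ" suffix (letters and digits only) is an assumption.
NLA_PARTY_ID = re.compile(r"http://nla\.gov\.au/nla\.party-[0-9A-Za-z]+")


def oai_harvest_url(metadata_prefix="eac-cpf", from_date=None):
    """Build the OAI-PMH ListRecords URL a harvester could request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Incremental harvest: only records changed since this date.
        params["from"] = from_date
    return MINT_OAI_ENDPOINT + "?" + urlencode(params)


def sru_lookup_url(local_id, index="rec.identifier"):
    """Build an SRU searchRetrieve URL looking for an ID we sent to the NLA."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f'{index} = "{local_id}"',
        "maximumRecords": "1",
    }
    return TROVE_SRU_ENDPOINT + "?" + urlencode(params)


def extract_nla_party_id(record_xml):
    """Return the first NLA party ID found in a record, or None."""
    match = NLA_PARTY_ID.search(record_xml)
    return match.group(0) if match else None
```

A housekeeping task along these lines would request `sru_lookup_url(...)` for each party that has been sent but has no NLA ID yet, pass the response body to `extract_nla_party_id`, and store any non-None result against the Mint record.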

Ta,
Greg

Toby at UWS

Jun 22, 2012, 1:24:18 AM

to redbo...@googlegroups.com

Hi everyone,

Being the visual person that I am, I had to translate this very informative post into a picture form, in order to get my head around it. If it's wrong that's not entirely my fault; I had help. Has anyone expanded on this, and actually nutted out some approaches to better integration (does anyone really want to)?

Toby