Artefactual response to RiC-CM Draft

Dan Gillean

Jan 25, 2017, 3:59:50 PM

to eg...@ica.org, ica-eg...@lists.village.virginia.edu, ICA-AtoM Users

The following feedback has been collectively prepared by Artefactual staff in response to the Records in Contexts Conceptual Model (RiC-CM) 0.1 Draft released by the ICA's Expert Group on Archival Description (EGAD).

-----

We would like to begin by thanking EGAD for all their work on RiC to date. We have also enjoyed reading the excellent feedback from members of the archival community. We are excited that EGAD has chosen a linked data approach to modelling archival description to better represent the complex relationship between archival materials and the contexts in which they are created, managed and disseminated.

We agree with much of the feedback that has already been shared publicly, and will try to avoid repetition here. As developers of open source archival management software, we wanted to share some of our internal discussions and questions that have arisen as we have reviewed the RiC-CM draft with an eye to systems implementation. As a model whose primary expression will be linked data, RiC is necessarily a standard that assumes implementation in some kind of networked descriptive system - this suggests to us several immediate considerations.

General implementation challenges

Through our experience with dozens of data migrations over many years, we are all too aware of how many institutions still rely on Word documents, XML authoring tools or bespoke databases as the basis of their finding aids - and how many have yet to adopt any content standard to guide their local descriptive practices. RiC will require even greater technical proficiency to implement properly, incorporating technologies still novel to the archival community (see also section 3.4 of the InterPARES Trust response) which cannot be readily implemented outside of a software system. With this radical shift, many small and medium archives risk being left behind. How does the ICA intend to support adoption of the new standard? Will the ICA continue to maintain the existing four standards for those archives who may be unable or unwilling to make the move to linked data?

The role of content standards, data interoperability and harmonization

EGAD intends, in its final version of RiC, to create a “two-part standard: a conceptual model for archival description (RiC-CM), and an ontology (RiC-O)” (RiC-CM Consultation Draft v0.1, p. 1). The flexibility of the first draft of RiC-CM leaves much room for implementation - the same data could be modeled a number of different ways. For example, the draft’s own example diagram on page 93 does not make use of the top-level Date entity, instead using date attributes present in other entities or relations to bound time. A conceptual model and an ontology both fulfill very different roles from a content standard, which aims to facilitate consistency in descriptive fields and interoperability across space and time. In fact, section I.5 of ISAD(G)’s introduction emphasizes these points:

This set of general rules for archival description is part of a process that will

a. ensure the creation of consistent, appropriate, and self explanatory descriptions;

b. facilitate the retrieval and exchange of information about archival material;

c. enable the sharing of authority data; and

d. make possible the integration of descriptions from different locations into a unified information system.

While the ontology and conceptual model might provide enough of a framework for consistent modeling of descriptions across space and time, they do not seem to address the specific descriptive practices to be followed within free-text descriptive attributes such as scope and content. What role do the ICA and EGAD see the existing ICA content standards playing in the future? Will subsequent versions of RiC provide further specificity to ensure consistent descriptive practices across domains and jurisdictions, as ISAD(G) previously sought to provide?

Chris Hurley has pointed out the vast number of relationship types between entities - 792 in the current draft - and we agree that these should be constrained to better ensure consistent application. The InterPARES Trust response rightly points out how the list of relations might easily be simplified and halved by removing the confusing notion of past vs present tense from the relations, relying instead on the existing date attributes to bound time. On the other hand, we wonder whether it is necessary for EGAD to enumerate all possible combinations of subject, relationship and object, rather than simply providing the relationships as predicates and allowing users to determine what kinds of connections to make with them (using metadata application profiles - see below). For example, is it necessary for the model to list all the different entities that the relationship “associated with” (in both present and past form) can be used to link together? We would be interested to hear other commenters’ thoughts on this, since we are not certain whether others would agree that the detailed list is unnecessary.
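To illustrate the point about relying on date attributes rather than tense, here is a minimal Python sketch of a single tense-neutral relation whose validity is bounded by optional dates. The class, field names and example values are all hypothetical and are not taken from the RiC-CM draft.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """One tense-neutral relation; every name here is a hypothetical example."""
    subject: str
    predicate: str
    obj: str
    start: Optional[date] = None  # None: start unknown or unbounded
    end: Optional[date] = None    # None: the relation has not ended

    def holds_on(self, day: date) -> bool:
        """Return True if the relation is in effect on the given day."""
        after_start = self.start is None or self.start <= day
        before_end = self.end is None or day <= self.end
        return after_start and before_end


# Invented example data: one "holds" predicate covers both past and present.
custody = Relation(
    subject="Example Municipal Archives",
    predicate="holds",
    obj="Example Family fonds",
    start=date(1990, 1, 1),
    end=date(2005, 12, 31),
)

print(custody.holds_on(date(2000, 6, 1)))   # a day inside the bounds
print(custody.holds_on(date(2017, 1, 25)))  # a day after the end date
```

With this shape, "held" versus "holds" is a question answered by the dates at query time, so a separate past-tense predicate is not needed.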

We also hope EGAD will consider the role that metadata application profiles will play in implementation and interoperability, and would like to know what guidance the ICA could provide on this. Two illustrative examples come to mind: METS and PCDM. The Metadata Encoding and Transmission Standard (METS) was developed to facilitate data exchange and transmission between repositories and tools. However, it is an extremely flexible and permissive standard, making data exchange without a shared application profile difficult, as the METS generated by one system can rarely be parsed by that of another without intervention. The Portland Common Data Model (PCDM) was similarly developed to provide a common mechanism for data interoperability between Hydra implementers, though it has grown beyond a Hydra-specific model. However, the community found in early versions that the model was so general and flexible that multiple interpretations of the same data, each valid within the model, prevented interoperability anyway. They have since set out more specific parameters and a formalized way of documenting a specific application profile (see PCDM Profile Template). RiC might benefit from this lesson, and consider testing this kind of scenario in advance. In some cases, constraints will produce data that can be more readily combined and shared, giving it greater utility. Perhaps EGAD considers this to be the responsibility of implementers - however, if this is the case, then the role of the ICA in standards development should be interrogated: is it not still to ensure consistency across space and time, and to facilitate exchange and reuse? For developers to be able to implement the system while still supporting exchange and interoperability, we will need consistent implementation guidelines so that any systems implementation can be designed to be able to exchange data with other systems easily.
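As a rough illustration of what an application profile adds on top of a permissive model, the following Python sketch treats a profile as the set of subject / predicate / object combinations that exchange partners have agreed on, and reports anything outside it. The entity and predicate names are hypothetical placeholders, not RiC, METS or PCDM terms.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject_type: str
    predicate: str
    object_type: str


# A hypothetical profile: the only combinations the partners agree to exchange.
PROFILE = {
    Statement("Record", "createdBy", "Agent"),
    Statement("Record", "associatedWith", "Agent"),
    Statement("Agent", "associatedWith", "Place"),
}


def violations(statements, profile=PROFILE):
    """Return the statements that fall outside the shared profile."""
    return [s for s in statements if s not in profile]


# Invented incoming data: the second statement is valid in a fully open
# model but is not part of this profile.
incoming = [
    Statement("Record", "createdBy", "Agent"),
    Statement("Place", "associatedWith", "Date"),
]

for bad in violations(incoming):
    print("Not in profile:", bad)
```

The model itself can stay small (a list of predicates), while the profile carries the constraints that make two systems' data combinable.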

Additionally, we are somewhat surprised by the response of M. Clavaud on the ICA-EGAD listserv (2016-10-04) eschewing the reuse of existing ontologies. While there are certainly areas in which this may be appropriate, as a wholesale approach it strikes us as contrary to the linked data best practice of reusing standard vocabularies when possible (see for example the W3C Best Practices), and represents an enormous maintenance burden for the ICA. The W3C’s SKOS is a perfect example - is it truly necessary for RiC-O’s Concept/Thing entity to repeat this work so completely? We urge EGAD to consider a more balanced approach in what it chooses to reuse vs. what is designed anew. If the approach taken by RiC is informed by metadata application profiles, then RiC’s role becomes simpler - offering implementation guidelines for data consistency by reusing existing vocabularies and ontologies, as well as helpful extensions where existing ontologies do not meet the specific needs of archivists.

Missing entities

Since RiC-CM and RiC-O seem aimed at providing resources for the management of all archival functions and activities, we note several other possible entities that do not seem to be covered by the proposed model. Namely, we ask EGAD to consider the role that Rights, Accessions, and Physical storage play in the management of archival information. Greg Bak has previously pointed out that more might be needed to capture dependency information, and we note that EGAD itself has acknowledged that fields that capture the role of the archivist in shaping the record are still lacking. Rights are a crucial element if data are to be exchanged and reused. Conditions of access and conditions of use are listed as properties of record-related entities, but might it not be desirable to declare how the rights are related to an agent acting as the rights-holder of the records in question? Why reduce a complex entity with its own properties from a thing to a string, making it less machine-actionable in the future, and inconsistently implemented? Similarly, while different jurisdictions and institutions will handle accessions and physical storage information differently (or exclude them entirely), we still see them represented in archival data often enough to need a consistent method for expressing them within the RiC models.

We would point out as well that some entities seem to be missing important properties - for example, we see no clear way to indicate that a Date might be approximate or uncertain, a key feature of archival description. We also strongly support the TS-DACS response on RiC-P36 gender, and on identity in general.

Record vs Record-set

While conceptually we understand why EGAD has proposed the concept of the record set, our experience suggests that implementing this distinction in practice in an archival management system will be a hindrance over time. Unexpected changes may bring a new record into a record set, thereby invalidating any shared properties of a record set (3.5 and 3.6 in RiC-CM). Further, granularity may grow over time - for example, a box that is described as a record (an item) may have its contents described at a later date - suddenly our item-level box record must become a record set. If a record set and record are fundamentally different entities in the data model, with different attributes and relationships, then switching between entities will be difficult to implement and may lead to the loss of data that is not valid for the new entity.

In AtoM’s data model, all records are simply “information objects” with the same available properties, some of which may be inherited automatically from higher levels of description. We believe a more flexible approach such as this might ultimately be beneficial for systems implementors - it keeps the data model simpler, thereby ensuring more consistency in implementation, and makes all properties available to all records regardless of type or level. A record may still describe an aggregation - the way its properties are used would clarify this.
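The idea can be sketched in a few lines of Python. This is an illustrative model only, not AtoM's actual schema or code: the class name, the `properties` dictionary and the lookup method are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InformationObject:
    """A single description type used at every level (hypothetical sketch)."""
    title: str
    properties: dict = field(default_factory=dict)
    parent: Optional["InformationObject"] = field(
        default=None, repr=False, compare=False
    )
    children: list = field(default_factory=list, repr=False, compare=False)

    def add_child(self, child: "InformationObject") -> "InformationObject":
        """Attach a lower-level description and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def get(self, name: str):
        """Look up a property locally, then inherit from higher levels."""
        node = self
        while node is not None:
            if name in node.properties:
                return node.properties[name]
            node = node.parent
        return None


# Invented example data.
fonds = InformationObject("Example fonds", {"repository": "Example Archives"})
box = fonds.add_child(InformationObject("Box 1", {"level": "item"}))

# Later the box contents are described: the box simply gains a child and its
# level is updated. No conversion between entity types is needed.
letter = box.add_child(InformationObject("Letter", {"level": "item"}))
box.properties["level"] = "file"

print(letter.get("repository"))
print(box.get("level"))
```

Because the box never changes type, nothing recorded against it has to be dropped when it starts to act as an aggregation.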

Next steps

Overall we are impressed with the work of EGAD to date and are excited to see steps being taken to represent archival description as linked data. However, as we have mentioned above, we worry about the ability of under-resourced institutions to take advantage of the standard and its accompanying ontology when they are finalized, given that many of these institutions may have spent years achieving a basic level of compliance with existing ICA standards. As software developers we also have a vested interest in making sure that any new standard is compatible with the ability to write software for implementation. We hope, therefore, that EGAD keeps implementation considerations in mind as it begins work on the next iteration of the model. We also hope that ICA does not plan to cease its standard-related activities once RiC-CM and RiC-O are finalized, as publishing a new standard is only the first step toward making the standard usable by practitioners world-wide.

-----

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056

@accesstomemory

tat...@gmail.com

Feb 15, 2019, 9:10:06 AM

to AtoM Users

Hi Dan, how are you?

I read in this Master's dissertation (https://repositorio.ufscar.br/bitstream/handle/ufscar/10520/PPGCI_Mestrado_Humberto_Moraes.pdf?sequence=1&isAllowed=y - page 49) that version 2.5 is going to have RiC. But looking at the roadmap, I didn't find anything that says so. Is Artefactual planning on adding this description standard to AtoM?

Cheers,

Tatiana Canelhas

Dan Gillean

Feb 15, 2019, 12:00:23 PM

to ICA-AtoM Users

Hi Tatiana,

Wow! This is very interesting, thank you for sharing. I wish I could read Brazilian Portuguese - I am relying on Google Translate to peruse the text!

The short answer: unfortunately, this assertion is incorrect. AtoM 2.5 will not support RiC - though we hope in the future that AtoM3 will.

First, the standard is not complete, or even close to ready, from all outward appearances. RiC-CM has made no publicly visible progress since the initial 0.1 release, and the revision process has been behind closed doors, so it is not possible to determine how the community feedback is being addressed. As far as I have seen thus far, the only official update as to the status of RiC development was made on the ICA website in October 2018:

https://www.ica.org/en/update-development-of-ric-cm-records-in-contexts-conceptual-model-and-of-ric-o-records-in-contexts

This update claimed the following:

EGAD plans to release a draft of RiC-CM v2, and call for comments, by the beginning of 2019.

A call for reviewers, that will concern a beta version of RiC-O, will be published in December 2018.

I have not personally seen evidence of either thus far, though I hope they may be announced soon.

My personal opinion: Whether or not RiC ever sees broad adoption in the global archival community will depend in part on how well they respond to the feedback they received, and whether or not the ICA can demonstrate its ability to steward the maintenance of a standard and address the needs of its supporting community. A lot of people spent considerable time crafting very insightful responses with important feedback, and if none of this is reflected at all in the revisions, then combined with the long delays and lack of transparency around the development process, this could be enough to doom the proposal to irrelevance. My sincere hope is that EGAD has been spending this time revisiting many of the original premises of the RiC draft in light of the valuable feedback received, and the next draft will show this consideration.

In particular, I remain concerned that the draft standard makes no use of existing ontologies / vocabularies, something we highlighted in our original response as a W3C linked data best practice. The excellent work carried out by Docuteam and the State Archives Canton of Wallis in developing the Matterhorn RDF data model shows that pretty much everything outlined in RiC can be implemented using existing ontologies, and I hope that the next revision of RiC will at least consider this feedback in its approach, and avoid falling into what we might call the "927 problem."

In any case, as AtoM is a standards-based application, Artefactual does not want to incorporate a standard still in development, which is likely to change significantly as it nears completion and adoption.

Second, no one has sponsored the development of linked data serialization for AtoM. While it could be possible to develop an RDF export or JSON-LD API endpoint without significantly altering AtoM's underlying data model, AtoM 2 itself would require a massive overhaul to be able to author and consume linked data.
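To give a sense of what a lightweight export might involve, here is a minimal Python sketch that maps a flat description to JSON-LD using Dublin Core terms. It is not an existing AtoM feature or API: the field names (`slug`, `title`, `parent_slug`), the function and the base URI are all assumptions made for illustration.

```python
import json

# Namespace of the DCMI Metadata Terms vocabulary.
DCTERMS = "http://purl.org/dc/terms/"


def to_jsonld(record: dict, base_uri: str) -> dict:
    """Map a flat description (hypothetical field names) to a JSON-LD document."""
    doc = {
        "@context": {"dcterms": DCTERMS},
        "@id": base_uri + record["slug"],
        "dcterms:title": record["title"],
    }
    # Express the hierarchy as a link to the parent resource, if there is one.
    if record.get("parent_slug"):
        doc["dcterms:isPartOf"] = {"@id": base_uri + record["parent_slug"]}
    return doc


# Invented example data and a placeholder base URI.
record = {"slug": "example-letter", "title": "Letter", "parent_slug": "example-fonds"}
doc = to_jsonld(record, "https://archives.example.org/")
print(json.dumps(doc, indent=2))
```

A one-way serialization like this leaves the underlying data model alone, which is why it is much cheaper than making the application itself author and consume linked data.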

We at Artefactual believe that linked data represents an important next step for the cultural heritage community. Rather than trying to cram such functionality on top of AtoM 2's aging framework, we prefer to take this as an opportunity to consider the next generation of AtoM, designed from the outset as a modular, flexible, linked-data-driven tool. The dissertation pulled screenshots of the AtoM development timeline from these slides, but apparently missed the full title of the slide deck, which references AtoM3. This is where we hope to concentrate our efforts in the future.

So far, AtoM3 remains an idea. We await further leadership from the recently formed AtoM Foundation (which is still in the process of establishing itself and considering how it can expand for international participation), as well as funding to begin research and development. In the meantime, we intend to continue developing and maintaining AtoM 2 until we have full or near-to-full feature equivalency in AtoM3 and a migration or upgrade path for current AtoM users. Depending on funding and community leadership, this may or may not entail half-steps that bridge the two - such as developing a new access front end that can be used with AtoM 2, but also comprises one of the key building blocks for AtoM3. Time will tell.

Thank you for sharing the paper!

Cheers,

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056

@accesstomemory

Creighton Barrett

Feb 21, 2019, 10:23:38 AM

to AtoM Users

Hi everyone,

I just wanted to update this thread with a bit of information about AtoM 3 and the newly formed AtoM Foundation. The AtoM Foundation was established in 2018 to provide governance and promote community involvement in the development and maintenance of AtoM 3. You can read more about the Foundation on our website: https://accesstomemoryfoundation.org/

The purpose of the Foundation is to oversee the development and adoption of multi-lingual, multi-repository, standards-based open-source software that will succeed AtoM 2. As Dan mentioned, the Foundation is still in the process of getting organized, but we are fortunate to already have a growing international membership: https://accesstomemoryfoundation.org/membership-2/

And we welcome new institutional and individual members.

The Board of Directors is very close to launching a bilingual survey on design principles for AtoM 3 that we will share on this list and elsewhere. The survey will be open to the global archival community and will help us gather information on linked data support and other features that could potentially form the basis of AtoM 3.

I will be sure to post about the survey when it is open. In the meantime, please feel free to contact the Board of Directors with any questions about AtoM 3 or the AtoM Foundation: https://accesstomemoryfoundation.org/contact/

Best regards,

Creighton Barrett

Chair, Board of Directors

Access to Memory Foundation / Fondation Access to Memory

https://accesstomemoryfoundation.org/

virtu...@yahoo.com.br

Feb 23, 2019, 11:13:43 PM

to AtoM Users

Hello everyone,
An update on RiC-CM v0.2 can be found at:
https://web.esrc.unimelb.edu.au/ICAD/index.html

Browsing by "Digital objects", for example, one can find some diagrams referred to as RiC-CM v0.2.