-----
We would like to begin by thanking EGAD for all their work on RiC to date, and we have also enjoyed reading the excellent feedback from members of the archival community. We are excited that EGAD has chosen a linked data approach to modelling archival description, to better represent the complex relationships between archival materials and the contexts in which they are created, managed and disseminated.
We agree with much of the feedback that has already been shared publicly, and will try to avoid repetition here. As developers of open source archival management software, we wanted to share some of our internal discussions and questions that have arisen as we have reviewed the RiC-CM draft with an eye to systems implementation. As a model whose primary expression will be linked data, RiC is necessarily a standard that assumes implementation in some kind of networked descriptive system - this suggests to us several immediate considerations.
General implementation challenges
Through our experience with dozens of data migrations over many years, we are all too aware of how many institutions still rely on Word documents, XML authoring tools or bespoke databases as the basis of their finding aids - and how many have yet to adopt any content standard to guide their local descriptive practices. RiC will require even greater technical proficiency to implement properly, incorporating technologies still novel to the archival community (see also section 3.4 of the InterPARES Trust response) which cannot be readily implemented outside of a software system. With this radical shift, many small and medium archives risk being left behind. How does the ICA intend to support adoption of the new standard? Will the ICA continue to maintain the existing four standards for those archives who may be unable or unwilling to make the move to linked data?
The role of content standards, data interoperability and harmonization
EGAD intends, in its final version of RiC, to create a “two-part standard: a conceptual model for archival description (RiC-CM), and an ontology (RiC-O)” (RiC-CM Consultation Draft v.01, p. 1). The flexibility of the first draft of RiC-CM leaves much room for implementation - the same data could be modeled in a number of different ways. For example, the draft’s own example diagram on page 93 does not make use of the top-level Date entity, instead using date attributes present in other entities or relations to bound time. A conceptual model and an ontology each fulfill a very different role from that of a content standard, which aims to facilitate consistency in descriptive fields and interoperability across space and time. In fact, section I.5 of ISAD(G)’s introduction emphasizes these points:
This set of general rules for archival description is part of a process that will
a. ensure the creation of consistent, appropriate, and self explanatory descriptions;
b. facilitate the retrieval and exchange of information about archival material;
c. enable the sharing of authority data; and
d. make possible the integration of descriptions from different locations into a unified information system.
While the ontology and conceptual model might provide enough of a framework for consistent modeling of descriptions across space and time, they do not seem to address the specific descriptive practices to be followed within free-text descriptive attributes such as scope and content. What role do the ICA and EGAD see the existing ICA content standards playing in the future? Will subsequent versions of RiC provide further specificity to ensure consistent descriptive practices across domains and jurisdictions, as ISAD(G) previously sought to provide?
Chris Hurley has pointed out the vast number of relationship types between entities - 792 in the current draft - and we agree that these should be constrained to better ensure consistent application. The InterPARES Trust response rightly points out how the list of relations might easily be simplified and halved by removing the confusing notion of past vs present tense from the relations, relying instead on the existing date attributes to bound time. On the other hand, we wonder whether it is necessary for EGAD to enumerate all possible combinations of subject, relationship and object, rather than simply providing the relationships as predicates and allowing users to determine what kinds of connections to make with them (using metadata application profiles - see below). For example, is it necessary for the model to list all the different entities that the relationship “associated with” (in both present and past form) can be used to link together? We would be interested to hear other commenters’ thoughts on this, since we are not certain whether others would agree that the detailed list is unnecessary.
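To make the alternative above concrete, here is a minimal sketch in Python of relations expressed as untensed predicates bounded by dates, with the permitted subject/object pairings supplied by an application profile rather than enumerated in the model itself. All names here (Relation, PROFILE, is_allowed, is_current) and the entity pairings are our own illustrative assumptions, not part of RiC-CM or RiC-O.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """One subject-predicate-object statement, bounded by optional dates."""
    subject_type: str
    predicate: str              # untensed: "associated with", never "was associated with"
    object_type: str
    start: Optional[str] = None  # e.g. "1950"; None means unknown or unbounded
    end: Optional[str] = None    # None means the relation is treated as still holding


# Hypothetical application profile: for each predicate, the (subject, object)
# entity-type pairs that one community of implementers has agreed to allow.
PROFILE = {
    "associated with": {("Agent", "Record"), ("Agent", "Agent")},
    "created": {("Agent", "Record")},
}


def is_allowed(rel: Relation, profile: dict = PROFILE) -> bool:
    """Check a relation against the profile instead of a fixed list of triples."""
    return (rel.subject_type, rel.object_type) in profile.get(rel.predicate, set())


def is_current(rel: Relation) -> bool:
    """Past vs present is derived from the end date, not from the predicate name."""
    return rel.end is None
```

In a sketch like this the model only has to define the predicates; which combinations are valid becomes a matter for the profile, and a second community could swap in its own PROFILE without changing the model.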
We also hope EGAD will consider the role that metadata application profiles will play in implementation and interoperability, and would like to know what guidance the ICA could provide on this. Two illustrative examples come to mind: METS and PCDM. The Metadata Encoding and Transmission Standard (METS) was developed to facilitate data exchange and transmission between repositories and tools. However, it is an extremely flexible and permissive standard, making data exchange without a shared application profile difficult, as the METS generated by one system can rarely be parsed by another without intervention. The Portland Common Data Model (PCDM) was similarly developed to provide a common mechanism for data interoperability between Hydra implementers, though it has grown beyond a Hydra-specific model. However, the community found in early versions that the model was so general and flexible that multiple interpretations of the same data, each valid within the model, prevented interoperability anyway. They have since set out more specific parameters and a formalized way of documenting a specific application profile (see PCDM Profile Template). RiC might benefit from this lesson, and consider testing this kind of scenario in advance. In some cases, constraints will produce data that can be more readily combined and shared, giving it greater utility. Perhaps EGAD considers this to be the responsibility of implementers - however, if this is the case, then the role of the ICA in standards development should be interrogated: is it not still to ensure consistency across space and time, and to facilitate exchange and reuse? For developers to be able to implement the standard while still supporting exchange and interoperability, we will need consistent implementation guidelines, so that any system can be designed to exchange data easily with other systems.
Additionally, we are somewhat surprised by the response of M. Clavaud on the ICA-EGAD listserv (2016-10-04) eschewing the reuse of existing ontologies. While there are certainly areas in which this may be appropriate, as a wholesale approach it strikes us as contrary to the linked data best practice of reusing standard vocabularies when possible (see for example the W3C Best Practices), and it represents an enormous maintenance burden for the ICA. The W3C’s SKOS is a perfect example - is it truly necessary for RiC-O’s Concept/Thing entity to repeat this work so completely? We urge EGAD to consider a more balanced approach in what it chooses to reuse vs. what is designed anew. If the approach taken by RiC is informed by metadata application profiles, then RiC’s role becomes simpler - offering implementation guidelines for data consistency by reusing existing vocabularies and ontologies, as well as helpful extensions where existing ontologies do not meet the specific needs of archivists.
Missing entities
Since RiC-CM and RiC-O seem aimed at providing resources for the management of all archival functions and activities, we note several other possible entities that do not seem to be covered by the proposed model. Namely, we ask EGAD to consider the role that Rights, Accessions, and Physical storage play in the management of archival information. Greg Bak has previously pointed out that more might be needed to capture dependency information, and we note that EGAD itself has acknowledged that fields that capture the role of the archivist in shaping the record are still lacking. Rights are a crucial element if data are to be exchanged and reused. Conditions of access and conditions of use are listed as properties of record-related entities, but might it not be desirable to declare how the rights are related to an agent acting as the rights-holder of the records in question? Why reduce a complex entity with its own properties from a thing to a string, making it less machine-actionable in the future, and inconsistently implemented? Similarly, while different jurisdictions and institutions will handle accessions and physical storage information differently (or exclude them entirely), we still see them represented in archival data often enough to need a consistent method for expressing them within the RiC models.
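As a rough illustration of the "thing, not a string" point about rights, the following Python sketch models a rights statement as its own entity linked to a rights-holding agent. The class and field names (Agent, RightsStatement, basis, act, permitted) and the sample data in the usage note are hypothetical and are not drawn from RiC-CM; the sketch only shows that structured rights can be queried by software in a way a free-text note cannot.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Agent:
    name: str


@dataclass(frozen=True)
class RightsStatement:
    """A rights statement as an entity with its own properties."""
    basis: str        # e.g. "copyright", "donor agreement" (illustrative values)
    act: str          # e.g. "reproduce", "display"
    permitted: bool
    holder: Agent     # explicit link to the rights-holding agent


def is_permitted(statements: Iterable[RightsStatement], act: str) -> bool:
    """An act is permitted only if some statement covers it and none forbids it."""
    relevant = [s for s in statements if s.act == act]
    return bool(relevant) and all(s.permitted for s in relevant)


def holders(statements: Iterable[RightsStatement]) -> set:
    """Names of every agent holding rights in the described material."""
    return {s.holder.name for s in statements}
```

For example, given one statement permitting "display" and another forbidding "reproduce", is_permitted(statements, "display") is true and is_permitted(statements, "reproduce") is false, while holders(statements) names the agents involved. None of this is possible when the same information sits in a single conditions-of-use string.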
We would point out as well that some entities seem to be missing important properties - for example, we see no clear way to indicate that a Date might be approximate or uncertain, a key feature of archival description. We also strongly support the TS-DACS response on RiC-P36 gender, and on identity in general.
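One way the missing date qualifier mentioned above might be expressed is sketched below in Python. The Certainty values and the QualifiedDate class are assumptions of ours for illustration only; an implementation could equally adopt an existing notation for approximate and uncertain dates rather than inventing its own.

```python
from dataclasses import dataclass
from enum import Enum


class Certainty(Enum):
    EXACT = "exact"
    APPROXIMATE = "approximate"  # e.g. "circa 1923"
    UNCERTAIN = "uncertain"      # e.g. "1923?"


@dataclass(frozen=True)
class QualifiedDate:
    """A date value plus a machine-readable statement of how far to trust it."""
    value: str                            # year or ISO 8601 date, kept as text
    certainty: Certainty = Certainty.EXACT

    def display(self) -> str:
        """Render the date in a conventional finding-aid style."""
        if self.certainty is Certainty.APPROXIMATE:
            return f"ca. {self.value}"
        if self.certainty is Certainty.UNCERTAIN:
            return f"{self.value}?"
        return self.value
```

Keeping the qualifier as a separate property, rather than embedding "ca." or "?" in a free-text date string, means that systems exchanging the data can still sort and filter on the underlying value.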
Record vs Record-set
While conceptually we understand why EGAD has proposed the concept of the record set, our experience suggests that implementing this distinction in practice in an archival management system will be a hindrance over time. Unexpected changes may bring a new record into a record set, thereby invalidating any shared properties of a record set (3.5 and 3.6 in RiC-CM). Further, granularity may grow over time - for example, a box that is described as a record (an item) may have its contents described at a later date - suddenly our item-level box record must become a record set. If a record set and record are fundamentally different entities in the data model, with different attributes and relationships, then switching between entities will be difficult to implement and may lead to the loss of data that is not valid for the new entity.
In AtoM’s data model, all records are simply “information objects” with the same available properties, some of which may be inherited automatically from higher levels of description. We believe a more flexible approach such as this might ultimately be beneficial for systems implementors - it keeps the data model simpler, thereby ensuring more consistency in implementation, and makes all properties available to all records regardless of type or level. A record may still describe an aggregation - the way its properties are used would clarify this.
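The single-entity approach described above can be sketched roughly as follows. This is not AtoM's actual code: the InformationObject class below, its fields and the sample values in the usage note are simplified assumptions in Python, intended only to show how one entity type with inherited properties avoids the record vs record set switch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class InformationObject:
    """One entity type for every level of description."""
    title: str
    level: str                                  # a label such as "fonds" or "item", not a class
    properties: dict = field(default_factory=dict)
    parent: Optional["InformationObject"] = field(default=None, repr=False)
    children: list = field(default_factory=list)

    def add_child(self, child: "InformationObject") -> "InformationObject":
        """Attach a lower-level description and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def get(self, name: str):
        """Return a property, falling back to the nearest ancestor that has it."""
        node = self
        while node is not None:
            if name in node.properties:
                return node.properties[name]
            node = node.parent
        return None
```

If a box first described as an item later has its contents described, the box simply gains children and its level label changes (for instance box.level = "file"); no data is converted or discarded, and a new letter inside the box still resolves inherited properties such as conditions of access from the fonds above it.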
Next steps
Overall we are impressed with the work of EGAD to date and are excited to see steps being taken to represent archival description as linked data. However, as we have mentioned above, we worry about the ability of under-resourced institutions to take advantage of the standard and its accompanying ontology when they are finalized, given that many of these institutions may have spent years achieving a basic level of compliance with existing ICA standards. As software developers we also have a vested interest in making sure that any new standard can be practically implemented in software. We hope, therefore, that EGAD keeps implementation considerations in mind as it begins work on the next iteration of the model. We also hope that the ICA does not plan to cease its standards-related activities once RiC-CM and RiC-O are finalized, as publishing a new standard is only the first step toward making the standard usable by practitioners worldwide.
EGAD plans to release a draft of RiC-CM v2, and call for comments, by the beginning of 2019.
A call for reviewers of a beta version of RiC-O will be published in December 2018.
Access to Memory Foundation / Fondation Access to Memory
Browsing by "Digital objects", for example, one can find some diagrams referred to as RiC-CM v0.2.
Best regards from Brazil,
Carlos Menegozzo