Pointer files

Rachel MacGregor

Oct 22, 2015, 12:05:52 PM

to archivematica

Hi,

I'm an Archivematica newbie and still getting up to speed with the technical side of things. I wondered if anyone could explain how pointer files fit into the Archivematica workflow and their precise function within it (I can see the metadata they contain and understand that they link AIPs to AICs etc.), and, most puzzling to me, why some transfers create them and others don't.

Apologies if this should be obvious or has been asked before, but I couldn't see anything in the literature, the wiki or this group.

Cheers

Rachel

Justin Simpson

Oct 22, 2015, 1:31:44 PM

to archiv...@googlegroups.com

Hi Rachel,

Pointer files were added to Archivematica with the 1.0 release; the very first one was created in February 2013, I believe. They are still not documented properly, so it is not surprising that you have some questions.

I'll try to summarize what Artefactual was thinking about when we decided to add this functionality to Archivematica, and point to some development work that is ongoing now. I am curious to get feedback from you and anyone else interested in this topic. This time around we are going to try to do a better job of communicating the concept before making additional changes, and this is as good a place as any to start.

An Archivematica AIP is meant to be a self-describing container. Inside there are a set of original objects, and all the metadata supplied by the user and created by Archivematica during processing. All of the files inside the AIP are tied together by the METS file.

Archivematica puts the contents of each AIP into a Bag. This bag can then be compressed, which leads to a bootstrapping problem. If you came across an AIP stored as a .7z file, you would need to know that it is a 7-Zip file, and how to open a 7-Zip file. Nothing is explicitly recorded anywhere that tells you how this AIP was compressed and how to uncompress it.
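
To make the bootstrapping problem concrete: without a pointer file, about the only way to discover how an AIP file was packaged is to sniff its leading "magic" bytes. This is a hypothetical illustration, not part of Archivematica; the function names are made up and the signature table covers only a few common archive formats.

```python
from pathlib import Path

# Leading "magic" bytes for a few archive formats (not an exhaustive list).
SIGNATURES = {
    b"7z\xbc\xaf\x27\x1c": "7-Zip archive",
    b"PK\x03\x04": "ZIP archive",
    b"\x1f\x8b": "gzip stream",
    b"BZh": "bzip2 stream",
}

def sniff_bytes(head: bytes) -> str:
    """Return a best-guess container format for the first bytes of a file."""
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return "unknown"

def sniff_file(path: Path) -> str:
    """Read only the first few bytes of a (possibly huge) AIP file and sniff them."""
    with path.open("rb") as fh:
        return sniff_bytes(fh.read(8))
```

Sniffing only identifies the outer container. It says nothing about the AIP's UUID, its checksum or what was done to it, which is the information a pointer file records.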

This was the first motivation for creating pointer files. The pointer file is where we store metadata about the creation and storage of the AIP. It is a METS file with an amdSec, a fileSec and a structMap. Inside the amdSec there is a techMD that in turn wraps a PREMIS object describing the AIP. This PREMIS object includes a checksum for the .7z file, a UUID and details about the format of the file (a PRONOM entry). The amdSec also contains a digiprovMD section holding a PREMIS event that details the compression.
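
As a rough illustration of that layout, here is a minimal sketch that builds a pointer-file-like METS document with Python's standard ElementTree. It is heavily simplified and hypothetical: the function name, the ID values and the use of SHA-256 are assumptions, and a real Archivematica pointer file carries more PREMIS elements (event identifiers, dates, agents and so on) than shown here.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
PREMIS = "info:lc/xmlns/premis-v2"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("premis", PREMIS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)

def _sub(parent, ns, tag, text=None, attrib=None):
    """Append a namespaced child element, optionally with text."""
    el = ET.SubElement(parent, f"{{{ns}}}{tag}", attrib or {})
    if text is not None:
        el.text = text
    return el

def build_pointer(aip_uuid, path, sha256, size, puid, compression_detail):
    """Hypothetical builder for a simplified pointer file (PREMIS 2 inside METS)."""
    mets = ET.Element(f"{{{METS}}}mets")
    amd = _sub(mets, METS, "amdSec", attrib={"ID": "amdSec_1"})

    # techMD: a PREMIS object describing the compressed AIP file itself.
    tech = _sub(amd, METS, "techMD", attrib={"ID": "techMD_1"})
    wrap = _sub(tech, METS, "mdWrap", attrib={"MDTYPE": "PREMIS:OBJECT"})
    obj = _sub(_sub(wrap, METS, "xmlData"), PREMIS, "object")
    ident = _sub(obj, PREMIS, "objectIdentifier")
    _sub(ident, PREMIS, "objectIdentifierType", "UUID")
    _sub(ident, PREMIS, "objectIdentifierValue", aip_uuid)
    chars = _sub(obj, PREMIS, "objectCharacteristics")
    fixity = _sub(chars, PREMIS, "fixity")
    _sub(fixity, PREMIS, "messageDigestAlgorithm", "SHA-256")
    _sub(fixity, PREMIS, "messageDigest", sha256)
    _sub(chars, PREMIS, "size", str(size))
    registry = _sub(_sub(chars, PREMIS, "format"), PREMIS, "formatRegistry")
    _sub(registry, PREMIS, "formatRegistryName", "PRONOM")
    _sub(registry, PREMIS, "formatRegistryKey", puid)

    # digiprovMD: a PREMIS event recording how the AIP was compressed.
    prov = _sub(amd, METS, "digiprovMD", attrib={"ID": "digiprovMD_1"})
    wrap = _sub(prov, METS, "mdWrap", attrib={"MDTYPE": "PREMIS:EVENT"})
    event = _sub(_sub(wrap, METS, "xmlData"), PREMIS, "event")
    _sub(event, PREMIS, "eventType", "compression")
    _sub(event, PREMIS, "eventDetail", compression_detail)

    # fileSec + structMap: point at the single physical AIP file.
    file_id = f"file-{aip_uuid}"  # METS IDs must not start with a digit
    grp = _sub(_sub(mets, METS, "fileSec"), METS, "fileGrp",
               attrib={"USE": "Archival Information Package"})
    f = _sub(grp, METS, "file", attrib={"ID": file_id, "ADMID": "amdSec_1"})
    _sub(f, METS, "FLocat", attrib={"LOCTYPE": "OTHER", "OTHERLOCTYPE": "SYSTEM",
                                    f"{{{XLINK}}}href": path})
    div = _sub(_sub(mets, METS, "structMap"), METS, "div",
               attrib={"TYPE": "Archival Information Package"})
    _sub(div, METS, "fptr", attrib={"FILEID": file_id})
    return mets
```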

When your AIP is a single compressed file, the pointer file gives you enough information to understand what the AIP is and how to open it. Then you can access the METS file inside the AIP to understand the actual contents of the AIP.
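
Going the other way, a tool that finds an AIP could read its pointer file first. This hypothetical sketch assumes a PREMIS 2 pointer file laid out as described (a techMD wrapping a PREMIS object, digiprovMD sections wrapping PREMIS events); it pulls out the checksum, the PRONOM key and the events, and verifies the AIP's bytes before you try to open them. The mapping from PREMIS algorithm names to `hashlib` names is an assumption.

```python
import hashlib
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/", "premis": "info:lc/xmlns/premis-v2"}

def read_pointer(xml_text: str) -> dict:
    """Extract what you need to identify and open an AIP from its pointer file."""
    root = ET.fromstring(xml_text)
    obj = root.find(".//mets:techMD//premis:object", NS)
    if obj is None:
        raise ValueError("no PREMIS object found in techMD")
    events = root.findall(".//mets:digiprovMD//premis:event", NS)
    return {
        "uuid": obj.findtext(".//premis:objectIdentifierValue", namespaces=NS),
        "algorithm": obj.findtext(".//premis:messageDigestAlgorithm", namespaces=NS),
        "digest": obj.findtext(".//premis:messageDigest", namespaces=NS),
        "puid": obj.findtext(".//premis:formatRegistryKey", namespaces=NS),
        # (eventType, eventDetail) pairs, e.g. the compression event.
        "events": [(e.findtext("premis:eventType", namespaces=NS),
                    e.findtext("premis:eventDetail", namespaces=NS)) for e in events],
    }

def checksum_matches(data: bytes, algorithm: str, digest: str) -> bool:
    """Assumption: PREMIS names like "SHA-256" map to hashlib names like "sha256"."""
    name = algorithm.lower().replace("-", "")
    return hashlib.new(name, data).hexdigest() == digest.lower()
```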

The second motivation for pointer files came up when we were first developing the Archivematica Storage Service and adding LOCKSS as an AIP storage location. LOCKSS has the concept of an Archival Unit (AU), which has a fixed size. In theory the AU size can be set to anything, so in order for Archivematica to store an AIP in LOCKSS, it is sometimes necessary to break the AIP up into chunks, none of them larger than the maximum size of the AU. If you configure a LOCKSS Space in the Storage Service, you are asked to enter the AU size so that the Storage Service can do this chunking.
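
The size constraint itself is simple arithmetic: an AIP of S bytes needs ceil(S / AU) chunks. A hypothetical helper computing the byte ranges (it does not claim to mirror how the Storage Service actually splits files):

```python
def plan_chunks(aip_size: int, au_size: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs covering aip_size bytes, none longer than au_size."""
    if au_size <= 0:
        raise ValueError("au_size must be positive")
    # Integer ceiling division avoids float rounding on very large AIPs.
    count = max(1, -(-aip_size // au_size))
    return [(i * au_size, min(au_size, aip_size - i * au_size)) for i in range(count)]
```

For example, a 10-byte AIP with a 4-byte AU yields three ranges, the last one 2 bytes long.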

The Archivematica pipeline still creates an AIP, puts it in a bag, compresses it into a 7z file and sends it to the Storage Service. The Storage Service then has to break the AIP up into more than one chunk. Obviously, the details of this chunking process are needed to recreate the original AIP. They are stored in the pointer file as PREMIS events (one for the compression, one for the chunking), and the Storage Service updates the fileSec and structMap of the pointer file to include an entry for each chunk.
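
To show why the chunk details matter for reconstruction, here is a sketch that splits an AIP's bytes, records an index, size and checksum per chunk (roughly the kind of information the pointer file's fileSec entries would need), and reassembles the chunks after a fixity check. The names and the dict layout are hypothetical; the Storage Service's real records live in METS and PREMIS, not Python dicts.

```python
import hashlib

def split_aip(data: bytes, au_size: int) -> list[dict]:
    """Split data into chunks of at most au_size bytes, with a SHA-256 per chunk."""
    if au_size <= 0:
        raise ValueError("au_size must be positive")
    chunks = []
    # max(len(data), 1) makes an empty AIP still produce one (empty) chunk.
    for index, offset in enumerate(range(0, max(len(data), 1), au_size), start=1):
        piece = data[offset:offset + au_size]
        chunks.append({
            "index": index,  # order needed for reassembly
            "size": len(piece),
            "sha256": hashlib.sha256(piece).hexdigest(),
            "bytes": piece,  # in practice this goes to storage, not the pointer file
        })
    return chunks

def reassemble(chunks: list[dict]) -> bytes:
    """Check each chunk against its recorded checksum, then concatenate in order."""
    out = bytearray()
    for chunk in sorted(chunks, key=lambda c: c["index"]):
        if hashlib.sha256(chunk["bytes"]).hexdigest() != chunk["sha256"]:
            raise ValueError(f"chunk {chunk['index']} failed fixity check")
        out += chunk["bytes"]
    return bytes(out)
```

Without the recorded order and checksums, the chunks sitting in storage could not be reliably put back together into the original .7z file.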

There is some information about pointer files in the Archivematica wiki, hidden in the AIC and dataset preservation sections, including a sample pointer file:

https://wiki.archivematica.org/Dataset_preservation#Sample_pointer.xml_file
https://wiki.archivematica.org/AIC

The page about LOCKSS integration also has details, with several sample pointer files:

https://wiki.archivematica.org/LOCKSS_Integration#Sample_pointer_files

In theory, any transformation of an AIP, for example one required by the physical constraints of the storage system, could be recorded in a pointer file. Storing an AIP in an object store (like Swift in OpenStack, DuraCloud, or directly in Amazon S3) would probably require chunking, just like LOCKSS, and so the details of that should be recorded in the pointer file.

All of what I have said so far applies to AIPs that are bagged and then compressed. When AIPs are stored uncompressed (an option you have during processing), no pointer file is created. This is due to a limitation in PREMIS version 2.2: PREMIS 2 does not allow PREMIS objects to be intellectual entities; they are supposed to be only physical entities. So when we tried to figure out how to describe an uncompressed AIP in a pointer file, there was nothing to describe. We sort of gave up at that point, or more properly, we have been waiting for PREMIS 3. Now that PREMIS 3 exists, this is something that should be revisited, and we should be able to create pointer files for uncompressed AIPs.
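
PREMIS 3 added intellectualEntity as an object category alongside file, representation and bitstream, which is what would let an uncompressed AIP be described. A minimal, hypothetical sketch of such a PREMIS 3 object (identifier only; everything else a real pointer file would need is omitted, and the function name is made up):

```python
import xml.etree.ElementTree as ET

PREMIS3 = "http://www.loc.gov/premis/v3"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
# The xsi:type value "premis:intellectualEntity" is a QName, so the "premis"
# prefix must actually be bound to the PREMIS 3 namespace when serialized.
ET.register_namespace("premis", PREMIS3)
ET.register_namespace("xsi", XSI)

def describe_uncompressed_aip(aip_uuid: str) -> ET.Element:
    """Describe an uncompressed AIP as a PREMIS 3 intellectual entity."""
    obj = ET.Element(f"{{{PREMIS3}}}object",
                     {f"{{{XSI}}}type": "premis:intellectualEntity"})
    ident = ET.SubElement(obj, f"{{{PREMIS3}}}objectIdentifier")
    ET.SubElement(ident, f"{{{PREMIS3}}}objectIdentifierType").text = "UUID"
    ET.SubElement(ident, f"{{{PREMIS3}}}objectIdentifierValue").text = aip_uuid
    return obj
```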

I am not a PREMIS expert by any means. Evelyn McLellan, the Artefactual president, is, and she will be at iPRES in a couple of weeks as part of the PREMIS Implementation Fair (https://groups.google.com/forum/#!topic/digital-curation/WEL7uyR4RAg). I encourage anyone who happens to be in Chapel Hill on November 6th to find Evelyn and ask her about this.

Pointer files might also enable interesting things when it comes to storing AIPs in a repository or an object store. Archivematica always assumes that an AIP should be one physical file, that is, a compressed bag. We added support for uncompressed AIPs, but those are still bags. There are compelling reasons not to store an AIP this way when you have a repository based on something like FEDORA, or an object store like Swift or S3: you might want to store each individual digital object at its own URI, to enable access to the original files. But then you don't really have an AIP. Pointer files might be a way to make this work.

Artefactual is starting to work on a new feature for the Bentley Library at the University of Michigan that is related to this idea. Michigan has an institutional repository called Deep Blue, which is currently based on DSpace and one day will probably be Hydra (i.e. FEDORA) based. The Bentley's descriptive practices until now have involved describing down to the series level, creating a DSpace Item for each series, and then uploading to that Item a zip file containing all the digital objects that are part of the series. Sometimes individual files are uploaded instead of a zip file.

If Archivematica were uploading AIPs to be stored in the repository, right now they would be 7-Zip files containing not only all the digital objects but also all the metadata, submission documentation, logs and the METS file. The Bentley wants to separate this administratively important material from the original digital objects and any access or preservation derivatives. This is partly for the convenience of their clients, who don't really want to look at a METS file, but also because the metadata in an AIP might include information that needs to be redacted: for example, the list of all the files in the original transfer (some of which may have been excluded from the final AIP), or the output of tools like Bulk Extractor. The logs may also include personally identifying information.

We are speculating that we could deal with this in the Storage Service by transforming the original AIP (given to the Storage Service as a Bag) into two separate zip files: one would contain the digital objects, the other everything else (metadata, logs, submission documentation, METS, bag manifest, etc.). The details of this transformation would be recorded in the pointer file.
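
A rough sketch of that split, assuming the usual Archivematica bag layout in which originals and derivatives sit under data/objects/ next to metadata/ and submissionDocumentation/ subfolders. The classification rule, the function names and the returned manifest are all hypothetical; the manifest is the sort of record that would then be written into the pointer file.

```python
import zipfile
from pathlib import Path

# Hypothetical: folders under data/objects/ that hold administrative material.
ADMIN_SUBFOLDERS = {"metadata", "submissionDocumentation"}

def is_digital_object(rel_path: str) -> bool:
    """True for files under data/objects/ outside the administrative subfolders."""
    parts = Path(rel_path).parts
    return (len(parts) > 2 and parts[:2] == ("data", "objects")
            and parts[2] not in ADMIN_SUBFOLDERS)

def split_bag(bag_dir: Path, objects_zip: Path, admin_zip: Path) -> dict:
    """Write two ZIPs (keep them outside bag_dir) and return where each file went."""
    manifest = {"objects": [], "admin": []}
    with zipfile.ZipFile(objects_zip, "w", zipfile.ZIP_DEFLATED) as objs, \
         zipfile.ZipFile(admin_zip, "w", zipfile.ZIP_DEFLATED) as admin:
        for path in sorted(p for p in bag_dir.rglob("*") if p.is_file()):
            rel = path.relative_to(bag_dir).as_posix()
            target, key = (objs, "objects") if is_digital_object(rel) else (admin, "admin")
            target.write(path, arcname=rel)
            manifest[key].append(rel)
    return manifest
```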

I will end with a final, slightly sad looking link:

https://wiki.archivematica.org/AIP_pointer_file

This is a page we created two years ago but have not updated since. I would encourage anyone interested in how and why pointer files work to add comments and ideas to that wiki page.

Justin Simpson
Director of Archivematica Technical Services
www.artefactual.com
604-527-2056

Rachel MacGregor

Oct 23, 2015, 4:16:22 AM

to archivematica

Fantastic, this was exactly the sort of detail I needed. I can see immediately that the reason some files have them and some don't is the compression levels (I have been experimenting with different ones but hadn't made the link).

Hopefully this will be of interest to other new users and perhaps encourage others to contribute to the wiki. I'm also getting to grips with PREMIS, but sadly can't make iPRES15 (wrong side of the pond).

Best wishes,

Rachel

Kari Smith

Oct 2, 2017, 3:06:54 PM

to archivematica

Hi Justin,

How much work would it take to have a pointer file created for every AIP? I'd like to include the XML pointer file as an attached external file in our ArchivesSpace access records. Compressing (zipping) the AIP goes against our digital preservation practice, so we'd want basically the same information, but for an uncompressed bag.

Thanks for your response,

Kari

MIT Libraries
