As Archivematica users know, a standard installation of Archivematica includes a Format Policy Registry (FPR) that contains rules, commands and tools for a wide variety of preservation actions that are performed automatically during ingest. One type of rule is normalization: there are hundreds of rules for normalizing (converting file formats to a select set of preservation formats) during ingest. If the user chooses to normalize during ingest, these rules are invoked automatically on any ingested file for which there is a normalization rule.
There are valid reasons to normalize extensively upon ingest. First, it means narrowing your holdings down to a smaller number of formats for long-term preservation, formats that are today considered to be sustainable and “preservation-friendly”. This means keeping an eye on, say, a dozen formats rather than several dozen or even hundreds of formats, depending on the diversity of your content producers. Second, it allows you to spot and address issues with formats during ingest, rather than discovering them years down the road when they may be harder to address. For example, that image file may not normalize properly because it has a colourspace issue; better to fix that issue now, with current tools and knowledge, than discover and attempt to fix it sometime in the future. Third, it means a certain amount of work up front, permitting a higher level of confidence that a lot of the heavy lifting on digital preservation has been done by the time the content is placed into long-term storage - that AIP is DONE and it won’t have to be touched for a long time.
The downside of extensive use of normalization is the size of your AIPs, particularly when it comes to video files. Nearly all ingested born-digital video files are compressed, and when Archivematica runs the default normalization rule - convert to ffv1/lpcm in an mkv wrapper - a small video file can produce a very large master derivative. If you’re interested in finding out more about why this happens, see Ashley Blewer’s blog post at https://bits.ashleyblewer.com/blog/2019/09/19/ffv1-bigger-than-before/. The same can be true for raster images - a JPEG file can be highly compressed, and an uncompressed TIFF preservation copy can be much larger than the JPEG file. On a small scale this might not make much of a difference, but JPEGs are ubiquitous, and a few thousand JPEGs across a few SIPs can have a noticeable impact on processing time and storage.
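To put rough numbers on the raster case, here is a back-of-the-envelope sketch in Python. The image dimensions, the JPEG file size and the file count are hypothetical values chosen for illustration, not measurements from Archivematica, and the calculation ignores TIFF header and metadata overhead.

```python
def uncompressed_raster_bytes(width, height, channels=3, bits_per_channel=8):
    """Approximate pixel-data size of an uncompressed raster image, in bytes."""
    return width * height * channels * bits_per_channel // 8


# Hypothetical example: a 4000 x 3000 pixel, 8-bit RGB photo stored as a 3 MB JPEG.
jpeg_bytes = 3_000_000
tiff_bytes = uncompressed_raster_bytes(4000, 3000)

# How many times larger the uncompressed preservation copy is than the original.
growth_factor = tiff_bytes / jpeg_bytes

# Extra storage if 5,000 such JPEGs each gain an uncompressed TIFF derivative
# (the original is kept alongside the preservation copy).
file_count = 5_000
extra_gb = file_count * tiff_bytes / 1e9

print(f"Uncompressed copy: {tiff_bytes / 1e6:.0f} MB ({growth_factor:.0f}x the JPEG)")
print(f"Added storage for {file_count} files: {extra_gb:.0f} GB")
```

Under these assumptions each preservation copy is about twelve times the size of the original, which is why a format's ubiquity matters more than the size of any single file.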
Ubiquity is the key here, and this brings us to the main point of this post. Should we change the default settings in Archivematica to skip normalization for highly ubiquitous files like JPEGs and h264-encoded mp4 files? Keep in mind that the settings could always be changed: the normalization rules would still be there but they would just be disabled for certain formats. However, we are aware that not all users edit FPR rules, and that the defaults Archivematica ships with are often considered de facto recommendations by Artefactual Systems.
We would love to hear from digital curators and preservationists out there. What is your opinion on normalizing everything that can be normalized? Do you edit the default FPR rules, and if so, why? Would such a change in Archivematica’s default rules have a negative impact on you, or, in your opinion, on the wider community of users? Do you have opinions about specific formats? An open discussion on this list would be great, but if you’re feeling shy, please email me at evelyn[at]artefactual[dot]com.
Regards,
Evelyn McLellan
Systems Archivist & Metadata Specialist
Artefactual Systems
--
You received this message because you are subscribed to the Google Groups "archivematica" group.
To unsubscribe from this group and stop receiving emails from it, send an email to archivematic...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/archivematica/35b4d602-674b-4ceb-ad34-3cd95094b09a%40googlegroups.com.
Thanks for doing that Tim! I hit publish on your post twice and it still kept being deleted ¯\_(ツ)_/¯
On Thu, Oct 17, 2019 at 9:57 AM Timothy Walsh <timothyr...@gmail.com> wrote:
Hi Evelyn,
Thanks for starting this conversation! For some reason Google Groups keeps marking my post as spam and deleting it, so I've published some of my thoughts on my blog here:
Looking forward to hearing from others!
Tim
Hi Evelyn,
Thanks for posing this question to the group. As our digital preservation program is relatively new and we have been Archivematica users for just over a year now, these are questions we are starting to ask ourselves often, if not daily. For a little context, like Tim I am also at Concordia University, but in the Records Management and Archives department, which sits outside the library system.
My comment here is more general and does not get into specific formats but could still be useful for overall consideration in terms of how we are working. Generally speaking our normalization workflows have started to diverge between digital content arriving via private donations versus content arriving through institutional transfers.
We have decided to follow the default Archivematica FPR normalization rules more rigorously for digital content arriving from private sources than for content arriving from institutional units. We are working with the reality that the dynamics at play when an individual or group has decided to donate digital records to our department are quite different. There is often an emotional, conscious decision involved with private donations, whereas institutional transfers are an obligation of staff at the University. Whether or not it is true in all cases, the bar feels higher to implement digital preservation best practice when content is arriving from a private source: expectations, detailed negotiations and trust all come into the larger picture. All of this to say that we are leaning more heavily on the default Archivematica preservation rules when it comes to private donations.
On the institutional side of things we have been working with the default preservation normalization rules on a case-by-case basis. In a recent example, we received an accession of convocation videos in .mov and .mp4 video containers. Needless to say, these videos are very large. We made the decision not to do any normalization up front and to create access copies only in an “on demand” scenario. As a second example, we also recently received a transfer of approximately 1,000 files of administrative records from a faculty at the school. There was very little media included in this accession, so in this case we did attempt preservation normalization.
I think (and I hope :) ) as we receive more and more content clearer patterns will emerge and we will be able to react more consistently. We have also been including a document in the submission documentation sub-directory of the AIP which we hope rationalizes our decision making a little.
The other opportunity that presents itself on the institutional records side is our ability to work with and have some influence on records creators within units to create and send us files in a more standardized way that would fall in line with digital preservation best practice. With private donors this would likely never be an option.
Again, we are still relatively new to working with digital content and with Archivematica, but these are some early observations from our side of things.
John Richan
Digital Archivist – Concordia University