Re-ingestion mechanics in the long-term


buck...@gmail.com

Jul 30, 2024, 3:51:26 PM
to archivematica
Hi all,

I'm curious about how re-ingesting works. I realize there's documentation, but it didn't really touch on the longer-term mechanics/implications (unless I missed it). I may be missing some fundamental concepts here, but bear with me:

It's the year 2050. Civilization has fallen. Worse than that, no one uses the TIFF format anymore. It's time to migrate those ancient photos from TIFF to the best preservation format available, let's say format X. But here's the thing: the original files weren't TIFFs. They were Kodak Photo CD (PCD) images that someone had a blast making at a mall kiosk in the late 1990s. Being the good archivists we are, we ingested them into Archivematica and created TIFFs as preservation copies. We're now ready to re-ingest and create new and shiny format X copies for preservation, but how does Archivematica actually do that? The original PCD files are over 50 years old now; there is absolutely no way to convert them to format X directly. When re-ingesting, does Archivematica attempt to normalize/migrate using the originals (PCDs) or the preservation copies (TIFFs)? What happens to the old preservation copies when new preservation copies are made? I did a test on my end and the preservation copies I made before re-ingesting were still there. Is that intentional? Is the goal to keep a copy of literally every format a file has ever been migrated to? I have to imagine that'd quickly become unmanageable.

My questions then are A) how does Archivematica handle these things, and B) what are people's plans for or actual experience managing this in Archivematica?

And bonus question C) is this question missing an obvious point and/or an affront to Digital Preservation?

All thoughts welcome.

Thanks!
Jarad

Sarah Romkey

Aug 8, 2024, 5:49:35 AM
to archiv...@googlegroups.com
Hi Jarad,

I am hoping you'll still get some community responses but I thought I would chime in!

Currently anyway, Archivematica on reingest will re-normalize the original files, not the preservation copies. It's the subtle difference between normalization and migration, or at least that's how I understand it and how Archivematica executes it. I realize your question is theoretical, but if you were literally facing this now, I think what I would suggest is some scripting outside of Archivematica to extract the preservation copies and re-process them (either through Archivematica or externally) into the new format, and then re-make your AIPs with some explanatory metadata files. I guess a question at that point would be: does it make sense to keep the original PCD files?

Regarding keeping the first normalized copies: no, that's not the intended behaviour (I double-checked the original pull request comments: https://github.com/artefactual/archivematica/pull/391#issue-128655623), so something may have gone awry in your test. In the METS file there should be a deletion event for the first normalized copy. Let me know if you don't see that and I'll do some testing.

Cheers,

Sarah
 
Sarah Romkey, MAS,MLIS
Head of Hosting and SaaS Products
@archivematica / @accesstomemory





buck...@gmail.com

Aug 8, 2024, 11:57:08 AM
to archivematica
Hi Sarah,

Thanks for the response! I did another test, first converting the PCDs to TIFF and then re-ingesting (full re-ingest) after changing the rule to convert PCDs to JPG instead. The original TIFF files are still there alongside the new JPGs. In the METS file the TIFFs are part of the deleted file group, but there is no PREMIS event related to their deletion. The normalization-to-TIFF events are still there, and the TIFFs are still listed as related objects of the original PCDs. Not expected behavior then, if I'm understanding you correctly?
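For anyone who wants to repeat this check on their own AIPs, here is a rough sketch of how it could be scripted. It assumes the METS layout described above: a fileGrp with USE="deleted" whose file entries point at an amdSec through their ADMID attribute, with PREMIS events carrying an eventType of "deletion". The function name is made up for illustration, and PREMIS elements are matched by local name so the PREMIS version in use should not matter.

```python
import xml.etree.ElementTree as ET

# METS namespace; PREMIS elements are matched by local name instead.
NS = {"mets": "http://www.loc.gov/METS/"}


def _local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def deleted_files_without_deletion_event(mets_root):
    """Return IDs of files in the 'deleted' fileGrp with no deletion event.

    Hypothetical helper: assumes each mets:file links to its amdSec(s)
    through a space-separated ADMID attribute.
    """
    # Collect the set of PREMIS event types recorded in each amdSec.
    event_types = {}
    for amd in mets_root.iterfind(".//mets:amdSec", NS):
        event_types[amd.get("ID")] = {
            el.text.strip()
            for el in amd.iter()
            if isinstance(el.tag, str)
            and _local_name(el.tag) == "eventType"
            and el.text
        }

    missing = []
    for grp in mets_root.iterfind(".//mets:fileSec/mets:fileGrp", NS):
        if grp.get("USE") != "deleted":
            continue
        for f in grp.iterfind("mets:file", NS):
            admids = (f.get("ADMID") or "").split()
            if not any("deletion" in event_types.get(a, set()) for a in admids):
                missing.append(f.get("ID"))
    return missing
```

Calling it with `ET.parse("METS.xml").getroot()` (the path is a placeholder) would list the file IDs that sit in the deleted group without a matching deletion event, which is the situation described in this test.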

And thanks for your insight on migration workflows. Interesting idea. The extraction script should be easy enough, I think, since the METS file has all the necessary info, so I guess it's just a matter of determining the best way to integrate it into a workflow that jibes with preexisting Archivematica workflows and PREMIS creation. I'd love to hear if anyone else has workflows/plans for migration.
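As a starting point for that extraction step, here is a minimal sketch of pulling the original/preservation pairs out of a METS file. It assumes preservation copies live in a fileGrp with USE="preservation", originals in one with USE="original", and that a derivative shares its GROUPID with its original; that is an assumption worth verifying against your own METS files. The function names are hypothetical, and the sketch only lists paths: it does not copy files or write any PREMIS.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def files_by_use(mets_root, use):
    """Return (GROUPID, path) for each file in fileGrps with the given USE."""
    found = []
    for grp in mets_root.iterfind(".//mets:fileSec/mets:fileGrp", NS):
        if grp.get("USE") != use:
            continue
        for f in grp.iterfind("mets:file", NS):
            flocat = f.find("mets:FLocat", NS)
            # Some entries may have no FLocat; record None for those.
            path = flocat.get(XLINK_HREF) if flocat is not None else None
            found.append((f.get("GROUPID"), path))
    return found


def pair_originals_with_preservation(mets_root):
    """Return (original path, preservation path) pairs.

    Assumption: a preservation copy and its original share a GROUPID.
    """
    originals = dict(files_by_use(mets_root, "original"))
    return [
        (originals.get(group_id), path)
        for group_id, path in files_by_use(mets_root, "preservation")
    ]
```

The resulting pairs could then feed whatever does the actual conversion, inside or outside Archivematica, with the original path kept alongside for the explanatory metadata.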

Thanks again,

Jarad

Sarah Romkey

Aug 8, 2024, 1:51:00 PM
to archiv...@googlegroups.com
You have uncovered a bug! I filed it and we'll plan to take care of it in the next release: https://github.com/archivematica/Issues/issues/1708. Thanks for the report, Jarad.

Cheers,

Sarah
