Empty directories removed between transfer and AIP


Andrew Berger

Oct 27, 2015, 3:47:00 PM
to archivematica
Hi,

I recently noticed that empty directories included in a transfer are not carried over to the AIP. Is there a micro-service that specifically removes empty directories or is this happening as a side effect of having a new bag created at the Prepare AIP stage?

As I understand it, empty directories will be preserved if you create a bag "in place" using either bagit-python or bagit-java (with the keepemptydirs option), but if you create a new bag with bagit-java, there's no way to keep empty directories.[1] The BagIt specification mentions making a .keep file in the empty directory to get the path into the manifest, but inserting a .keep in every empty directory seems like a workaround that ideally wouldn't be necessary.

This does raise the question of how to validate an AIP that includes empty directories in the absence of .keep files. One possibility would be to run two validation steps, one for the bagit manifest, and one for the directory tree.
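To make the second of those two steps concrete, here is a minimal sketch of a directory-tree check, assuming the list of directories is recorded somewhere outside the bag manifest at transfer time. The function names `snapshot_dirs` and `validate_tree` are hypothetical and are not part of Archivematica or of either bagit library; the bag manifest itself would still be validated separately with the usual BagIt tooling.

```python
import os


def snapshot_dirs(root):
    """Return the sorted relative paths of every directory under root."""
    found = []
    for current, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            found.append(os.path.relpath(os.path.join(current, name), root))
    return sorted(found)


def validate_tree(root, expected):
    """Compare the directories under root with an expected list.

    Returns (missing, unexpected) as two sorted lists of relative paths.
    """
    actual = set(snapshot_dirs(root))
    wanted = set(expected)
    return sorted(wanted - actual), sorted(actual - wanted)
```

Calling `snapshot_dirs` on the original transfer and later passing that list to `validate_tree` on the extracted AIP payload would report any directory, empty or not, that went missing along the way.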

[1] http://sourceforge.net/p/loc-xferutils/mailman/message/32384643/

Andrew

Sarah Romkey

Nov 5, 2015, 5:38:44 PM
to archiv...@googlegroups.com
Hi Andrew,

We did a little investigation today, but we'll need to poke at the code a little further to determine exactly where the directories are being removed. It looks like the bagit command is trying to preserve empty directories, but I've tested and gotten the same behavior as you:


This suggests to me that the directories are being removed somewhere else in the code but I'm not sure where yet.

If you create a transfer structure report, empty directories are recorded there. This might be a viable workaround for you?

I'd like to probe a little at this question in terms of why you would want to keep empty directories. Is it about reflecting the original order so to speak, or some other need? I'm interested to hear your thoughts and anyone else's who would like to chime in.

Cheers,

Sarah

Sarah Romkey, MAS,MLIS
Systems Archivist
Artefactual Systems
604-527-2056
@archivematica / @accesstomemory




Andrew Berger

Nov 25, 2015, 12:33:15 PM
to archiv...@googlegroups.com
Hi Sarah,

Sorry for my belated reply. I'm still catching up on email after being away from the office for a while.

I see two main reasons for keeping empty directories, both of which tie back to original order. First, even if there's no functional need to use an empty directory, I can see the directory name and metadata, however minimal, as having significance within the context of some larger whole.

On a more functional level, there are pieces of software that expect directories to exist but leave them empty when not being actively used. If for some reason you were to preserve an Archivematica installation, you'd likely find a lot of the subdirectories under /sharedDirectory to be empty at any given time. That might not be the greatest example since arguably you'd want to preserve the packages and dependencies rather than the running machine, but for some computing/software environments, the existing directory structure may be all you have.

That said, I'll check out the transfer structure report option. I could see this being like filename sanitization: if the empty directories are removed and the removal logged in a way that allows you to "restore" them for access, that seems like it would get around the problem of not being able to validate empty directories using BagIt.
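As a rough illustration of the "log the removal, then restore for access" idea, here is a minimal sketch. It assumes the log is a small JSON file kept alongside the package; the function names and the `empty_directories` key are hypothetical and do not reflect anything Archivematica currently writes.

```python
import json
import os


def find_empty_dirs(root):
    """Return sorted relative paths of directories that contain nothing."""
    empty = []
    for current, dirnames, filenames in os.walk(root):
        if current != root and not dirnames and not filenames:
            empty.append(os.path.relpath(current, root))
    return sorted(empty)


def write_log(root, log_path):
    """Record the empty directories under root in a JSON log file."""
    empty = find_empty_dirs(root)
    with open(log_path, "w", encoding="utf-8") as handle:
        json.dump({"empty_directories": empty}, handle, indent=2)
    return empty


def restore_empty_dirs(root, log_path):
    """Recreate every directory listed in the log under root."""
    with open(log_path, encoding="utf-8") as handle:
        recorded = json.load(handle)["empty_directories"]
    for rel in recorded:
        # makedirs also recreates parents that only held empty directories.
        os.makedirs(os.path.join(root, rel), exist_ok=True)
    return recorded
```

Only leaf directories with no files and no subdirectories are logged here; a parent that holds nothing but empty directories comes back automatically when its children are restored.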

Thanks,
Andrew

lindsay...@amvb.be

Jan 8, 2016, 9:21:38 AM
to archivematica
Hi Sarah,

Empty directories are indeed being removed, but I don't see any log file where this is mentioned. The empty directories are still listed in the transfer structure report, without any mention that they were removed. And comparing this report with the actual directories in the AIP isn't practical when you are analyzing an archive of thousands of directories. Is there any other way to check which (empty) directories were removed, e.g. a log file?

Cheers,

Lindsay Simons

On Thursday, 5 November 2015 at 23:38:44 UTC+1, Sarah Romkey wrote:

Sarah Romkey

Jan 13, 2016, 1:38:21 PM
to archiv...@googlegroups.com
Hello Lindsay,

This is a good point: it would be appropriate to log the removal of empty directories. I have made an issue ticket here:

https://projects.artefactual.com/issues/9294

Since this is an unsponsored fix at the moment I can't say for sure when we will get to it. If anyone is interested in supporting this either through Artefactual or through a community pull request, please feel free to get in touch.

Cheers,

Sarah

Sarah Romkey, MAS,MLIS
Archivematica Program Manager
Artefactual Systems
604-527-2056
@archivematica / @accesstomemory


