What's the current state of handling archives with __MACOSX directories?


Alex Garnett

Jun 12, 2015, 12:19:11 PM
to archiv...@googlegroups.com
Hi folks,

I know this has come up on the list several times, but I wanted to check whether, as of release 1.4, Archivematica does any of its own "sanitization" (up to, I suppose, the point of removing useless cache files) on ingested objects containing __MACOSX paths and similar. I thought this was handled to some extent, but some collaborators recently noted that Archivematica had attempted to run normalization on the effectively junk files within a __MACOSX subdirectory rather than on the desired input elsewhere in the archive.

Thanks!

Misty De Meo

Jun 15, 2015, 6:48:25 PM
to archiv...@googlegroups.com
Hi, Alex,

Archivematica deletes certain unwanted or hidden files, but __MACOSX isn't one of those right now. It might be a candidate to add, though unlike the other files that we autodelete, removing it isn't always safe, since there might actually be meaningful content in the resource fork that wouldn't be anywhere else in the transfer.

The scripts for removing hidden files are here, if you want to see the files we do look for:
https://github.com/artefactual/archivematica/blob/qa/1.x/src/MCPClient/lib/clientScripts/removeUnneededFiles.py
https://github.com/artefactual/archivematica/blob/qa/1.x/src/MCPClient/lib/clientScripts/removeHiddenFilesAndDirectories.py
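Because removing __MACOSX isn't always safe, one cautious first step is to look at what a given archive actually holds there before deciding anything. The following is a minimal sketch of that idea, not the code in the scripts linked above: it lists the file entries of a zip archive, separates those under a __MACOSX directory from the rest, and reports their uncompressed sizes so that non-trivial resource fork data stands out. The function name `summarize_macosx_entries` is hypothetical and nothing here is part of Archivematica.

```python
import io
import zipfile


def summarize_macosx_entries(zip_source):
    """Split a zip's file entries into __MACOSX metadata and everything else.

    Hypothetical helper for inspection only: it returns two lists of
    (name, uncompressed_size) tuples and never deletes or modifies anything.
    """
    metadata, content = [], []
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # only files are of interest here
            # Zip entry names always use "/" as the separator.
            parts = info.filename.split("/")
            target = metadata if "__MACOSX" in parts else content
            target.append((info.filename, info.file_size))
    return metadata, content


if __name__ == "__main__":
    # Build a small in-memory archive to demonstrate the report.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as demo:
        demo.writestr("report.pdf", b"x" * 10)
        demo.writestr("__MACOSX/._report.pdf", b"abc")
    buffer.seek(0)
    metadata, content = summarize_macosx_entries(buffer)
    print("metadata entries:", metadata)
    print("content entries:", content)
```

A report like this leaves the decision with the person doing the transfer: entries that carry real data can be kept, and the rest can be dropped by hand.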

Best,
Misty

--
You received this message because you are subscribed to the Google Groups "archivematica" group.
To unsubscribe from this group and stop receiving emails from it, send an email to archivematic...@googlegroups.com.
To post to this group, send email to archiv...@googlegroups.com.
Visit this group at http://groups.google.com/group/archivematica.
For more options, visit https://groups.google.com/d/optout.



--
Misty De Meo
Software Developer / Systems Analyst
Artefactual Systems
www.artefactual.com

Sarah Romkey

Jun 15, 2015, 6:56:43 PM
to archiv...@googlegroups.com
Hi Alex and all,

I just wanted to note too that there is a potential development task in our ArchivesSpace-Archivematica integration work with Bentley Library, described here:

https://archivesspace.atlassian.net/browse/AASWF-26?jql=project%20%3D%20AASWF

In essence, instead of (or maybe in addition to) having a hard-coded list of file types that Archivematica will always remove (like .DS_Store) there would also be a way of configuring what file types to remove, or deaccession, during processing.

We haven't determined yet if this particular task is a high priority for inclusion in the project, and if it is, what implementation will look like. For example, maybe it makes sense to choose file types to deaccession on the fly while processing, or maybe it makes sense to configure this option in the administration tab.
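To make the idea concrete, here is one way a configurable removal list could sit alongside a hard-coded one. This is only a sketch under stated assumptions, since the implementation has not been decided: the names `DEFAULT_REMOVE_PATTERNS` and `files_to_remove` are hypothetical, the use of shell-style wildcard patterns is an assumption, and .DS_Store is the only default taken from the description above.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical stand-in for the hard-coded list; only .DS_Store is taken
# from the discussion, the real list lives in the Archivematica scripts.
DEFAULT_REMOVE_PATTERNS = [".DS_Store"]


def files_to_remove(paths, extra_patterns=()):
    """Return the paths whose file name matches a default or configured pattern.

    `extra_patterns` stands in for whatever an administrator might configure.
    Matching is case-sensitive and looks only at the final path component.
    """
    patterns = list(DEFAULT_REMOVE_PATTERNS) + list(extra_patterns)
    return [
        path
        for path in paths
        if any(fnmatchcase(PurePosixPath(path).name, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    transfer = ["objects/.DS_Store", "objects/scan.tif", "objects/Thumbs.db"]
    # Thumbs.db and *.tmp are example user-configured patterns.
    print(files_to_remove(transfer, extra_patterns=["Thumbs.db", "*.tmp"]))
```

Keeping the defaults and the configured patterns in separate lists means the same function could serve either option mentioned above, a choice made on the fly or a setting in the administration tab.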

As always, we'd be happy to hear thoughts from the community!

Cheers,

Sarah

Sarah Romkey, MAS,MLIS
Systems Archivist
Artefactual Systems
604-527-2056
@archivematica / @accesstomemory


Alex Garnett

Jun 17, 2015, 1:16:33 PM
to archiv...@googlegroups.com
Thanks for both of your responses. I agree that sometimes you wouldn't want to destroy the resource fork, though there still seem to be some issues with Archivematica electing to normalize empty files from the __MACOSX directory rather than the files that users likely have in mind ... any chance you'd be interested in looking at a sample of the offending archives to potentially fix this behaviour?

And in the meantime, thanks for pointing me at the part of the code that does this currently so we can tweak it if need be!


-alex


Misty De Meo

Jun 17, 2015, 1:20:55 PM
to archiv...@googlegroups.com
We don't have any kind of "don't try to normalize/process" blacklist yet, for files that aren't deleted by the cleanup microservices, but I think that's a good idea! It's not on our roadmap but I could see it being quite useful for some users.
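As a rough illustration of what such a list might look like, the sketch below keeps files in the transfer but reports whether each one should be passed on to normalization. Nothing like this exists in Archivematica at present; `SKIP_DIRECTORIES` and `should_normalize` are hypothetical names, and matching on directory names alone is an assumption.

```python
from pathlib import PurePosixPath

# Hypothetical skip list: directories whose contents are preserved
# in the transfer but never handed to normalization.
SKIP_DIRECTORIES = {"__MACOSX"}


def should_normalize(relative_path, skip_directories=SKIP_DIRECTORIES):
    """Return False if any directory component of the path is on the skip list."""
    parts = PurePosixPath(relative_path).parts
    # parts[:-1] are the directories; the last component is the file name.
    return not any(part in skip_directories for part in parts[:-1])


if __name__ == "__main__":
    for path in ["objects/photo.tif", "objects/__MACOSX/._photo.tif"]:
        print(path, should_normalize(path))
```

Unlike deletion, a skip list of this kind leaves any resource fork data in place, so it avoids the safety concern raised earlier in the thread.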

Misty

L Snider

Jun 17, 2015, 2:15:53 PM
to archiv...@googlegroups.com
One question, which may or may not relate... If one has bagged the files before ingest, would Archivematica take the __MACOSX files out of the bag? If so, wouldn't that cause a problem with the bag? I didn't think this had to do with bagged files, but I just wanted to check.

Cheers

Lisa

Sarah Romkey

Jun 17, 2015, 3:18:54 PM
to archiv...@googlegroups.com
Hi Lisa,

Do you mean there may be a problem because there would be files extracted that are not in the bag manifest? I'm not sure if that has ever been tested. Has anyone in the community run across this problem?

Cheers,

Sarah

Sarah Romkey, MAS,MLIS
Systems Archivist
Artefactual Systems
604-527-2056
@archivematica / @accesstomemory



L Snider

Jun 17, 2015, 3:47:52 PM
to archiv...@googlegroups.com
Hi Sarah,

Yes, correct. The bag manifest may show them and then they disappear...so the manifest would then be wrong and the bag would be off.

I recently fought with BagIt and 'invisible' Mac files, so this is why I ask.
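The mismatch described here can be checked for directly. Below is a minimal sketch, assuming a bag laid out in the usual BagIt way with a manifest whose lines each hold a checksum followed by a file path; it lists manifest entries that no longer exist on disk. The function name `missing_from_bag` and the default manifest name are assumptions, and as a simplification the sketch does not verify checksums or decode percent-encoded paths.

```python
from pathlib import Path


def missing_from_bag(bag_dir, manifest_name="manifest-sha256.txt"):
    """Return manifest paths that are no longer present in the bag.

    Hypothetical helper: it only tests for existence. It does not verify
    checksums and does not decode percent-encoded characters in paths.
    """
    bag_dir = Path(bag_dir)
    missing = []
    manifest_text = (bag_dir / manifest_name).read_text(encoding="utf-8")
    for line in manifest_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        # Each line is "<checksum><whitespace><path>"; split only once
        # so that paths containing spaces stay intact.
        _checksum, relative_path = line.split(None, 1)
        if not (bag_dir / relative_path).is_file():
            missing.append(relative_path)
    return missing
```

Running a check like this before and after cleanup would show whether removing hidden Mac files has left the manifest pointing at files that are gone.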

Cheers

Lisa