/tmp purging of DIPUploads and other questions


arth...@gmail.com

Dec 16, 2022, 9:35:03 AM
to AtoM Users
I have an AtoM 2.6.4-184 installation that regularly receives DIP uploads from Archivematica 1.13.2.
In AtoM I configured a special folder to avoid using the classic /tmp.
Instead, I created a /atomdata disk that I use both as a "SWORD deposit directory" and as AtoM storage.
For the "SWORD deposit directory" I use /atomdata/dipuploads, while the storage uses /atomdata/atom/.
I realized that the "SWORD deposit directory" /atomdata/dipuploads is never cleaned up: every upload made to date is still there, about 700GB in total.
I then checked the contents of the /atomdata/atom storage, and it holds only 155GB.

So I have these questions:

  1. Why is the "SWORD deposit directory" /atomdata/dipuploads not automatically cleaned up?
    Is it necessary to do this explicitly with a script that - for example - deletes uploads older than a certain number of days?

  2. Why, for 700GB of DIP uploads, is only 155GB of storage used by AtoM?
    Is there a way to verify exactly the correspondence between the DIPs present in the "SWORD deposit directory" /atomdata/dipuploads and those loaded in the AtoM storage?

  3. If, hypothetically, I discover that some DIPs have not been correctly loaded in AtoM, is there a way to re-run a load starting from a DIP present in the "SWORD deposit directory" /atomdata/dipuploads?
Thanks in advance for your answers (and your patience).

Arthy

Dan Gillean

Dec 16, 2022, 10:37:22 AM
to ica-ato...@googlegroups.com
Hi Arthy, 

Why is the "SWORD deposit directory" /atomdata/dipuploads not automatically cleaned up?
Is it necessary to do this explicitly with a script that - for example - deletes uploads older than a certain number of days?

This was reported to us a while back, and is considered a bug. We have addressed it in the upcoming 2.7 release, which the team is STILL hoping to formally release before the end of 2022! 
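
Until a fixed release is installed, the kind of age-based cleanup script the question describes could look roughly like the following. This is only a sketch, not an AtoM tool: the function names and the use of each top-level entry's modification time as a rough proxy for upload age are assumptions, and it defaults to a dry run so that nothing is deleted before the list has been reviewed.

```python
import shutil
import time
from pathlib import Path


def stale_entries(deposit_dir, max_age_days, now=None):
    """Return top-level entries last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        entry for entry in Path(deposit_dir).iterdir()
        # lstat() so that a broken symlink does not raise
        if entry.lstat().st_mtime < cutoff
    )


def purge(deposit_dir, max_age_days, dry_run=True, now=None):
    """Return the stale entries; delete them only when dry_run is False."""
    selected = []
    for entry in stale_entries(deposit_dir, max_age_days, now):
        if not dry_run:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
        selected.append(entry)
    return selected
```

Calling `purge("/atomdata/dipuploads", 30)` returns the entries that would be removed; passing `dry_run=False` actually deletes them. It is probably wise to confirm that the DIPs concerned are present in AtoM before doing so.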

See the following issue tickets related to this: 
Various jobs and clipboard exports may also leave files that you don't need after a while - but we don't want to fully automate the process of cleaning these up, since they need to be available for a user to download. Consequently, we have also added the following command-line task in 2.7, which could be used as part of a scripted task run regularly if desired: 

Why, for 700GB of DIP uploads, is only 155GB of storage used by AtoM?
Is there a way to verify exactly the correspondence between the DIPs present in the "SWORD deposit directory" /atomdata/dipuploads and those loaded in the AtoM storage?

I'm not sure. Failed uploads that got restarted? Previous attempts that were deleted in AtoM? Unfortunately I can't think of an easy way to correlate the two, sorry. 
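
One way to start narrowing this down might be to tally the size of each top-level entry in the deposit directory and look for unusually large or repeated names. The sketch below does only that; it is not an AtoM feature, the function names are hypothetical, and it reports apparent file sizes rather than blocks used on disk.

```python
import os
from pathlib import Path


def tree_size(path):
    """Total size in bytes of regular files under path (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def sizes_by_entry(root):
    """Map each top-level entry name under root to its size in bytes."""
    result = {}
    for entry in Path(root).iterdir():
        if entry.is_symlink():
            continue
        if entry.is_dir():
            result[entry.name] = tree_size(entry)
        elif entry.is_file():
            result[entry.name] = entry.stat().st_size
    return result
```

Sorting the result, for example with `sorted(sizes_by_entry("/atomdata/dipuploads").items(), key=lambda kv: kv[1], reverse=True)`, puts the largest uploads first.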


If, hypothetically, I discover that some DIPs have not been correctly loaded in AtoM, is there a way to re-run a load starting from a DIP present in the "SWORD deposit directory" /atomdata/dipuploads?

Other than using Archivematica's reingest module to generate and upload a new DIP, the only other method we have is a command-line task that can be used to upload DIPs. You still need to prepare a simple CSV per DIP to tell AtoM where each object in the DIP should be attached. This task is present in 2.6 already - see: 
Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him



arth...@gmail.com

Dec 21, 2022, 1:50:23 PM
to AtoM Users
Hi Dan,
thank you very much for the very fast reply, everything is clear.

As for the correspondence between the DIPs present in the "SWORD deposit directory" and the assets actually loaded into AtoM, a possible solution came to mind.
Perhaps it makes more sense to check the "accesses" table of the Archivematica MCPServer DB and, for each SIP, use the value of the "resource" column to check whether the corresponding assets actually exist in AtoM (using AtoM's /api/informationobjects/<slug> API endpoint).
Or at least that can be a starting point.
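
A rough sketch of the AtoM side of that check is below. It assumes the slugs have already been extracted from the "resource" values (how to derive a slug from that column is not covered here), and that the API accepts a key in a `REST-API-Key` header, which should be verified against the AtoM documentation for the installed version. The function names are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request


def http_status(url, api_key, timeout=10):
    """Return the HTTP status code of a GET request to url."""
    request = urllib.request.Request(url, headers={"REST-API-Key": api_key})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def missing_slugs(slugs, base_url, api_key, get_status=http_status):
    """Return the slugs for which the endpoint does not answer 200."""
    missing = []
    for slug in slugs:
        url = "{}/api/informationobjects/{}".format(
            base_url.rstrip("/"), urllib.parse.quote(slug)
        )
        if get_status(url, api_key) != 200:
            missing.append(slug)
    return missing
```

The `get_status` parameter exists so the check can be tried against a stub before pointing it at a live site. If the site serves the API under a different prefix, include that prefix in `base_url`.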

Thanks bye

Arthy