uploadDIP transfer folder not deleted after completed processes


Christofer Engzell

Dec 16, 2020, 4:16:16 AM
to archivematica

Archivematica version: 1.11.2 

OS: Ubuntu 16.01 

 

Hello,

 

With our configuration, the uploadDIP folder for a specific transfer is not deleted after all the microprocesses attached to the transfer have completed.

In particular, the following relevant options in the Processing Configuration are pre-selected for this transfer flow:

Upload DIP: Upload DIP to AtoM/Binder

Store DIP: Do not store

Store DIP location: Default location

Store AIP: Yes

 

The relevant microprocesses are handled under the “Upload DIP” microservice during ingest. The following microprocesses complete successfully (in order, top to bottom):

Job: Upload DIP

Job: Choose config for AtoM/Binder DIP upload

Job: Upload DIP

Job: Move to the UploadedDIPs directory

Job: Store DIP?

Job: Handle unstored DIP

 

As a result of completing the microservice, the DIP is uploaded and handled correctly by AtoM. However, while “Job: Move to the UploadedDIPs directory” COPIES the DIP from sharedDirectory/watchedDirectories/uploadDIP/ to sharedDirectory/watchedDirectories/uploadedDIPs/, there appears to be no function or code in the microservice that DELETES the transfer from the /uploadDIP/ folder once the microservice has completed. Consequently, the /uploadDIP/ folder keeps filling up with the DIP of every transfer.
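The difference matters because a copy leaves its source directory in place. The minimal sketch below uses plain `shutil` calls on a temporary directory; it is not Archivematica's job code, and the function `simulate` and the DIP names are hypothetical. It only illustrates why copying rather than moving makes uploadDIP grow with every transfer:

```python
import shutil
import tempfile
from pathlib import Path


def simulate(transfer_names, use_move):
    """Hand each DIP from uploadDIP/ to uploadedDIPs/ and return what is left behind.

    Hypothetical model only; the real job lives in Archivematica's MCPClient scripts.
    """
    with tempfile.TemporaryDirectory() as root:
        upload_dip = Path(root, "uploadDIP")
        uploaded_dips = Path(root, "uploadedDIPs")
        upload_dip.mkdir()
        uploaded_dips.mkdir()
        for name in transfer_names:
            src = upload_dip / name
            src.mkdir()
            (src / "objects.txt").write_text("placeholder DIP content")
            dest = uploaded_dips / name
            if use_move:
                shutil.move(str(src), str(dest))  # source directory disappears
            else:
                shutil.copytree(str(src), str(dest))  # source directory stays behind
        # Whatever is still in uploadDIP/ after all transfers have been handled.
        return sorted(p.name for p in upload_dip.iterdir())


print(simulate(["dip-a", "dip-b"], use_move=False))  # ['dip-a', 'dip-b']
print(simulate(["dip-a", "dip-b"], use_move=True))   # []
```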

Upon rebooting the machine, and in particular restarting the archivematica-mcp-server service (which we believe monitors the watched folders), the service re-detected the transfer folders in the /uploadDIP/ folder and began to re-upload them to AtoM. It did this regardless of whether the transfer was still present in the dashboard. For transfers still present in the Ingest tab of the dashboard, the microprocesses involved in the DIP upload were visibly duplicated. Because this affected all previously uploaded DIPs, repairing the receiving AtoM environment took significant effort: hundreds of “Jobs” and their corresponding files were inserted again into their slug targets there.
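One way to picture the restart behaviour is a watcher that, on startup, treats every directory it finds as new work. The sketch below is a hypothetical, simplified model and not how MCPServer is confirmed to work; `startup_scan` and the `already_uploaded` record are illustrative assumptions. It shows that without some persistent record of uploaded DIPs, every leftover folder is queued again:

```python
import tempfile
from pathlib import Path


def startup_scan(watched_dir, already_uploaded=frozenset()):
    """Return the directories a naive watcher would (re)queue when it starts.

    Hypothetical model: every directory present in the watched folder is treated
    as new work unless its name appears in `already_uploaded`, a record that the
    real service may not keep.
    """
    return sorted(
        p.name
        for p in Path(watched_dir).iterdir()
        if p.is_dir() and p.name not in already_uploaded
    )


with tempfile.TemporaryDirectory() as upload_dip:
    for name in ["dip-2020-01", "dip-2020-02", "dip-2020-03"]:
        Path(upload_dip, name).mkdir()
    # Leftover DIPs from earlier transfers are all picked up again after a restart.
    print(startup_scan(upload_dip))  # ['dip-2020-01', 'dip-2020-02', 'dip-2020-03']
    # With a record of already uploaded DIPs, nothing would be re-uploaded.
    print(startup_scan(upload_dip, {"dip-2020-01", "dip-2020-02", "dip-2020-03"}))  # []
```

Deleting the folder once the upload is done, as in the workaround described in this thread, removes the need for such a record in the first place.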

It is difficult to gauge the root of the problem without a more detailed understanding of the code. As far as we can determine, given our options and setup, no command is executed to clear the /uploadDIP/ folder. However, that does not necessarily mean the code is missing: it may exist but not be functioning, or the problem may lie in separate code that monitors the /uploadDIP/ folder and does not record which DIPs have already been uploaded.

For now, as a temporary solution, we have inserted the following into the client script for the microprocess “Job: Handle unstored DIP”, archivematica/src/MCPClient/lib/clientScripts/handle_unstored_dip.py (https://github.com/artefactual/archivematica/blob/stable/1.12.x/src/MCPClient/lib/clientScripts/handle_unstored_dip.py), below line 61, with appropriate indentation:

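        # Workaround: the DIP has been uploaded and will not be stored,
        # so remove its leftover folder from watchedDirectories/uploadDIP/.
        # Assumes shutil is imported and logger is defined in this script.
        try:
            logger.info("Special operation: remove uploadDIP folder: %s", sip_path)
            shutil.rmtree(sip_path)
        except OSError as e:
            logger.error("Directory removal failed with: %s", e)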

 

The purpose of that client script is not specifically to delete the /uploadDIP/ folder, but we believe it is the appropriate place to do so in the design of the microservice: at the end of its execution, when we are certain that the previous microprocesses have completed.
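Because `shutil.rmtree` deletes whatever path it is given, a standalone version of this workaround could refuse to act on anything outside the uploadDIP directory. The sketch below is a hypothetical helper, not Archivematica code; `remove_upload_dip_dir` and its `upload_dip_root` parameter are assumptions (on a real installation the root would be something like `<sharedDirectory>/watchedDirectories/uploadDIP`):

```python
import logging
import shutil
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def remove_upload_dip_dir(sip_path, upload_dip_root):
    """Delete a finished DIP's folder, but only if it sits inside uploadDIP.

    Hypothetical helper; returns True if the directory was removed.
    """
    target = Path(sip_path).resolve()
    root = Path(upload_dip_root).resolve()
    # Refuse to touch the watched directory itself or anything outside it.
    if target == root or root not in target.parents:
        logger.warning("Refusing to remove %s: not inside %s", target, root)
        return False
    try:
        logger.info("Special operation: remove uploadDIP folder: %s", target)
        shutil.rmtree(target)
    except OSError as e:
        logger.error("Directory removal failed with: %s", e)
        return False
    return True


with tempfile.TemporaryDirectory() as root:
    dip = Path(root, "my-dip")
    dip.mkdir()
    print(remove_upload_dip_dir(dip, root))  # True
    print(remove_upload_dip_dir(Path(root).parent, root))  # False, outside uploadDIP
```

The path check is an addition to the snippet in this post; it is a guard against passing the wrong path, not something the original script is known to require.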

How is the /uploadDIP/ folder for the relevant transfer intended to be deleted, given our configuration? 

Are we misunderstanding the purpose of the /uploadDIP/ folder, how it is monitored, or the sequence of the microservice?

Christofer Engzell

Dec 16, 2020, 4:38:19 AM
to archivematica
We also noted the function rmtree_upload_dip_transitory_loc in store_aip.py as a likely suspect: https://github.com/artefactual/archivematica/blob/stable/1.12.x/src/MCPClient/lib/clientScripts/store_aip.py

As far as we understand, however, this code never executes, because “Job: Store AIP” is only called with "SIP" as package_type, so the function exits at line 114 without deleting the /uploadDIP/ transfer folder.

Christofer Engzell

Dec 16, 2020, 4:53:05 AM
to archivematica
Some VERY quick testing suggests that store_aip.py is reused by the “Job: Store DIP” microprocess, called with the appropriate package_type "DIP", and that this is where the /uploadDIP/ folder is intended to be deleted. However, as chosen in the Processing Configuration and noted above, we do not want to store the DIP. There seems to be an edge case here: for us, the DIP is generated, uploaded and not stored. The deletion probably only occurs if the DIP is generated, uploaded and stored.
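The suspected edge case can be written down as a toy model. This assumes, as the testing above suggests but does not confirm, that the cleanup in store_aip.py only runs for package_type "DIP" and only via “Job: Store DIP”; the names `DipState`, `store_step` and `run_upload_dip_chain` are hypothetical and do not mirror the real script:

```python
from dataclasses import dataclass


@dataclass
class DipState:
    """Toy model of where a DIP's folders exist after the Upload DIP microservice."""

    in_upload_dip: bool = True
    in_uploaded_dips: bool = False


def store_step(state, package_type):
    """Model of the cleanup attributed to store_aip.py: it acts only on DIPs."""
    if package_type != "DIP":
        return  # e.g. "Job: Store AIP" passes "SIP", so nothing is removed
    state.in_upload_dip = False  # uploadDIP/<dip> is deleted


def run_upload_dip_chain(store_dip):
    """Run the modelled jobs for one DIP and return the final state."""
    state = DipState()
    state.in_uploaded_dips = True  # "Move to the UploadedDIPs directory" copies
    if store_dip:
        store_step(state, "DIP")  # "Job: Store DIP" reuses store_aip.py
    # Otherwise "Handle unstored DIP" runs, which removes nothing in this model.
    store_step(state, "SIP")  # "Job: Store AIP" runs too, but the guard skips it
    return state


print(run_upload_dip_chain(store_dip=True))
# DipState(in_upload_dip=False, in_uploaded_dips=True)
print(run_upload_dip_chain(store_dip=False))
# DipState(in_upload_dip=True, in_uploaded_dips=True)
```

Under these assumptions, the "Do not store" choice is exactly the path on which nothing removes the uploadDIP folder.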

Joseph Anderson

Dec 16, 2020, 10:19:09 AM
to archivematica
I've noticed this very same issue when using the upload to CONTENTdm option. It's a bit annoying. In that case, an additional copy of the DIP is stored in /var/archivematica/sharedDirectory/watchedDirectories/uploadedDIPs/ as well as in uploadDIP, the idea being that the copy in uploadedDIPs is to be downloaded manually. For our purposes, we use an automation-tools script that I set up to delete both copies after transferring to our access system, so they're not stored there indefinitely. It sounds like a real headache that they were re-uploaded when your system was restarted.
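The actual automation-tools workaround is not shown in the thread. As a rough sketch of the idea only, assuming the DIP's folder has the same name in both watched directories, a cleanup step might look like this; `cleanup_dip_copies` is a hypothetical helper, not part of automation-tools:

```python
import shutil
import tempfile
from pathlib import Path


def cleanup_dip_copies(watched_dirs, dip_name):
    """Remove both leftover copies of a DIP once it has reached the access system.

    Hypothetical helper: `watched_dirs` stands for
    /var/archivematica/sharedDirectory/watchedDirectories. Returns the names of
    the subdirectories a copy was deleted from.
    """
    # Reject names that could point outside the watched subdirectories.
    if dip_name in ("", ".", "..") or Path(dip_name).name != dip_name:
        raise ValueError(f"unsafe DIP name: {dip_name!r}")
    removed = []
    for sub in ("uploadDIP", "uploadedDIPs"):
        copy = Path(watched_dirs, sub, dip_name)
        if copy.is_dir():
            shutil.rmtree(copy)
            removed.append(sub)
    return removed


with tempfile.TemporaryDirectory() as watched:
    for sub in ("uploadDIP", "uploadedDIPs"):
        Path(watched, sub, "dip-1").mkdir(parents=True)
    print(cleanup_dip_copies(watched, "dip-1"))  # ['uploadDIP', 'uploadedDIPs']
    print(cleanup_dip_copies(watched, "dip-1"))  # []
```

Running the cleanup only after the access system has confirmed receipt keeps the manual-download copy available until it is no longer needed.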

I asked about this issue earlier this year but never got a response; it might need to be reported as a bug. I never followed up on it because I wrote a workaround in our automation script.

-Joseph

Christofer Engzell

Dec 17, 2020, 6:15:37 AM
to archivematica
I see the same copy made to /uploadedDIPs/ by the microprocess "Job: Move to the UploadedDIPs directory", but it is strictly a copy, not a "move" as the job title implies.

We delete the /uploadedDIPs/ content as part of clearing various artifacts after completing a transfer, by going to Administration -> Processing Storage Usage -> Calculate Disk Usage -> Clear. We haven't automated this deletion, simply because we are uncertain of the folder's purpose (is it reused by other microprocesses, or used to verify anything?) and because leaving it in place is relatively harmless.
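Given that uncertainty, a read-only audit is a cautious first step before automating any deletion: it lists leftover DIP folders in both watched directories and their sizes without touching anything. This is a hypothetical sketch, not the dashboard's Calculate Disk Usage feature; `dir_size_bytes` and `report_leftover_dips` are illustrative names, and `watched_dirs` stands for the sharedDirectory/watchedDirectories path:

```python
import os
import tempfile
from pathlib import Path


def dir_size_bytes(path):
    """Total size of regular files under `path` (symlinks are not counted)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def report_leftover_dips(watched_dirs):
    """Map each DIP folder in uploadDIP/ and uploadedDIPs/ to its size in bytes."""
    report = {}
    for sub in ("uploadDIP", "uploadedDIPs"):
        base = Path(watched_dirs, sub)
        if not base.is_dir():
            continue
        for entry in sorted(base.iterdir()):
            if entry.is_dir():
                report[f"{sub}/{entry.name}"] = dir_size_bytes(entry)
    return report


with tempfile.TemporaryDirectory() as watched:
    dip = Path(watched, "uploadDIP", "dip-1")
    dip.mkdir(parents=True)
    (dip / "thumbnail.jpg").write_bytes(b"\0" * 1024)
    Path(watched, "uploadedDIPs").mkdir()
    print(report_leftover_dips(watched))  # {'uploadDIP/dip-1': 1024}
```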

Thank you for linking to automation-tools. I notice that it includes the idea of generating a new DIP from a stored AIP without re-ingest, limited to original files (no preservation-format files), which is obviously interesting for us since we don't store DIPs. I will have to dig into that at a later time.