Hi Jarrod,
Thank you for opening up the discussion here.
I have been following your work in the Australasia preserves forum and see
that you've come to the correct conclusion: the tmp files should be safe to
delete.
The library used to create the temporary directories is from the Python standard
library.
The function declaration in the storage_service describes a delete_after flag,
which is set to True.
But this won't work for S3 (and probably other storage protocols in the storage
service). If I read the code correctly, S3 falls through the gaps as follows:
1. Items in S3 storage need to be downloaded locally, creating a first temporary
directory.
2. If the AIP is uncompressed, the temporary location isn't deleted because
this first temporary directory isn't set within the same scope as the fixity
check, and so the delete flag is never triggered.
There is quite a good safety mechanism here to ensure that local AIPs are not
deleted, but it could be smarter.
3. If the AIP is compressed, the first temporary directory is created as in 2.
This is also never deleted because it is out of the scope of the calling
function.
A second temporary directory is created because the LoC bagit tool that we
use won't work on compressed files (at least, that's what the code says).
This second temporary directory is the one that is deleted at the end of
the function.
In short, it's always the first temporary directory created when an AIP is
downloaded from a remote location that is never deleted.
This is something that can be improved upon.
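To make the scoping problem concrete, here is a minimal sketch of the pattern described above and one possible way to close the gap. It is not the storage service's actual code: the function names (download_aip, check_fixity, check_fixity_remote), the "download-"/"extract-" prefixes and the use of tempfile.mkdtemp are all assumptions made for illustration. Only the delete_after flag name comes from the discussion above.

```python
import os
import shutil
import tempfile


def download_aip(tmp_root):
    """Hypothetical stand-in for the S3 download: makes the FIRST temp dir."""
    first_tmp = tempfile.mkdtemp(dir=tmp_root, prefix="download-")
    with open(os.path.join(first_tmp, "aip.txt"), "w") as f:
        f.write("payload")
    return first_tmp


def check_fixity(aip_path, tmp_root, delete_after=True):
    """Hypothetical stand-in for the fixity check: makes the SECOND temp dir."""
    second_tmp = tempfile.mkdtemp(dir=tmp_root, prefix="extract-")
    shutil.copy(os.path.join(aip_path, "aip.txt"), second_tmp)
    ok = os.path.exists(os.path.join(second_tmp, "aip.txt"))
    if delete_after:
        # Only the directory created in this scope is removed;
        # the download directory passed in as aip_path is left alone.
        shutil.rmtree(second_tmp)
    return ok


def check_fixity_remote(tmp_root):
    """Possible improvement: whoever downloads is also responsible for cleanup."""
    first_tmp = download_aip(tmp_root)
    try:
        return check_fixity(first_tmp, tmp_root)
    finally:
        # Safe to remove: this function created the directory itself,
        # so it cannot be a local AIP.
        shutil.rmtree(first_tmp, ignore_errors=True)


def demo():
    """Return (leftovers without cleanup, leftovers with cleanup)."""
    with tempfile.TemporaryDirectory() as root:
        check_fixity(download_aip(root), root)
        leaked = sorted(os.listdir(root))
    with tempfile.TemporaryDirectory() as root:
        check_fixity_remote(root)
        clean = sorted(os.listdir(root))
    return leaked, clean
```

Calling demo() shows the first pattern leaving a "download-" directory behind, while the second leaves nothing. Because cleanup is tied to the function that created the download directory, the existing safety check protecting local AIPs would not need to change.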
If storage remains tight, then having two temporary AIP folders might need
to be accounted for. If you have ideas for these improvements to the storage
service then feel free to contribute to the conversation on the GitHub issue or
discuss a potential pull-request.
All the best,
Ross