Hi Frederica,
What an interesting find about the filenames! I'm not sure about the METS.xml, but my guess is that the "Filesystem (transfer)" version results from the scripts that Archivematica runs to replace certain characters in filenames with underscores. In a quick test, spaces in filenames did not cause any issues in our environment (AM 1.16/DV 6.8.1).
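To illustrate the kind of substitution I have in mind, here is a rough sketch. This is not Archivematica's actual code: the function name `sanitize_name` and the set of allowed characters are my own assumptions, purely for illustration.

```python
import re

# Assumption: anything outside letters, digits, dot, underscore and hyphen
# gets replaced. The real rules in Archivematica may well differ.
DISALLOWED = re.compile(r"[^A-Za-z0-9._-]")

def sanitize_name(filename: str) -> str:
    """Replace each disallowed character with a single underscore."""
    return DISALLOWED.sub("_", filename)

print(sanitize_name("my data file (v2).tab"))
```

If something like this runs on transfer, the original name and the changed name would both need to be recorded, which could explain two versions of the same filename showing up.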
I'm from Scholars Portal/OCUL, which sponsored the development of this integration by Artefactual a number of years ago. We have seen some renewed interest in the integration recently, and I have been doing a bit of testing here and there to see what issues need addressing, though I haven't been able to dedicate time to this.
The error that you shared is one that we've also run into, and I shared the findings from our investigation on this specific issue in this thread. In summary, we determined that this error appears when the dataset that AM receives is missing the RData derivative for one or more tabular files. I'm not sure if/when it changed, but it seems that DV only creates the RData file when a user manually requests it through the DV interface. Once generated, DV caches the RData file for future use, and that cached file can then be sent to AM for processing. When the cached file is present, processing proceeds as normal.
If you find that this is the same underlying cause for the error you're getting, we'd be interested to hear your thoughts on possible fixes. These are the ones we've come up with but we're open to other ideas!
- Derivatives are required: If the JSON file lists tabular files, DV creates RData derivatives on the fly (if they don’t already exist) and sends them to Archivematica for processing with the rest of the dataset.
- Derivatives are optional: If RData derivatives are not available, Archivematica skips the file, excludes it from subsequent jobs, and continues processing. A log of skipped RData derivatives should also be generated.
- Derivatives are excluded: Archivematica only receives user-uploaded data files (i.e. tab-delimited and RData derivatives from DV are not sent to AM for processing), with the rationale that they could all be recreated.
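
To make the "optional" approach a bit more concrete, here is a minimal sketch of the skip-and-log logic. It does not use the real Archivematica or Dataverse code or JSON layout: the `split_by_rdata` function and the `filename`/`derivatives` keys are hypothetical stand-ins for however the tabular files and their cached derivatives end up being described.

```python
import logging

logger = logging.getLogger("rdata_check")

def split_by_rdata(tabular_files):
    """Split tabular files into those with a cached RData derivative
    and those without. The entry layout here is an assumption."""
    ready, skipped = [], []
    for entry in tabular_files:
        name = entry["filename"]
        if "RData" in entry.get("derivatives", []):
            ready.append(name)
        else:
            # Record the skip so there is a trace of what was left out.
            logger.warning("No cached RData derivative for %s; skipping it", name)
            skipped.append(name)
    return ready, skipped

files = [
    {"filename": "survey.tab", "derivatives": ["RData"]},
    {"filename": "census.tab", "derivatives": []},
]
ready, skipped = split_by_rdata(files)
```

Processing would then continue with the files in `ready`, while `skipped` feeds the log of missing derivatives.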
I'd be interested to learn about your use case and requirements generally too, if you'd be willing to share :)
Best,
Julie