Archivematica–Dataverse integration (AM 1.18 / DV 6.7.1): ingest failure with Dataverse transformed files

Federica Zanardini

Apr 30, 2026, 7:33:21 AM
to archivematica
Hello Archivematica community,
I’m testing the integration between Archivematica 1.18 and Dataverse 6.7.1, and I’d like to share an issue we are encountering and ask whether others have seen similar behavior.
The ingest fails in the Dataverse workflow when the source Dataverse dataset includes files transformed by Dataverse itself, for example a .tab file generated from an original .csv.
Specifically:

- datasets containing only the originally uploaded files ingest correctly;
- datasets containing both original files and Dataverse‑generated files cause the workflow to stop at the “Dataverse METS XML” microservice.

The error reported is:

archivematica.MCPClient.clientScripts.parse_dataverse_mets.ParseDataverseError:
Exiting. Returning the database objects for our Dataverse files has failed

This suggests that Archivematica cannot resolve the file objects described in the Dataverse METS when transformed / derivative files are present.
In parallel, we have also posted a question to the Dataverse community to better understand whether Dataverse‑generated files (e.g. .tab) require special handling or filtering by downstream systems.
I’d be grateful for any insight on:

- whether this is a known limitation of the Dataverse transfer type,
- expected behavior regarding transformed files,
- or recommended workarounds / configuration approaches.

Best regards,
Federica

Federica Zanardini

May 7, 2026, 8:34:01 AM
to archivematica
Hi all,
after inspecting the transfer artifacts, we found inconsistencies in filename encoding across different layers of the workflow:

| Layer                 | Filename example                |
| --------------------- | ------------------------------- |
| Dataverse API         | `Western Blot.tab`              |
| Filesystem (transfer) | `Western_Blot.tab`              |
| Generated METS.xml    | `Western+Blot/Western+Blot.tab` |

This shows three different representations of the same object:

* spaces (` `)
* underscores (`_`)
* URL encoding (`+` in place of each space)

Because of this mismatch, Archivematica cannot:

a) correctly resolve file paths from the METS
b) match Dataverse file objects to filesystem entries
c) complete the Dataverse parsing stage

This results in a failure during `parse_dataverse_mets.py` with an empty or incomplete mapping.
The issue appears to be resolved when enforcing a single normalization rule (e.g., replacing spaces with underscores consistently across all steps).
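
As an illustration of such a single rule, here is a minimal sketch that reduces the three representations from the table to one comparable key. It is not taken from the Archivematica code base: `normalize_name` is a hypothetical helper, and it assumes that real filenames never contain a literal `+` and that only the final path component matters for matching.

```python
import posixpath
from urllib.parse import unquote_plus


def normalize_name(name: str) -> str:
    """Reduce a filename from any of the three layers to one comparable key.

    Hypothetical helper for illustration; not part of Archivematica.
    """
    # Keep only the final path component (the METS example has a directory prefix).
    base = posixpath.basename(name)
    # Decode URL/form encoding, where "+" stands for a space.
    # Assumption: a literal "+" never appears in the real filenames.
    decoded = unquote_plus(base)
    # Apply the single rule mentioned above: spaces become underscores.
    return decoded.replace(" ", "_")


names = {
    "dataverse_api": "Western Blot.tab",
    "filesystem": "Western_Blot.tab",
    "mets": "Western+Blot/Western+Blot.tab",
}
keys = {layer: normalize_name(value) for layer, value in names.items()}
print(keys)
```

One caveat with this rule: two distinct files such as `a b.tab` and `a_b.tab` would collapse to the same key, so a real fix would need to detect such collisions rather than silently merge them.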
Has anyone implemented a fix to enforce consistent encoding between filesystem and METS generation?

Any insights or pointers to best practices would be greatly appreciated.

Federica

Julie S

May 8, 2026, 12:13:32 PM
to archivematica
Hi Federica,

What an interesting find about the filenames! I'm not sure about the METS.xml, but my guess is that the "Filesystem (transfer)" version results from the scripts that Archivematica runs to replace certain characters in filenames with underscores. A quick test with spaces in filenames does not cause any issues in our environment (AM 1.16/DV 6.8.1).

I'm from Scholars Portal/OCUL, which sponsored the development of this integration by Artefactual a number of years ago. We have seen some renewed interest in the integration recently, and I have been doing a bit of testing here and there to see what issues need addressing, though I haven't been able to dedicate time to this.

The error that you shared is one that we've also run into and I shared the findings from our investigation on this specific issue in this thread. As a summary, we determined that this error appears when the dataset that AM receives is missing the RData derivative for one or more tabular files. I'm not sure if/when it changed, but it seems like DV only creates the RData file when a user manually requests it through the DV interface. Once generated, DV then caches the RData file for future use and that cached file can then be sent to AM for processing. When the cached file is present, processing proceeds as normal.

If you find that this is the same underlying cause for the error you're getting, we'd be interested to hear your thoughts on possible fixes. These are the ones we've come up with but we're open to other ideas!
  • Derivatives are required: If the JSON file lists tabular files, DV creates RData derivatives on the fly (if they don’t already exist) and sends them to Archivematica for processing with the rest of the dataset
  • Derivatives are optional: If RData derivatives are not available, Archivematica skips the file, excludes it from subsequent jobs, and continues processing. A log of skipped RData derivatives should also be generated
  • Derivatives are excluded: Archivematica only receives user-uploaded data files (i.e. tab-delimited and RData derivatives from DV are not sent to AM for processing), with the rationale that they could all be recreated.
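
To make the "derivatives are optional" option more concrete, here is a rough sketch of the skip-and-log logic. It is only an illustration under stated assumptions: `split_available`, the logger name, and the example paths are all hypothetical, and it assumes RData derivatives can be recognised by their `.RData` extension. It does not reflect how `parse_dataverse_mets.py` currently works.

```python
import logging
from typing import Iterable, List, Set, Tuple

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("dataverse_derivatives")


def split_available(
    expected: Iterable[str], on_disk: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split expected files into those present and RData derivatives to skip.

    Assumption: RData derivatives are identified by a ".rdata" suffix
    (case-insensitive). Any other missing file is still treated as an error.
    """
    present: List[str] = []
    skipped: List[str] = []
    for path in expected:
        if path in on_disk:
            present.append(path)
        elif path.lower().endswith(".rdata"):
            # Optional derivative: record it and carry on with the rest.
            skipped.append(path)
            logger.warning("Skipping missing RData derivative: %s", path)
        else:
            raise FileNotFoundError(path)
    return present, skipped


# Hypothetical example: the RData file was never generated by Dataverse.
expected = ["data/survey.csv", "data/survey.tab", "data/survey.RData"]
on_disk = {"data/survey.csv", "data/survey.tab"}
present, skipped = split_available(expected, on_disk)
```

The `skipped` list doubles as the log of skipped derivatives mentioned in that option, and raising on any other missing file keeps genuine transfer problems visible.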
I'd be interested to learn about your use case and requirements generally too, if you'd be willing to share :)

Best,
Julie