Updating directory .XMLs?


Andrés Abad

Mar 7, 2024, 4:05:41 PM
to AtoM Users
I'm trying to simplify a data cleanup job, as the descriptions are intended to follow a .CSV format. I've made a simple script that modifies the EAD .XML file and replaces its descriptions. However, I'm not sure if there's an option to "update" the .XML file associated with a directory. What should I do?

Dan Gillean

Mar 8, 2024, 8:29:34 AM
to ica-ato...@googlegroups.com
Hi Andrés, 

I apologize but I don't fully understand the situation. Are you trying to use EAD XML that you have already exported as a way to update existing data in AtoM?

Unless you are caching your XML files, AtoM will generate the EAD XML on demand from the archival description hierarchy - the EAD XML is not stored statically in the database; it is assembled from the various MySQL tables and serialized to either a CSV or an XML file on export, depending on the options chosen. 

If you are trying to bulk clean up data in your system by exporting it, fixing it, and then reimporting it, then using CSV to roundtrip the updates will certainly be easier to script, and likely more reliable for matching and updating the existing data. That's because on the command line, the CSV import task has a --roundtrip option. When it is used, the only matching criterion is whether the legacyId value in the exported CSV EXACTLY matches the object ID of the corresponding record in the database (which is exactly what AtoM puts in the legacyId column by default on export). In other words: as long as you do not alter the legacyId values, you should be able to update almost all other fields in the CSV and still use the existing match-and-update logic to update the existing descriptions when you reimport with the --roundtrip option. See the CSV import task options in the AtoM documentation.
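
A minimal Python sketch of the cleanup step described above: it rewrites selected columns of an exported CSV and refuses to touch legacyId, so a --roundtrip import can still match every row to its existing record. The column name scopeAndContent and the whitespace-collapsing rule are illustrative assumptions; substitute the columns and cleanup logic your job actually needs.

```python
import csv
import io

# The matching key for --roundtrip imports; it must pass through unchanged.
LEGACY_ID = "legacyId"


def clean_text(value: str) -> str:
    """Hypothetical cleanup rule: collapse runs of whitespace."""
    return " ".join(value.split())


def clean_export(src: io.TextIOBase, dst: io.TextIOBase, columns: list[str]) -> int:
    """Copy an AtoM CSV export from src to dst, cleaning only `columns`.

    Returns the number of data rows written. All other columns, including
    legacyId, are copied verbatim.
    """
    if LEGACY_ID in columns:
        raise ValueError("legacyId must not be edited for a roundtrip import")
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        for col in columns:
            # Skip empty cells and columns this export does not have.
            if row.get(col):
                row[col] = clean_text(row[col])
        writer.writerow(row)
        count += 1
    return count
```

To run it on real files, open the export and the output file with newline="" (as the csv module expects) and pass them to clean_export together with the list of columns to clean, e.g. ["scopeAndContent"].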
Now, AtoM's EAD XML import task DOES have an --update option... but honestly, it was poorly designed and does not currently work well. The problem is that an EAD XML file contains an entire hierarchy, but the update task proceeds by checking, matching, and updating descriptions record by record. Additionally, when AtoM deletes a description, all of its descendants are deleted as well. This means that with delete-and-replace, AtoM will find the top-level description, delete it, and replace it; then, when it looks for the next match, it finds nothing, because that record was already deleted by the cascade.
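
The cascade problem can be illustrated with a toy model. This is not AtoM's actual code: records are processed parents first, as they appear in an EAD XML hierarchy, and deleting a matched record also deletes all of its descendants. The skip_unmatched flag stands in for AtoM's option to skip unmatched records; the names Store and delete_and_replace are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy stand-in for the database: record id -> parent id (None = top level)."""
    parent: dict[str, str | None] = field(default_factory=dict)

    def delete(self, rid: str) -> None:
        # Cascade: deleting a record first deletes all of its descendants.
        children = [c for c, p in self.parent.items() if p == rid]
        for child in children:
            self.delete(child)
        del self.parent[rid]


def delete_and_replace(store: Store, incoming: list[tuple[str, str | None]],
                       skip_unmatched: bool) -> dict[str, str]:
    """Import (id, parent_id) pairs one at a time, parents before children."""
    outcome: dict[str, str] = {}
    for rid, parent in incoming:
        if rid in store.parent:
            store.delete(rid)  # matched: delete it (and, silently, its children)
            outcome[rid] = "replaced"
        elif skip_unmatched:
            outcome[rid] = "skipped"  # unmatched and skipped: the record is lost
            continue
        else:
            outcome[rid] = "new"  # unmatched: imported as a brand-new record
        store.parent[rid] = parent  # re-create the record from the file
    return outcome
```

With a fonds F, a series S, and a file X already in the store, importing them in order reports F as "replaced", while S and X are either "skipped" (and therefore lost) or "new", depending on skip_unmatched. Under this simplified model, the "new" path rebuilds the hierarchy; the real import may behave differently, so test it.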

So: it's possible to do updates with EAD XML, but it's much more finicky. If working with the CSV import and export is an option, then doing this via the command-line with the --update and --roundtrip options will generally yield better results. 
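
For the CSV route, the reimport can be scripted too. The sketch below only assembles the csv:import command line. The --update and --roundtrip options come from the discussion above; the exact spelling --skip-unmatched for the skip option is an assumption here, so check all of them against `php symfony help csv:import` on your own AtoM version before running anything. The atom_root path passed to run_import is a placeholder for your installation directory.

```python
import shlex
import subprocess


def build_roundtrip_import(csv_path: str, skip_unmatched: bool = True) -> list[str]:
    """Build a csv:import command that matches rows to existing descriptions
    by legacyId (--roundtrip) and updates them in place (match-and-update)."""
    cmd = ["php", "symfony", "csv:import", "--update=match-and-update", "--roundtrip"]
    if skip_unmatched:
        # Ignore rows whose legacyId matches no existing record instead of
        # creating new descriptions from them (assumed option name).
        cmd.append("--skip-unmatched")
    cmd.append(csv_path)
    return cmd


def run_import(cmd: list[str], atom_root: str) -> None:
    """Run the command from the AtoM installation directory; raises on failure."""
    subprocess.run(cmd, cwd=atom_root, check=True)


if __name__ == "__main__":
    # Print the command for review instead of running it straight away.
    print(shlex.join(build_roundtrip_import("/tmp/cleaned.csv")))
```

Printing the command first makes it easy to review before pointing run_import at a test installation.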

In the meantime, here are some general suggestions if you intend to proceed with performing record updates via EAD XML import: 
  1. Make a backup of your database FIRST, and be sure you review the results of any import attempt carefully before deciding if it worked or not. Make more backups between successive imports so you can easily roll back to a previous state without losing your progress as you proceed.
  2. Ideally, work in a test installation, not your production environment, as this will likely take some time and some of the results may be unexpected. 
  3. Work in small batches: ideally, start with one EAD XML file at a time, so you can carefully gauge how well the process worked and adjust accordingly.
  4. Be sure you understand the matching criteria used, what can and cannot be updated via import, and all the available options before you begin. This means becoming very familiar with the existing AtoM documentation on EAD XML import and updates.
  5. Part of reviewing the results should include looking at related entities, to make sure that existing records were reused and not duplicated: for example, any subject access point terms in your descriptions, any related authority records, the related archival institution, etc. Removing any one of these from a description does NOT delete it from AtoM; it merely unlinks it. Similarly, editing one of them in your description will not delete and replace the old term (related entities are not updated the same way during an archival description import). Instead, AtoM will create the new term and leave the old one unlinked, so be sure to include related entities in your quality assurance review.
  6. AtoM includes an option to skip unmatched records during update imports. I normally recommend it for match-and-update imports. However, with the delete-and-replace option, you may want to experiment with NOT using it, given the hierarchical deletion problem for EAD XML that I described above. If AtoM tries to match a child record that has already been deleted by the cascade, it will import that record as a new one instead; since the record is PART of the fonds in the EAD XML, it's possible that the outcome will still work. Honestly, it's been a while since I have experimented with XML roundtripping myself, so you will have to try things out and see what works best.
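
Part of the related-entity review in point 5 can be assisted by flagging term names that collide after normalization (case, accents, extra whitespace), which often indicates that a new term was created instead of an existing one being reused. This is a hypothetical helper: it assumes you already have the term names as a list (for example, copied or exported from the relevant taxonomy), and it only flags candidates for manual review; it does not decide which record to keep.

```python
import unicodedata
from collections import defaultdict


def normalize(name: str) -> str:
    """Compare names case-insensitively, ignoring accents and extra whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def likely_duplicates(names: list[str]) -> dict[str, list[str]]:
    """Group names by normalized form and keep only groups with 2+ members."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

For instance, "Minería" and "mineria" land in the same group, so both terms would be worth checking in AtoM before and after an import.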
That's all I can think of for now. I hope I have understood your question properly and provided some useful resources! If not, please feel free to clarify your question, and I will try to respond. 


Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
he / him
