Right now, if you import your descriptions (or create new ones via the user interface, or via Archivematica, as you have done), then find them in AtoM post-import, add them to the clipboard, and re-export them, AtoM will populate the legacyID column with the internal ID of the information object (aka the objectID) on export. Why is that? First, some context.
As our documentation states here and here, you can use the legacyID and parentID columns to manage the creation of hierarchical relationships in a CSV. On import, the legacyID value is written to the keymap table in the source_id column, along with the source_name - if no source name is provided by the user (there's an option to add one in the CLI import commands, but not for imports via the user interface or from Archivematica), then the CSV's filename is used as a default.
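To make the legacyID/parentID relationship concrete, here is a minimal Python sketch. It is not AtoM code: the sample rows, the file name, and the function names are made up for illustration, and a real import template has many more columns. It only shows how a parentID points at another row's legacyID, and which source_name and source_id values would be recorded when no source name is supplied.

```python
import csv
import io

# Hypothetical sample rows; a real AtoM import template has many more columns.
SAMPLE_CSV = """legacyID,parentID,title
100,,Example fonds
101,100,Example series
102,101,Example file
"""


def build_hierarchy(csv_text):
    """Map each legacyID to the legacyIDs of its direct children."""
    children = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        children.setdefault(row["legacyID"], [])
        if row["parentID"]:
            # parentID refers to the legacyID of another row in the same CSV.
            children.setdefault(row["parentID"], []).append(row["legacyID"])
    return children


def keymap_entries(csv_text, filename, source_name=None):
    """List the (source_name, source_id) pairs an import would record.

    Illustration only: the CSV file name is the fallback when no source
    name is given, as described above.
    """
    name = source_name or filename
    return [(name, row["legacyID"])
            for row in csv.DictReader(io.StringIO(csv_text))]


print(build_hierarchy(SAMPLE_CSV))
# {'100': ['101'], '101': ['102'], '102': []}
print(keymap_entries(SAMPLE_CSV, "my-import.csv"))
# [('my-import.csv', '100'), ('my-import.csv', '101'), ('my-import.csv', '102')]
```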
You might expect then, that in an export, you would get this same legacyID back out, drawn from the source_id field - but you don't. Instead, you get the information object ID! This is part of what is leading to issues with users who want to use AtoM's new "match and update" CSV import option to roundtrip descriptions in the same system. So why is this?
It hearkens back to the original purpose of the keymap table, and the legacyID column, which came from one of Artefactual's first ever large-scale data migration projects, when the Archives Association of British Columbia moved MemoryBC into ICA-AtoM, around 2010. In case anything went wrong during the initial migration import, we wanted a way to be able to relate the imported descriptions back to the source descriptions from the legacy database - and so, the legacyID field was added to capture the source system's unique ID as a reference point, and then we were also able to leverage that by adding the parentID column for hierarchical relationships. As such, the original purpose of the legacyID column has always been to provide a unique ID reference to the source system. When you export data from AtoM, it is used in the same way - the assumption being that this CSV, if it is to be imported again, will be imported into a different target system, AtoM or otherwise.
Later, when we were performing the ArchivesCanada migration, we had multiple CSVs of different entities (authority records, for example) that we wanted to keep related, so the source_name option was added. That way, you could import an authority record CSV with a specific source name, and follow it with a description CSV using the same source_name, which would lead to better linking and entity matching.
The "match and update" import functionality in 2.4 was added much later, and users have reported challenges in matching to existing descriptions when trying to export, update, and then re-import a CSV into the same AtoM instance. This is largely because this particular use case (roundtripping to update in a single system) was not the primary use case around which the update options were added - and so the code makes use of the values in the keymap table for the first attempt to match - specifically, the source_name and source_id (aka legacyID) values. If you roundtrip a CSV in AtoM, however, the legacyID values in the export will NOT be the same as those in the original import (and if the descriptions you export were originally created manually via the user interface, then there will be no value at all saved in the source_id column of the keymap table for those descriptions). Because of this, the first attempt to match (based on source_name and source_id) fails - unless the user has 1) used the exact same file name (or used the CLI to import, with the same --source-name option value both times), and 2) modified the export's legacyID to use the same values that were used in the original import.
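If you do want to try condition 2, the edit to the exported CSV can be scripted. The following is only a sketch under assumptions: restore_legacy_ids is a made-up helper, and the original_ids mapping (object ID to original legacyID) is something you would have to assemble yourself from your original import file. It only touches the legacyID column; whether parentID needs the same treatment is left open here.

```python
import csv
import io


def restore_legacy_ids(export_text, original_ids):
    """Swap exported legacyID values (object IDs) for the original ones.

    original_ids is a hypothetical dict of object ID -> original legacyID.
    Rows with no entry in the mapping are left unchanged.
    """
    reader = csv.DictReader(io.StringIO(export_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["legacyID"] = original_ids.get(row["legacyID"], row["legacyID"])
        writer.writerow(row)
    return out.getvalue()


# Made-up export: legacyID holds object IDs rather than the original values.
exported = "legacyID,title\n5001,Example fonds\n5002,Example series\n"
print(restore_legacy_ids(exported, {"5001": "100", "5002": "101"}))
```

Remember that the file name (or source name) would still need to match the one used for the original import.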
When a match on source_name and source_id fails, AtoM will try its second matching option - that is, looking for an exact match on title, identifier, and repository name. Of course, this means that if you wanted to update these fields, a match will fail - and it is also technically possible (though perhaps unlikely) in AtoM to have multiple descriptions that meet all 3 matching criteria: only the final slug of a description absolutely must be unique when creating descriptions in AtoM.
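The two matching attempts can be sketched roughly as follows. This is an illustration of the logic described above, not AtoM's actual implementation: find_matches and the in-memory keymap and descriptions dictionaries are hypothetical stand-ins for the database. The function returns a list so that the "more than one candidate" case is visible; what AtoM does in that case is not covered here.

```python
def find_matches(row, source_name, keymap, descriptions):
    """Return a list of object IDs that a CSV row could match.

    keymap: dict of (source_name, source_id) -> object ID
    descriptions: dict of object ID -> dict with title, identifier, repository
    """
    # First attempt: source name plus legacyID against the keymap values.
    object_id = keymap.get((source_name, row["legacyID"]))
    if object_id is not None:
        return [object_id]
    # Second attempt: exact match on title, identifier, and repository name.
    wanted = (row["title"], row["identifier"], row["repository"])
    return [
        oid for oid, d in descriptions.items()
        if (d["title"], d["identifier"], d["repository"]) == wanted
    ]


# Made-up data: one description, originally imported from "original.csv".
keymap = {("original.csv", "100"): 5001}
descriptions = {
    5001: {"title": "Example fonds", "identifier": "F1",
           "repository": "Example Archives"},
}

# A roundtripped row: legacyID is now the object ID, file name has changed.
row = {"legacyID": "5001", "title": "Example fonds",
       "identifier": "F1", "repository": "Example Archives"}
print(find_matches(row, "export.csv", keymap, descriptions))     # [5001]

# Same row with an edited title: both attempts fail.
edited = dict(row, title="Example fonds (revised)")
print(find_matches(edited, "export.csv", keymap, descriptions))  # []
```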
There are ways that we could improve this functionality, and we have tried to capture some of these ideas in the following ticket:
We could also include additional columns on export - such as a sourceID column that would actually export the original legacyID values, and/or an objectID column that would always be used to look for matches first on import if populated. This would at least allow users to have all the relevant ID values together, so a CSV could be modified for better roundtripping.
Regardless of the approach chosen, this would require some analysis (ideally based around concrete use cases and user stories) and development for us to implement. As our development philosophy articulates, we generally require community support to be able to add new functionality to AtoM - either in the form of development sponsorship, or community code contributions.
In your case, I would recommend first that with any large CSV import, and especially with an update import, you get in the habit of always making a backup of your database before proceeding. That way, if something goes wrong, you can load the saved backup and recover your work, instead of having to manually remove any accidental imports. I would also suggest that you use the "Skip unmatched" CSV import option. Normally, if no match is found on an update import, AtoM will assume it is a new description and import it as such. If you use the "Skip unmatched" option, then when no matching record is found, AtoM skips the row and does not import it.
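The difference the "Skip unmatched" option makes can be sketched like this. Again, it is a simplified illustration rather than AtoM's code: plan_update_import and the is_match callback are hypothetical, and the sample titles are made up.

```python
def plan_update_import(rows, is_match, skip_unmatched):
    """Label each row 'update', 'create', or 'skip' for an update import.

    is_match is a hypothetical callback that returns True when a row
    matches an existing description.
    """
    plan = []
    for row in rows:
        if is_match(row):
            plan.append("update")   # match found: update the existing record
        elif skip_unmatched:
            plan.append("skip")     # no match, option on: row is ignored
        else:
            plan.append("create")   # no match, option off: new description
    return plan


existing_titles = {"Example fonds"}
rows = [{"title": "Example fonds"}, {"title": "Example fonds (revised)"}]


def title_exists(row):
    return row["title"] in existing_titles


print(plan_update_import(rows, title_exists, skip_unmatched=False))
# ['update', 'create']
print(plan_update_import(rows, title_exists, skip_unmatched=True))
# ['update', 'skip']
```

Without the option, a failed match quietly becomes a duplicate description, which is why the backup matters.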
For your import to work, you will need to ensure that the title, repository, and identifier match exactly the existing descriptions. You could also try looking in the Administration area of the top-level description to see if a source-name was captured, and if so, rename your import CSV to match it:
If you are trying to update the repository name, title, and/or identifier with your import, the match will not work currently, unfortunately. This is because no source_id values would have been written to the database during Archivematica's creation process, so the first method for matching will fail, and the import will therefore rely on a match on those 3 fields as the next criteria.
If you are still unable to get it to work, then you may need to manually update the descriptions. In the future, you might want to consider reversing your workflow a bit: creating a CSV of your descriptive metadata using AtoM's CSV import templates, importing it into AtoM, and then sending your DIP objects to an existing description in your hierarchy afterwards.
Regards,
--
You received this message because you are subscribed to the Google Groups "AtoM Users" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/931fd43a-c261-4cac-95e0-4118204b7de8%40googlegroups.com.