How to add archival descriptions to child-level DIP via csv-import


yohashimo...@gmail.com

Sep 9, 2018, 4:37:02 AM
to AtoM Users
Hi everybody,
I'm working with Archivematica 1.7.1 and AtoM 2.4.

On Archivematica, I arranged a SIP for AtoM, allotted Fonds and Series level metadata to each directory in the SIP, created a DIP and AIP, and successfully uploaded the DIP to AtoM.

On the AtoM side, I confirmed that the DIP objects appeared at Item level below the Fonds-level description targeted by the Archivematica DIP upload.

Next, I wanted to add archival descriptions below the Fonds-level description, because the Series-, File- and Item-level descriptions contained very little information.
I tried updating the existing descriptions with a CSV import, but AtoM did not update them and created new descriptions instead. I think this is because they did not have the legacy ID, source_name and identifier that need to be matched in a CSV file for updating.

What should I do to update the DIP descriptions?

Dan Gillean

Sep 11, 2018, 11:14:50 AM
to ICA-AtoM Users

Hi Yo, 

You may be able to continue refining the match-and-update criteria to get it to work, but if not, there's not currently any other method of bulk updating existing descriptions - so you may need to modify them manually. Sorry! 

The challenge you are encountering is that the primary use case for the CSV update functionality is updating descriptions in a different AtoM system, after they have been imported once. It has become clear that most people want to update descriptions in their own system, but for this to work effectively, we may need to do further development. 

This has come up a few times, and I understand the rationale and current functionality is not entirely clear to our community. Below, I would like to clarify how the current import works using the keymap table, what the history of this table was, and why it is currently difficult to update in a single system as you are trying to do. 

---------------------

Right now, if you imported your descriptions (or created new ones via the user interface, or created them via Archivematica, as you have done), then found them in AtoM post-import, and added them to the clipboard and re-exported, AtoM will populate the legacyID column with the internal ID of the information object (aka the objectID) on export. Why is that? First, some context.


As our documentation states here and here, you can use the legacyID and parentID columns to manage the creation of hierarchical relationships in a CSV. On import, the legacyID value is written to the keymap table in the source_id column, along with the source_name - if no source name is provided by the user (there's an option to add one in the CLI import commands, but not for imports via the user interface or from Archivematica), then the CSV's filename is used as a default.
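To make the legacyID/parentID relationship concrete, here is a minimal sketch in Python. The sample rows are hypothetical, and the column spellings (`legacyId`, `parentId`, `levelOfDescription`) are assumed to follow AtoM's CSV templates, so check them against the template for your version. It only illustrates how a parentID value points at another row's legacyID; it is not AtoM's import code.

```python
import csv
import io

# Hypothetical sample rows. Column names are assumed to match the CSV template.
SAMPLE = """legacyId,parentId,title,levelOfDescription
F1,,Example fonds,Fonds
S1,F1,Example series,Series
I1,S1,Example item,Item
"""


def build_paths(csv_text):
    """Return {legacyId: [titles from the top-level ancestor down to the row]}."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_id = {row["legacyId"]: row for row in rows}
    paths = {}
    for row in rows:
        chain = []
        current = row
        # Walk upwards by following each parentId to the row with that legacyId.
        while current is not None:
            chain.append(current["title"])
            parent_id = current["parentId"]
            current = by_id.get(parent_id) if parent_id else None
        paths[row["legacyId"]] = list(reversed(chain))
    return paths


paths = build_paths(SAMPLE)
for legacy_id, path in paths.items():
    print(legacy_id, " > ".join(path))
```

Each row with an empty parentId is treated as a top-level description, and every other row is placed under the row whose legacyId it names.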


You might expect then, that in an export, you would get this same legacyID back out, drawn from the source_id field - but you don't. Instead, you get the information object ID! This is part of what is leading to issues with users who want to use AtoM's new "match and update" CSV import option to roundtrip descriptions in the same system. So why is this?


It hearkens back to the original purpose of the keymap table, and the legacyID column, which came from one of Artefactual's first ever large-scale data migration projects, when the Archives Association of British Columbia moved MemoryBC into ICA-AtoM, around 2010. In case anything went wrong during the initial migration import, we wanted a way to be able to relate the imported descriptions back to the source descriptions from the legacy database - and so, the legacyID field was added to capture the source system's unique ID as a reference point, and then we were also able to leverage that by adding the parentID column for hierarchical relationships. As such, the original purpose of the legacyID column has always been to provide a unique ID reference to the source system. When you export data from AtoM, it is used in the same way - the assumption being that this CSV, if it is to be imported again, will be imported into a different target system, AtoM or otherwise.


Later, when we were performing the ArchivesCanada migration, we had multiple CSVs of different entities (such as authority records, etc) that we wanted to keep related, so the source_name option was added. That way, you could import an authority record CSV with a specific source name, and follow it with a description CSV using the same source_name, which would lead to better linking and entity matching.


The "match and update" import functionality in 2.4 was added much later, and users have reported challenges in matching to existing descriptions when trying to export, update, and then re-import a CSV into the same AtoM instance. This is largely because this particular use case (roundtripping to update in a single system) was not the primary use case around which the update options were added - and so the code makes use of the values in the keymap table for the first attempt to match - specifically, the source_name and source_id (aka legacyID) values. If you roundtrip a CSV in AtoM, however, the legacyID values in the export will NOT be the same as those in the original import (and if the descriptions you export were originally created manually via the user interface, then there will be no value at all saved in the source_id column of the keymap table for those descriptions). Because of this, the first attempt to match (based on source_name and source_id) fails - unless the user has 1) used the exact same file name (or used the CLI to import, with the same --source-name option value both times), and 2) modified the export's legacyID to use the same values that were used in the original import.
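If you still have the CSV from the original import, step 2 above could be scripted rather than done by hand. The following is a rough sketch only: it assumes both files have `legacyId` and `identifier` columns, that identifiers are unique, and that they were not changed between import and export. The function name and sample data are hypothetical.

```python
import csv
import io

# Hypothetical data: the CSV used for the first import, and a later export
# in which legacyId now holds AtoM's internal object IDs.
ORIGINAL = "legacyId,identifier,title\nF1,F-001,Example fonds\nS1,F-001-S1,Example series\n"
EXPORT = "legacyId,identifier,title\n2045,F-001,Example fonds (revised)\n2046,F-001-S1,Example series\n"


def restore_legacy_ids(original_csv, export_csv):
    """Copy the original legacyId values into the export, joining on identifier."""
    original = {
        row["identifier"]: row["legacyId"]
        for row in csv.DictReader(io.StringIO(original_csv))
    }
    reader = csv.DictReader(io.StringIO(export_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Rows without a known identifier are written back unchanged.
        if row["identifier"] in original:
            row["legacyId"] = original[row["identifier"]]
        writer.writerow(row)
    return out.getvalue()


restored = restore_legacy_ids(ORIGINAL, EXPORT)
print(restored)
```

The resulting file would still need to be saved under the same file name as the original import (or imported via the CLI with the same --source-name value) for the first matching attempt to succeed.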


When a match on source_name and source_id fails, AtoM will try its second matching option - that is, looking for an exact match on title, identifier, and repository name. Of course, this means that if you wanted to update these fields, a match will fail - and it is also technically possible (though perhaps unlikely) in AtoM to have multiple descriptions that meet all 3 matching criteria: only the final slug of a description absolutely must be unique when creating descriptions in AtoM.
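The two matching attempts described above can be modelled roughly as follows. This is a simplified illustration in Python, not AtoM's actual code: the `keymap` and `descriptions` dictionaries are hypothetical stand-ins for the database tables, and the function name is made up for the example.

```python
MATCH_FIELDS = ("title", "identifier", "repository")


def find_match(row, source_name, keymap, descriptions):
    """Return the object ID of the matching description, or None.

    keymap:       {(source_name, source_id): object_id}
    descriptions: {object_id: {"title": ..., "identifier": ..., "repository": ...}}
    """
    # First attempt: source_name plus the row's legacyId against the keymap.
    key = (source_name, row.get("legacyId", ""))
    if key in keymap:
        return keymap[key]
    # Second attempt: exact match on title, identifier, and repository name.
    for object_id, description in descriptions.items():
        if all(description[field] == row.get(field, "") for field in MATCH_FIELDS):
            return object_id
    return None


descriptions = {
    2045: {"title": "Example fonds", "identifier": "F-001", "repository": "Example Archive"},
    2046: {"title": "Example series", "identifier": "F-001-S1", "repository": "Example Archive"},
}

# Descriptions with no keymap entry (for example, created via the user interface).
unchanged = {"legacyId": "2046", "title": "Example series",
             "identifier": "F-001-S1", "repository": "Example Archive"}
retitled = dict(unchanged, title="Example series (revised)")

match_unchanged = find_match(unchanged, "export.csv", {}, descriptions)
match_retitled = find_match(retitled, "export.csv", {}, descriptions)

# With a keymap entry and the original legacyId, a changed title still matches.
keymap = {("original.csv", "S1"): 2046}
match_keymap = find_match(dict(retitled, legacyId="S1"), "original.csv", keymap, descriptions)

print(match_unchanged, match_retitled, match_keymap)
```

In this model, the retitled row finds no match without a keymap entry, which is the situation where an update import ends up creating a new description.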


There are ways that we could improve this functionality, and we have tried to capture some of these ideas in the following ticket:



We could also include additional columns on export - such as a sourceID column that would actually export the original legacyID values, and/or an objectID column that would always be used to look for matches first on import if populated. This would at least allow users to have all the relevant ID values together, so a CSV could be modified for better roundtripping.


Regardless of the approach chosen, this would require some analysis (ideally based around concrete use cases and user stories) and development for us to implement. As our development philosophy articulates, we generally require community support to be able to add new functionality to AtoM - either in the form of development sponsorship, or community code contributions.


---------------------


In your case, I would recommend first that with any large CSV import, and especially with an update import, you get in the habit of always making a backup of your database before proceeding. That way, if something goes wrong, you can load the saved backup and recover your work, instead of having to manually remove any accidental imports. I would also suggest that you use the "Skip unmatched" CSV import option. Normally, if no match is found on an update import, AtoM will assume it is a new description and import it as such. If you use the "Skip unmatched" option, then when no matching record is found, AtoM skips the row and does not import it. 


For your import to work, you will need to ensure that the title, repository, and identifier match exactly the existing descriptions. You could also try looking in the Administration area of the top-level description to see if a source-name was captured, and if so, rename your import CSV to match it: 


If you are trying to update the repository name, title, and/or identifier with your import, the match will not work currently, unfortunately. This is because no source_id values would have been written to the database during Archivematica's creation process, so the first method for matching will fail, and the import will therefore rely on a match on those 3 fields as the next criteria. 
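Before running the update import, it may be worth checking that none of the three matching fields were changed by accident while editing the export. Here is a small sketch of such a check in Python. It assumes the edited file keeps the exported `legacyId` values so rows can be paired up, and the column names and sample data are illustrative only.

```python
import csv
import io

MATCH_FIELDS = ("title", "identifier", "repository")

# Hypothetical export and edited copy; only scopeAndContent was meant to change.
EXPORT = (
    "legacyId,title,identifier,repository,scopeAndContent\n"
    "2045,Example fonds,F-001,Example Archive,\n"
    "2046,Example series,F-001-S1,Example Archive,\n"
)
EDITED = (
    "legacyId,title,identifier,repository,scopeAndContent\n"
    "2045,Example fonds,F-001,Example Archive,Records of the example body\n"
    "2046,Example series (revised),F-001-S1,Example Archive,Minutes and reports\n"
)


def changed_match_fields(export_csv, edited_csv):
    """List (legacyId, field) pairs where a matching field differs from the export."""
    exported = {row["legacyId"]: row for row in csv.DictReader(io.StringIO(export_csv))}
    problems = []
    for row in csv.DictReader(io.StringIO(edited_csv)):
        before = exported.get(row["legacyId"])
        if before is None:
            problems.append((row["legacyId"], "row not in export"))
            continue
        for field in MATCH_FIELDS:
            if row[field] != before[field]:
                problems.append((row["legacyId"], field))
    return problems


problems = changed_match_fields(EXPORT, EDITED)
print(problems)
```

Any row reported here would be expected to fail the title, identifier, and repository match, so it would either be skipped or imported as a new description, depending on the import options chosen.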


If you are still unable to get it to work, then you may need to manually update the descriptions. In the future, you might want to consider reversing your workflow a bit: creating a CSV of your descriptive metadata using AtoM's CSV import templates, importing it into AtoM, and then sending your DIP objects to an existing description in your hierarchy afterwards. 


Regards, 


Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory



yohashimo...@gmail.com

Sep 13, 2018, 5:30:17 PM
to AtoM Users
Hi Dan,

Thank you for your detailed reply.
I will try some methods for adding descriptions based on your suggestions.

Yo



yohashimo...@gmail.com

Sep 16, 2018, 2:53:32 AM
to AtoM Users
Hi Dan,

I talked with my team members about this issue and we made a plan for updating.
First, we uploaded the DIP, to which hierarchy metadata had been allotted in Archivematica, to the target Fonds-level description in AtoM.
Second, we exported a CSV file and added descriptive metadata to the relevant columns at series and file level. In doing so, we left the title, identifier, and repository columns at Fonds level unchanged so that the description would be updated.
Finally, we imported the CSV file with the "ignore match and create new..." option selected.
By this means, we succeeded in updating the given description, but the thumbnails of the DIP objects disappeared, although we could still see the contents of the DIP. I suspect the digitalObjectURL did not work well.
Could you give me any advice on how to make the thumbnails show?

Yo 



Dan Gillean

Sep 17, 2018, 10:26:37 AM
to ICA-AtoM Users
Hi Yo, 

I'm glad to hear you are making progress, and managed to successfully match and update. I'm not sure why the digital object derivatives were affected. However, you could try running the command-line task to regenerate the digital object derivatives to see if that helps. Basic syntax: 
  • php symfony digitalobject:regen-derivatives
Depending on how many digital objects you have, this task can take a while to run - but there are several additional options you can use to limit the task to particular descriptions. For example, if it is only this one hierarchy that is affected, and it is only the thumbnails that are missing, you could try something like the following. Assuming the slug of your target top-level description is "my-slug", the following should only regenerate thumbnails for that hierarchy:
  • php symfony digitalobject:regen-derivatives --type="thumbnail" --slug="my-slug"
For more details, see: 
Note that you might want to restart services and clear the application cache after, to ensure you are seeing the most up-to-date version of the page, and not a cached version. If you have installed using Ubuntu 16.04, you could restart PHP-FPM, memcached, and clear the application cache with the following: 
  • sudo systemctl restart php7.0-fpm
  • sudo systemctl restart memcached
  • php symfony cc
Don't forget to clear your browser cache as well, or test the results in an incognito browser window (where the cache is generally disabled by default). Let us know if that helps!

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
