bulk updating taxonomy terms?


matpe...@gmail.com

Feb 22, 2019, 10:58:53 AM
to AtoM Users
We are working with AtoM 2.4 and have a NASA Name Authority File (NAF) that we are using to update subject terms. The terms already exist, but our SKOS update defines some additional relationships (the NASA NAF has quite a few properties, with scope notes and subject relationships: http://metadataregistry.org/vocabulary/show/id/453.html). We are exporting from the metadata registry and then importing the XML using the CLI.

Is there a way to update the existing subject terms rather than create duplicate entries? This seems to be possible with EAD and EAC but not with taxonomies. We're looking into creating the duplicates and then using the taxonomy normalize command, but this seems to trigger a merge where the new info is dropped and the records and linkages are associated with the older version of the term. Is there a way to force normalizing to the newer version instead?

Thanks
Matt
Goddard Library - NASA GSFC

Dan Gillean

Feb 22, 2019, 6:59:53 PM
to ICA-AtoM Users
Hi Matt, 

Unfortunately, AtoM doesn't currently have an established method of updating terms the way that descriptive data can be updated. However, I have identified a simple way you can modify the taxonomy normalize task to accomplish what you want. 

First, my warnings: I AM NOT A DEVELOPER. Please back up all your data, and proceed at your own risk. I tested this locally and it worked, but you take on the responsibility for the outcome if you choose to proceed. 

The next warning first requires a brief explanation of how this task works. You can see the code for the task in lib/task/taxonomy/taxonomyNormalizeTask.class.php (shown here in our code repository). Essentially, it fetches all terms with an exact match on term name and orders them by the term's object ID. The default sort for this is ascending. When the merge is executed, it merges the information object relations from any duplicate terms into the first term, which, given the ascending sort order of the term IDs, tends to mean that the oldest term is preserved. 

It's important to note that this task does NOT have any capacity to merge data - that could get rather complicated for non-repeatable fields (like code, broader term, etc). Do you want to cram the data from both (or multiple) records into a single field? Do you want one to overwrite the other? If so, which? Etc. 

Instead, all this task is doing is moving the information object relations (AKA the links to archival descriptions) from the duplicate term to the one that is going to be preserved. 

Now, if you wanted to make sure that the newest term duplicates (aka your newly imported subjects) are preserved instead of the oldest, then we can make a very small change in line 112 of this task. Right now it reads: 
As I mentioned, the default sort here is in ascending order. However, we can modify this line to order the terms in descending order like so: 
  • ORDER BY t.id DESC"';
Here's an image of the modification I made locally: 

taxonomy-normalize.png
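
For readers who want to see the effect of the sort order without touching a real database, here is a minimal Python sketch of the behaviour described above. It is a simulation only, not the task's actual code: the function name `normalize`, the tuple layout, and the `keep_newest` flag are all hypothetical, and it ignores details such as taxonomy and culture. It only assumes what the explanation states: duplicates are matched on exact name, sorted by ID, the first one is kept, and description links are moved to it.

```python
from collections import defaultdict


def normalize(terms, relations, keep_newest=False):
    """Simulate the merge described above (hypothetical helper, not AtoM code).

    terms:     list of (term_id, name)
    relations: list of (description_id, term_id) links to archival descriptions
    Returns (kept_terms, updated_relations).
    """
    # Group term IDs by exact name match.
    ids_by_name = defaultdict(list)
    for term_id, name in terms:
        ids_by_name[name].append(term_id)

    # Ascending order keeps the oldest term; descending keeps the newest.
    replacement = {}
    for ids in ids_by_name.values():
        ids.sort(reverse=keep_newest)
        keeper = ids[0]
        for duplicate in ids[1:]:
            replacement[duplicate] = keeper

    # Only the description links are re-pointed; no term data is merged.
    kept_terms = [(tid, name) for tid, name in terms if tid not in replacement]
    updated = [(desc, replacement.get(tid, tid)) for desc, tid in relations]
    return kept_terms, updated


terms = [(10, "A-Train"), (11, "Aqua"), (25, "A-Train")]
relations = [(500, 10), (501, 25), (502, 11)]

oldest_kept = normalize(terms, relations)                    # default, ascending
newest_kept = normalize(terms, relations, keep_newest=True)  # like ORDER BY ... DESC
print(oldest_kept)
print(newest_kept)
```

With the default order, term 10 survives and both description links point to it; with `keep_newest=True`, term 25 survives instead. In both cases any data stored on the deleted term itself is gone.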

Hopefully that might help you achieve what you need. However, I do want to pass on one more important warning about hierarchical relations. 

Let's say you have the following term hierarchy: 
  • Beverages
    • Warm beverages
      • Coffee
      • Tea
Now, you run your import, and you end up with a new duplicate "Beverages" term, like so: 
  • Beverages
    • Warm beverages
      • Coffee
      • Tea
  • Beverages
If you modify the task to sort descending and then run the taxonomy normalize task, what will happen? It turns out that Warm beverages, Coffee, and Tea will all be deleted. 

So, put another way, the warning is this: ONLY information object relations are passed from the term(s) to be deleted to the one being kept. Hierarchical relations are NOT. When the task progresses to deleting the duplicate terms, this delete action cascades, so that descendant terms are also deleted. 

This means that if you have hierarchical relations in your existing subjects that are not recreated in your new import, then all those child terms will be lost if you attempt this method.  If your taxonomy organization is flat, this won't be an issue, but if not, you may need to do some manual work either pre or post-task to recreate lost terms and relations. 
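
The Beverages example can be sketched the same way. This is again a hypothetical Python simulation, not AtoM code: the function name, the dictionary layout, and the term IDs are made up for illustration. It assumes only what is described above, namely that the duplicate chosen for deletion takes all of its descendants with it.

```python
def normalize_with_hierarchy(terms, keep_newest=True):
    """Simulate normalize plus cascading delete (hypothetical helper).

    terms: dict of term_id -> (name, parent_id or None)
    Returns the dict of terms that survive.
    """
    # Group term IDs by exact name match.
    ids_by_name = {}
    for term_id, (name, _parent) in terms.items():
        ids_by_name.setdefault(name, []).append(term_id)

    # Keep the first ID in sort order, mark the other duplicates for deletion.
    to_delete = set()
    for ids in ids_by_name.values():
        ids.sort(reverse=keep_newest)
        to_delete.update(ids[1:])

    # Cascade: any term whose parent is deleted is deleted as well.
    changed = True
    while changed:
        changed = False
        for term_id, (_name, parent) in terms.items():
            if parent in to_delete and term_id not in to_delete:
                to_delete.add(term_id)
                changed = True

    return {tid: data for tid, data in terms.items() if tid not in to_delete}


terms = {
    1: ("Beverages", None),
    2: ("Warm beverages", 1),
    3: ("Coffee", 2),
    4: ("Tea", 2),
    9: ("Beverages", None),  # newly imported duplicate, no children
}

newest_kept = normalize_with_hierarchy(terms, keep_newest=True)
oldest_kept = normalize_with_hierarchy(terms, keep_newest=False)
print(sorted(newest_kept))  # [9]
print(sorted(oldest_kept))  # [1, 2, 3, 4]
```

Keeping the newest term leaves only the childless "Beverages" (ID 9); keeping the oldest leaves the original hierarchy intact and drops the import.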

I hope this might still help! 

Finally: don't forget to change the task back after - or at least, don't forget about the modification you've made! 

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory



Matt Pearson

Feb 25, 2019, 7:22:23 AM
to AtoM Users
Dan,

THANKS. Great work-around.

The real test will be when we want to update the terms with *additional* hierarchies. For example:

When our taxonomy term

A-train

related to EOS
related to Aqua
related to CALIPSO
related to Cloudsat
related to Glory
preferred label A-Train
alternative label The Afternoon Constellation

is updated/replaced by

A-train

related to EOS
related to Aqua
related to Aura
related to CALIPSO
related to Cloudsat
related to Glory
related to OCO-2
related to GCOM-W1
preferred label A-Train
alternative label The Afternoon Constellation

Will the original links in our AtoM instance be maintained or re-created?
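
To make the difference between the two versions of the term concrete, here is a small Python sketch that compares the "related to" lists above. It only computes which relations are new or dropped; it says nothing about what AtoM does on import, and the variable names are just for illustration.

```python
# "related to" entries copied from the two versions of the term above.
old_related = {"EOS", "Aqua", "CALIPSO", "Cloudsat", "Glory"}
new_related = {
    "EOS", "Aqua", "Aura", "CALIPSO", "Cloudsat", "Glory", "OCO-2", "GCOM-W1",
}

added = sorted(new_related - old_related)    # relations only in the update
removed = sorted(old_related - new_related)  # relations the update drops

print(added)    # ['Aura', 'GCOM-W1', 'OCO-2']
print(removed)  # []
```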

We'll have annual updates to make for terms covering quite a few missions, programs, and instruments. We'll try to test this out sometime soon and report back on the thread!