AtoM SKOS Import


Elissa Sperling

May 20, 2022, 4:42:32 PM
to AtoM Users
Hello,

I am trying to import a thesaurus which contains about 20K concepts and 160K relationships. I started by testing a small sample of concepts that are linked to each other. I found that duplicates were created in the process. We therefore ran the taxonomy normalization command and this helped to remove duplicate concepts. The issue, however, is that not all of the relationships were retained. More specifically, one duplicate concept pair had the broader term in one record and the related terms in another record. When deduped, these duplicate records were not merged -- instead it seems that only one record was retained, and it happened to be the concept with the BT, so all the RTs were lost.

Is there a workaround for this? How can I ensure that all relationships will be properly retained? Also, I did play around with the ordering of concepts in the SKOS RDF XML file and found that it affected which relationships were retained if any, but I still lost some relationships. Even if that did solve the problem for this small sample, it would not be an easy task to reorder the full file. Any guidance you can offer would be most appreciated. Thank you!

Elissa 

Dan Gillean

May 23, 2022, 9:44:56 AM
to ICA-AtoM Users
Hi Elissa, 

AtoM's SKOS support is pretty basic at the moment. Not all SKOS elements and relationships are supported (for example, hidden labels, semantic relationships, mapping properties, examples, history/editorial/change notes, integrity conditions), and if the vocabulary mixes in other RDF vocabulary elements, AtoM may not know what to do with them. Additionally, while SKOS can support multi-hierarchies, currently AtoM taxonomies cannot. 
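
One way to spot the multi-hierarchy case before importing is to list every concept that declares more than one skos:broader. The following is a minimal sketch, assuming the file is RDF/XML that uses skos:Concept elements with rdf:about and rdf:resource attributes; the example.org URIs and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical sample data; replace with ET.parse("your-file.rdf").getroot().
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/c/cats">
    <skos:prefLabel xml:lang="en">Cats</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/c/pets"/>
    <skos:broader rdf:resource="http://example.org/c/mammals"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/c/dogs">
    <skos:prefLabel xml:lang="en">Dogs</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/c/pets"/>
  </skos:Concept>
</rdf:RDF>"""


def multi_parent_concepts(root):
    """Return {concept URI: sorted broader URIs} for concepts with 2+ broader terms."""
    parents = defaultdict(set)
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        for broader in concept.findall(f"{{{SKOS}}}broader"):
            target = broader.get(f"{{{RDF}}}resource")
            if uri and target:
                parents[uri].add(target)
    return {uri: sorted(t) for uri, t in parents.items() if len(t) > 1}


flagged = multi_parent_concepts(ET.fromstring(SAMPLE))
for uri, broader_terms in flagged.items():
    print(uri, "has multiple broader terms:", broader_terms)
```

Any concept this reports would need a single parent chosen by hand before import, since the check only finds the problem and does not decide which parent to keep.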

Without looking at your SKOS file I'm not sure why duplicates are being created, but I suspect that AtoM expects all required information to be in the SKOS file itself. It won't follow URL links mid-file to find the proper label for related terms, for example, so the cause could be how AtoM is parsing your file. 
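
To check whether a file is self-contained in this sense, one could list every broader, narrower, or related target that has no labelled concept of its own in the same file. This is only a sketch under the same RDF/XML assumptions (skos:Concept elements, rdf:about and rdf:resource attributes); the sample data and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical sample data: "lions" is referenced but never defined in the file.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/c/pets">
    <skos:prefLabel xml:lang="en">Pets</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/c/cats">
    <skos:prefLabel xml:lang="en">Cats</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/c/pets"/>
    <skos:related rdf:resource="http://example.org/c/lions"/>
  </skos:Concept>
</rdf:RDF>"""


def unresolved_targets(root):
    """Return sorted URIs that are linked to but have no labelled skos:Concept in the file."""
    labelled = set()
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        if concept.find(f"{{{SKOS}}}prefLabel") is not None:
            labelled.add(concept.get(f"{{{RDF}}}about"))

    missing = set()
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        for prop in ("broader", "narrower", "related"):
            for link in concept.findall(f"{{{SKOS}}}{prop}"):
                target = link.get(f"{{{RDF}}}resource")
                if target and target not in labelled:
                    missing.add(target)
    return sorted(missing)


dangling = unresolved_targets(ET.fromstring(SAMPLE))
print(dangling)
```

Targets that show up in this list would need a concept entry with a prefLabel added to the file, or the link removed, before trying the import again.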

In short, you may need to make some local modifications to the SKOS file to get it to import, and depending on the contents of the SKOS file, may also need to manually add back in some information if it currently uses unsupported elements. The best way to do this would likely be to make a simple test taxonomy of related terms in AtoM and then export them, and take a look at the results. You can then use this as a reference for what AtoM expects and supports when looking at the SKOS file itself. 
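
As one example of such a local modification: if the duplicates come from the same concept URI being described in more than one skos:Concept element (an assumption, since the file has not been shared in this thread), those elements could be folded together before import so that broader and related terms end up in a single record. The sketch below uses hypothetical sample data and a hypothetical function name, handles only top-level skos:Concept elements (not rdf:Description), and has not been tested against an AtoM import.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
ET.register_namespace("rdf", RDF)    # keep readable prefixes when serializing
ET.register_namespace("skos", SKOS)

# Hypothetical sample: the same concept is described twice, BT in one, RTs in the other.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/c/cats">
    <skos:prefLabel xml:lang="en">Cats</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/c/pets"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/c/cats">
    <skos:prefLabel xml:lang="en">Cats</skos:prefLabel>
    <skos:related rdf:resource="http://example.org/c/dogs"/>
    <skos:related rdf:resource="http://example.org/c/lions"/>
  </skos:Concept>
</rdf:RDF>"""


def _key(elem):
    """Identity of a property element: tag, attributes, and trimmed text."""
    return (elem.tag, tuple(sorted(elem.attrib.items())), (elem.text or "").strip())


def merge_duplicate_concepts(root):
    """Fold later skos:Concept elements into the first one with the same rdf:about."""
    first_seen = {}
    for concept in list(root.findall(f"{{{SKOS}}}Concept")):
        uri = concept.get(f"{{{RDF}}}about")
        if uri is None:
            continue
        keeper = first_seen.setdefault(uri, concept)
        if keeper is concept:
            continue  # first occurrence, nothing to merge yet
        existing = {_key(child) for child in keeper}
        for child in list(concept):
            if _key(child) not in existing:  # skip exact repeats such as prefLabel
                keeper.append(child)
                existing.add(_key(child))
        root.remove(concept)
    return root


merged = merge_duplicate_concepts(ET.fromstring(SAMPLE))
print(ET.tostring(merged, encoding="unicode"))
```

Merging in the source file avoids relying on the order of concepts in the file, which matters for a vocabulary too large to reorder by hand.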

One other limitation to keep in mind: currently there's no way in AtoM to export all terms in a taxonomy if they are not all directly related (e.g. sibling top-level terms) - so for the purposes of this experiment, make sure all terms in your test hierarchy are related to a single parent term, so you can export from that term and get all the descendants. For example, in the Subjects taxonomy, you might want to make a top-level "Subjects" term first, and then add your test terms underneath this. 

Finally, it may be worth considering what exact problem you are trying to solve by importing 20K terms into AtoM, and whether or not there might be better ways to solve it. We did a client project in the past where we imported all Library of Congress Subject Heading terms into an AtoM instance, and then a year later we did another project to remove most of them. It turns out that having hundreds of thousands of terms in AtoM was not a good user experience for staff or for end users. For example: 
  • Having so many terms had an impact on performance, making some pages load slower
  • Additionally, some terms with many relationships (both to other terms and to descriptions) could not be edited, moved, deleted, etc. via the user interface, because the web browser would hit its timeout limit before the operation could complete and all related resources could be updated
  • Staff had trouble using the autocompletes to find the desired terms because there were so many available options, many of which were not intuitively the first thing a user would search for. As such, it didn't necessarily help with consistency of use to have so many available controlled vocabulary terms
  • AtoM's taxonomy browse pages have no option to sort or filter terms by their relationships, so end users would see pages and pages of terms with no actual links to descriptions when trying to browse Subjects. In the end, this made subject-based discovery essentially impossible for end users. 
  • etc
So again, it's worth asking what the real problem is that you're trying to solve. As an example: perhaps the problem is that staff creating terms on the fly leads to inconsistent usage, and hurts end-user discovery. In that case, adding 20K terms to choose from may mean that access points are still applied inconsistently - and/or the proposed solution may create additional problems. If you want your staff to use a small subset of standardized controlled vocabulary terms for consistency, better discovery, etc. then perhaps selecting a smaller subset of the target vocabulary terms and creating them manually in AtoM's user interface might mean more upfront work, but a better end result. In such a case, you can still use the target vocabulary as your source, and in fact use the sourceNote field to provide a link directly to the reference term if desired (meaning you're still using the controlled vocabulary, just being selective about what terms you add). 

That's just an example, but revisiting the actual problem and trying to think of different ways to solve it may help you approach this issue from another angle, and uncover unexpected solutions. 

None of these are likely the ideal responses you were hoping for, but I do hope they help you find a workaround. Good luck! 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him

