Managing multi-repository instances of AtoM


Natalie Vielfaure

Jan 26, 2019, 9:34:39 PM
to AtoM Users
Hi,

I would be interested in hearing from people who have managed instances of AtoM that hold descriptions created by multiple repositories/user accounts. More specifically, I'd like to know whether they've ever dealt with data clean-ups within such instances to resolve significant inconsistencies across the database. If so, I'd be interested in hearing about any helpful tips, lessons learned, etc., as well as how people have ensured consistency long-term.

Thanks!

Dan Gillean

Jan 30, 2019, 11:25:11 AM
to ICA-AtoM Users
Hi Natalie, 

I wanted to give other community members a chance to respond first, but since no one has so far, and since I spent a stint as the Archival Network Services Coordinator for the AABC (managing MemoryBC), I thought I would share some links and thoughts.

In terms of general management, I think it is helpful to have some clearly defined policies around data entry, such as: required and recommended fields for each entity, and the use of terms such as subject/place/genre access points (and policies around the creation of new terms). There are also some settings that have global effects worth reviewing - does your site keep the reference code inheritance setting on? If yes, do all members know about this and understand how they should be constructing local identifiers at lower levels?

Then there are optional features - it can be useful to decide in advance if you want contributors to use them consistently or not, or if that matters to you. For example, the ability to generate finding aids - is that something you want users to use? Do all contributors know about this feature? Do contributor accounts have access to generating finding aids, or does the coordinator need to do it? What about uploading or linking digital objects - does the portal site have a policy on their use? Do you have enough storage space for everyone to upload whatever they want, or does there need to be a policy in place to manage this?

It also helps to have the site admin review descriptions before they are published, and to ensure that all users understand what the coordinator is evaluating when checking descriptions. Beyond checking for conformance with the policies I suggested above, I would also recommend looking for things like:
  • Has inheritance been used properly? I.e. are lower-level records directly linked to the same creator/repository as the top-level records? If yes, these direct links should be removed, so AtoM's inheritance can function. This allows for better scalability and performance (especially later during cleanup projects) as your database grows.
  • Are there existing terms, authority records, etc. being duplicated where they could have been linked?
  • Are there features in AtoM that aren't being used but which might increase the usability of the records for researchers if added? For example, AtoM can create complex relationships between authority records. If a family authority record is created, have relevant individual person authority records been linked to it?
I also wanted to suggest taking a look at some of the existing resources other multi-repository sites and portals have created for their users. For example, the AABC has a few policies related to MemoryBC on their website, here: 

In terms of data cleanup, this is much easier if you have access to the system's backend, so you can make use of command-line tools, SQL queries, cleanup scripts, and bulk imports/exports as needed. 

If you're cleaning up terms in one of the access point taxonomies, the taxonomy normalization task can be helpful - it will merge any terms together (preserving links to any related descriptions on both terms) that have identical authorized forms of name. This means you could clean up your access points by: 
  • Reviewing terms in the user interface and identifying points of commonality for merging - for example, if you have four similar terms (automobile, automobiles, cars and trucks, and vehicles), deciding which term will be used consistently going forward
  • Editing the name of any terms you want to merge to make them identical to the term you wish to keep
  • Running the taxonomy normalization task
Note that when merging, the older of the two terms is preserved and the relations are moved to that original term before the duplicate is deleted - the additional fields in the term record are *not* evaluated. So if you had two subjects labeled "automobiles", and the newer one has a scope note and source note but the older term has nothing but the authorized form of name, then the scope and source notes will be lost when merging.
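To make the merge behaviour described above concrete, here is a small Python sketch. It is only a toy model, not AtoM's actual code: the `Term` class, the `normalize` function, and the assumption that a lower id means an older term are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    id: int                      # assumption for this sketch: lower id = created earlier
    name: str                    # authorized form of name
    scope_note: str = ""
    linked_descriptions: set = field(default_factory=set)


def normalize(terms):
    """Merge terms with identical names, keeping the oldest term of each group."""
    kept = {}
    for term in sorted(terms, key=lambda t: t.id):
        survivor = kept.get(term.name)
        if survivor is None:
            # First (oldest) term with this name is the one that survives.
            kept[term.name] = term
        else:
            # Relations move to the older term; the duplicate's other
            # fields (such as its scope note) are simply discarded.
            survivor.linked_descriptions |= term.linked_descriptions
    return list(kept.values())


terms = [
    Term(1, "automobiles", linked_descriptions={"fonds-a"}),
    Term(2, "automobiles", scope_note="Includes trucks", linked_descriptions={"fonds-b"}),
    Term(3, "bicycles", linked_descriptions={"fonds-c"}),
]
merged = normalize(terms)
```

In this toy example the surviving "automobiles" term keeps both linked descriptions, but the scope note that existed only on the newer duplicate is gone. That is why it is worth copying any notes you want to keep onto the older term before running the task.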

Additionally, AtoM 2.5 will include a command-line task that will check the creator against parent records, and replace direct links with inheritance where it would produce the same result. We don't have the documentation prepared for this yet, but you can see the issue ticket here: 
If you want to take advantage of this prior to the 2.5 release, I've also shared a link below for where a script version has been shared in the forum. 
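For anyone wondering what "replace direct links with inheritance where it would produce the same result" means in practice, here is a rough Python sketch of the idea. It is not the AtoM task or the forum script: the record structure, the function name, and the simplification to a single creator per record are assumptions made for illustration.

```python
def redundant_creator_links(records):
    """Return ids of records whose direct creator link duplicates inheritance.

    records maps an id to {"parent": id or None, "creator": name or None}.
    This hypothetical structure allows only one creator per record.
    """
    def inherited_creator(rec_id):
        # Walk up the hierarchy to the nearest ancestor that has a creator.
        parent = records[rec_id]["parent"]
        while parent is not None:
            creator = records[parent]["creator"]
            if creator is not None:
                return creator
            parent = records[parent]["parent"]
        return None

    return [
        rec_id
        for rec_id, rec in records.items()
        if rec["creator"] is not None and rec["creator"] == inherited_creator(rec_id)
    ]


records = {
    "fonds": {"parent": None, "creator": "Smith family"},
    "series": {"parent": "fonds", "creator": "Smith family"},
    "file": {"parent": "series", "creator": None},
    "item": {"parent": "file", "creator": "Jane Doe"},
}
redundant = redundant_creator_links(records)
```

Here only the series is flagged: its direct link repeats what it would inherit from the fonds anyway. The item keeps its link because its creator differs from the inherited one.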

If you want to check your records for the consistent use or failure to use specific fields, then you might be able to use expert searching to check this. 

For example, you can search for records that have no data in a specific indexed field (for example, return all records that have no extent and medium statement), as well as records that do have data in a specific field (for example, show me lower level records that have a repository entered). This is described in our advanced search documentation, as well as in the following slide deck (searching for missing or populated fields is covered on slide 30):
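If you also have a CSV export of your descriptions, the same kind of check can be roughed out offline. The Python sketch below is an illustration only: the helper names are made up, and the column names (`legacyId`, `parentId`, `extentAndMedium`, `repository`) are assumptions about what your export contains, so adjust them to match your file.

```python
import csv
import io


def rows_missing_field(csv_text, field_name):
    """Rows where the given column is empty or absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if not (row.get(field_name) or "").strip()]


def lower_level_rows_with_field(csv_text, field_name, parent_column="parentId"):
    """Rows that have a parent (i.e. are not top-level) and a value in the column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if (row.get(parent_column) or "").strip() and (row.get(field_name) or "").strip()
    ]


# Tiny made-up export for demonstration purposes.
sample = """legacyId,parentId,title,extentAndMedium,repository
1,,Smith family fonds,2 m of textual records,Example Archives
2,1,Correspondence,,Example Archives
3,1,Photographs,35 photographs,
"""

no_extent = rows_missing_field(sample, "extentAndMedium")
repo_at_lower_level = lower_level_rows_with_field(sample, "repository")
```

With the sample data, `no_extent` contains the "Correspondence" row (no extent and medium statement), and `repo_at_lower_level` contains the same row because it is a child record with a repository entered directly.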
There are also a number of SQL queries and scripts that have been shared in the forum over the years, which can be useful for cleanup. Here are a few I've found with a quick search: 
  • A thread on how to change all draft descriptions to published using SQL
  • A thread with variations on ways to bulk delete authority records
  • A thread on how to delete accidental blank descriptions from a site using a script
  • A thread with a script to delete unneeded hard links to creators at lower levels, where inheritance would produce the same result
You can see every post in the last several years that has been tagged with "SQL" here: 
This slide deck also provides an introduction to using SQL in AtoM: 
If you search for words like script, SQL, or gist, you will probably find other useful threads in the forum as well. 

There's likely much more, but hopefully this will give you some ideas, and encourage others to chime in. 

And importantly, don't forget! Any time you are making bulk edits or back-end changes, we strongly recommend you make a backup of your data first.

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory



Natalie Vielfaure

Jan 30, 2019, 7:24:58 PM
to AtoM Users
Thanks, Dan! This was extremely helpful! 