Hi Natalie,
I wanted to give other community members a chance to respond first, but since no one has so far - and since I spent a stint as the Archival Network Services Coordinator for the AABC (managing MemoryBC) - I thought I would share some links and thoughts.
In terms of general management, I think it is helpful to have some clearly defined policies around data entry, such as: required and recommended fields for each entity, and the use of terms such as subject/place/genre access points (and policies around the creation of new terms). There are also some settings that have global effects worth reviewing - does your site keep the reference code inheritance setting on? If yes, do all members know about this and understand how they should be constructing local identifiers at lower levels?
Then there are optional features - it can be useful to decide in advance if you want contributors to use them consistently or not, or if that matters to you. For example, the ability to generate finding aids - is that something you want users to use? Do all contributors know about this feature? Do contributor accounts have access to generating finding aids, or does the coordinator need to do it? What about uploading or linking digital objects - does the portal site have a policy on their use? Do you have enough storage space for everyone to upload whatever they want, or does there need to be a policy in place to manage this?
It also helps to have the site admin review descriptions before they are published, and to ensure all users understand what the coordinator is evaluating when checking descriptions. Beyond checking for conformance with the policies I suggested above, I would also recommend looking for things like:
- Has inheritance been used properly? I.e. are lower-level records directly linked to the same creator/repository as the top-level records? If yes, those direct links should be removed, so AtoM's inheritance can function. This allows for better scalability and performance (especially later during cleanup projects) as your database grows.
- Are there existing terms, authority records, etc. being duplicated where they could have been linked?
- Are there features in AtoM that aren't being used but which might increase the usability of the records for researchers if added? For example, AtoM can create complex relationships between authority records. If a family authority record is created, have relevant individual person authority records been linked to it?
I also wanted to suggest taking a look at some of the existing resources other multi-repository sites and portals have created for their users. For example, the AABC has a few policies related to MemoryBC on their website, here:
In terms of data cleanup, this is much easier if you have access to the system's backend, so you can make use of command-line tools, SQL queries, cleanup scripts, and bulk imports/exports as needed.
If you're cleaning up terms in one of the access point taxonomies, the taxonomy normalization task can be helpful - it will merge any terms that have identical authorized forms of name, preserving the links to related descriptions from both terms. This means you could clean up your access points by:
- Reviewing terms in the user interface and identifying points of commonality for merging - for example, if you have four similar terms (automobile, automobiles, cars and trucks, and vehicles), deciding which term will be used consistently going forward
- Editing the name of any terms you want to merge to make them identical to the term you wish to keep
- Running the taxonomy normalization task
Note that when merging, the older of the two terms is preserved and the relations are moved to that original term before the duplicate is deleted - the additional fields in the term record are *not* evaluated. So if you had two subjects labeled "automobiles", and the newer one has a scope note and source note but the older term has nothing but the authorized form of name, then the scope and source notes will be lost when merging.
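If it helps to see that caveat spelled out, here is a rough Python sketch of a pre-merge check you could run before normalizing. It is not AtoM code: the term records are a made-up list of dictionaries (as you might build from an export or a SQL query), the field names are hypothetical, the lower id is assumed to be the older term, and names are compared as exact strings, which may not match how the task itself compares them.

```python
from collections import defaultdict

# Hypothetical term records; "id" stands in for creation order (lower = older).
terms = [
    {"id": 101, "name": "automobiles", "scope_note": "", "source_note": ""},
    {"id": 245, "name": "automobiles",
     "scope_note": "Includes cars and trucks", "source_note": "Local list"},
    {"id": 300, "name": "vehicles", "scope_note": "", "source_note": ""},
]

NOTE_FIELDS = ("scope_note", "source_note")


def notes_at_risk(terms):
    """Return (kept_id, dropped_id, fields) for duplicates that carry notes."""
    by_name = defaultdict(list)
    for term in terms:
        by_name[term["name"]].append(term)

    at_risk = []
    for group in by_name.values():
        group.sort(key=lambda t: t["id"])
        kept, duplicates = group[0], group[1:]
        for dup in duplicates:
            # Any populated note on the newer term would disappear in a merge.
            lost = [field for field in NOTE_FIELDS if dup[field]]
            if lost:
                at_risk.append((kept["id"], dup["id"], lost))
    return at_risk


for kept_id, dropped_id, fields in notes_at_risk(terms):
    print(f"Term {dropped_id} would merge into {kept_id}; copy over first: {', '.join(fields)}")
```

Anything this flags is a term where you would want to copy the notes onto the older record by hand before running the task.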
Additionally, AtoM 2.5 will include a command-line task that will check the creator against parent records, and replace direct links with inheritance where it would produce the same result. We don't have the documentation prepared for this yet, but you can see the issue ticket here:
If you want to take advantage of this prior to the 2.5 release, I've also shared a link below for where a script version has been shared in the forum.
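To make the idea concrete, here is a rough Python sketch of the kind of check involved. This is neither the 2.5 task nor the forum script, just an illustration under simplifying assumptions: the records are a made-up dictionary, and each record has at most one directly linked creator (AtoM allows more).

```python
# Hypothetical records: id -> (parent_id, directly linked creator or None).
records = {
    1: (None, "Smith family"),  # fonds
    2: (1, "Smith family"),     # series repeating the parent's creator
    3: (2, None),               # file relying on inheritance
    4: (1, "Jane Smith"),       # series with its own, different creator
    5: (4, "Jane Smith"),       # file repeating its parent's creator
}


def inherited_creator(record_id, records):
    """Walk up the ancestors and return the first creator found, if any."""
    parent_id = records[record_id][0]
    while parent_id is not None:
        grandparent_id, creator = records[parent_id]
        if creator is not None:
            return creator
        parent_id = grandparent_id
    return None


def redundant_links(records):
    """Ids whose direct creator link matches what inheritance would supply."""
    return sorted(
        record_id
        for record_id, (_, creator) in records.items()
        if creator is not None and creator == inherited_creator(record_id, records)
    )


print(redundant_links(records))
```

Records 2 and 5 are the ones where removing the direct link would change nothing for the user, since inheritance yields the same creator.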
If you want to check whether specific fields are being used consistently (or not used at all), then you might be able to use expert searching to check this. For example, you can search for records that have no data in a specific indexed field (for example, return all records that have no extent and medium statement), as well as records that do have data in a specific field (for example, show me lower level records that have a repository entered). This is described in our advanced search documentation, as well as in the following slide deck (searching for missing or populated fields is covered on slide 30):
There are also a number of SQL queries and scripts that have been shared in the forum over the years, which can be useful for cleanup. Here are a few I've found with a quick search:
- A thread on how to change all draft descriptions to published using SQL
- A thread with variations on ways to bulk delete authority records
- A thread on how to delete accidental blank descriptions from a site using a script
- A thread with a script to delete unneeded hard links to creators at lower levels, where inheritance would produce the same result
You can see every post in the last several years that has been tagged with "SQL" here:
This slide deck also provides an introduction to using SQL in AtoM:
If you search for words like script, SQL, or gist, you will probably find other useful threads in the forum as well.
There's likely much more, but hopefully this will give you some ideas, and encourage others to chime in.
And importantly, don't forget! Any time you are making bulk edits or back-end changes, we strongly recommend you make a backup of your data first!
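As one possible way to do that, here is a minimal Python sketch that wraps mysqldump. The database name, user name, and output file name are placeholders I've made up, and it assumes mysqldump is installed and on your PATH. It only covers the database, not uploaded digital objects, so those would need to be copied separately.

```python
import shlex
import subprocess
from datetime import date


def build_dump_command(database, user, out_file):
    """Build a mysqldump command; -p makes mysqldump prompt for the password."""
    return [
        "mysqldump",
        "--single-transaction",
        "-u", user,
        "-p",
        f"--result-file={out_file}",
        database,
    ]


def run_backup(database, user):
    """Run the dump into a dated file and return the file name."""
    out_file = f"{database}-backup-{date.today().isoformat()}.sql"
    subprocess.run(build_dump_command(database, user, out_file), check=True)
    return out_file


# Print the command for review; call run_backup("atom", "atom_user") to execute it.
print(shlex.join(build_dump_command("atom", "atom_user", "atom-backup.sql")))
```

Printing the command first is deliberate, so you can confirm the database and user before anything is actually run.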
Cheers,