Finding aids at series level and size limitations


Jarad

Sep 15, 2020, 12:44:08 PM
to AtoM Users
Hi all,

I'm pretty sure I know the answer to this, but wanted to confirm before I give up: is it possible to upload a finding aid at a level lower than the top level of a collection? I have several series in the same fonds that each contain a very large number of folders, and combining the file-level descriptions for these series into a single, fonds-level finding aid would be impossible (we're talking tens of thousands of pages in a PDF). As such, I was hoping I could upload one finding aid for each individual series rather than a single fonds-level finding aid.

Can this be done?

Just for clarity's sake:
  • I am not currently looking to create descriptions in AtoM for all of the lower level descriptions, I just want them to be in the custom finding aid I am uploading myself
  • Despite my preference for series systems, I am not currently looking to make each series a top level description
  • My back-up plan is to manually upload series level finding aids elsewhere and use the finding aid RAD field to link to them, but I was hoping to use the built-in option for the sake of data purity and search-ability

Follow-up to this that may render the question moot:

I have done a bit of testing by uploading some of the larger series finding aids to test top-level descriptions, and have found that AtoM's search function seems to break down when attempting to parse very large finding aids. For example, I uploaded a 5,000-ish page finding aid, and although the search was able to return a result for a term near the top of the finding aid, it would not return results for terms that I know exist lower down in the finding aid.

Is this inability to parse large self-made finding aids a limitation of AtoM as software, or is it a question of resources (i.e., if I throw enough RAM, processing power, etc. at my instance, will it be able to handle this)?


Any assistance anyone can provide on the above topics would be much appreciated.

Best regards,
Jarad

Dan Gillean

Sep 16, 2020, 11:11:13 AM
to ICA-AtoM Users
Hi Jarad, 

RE: Finding aids

Right now, AtoM can only associate one generated or uploaded finding aid per archival unit (i.e. per hierarchy). Changing this would require development - and I think we'd also want to be careful about design, and think about how to make it clear to end users that further detailed finding aids may be available at lower levels. 

In the meantime, one workaround you might consider would be uploading the lower-level finding aids as digital objects on the related series. Your custom finding aid uploaded at the fonds/collection level could indicate that more detailed file lists, etc., are available at the series level. AtoM will still attempt to index the finding aid text when it is uploaded as a digital object, though keep in mind that this workaround will affect the kinds of results you see when searching and when filtering or faceting for finding aids and/or digital objects.

RE: PDFs, OCR and/or text layers, and AtoM indexing

You're right that AtoM currently will cut off the indexing of a text layer after a certain point. Please see this previous forum thread, where I've added an explanation: 
In 2.6, this is still essentially the same: AtoM stores the text layer in the property tables as a key/value pair. The field we're currently using is set to TEXT type in MySQL, which has a limit of 65,535 bytes. Remember as well that AtoM uses UTF-8 character encoding, in which a character can take 1-4 bytes, so there isn't a 1:1 match between bytes in the database and letters as we think of them. See also: 
We haven't globally increased this to MEDIUMTEXT or larger because it could have significant impacts on search throughout AtoM - for example, if you have one hundred different 5,000 page finding aids then odds are some content in there is going to match almost every user query, and you're going to get a lot of noise in the search results, with those same 100 descriptions turning up all the time and it not being obvious from the record metadata why. 
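
To make the byte limit concrete, here is a minimal Python sketch of what a 65,535-byte cutoff does to UTF-8 text. It only simulates the cutoff; how AtoM and MySQL actually truncate the stored text layer may differ. The function name `truncate_to_bytes` and the sample text are hypothetical and used for illustration only.

```python
TEXT_LIMIT_BYTES = 65_535  # maximum size of a MySQL TEXT column, in bytes


def truncate_to_bytes(text: str, limit: int = TEXT_LIMIT_BYTES) -> str:
    """Return the longest prefix of `text` whose UTF-8 encoding fits in `limit` bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    # errors="ignore" drops a multi-byte character that was cut in half at the boundary.
    return encoded[:limit].decode("utf-8", errors="ignore")


# Hypothetical finding aid text: many similar lines, then one distinctive term at the end.
sample = ("Früh box 1, folder 1\n" * 5000) + "rare-term-near-the-end"
stored = truncate_to_bytes(sample)

print("characters in original:", len(sample))
print("bytes in original:     ", len(sample.encode("utf-8")))
print("characters kept:       ", len(stored))
print("end term still present:", "rare-term-near-the-end" in stored)
```

In this simulation the term at the end of the text falls past the cutoff and is lost, which matches the symptom described above: terms near the top of a large finding aid are found, while terms further down are not.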

However, locally you could try modifying this field, and changing the type via MySQL. I would strongly recommend you make a backup of your data first, just in case there are unintended consequences, but you could assess how it impacts search and decide if it will better meet your needs. 
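
Before changing the column type, it may help to estimate how large your text layers are. The sketch below uses the documented maximum sizes of MySQL's text types to pick the smallest one that would hold a given text. The function name `smallest_text_type` and the per-page size are assumptions for illustration, not values taken from AtoM.

```python
# Maximum sizes, in bytes, of MySQL's text column types.
MYSQL_TEXT_CAPACITY = {
    "TINYTEXT": 255,
    "TEXT": 65_535,
    "MEDIUMTEXT": 16_777_215,
    "LONGTEXT": 4_294_967_295,
}


def smallest_type_for_bytes(size: int) -> str:
    """Return the smallest MySQL text type that can hold `size` bytes."""
    for name, capacity in MYSQL_TEXT_CAPACITY.items():  # ordered smallest to largest
        if size <= capacity:
            return name
    raise ValueError("too large for any MySQL text type")


def smallest_text_type(text: str) -> str:
    """Return the smallest MySQL text type that can hold `text` encoded as UTF-8."""
    return smallest_type_for_bytes(len(text.encode("utf-8")))


# Rough estimate for a 5,000-page finding aid.
PAGES = 5000
ASSUMED_BYTES_PER_PAGE = 2000  # assumption: adjust to match your own PDFs
estimate = PAGES * ASSUMED_BYTES_PER_PAGE
print(estimate, "bytes ->", smallest_type_for_bytes(estimate))
```

Under that assumed page size, a 5,000-page finding aid would need far more than TEXT can hold but would fit within MEDIUMTEXT. Measure a few of your own extracted text layers before relying on this estimate.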

You'll need to restart MySQL and PHP-FPM afterwards, and you would also want to re-extract the text layer using this command-line task: 
You would also want to rebuild the search index, so that newly extracted text shows up in search results. 

Hope this helps!

Cheers, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him



Jarad

Sep 21, 2020, 10:11:38 AM
to AtoM Users
Hi Dan,

Thanks for the reply and the useful information. Not unexpected, but it's good to have confirmation. Here's to work-arounds!

Cheers,