Very big AtoM sites

Tim Schofield

Jan 18, 2018, 10:45:03 AM

to AtoM Users

Hi Dan

We have a potential AtoM project that will involve something like 170 archival institutions and many thousands of collections.

The question is: which is better? One huge AtoM site for everyone, which could grow unmanageably large; a separate AtoM installation for each institution, which seems clumsy because we know some institutions have only very small collections; or some kind of compromise between the two?

Can you give me any advice?

Regards

Tim Schofield

Dan Gillean

Jan 18, 2018, 11:36:17 AM

to ICA-AtoM Users

Hi Tim!

Interesting! I think it might depend on how large you expect the site to grow. We have the ArchivesCanada site (780 institutions, 900K+ descriptions, 111K+ authority records, etc.) running on the 2-site setup we generally recommend for very large installations. Essentially, it's a public-facing read-only site and a secure read/write site, with a replication script used to periodically copy the database, digital objects, and search index from the R/W site to the public one. This allows not only for greater security, but also for aggressive caching on the public site to help improve performance. We offer this by default to our Premium+ hosting clients, and we have a bit more information on the setup and its advantages on our website, here:

https://www.artefactual.com/services/site-hosting/scaling-up-with-premium-hosting/

Slides 27-34 of the following deck also provide a bit more insight into how Simon Fraser University is managing this for their two archival repositories:

https://www.slideshare.net/accesstomemory/atom-implementations

See especially slide 33 for a generalized diagram of how the setup works.

We've made the replication script that we use available in our Artefactual Labs GitHub repository, here:

https://github.com/artefactual-labs/atom-replication

I would say that if you expect your site to grow beyond 2 million records, then you may need to consider another option. One site per institution sounds like it could be overkill - I wonder if there is some kind of regional or content-based grouping you could consider?

One more advanced possibility would be to run a number of smaller sites and build a custom front-end that can read from multiple AtoM indexes. This is something we hope to explore further in a next-generation version of AtoM (to better support union catalogues and portal sites without having to actually duplicate the data and maintain it in multiple places), but in the meantime, there are some examples of this being done in the wild. Most notably, see the UBC example in slides 13-18 of the same deck above. The University of British Columbia Library has built a custom front-end application that pulls data from a number of different sources, including AtoM, by reading directly from the Elasticsearch API. It might be possible to develop something similar that can draw from multiple ES indexes.

However you choose to proceed, keep us posted as to your progress and what you discover along the way!

Cheers,

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056

@accesstomemory

John Hewson

Jan 18, 2018, 1:37:06 PM

to AtoM Users

Hi Tim,

For a small-scale example of what Dan's suggesting, see also www.sjcarchives.org.uk, which searches across the ES indexes of two AtoM sites. The method could readily scale to hundreds of sites, which could indeed be a mixture of separate sites and union catalogues sharing sites, all federated under an ES cluster. Hitting ES with a lightweight custom script is also relatively resource-efficient, which could be an advantage for a very busy site.

John
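
As a rough illustration of the kind of lightweight script described above, here is a minimal Python sketch that builds one Elasticsearch search request spanning several indexes. It relies on Elasticsearch accepting a comma-separated list of index names in the search path. Everything else is an assumption made for illustration: the index names (atom_site_a, atom_site_b), the base URL, and the helper names are hypothetical, and this is not the code used by any of the sites mentioned in this thread. The sketch only builds the request and summarizes a response; it does not contact a server.

```python
import json
from urllib.parse import quote


def build_federated_search(base_url, indexes, text, size=10):
    """Return (url, json_body) for one search across several ES indexes.

    The index names passed in are hypothetical; use whatever names your
    own AtoM installations actually write to.
    """
    if not indexes:
        raise ValueError("at least one index name is required")
    # Elasticsearch accepts several index names, comma-separated, in the path.
    index_path = ",".join(quote(name, safe="") for name in indexes)
    url = f"{base_url.rstrip('/')}/{index_path}/_search"
    # A query_string query searches the index's default fields, so no
    # AtoM-specific field names are assumed here.
    body = {"size": size, "query": {"query_string": {"query": text}}}
    return url, json.dumps(body)


def summarize_hits(response):
    """Reduce a parsed search response to (index, id, score) tuples.

    Each hit's "_index" value shows which site a result came from.
    """
    return [
        (hit["_index"], hit["_id"], hit["_score"])
        for hit in response["hits"]["hits"]
    ]
```

The returned URL and body could then be sent with any HTTP client. This only works as a single request when all of the indexes live in the same ES cluster; sites with separate ES servers would need one request each, with the results merged by the front-end.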

Kevin Dusenberry

Aug 10, 2023, 4:56:19 PM

to AtoM Users

Tim, did you ever get this project up and running?

Kevin Dusenberry

Digital Archivist

General Commission on Archives and History
