scalability and archivematica

dfj...@g.syr.edu

Sep 7, 2017, 3:09:46 PM
to archivematica
Hi everyone,

I'm part of a team evaluating preservation systems for deployment at our institution. We have narrowed things down to Archivematica vs. Preservica, and one concern our IT folks have raised is scalability.

Specifically, we have a HUGE backlog of digital A/V objects that runs to many, many TBs of data (approximately 400-600TB), and they are concerned about a single server becoming a processing bottleneck. That is, if we got to the point where we were continuously pushing objects to an Archivematica queue, would the software be able to handle that kind of volume? (Apparently a Preservica installation can distribute this load across multiple servers, so that is not an issue there.)

Has anyone out there dealt with this type of volume (i.e. continuous processing of large video files) with success? Or run into any significant backlog issues? 

Many thanks in advance.

--Deirdre Joyce
Metadata Services Librarian
Syracuse University Libraries

Timothy Walsh

Sep 7, 2017, 5:53:41 PM
to archivematica
Hi Deirdre,

We have a slightly different (more heterogeneous) backlog at my institution, but also many TBs of files to ingest, with several processing configurations needed (normalization vs. no normalization, extracting files from archives and disk images or leaving them alone, etc.) depending on the source/type of data.

To accommodate this, we're just now moving to a setup of four Archivematica servers -- one running a central/shared Storage Service, and three processing pipelines. Each of the processing pipelines will be set up with automation tools scripts so that it is capable of continually ingesting data, with a different processing configuration per server, and will call some pre-transfer scripts that do things like grab metadata from our CMS and write it into each AIP's METS file.
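
To make that layout a bit more concrete, here is a minimal sketch of the idea of one processing configuration per pipeline server. It is only an illustration: the pipeline names, the two settings, and the `pick_pipeline` helper are all hypothetical and are not part of Archivematica or its automation tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    """One processing server with a fixed processing configuration (hypothetical)."""
    name: str
    normalize: bool          # run normalization on ingest?
    extract_packages: bool   # unpack archives / disk images?


# Three pipelines sharing one Storage Service, each with its own configuration.
# Names and settings are made up for illustration.
PIPELINES = [
    Pipeline("pipeline-a", normalize=True, extract_packages=True),
    Pipeline("pipeline-b", normalize=False, extract_packages=True),
    Pipeline("pipeline-c", normalize=False, extract_packages=False),
]


def pick_pipeline(normalize: bool, extract_packages: bool) -> Pipeline:
    """Return the pipeline whose configuration matches what a transfer needs."""
    for pipeline in PIPELINES:
        if (pipeline.normalize, pipeline.extract_packages) == (normalize, extract_packages):
            return pipeline
    raise LookupError("no pipeline is configured for this combination")


if __name__ == "__main__":
    # A transfer of disk images that should be left untouched:
    print(pick_pipeline(normalize=False, extract_packages=False).name)
```

In practice the routing would more likely be done by pointing each server's automation at a different source location, but the mapping from "type of material" to "pipeline with a matching configuration" is the same.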

This is a step forward for us, but we've already been successfully using several servers in parallel for a year or so now, so I can attest that Archivematica is capable of scaling up that way. Scaling issues are more likely if you anticipate transfers/SIPs containing many tens or hundreds of thousands of files each (which is something I believe Artefactual is working on as well). If your concern is the size of files, or the amount of material to be processed in a given amount of time, you can definitely configure your setup to spread throughput across several servers and shouldn't run into issues.
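
For planning purposes, a back-of-envelope estimate of how long a backlog takes to clear might look like the sketch below. The 400-600TB range comes from the question above; the per-pipeline throughput of 1 TB/day is purely an assumed placeholder, not a measured Archivematica figure, and the calculation assumes throughput scales linearly with the number of pipelines, which real deployments may not achieve.

```python
def days_to_clear_backlog(backlog_tb: float, pipelines: int,
                          tb_per_pipeline_per_day: float) -> float:
    """Estimate days to process a backlog, assuming pipelines run in parallel
    and throughput scales linearly with their number (an idealization)."""
    if pipelines <= 0 or tb_per_pipeline_per_day <= 0:
        raise ValueError("pipelines and throughput must be positive")
    return backlog_tb / (pipelines * tb_per_pipeline_per_day)


if __name__ == "__main__":
    ASSUMED_TB_PER_DAY = 1.0  # placeholder; measure on your own hardware and files
    for backlog_tb in (400, 600):
        for pipelines in (1, 3):
            days = days_to_clear_backlog(backlog_tb, pipelines, ASSUMED_TB_PER_DAY)
            print(f"{backlog_tb} TB on {pipelines} pipeline(s): about {days:.0f} days")
```

Running a small test batch of representative video files through one pipeline would give a real throughput number to plug in.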

Cheers,

Tim

Tim Walsh
Digital Archivist
Canadian Centre for Architecture

dfj...@g.syr.edu

Sep 8, 2017, 12:48:20 PM
to archivematica
Thanks, Tim. That's super helpful. 

If you don't mind me asking, I can anticipate the next question from our IT staff: how much extra IT support time (in terms of application/server support) do you think it takes, or will take, to implement and maintain this type of configuration?

Thanks again! 

--Deirdre 

Timothy Walsh

Sep 11, 2017, 10:18:10 AM
to archivematica
Hi Deirdre,

Happy to help. We have a support contract with Artefactual, so our local IT's time is limited and is spent on things like provisioning and providing access to the VMs that Archivematica runs on and managing our digital object storage. That said, the move from one to several servers takes some up-front configuration, and it means that if you have customizations (e.g., a customized FPR or automation tools deployment), these need to be applied to each server individually. On the whole, though, in my experience the difference in time between maintaining Archivematica on one server vs. several is not huge, and is barely noticeable with a support contract.

Thanks,
Tim

dfj...@g.syr.edu

Sep 14, 2017, 8:59:39 AM
to archivematica
Tim - thank you SO much!!! Very, very helpful.