Hi Tim!
Interesting! I think it might depend on how large you expect the site to grow.
We have the ArchivesCanada site (780 institutions, 900K+ descriptions, 111K+ authority records, etc.) running on the 2-site setup we generally recommend for very large installations. Essentially, it's a public-facing read-only site and a secure read/write site, with a replication script used to periodically copy the database, digital objects, and search index from the R/W site to the public one. This allows not only for greater security, but also for aggressive caching on the public site to improve performance. We offer this by default to our Premium+ hosting clients, and we have a bit more information on the setup and its advantages on our website, here:
Slides 27-34 of the following deck also provide a bit more insight into how Simon Fraser University is managing this for their two archival repositories:
See especially slide 33 for a generalized diagram of how the setup works.
We've made the replication script that we use available in our Artefactual Labs GitHub repository, here:
I would say that if you expect your site to grow beyond 2 million records, then you may need to consider another option. One site per institution sounds like it could be overkill - I wonder if there is some kind of regional or content-based grouping you could consider?
One more advanced possibility would be to run a number of smaller sites and build a custom front-end that can read from multiple AtoM indexes. This is something we hope to explore further in a next-generation version of AtoM (to better support union catalogues and portal sites without having to actually duplicate the data and maintain it in multiple places), but in the meantime, there are some examples of this being done in the wild. Most notably, see the UBC example in slides 13-18 of the same deck above. The University of British Columbia Library has built a custom front-end application that pulls data from a number of different sources, including AtoM, by reading directly from the Elasticsearch API. It might be possible to develop something similar that can draw from multiple ES indexes.
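To give a rough idea of what reading from multiple ES indexes could look like, here is a minimal Python sketch. It relies on the fact that Elasticsearch accepts a comma-separated list of index names in the search URL. The host, the index names (atom_site_a, atom_site_b) and the function names are placeholders I made up for illustration, and this is not how UBC built their application - treat it as a starting point under those assumptions only.

```python
import json
import urllib.request


def build_multi_index_search(base_url, indexes, text, size=10):
    """Return (url, body) for one search across several Elasticsearch indexes."""
    # Elasticsearch accepts a comma-separated list of index names in the path.
    url = "{}/{}/_search".format(base_url.rstrip("/"), ",".join(indexes))
    body = {
        "size": size,
        # query_string searches the index's default fields for the given text.
        "query": {"query_string": {"query": text}},
    }
    return url, body


def search(base_url, indexes, text, size=10):
    """Run the search and return (index name, document) pairs."""
    url, body = build_multi_index_search(base_url, indexes, text, size)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Each hit reports which index it came from, so the front-end can
    # link the result back to the right AtoM site.
    return [(hit["_index"], hit["_source"]) for hit in payload["hits"]["hits"]]


if __name__ == "__main__":
    # Hypothetical host and index names; replace with your own.
    url, body = build_multi_index_search(
        "http://localhost:9200", ["atom_site_a", "atom_site_b"], "fonds"
    )
    print(url)
    print(json.dumps(body))
```

The field names inside each returned document depend on the AtoM version and how the index was built, so you would want to inspect a few results from your own indexes before designing the display layer.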
However you choose to proceed, keep us posted as to your progress and what you discover along the way!
Cheers,