Hi Hernán,
The simple answer here is that, currently, there is no "standard" high availability setup for DSpace, and DSpace has no inherent ability to do load balancing or clustering on its own.
That said, DSpace is essentially just a web application that runs on Tomcat (or similar), uses a PostgreSQL database (or similar) to store metadata/relationships, and uses Apache Solr for searching/browsing. Each of these three tools (Tomcat, PostgreSQL and Solr) *does* provide clustering options. So, it may be plausible to rely on the clustering options at those levels to create a DSpace cluster.
However, I'll admit that I'm not aware of anyone who has done that before. If someone has, I'm hoping they will speak up here and give us all a few more clues/hints. There is an older (now outdated) wiki page where such discussions started a long time ago, but they never reached a final decision/proposal:
https://wiki.duraspace.org/display/DSPACE/Clustering
All that said, I suspect there are others who would be interested in more easily enabling clustering within DSpace itself. That seems like it'd make a wonderful addition to the software platform, but it'd take one (or more) institutions who could help us better define the gaps and what is missing/needed, and then start to figure out a way forward. DSpace has no centralized development team (developers are volunteers or are allowed to work on the project by their institutions). So we are entirely reliant on the institutions using DSpace to help us make such improvements (see how to contribute [1]). If we can find a few interested users, we could also establish a formal DSpace Clustering Interest Group [2] that could begin to define the use cases, needs, etc. for the benefit of us all.
The topic of clustering is one that comes up every once in a while on this mailing list. If others are interested in helping to move this idea forward, I'd encourage you to voice your opinions/experience here. All we'd need to establish a Clustering Interest Group would be some interested individuals and one or more people willing to chair / co-chair those group meetings.
Sincerely,
Tim
[1]
https://wiki.duraspace.org/display/DSPACE/How+to+Contribute+to+DSpace
[2]
https://wiki.duraspace.org/display/DSPACE/DSpace+Interest+Groups
--
Tim Donohue
Technical Lead for DSpace & DSpaceDirect
DuraSpace.org | DSpace.org | DSpaceDirect.org
From my investigations into DSpace, the key element that I would like to decouple from DSpace is Solr.
Say you were going to build a new frontend to DSpace that relied heavily on the DSpace REST API. You could have multiple servers, each running Tomcat with the DSpace REST API deployed, with nginx in front of them proxying / load balancing. No problems there, especially as Postgres is an external service (RDS) and the assetstore is located outside of DSpace (S3).
However, I don't see how you can run multiple instances of DSpace's Solr. Solr stores data, so it wouldn't be as simple as just adding another server running the webapp; you would need to coordinate the Solr cluster using SolrCloud / ZooKeeper. Maybe it's not as complicated as I think. But I thought I read at one point that DSpace had some custom Solr code present, or that the Solr configs would have to be managed, and I'm not sure how much work it would be to build up a Solr cluster with that config.
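For what it's worth, the stateless half of this (several Tomcat instances serving the REST API behind nginx) could be sketched roughly as follows. This is only an illustration, not a tested DSpace deployment: the host names, port 8080, the `/rest/` path, the upstream name `dspace_rest` and the helper `render_nginx_upstream` are all assumptions made up for the example. The script just renders an nginx config fragment using the standard `upstream`, `server` and `proxy_pass` directives, which balance round-robin by default. It does nothing about the Solr side.

```python
from typing import List


def render_nginx_upstream(name: str, backends: List[str], rest_path: str = "/rest/") -> str:
    """Render an nginx config fragment that spreads requests across backends.

    name      -- hypothetical upstream group name
    backends  -- "host:port" strings, one per Tomcat instance (assumed values)
    rest_path -- URL prefix the REST webapp is assumed to be deployed under
    """
    if not backends:
        raise ValueError("at least one backend is required")

    lines = [f"upstream {name} {{"]
    for backend in backends:
        # One "server" line per Tomcat instance; nginx defaults to round-robin.
        lines.append(f"    server {backend};")
    lines.append("}")
    lines.append("")
    lines.append("server {")
    lines.append("    listen 80;")
    lines.append(f"    location {rest_path} {{")
    # No URI after the upstream name, so the original request path is kept.
    lines.append(f"        proxy_pass http://{name};")
    lines.append("        proxy_set_header Host $host;")
    lines.append("        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;")
    lines.append("    }")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hosts; replace with the real Tomcat instances.
    print(render_nginx_upstream(
        "dspace_rest",
        ["dspace-rest-1.example.org:8080", "dspace-rest-2.example.org:8080"],
    ))
```

Adding a third REST node is then one more entry in the list. That only works because those nodes would hold no data of their own (database and assetstore being external), which is exactly the property the embedded Solr lacks.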