Thank you, Terry. How fast does your DSpace grow? How many items per month
or year? Do you do clustering / load balancing? What kind of hardware do
you need to run it? I would be grateful if you could share that information.
Vlastik
On 8/23/19 6:28 PM, Terry Brady wrote:
> Here are some details about DigitalGeorgetown.
>
> * Total items: 546,000
> * Public items: 397,000
> * Citation only items: ~470,000
>
> As we tested and migrated to DSpace 6.x, we did encounter a few
> performance issues. We have contributed patches to DSpace 6.x releases
> (and to the future DSpace 6.4 release) to help resolve these issues.
>
> We preserve our assets in the APTrust (Academic Preservation Trust)
> service, so we do not run the DSpace checksum checker on our DSpace
> instance.
>
> Terry
>
> On Fri, Aug 23, 2019 at 7:48 AM Tim Donohue <tim.d...@lyrasis.org> wrote:
>
> Hello Vlastimil,
>
> Unfortunately, the size of DSpace sites is very difficult to track
> overall (it relies entirely on self reporting).
>
> I know there are very large sites out there... a few that come to
> mind are U of Cambridge (https://www.repository.cam.ac.uk/) and
> Georgetown University (https://repository.library.georgetown.edu/).
> I cannot claim to know exactly how large the sites are though, as
> each of these sites may have access restricted content (which is not
> even visible on the web). However, in terms of public content alone
> each has 250-350 thousand items.
>
> I also admit that I don't know whether there are larger sites out
> there. But, maybe institutions on this mailing list will
> self-report if they have more than 400 thousand items. (I know I'd
> love to hear which sites have >400K items!)
>
> I think Mark Wood gave a thorough answer regarding the number of
> items possible in a DSpace. Technically, the biggest limitation is
> the amount of server space & memory available (as larger sites need
> more of each). For each release we attempt to make DSpace as
> performant (and memory lean) as we can, and as memory issues are
> reported we resolve them as bugs in a new release. For example, for
> the upcoming DSpace 7 release (which is still under active
> development) we are running more detailed performance testing, as
> described here:
> https://wiki.duraspace.org/display/DSPACE/DSpace+7+Performance+Testing
> At this time, that performance testing is more geared towards
> minimizing CPU load and memory use overall (which will also help in
> scaling).
>
> Tim
>
> ------------------------------------------------------------------------
> *From:* dspace-c...@googlegroups.com on behalf of Vlastimil Krejčíř <kre...@ics.muni.cz>
> *Sent:* Friday, August 23, 2019 5:57 AM
> *To:* DSpace Community <dspace-c...@googlegroups.com>
> *Subject:* [dspace-community] Scalability of DSpace
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/dspace-community/DM5PR22MB05727332D082F1B9BEB443BCEDA40%40DM5PR22MB0572.namprd22.prod.outlook.com