On Thu, Feb 06, 2020 at 09:46:49AM +0200, Alan Orth wrote:
> Dear list,
>
> I'm testing an upgrade of a DSpace 5.8 instance to DSpace 6.3 and one of
> the first things I notice is that Discovery indexing is about three or four
> times slower than it was before. On the same hardware, my repository with
> ~85,000 items takes 30 minutes to index with DSpace 5 and three hours with
> DSpace 6.3 and DSpace 6.4-SNAPSHOT. My development environment is on Linux
> with a fast SSD and lots of RAM, so I fear it will be even worse on our
> production server.
>
> I have read that the new Hibernate database layer in DSpace 6 involves much
> more complicated or time-consuming database queries. How are other people
> handling this? We're using PostgreSQL 9.6. Could it be time to move to
> something higher to hopefully gain something from PostgreSQL's own advances?

I don't know that upgrading PostgreSQL will help your indexing
performance all that much, but it shouldn't hurt. We run production
against Pg 10.9 and I develop DSpace 5, 6, and 7 against 12.1.

Hibernate does tend to fetch more stuff, but it also caches very
aggressively and rather well, so it's hard to say whether it is
contributing to any particular slow-down. There have been specific
DSpace operations in which Hibernate was found to be a source of
excess activity, but I think that most of them have been addressed in
patches scheduled for 6.4. I have no doubt that there are others.

Probably the most methodical approach would be to run indexing with a
profiler and find out where the time is being spent. Since
command-line indexing involves three processes (bin/dspace, PostgreSQL,
and Tomcat running Solr), it would be good to pay particular attention
to time spent waiting on another process.
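
As a rough first cut before reaching for a real profiler, one could
compare the wall-clock time of the indexing command with the CPU time
it actually consumed. This is only a sketch: the function names are
made up for illustration, it assumes a Unix-like system, and it only
sees the CPU time of the command it launches. PostgreSQL and Solr run
as separate processes, so time they spend working shows up here as
"waiting".

```python
import resource
import subprocess
import time


def time_command(argv):
    """Run argv; return (wall, user, system) seconds used by the child."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=False)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return wall, user, system


def waiting_fraction(wall, user, system):
    """Share of wall-clock time not spent on the CPU, clamped to 0..1.

    A multi-threaded JVM can use more CPU seconds than wall-clock
    seconds, which is why the result is clamped rather than trusted.
    """
    if wall <= 0:
        return 0.0
    return min(1.0, max(0.0, 1.0 - (user + system) / wall))
```

Calling time_command() with something like
["bin/dspace", "index-discovery", "-b"] (adjust the path and options
to your installation) on both the DSpace 5 and DSpace 6 instances and
comparing the waiting fractions might hint at whether the extra time
is spent inside the indexer itself or waiting on the database and
Solr.
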

Short of profiling, tools like 'top' and 'iotop' will give a rough
idea of whether the system is generally busier and suggest which parts
are responsible. You might be able to set up 'strace' or the like to
log mainly I/O calls and grind some statistics out of the log.
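
For the 'strace' idea, a minimal sketch of grinding statistics out of
the log might look like the following. It assumes the log was written
with 'strace -f -T -o <file>' (so each line carries a PID prefix and
ends with the call's duration in angle brackets) and without any of
the timestamp options. The summarize() name is invented for this
example. Note that 'strace -c' can produce a built-in summary; this is
for a log you have already captured.

```python
import re
from collections import defaultdict

# Ordinary line:  1234 read(3, "...", 4096) = 4096 <0.000123>
CALL_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(.*<(\d+\.\d+)>\s*$"
)
# Second half of a call interrupted by another process's output:
#   1234 <... read resumed> "...", 4096) = 4096 <0.000123>
RESUMED_RE = re.compile(r"<\.\.\. (\w+) resumed>.*<(\d+\.\d+)>\s*$")


def summarize(lines):
    """Return {syscall: (count, total_seconds)} for an strace -T log.

    Lines without a trailing duration ("<unfinished ...>", signals,
    exit notices) are skipped.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = RESUMED_RE.search(line) or CALL_RE.match(line)
        if match is None:
            continue
        name, seconds = match.group(1), float(match.group(2))
        totals[name][0] += 1
        totals[name][1] += seconds
    return {name: (count, total) for name, (count, total) in totals.items()}
```

Feeding it an open log file and sorting the result by total seconds
should show which calls the indexer spends the most time blocked in.
Restricting the trace with something like
'-e trace=read,write,recvfrom,sendto' keeps the log to mainly I/O
calls; that particular list is only a guess at what is relevant.
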

(I really should try some of these myself....)

--
Mark H. Wood
Lead Technology Analyst
University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu