Java Memory Issue when running dspace solr-reindex-statistics


Agustín Alfieri

Oct 20, 2023, 8:16:57 AM
to DSpace Technical Support
Hello,
I'm migrating from DSpace 4.7 to 7.6 and need to keep the Solr statistics, so I'm upgrading from 4.7 to 6.4 first in order to export the Solr statistics for later import into 7.6.

I've done some test runs of the full migration that worked fine, but when we tried it in production we hit an error during step 14 of https://wiki.lyrasis.org/display/DSDOC6x/Upgrading+DSpace. When I run the [dspace]/bin/dspace solr-reindex-statistics command, it runs for several hours and then throws this error: [attached image: solr-reindex-statistics ERROR.png]
In dspace.log, all I can see is that the process stopped shortly before finishing the export: [attached image: dspace.log .png]
And in solr.log I found what seems to be the underlying error: [attached image: solr_log ERROR.png]

This is not the first GC or Java heap space problem I've encountered when running this command, so I tried giving Tomcat more memory to fix it. These were my settings when running the command:

Ubuntu 22.04
DSpace 6.4
Tomcat 9 with:
  • [Service]
  • Type=forking
  • User=tomcat
  • Group=tomcat
  • Environment="JAVA_HOME=/opt/jdk1.8.0_371"
  • Environment="JAVA_OPTS=-Xms4096M -Xmx16384M -Djava.security.egd=file:///dev/urandom -Djava.awt.headless=true"
  • Environment="CATALINA_BASE=/opt/tomcat"
  • Environment="CATALINA_HOME=/opt/tomcat"
  • Environment="CATALINA_PID=/opt/tomcat/temp/tomcat.pid"
  • Environment="CATALINA_OPTS=-Xms4096M -Xmx16384M -server -XX:+UseParallelGC"
  • ExecStart=/opt/tomcat/bin/startup.sh
  • ExecStop=/opt/tomcat/bin/shutdown.sh
/dspace/bin/dspace:
  • #Allow user to specify java options through JAVA_OPTS variable
  • if [ "$JAVA_OPTS" = "" ]; then
  •   #Default Java to use 256MB of memory
  •   JAVA_OPTS="-Xms1024m -Xmx16384m -Dfile.encoding=UTF-8"
  • fi
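
The fallback quoted above from /dspace/bin/dspace only applies when JAVA_OPTS is empty in the shell that launches the command. The systemd Environment= lines are set for the Tomcat service, not for an interactive shell, so the command-line JVM and the Tomcat JVM get their heap settings separately. A minimal sketch of that fallback logic follows; the function name resolve_java_opts and the -Xmx2048m value are illustrative assumptions, not part of DSpace:

```shell
#!/bin/sh
# Mirrors the fallback quoted above: the script's default is used only
# when JAVA_OPTS is empty or unset in the calling environment.
# resolve_java_opts is a hypothetical helper for illustration.
resolve_java_opts() {
  if [ "$JAVA_OPTS" = "" ]; then
    JAVA_OPTS="-Xms1024m -Xmx16384m -Dfile.encoding=UTF-8"
  fi
  printf '%s\n' "$JAVA_OPTS"
}

# No JAVA_OPTS in the environment: the script default applies.
( unset JAVA_OPTS; resolve_java_opts )

# JAVA_OPTS set by the caller (illustrative value): the default is skipped.
( JAVA_OPTS="-Xmx2048m"; resolve_java_opts )
```

Running `echo "$JAVA_OPTS"` in the shell before launching the command shows which of the two cases applies.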
The repository has around 16,000 items and has been running since 2007.

I believe there's a problem that I'm not seeing since I don't understand why this process would need so much memory. I welcome any solution or suggestion!

Agustín Alfieri

Oct 23, 2023, 8:25:39 AM
to DSpace Technical Support
Some more info:

I ran a new test giving Tomcat 25 GB of max memory (the server has 32 GB) and the script ran without problems. I saw that, for some reason, memory use started growing really fast around the time the export was finishing (and the import starting). Is this OK? I feel like it's using a lot of memory for no reason, and I'm wondering if I'm doing something wrong.
[attached image: 79754c3c-cfb0-4f25-950b-cf27cef23354.jfif]
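
One way to check when the growth described above happens is to sample the Tomcat JVM's heap while the command runs. The sketch below assumes that the growth is in the Tomcat JVM (as the effect of raising Tomcat's heap suggests), that the JDK 8 jstat tool is on the PATH, and that it is run as the user owning the Tomcat process. The helper name old_gen_and_full_gc is hypothetical; the PID file path is the CATALINA_PID from the settings in the first message.

```shell
#!/bin/sh
# Print old-generation utilisation (%) and full-GC count from each data
# line of `jstat -gcutil` output. Assumes the JDK 8 column order:
# S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
# old_gen_and_full_gc is a hypothetical helper for illustration.
old_gen_and_full_gc() {
  awk 'NF >= 11 && $1 != "S0" { printf "old=%s%% full_gc=%s\n", $4, $9 }'
}

# Usage: sample every 5000 ms until interrupted.
#   jstat -gcutil "$(cat /opt/tomcat/temp/tomcat.pid)" 5000 | old_gen_and_full_gc
```

An old-generation figure that stays near 100% while the full-GC count keeps climbing would be consistent with the GC and heap space errors reported above, but it does not by itself identify the cause.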

DSpace Technical Support

Oct 23, 2023, 11:34:20 AM
to DSpace Technical Support
Hi,

While I cannot say with certainty, this issue sounds similar in nature to this performance bug: https://github.com/DSpace/DSpace/pull/8980 (There were also some general indexing performance issues fixed between 6.x and 7.x.)

We are currently testing/reviewing that proposed 7.x fix.  The issue is that batch processes can sometimes be overly slow / memory intensive.  That is definitely not expected behavior and we are working on finding and solving these issues in DSpace 7.x.

As DSpace 6.4 is no longer under maintenance, these fixes will not be backported.  But, we are working to solve performance issues in 7.x.  If you'd like to help, you could test the above PR, but it can only be applied to DSpace 7.x and not to 6.x.

Tim

Agustín Alfieri

Oct 23, 2023, 11:48:59 AM
to DSpace Technical Support
Hi Tim, 

I understand. I'm willing to help test this once we've finished the upgrade to DSpace 7, but the problem is that dspace solr-reindex-statistics is not working on DSpace 7 (https://github.com/DSpace/DSpace/issues/8181). So this particular situation can't be tested on DSpace 7.

Agustín