Hello everyone,
I am trying to use the DSpace 6.4 stats-util with the '-s' parameter to shard the DSpace statistics into separate cores by year. When I let the utility run for a while, I get the following error report from Nagios:
State: CRITICAL
Date/Time: Wed Jan 11 10:08:09 CET 2023
Additional Info: CRITICAL - Socket timeout
When I try to access the user interface (XMLUI), I get the following error:
...
Caused by: org.hibernate.exception.GenericJDBCException: Could not open connection
+
Caused by: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object
...
I think this is somehow related to running the stats-util script, and I don't know how to solve it.
What I did:
1. I run the 'stats-util -s' command from the terminal and get the initial message:
'Moving: 45459190 into core statistics-2021'
This means that stats-util is trying to move 45,459,190 individual Solr statistics records to a new shard, statistics-2021.
2. Then I run `tail /opt/dspace/log/solr.log -f | grep "org\.apache\.solr\.core\.SolrCore \@ \[statistics\] webapp\=\/solr path\=\/select"` to monitor the progress:
I can see that stats-util is selecting statistics records from Solr in batches of 10,000 records, and I can monitor the query time in the QTime attribute. An example line from solr.log is shown below:
2023-01-11 10:53:07,617 INFO org.apache.solr.core.SolrCore @ [statistics] webapp=/solr path=/select params={csv.mv.separator=|&q=*:*&csv.escape=\&start=340000&fq=time:([2021\-01\-01T00\:00\:00Z+TO+2022\-01\-01T00\:00\:00Z]+NOT+2022\-01\-01T00\:00\:00Z)&rows=10000&wt=csv} hits=45459190 status=0 QTime=650
As you can see, the QTime value is initially under a second, but as the script runs, it rises gradually until a single query takes several minutes.
3. When I get to approximately 8,000,000 processed records (as indicated by the value of the 'start' parameter in the /select query), Nagios starts reporting the socket timeout described above, and when I access the user interface I get the 'SQLException: Cannot get a connection, pool error Timeout waiting for idle object' error described above.
4. I have to terminate the stats-util process and restart PostgreSQL and/or Tomcat to resume normal operation of our DSpace installation.
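For anyone who wants to reproduce the measurement from step 2, here is a minimal Python sketch that pulls the 'start' offset, batch size, hit count and QTime out of a solr.log line like the one above, and estimates how many batches are still to come. It is not part of DSpace or Solr; the names `parse_select_line` and `batches_remaining` are made up for illustration, and the regular expression assumes the log format matches the example line exactly.

```python
import re

# Matches /select requests against the [statistics] core, as in the example line.
SELECT_LINE = re.compile(
    r"\[statistics\] webapp=/solr path=/select params=\{(?P<params>.*)\} "
    r"hits=(?P<hits>\d+) status=\d+ QTime=(?P<qtime>\d+)"
)


def _int_param(params, name):
    """Return the integer value of one query parameter, or None if absent."""
    m = re.search(r"(?:^|&)" + re.escape(name) + r"=(\d+)", params)
    return int(m.group(1)) if m else None


def parse_select_line(line):
    """Extract start, rows, hits and QTime (milliseconds) from one log line."""
    m = SELECT_LINE.search(line)
    if m is None:
        return None  # not a /select request on the statistics core
    start = _int_param(m.group("params"), "start")
    rows = _int_param(m.group("params"), "rows")
    if start is None or rows is None:
        return None
    return {
        "start": start,
        "rows": rows,
        "hits": int(m.group("hits")),
        "qtime_ms": int(m.group("qtime")),
    }


def batches_remaining(rec):
    """Number of batches from the current offset onwards (ceiling division)."""
    remaining = rec["hits"] - rec["start"]
    return -(-remaining // rec["rows"])


# The example line from solr.log quoted above.
SAMPLE = (
    r"2023-01-11 10:53:07,617 INFO org.apache.solr.core.SolrCore @ [statistics] "
    r"webapp=/solr path=/select params={csv.mv.separator=|&q=*:*&csv.escape=\&"
    r"start=340000&fq=time:([2021\-01\-01T00\:00\:00Z+TO+2022\-01\-01T00\:00\:00Z]"
    r"+NOT+2022\-01\-01T00\:00\:00Z)&rows=10000&wt=csv} "
    r"hits=45459190 status=0 QTime=650"
)

if __name__ == "__main__":
    rec = parse_select_line(SAMPLE)
    print(rec["start"], rec["qtime_ms"], batches_remaining(rec))
```

To watch a live run, the same function could be applied to each line read from `sys.stdin` while piping the output of the `tail -f` command from step 2 into the script; lines that do not match simply return `None`.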
Our DSpace installation details:
DSpace version: 6.4
Java Runtime Environment Version: 1.8.0_352
Java Runtime Environment Vendor: OpenJDK 64-Bit Server VM
Operating System Version: CentOS 7 3.10.0-1160.62.1.el7.x86_64
Total memory available on server: 24 GB
Tomcat memory assigned: 3GB
DSpace cmd tools memory assigned: 5GB
Database: PostgreSQL 9.6
DB related configuration in local.cfg:
- db.maxconnections = 100
- db.maxwait = 5000
- db.maxidle = 30
I would appreciate any insights into this issue and any help solving this.
Thank you,
with best regards,
Jakub Řihák
Central Library
Charles University