DSpace 6.4 stats-util -s issue

jakub...@gmail.com

Jan 11, 2023, 5:43:29 AM
to DSpace Technical Support
Hello everyone,

I am trying to use the DSpace 6.4 stats-util with the '-s' parameter to shard the DSpace statistics into separate cores by year. When I let the utility run for a while, I get the following error report from Nagios:

State: CRITICAL

Date/Time: Wed Jan 11 10:08:09 CET 2023

Additional Info: CRITICAL - Socket timeout

When I try to access the user interface (XMLUI), I get the following error:

...
Caused by: org.hibernate.exception.GenericJDBCException: Could not open connection
+
Caused by: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object
...

I think this is somehow related to running the stats-util script, but I don't know how to solve it.

What I did:
1. I run the 'stats-util -s' command from a terminal and get the initial message:

'Moving: 45459190 into core statistics-2021' 

This means that stats-util is trying to move 45,459,190 individual statistics Solr records into a new shard, statistics-2021.

2. Then I run `tail /opt/dspace/log/solr.log -f | grep "org\.apache\.solr\.core\.SolrCore \@ \[statistics\] webapp\=\/solr path\=\/select"` to monitor the progress:

I can see that stats-util selects statistics records from Solr in batches of 10,000 records, and I can monitor the query time in the QTime attribute (in milliseconds). An example line from solr.log is shown below:

2023-01-11 10:53:07,617 INFO  org.apache.solr.core.SolrCore @ [statistics] webapp=/solr path=/select params={csv.mv.separator=|&q=*:*&csv.escape=\&start=340000&fq=time:([2021\-01\-01T00\:00\:00Z+TO+2022\-01\-01T00\:00\:00Z]+NOT+2022\-01\-01T00\:00\:00Z)&rows=10000&wt=csv} hits=45459190 status=0 QTime=650

You can see that the QTime value is initially under a second, but as the script runs, it gradually rises until a single query takes several minutes.

3. When I reach approximately 8,000,000 processed records (as indicated by the value of the 'start' parameter in the /select query), Nagios starts reporting a socket timeout (as described above), and when I access the user interface, I get the 'SQLException: Cannot get a connection, pool error Timeout waiting for idle object' error (as described above).

4. I have to terminate the stats-util process and restart PostgreSQL and/or Tomcat to resume normal operation of our DSpace installation.
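To track the progress described in steps 2 and 3 more directly, the /select lines from solr.log can be reduced to a batch counter. The following is a minimal sketch, not part of DSpace: the script name `solr_progress.py` and the names `progress`, `report` and `SAMPLE_LINE` are hypothetical, and it assumes the log format of the example line above and that `start` advances by `rows` on every batch.

```python
import math
import re
import sys

# Sample /select line copied from solr.log (step 2 above).
SAMPLE_LINE = (
    "2023-01-11 10:53:07,617 INFO  org.apache.solr.core.SolrCore @ [statistics] "
    "webapp=/solr path=/select params={csv.mv.separator=|&q=*:*&csv.escape=\\&"
    "start=340000&fq=time:([2021\\-01\\-01T00\\:00\\:00Z+TO+2022\\-01\\-01T00\\:00\\:00Z]"
    "+NOT+2022\\-01\\-01T00\\:00\\:00Z)&rows=10000&wt=csv} hits=45459190 status=0 QTime=650"
)

# Each pattern captures one integer field from the log line.
PATTERNS = {
    "start": re.compile(r"[{&]start=(\d+)"),
    "rows": re.compile(r"[{&]rows=(\d+)"),
    "hits": re.compile(r"\bhits=(\d+)"),
    "qtime_ms": re.compile(r"\bQTime=(\d+)"),
}


def progress(line):
    """Summarise one statistics /select log line, or return None if it is not one."""
    if "[statistics]" not in line or "path=/select" not in line:
        return None
    values = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match is None:
            return None
        values[name] = int(match.group(1))
    if values["rows"] == 0 or values["hits"] == 0:
        return None
    return {
        # Assumes start grows by rows per batch, so batches are numbered from 1.
        "batch": values["start"] // values["rows"] + 1,
        "total_batches": math.ceil(values["hits"] / values["rows"]),
        "percent_done": 100.0 * values["start"] / values["hits"],
        "qtime_s": values["qtime_ms"] / 1000.0,  # QTime is logged in milliseconds
    }


def report(lines):
    """Print one progress line per matching log line."""
    for line in lines:
        info = progress(line)
        if info is not None:
            print(
                "batch {batch}/{total_batches} ({percent_done:.1f}% done), "
                "QTime {qtime_s:.2f} s".format(**info),
                flush=True,
            )


if __name__ == "__main__":
    # "-" reads a live log from stdin; otherwise only the sample line is parsed.
    if len(sys.argv) > 1 and sys.argv[1] == "-":
        report(sys.stdin)
    else:
        report([SAMPLE_LINE])
```

For example, `tail -f /opt/dspace/log/solr.log | python3 solr_progress.py -` prints one line per batch. With 45,459,190 hits and 10,000 rows per request there are 4,546 batches, and the ~8,000,000-record point from step 3 is batch 801, about 17.6% of the way through. Because each request pages with `start`/`rows`, Solr generally has to collect `start + rows` matching documents per query, so a QTime that rises as `start` grows is consistent with ordinary deep-paging cost; the helper only makes that trend visible.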

Our DSpace installation details:

DSpace version: 6.4
Java Runtime Environment Version: 1.8.0_352
Java Runtime Environment Vendor: OpenJDK 64-Bit Server VM
Operating System Version: CentOS 7 (kernel 3.10.0-1160.62.1.el7.x86_64)
Total memory available on server: 24 GB
Tomcat memory assigned: 3 GB
DSpace cmd tools memory assigned: 5 GB
Database: PostgreSQL 9.6
DB-related configuration in local.cfg:
- db.maxconnections = 100
- db.maxwait = 5000
- db.maxidle = 30

I would appreciate any insights into this issue and any help solving it.

Thank you,
with best regards,

Jakub Řihák
Central Library
Charles University