Notes on PostgreSQL connection pooling with a Tomcat JNDI resource


Alan Orth

Dec 20, 2017, 3:09:50 AM
to DSpace Technical Support
Hi,

I've just spent half a day figuring out how to enable Tomcat's database connection pooling for DSpace and wanted to share some notes here. The best general information about JNDI configuration is the Tomcat documentation[0], but the DSpace configuration reference for version 6.x[1] also has a few useful pieces of information. I'm using DSpace 5.5 with Tomcat 7 on both macOS and Ubuntu 16.04, so keep that in mind if you're following along. This is probably better written as a blog post or wiki page, but I'm recording it here for now.

First, I added a JDBC Resource to the GlobalNamingResources in Tomcat's server.xml:

<Resource name="jdbc/dspace" auth="Container" type="javax.sql.DataSource"
      driverClassName="org.postgresql.Driver"
      url="jdbc:postgresql://localhost:5432/dspace"
      username="dspace"
      password="dspace"
      initialSize='5'
      maxActive='50'
      maxIdle='15'
      minIdle='5'
      maxWait='5000'
      validationQuery='SELECT 1'
      testOnBorrow='true' />

The pool size and idle settings come from a comment by Mark Wood on the DS-3564 ticket on DSpace's issue tracker[2]. I will monitor the database connections and DSpace logs and adjust as needed.

Second, I added a ResourceLink to each web application's context XML file (e.g. conf/Catalina/localhost/solr.xml for the Solr webapp):

<Context docBase="/Users/aorth/dspace/webapps/solr">
  <ResourceLink global="jdbc/dspace" name="jdbc/dspace" type="javax.sql.DataSource"/>
</Context>

Third, I had to manually download a JDBC driver (the latest for JDK 8 is postgresql-42.1.4.jar) from PostgreSQL's website[3] and copy it to Tomcat's library path. This was a bit of a mystery to me, because each DSpace web application bundles its own copy of the PostgreSQL JDBC driver for its own connections, but Tomcat doesn't use any of those when configured with a global JDBC resource (perhaps because they aren't visible to Tomcat's own classloader when the container starts?). Manually copying one of the web applications' postgresql-9.1-901-1.jdbc4.jar files to Tomcat's library path does work, but I felt it was cleaner to use the upstream driver from PostgreSQL's website than to depend on a file from the DSpace build. Also, on my Ubuntu 16.04 system I tried installing the libpostgresql-jdbc-java package, but it didn't work. In any case, just know that you'll need to get a JDBC driver and copy it to Tomcat's library path.
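For reference, the manual steps look roughly like this on Ubuntu (a sketch only; the driver version, download URL, and Tomcat paths are examples and may differ on your system):

# Download the upstream PostgreSQL JDBC driver, put it on Tomcat's library
# path (e.g. $CATALINA_HOME/lib), then restart Tomcat.
wget https://jdbc.postgresql.org/download/postgresql-42.1.4.jar
sudo cp postgresql-42.1.4.jar /usr/share/tomcat7/lib/
sudo systemctl restart tomcat7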

Last, I uncommented the db.jndi line in dspace.cfg. I left all the old settings like db.username, db.password, etc intact because DSpace falls back to those if the JDBC resource is not available and it seems to expect them to be defined even if they aren't used.
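For clarity, the relevant part of dspace.cfg then looks roughly like this (the JNDI name must match the Resource defined in server.xml; the other properties are the normal direct-connection settings that remain as a fallback):

db.jndi = jdbc/dspace
# Fallback settings, still expected to be defined even when JNDI is used
db.url = jdbc:postgresql://localhost:5432/dspace
db.username = dspace
db.password = dspace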

After restarting Tomcat, the effect of a correctly configured global database connection pool is visible immediately. For months and years I've been struggling with database connection issues due to load from bots, users, harvesters, etc. In the attached Munin monitoring graph you can see that the number of idle database connections dropped drastically after deploying this.

With the default DSpace settings, every web application gets its own pool of the size specified in dspace.cfg—that's right, the default connection pool in dspace.cfg is not shared between web applications as the comment would lead you to believe! This becomes very hard to manage on a busy site, where you need to keep increasing the maximum number of connections in both dspace.cfg and the system's PostgreSQL configuration. Also, there is no way to tell how many connections each of your web applications is using. As you can see in the graph, there were a large number of idle connections, perhaps ten or more per web application depending on your db.maxidle setting. Looking in pg_stat_activity I could see that some of those connections had been around for days or even weeks.

Now all DSpace web applications share one connection pool and the number of active connections rises and falls according to need. It will be interesting to see how the graphs look after a few weeks of running this on our development server.

Now a few notes that should perhaps be addressed on the bug tracker:

- There is no indication in the DSpace logs that your JNDI resource is being used, although you do see many errors in the logs when the configuration is NOT successful, e.g. "Error retrieving JNDI context: jdbc/dspace". You only know it is working if you see "PostgreSQL JDBC Driver" in the application_name column of PostgreSQL's pg_stat_activity table (see the query sketch after this list).
- The dspace command line tools don't seem to be able to use the JNDI resource and fall back to connecting to PostgreSQL directly. When monitoring the pg_stat_activity table while running `dspace database info` I noticed that eight (8) database connections were made by this one tool alone!
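
As an aside, a quick way to check this in pg_stat_activity is a query along these lines (a rough sketch; the application_name default comes from the JDBC driver unless the application overrides it):

-- Count connections per application and state
SELECT application_name, state, count(*)
FROM pg_stat_activity
GROUP BY application_name, state
ORDER BY count(*) DESC;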
--
postgres_connections_ALL-day-fs8.png

Mark H. Wood

Jan 3, 2018, 4:17:02 PM
to DSpace Technical Support
Thanks for posting these notes.  I'm sure they will be helpful.  You've shown some tools that I didn't know about.

A pool instantiated by DSpace is probably effectively invisible to other webapp.s even in the same JVM.  The Servlet spec. tries very hard to create the illusion that each webapp. is floating in a kind of womb, where everything it needs is mysteriously provided for it from somewhere beyond its perception.  Each has its own classloader, for example, and so things that a webapp. creates for itself tend to be known only in places that are not accessible by other webapp.s.  I could wish that DSpace made more thorough use of the Servlet environment rather than behaving as if it is standalone code.

You're quite correct that the command-line tools don't share a pool with any of the webapp.s, because the launcher runs in a different process with its own address space.  This is one reason to continue specifying pool settings in dspace.cfg -- IMO this should be the *only* use of those settings.  It *is* possible to supply a pool to the command line out of JNDI -- I've done it -- but you need to supply a directory service to the process.  I can say a little about that if anybody is interested.  You could provide in dspace.cfg settings more appropriate to the command line, if your webapp.s are set up with pools (tuned for their needs) from JNDI.

The reason you don't have to tinker with directory services for webapp.s is that the <Resource> and <ResourceLink> elements are causing your Servlet container (Tomcat) to populate an internal directory service with objects such as your connection pool.  This is specified by Java EE, but many Servlet containers implement it even when not required by the relevant spec.s.

You *do* need to supply any DBMS drivers to the container itself, because the pool and connections are created by the container and so must be visible from *its* classloader(s), which (in Tomcat anyway) are on a branch of the hierarchy that is parallel to those of the webapp.s.  I also would use the latest released driver.

It should be simple to provide a log message when the resolution of jdbc/dspace *succeeds*, and I think we should.  There's already a Jira issue about making the other case less scary, and perhaps this should be included in that work.

Tim Donohue

Jan 3, 2018, 4:37:05 PM
to Mark H. Wood, DSpace Technical Support
Hi Alan & Mark,

These notes look like the start to some enhanced documentation around setting up DSpace + Tomcat JNDI (hint, hint).

I'm wondering (out loud) if we should take these concepts/ideas and turn them into official documentation in the "Installing DSpace" section (maybe somewhere under "Advanced Installation"?): https://wiki.duraspace.org/display/DSDOC6x/Installing+DSpace

Thanks though for sharing the notes and Q&A here. I think this will be very helpful for others who wish to go this route.

- Tim

--
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech...@googlegroups.com.
To post to this group, send email to dspac...@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.


--
Tim Donohue
Technical Lead for DSpace & DSpaceDirect
DuraSpace.org | DSpace.org | DSpaceDirect.org

Tom Desair

Jan 3, 2018, 5:11:01 PM
to DSpace Technical Support
I just wanted to add a small note that having a single DB pool for all Tomcat webapps can lead (and has led) to problems. Your current pool size is 50. This means that if you have (malicious) crawlers hitting your OAI endpoint, they can deplete the database connections available for the web UI (XMLUI or JSPUI). The other way around can also happen.

But using JNDI DB pools also gives you more fine-grained control over how connections are distributed across the different web apps. For example, a default PostgreSQL installation comes with a max connection limit of 100, which means you can safely use around 70 connections (from experience). You can then divide these connections among individual JNDI pools like this (see the sketch after this list):
  • OAI: 15 connections
  • REST: 15 connections
  • WEB UI: 40 connections
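
A rough server.xml sketch of that idea (the resource names jdbc/dspace-oai and jdbc/dspace-xmlui are hypothetical; each webapp's context file then links its local jdbc/dspace name to its own global pool):

<Resource name="jdbc/dspace-oai" auth="Container" type="javax.sql.DataSource"
      driverClassName="org.postgresql.Driver"
      url="jdbc:postgresql://localhost:5432/dspace"
      username="dspace" password="dspace"
      maxActive="15" maxIdle="5"
      validationQuery="SELECT 1" testOnBorrow="true" />
<Resource name="jdbc/dspace-xmlui" auth="Container" type="javax.sql.DataSource"
      driverClassName="org.postgresql.Driver"
      url="jdbc:postgresql://localhost:5432/dspace"
      username="dspace" password="dspace"
      maxActive="40" maxIdle="10"
      validationQuery="SELECT 1" testOnBorrow="true" />

And in, for example, the XMLUI context file:

<ResourceLink global="jdbc/dspace-xmlui" name="jdbc/dspace" type="javax.sql.DataSource"/>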

Let me know if you've created a JNDI DB pool wiki page. I'll then try to add some useful information on JDBC interceptors (https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html#Configuring_JDBC_interceptors).

 
Tom Desair
250-B Suite 3A, Lucius Gordon Drive, West Henrietta, NY 14586
Gaston Geenslaan 14, Leuven 3001, Belgium
www.atmire.com


Hardy Pottinger

Jan 3, 2018, 5:40:42 PM
to Tom Desair, DSpace Technical Support
Hi, please do create this wiki page, I'd love to read it. Thanks!

--Hardy

Alan Orth

Jan 4, 2018, 12:31:20 AM
to Hardy Pottinger, Tom Desair, DSpace Technical Support
That's a cool idea to use a separate pool for each web application, Tom! I'd much rather have my OAI fail to establish a database connection than my XMLUI. ;)

Since I wrote the original mailing list message two weeks ago I've had DSpace fail to establish a database connection a few thousand times and I've increased my pool's max active from 50 to 75 and then 125 — our site gets about four million hits per month (from looking at nginx logs), so I'm still trying to find the "sweet spot" for the pool settings. Anything's better than setting the pool in dspace.cfg, though.

I wish other people would share their pool settings and experiences.

Alan Orth

Jan 11, 2018, 2:37:49 AM
to Hardy Pottinger, Tom Desair, DSpace Technical Support
To continue the discussion on a slightly related note: I've just finished dealing with the fallout caused by some new bot — the only fingerprint of which is its unique-but-normal-looking user agent — hitting our XMLUI with 450,000 requests from six different IPs over just a few hours. This generated a ridiculous amount of load on the server, including 160 PostgreSQL connections and 52,000 Tomcat sessions before I was able to mitigate it. Surprisingly, since I had increased our pool size to 300 after my last message, we never got pool timeout or database connection errors in dspace.log, but the site was very unresponsive — and this is on a beefy server with SSDs, plenty of RAM, a large PostgreSQL buffer cache, etc.! I ended up having to rate limit this user agent in our frontend nginx web server using the limit_req_zone module[0].
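For illustration, a minimal nginx sketch of this kind of per-user-agent rate limiting (the user-agent pattern, zone name, rate, and upstream address are placeholders, not our actual configuration):

# Inside the http {} block: map the offending user agent to a non-empty key;
# nginx does not rate limit requests whose limit_req_zone key is empty.
map $http_user_agent $bot_limit_key {
    default             "";
    "~*SomeBadBot/1.0"  $binary_remote_addr;
}

limit_req_zone $bot_limit_key zone=badbots:10m rate=1r/s;

server {
    listen 80;

    location / {
        limit_req zone=badbots burst=5;
        proxy_pass http://127.0.0.1:8080;   # Tomcat
    }
}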

So, a bit of mixed success and frustration here. No amount of pool tweaking will fix this type of issue, because eventually another bigger, stupider bot comes along that doesn't match the "bot" user agent. I will definitely look into implementing separate pools as Tom suggested, though, to limit the damage that high load can do to particular DSpace web applications. Keep sharing your experiences! This is very valuable and interesting to me.
Cheers!

Alan Orth

Jan 13, 2018, 11:01:03 PM
to Hardy Pottinger, Tom Desair, DSpace Technical Support, Mark Wood
I've been testing DSpace 6.2 with Tomcat 8.5.24 and noticed that DSpace does not start if you supply a database pool from JNDI(!). It seems to be a known issue, as Mark Wood created a Jira ticket for it[0]. Also, DSpace 6.2 still has the db.jndi setting in dspace.cfg, even though this setting is no longer user customizable. I've made a pull request[1] against the dspace-6_x branch to remove it, but I did not create a Jira ticket. Is one required?

[0] https://jira.duraspace.org/browse/DS-3434
[1] https://github.com/DSpace/DSpace/pull/1917

Regards,

Hardy Pottinger

Jan 16, 2018, 10:41:15 AM
to Alan Orth, Tom Desair, DSpace Technical Support, Mark Wood
Hi, Alan, yes, we require a matching Jira issue for each pull request. When you create a new issue, please mention DS-3434 as being related. Thanks!

--Hardy


Alan Orth

Jan 16, 2018, 12:36:16 PM
to Hardy Pottinger, Tom Desair, DSpace Technical Support, Mark Wood
Thanks, Hardy. I've created a new Jira issue[0], updated my git commit message with the DS issue number, and updated my pull request[1] title and description to include the fact that this is slightly related to DS-3434. I think that's good to go now.

[0] https://jira.duraspace.org/browse/DS-3803
[1] https://github.com/DSpace/DSpace/pull/1917

Cheers,


Hayden Young

Jan 30, 2018, 7:11:43 AM
to DSpace Technical Support
Hi Alan

Thanks for the in-depth analysis and possible remedies. I implemented the JNDI connection as you had outlined and tested that it did indeed use the Tomcat settings and not the DSpace configuration. As soon as I redeployed our DSpace it certainly felt as if there was an improvement in performance. A couple of changes were required to the JNDI configuration in server.xml (probably because we run Tomcat 8): maxWait becomes maxWaitMillis and maxActive becomes maxTotal (see the sketch below).
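For reference, here is what the Resource looks like with the Tomcat 8 / Commons DBCP 2 attribute names (a sketch with example values; the other attributes carry over unchanged):

<Resource name="jdbc/dspace" auth="Container" type="javax.sql.DataSource"
      driverClassName="org.postgresql.Driver"
      url="jdbc:postgresql://localhost:5432/dspace"
      username="dspace" password="dspace"
      initialSize="5" maxTotal="50" maxIdle="15" minIdle="5"
      maxWaitMillis="5000"
      validationQuery="SELECT 1" testOnBorrow="true" />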

Unfortunately, we were soon back to where we started, with "org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool" errors when browsing bitstreams. We also noticed the problem arose more quickly than in the past: usually it takes 1 to 2 weeks before we see this problem, but this time it happened within 3 days. The site does not experience much traffic, and a check of the PostgreSQL idle connections shows only 8, which seems well below what would cause the DB error.

Are you able to outline possible steps to analyzing the problem from DSpace's end?

Our other option is to put PgBouncer between the database and DSpace, although my feeling is that this would only be a temporary solution.

Thanks


Hayden Young
https://www.knowledgearc.com

Alan Orth

Jan 30, 2018, 9:09:57 AM
to Hayden Young, DSpace Technical Support
You're welcome, Hayden. Glad it helped.

Regarding your performance issues, you need to try to understand where the load is coming from. You might just have legitimate traffic, either from users or search bots. Start by looking at your web server logs. We use nginx in front of Tomcat, so I started logging requests to XMLUI, OAI, and REST separately so I could see exactly what was happening in each. You'll get much more insight once you start to understand the nature of your traffic.

For example, Google and Bing respect robots.txt, so if you enable the cron job for nightly sitemap generation, you can submit your sitemap to their webmaster tools consoles and then alter your robots.txt so that dynamic search pages like Discovery and Browse are forbidden (a rough robots.txt sketch follows below). This is a big help because those pages are computationally heavy. See this wiki page for how to enable sitemaps:


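As a rough illustration of the robots.txt idea above (a sketch only; the paths assume a typical XMLUI deployment and should be adjusted to your own URL layout):

User-agent: *
# Dynamic, computationally expensive pages
Disallow: /discover
Disallow: /search-filter
Disallow: /browse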
On that note, from looking at my logs I saw that Baidu doesn't respect robots.txt and just crawls all the dynamic pages. I used nginx to identify Baidu by its user agent and throttle its requests using nginx's limit_req module. I wrote about it on my blog a few months ago:


Another important tweak along those lines is Tomcat's Crawler Session Manager Valve. Basically, Tomcat assigns every user a new session (JSESSIONID) when they connect. Bots are users too, so they get one as well. The problem is that search bots like Google, Bing, and Baidu often crawl from fifty (50) IP addresses concurrently, and each one of those gets a session, each of which takes up precious CPU time, memory, and database resources. If you enable the Crawler Session Manager Valve, Tomcat will force all matching user agents to share one session (a sketch follows below). See the Tomcat docs or look at our server.xml for more information:


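A sketch of the valve as it would appear in server.xml, inside the Host (or Engine) element; the user-agent regex shown is, I believe, Tomcat's default, and the interval is just an example:

<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerUserAgents=".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*"
       sessionInactiveInterval="600" />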
Sometimes I will get an alert that our server's CPU load is very high, so I'll go look at the logs and do some grep and awk magic to figure out the IP addresses of the day's top ten users. Often it's illuminating. For example, one day in December some client in China downloaded 20,000 PDFs from us. Another time some open access bot started harvesting us and made 400,000 requests in a few hours. I looked at the logs, found the contact information in their user agent string, and contacted their developers. They acknowledged that their bot had gone awry due to a bug and fixed it.

Now our repository has been using a pool with a maximum of 300 connections for a month or so and I haven't had DSpace crash due to database issues since then. We still have high load every few days and users complain that the site is "down", but when I look at PostgreSQL activity I see we have 211 active database connections—so there's a problem, but it's not the database! This is when I started to look elsewhere in the application stack to alleviate the issue. For example, Tomcat's default maxThreads is 200, so it's likely that this is the bottleneck now. I've recently bumped it and its companion processorCache up to 400 and I'm monitoring things now (see the sketch below).
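For reference, the relevant Connector attributes in server.xml look roughly like this (the port and protocol are just examples; maxThreads and processorCache are the settings discussed above):

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="400" processorCache="400" />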

So start by looking at the logs! They are very interesting and enlightening. :)

Cheers,

Mark H. Wood

Feb 1, 2018, 8:43:35 AM
to DSpace Technical Support
If you have stuck connections, you may want to try some of the pool provider's connection testing parameters.  I've had good results using 'validationQuery="SELECT 1"' with 'testOnBorrow="true"'.

Also I've found that older PostgreSQL drivers seem to see this problem more frequently, so I try to keep Tomcat updated with the latest released DBMS drivers.

Alan Orth

Feb 7, 2018, 6:07:55 PM
to Mark H. Wood, DSpace Technical Support
Is there some kind of database transaction leak in DSpace, particularly in the XMLUI? Basically, I find that it doesn't matter what my pool size is. Eventually it always fills up. Sometimes it takes a few days, sometimes it happens a few times in one day. Today I saw this in dspace.log.2018-02-07:

org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-328] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:250; idle:0; lastwait:5000].

The pool size is 250, and yet PostgreSQL activity shows literally all of the 250 connections as either idle or idle in transaction:

$ psql -c 'select * from pg_stat_activity' | grep -c "PostgreSQL JDBC"
250
$ psql -c 'select * from pg_stat_activity' | grep "PostgreSQL JDBC" | grep -c idle
250
$ psql -c 'select * from pg_stat_activity' | grep "PostgreSQL JDBC" | grep -c "idle in transaction"
187

What is going on? I'm going crazy restarting the server and staring at log files trying to figure it out. We're using DSpace 5.5 with PostgreSQL 9.5.10 and the latest PostgreSQL JDBC driver from jdbc.postgresql.org (42.2.1) on fast hardware with plenty of memory and CPU. But DSpace. Always. Crashes. Every. Day. It seems DS-3551 might help, but we can't move to DSpace 5.7 yet, and our 5.5 branch doesn't build if I cherry-pick the commits for DS-3551[0].

[0] https://jira.duraspace.org/browse/DS-3551

Let's get to the bottom of this,

On Thu, Feb 1, 2018 at 3:43 PM Mark H. Wood <mwood...@gmail.com> wrote:
If you have stuck connections, you may want to try some of the pool provider's connection testing parameters.  I've had good results using 'validationQuery="SELECT 1"' with 'testOnBorrow="true"'.

Also I've found that older PostgreSQL drivers seem to see this problem more frequently, so I try to keep Tomcat updated with the latest released DBMS drivers.


Tim Donohue

Feb 8, 2018, 12:26:56 PM
to Alan Orth, Mark H. Wood, DSpace Technical Support
Hi Alan,

We do occasionally find areas of DSpace that fail to properly close database connections or otherwise clean up after themselves.  So, it's not unheard of for database leaks to be found and then fixed.  Obviously though, if we were aware of a specific leak in XMLUI, we'd get it fixed.

I do know that significant work went into the database layer in DSpace 6 (with the move to Hibernate), and I've heard reports from a few folks that these sorts of issues are less frequent in DSpace 6.x. (I've heard of other large sites that had to restart DSpace every once in a while prior to 6.x but are not seeing the same issues with 6.x. I've not seen this behavior myself, but it's very possible that it's encountered more on highly active and/or larger sites.)

That unfortunately doesn't solve the issues you are seeing, though. Is your site extremely active (in terms of concurrent users)? Are you perhaps getting hit by a lot of badly behaved spiders that are using up database connections? 250 connections in the pool sounds like plenty, but it obviously depends on how many concurrent hits you are seeing.

In searching around, I also came across some older (circa 2010) notes on our wiki at https://wiki.duraspace.org/display/DSPACE/Idle+In+Transaction+Problem   It doesn't provide any exact solutions, but it shows that you are not alone (I'm not sure I'd recommend killing "idle in transaction" processes, though, as that wiki page suggests). Also, here's more info from Postgres on what "idle in transaction" means: https://www.postgresql.org/docs/9.2/static/monitoring-ps.html (it also suggests that the pg_locks system view might provide info on what transactions are still open, etc.).

So, I don't have any answers here, but maybe these will provide you with clues or inspiration to dig in a bit further.  Please do let us know what you find, and feel free to post back with any clues or breadcrumbs you discover along the way.

- Tim

Alan Orth

Feb 10, 2018, 10:14:11 AM
to Tim Donohue, Mark H. Wood, DSpace Technical Support
Tim,

Any idea why DSpace thinks those connections are busy when PostgreSQL itself reports them as idle?

In order to try to understand the traffic that is coming in, I wrote a Munin plugin[0] that uses JMX to monitor the number of active sessions in the DSpace applications (REST doesn't appear to create sessions). From that it's clear that we sometimes have quite a lot of traffic, with nearly 1,500 active sessions (see attached)! We aggressively add bot user agents to Tomcat's Crawler Session Manager Valve so that all of these bots like Google, Bing, and Baidu that browse from fifty IP addresses concurrently get lumped into one JSESSIONID.

In any case, I've adjusted my JDBC pool parameters to include removeAbandoned='true' and testOnReturn='true' in addition to those Mark and I were using earlier in this thread (see the sketch below). Things have been calm on the site for the last few days, but let's see what happens during the week. We're in the process of upgrading from 5.5 to 5.8, which will include some database fixes, but I'm definitely looking forward to moving our repository to DSpace 6.x to take advantage of the Hibernate work.
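For reference, a sketch of the pool Resource with those parameters added (values are examples; the attribute names are valid for both the Commons DBCP and Tomcat JDBC pools):

<Resource name="jdbc/dspace" auth="Container" type="javax.sql.DataSource"
      driverClassName="org.postgresql.Driver"
      url="jdbc:postgresql://localhost:5432/dspace"
      username="dspace" password="dspace"
      maxActive="250" maxIdle="15" minIdle="5"
      validationQuery="SELECT 1" testOnBorrow="true" testOnReturn="true"
      removeAbandoned="true" removeAbandonedTimeout="60" />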
jmx_dspace_sessions-week-fs8.png

Alan Orth

Feb 27, 2018, 8:48:42 AM
to Tim Donohue, Mark H. Wood, DSpace Technical Support
Hello,

The JDBC pool parameter removeAbandoned='true' absolutely solves the idle in transaction issue.

In the two weeks since I enabled it we haven't had any "idle in transaction" connections in pg_stat_activity, and there are only a handful of active connections at any given time. However, the default is to prune "abandoned" connections after 60 seconds, which doesn't seem to be a problem except on the XMLUI submissions page. One user with many (thousands?) of submissions says this page is very slow and eventually gets "Socket closed" errors, presumably because the pool is reclaiming his connection as abandoned. I've tried increasing the timeout to 90 and even 180 seconds, but it doesn't help. What is happening on the /submissions page?

Anyways, as an experiment, this morning I turned off the removeAbandoned setting on the JDBC pool, and within a few hours DSpace was down and there were 200 "idle in transaction" connections again. *sigh* For what it's worth, pg_stat_activity shows 100 "idle in transaction" connections whose last query was the following:

SELECT * FROM resourcepolicy WHERE resource_type_id= $1 AND resource_id= $2 AND action_id= $3

I'm hoping for some insight on what this query is, and on why the XMLUI submissions page is so slow! Again, we're running DSpace 5.5, PostgreSQL 9.5.11, and XMLUI with ~70,000 items.

Thanks,

Tom Desair

Feb 27, 2018, 10:34:15 AM
to Alan Orth, DSpace Technical Support
Some other useful configuration options you can use in combination with the "removeAbandoned='true'" setting (see the sketch after the link below):
  • abandonWhenPercentageFull: only start cleaning up abandoned connections once the pool is more than X% used.
  • jdbcInterceptors="ResetAbandonedTimer": make sure the "abandoned" timer is reset every time there is activity on a connection.

https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html#Tomcat_JDBC_Enhanced_Attributes
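A sketch of how those attributes fit into the Resource definition (they belong to the Tomcat JDBC pool, so the factory attribute is needed if your container would otherwise default to Commons DBCP; the values are examples):

<Resource name="jdbc/dspace" auth="Container" type="javax.sql.DataSource"
      factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
      driverClassName="org.postgresql.Driver"
      url="jdbc:postgresql://localhost:5432/dspace"
      username="dspace" password="dspace"
      maxActive="250" validationQuery="SELECT 1" testOnBorrow="true"
      removeAbandoned="true" removeAbandonedTimeout="60"
      abandonWhenPercentageFull="75"
      jdbcInterceptors="ResetAbandonedTimer" />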


@Alan, can you check if any of these indexes improves the performance of that query: https://github.com/DSpace/DSpace/pull/1791/files#diff-5244ef3f861b63643134797c3866e16a ?
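For context, an index of roughly this shape would cover the resourcepolicy query above (a sketch only, with a made-up index name; check the pull request for the exact DDL it proposes):

CREATE INDEX resourcepolicy_type_resource_action_idx
    ON resourcepolicy (resource_type_id, resource_id, action_id);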


 
Tom Desair
250-B Suite 3A, Lucius Gordon Drive, West Henrietta, NY 14586
Gaston Geenslaan 14, Leuven 3001, Belgium
www.atmire.com

Alan Orth

Feb 28, 2018, 10:22:29 AM
to Tom Desair, DSpace Technical Support
Thanks, Tom!

Those indexes really seem to help on the XMLUI /submissions page. For example, on my production server I just tried to clear 22 old unfinished submissions and it took about five minutes to finish. Then, while the server was processing those operations, I used this query from the PostgreSQL 9.5 docs[0] to capture some snapshots of the PostgreSQL locks:

SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;

I observed a growing number of locks — 500, 800, and eventually 2,000 — during the process, most of which were on the resourcepolicy and metadatavalue tables, e.g.:


SELECT * FROM resourcepolicy WHERE resource_type_id= $1 AND resource_id= $2 AND action_id= $3
SELECT * FROM MetadataValue WHERE resource_id= $1 and resource_type_id = $2 ORDER BY metadata_field_id, place

During normal DSpace usage it seems that the number of locks is usually around 100 or 200.

I applied the indexes from DS-3636[1] on my test server and tried the same operation, and it finished almost instantly. This pull request is a no-brainer in my opinion—I would recommend reviving the discussion around it on Jira and GitHub. For reference, I've uploaded one of the logs of 2,000 locks that I recorded during this process[2].
 
[0] https://www.postgresql.org/docs/9.5/static/view-pg-locks.html
[1] https://github.com/DSpace/DSpace/pull/1791
[2] https://gist.github.com/alanorth/3c4fea2a277abfa4c816d66d9c795100

Regards,
