[Dspace-tech] tomcat reporting memory leak?


Damian Marinaccio

Aug 25, 2015, 2:12:31 PM
to dspac...@lists.sourceforge.net

I’m seeing the following log messages in catalina.out:

 

INFO: Deploying web application directory ROOT

Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [] appears to have started a thread named [FinalizableReferenceQueue] but has failed to stop it. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [org.dspace.services.caching.ThreadLocalMap] (value [org.dspace.services.caching.ThreadLocalMap@d32560]) and a value of type [java.util.HashMap] (value [{}]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@3adaaa]) and a value of type [org.apache.cocoon.environment.internal.EnvironmentStack] (value [[]]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@1ea0b8a]) and a value of type [org.apache.xerces.parsers.SAXParser] (value [org.apache.xerces.parsers.SAXParser@bfa709]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@9b9a36]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@1a95de6]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@3adaaa]) and a value of type [org.apache.cocoon.environment.internal.EnvironmentStack] (value [[]]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@53bd6e]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@1b9c086]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@ecd7c]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@1d4afed]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@6a081c]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@13a9acb]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@88a3ce]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@ba2bb5]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@1ea0b8a]) and a value of type [org.apache.xerces.parsers.SAXParser] (value [org.apache.xerces.parsers.SAXParser@b481ba]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@8b8bf0]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@1903df2]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@ecd7c]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@d2dbab]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@9b9a36]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@153ca70]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@3adaaa]) and a value of type [org.apache.cocoon.environment.internal.EnvironmentStack] (value [[]]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@6a081c]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@f86ec]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@88a3ce]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@16313e3]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@1ea0b8a]) and a value of type [org.apache.xerces.parsers.SAXParser] (value [org.apache.xerces.parsers.SAXParser@1a88001]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@53bd6e]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@1aeaf42]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@8b8bf0]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@1741c0b]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@9b9a36]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@1dd5478]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@3adaaa]) and a value of type [org.apache.cocoon.environment.internal.EnvironmentStack] (value [[]]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@53bd6e]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@1d4ade8]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@ecd7c]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@1e5865d]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@6a081c]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@1d8843d]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@88a3ce]) and a value of type [org.apache.lucene.index.SegmentTermEnum] (value [org.apache.lucene.index.SegmentTermEnum@1d1f60e]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
Sep 19, 2010 8:01:33 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap

Damian Marinaccio

The Wallace Center

90 Lomb Memorial Drive

Rochester, NY 14623

585-475-7741

dxm...@rit.edu

 


 

Tom De Mulder

Aug 25, 2015, 2:12:33 PM
to Damian Marinaccio, dspac...@lists.sourceforge.net
On Mon, 20 Sep 2010, Damian Marinaccio wrote:

> I'm seeing the following log messages in catalina.out:
> [...]
> SEVERE: The web application [] appears to have started a thread named [FinalizableReferenceQueue] but has failed to stop it.
> This is very likely to create a memory leak.

There are quite a few memory leaks in DSpace. We have a cronjob to restart
Tomcat nightly, because otherwise it'll break the next day.
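
(As a point of reference, that kind of nightly restart is usually just a cron entry; the exact command depends on how Tomcat is installed, so the init script path below is only illustrative:

    # /etc/crontab - restart Tomcat at 04:00 every night (illustrative path)
    0 4 * * * root /etc/init.d/tomcat6 restart
)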


Best,

--
Tom De Mulder <td...@cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 20/09/2010 : The Moon is Waxing Gibbous (80% of Full)

Panyarak Ngamsritragul

Aug 25, 2015, 2:12:39 PM
to dspac...@lists.sourceforge.net

Hi,

There are two points here:
1. In our repository, we have allowed crawlers to browse our site by
putting up a robots.txt with only one line:
User-agent: *
I have checked with Webmaster Tools and it reports that the crawler access
was successful. Anyway, I am not quite sure that is OK. The problem is
that internal error messages are being sent to me every day saying that
the crawler cannot access certain pages. I have checked the handles
attached and found that those are non-existent pages... Can any of you
please suggest what I should do to get rid of this kind of error?

2. I also submitted sitemaps to Google; the latest result reported in
Webmaster Tools is:
Sitemap: http://kb.psu.ac.th/psukb/sitemap
Status: OK
Type: Index
Submitted: 17/7/2010
Downloaded: 17/9/2010
URLs submitted: 4,545
URLs in web index: 3,785

Should I stop the crawler as mentioned in point 1? And what happened to the
URLs reported as not being in the web index?

Thanks.

Panyarak Ngamsritragul
Khunying Long Athakravisunthorn Learning Resources Center
Prince of Songkla University
Hat Yai, Songkhla, Thailand



Graham Triggs

Aug 25, 2015, 2:12:45 PM
to Tom De Mulder, dspac...@lists.sourceforge.net, Damian Marinaccio
On 20 September 2010 15:59, Tom De Mulder <td...@cam.ac.uk> wrote:
On Mon, 20 Sep 2010, Damian Marinaccio wrote:

> I'm seeing the following log messages in catalina.out:
> [...]
> SEVERE: The web application [] appears to have started a thread named [FinalizableReferenceQueue] but has failed to stop it.
> This is very likely to create a memory leak.

There are quite a few memory leaks in DSpace. We have a cronjob to restart
Tomcat nightly, because otherwise it'll break the next day.


Hi all,

Oh, welcome to my world!!

I'm going to start off by pointing out that the majority of DSpace code is actually quite well behaved. Going back to the codebase circa 1.4.2 / 1.5, and using the JSP user interface - I've got *thirty* separate DSpace repositories / applications running in a single Tomcat instance, which has operated without a restart in over 90 days, whilst still being able to undeploy and redeploy any of those applications at will - or just reload them so that they pick up new configuration.

That does require a bit of careful setup / teardown in the context listeners (that wasn't always part of the DSpace code), and you need to get certain JARs - particularly the database/pooling drivers - out of the web applications entirely and into the shared level of Tomcat. Most of that is actually just good / recommended practice for systems administration of a Java application server anyway.
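
As a rough illustration of that kind of setup (a sketch only - the resource name, credentials and pool sizes below are assumptions, and DSpace's own db.* configuration may still apply): the JDBC driver and pooling JARs go into $CATALINA_HOME/lib rather than into each WAR, and the connection pool is defined at the container level, e.g. in a per-application context file:

    <!-- conf/Catalina/localhost/jspui.xml (illustrative) -->
    <Context>
      <Resource name="jdbc/dspace" auth="Container" type="javax.sql.DataSource"
                driverClassName="org.postgresql.Driver"
                url="jdbc:postgresql://localhost:5432/dspace"
                username="dspace" password="dspace"
                maxActive="30" maxIdle="5" maxWait="5000"/>
    </Context>

With the driver only on the shared classpath, undeploying a webapp no longer risks pinning the driver's classes (and any registered driver instance) to the webapp's classloader.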

I was careful to point out that I have achieved that with pre-1.6 code and JSP only. Both 1.6 and XML ui (of any age) change the landscape. XML ui has always taken a large chunk of resources, although whilst it was still based on Cocoon 2.1, I managed to at least clean up its startup / shutdown behaviour by repairing its logging handler. This behaviour has changed with Cocoon 2.2, and I'll come back to that shortly.

So, 1.6 - I've been doing some work on the resource usage and clean loading/unloading of both JSP and XML using 1.6.2 recently, and neither are clean out of the box.

The first issue you run into is the FinalizableReferenceQueue noted in the stack trace above. This is coming from a reference map in reflectutils - and was found to be a cleanup problem in the course of DSpace 2 development (the kernel / services framework was backported from that work). I added a LifecycleManager to reflectutils, released as version 0.9.11, that allows the internal structures to be shut down cleanly, and implemented this as part of DSpace 2; however, this appears to have been ignored in the backport.

So, with the reflectutils/Lifecycle changes, and careful placement of JARs, etc. I did get the JSP ui to unload cleanly last week. I would note that I didn't stress the application too heavily, so there may be some operations that might trigger different code paths that are still a problem, but at the baseline it was working correctly.

XML ui has proven to be a somewhat more challenging beast. I first ran into two problems that are inside Cocoon 2.2 itself - 1) in the sitemap processing, it's using a stack inside a ThreadLocal, but it never removes the stack when it empties it, and 2) in one class relating to flowscript handling, it does not clean up the Mozilla Rhino engine correctly when it's finished using it (curiously, it's used in a number of places, and everywhere else it appears to be structured correctly to clean up - just this one class is screwed up).

With locally patched versions of the sitemap and flowscript JARs from Cocoon (the ThreadLocal patch isn't really guaranteed not to leak in unexpected circumstances, but it was sufficient to remove the problem in the scope of this testing - basically, ThreadLocal is really dangerous to use), I then ran into another issue, this time with the CachingService that was backported.

With XML ui, it's using the RequestScope function of the caching service (it didn't appear to be exercising this part with JSP - that may just be because I only ran through limited code paths). For the RequestScope, it's tying the cache not to the request object... but to a ThreadLocal. And that ThreadLocal isn't being cleaned up at the end of the request. (The shutdown code is also incapable of doing the job it's intended for, as it will only ever execute on a single thread, and not see all the other threads that may have processed requests).

There is a high probability of this leaking memory all over the place, and there is also the nasty, undesirable potential of leaking information across requests.
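
(For illustration only - this is not the DSpace CachingService API, just the general remedy pattern for a per-request ThreadLocal: clear it in a servlet filter's finally block, so every worker thread that served a request drops its reference before going back to the pool:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;

    public class RequestCacheCleanupFilter implements Filter {

        // Stand-in for whatever per-request ThreadLocal the application populates.
        private static final ThreadLocal<Map<String, Object>> REQUEST_CACHE =
                new ThreadLocal<Map<String, Object>>() {
                    protected Map<String, Object> initialValue() {
                        return new HashMap<String, Object>();
                    }
                };

        public void init(FilterConfig config) { }

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            try {
                chain.doFilter(req, res);
            } finally {
                // Runs on the thread that served the request, so the entry
                // never outlives the request or the web application.
                REQUEST_CACHE.remove();
            }
        }

        public void destroy() { }
    }
)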

I made another hacked version that removes the ThreadLocal, but replicates a lot of its thread affinity behaviour (so it still has the nasty side effects of the implementation, but at least it removes the hold the system had over the application resources). XML ui was *still* not unloading correctly, and at this point the profiler stopped giving me pointers to strong references that were being held. So right now I'm not sure what else is up - but there is at least one more troubling part of the code remaining in there.

I have repeatedly warned about the consequences of overly-complicated code and using 'clever tricks' under the hood. A lot of what I've mentioned above *can* be replaced with a much simpler architecture, that's much easier to understand, easier to maintain, and does not have the same problems.

If this matters to you, then it's going to take more than just me to stand up and say this.

G

Pottinger, Hardy J.

Aug 25, 2015, 2:12:49 PM
to Graham Triggs, Tom De Mulder, dspac...@lists.sourceforge.net, Damian Marinaccio
Hi, Graham, for what it's worth, I'll stand with you. :-) I think addressing the issues you've discovered is really important. Here's an idea: how about some new unit and/or performance tests that check if a class and/or app is unloading cleanly? In other words, would it be possible to express the tests you have in such a way that they could be part of the new testing framework? Are there JIRA issues, and/or patches for what you have already found/fixed?
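
One rough way to express that as a test (not an existing DSpace test - the object and the shutdown step below are stand-ins for whatever service/kernel is being checked) is to keep only a WeakReference to the thing that should become unloadable, trigger garbage collection, and fail if the reference never clears:

    import java.lang.ref.WeakReference;

    public class UnloadCheck {

        /** Heuristic: true if the referent becomes collectable within a few GC attempts. */
        static boolean isCollected(WeakReference<?> ref) throws InterruptedException {
            for (int i = 0; i < 10 && ref.get() != null; i++) {
                System.gc();        // only a hint to the JVM, hence the retry loop
                Thread.sleep(100);
            }
            return ref.get() == null;
        }

        public static void main(String[] args) throws Exception {
            Object service = new Object();   // stand-in for the component under test
            WeakReference<Object> ref = new WeakReference<Object>(service);
            // a real test would call the component's shutdown/teardown here
            service = null;                  // drop the last strong reference
            if (!isCollected(ref)) {
                throw new AssertionError("still strongly reachable after shutdown - likely leak");
            }
            System.out.println("cleanly collectable");
        }
    }

The same pattern works with a WeakReference to a webapp's ClassLoader when checking whether an application unloads cleanly.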

--Hardy

Claudia Jürgen

Aug 25, 2015, 2:12:52 PM
to dspac...@lists.sourceforge.net, Graham Triggs
Hello Graham,

this is an important point. Apart from the issues mentioned, a simpler
architecture will help DSpace adapt to new requirements/technology
changes and stay flexible and easy to manage.

Furthermore, too many "clever tricks under the hood" raise the risk
that, as the committer team changes (people change jobs, or priorities
and therefore commitment change), important knowledge will no longer be
available and will have to be regained at some cost.

Maybe we need the old arch board back or something similar.

Best to put this on the agenda for the committer and all-hands meetings,
or a special meeting - it needs a bit more space to talk about.

Have a sunny day

Claudia



Am 21.09.2010 13:52, schrieb Graham Triggs:
...
> I have repeatedly warned about the consequences of overly-complicated code
> and using 'clever tricks' under the hood. A lot of what I've mentioned above
> *can* be replaced with a much simpler architecture, that's much easier to
> understand, easier to maintain, and does not have the same problems.
>
> If this matters to you, then it's going to take more than just me to stand
> up and say this.
>
> G
>
>
>
>

--
Claudia Juergen
Universitaetsbibliothek Dortmund
Eldorado
0231/755-4043
https://eldorado.tu-dortmund.de/

TAYLOR Robin

Aug 25, 2015, 2:12:53 PM
to Pottinger, Hardy J., Graham Triggs, Tom De Mulder, dspac...@lists.sourceforge.net, Damian Marinaccio
Hi Graham,

I don't have time at the moment to consider some of the bigger issues you raise, but I would like to echo Hardy's comments. Historically, many DSpace installations have had little content and been lightly used. I think this has allowed us to develop without much consideration for performance. I would like to see the sort of testing you have done become part of our procedures prior to release, rather than being left to the bigger sites, such as BioMed, to sort out after the event.

Cheers, Robin.


Robin Taylor
Main Library
University of Edinburgh
Tel. 0131 6513808
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Tom De Mulder

Aug 25, 2015, 2:12:54 PM
to dspac...@lists.sourceforge.net

I am very happy to see that this issue finally seems to be taken seriously. However, I find myself getting a bit frustrated that it was never taken seriously when I raised it in the past.

I think the DSpace source code carries a lot of historical baggage, and that could be addressed even without making fundamental changes to the basic architecture. My personal favourite would be a completely new architecture with more loosely coupled modules, but fixing the memory leaks and the associated slow performance would be a good start.

I can add that, for example, deleting a collection with 1200 items on our rather powerful DSpace machines takes two hours and uses most of the available memory. You can see why I would like that no longer to be the case.


Best regards,

Tim Donohue

Aug 25, 2015, 2:13:04 PM
to dspac...@lists.sourceforge.net
Hi all,

I'm sorry if any of you have felt that this issue was not being taken
seriously in the past. The reality of the situation is that we (the
DSpace Developers/Committers) currently depend on feedback/testing from
larger DSpace instances around these sorts of scalability and memory
issues. As DSpace is Community Built & Supported Software, there are a
couple things to keep in mind:

(1) DSpace software has zero full-time developers. All Committers are
volunteers and can only devote as much time as their individual
institutions allow. Although I officially have "DSpace" in my title, I
also wear several hats in DuraSpace. Therefore, even I don't have much
time in a given week to devote towards actual DSpace development work.

(2) We currently don't have a centralized server with enough test data
to run many of these memory or scalability tests on our own. I think
this is something we could look into improving upon (especially if
anyone has test data to donate to the cause). I agree with Robin T.
that it is in everyone's interest to improve our performance testing
prior to each release. I'd also encourage Graham (and others) to share
their testing routes so that we can work to make this happen, and start
to locate these performance issues *before* new releases, rather than after.

I'm also very happy to see these issues starting to gain some leverage.
The reality of the situation is that we need one or more volunteers to
step up and help to make these improvements or suggest testing routes
that can allow us to better investigate where memory leaks may be
occurring (or point them out if you've already found where the leaks
are). All of us want DSpace to scale well and avoid memory leaks -- if
it takes a new architecture to do so that is one possible route forward.
But, the main thing to keep in mind is that DSpace is built &
maintained by volunteer developers -- so, we need to find the volunteers
(and convince their institutions) to help make this happen.

It sounds like we've already located a few interested parties in this
discussion. So, I hope that we can move forward with this work soon and
perhaps even make some quick improvements in time for the rapidly
approaching 1.7.0 release.

If you'd like to volunteer to help us out, please let us know how you'd
like to help!

- Tim

--
Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org

Sands Alden Fish

Aug 25, 2015, 2:13:05 PM
to Tim Donohue, dspac...@lists.sourceforge.net
On Sep 22, 2010, at 12:10 PM, Tim Donohue wrote:

(2) We currently don't have a centralized server with enough test data
to run many of these memory or scalability tests on our own.  I think
this is something we could look into improving upon (especially if
anyone has test data to donate to the cause).  

There's a lot of Creative Commons licensed content in the DSpace-sphere.  Perhaps an effort to gather what various sites are willing to donate into a DuraSpace repository would give us the amount of data we need, as well as beneficial heterogeneity in said data?  Perhaps beyond this (and certainly there would be other considerations here) it could be set up in such a way that the data could be (extremely) easily replicated into one's test environment to put an instance through its paces?

I agree with Robin T.
that it is in everyone's interest to improve our performance testing
prior to each release.  I'd also encourage Graham (and others) to share
their testing routes so that we can work to make this happen, and start
to locate these performance issues *before* new releases, rather than after.

As a first step in this direction (and one that would help me personally), I'd like to ask if anyone out there has an Apache JMeter test plan file that is, or could be, generalized for stressing any DSpace application.  I know that each instance has its own customizations, URL patterns, areas to stress, etc., but there is a lot that could be covered generally for any implementation.  Does this exist out there?  I have always just cobbled together a very simplistic setup that hits the front page, community-list, and some particular items and URLs.  Perhaps we can collaboratively build one out with everyone's input.



--
sands fish
Software Engineer
MIT Libraries
Technology Research & Development
sa...@MIT.EDU
E25-131

Mark H. Wood

Aug 25, 2015, 2:13:07 PM
to dspac...@lists.sourceforge.net
A random collection of thoughts which occurred while reading this
thread:

o Performance, scalability, complexity, and ruggedness are sometimes
competing influences on the design of code. We can improve in all
of these aspects. Sometimes all of those influences will conspire
to suggest a particular design, and at other times we will have to
trade them off against one another. And performance, in
particular, is tricky to characterize, because a design that
performs best at small scale may be worst at large scale or vice
versa.

What I think I am getting at here is that we want many different
kinds of goodness and we need to pursue them together if we want to
achieve any of them in a meaningful way.

o The testing work has also introduced some new automated reports
that we should be reviewing. Have you seen how many FIXMEs there
are, and what they are saying? Quite motivational. The Findbugs
report is also interesting in spots.

o Where it seems that code must be complex, thorough documentation of
the thought behind it will not only capture important knowledge for
the next person who has to work there, but can also provide
opportunities to realize: "good heavens, did I really write that?
there must be a better way...." When I find myself writing
absurd comments, it is usually because I have been writing (or was
about to write) absurd code.

o Best practice and commonest practice w.r.t. deployment of libraries
seem to be antithetical in the Java universe. I was quite pleased
to discover that I'm not the only one who thinks that Tomcat's /lib
directory is on the app. classpath for good reasons.

o The DSpace 2 architecture (which we are approaching by easy stages)
attempts to address looser coupling and similar OO goals.

--
Mark H. Wood, Lead System Programmer mw...@IUPUI.Edu
Balance your desire for bells and whistles with the reality that only a
little more than 2 percent of world population has broadband.
-- Ledford and Tyler, _Google Analytics 2.0_

Mark H. Wood

Aug 25, 2015, 2:13:08 PM
to dspac...@lists.sourceforge.net
And one point I forgot:

o Volunteers don't have to write code. If you aren't quite ready to
step into the DSpace tarball with torch and machete, but can read
Java, you can review the code and make suggestions. "Many eyes
make all bugs shallow."

Bug reports (including performance problems) are always useful.

And just asking, "why is this so slow?" can help to focus attention on
design decisions which perhaps didn't get quite as much attention
as they deserved. Keep asking until you get a sensible answer.

Flavio Botelho

Aug 25, 2015, 2:13:10 PM
to dspac...@lists.sourceforge.net
On Wed, Sep 22, 2010 at 4:51 PM, Mark H. Wood <mw...@iupui.edu> wrote:

> o  Best practice and commonest practice w.r.t. deployment of libraries
>   seem to be antithetical in the Java universe.  I was quite pleased
>   to discover that I'm not the only one who thinks that Tomcat's /lib
>   directory is on the app. classpath for good reasons.

Actually, nowadays, AFAIK that is universally accepted as bad practice.
It's not a coincidence that Tomcat removed /common/lib in version 6.

In the early days of web development with Java, what you are defending
became an obvious choice to avoid wasting resources. But the problem
is that you then need to adapt the code in all the applications in the
container at the same time in order to move library versions, which
is really difficult for a company's internal code, and impossible if
there is any third-party or open-source code involved...
Unless you want to run one Tomcat instance per application... which
brings us to the question: what difference would having the libs in
Tomcat's lib make in that scenario?


>
> o  The DSpace 2 architecture (which we are approaching by easy stages)
>   attempts to address looser coupling and similar OO goals.
>
> --
> Mark H. Wood, Lead System Programmer   mw...@IUPUI.Edu
> Balance your desire for bells and whistles with the reality that only a
> little more than 2 percent of world population has broadband.
>        -- Ledford and Tyler, _Google Analytics 2.0_
>

Vinit

Aug 25, 2015, 2:13:14 PM
to Panyarak Ngamsritragul, dspac...@lists.sourceforge.net
Dear Panyarak,

Check your sitemap files. Find the non-existent pages and delete those URLs.

Also, in the robots.txt file you have to give the location of the sitemap file.
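
For example (illustrative only - the Sitemap URL should match whatever was actually submitted to Google, such as the one mentioned earlier in this thread):

    User-agent: *
    Sitemap: http://kb.psu.ac.th/psukb/sitemap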

Regards
Vinit Kumar 
Senior Research Fellow
Documentation Research and Training Centre
Bangalore
MLISc   (BHU)
Varanasi, India 
Alt email: vi...@drtc.isibang.ac.in



Tom De Mulder

Aug 25, 2015, 2:13:17 PM
to dspac...@lists.sourceforge.net
On 22 Sep 2010, at 20:22, Sands Alden Fish wrote:

> (2) We currently don't have a centralized server with enough test data
> to run many of these memory or scalability tests on our own. I think
> this is something we could look into improving upon (especially if
> anyone has test data to donate to the cause).

There is a lot of public domain data available online. I spent some time collecting some of this in a variety of formats (text, images, movies, sound, datasets) and then wrote something to use a word list (e.g. /usr/share/dict on most Linux systems) to create random metadata for them.

After all, it doesn't matter that many bitstreams will be identical.

That is how we populated our test environment here so we could replicate the problems we were seeing on the live system.
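
(A minimal sketch of that word-list trick - the dictionary path and the five-word "titles" are arbitrary choices, and the real script presumably fed its output into DSpace's batch import format:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    public class RandomTitles {
        public static void main(String[] args) throws IOException {
            List<String> words = new ArrayList<String>();
            BufferedReader in = new BufferedReader(new FileReader("/usr/share/dict/words"));
            for (String line; (line = in.readLine()) != null; ) {
                words.add(line.trim());
            }
            in.close();

            Random rnd = new Random();
            // Emit 10 random five-word strings, e.g. to paste into dc.title fields.
            for (int i = 0; i < 10; i++) {
                StringBuilder title = new StringBuilder();
                for (int w = 0; w < 5; w++) {
                    if (w > 0) title.append(' ');
                    title.append(words.get(rnd.nextInt(words.size())));
                }
                System.out.println(title);
            }
        }
    }
)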

Panyarak Ngamsritragul

Aug 25, 2015, 2:13:28 PM
to dspac...@lists.sourceforge.net

Thanks, Vinit, for the information.
I checked the sitemap files under DSPACE/sitemaps and found that the
handles the Google crawler keeps on accessing do not exist in any of the
files there. Or are there sitemap files elsewhere?

How do I include the sitemap files in robots.txt? Sorry if this is a
stupid question.

Panyarak Ngamsritragul
Khunying Long Athakravisunthorn Learning Resources Center
Prince of Songkla University
Hat Yai, Songkhla, Thailand

> Dear Panyarak,
>
> Check ur sitemaps file. Find out the non existing pages and delete the URLs.
>
> Plus in robots.txt file you have to give the location of the sitemaps.xml file.
>
> Regards
> Vinit Kumar 
> Senior Research Fellow
> Documentation Research and Training Centre
> Bangalore
> MLISc   (BHU)
> Varanasi, India 
> Alt email: vi...@drtc.isibang.ac.in

bill.a...@library.gatech.edu

Aug 25, 2015, 2:13:37 PM
to dspac...@lists.sourceforge.net
We've been experiencing problems similar to some reported on this thread since our
upgrade to 1.6 several months ago. We're still using the jspui, and we've wondered
(among other things) if some of these problems might be alleviated by a switch to
the xmlui. Has anybody had any experience comparing the memory footprint and/or resource
usage issues between the two interfaces?

Thanks,

Bill

Bill Anderson
Software Developer
Digital Library Development
Georgia Tech Library


Tom De Mulder

Aug 25, 2015, 2:14:19 PM
to bill.a...@library.gatech.edu, dspac...@lists.sourceforge.net
On 24 Sep 2010, at 21:17, bill.a...@library.gatech.edu wrote:

> We've been experiencing problems similar to some reported on this thread since our
> upgrade to 1.6 several months ago. We're still using the jspui, and we've wondered
> (among other things) if some of these problems might be alleviated by a switch to
> the xmlui. Has anybody had any experience comparing the memory footprint and/or resource
> usage issues between the two interfaces?

We load-tested the XMLUI (on identical hardware) and it was even worse. It ran out of memory and crashed really quickly, so we never took it into production. But your mileage may vary.

Hilton Gibson

Aug 25, 2015, 2:14:20 PM
to dspac...@lists.sourceforge.net
See attached.
We started with a VM which had 2GB memory.
Then added 2GB to the VM, no luck.
Then luckily we had funds to buy a server.
So now we have 12GB RAM and 12 CPUs. No crashes so far.
Using the XMLUI.
Does DSpace really need this, and what happens when we go to one million
items?

Cheers

hg

--
Hilton Gibson
Systems Administrator
JS Gericke Library
Room 1053
Stellenbosch University
Private Bag X5036
Stellenbosch
7599
South Africa

Tel: +27 21 808 4100 | Cell: +27 84 646 4758

ir1-mem.png

Mark Ehle

Aug 25, 2015, 2:14:21 PM
to Hilton Gibson, dspac...@lists.sourceforge.net
Why was tomcat chosen as a platform for DSpace?

Tom De Mulder

Aug 25, 2015, 2:14:22 PM
to Hilton Gibson, dspac...@lists.sourceforge.net
On 29 Sep 2010, at 11:38, Hilton Gibson wrote:

> We started with a VM which had 2GB memory.
> Then added 2GB to the VM, no luck.
> Then luckily we had funds to buy a server.
> So now we have 12GB RAM and 12CPU's. No crashes so far.
> Using the XMLUI.
> Does DSpace really need this and what happens when we go to one million items ??

A lot of the back-end code of DSpace, the very core of it, is inherently inefficient. Several tasks are executed more than once, and entire objects are created when only one attribute is needed, etc. (I'd be more specific, but I'm not a specialist on this matter, and our resident DSpace developer is on leave this week.)


I am really glad to hear from other people with problems similar to ours.

Tom De Mulder

Aug 25, 2015, 2:14:23 PM
to Mark Ehle, dspac...@lists.sourceforge.net
On 29 Sep 2010, at 11:47, Mark Ehle wrote:

> Why was tomcat chosen as a platform for DSpace?

It wasn't. You can use any Servlet engine. We used JBoss for a while but went back to Tomcat because it fitted into our infrastructure better.

I believe DSpace was written in Java because Rob Tansley wanted to try writing a project in Java, but I could be wrong. :)


Best,

Graham Triggs

Aug 25, 2015, 2:14:24 PM
to Hilton Gibson, dspac...@lists.sourceforge.net
On 29 September 2010 11:38, Hilton Gibson <hilton...@gmail.com> wrote:
Using the XMLUI.
Does DSpace really need this and what happens when we go to one million items ??


Does DSpace really need that? No. As I have said, I'm running 30 separate repositories - using JSPUI (circa 1.4.2 / 1.5 codebase) - all on a single server / Tomcat instance.

Some of those repositories have 1000s of items, and get quite decent levels of access.

The server has 8GB installed, 3GB heap turned over to Tomcat (plus 1GB for non-heap).

The Tomcat instance has 2GB of *free* heap space, rarely runs above 5% cpu usage, and has plenty of capacity to run more repositories (the rate at which files are opened/closed is actually a bigger issue for Tomcat startup).

Although, it's worth pointing out that the database is hosted on a separate server - I can't say how many resources that is really using, as it's shared with other services, but it is apparently 'tiny'.



What happens at one million items? Well, that's an interesting issue. But is it really the right question to be asking? How far do you want/need to be able to scale a 'monolithic' instance, before you spread it over multiple servers?

As long as you can spread it over multiple servers, it gives you a much higher ceiling than relying on a single box - and it is easier to scale for increasing size/usage by adding more boxes (you don't have to migrate).

If you focus on scaling a single installation, then you end up increasing the overall requirements (ie. memory for caching), and make it harder to have scaling over multiple boxes at all.

G

Graham Triggs

Aug 25, 2015, 2:14:25 PM
to Tom De Mulder, dspac...@lists.sourceforge.net
On 29 September 2010 11:48, Tom De Mulder <td...@cam.ac.uk> wrote:
A lot of the back-end code of DSpace, the very core of it, is inherently inefficient

I don't entirely disagree with that statement - there are some things that can definitely be improved, particularly where you have to deal with more items in a single instance.

But take a look at my numbers - at its core, it really isn't that bad for the vast majority of DSpace users (how many even have more than 50,000 items currently?). And some of it depends on correct system setup (Postgres version/options, etc.).

It's adding xmlui, solr, etc. that is putting a lot more demands on the system.


G

Graham Triggs

Aug 25, 2015, 2:14:26 PM
to Mark Ehle, dspac...@lists.sourceforge.net
That begs the question: do you think something else should be chosen / recommended?

There really isn't anything preventing you from using Jetty, etc., but Tomcat is actually a pretty solid server that does a lot of things quite well - particularly, in recent versions, in being defensive against bad application behaviour.

And when you look at the grand scheme of things, the smaller footprint of Jetty doesn't really make a whole lot of difference.

G

Mark H. Wood

Aug 25, 2015, 2:14:27 PM
to dspac...@lists.sourceforge.net
We're comfortably running *three* production DSpace instances in a
single Tomcat 6 with these limits:

JAVA_OPTS="-Xmx1024M -Xms768M"
JAVA_OPTS="$JAVA_OPTS -XX:MaxPermSize=128M"
JAVA_OPTS="$JAVA_OPTS -XX:PermSize=32M"

That's on a box with 3GB of physical memory. One DSpace instance is
1.6, and the other two are 1.5.

Now, I do have an old weekly reminder to check PermGen on that box,
but it is always around half filled these days. We had problems in
the past, but newer versions of DSpace seem to do much better in that
regard. I can't recall the last time we had to restart that Tomcat
just to clean up memory.

We have a development box with maybe two dozen DSpace instances, none
of them very busy at all, various versions and states of disrepair,
and we do have to restart Tomcat there from time to time if we are
doing a lot of webapp. reloading. The limits there are:

JAVA_OPTS="-Xmx1024M -Xms128M"
JAVA_OPTS="$JAVA_OPTS -XX:PermSize=192M"
JAVA_OPTS="$JAVA_OPTS -XX:MaxPermSize=384M"

on a 4GB machine.

Mark H. Wood

Aug 25, 2015, 2:14:28 PM
to dspac...@lists.sourceforge.net
On Wed, Sep 29, 2010 at 11:48:02AM +0100, Tom De Mulder wrote:
> A lot of the back-end code of DSpace, the very core of it, is inherently inefficient. Several tasks are executed more than once, and entire objects are created when only one attribute is needed, etc. (I'd be more specific, but I'm not a specialist on this matter, and our resident DSpace developer is on leave this week.)

When your developer has time, I think that specific JIRA tickets on
these observations would be appreciated. We need all the eyes we can
borrow. It needn't be a rigorous analysis (though that would be
wonderful). Significant inefficiencies noted in passing are important
information.

Tom De Mulder

Aug 25, 2015, 2:14:29 PM
to Graham Triggs, dspac...@lists.sourceforge.net
On 29 Sep 2010, at 13:03, Graham Triggs wrote:
>
> Some of those repositories have 1000s of items, and get quite decent levels
> of access.
>

Thousands?

I don't even want to have this discussion until you're talking hundreds of thousands, and how many hits per second. I know you like to talk down the problem, but that really isn't helping.

We run 5 DSpace instances; three of these are systems with hundreds of thousands of items, and they are dog slow and immensely resource-intensive. And yes, we want these to be single systems. Why shouldn't we?

We have other systems here at the University that are much bigger, do similar things and require far, far less in terms of resources.

Tim Donohue

Aug 25, 2015, 2:14:32 PM
to dspac...@lists.sourceforge.net
Hi all,

Interesting thread so far and keep up the good discussion.

I think it'd be helpful if we could all share more information
about our DSpace setups (similar to Mark Wood's tip on his local
JAVA_OPTS settings). The more we know about your
DSpace/Java/Tomcat/Postgres (or Oracle) configurations, server setups,
etc. the better chance we have at helping you out. There may be some
immediate performance improvements you can achieve just by tweaking your
setup/configurations slightly.

I had setup a basic template for this on the Wiki at
https://wiki.duraspace.org/display/DSPACE/ScalabilityIssues1.6
But, feel free to just send info along in any format you wish. The
template was mostly there to give everyone an idea of what type of
information can be useful to us (so that we can hopefully provide you
with some helpful suggestions and find longer term fixes).

Obviously, we also want to track down and fix any memory leaks or larger
problems as well. So if you've already discovered specific issues, let
us know about those as well, so we can add them to our Issue Tracker
(http://jira.dspace.org/) and schedule them to be resolved.

Thanks,

Tim Donohue
Technical Lead for DSpace Project
DuraSpace.org

On 9/29/2010 7:59 AM, Mark H. Wood wrote:
> We're comfortably running *three* production DSpace instances in a
> single Tomcat 6 with these limits:
>
> JAVA_OPTS="-Xmx1024M -Xms768M"
> JAVA_OPTS="$JAVA_OPTS -XX:MaxPermSize=128M"
> JAVA_OPTS="$JAVA_OPTS -XX:PermSize=32M"
>
> That's on a box with 3GB of physical memory. One DSpace instance is
> 1.6, and the other two are 1.5.
>
> Now, I do have an old weekly reminder to check PermGen on that box,
> but it is always around half filled these days. We had problems in
> the past, but newer versions of DSpace seem to do much better in that
> regard. I can't recall the last time we had to restart that Tomcat
> just to clean up memory.
>
> We have a development box with maybe two dozen DSpace instances, none
> of them very busy at all, various versions and states of disrepair,
> and we do have to restart Tomcat there from time to time if we are
> doing a lot of webapp. reloading. The limits there are:
>
> JAVA_OPTS="-Xmx1024M -Xms128M"
> JAVA_OPTS="$JAVA_OPTS -XX:PermSize=192M"
> JAVA_OPTS="$JAVA_OPTS -XX:MaxPermSize=384M"
>
> on a 4GB machine.
>
>
>
>

Mark Ehle

Aug 25, 2015, 2:14:34 PM
to Tom De Mulder, dspac...@lists.sourceforge.net
Thanks - I was just curious.

Tim Donohue

Aug 25, 2015, 2:14:36 PM
to dspac...@lists.sourceforge.net
Quick followup, in case it isn't clear (as I was asked about this
off-list). The preference would be to share your DSpace
setup/configuration information directly on this listserv (or you can
post up on the wiki if you prefer). That way we can get more eyes on
it, and hopefully come up with better suggestions.

Also, this may be an area where sharing this information can help us to
document some "best practices", based on recommended setups and
performance hints/tips that people have. So, I'm hoping that as this
thread continues, we can pull out the main tips/hints and document them
for future reference. At the same time, we can pull out the common
memory/performance issues so that they can be investigated further, and
hopefully resolved as soon as possible.

Committers -- it'd also be great if you can take a few moments to send
your basic setup info & DSpace size to the listserv (especially noting
anything that you may have tweaked above & beyond the normal DSpace
install docs, like JAVA_OPTS or similar settings). This can hopefully
encourage others to do the same.

- Tim

Anderson, Charles W

Aug 25, 2015, 2:14:39 PM
to Tim Donohue, dspac...@lists.sourceforge.net
----- "Tim Donohue" <tdon...@duraspace.org> wrote:

| Quick followup, in case it isn't clear (as I was asked about this
| off-list). The preference would be to share your DSpace
| setup/configuration information directly on this listserv


Let me kick things off, then (questions truncated a bit for formatting reasons):

1) Contact Info
a) Bill Anderson / Georgia Institute of Technology / bill.a...@library.gatech.edu


2) DSpace Setup and Configuration details

a) What DSpace version are you using?
1. Dspace 1.6.2
2. Currently using JSPUI, migrating to XMLUI
3. 30,498 Items
4. 610 Communities/Collections

b) What Postgres/Oracle version are you using?
1. PostgreSQL 8.1.4

c) What Tomcat version are you using?
1. Tomcat/6.0.26 + mod_jk/1.2.30 + Apache/2.0.52

d) Is everything running on one server (DSpace/Tomcat/Posgres/etc)?
1. Everything is (currently) on the same server
2. PowerEdge 2850: 2x Intel Xeon CPU 2.80Ghz, 12Gb Memory, Red Hat AS 4 (Nahant Update 8), RAID5 Disk array

e) How much memory are you making available to Tomcat/Java?
1. (lb worker) JAVA_OPTS="-server -Xmx462M -Xms462M -XX:+UseParallelGC -Dfile.encoding=UTF-8", webapps: jspui lni oai sword xmlui
2. (lb worker) JAVA_OPTS="-server -Xmx462M -Xms462M -XX:+UseParallelGC -Dfile.encoding=UTF-8", webapps: jspui lni oai sword xmlui
3. JAVA_OPTS="-server -Xmx600M -Xms600M -XX:+UseParallelGC -Dfile.encoding=UTF-8", webapps: solr
4. lb worker method=request, socket_keepalive=True, socket_timeout=0, ping_mode=A
5. Postgres max_connections=300


3) Performance / Scalability Issues noticed
1. We've had intermittent performance problems since upgrading to 1.6 in May. At first, the problems seemed strictly SOLR-related; SOLR was grabbing hundreds of postgres connections, and eventually generating these in dspace.log:

org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error: Timeout waiting for idle object

and these in catalina.out:

SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.

...followed by permgen errors and death.

2. We heavily revised our solrconfig.xml and alleviated the problem, but didn't eliminate it. We also split our jspui between two load-balanced Tomcat instances and moved the SOLR webapp to a third instance, which also helped. Following OR 2010, on a suggestion from Peter Dietz, we revised the SOLR JSP code to use the auto-commit functionality rather than manually committing every transaction (an illustrative autoCommit snippet appears at the end of this section).

All of this got us to the point where we weren't crashing routinely, but we still have major problems during times of heavy traffic. Generally, these take the form of a gradual slowdown followed by a complete failure to respond; this sometimes ends in spontaneous recovery, and sometimes in permgen errors and a crash. At the end of last week, following a bad patch caused by a LOCKSS harvest, we implemented a restart schedule, with our two jspui Tomcat instances being automatically restarted every 6 hours, alternating between the two. We haven't had any crashes since, but we're not at all sure we've solved the problem.

3. On restart, we sometimes get a bunch of these:

Sep 28, 2010 9:00:06 AM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: A web application appears to have started a thread named [FinalizableReferenceQueue] but has failed to stop it. This is very likely to create a memory leak

4. Other errors that lead to a service/application outage:

Sep 23, 2010 3:47:14 PM org.apache.tomcat.util.threads.ThreadPool$ControlRunnable run
SEVERE: Caught exception (java.lang.OutOfMemoryError: PermGen space) executing org.apache.jk.common.ChannelSocket$SocketConnection@3aff776, terminating thread

Sep 23, 2010 10:37:04 AM org.apache.catalina.connector.CoyoteAdapter service
SEVERE: An exception or error occurred in the container during the request processing
java.lang.OutOfMemoryError: PermGen space
at java.lang.Throwable.getStackTraceElement(Native Method)
at java.lang.Throwable.getOurStackTrace(Throwable.java:591)
at java.lang.Throwable.getStackTrace(Throwable.java:582)
at org.apache.juli.logging.DirectJDKLog.log(DirectJDKLog.java:155)
at org.apache.juli.logging.DirectJDKLog.error(DirectJDKLog.java:135)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:274)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:769)
at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:698)
at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:891)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
at java.lang.Thread.run(Thread.java:619)

Sep 23, 2010 10:38:19 AM org.apache.catalina.connector.CoyoteAdapter service
SEVERE: An exception or error occurred in the container during the request processing
java.lang.OutOfMemoryError: PermGen space
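
(Referring back to item 2 above: a hedged illustration of the kind of Solr-side auto-commit setting involved - the values are placeholders, not the configuration we actually used - would sit in solrconfig.xml roughly like this:

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- let Solr batch commits instead of the client committing every transaction -->
      <autoCommit>
        <maxDocs>10000</maxDocs>  <!-- commit after this many queued documents -->
        <maxTime>60000</maxTime>  <!-- ...or after this many milliseconds -->
      </autoCommit>
    </updateHandler>
)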


4) Volunteer To Help?

a) Would you be willing to volunteer some time to work on a fix

Yes. We have a large DSpace installation and several smaller ones, with four systems analysts and a system administrator who work at least part time on them, and several administrative users/submitters with significant knowledge of DSpace from a users' perspective. We would be interested in helping with testing and development as time permits. Please use me as a primary contact: bill.a...@library.gatech.edu

George Stanley Kozak

Aug 25, 2015, 2:14:41 PM
to dspac...@lists.sourceforge.net
Hi...
Based on Tim Donohue's suggestion to share configuration and setup information, here is Cornell University's DSpace configuration:

1. Server: Sun sun4v T5140
Memory size: 65312 Megabytes (4 CPUs)

2. Running DSpace 1.6.2 (JSPUI) with local mods for User Interface, Embargo, and Refworks.
db.maxconnections = 50
db.maxwait = 5000
db.maxidle = 5
2 assetstores at 300GB each (using currently 323 GB)
Number of items: 14,960
Number of Communities/Collections: 789

3. Java 1.5.0_24

4. Apache 2.29 running mod_jk to tomcat

5. Tomcat 5.5.26
JAVA_OPTS="-server -Xms1024m -Xmx2048m -Xmn64m
-Dfile.encoding=UTF-8
-XX:+UseParallelGC -verbose:gc
-Xloggc:/dspace/dspace/log/gc.log
-XX:+HeapDumpOnOutOfMemoryError
-XX:PermSize=1024m -XX:MaxPermSize=1024m
-XX:-UseGCOverheadLimit"

6. PostGreSQL 8.3
max_connections = 300
shared_buffers = 32MB
max_fsm_pages = 204800

Though we experienced some performance problems in the past, those seemed to disappear after we went to DSpace 1.5.2.

George Kozak
Digital Library Specialist
Cornell University Library Information Technologies (CUL-IT)
501 Olin Library
Cornell University
Ithaca, NY 14853
607-255-8924

Mark H. Wood

Aug 25, 2015, 2:14:42 PM
to dspac...@lists.sourceforge.net
I'd like to point out that the discussion is broadening considerably:
a system can be slow for many reasons, not just memory starvation.

Step 1: what resource(s) are you short of? Something like LambdaProbe
can peek inside Tomcat and show you how much of each of the various
memory pools is being used. OS tools can show whether you are
swapping heavily or spending a lot of time in I/O wait or are really
CPU-bound (and what, besides Tomcat, may be eating CPU). DBMS tools
can reveal places in the schema that don't scale well, queries that
could be optimized, and additional indices that would be beneficial.
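
For anyone who wants concrete starting points, the sort of checks I have in mind look roughly like this (assuming a Linux-ish box with the JDK tools on the PATH; substitute your own OS equivalents and Tomcat's actual process id):

# OS level: swap activity, I/O wait, CPU hogs
vmstat 5
iostat -x 5
top

# JVM level: heap and PermGen occupancy, GC behaviour
jmap -heap <tomcat-pid>
jstat -gcutil <tomcat-pid> 5000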

It would be really helpful for large, busy sites with performance
problems to share any such detailed observations. Some of those
problems can probably be tuned away, and some will point to specific
things for coders to investigate. Scaling experience will be valuable
both in documenting good ways to tune up for DSpace and in finding
design hotspots for rework.

Pottinger, Hardy J.

unread,
Aug 25, 2015, 2:15:13 PM8/25/15
to Mark H. Wood, dspac...@lists.sourceforge.net
Hi, first, I want to thank Mark Wood for recommending LambdaProbe; it is proving to be a very useful tool. I can see already that we need to increase our PermGen, and we will probably borrow Mark's JAVA_OPTS settings for our production and development Tomcat instances.

In trying to further educate myself about these issues, I came across this excellent page on the Tomcat wiki, which includes, at the end, debugging/troubleshooting advice that is very close to the procedure Graham Triggs outlined at a recent committers' meeting. I'm forwarding the link to the list, as I think it might prove useful to others:

http://wiki.apache.org/tomcat/OutOfMemory
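
For anyone else in the same situation: on a Sun JVM the PermGen sizing is controlled from JAVA_OPTS, along these lines (the values below are purely illustrative and need to be sized for your own machine):

JAVA_OPTS="-Xms512m -Xmx1024m
-XX:PermSize=128m -XX:MaxPermSize=256m
-XX:+HeapDumpOnOutOfMemoryError"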

--Hardy

Pottinger, Hardy J.

unread,
Aug 25, 2015, 2:15:17 PM8/25/15
to dspac...@lists.sourceforge.net
1. Contact Info: Hardy Pottinger, University of Missouri (MOspace) (message sent from my main e-mail address, use that).

2.a. DSpace 1.6.2 (XMLUI) with local mods for user interface, and Shibboleth special groups handling
1 assetstore at 1TB, 35GB used (big plans! really!)
Number of items:

2.b. Oracle Database 10g Enterprise Edition Release 10.2.0.4.0, running on another server (unknown spec)

db.maxconnections = 50 (anything less than 50 is unstable)
db.maxwait = 5000
db.maxidle = 0 (idle connections are nailed by the firewall on the Oracle server; the sysadmins will not change this)

2.c. All RHEL-provided, Tomcat 5.5.23, running behind Apache 2.2.3 via mod_proxy

2.d. DSpace and Tomcat are on one server, Oracle db is on a shared server.

Server: Dell PowerEdge 2950
Memory size: 8110 MB
CPUs: 4 x Intel Xeon 5160 @ 3.00 GHz
OS: Red Hat Enterprise Linux Server release 5.5 (kernel 2.6.18-194.3.1.el5PAE)

2.e. JAVA_OPTS="-Xmx512M -Xms256M" (this will likely change soon, need to bump up PermGen)

3. Back when we were running on DSpace 1.5.1, after a period of about 24-36 hours of uptime, Tomcat became unavailable. Apache reported 503: service unavailable. Looking at a dump after killing all Java processes, it appeared that all database connections were unavailable. Changing the db.maxconnections and db.maxidle settings (see above) was helpful, but we are proactively rebooting Tomcat and Apache every night, to "clean out the cobwebs". We have not disabled the nightly reboot since the upgrade to 1.6.2, so we do not have current data/log files. I'm willing to try other config settings. I'm pretty sure I saw an interesting-looking patch that drops database connections (mainly for streaming situations) which might help this particular stability issue, but I can't seem to find it right now.

4. Volunteer to help? Of course! I am a fledgling Java developer, reasonably competent application manager, and working with XSLT only makes me want to cry a little bit, nowadays. :-) We have a development server with a snapshot of our live repository, and are willing to do load testing, run a profiler, whatever it takes to help. We're certainly not experts in any of the various tech running under the hood of DSpace, but keeping our repository running smoothly is our main job, and we aim to please. You may use my address as the main contact, but we have two more developers here who are willing to pitch in.

--Hardy

Keith Gilbertson

unread,
Aug 25, 2015, 2:15:18 PM8/25/15
to Pottinger, Hardy J., dspac...@lists.sourceforge.net

On Sep 30, 2010, at 6:36 PM, Pottinger, Hardy J. wrote:

>
>
> 3. Back when we were running on DSpace 1.5.1, after a period of about 24-36 hours of uptime, Tomcat became unavailable. Apache reported 503: service unavailable. Looking at a dump after killing all Java processes, it appeared that all database connections were unavailable. Changing the db.maxconnections and db.maxidle settings (see above) was helpful, but we are proactively rebooting Tomcat and Apache every night, to "clean out the cobwebs". We have not disabled the nightly reboot since the upgrade to 1.6.2, so we do not have current data/log files. I'm willing to try other config settings. I'm pretty sure I saw an interesting-looking patch that drops database connections (mainly for streaming situations) which might help this particular stability issue, but I can't seem to find it right now.


There's a patch here for xmlui:
http://jira.dspace.org/jira/browse/DS-677

I don't know if this addresses the problem that you used to have in 1.5, though. I think I've seen something like what you're talking about with a scheduling conflict between vacuuming or other maintenance of the database and the media filter processes running. It's been a long time though and I'm very fuzzy on this.

In 1.6, the Solr statistics code seems to make heavy use of database connections to figure out which item, collections, and communities a bitstream belongs to. I haven't spent enough time poking around though to see if the database usage might be reduced, or the connections held open less often.

On a side note, I noticed that in the slides from the recent presentation by @mire about the statistics system, there are a few suggestions for optimizing Solr for sites with heavy usage. I missed the seminar, though, so I'm not sure of the details.

--keith


Keith Gilbertson

unread,
Aug 25, 2015, 2:15:19 PM8/25/15
to Keith Gilbertson, dspac...@lists.sourceforge.net, Pottinger, Hardy J.

On Sep 30, 2010, at 7:14 PM, Keith Gilbertson wrote:
> I think I've seen something like what you're talking about with a scheduling conflict between vacuuming or other maintenance of the database and the media filter processes running. It's been a long time though and I'm very fuzzy on this.

Now that I'm thinking about it, it may have been an item import or some other long running batch process.

> I haven't spent enough time poking around though to see if the database usage might be reduced, or the connections held open less often.


Instead of "connections held open less often", I think I meant something more along the lines of "reduce the amount of time it takes to return a database connection back to the database pool and/or reduce the number of concurrent connections needed".




Pottinger, Hardy J.

unread,
Aug 25, 2015, 2:15:23 PM8/25/15
to dspac...@lists.sourceforge.net
Doh! I just noticed I forgot to include the total number of unique titles in our repository: ~7,600.

And thanks to Keith Gilbertson for the replies. I've dug through catalina.out for the past few days, and the only errors we're seeing are typical broken-pipe warnings from users moving on before a thread finishes. Hmm... maybe it's safe to scale back the nightly Tomcat reboot to a weekly one?

--Hardy


Hilton Gibson

unread,
Aug 25, 2015, 2:15:26 PM8/25/15
to dspac...@lists.sourceforge.net
Hi All

What are your URLs or website addresses?
-- 
Hilton Gibson
Systems Administrator
JS Gericke Library
Room 1053
Stellenbosch University
Private Bag X5036
Stellenbosch
7599 
South Africa

Tel: +27 21 808 4100 | Cell: +27 84 646 4758

"Simplicity is the ultimate sophistication"
	Leonardo da Vinci

Tim Donohue

unread,
Aug 25, 2015, 2:15:27 PM8/25/15
to Hilton Gibson, dspac...@lists.sourceforge.net
Hilton and all,

Quick FYI -- we actually have a list of all known DSpace installs
worldwide on the DSpace website (under the "Who's Using DSpace" link).

http://www.dspace.org/index.php?option=com_formdashboard&Itemid=151&lang=en

So, if you are ever curious to see someone's install, it's highly likely
you can find a link there. This list is also searchable, and you can
filter by database type, DSpace version, OS, etc.

- Tim

Pottinger, Hardy J.

unread,
Aug 25, 2015, 2:15:27 PM8/25/15
to dspac...@lists.sourceforge.net
Hey, as long as we're on the subject of config show and tell, here is what we're using for our AJP settings in Tomcat's server.xml:

<Connector port="8009"
enableLookups="false" redirectPort="8080" protocol="AJP/1.3" address="127.0.0.1" tomcatAuthentication="false"
connectionTimeout="20000" disableUploadTimeout="true" URIEncoding="UTF-8"/>

And here's a snippet from our Apache config, to show our proxy settings (the SetEnv directives were borrowed from this thread: http://confluence.atlassian.com/display/DOC/Using+Apache+with+mod_proxy and may or may not be helping):

<Location "/xmlui">
ProxyPass ajp://127.0.0.1:8009/xmlui
ProxyPassReverse ajp://127.0.0.1:8009/xmlui
SetEnv force-proxy-request-1.0 1
SetEnv proxy-nokeepalive 1
</Location>

<Location "/jspui">
ProxyPass ajp://127.0.0.1:8009/jspui
ProxyPassReverse ajp://127.0.0.1:8009/jspui
</Location>

<Location "/oai">
ProxyPass ajp://127.0.0.1:8009/oai
ProxyPassReverse ajp://127.0.0.1:8009/oai
SetEnv force-proxy-request-1.0 1
SetEnv proxy-nokeepalive 1
</Location>

I have a feeling I could do more in this area, to improve stability. I'm very wary of just restarting DSpace servlets (by touching the context fragment for the application to force a reload) because in the past I've noticed that doing so seems to break the proxy connection between Tomcat and Apache. Users see "503: service unavailable" errors. We instead stop, pause, and start everything (Tomcat, Apache, Handle), any time we need to load a new version of a DSpace servlet. Sometimes, erm, we have to do it twice. :-\
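
To be concrete, by "touching the context fragment" I mean something like this (the descriptor path is from our install and may well differ on yours):

touch $CATALINA_HOME/conf/Catalina/localhost/xmlui.xml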

Anyone have any tricks for that? I'd love to be able to just reload a servlet, instead of starting and stopping the whole jalopy. :-)

--Hardy

Hilton Gibson

unread,
Aug 25, 2015, 2:15:30 PM8/25/15
to dspac...@lists.sourceforge.net
  SetEnv force-proxy-request-1.0 1
  SetEnv proxy-nokeepalive 1
</Location>

I have a feeling I could do more in this area, to improve stability. I'm very wary of just restarting DSpace servlets (by touching the context fragment for the application to force a reload) because in the past I've noticed that doing so seems to break the proxy connection between Tomcat and Apache. Users see "503: service unavailable" errors. We instead stop, pause, and start everything (Tomcat, Apache, Handle), any time we need to load a new version of a DSpace servlet. Sometimes, erm, we have to do it twice. :-\

Anyone have any tricks for that? I'd love to be able to just reload a servlet, instead of starting and stopping the whole jalopy. :-)

--Hardy 

-- 
Hilton Gibson
Systems Administrator
JS Gericke Library
Room 1053
Stellenbosch University
Private Bag X5036
Stellenbosch
7599 
South Africa

Tel: +27 21 808 4100 | Cell: +27 84 646 4758

"Simplicity is the ultimate sophistication"
	Leonardo da Vinci

Pottinger, Hardy J.

unread,
Aug 25, 2015, 2:15:30 PM8/25/15
to dspac...@lists.sourceforge.net
Doh! Again. The URL to our repository is: https://mospace.umsystem.edu/xmlui/.

--Hardy

Graham Triggs

unread,
Aug 25, 2015, 2:15:47 PM8/25/15
to Tom De Mulder, dspac...@lists.sourceforge.net
On 29 September 2010 14:17, Tom De Mulder <td...@cam.ac.uk> wrote:
I know you like to talk down the problem, but that really isn't helping.

This isn't about talking down the problem - it's about finding where the real problems are and not just patching the immediate concerns. And considering the interests of nearly 1000 DSpace instances that are registered on dspace.org - many of whom will probably be more worried about rampant resource usage for small repositories from adding overhead to cover up the problems of larger repositories.
 
We run 5 DSpace instances, three of these are systems with hundreds of thousands of items, and it's dog slow and immensely resource-intensive. And yes, we want these to be single systems. Why shouldn't we?

Surely the more pertinent question is why wouldn't you want to be able to run a multi-node solution? I'm sure I don't need to tell you that no matter how good a job you do of making the system perform better with larger datasets, there will always be a finite limit to how large the repository can be, how many users you can service, and how quickly it will process requests for any given hardware allocation.

Yes, DSpace can do a better job than it currently does, but it's just postponing the inevitable. How much in technology relies on just making things bigger/faster? Even our single system hardware is generally made of multiple identical components - CPUs with multiple cores, memory consisting of multiple 'sticks', each consisting of multiple storage chips, storage combining multiple hard drives each having multiple platters.

And many of our dependencies are going the same way - Oracle database clusters, Solr is designed to get scalability from running over multiple shards, even Postgres has taken a major step towards clustering / replication with its 9.0 release.

Either way, you will always hit a hard limit with keeping things on a single system - so at some point, something has to give, whether it's separating out DSpace application, Solr and Postgres instances to separate machines, or accepting this reality in the repository and building it to scale across multiple nodes itself. This in turn would bring benefits to how easily you can scale (in theory, a lot easier to scale at the repository level than scaling each of its individual components), as well as potentially better preservation and federation capabilities.

G

Simon Brown

unread,
Aug 25, 2015, 2:16:05 PM8/25/15
to dspace-tech@lists.sourceforge.net Tech

On 4 Oct 2010, at 15:00, Graham Triggs wrote:

> On 29 September 2010 14:17, Tom De Mulder <td...@cam.ac.uk> wrote:
> I know you like to talk down the problem, but that really isn't
> helping.
>
> This isn't about talking down the problem - it's about finding where
> the real problems are and not just patching the immediate concerns.
> And considering the interests of nearly 1000 DSpace instances that
> are registered on dspace.org - many of whom will probably be more
> worried about rampant resource usage for small repositories from
> adding overhead to cover up the problems of larger repositories.

Which nobody has requested, making this a massive red herring. I fail
to see how cutting back on unnecessary and redundant database access
constitutes "overhead to cover up the problems of larger
repositories". Any repository, regardless of size, will see
improvements with this kind of optimisation, at least one example of
which I have already highlighted (and had my arguments shouted down -
this is also, incidentally, why I haven't bothered to open any other
JIRA tickets on other performance issues we've seen. What would be the
point?)

> We run 5 DSpace instances, three of these are systems with hundreds
> of thousands of items, and it's dog slow and immensely resource-
> intensive. And yes, we want these to be single systems. Why
> shouldn't we?
>
> Surely the more pertinent question is why wouldn't you want to be
> able to run a multi-node solution? I'm sure I don't need to tell you
> that no matter how good a job you do of making the system perform
> better with larger datasets, there will always be a finite limit to
> how large the repository can be, how many users you can service, and
> how quickly it will process requests for any given hardware
> allocation.

The pertinent question for me is why, whenever the issue of
performance comes up, is one of these "theoretical future of
repositories" screeds pulled out and slammed down in front of the
conversation? People are reporting problems with the systems they have
*right now*. Or rather, they were. And yes, it is true that there is a
finite limit to what the hardware is capable of, but the quality of
the software plays a significant role in how quickly that limit is
reached. But we've had this conversation before. I don't really expect
it to end any better this time than it did then.

> Yes, DSpace can do a better job than it currently does, but it's
> just postponing the inevitable. How much in technology relies on
> just making things bigger/faster? Even our single system hardware is
> generally made of multiple identical components - CPUs with multiple
> cores, memory consisting of multiple 'sticks', each consisting of
> multiple storage chips, storage combining multiple hard drives each
> having multiple platters.

Any method of increasing the processing capabilities of a system,
either through more powerful hardware or improvements in the software,
is "postponing the inevitable" for any repository with continued
growth. The difference is in how much cost there is to any individual
repository in each of those methods. Our system, with the changes
we've made to it, struggles at around 300,000 items. People are
reporting problems (presumably running stock 1.6.2) at around 50,000,
from what I can gather. That means that the optimum size for a single
repository running unmodified 1.6.2 is less than 50,000 items, or more
than six separate DSpace instances for the number of items we hold.
That's at least a sixfold increase in hardware and operational costs.
Even in a situation where higher education funding had not just been
significantly cut, that amount of money would be rather difficult to
come by. In a situation where people are able to point to
significantly better performance from other systems on similar
hardware, it would become substantially more difficult.

> And many of our dependencies are going the same way - Oracle
> database clusters, Solr is designed to get scalability from running
> over multiple shards, even Postgres has taken a major step towards
> clustering / replication with its 9.0 release.
>
> Either way, you will always hit a hard limit with keeping things on
> a single system - so at some point, something has to give, whether
> it's separating out DSpace application, Solr and Postgres instances
> to separate machines, or accepting this reality in the repository
> and building it to scale across multiple nodes itself. This in turn
> would bring benefits to how easily you can scale (in theory, a lot
> easier to scale at the repository level than scaling each of its
> individual components), as well as potentially better preservation
> and federation capabilities.

Leaving aside any theoretical ideal futures for the moment, it seems
to me that the gist of this conversation is "DSpace does not support
single-instance repositories over a certain size". That being the
case, I think it would be only fair to make that lack of support
explicit in the documentation and PR materials for the software, in
order that all of the relevant information is readily available for
anyone making decisions about the future of their repository.

With regard to building the repository to scale across multiple nodes,
I think it's an excellent idea. But until it appears on a road map for
the software, an idea is all it is.

--
Simon Brown <st...@cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 34714 - New Museums Site, Pembroke Street, Cambridge CB2 3QH



Tim Donohue

unread,
Aug 25, 2015, 2:16:09 PM8/25/15
to Simon Brown, dspace-tech@lists.sourceforge.net Tech
Hi Simon & All,

On 10/5/2010 10:33 AM, Simon Brown wrote:
>
> On 4 Oct 2010, at 15:00, Graham Triggs wrote:
>
>> On 29 September 2010 14:17, Tom De Mulder<td...@cam.ac.uk> wrote:
>> I know you like to talk down the problem, but that really isn't
>> helping.
>>
>> This isn't about talking down the problem - it's about finding where
>> the real problems are and not just patching the immediate concerns.
>> And considering the interests of nearly 1000 DSpace instances that
>> are registered on dspace.org - many of whom will probably be more
>> worried about rampant resource usage for small repositories from
>> adding overhead to cover up the problems of larger repositories.
>
> Which nobody has requested, making this a massive red herring. I fail
> to see how cutting back on unnecessary and redundant database access
> constitutes "overhead to cover up the problems of larger
> repositories". Any repository, regardless of size, will see
> improvements with this kind of optimisation, at least one example of
> which I have already highlighted (and had my arguments shouted down -
> this is also, incidentally, why I haven't bothered to open any other
> JIRA tickets on other performance issues we've seen. What would be the
> point?)

It's really unfortunate that you've experienced this and/or felt this
way in the past. Perhaps we haven't been able to tease out the problems
at hand as well as we could have, and I hope we can improve upon that now.

However, I'd highly recommend freely adding specific issues to our JIRA
-- it will *guarantee* that the DSpace committers will review & discuss
them (each week, we set aside time in our weekly meeting to do so -- see
https://wiki.duraspace.org/display/DSPACE/Developer+Meetings ). When
adding JIRA issues, specifics are best, that way we can narrow down
where the problem may reside.

The longer these specific issues remain outside of JIRA, the more likely
they will be accidentally overlooked in future versions of DSpace (as
JIRA is our primary means of scheduling things to be fixed in new
versions). We really do mean well, and we'd like to work with you to
resolve these issues. We're not trying to continually throw up "red
herrings" to avoid problems -- it's really a matter of attempting to
better understand where the specific issue resides.

As volunteer developers, each of the DSpace Committers all only have a
limited amount of time to work on DSpace in a given week. Therefore, the
more information you can provide us with, the better. If you know of
specific areas where there are redundant database accesses, we'd
appreciate it if you could point them out to us (or enter a JIRA issue
and we'll fix it). We want to resolve these issues, but sometimes we
don't have enough time in our normal work week to dig in deep enough to
locate them. We highly encourage sites who have stumbled across
problems in the code to report them -- that way we can look at that
specific area of the code and fix it so that it is no longer an issue.

> Leaving aside any theoretical ideal futures for the moment, it seems
> to me that the gist of this conversation is "DSpace does not support
> single-instance repositories over a certain size". That being the
> case, I think it would be only fair to make that lack of support
> explicit in the documentation and PR materials for the software, in
> order that all of the relevant information is readily available for
> anyone making decisions about the future of their repository.

I'd say we want to support single-instance repositories of larger sizes
as well. There will always be a size limit where it makes more sense to
scale across multiple nodes, but we should be working to increase that
size limit as much as we can (within reason, obviously). Although it
isn't yet explicit in our RoadMap, I think we also want to work towards
allowing DSpace to scale across multiple nodes (where it makes sense to).

Again, the best way for us to improve your immediate DSpace performance
is to better understand the exact problems you've already noticed. We
can only fix issues that we know about, and sometimes discovering where
the issue resides can be the hardest part. If you've already discovered
very specific issue(s), we'd appreciate it if you can share them. If
you haven't yet discovered the exact issue(s), we may be able to help
narrow down the problem if you can share which parts of your DSpace seem
'especially sluggish', etc.

The end result is that we really should be working together on a
resolution for the present, rather than continually arguing over ideal
futures or past discussions. Open source development works best if we
can all share information/ideas/issues/resolutions freely and openly.
Yes, that also means sometimes arguing openly -- which is perfectly OK
by me, as sometimes arguments bring us all to a better solution or route
forward. But, I do want to encourage us all to keep things constructive,
so that we can move DSpace software forward to the benefit of us all.

It's also worth mentioning Graham is already volunteering some of his
time to start digging in deeper to try and discover where some memory
issues may already reside in DSpace 1.6, no matter what size a
repository is. Just today, he's started a separate, technical thread
that may be of interest:
http://www.mail-archive.com/dspac...@lists.sourceforge.net/msg12161.html


Hopefully, as this investigation moves forward, we can all work together
to find ways to improve DSpace performance both in the short term and
longer term.

- Tim





Graham Triggs

unread,
Aug 25, 2015, 2:16:28 PM8/25/15
to Simon Brown, dspace-tech@lists.sourceforge.net Tech
On 5 October 2010 16:33, Simon Brown <st...@cam.ac.uk> wrote:
Which nobody has requested, making this a massive red herring. I fail
to see how cutting back on unnecessary and redundant database access
constitutes "overhead to cover up the problems of larger
repositories".

One person's "unnecessary and redundant database access" is another's very necessary database access - well, at least it can be.

I remember the patch for reducing the updating of browse / search indexes, and I can see why it would be useful to not do those updates during a batch import if you have an appropriate workflow.

That won't be the case for all of the repositories - quite a few will welcome the ability to see those items as and when they are added. There is also the issue of how long it takes to do the one very big update at the end of the batch run vs. incremental changes as you go - it may be less work overall, but having one big change can be more disruptive in some cases.
 
Any repository, regardless of size, will see
improvements with this kind of optimisation, at least one example of
which I have already highlighted (and had my arguments shouted down -
this is also, incidentally, why I haven't bothered to open any other
JIRA tickets on other performance issues we've seen. What would be the
point?)

No, you didn't get shouted down for raising a performance issue. The argument arose because you assumed that this would clearly be of benefit to "any repository", when you did nothing to address the underlying performance issues (which could have been helped quite dramatically with some small SQL tweaks and some configuration work in Postgres), and instead just bypassed them for one very specific use case.

It doesn't matter how large or small a repository is, if they don't perform batch uploads using the ItemImporter, your change will do *nothing* for them. But an alteration to the underlying SQL, and guidelines for getting the best out of Postgres would benefit everyone - regardless of how large or small the repository is, or the means by which they populate it.
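
To give a flavour of the kind of analysis I mean (the query and index below are illustrative only; the actual statements and the indexes already present in the schema may differ):

-- see what the planner actually does with a hot query
EXPLAIN ANALYZE
SELECT text_value FROM metadatavalue
 WHERE item_id = 12345 AND metadata_field_id = 64;

-- if that reports a sequential scan over a large table, a matching index may help
CREATE INDEX metadatavalue_item_field_idx
    ON metadatavalue (item_id, metadata_field_id);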


The pertinent question for me is why, whenever the issue of
performance comes up, is one of these "theoretical future of
repositories" screeds pulled out and slammed down in front of the
conversation? People are reporting problems with the systems they have
*right now*.

It's not meant to be a barrier to conversation, but a question as to what you want to resolve. Do you want to address the *scalability* of DSpace, or do you just want to avoid an immediate performance bottleneck? If we conflate these, conversations are going to stall, and we're not going to make any progress.
 
Or rather, they were. And yes, it is true that there is a
finite limit to what the hardware is capable of, but the quality of
the software plays a significant role in how quickly that limit is
reached. But we've had this conversation before. I don't really expect
it to end any better this time than it did then.

I completely agree - but a solution that breaks the encapsulation of the components in the system and leaves important indexes in an inconsistent state for an extended period of time is not an automatic win for the majority of the community.

I offered a lot of suggestions as to how that code could be better structured, improvements both to the SQL and to the configuration of Postgres to handle the load more efficiently, and suggestions for further tweaks that would have reduced the amount of updates the code needed to do still further. All of which would have been more beneficial to the community (not just improving batch uploads, but also interactive / singular deposits and edits) - and not only that, it would have improved the performance of your systems further than you had so far achieved.

Any method of increasing the processing capabilities of a system,
either through more powerful hardware or improvements in the software,
is "postponing the inevitable" for any repository with continued
growth. The difference is in how much cost there is to any individual
repository in each of those methods. Our system, with the changes
we've made to it, struggles at around 300,000 items. People are
reporting problems (presumably running stock 1.6.2) at around 50,000,
from what I can gather.

This is where we need to be careful about what we are reporting. Quite a few of the issues around 1.6.x appear to be around rampant memory usage, rather than being a clear function of how many records there are in the database. There are also different issues involved depending on whether we are talking about adding / editing lots of records, or repositories that are simply heavily accessed.

Even so, regardless of what we do to the code to make it efficient, it does not and cannot absolve the system administrator of correctly maintaining both DSpace itself and its dependencies. I wouldn't want to get drawn on where that point is without any evidence, but there is a lot of scope for altering and improving Postgres behaviour by tweaking the memory buffers that it uses - and it's going to be vital for people to do that in order to scale beyond a certain point.
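
For reference, the "usual suspects" here live in postgresql.conf; the values below are purely illustrative and need to be sized against the actual machine and workload:

shared_buffers = 512MB          # the stock default is very conservative for a busy repository
effective_cache_size = 2GB      # tells the planner how much OS file cache is available
work_mem = 16MB                 # per-sort / per-hash memory
checkpoint_segments = 16        # smooths out write bursts (an 8.x-era setting)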

Similarly, tables like metadatavalue are going to get huge quite quickly, and will probably benefit from partitioning at some point. However, for that to be effective, it is likely to depend on local usage, and isn't something that we can just put into the system from the start.
 
That means that the optimum size for a single
repository running unmodified 1.6.2 is less than 50,000 items, or more
than six separate DSpace instances for the number of items we hold.
That's at least a sixfold increase in hardware and operational costs.

I think we would want, and should expect, more than 50,000 items in a single instance - but achieving that will, at some point, depend on correct local administration of the system.

That is also a crude calculation of the implementation cost that misses out a lot of factors - how much time do you spend investigating and fixing performance issues? How much time is/would be spent migrating from one hardware instance to another vs. simply adding another box to the cluster? If you are targeting smaller instances within the cluster, they are each going to be less expensive than the one big box you buy to run it as a single instance.
 
G

Tom De Mulder

unread,
Aug 25, 2015, 2:16:33 PM8/25/15
to dspace-tech@lists.sourceforge.net Tech
On 6 Oct 2010, at 15:15, Graham Triggs wrote:

[snip]

This is exactly the kind of pointless pontification that we got last time.

Any point that is raised is deflected or ignored, and you even manage to contradict yourself between paragraphs. What's it to be, should patches benefit ALL repositories, or is it fine if it's just some? Or the other way round, maybe?


I will be very happy to offer our experiences regarding large-scale DSpace instances with the community, if that can be of any help. But not if it involves having to deal with Graham Triggs.


I really do not have time for this.


--
Tom De Mulder <td...@cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH


Tim Donohue

unread,
Aug 25, 2015, 2:16:33 PM8/25/15
to dspace-tech@lists.sourceforge.net Tech
All,

I would really appreciate it if we could stop the negativity in this
discussion thread. I'm sorry to have to post a message of this sort
publicly, but I feel I'm unfortunately being forced to do so.

Insults and negativity on a public listserv do not help anyone. I also
personally take offense to the insulting of anyone in our DSpace
Committers group, as they are volunteering their own time (sometimes
even outside of their workplace) to make DSpace software better. Open
source software does not build and maintain itself, and our group of
Committers have made it their passion to improve DSpace for the benefit
of us all.

Despite any arguments or differences we all may have, it is in our best
interest to work together to resolve these issues in a friendly and
timely manner. There is a place for arguments and disagreements on these
DSpace mailing lists and I welcome them, provided they are kept
constructive.

I'm in touch with Cambridge around their performance issues off-list,
and hope that we can work towards a solution to these issues for
everyone involved.