DSpace 6.2: collection home page too slow to load if items have lots of bitstreams


Ying Jin

Feb 21, 2018, 5:02:06 PM
to DSpace Technical Support
Dear All,

We are experiencing a performance issue with DSpace 6.2. Some of our collections time out or take several minutes to load. They are not big collections; one of them has only 20 items, but each item contains 100+ bitstreams (one PDF in the ORIGINAL bundle and 100+ JP2 files in a customized MASTER bundle that is hidden from end users). PostgreSQL is left with so many "idle in transaction" processes that the overall site slows down, and Tomcat begins to take most of the CPU time too.

We used to run Tomcat with 16 GB of memory under v5.x, and performance was fine. Now, even after I increased it to 64 GB (half of our server's memory) under 6.2, performance did not improve.

We are using Tomcat 8.0.13, Java 1.8, Red Hat Linux 6.7, and the XMLUI. We have about 80+ communities, 280+ collections, 60,000+ items, and 490,000 bitstreams.

To determine the cause of the slowness, I duplicated the collection twice. In one copy, I removed all MASTER files and left only the PDF. In the other copy, I zipped all the MASTER files into a single zip file and uploaded it along with the PDF. It turns out that neither copy has performance issues, so it seems the number of bitstreams affects the performance.

After turning on debugging, I got the following information. The full log is too big to upload, so I am only including the Hibernate stats here.

======
The collection whose items have 100+ bitstreams each
======

2018-02-07 00:43:17,724 DEBUG org.hibernate.stat.internal.ConcurrentStatisticsImpl @ HHH000117: HQL: null, time: 1ms, rows: 1

2018-02-07 00:43:17,882 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {

    237746 nanoseconds spent acquiring 1 JDBC connections;

    0 nanoseconds spent releasing 0 JDBC connections;

    552698751 nanoseconds spent preparing 48017 JDBC statements;

    12590561333 nanoseconds spent executing 48017 JDBC statements;

    0 nanoseconds spent executing 0 JDBC batches;

    929992 nanoseconds spent performing 52 L2C puts;

    188492 nanoseconds spent performing 10 L2C hits;

    1935100 nanoseconds spent performing 42 L2C misses;

    133422494 nanoseconds spent executing 2 flushes (flushing a total of 43868 entities and 58348 collections);

    562373235433 nanoseconds spent executing 20136 partial-flushes (flushing a total of 235915143 entities and 235915143 collections)

}

2018-02-07 00:43:17,884 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {

    0 nanoseconds spent acquiring 0 JDBC connections;

    0 nanoseconds spent releasing 0 JDBC connections;

    0 nanoseconds spent preparing 0 JDBC statements;

    0 nanoseconds spent executing 0 JDBC statements;

    0 nanoseconds spent executing 0 JDBC batches;

    0 nanoseconds spent performing 0 L2C puts;

    0 nanoseconds spent performing 0 L2C hits;

    0 nanoseconds spent performing 0 L2C misses;

    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);

    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)

}


=====

The collection has PDF only
=====

2018-02-17 20:30:27,534 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {

    341778 nanoseconds spent acquiring 1 JDBC connections;

    0 nanoseconds spent releasing 0 JDBC connections;

    15974134 nanoseconds spent preparing 1189 JDBC statements;

    271327429 nanoseconds spent executing 1189 JDBC statements;

    0 nanoseconds spent executing 0 JDBC batches;

    717412 nanoseconds spent performing 45 L2C puts;

    3906724 nanoseconds spent performing 247 L2C hits;

    2081066 nanoseconds spent performing 855 L2C misses;

    6829028 nanoseconds spent executing 2 flushes (flushing a total of 3916 entities and 5072 collections);

    578786157 nanoseconds spent executing 261 partial-flushes (flushing a total of 345641 entities and 345641 collections)

}

.....


2018-02-17 20:30:27,536 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {

    120941 nanoseconds spent acquiring 1 JDBC connections;

    0 nanoseconds spent releasing 0 JDBC connections;

    19210 nanoseconds spent preparing 1 JDBC statements;

    190340 nanoseconds spent executing 1 JDBC statements;

    0 nanoseconds spent executing 0 JDBC batches;

    0 nanoseconds spent performing 0 L2C puts;

    0 nanoseconds spent performing 0 L2C hits;

    0 nanoseconds spent performing 0 L2C misses;

    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);

    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

=====

The collection has PDF and ZIPPED MASTER
=====

2018-02-17 20:31:25,062 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {

    168784 nanoseconds spent acquiring 1 JDBC connections;

    0 nanoseconds spent releasing 0 JDBC connections;

    12220421 nanoseconds spent preparing 1276 JDBC statements;

    282597546 nanoseconds spent executing 1276 JDBC statements;

    0 nanoseconds spent executing 0 JDBC batches;

    237448 nanoseconds spent performing 32 L2C puts;

    4220734 nanoseconds spent performing 300 L2C hits;

    2013993 nanoseconds spent performing 1008 L2C misses;

    7370699 nanoseconds spent executing 2 flushes (flushing a total of 4078 entities and 5434 collections);

    708646098 nanoseconds spent executing 301 partial-flushes (flushing a total of 419170 entities and 419170 collections)

}

.....


2018-02-17 20:31:25,064 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {

    102177 nanoseconds spent acquiring 1 JDBC connections;

    0 nanoseconds spent releasing 0 JDBC connections;

    15913 nanoseconds spent preparing 1 JDBC statements;

    195207 nanoseconds spent executing 1 JDBC statements;

    0 nanoseconds spent executing 0 JDBC batches;

    0 nanoseconds spent performing 0 L2C puts;

    0 nanoseconds spent performing 0 L2C hits;

    0 nanoseconds spent performing 0 L2C misses;

    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);

    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}


=====

Should we consider this a performance bug and open a JIRA ticket?

Best,
Ying

Kim Shepherd

Feb 21, 2018, 5:22:47 PM
to DSpace Technical Support
Hi Ying,

Just to help properly reproduce your conditions, could you confirm what "load the collection" refers to and which UI you're using? Which specific page(s) trigger the long load times? Is it the collection home page, a search/browse of (or scoped to) the collection, etc.? Administrative pages? Do you also see slow load times on the item pages of the items with many bitstreams?

Cheers

Kim

Ying Jin

Feb 22, 2018, 10:04:30 AM
to DSpace Technical Support
Hi Kim,

We are using the XMLUI, and the trigger is opening the collection home page. I think search will suffer too, but I didn't test it. Opening an individual item with lots of bitstreams (100+) is somewhat slow too, but not as bad (perhaps because the collection page displays 20 recent submissions × 100+ bitstreams/item = 2,000+ bitstreams to load?).

Please let me know if you need any test collections, or the full log, which I can share through Google Drive. Thanks so much for working on it!

Best,
Ying

Bill T

Feb 22, 2018, 11:40:09 AM
to DSpace Technical Support
Ying

As you've noticed, I'm having the same difficulties with DSpace 6.x...  How did you get this log info?  I have show_sql = true in hibernate.cfg, but I'm not seeing the same info...
--Bill

Ying Jin

Feb 22, 2018, 1:50:33 PM
to DSpace Technical Support
Hi Bill,

I put some extra lines in log4j.properties

First, change this setting to DEBUG:

loglevel.other=DEBUG


Then add the following loggers:

log4j.logger.org.hibernate=INFO, hb
log4j.logger.org.hibernate.SQL=DEBUG
log4j.logger.org.hibernate.type=TRACE
log4j.logger.org.hibernate.hql.ast.AST=info
log4j.logger.org.hibernate.tool.hbm2ddl=warn
log4j.logger.org.hibernate.hql=debug
log4j.logger.org.hibernate.cache=info
log4j.logger.org.hibernate.jdbc=debug
log4j.logger.org.hibernate.stat=debug


Also, I added extra lines to hibernate.cfg.xml file:

<property name="hibernate.generate_statistics">true</property>
<property name="org.hibernate.stat">debug</property>
<property name="show_sql">true</property>
<property name="format_sql">true</property>
<property name="use_sql_comments">true</property>


This generates huge log files, so be careful with it. I only use it on our development server, and I turn it off when I am not debugging.

Hope this helps!

Best,
Ying

Bill T

Feb 22, 2018, 2:01:17 PM
to DSpace Technical Support
Thanks! -- I missed a couple of these.  Yes, really verbose output!  No problem, because I have yet to move this to production.
--Bill

Bill T

Feb 22, 2018, 3:33:34 PM
to DSpace Technical Support
Here is my Hibernate log for /community-list (the page takes about 90 seconds to load). We have about 200 communities, 1,200 collections, and 64,000 items. Unlike Ying, we have essentially one original bitstream per item, primarily PDF files. Red Hat 7, Java 8, Tomcat 8.5, PostgreSQL 9.6.2; 32 GB RAM, 8 CPUs. This setup worked well with previous versions up to and including 5.8.

I'm not trying to pile on, but hopefully some additional info will be helpful!


2018-02-22 13:52:05,871 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {
    372441 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    233248 nanoseconds spent preparing 3 JDBC statements;
    482887 nanoseconds spent executing 1 JDBC statements;
    1985778 nanoseconds spent executing 2 JDBC batches;
    3867293 nanoseconds spent performing 3 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    29762475 nanoseconds spent executing 2 flushes (flushing a total of 2 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

2018-02-22 13:52:05,873 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {
    0 nanoseconds spent acquiring 0 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    0 nanoseconds spent preparing 0 JDBC statements;
    0 nanoseconds spent executing 0 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

2018-02-22 13:52:16,450 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {
    440048 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    273468 nanoseconds spent preparing 3 JDBC statements;
    547508 nanoseconds spent executing 1 JDBC statements;
    1702327 nanoseconds spent executing 2 JDBC batches;
    3683682 nanoseconds spent performing 3 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    27298233 nanoseconds spent executing 2 flushes (flushing a total of 2 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

2018-02-22 13:52:16,452 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {
    0 nanoseconds spent acquiring 0 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    0 nanoseconds spent preparing 0 JDBC statements;
    0 nanoseconds spent executing 0 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

2018-02-22 13:52:27,121 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {
    429299 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    236026 nanoseconds spent preparing 3 JDBC statements;
    561023 nanoseconds spent executing 1 JDBC statements;
    1029554 nanoseconds spent executing 2 JDBC batches;
    3446462 nanoseconds spent performing 3 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    23575938 nanoseconds spent executing 2 flushes (flushing a total of 2 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

2018-02-22 13:52:27,123 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {
    0 nanoseconds spent acquiring 0 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    0 nanoseconds spent preparing 0 JDBC statements;
    0 nanoseconds spent executing 0 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

2018-02-22 13:52:37,220 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {
    2570171 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    27121987 nanoseconds spent preparing 2831 JDBC statements;
    641543786 nanoseconds spent executing 2831 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    1108158 nanoseconds spent performing 23 L2C puts;
    892275 nanoseconds spent performing 13 L2C hits;
    249809 nanoseconds spent performing 28 L2C misses;
    58195179 nanoseconds spent executing 2 flushes (flushing a total of 10470 entities and 24994 collections);
    185425636 nanoseconds spent executing 17 partial-flushes (flushing a total of 15132 entities and 15132 collections)
}

2018-02-22 13:52:37,226 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {
    0 nanoseconds spent acquiring 0 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    0 nanoseconds spent preparing 0 JDBC statements;
    0 nanoseconds spent executing 0 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}

2018-02-22 13:54:36,355 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {
    267649 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    45162338 nanoseconds spent preparing 4812 JDBC statements;
    1756479486 nanoseconds spent executing 4812 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    36704624 nanoseconds spent performing 1411 L2C puts;
    50803773 nanoseconds spent performing 5623 L2C hits;
    14742351 nanoseconds spent performing 11243 L2C misses;
    31181919 nanoseconds spent executing 2 flushes (flushing a total of 13504 entities and 25940 collections);
    60069213452 nanoseconds spent executing 7045 partial-flushes (flushing a total of 37986752 entities and 37986752 collections)
}

2018-02-22 13:54:36,358 INFO  org.hibernate.engine.internal.StatisticalLoggingSessionEventListener @ Session Metrics {

Kim Shepherd

Mar 11, 2018, 5:43:42 PM
to Bill T, Ying Jin, DSpace Technical Support
Hi Bill and Ying, just a note to say I haven't managed to reproduce or investigate this problem thoroughly yet -- I keep getting sidetracked by other work! And I'm struggling a bit to reproduce it with test data now that Bill is reporting the same problem with a fairly small/typical number of items and bitstreams, so it's evidently not connected only to the number of bitstreams attached to a single item.

However, I still have this on my radar, and if you discover anything else, please let the list know! Once we know exactly which conditions are needed to reproduce this and can identify exactly where things are going wrong, we'll log a JIRA issue so we can get a fix into a future DSpace version.

Cheers!

Kim

M: k...@shepherd.nz
T: @kimshepherd
P: +6421883635

0CCB D957 0C35 F5C1 497E CDCF FC4B ABA3 2A1A FAEC

--
You received this message because you are subscribed to a topic in the Google Groups "DSpace Technical Support" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/dspace-tech/VIofW7EwEXY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to dspace-tech+unsubscribe@googlegroups.com.
To post to this group, send email to dspac...@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.

Tom Desair

Mar 11, 2018, 6:39:32 PM
to DSpace Technical Support
Thanks all, this is really useful information!

Since in both cases most of the time is spent executing "flushes" or "partial-flushes", I think the problem is similar to https://jira.duraspace.org/browse/DS-3552: in AUTO flush mode, Hibernate keeps track of all objects loaded from the database and tries to "auto-save" them to the database before executing a new query. The more objects Hibernate loads into its session, the more objects it needs to check for modifications and possible interference with the query being executed, and hence the more time it takes.

@Bill and @Ying, would it be possible for any of you to provide an anonymised and cleaned database that you can share with us? Make sure the database does not contain any sensitive information. I'm sure such a database will also be useful for other (demo and testing) purposes and the development of DSpace 7.

@Ying, for the Collection and Community pages, can you try setting the context mode to READ_ONLY in the getValidity(), addPageMeta() and addBody() methods of the classes org.dspace.app.xmlui.aspect.artifactbrowser.CollectionViewer and org.dspace.app.xmlui.aspect.artifactbrowser.CommunityViewer? Here's an example of how to do that: https://github.com/DSpace/DSpace/pull/1694/files#diff-fb7249e19a6a8652860795d8fd59cd9e
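[Editor's note: for readers following along, the pattern in that linked PR boils down to a few lines. The sketch below is abbreviated and hypothetical -- the method signature is simplified and `context` stands for the request's org.dspace.core.Context; see the linked diff for the real code.]

```
// Sketch of the READ_ONLY pattern (abbreviated, not the exact diff).
// Context.Mode.READ_ONLY tells the DSpace session that nothing loaded in
// this request will be modified, so Hibernate can skip dirty-checking the
// (potentially tens of thousands of) entities on every query it runs.
public void addBody(Body body) throws Exception
{
    // Switch to read-only BEFORE loading the collection, its items,
    // and their bitstreams.
    context.setMode(Context.Mode.READ_ONLY);

    // ... existing rendering code, unchanged ...
}
```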

@Bill, unfortunately the class responsible for the /community-list page already uses the READ_ONLY mode (https://github.com/dspace/DSpace/blob/dspace-6_x/dspace-xmlui/src/main/java/org/dspace/app/xmlui/aspect/artifactbrowser/CommunityBrowser.java#L245). But given that your number of (partial-)flushes is still so high, I wonder whether Hibernate is loading that many objects somewhere else as well. As a workaround, however, you could enable caching of the /community-list page via the xmlui.community-list.cache property in dspace.cfg (https://github.com/DSpace/DSpace/blob/dspace-6_x/dspace/config/dspace.cfg#L1917).
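[Editor's note: for reference, the relevant dspace.cfg line looks roughly like the following. This is a sketch -- the shipped 6.x config documents the property as taking a cache-lifetime value, so check the comments around the linked line for the exact format before setting it.]

```
# dspace.cfg: cache the rendered /community-list page as a workaround
# (value format per the comments in the stock dspace.cfg)
xmlui.community-list.cache = 12 hours
```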


Best regards,
Tom
 
Tom Desair
250-B Suite 3A, Lucius Gordon Drive, West Henrietta, NY 14586
Gaston Geenslaan 14, Leuven 3001, Belgium
www.atmire.com


Ying Jin

Apr 2, 2018, 4:31:04 PM
to Tom Desair, Lisa Spiro, tdon...@duraspace.org, DSpace Technical Support
Hi Tom,

Sorry for the late reply. Somehow I missed your post and only just found it via a search.

"would it be possible for any of you to provide an anonymised and cleaned database that you can share with us? Make sure the database does not contain any sensitive information. I'm sure such a database will also be useful for other (demo and testing) purposes and the development of DSpace 7."

==== Yes, I would be happy to share an anonymised and cleaned database with you. I can't include everything we have, but it would include several collections where we are experiencing the performance problems. I am wondering whether it would be easier to send them out in simple archive format so you can ingest them into whatever database you have?


"for the Collection and Community pages can you try setting the context mode to READ_ONLY in the getValidity(), addPageMeta() and addBody() methods of classes org.dspace.app.xmlui.aspect.artifactbrowser.CollectionViewer and org.dspace.app.xmlui.aspect.artifactbrowser.CommunityViewer. Here's an example of how to do that: https://github.com/DSpace/DSpace/pull/1694/files#diff-fb7249e19a6a8652860795d8fd59cd9e"

==== I will give it a try and see whether it indeed makes any difference for our problem. Although I think the ultimate solution might be to not load unnecessary objects at all (in our case, it may not be necessary to load all the bitstreams for a collection home page).

"unfortunately the class responsible for the /community-list page already uses the READ_ONLY mode (https://github.com/dspace/DSpace/blob/dspace-6_x/dspace-xmlui/src/main/java/org/dspace/app/xmlui/aspect/artifactbrowser/CommunityBrowser.java#L245). But the fact that your number of (partial-)flushes is still so high, I wonder if there is still another place hibernate is loading so many objects. However as a workaround, you could enable caching of that /community-list page by setting the xmlui.community-list.cache property in dspace.cfg to true (https://github.com/DSpace/DSpace/blob/dspace-6_x/dspace/config/dspace.cfg#L1917)."

==== Is there any configuration for caching the collection page? I thought caching was enabled by default, but judging from the page load times, it doesn't seem like the pages are cached...

Best,
Ying

Hugo Carlos

Apr 3, 2018, 10:02:37 AM
to DSpace Technical Support
Hi,

I think we had similar issues with the XMLUI when displaying some items we imported from Scopus, which have a very large group of authors (2,000 people in a CERN working group, for example, as you can see from the attached image). We modified the item-view.xml to show only 3 of the authors (as seen in the second image of the same record), and that apparently solved the performance issue when displaying those items. Maybe limiting the bitstreams the XMLUI looks at when building the display could solve your issue?





Ying Jin

Apr 3, 2018, 3:15:40 PM
to DSpace Technical Support


Hi Hugo,

Thanks for the advice. In our case, though, the problem is not with the item page but with the collection page.

We updated the collection page to show 3 recent submissions instead of 20, and zipped most of our masters so that the number of bitstreams per item decreased. It helped a lot, but it's a temporary solution. It would be great if DSpace performed well with the setup we used to have.

Best,
Ying


Terry Brady

Apr 4, 2018, 11:59:18 AM
to Ying Jin, DSpace Technical Support
This sounds like a significant issue that should be tracked for the next DSpace release. I have created a JIRA ticket to track it and referenced this thread in the ticket.

Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology
425-298-5498 (Seattle, WA)

Tim Donohue

Apr 10, 2018, 10:20:21 AM
to Terry Brady, Ying Jin, DSpace Technical Support
Hi Ying (and all),

We now have a possible solution to this Collection homepage slowness in DSpace 6.2.  It does require minor Java code changes to the XMLUI (in a single class).  But, if you have time/willingness, it might be worth testing on your end prior to the new DSpace release (version 6.3) -- that way we can ensure it completely solves the problem on your site.

Here's the Pull Request (PR) including the necessary code changes: https://github.com/DSpace/DSpace/pull/2016
And here's a downloadable "diff" of the changes included in that PR: https://github.com/DSpace/DSpace/pull/2016.diff 

Let us know if you are able to test it out. If so, please comment either in this email thread or on the ticket and let us know if it solves the issue: https://jira.duraspace.org/browse/DS-3883  

Thanks,
Tim



--
Tim Donohue
Technical Lead for DSpace & DSpaceDirect
DuraSpace.org | DSpace.org | DSpaceDirect.org

Tim Donohue

Apr 11, 2018, 10:13:10 AM
to Ying Jin, DSpace Technical Support
Hi Ying (I'm copying back in DSpace-Tech, as I'd like this discussion to stay on-list),

Looking back at this thread, I see you mentioned the custom MASTER bundle earlier as well. I had forgotten that.

The core issue (which I resolved in https://github.com/DSpace/DSpace/pull/2016) is actually not directly related to Hibernate. The issue we discovered is that the XMLUI themes/aspects load and parse bundles that *are unnecessary to display the Collection homepage*.  So, in PR#2016, I resolved this by ensuring that, on the Collection/Community homepages, the XMLUI themes/aspects never load the ORIGINAL bundle content (and only load the first bitstream in the THUMBNAIL bundle).

Since this didn't improve your performance, I'd venture to guess that the XMLUI theme you are using is either purposefully or accidentally loading up your custom MASTER bundle (and looping through all the objects in that bundle).  Assuming you don't need any information about the MASTER bitstreams on the Collection homepage, the fix would be to ensure your theme *only* loads/uses the THUMBNAIL bundle (as your Collection homepages will likely want to display thumbnails for items that have them).

This is usually controlled at the *theme* level in the XMLUI.  

For example, our demo.dspace.org site uses the Mirage 2 theme.  The Mirage 2 theme specifically only loads the THUMBNAIL bundle for items on the Collection homepage.  You can see this by viewing the HTML source of a Collection's homepage, e.g. http://demo.dspace.org/xmlui/handle/10673/2

In the HTML source, where each Item is displayed (in the Recent Submissions section) you'll see an HTML comment that says:
<!-- External Metadata URL: cocoon://metadata/handle/10673/80/mets.xml?sections=dmdSec,fileSec&fileGrpTypes=THUMBNAIL--> 

This comment shows where the metadata was gathered for this Item.  In fact, you can paste that URL into your browser to see the Item's metadata (in METS format): 
http://demo.dspace.org/xmlui/metadata/handle/10673/80/mets.xml?sections=dmdSec,fileSec&fileGrpTypes=THUMBNAIL

The key point here is that the URL includes the parameter "fileGrpTypes=THUMBNAIL".  That parameter restricts the theme to ONLY using the THUMBNAIL bundle of Items when creating the Recent Submissions list on a Collection/Community homepage.

My guess here is that your local theme may NOT include this key optimization.  If instead, the URL were to just point at the "mets.xml", then by default your theme would load ALL bitstreams in ALL bundles for ALL items displayed in that Recent Submissions listing. As you probably can guess, that would take a VERY LONG time if you have hundreds of bitstreams...and that time is wasted unless your theme actually needs to display something from those bitstreams. The reason this behavior occurs in the XMLUI is that the XMLUI itself doesn't know which "data" the theme requires. If the theme just asks for the "mets.xml", then the XMLUI will assume the theme needs *all information about the item* (including all bitstreams/bundles).  So, it's up to the theme to properly tell the XMLUI which data to load.



If your theme is based on the original dri2xhtml.xsl base theme (which is less likely), you'd find that setting here: https://github.com/DSpace/DSpace/blob/dspace-6_x/dspace-xmlui/src/main/webapp/themes/dri2xhtml/structural.xsl#L3600 

Notice how in all of these themes, the theme always makes sure to append "fileGrpTypes=THUMBNAIL" when in the "summaryList" mode (this mode is the one used for Recent Submissions listings, search results, etc.).  Obviously, if a specific theme requires more than just THUMBNAIL bitstreams in the Recent Submissions listing, you can append other Bundle names to the "fileGrpTypes".  But, beware that doing so will (by default) tell the XMLUI to loop through all bitstreams in that bundle to ensure they are ALL included in the DRI XML output so the Theme can select which one(s) to display.
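For example (using a hypothetical bundle name, and assuming multiple bundle names are comma-separated the same way the "sections" parameter is), a theme that also needs files from a custom PREVIEW bundle in its listings might build the URL suffix as:

```xml
<!-- Hypothetical: load a custom PREVIEW bundle in addition to thumbnails -->
<xsl:text>?sections=dmdSec,fileSec&amp;fileGrpTypes=THUMBNAIL,PREVIEW</xsl:text>
```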

I'd recommend double-checking your theme(s) to ensure they all include this optimization. My guess is that one or more of them likely does not, which may be why bitstreams in your MASTER bundle end up getting loaded, and end up affecting the performance of your Collection/Community homepages.

I hope that helps!

Tim


On Tue, Apr 10, 2018 at 8:02 PM Ying Jin <elsa...@gmail.com> wrote:
Hi Tim,


I have the code applied and tested on our development site. It's a great improvement to the XMLUI code. Unfortunately, most of our collections still load slowly, even though we only show 3 recent submissions on the collection home page.
 
We have our master files preserved with our items; that's where most of our bitstreams come from. The masters are hidden from end users and have no thumbnail or txt derivatives. Since the code change limits the number of thumbnails loaded, it doesn't quite apply to our situation.


There are some other situations too: we have one collection with a gallery theme for which we have to load all 300+ thumbnails, and another with a TEI theme, where each item has roughly 790 original bitstreams plus masters, and we need to load all the thumbnails for the TEI generation as well.

Hope this helps,

Best,
Ying


On Tue, Apr 10, 2018 at 9:20 AM, Tim Donohue <tdon...@duraspace.org> wrote:
Hi Ying (and all),

We now have a possible solution to this Collection homepage slowness in DSpace 6.2.  It does require minor Java code changes to the XMLUI (in a single class).  But, if you have time/willingness, it might be worth testing on your end prior to the new DSpace release (version 6.3) -- that way we can ensure it completely solves the problem on your site.

Here's the Pull Request (PR) including the necessary code changes: https://github.com/DSpace/DSpace/pull/2016
And here's a downloadable "diff" of the changes included in that PR: https://github.com/DSpace/DSpace/pull/2016.diff 

Let us know if you are able to test it out. If so, please comment either in this email thread or on the ticket and let us know if it solves the issue: https://jira.duraspace.org/browse/DS-3883  

Thanks,
Tim


On Wed, Apr 4, 2018 at 10:59 AM Terry Brady <Terry...@georgetown.edu> wrote:
This sounds like a significant issue that should be tracked for the next DSpace release.  I created a Jira ticket to track it.  I referenced this thread in the ticket.

On Tue, Apr 3, 2018 at 12:15 PM, Ying Jin <elsa...@gmail.com> wrote:


Hi Hugo,

Thanks for the advice. Somehow, we don't have a problem with item pages, only collection pages.

We updated the collection page to show 3 recent submissions instead of 20, and zipped most of our masters so that the number of bitstreams per item decreased. It helped a lot, but it's a temporary solution. It would be great if DSpace performed well with the setup we used to have.

Best,
Ying
On Tue, Apr 3, 2018 at 8:15 AM, Hugo Carlos <hugo....@ufabc.edu.br> wrote:
Hi,

I think we had similar issues with the XMLUI when displaying some items we imported from Scopus that have a very large author list (2,000 people in the CERN workgroup, for example, as you can see in the attached image). We modified item-view.xml to display only 3 of the authors (as seen in the second image, showing the same record), and it apparently solved the performance issue when displaying those items. Maybe limiting the bitstreams that are examined when the XML for the display is built could solve your issue?

[attachment: image.png]

[attachment: image.png]


--

Ying Jin

Apr 11, 2018, 11:06:45 AM
to DSpace Technical Support
Hi Tim,

Thanks very much for the detailed information and great suggestion! It makes sense to not load original bundles or any other bundles that are not related to the page view. 

Now I remembered I customized the "sections=dmdSec,fileSec&fileGrpTypes=THUMBNAIL" part long time ago for some customizations as we need more info from mets.xml than what it is provided by default. I need to go and check what I have changed and report back!

Now I remember I customized the "sections=dmdSec,fileSec&fileGrpTypes=THUMBNAIL" part a long time ago, as we needed more info from mets.xml than is provided by default. I need to go and check what I changed and report back!

For the thumbnail part, it is still a problem for us. As I mentioned earlier, we have special themes (the Gallery theme and TEI theme) that need to load all the thumbnails.

I posted a code update on the ticket about READ_ONLY mode. For read-only transactions, we should set the Hibernate flush mode to MANUAL. However, DSpace missed several places where the default AUTO should be changed to MANUAL; that's why we got lots of partial flushing in the log. I will post more findings on the ticket.
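For reference, the shape of the change I'm describing is roughly the following (a sketch only, not the actual patch I posted -- in the real DSpace code the session is managed inside its database-connection classes, so names and placement will differ):

```java
// Sketch only, not the actual DSpace patch: when the DSpace Context is
// opened in READ_ONLY mode, the Hibernate session's flush mode should be
// forced to MANUAL so Hibernate never auto-flushes before queries.
Session session = sessionFactory.getCurrentSession();
if (context.getMode() == Context.Mode.READ_ONLY) {
    session.setFlushMode(FlushMode.MANUAL); // flush only on explicit request
}
```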

Best,
Ying

Tim Donohue

Apr 11, 2018, 11:19:10 AM
to Ying Jin, DSpace Technical Support
Hi Ying,

Looking forward to hearing what you find out.

In the meantime, I'd like to better understand this statement:


"For the thumbnail part, it is still a problem for us as I mentioned earlier, we have special theme needs to load all the thumbnails for the Gallery theme and TEI theme."

Could you describe how you are using/displaying thumbnails in the Gallery theme and TEI theme? Are you displaying multiple thumbnails *per item* on the Community/Collection homepages (e.g. in Recent Submissions) in these themes?  If so, is there a limit to the number of thumbnails you display per Item (or a limit to the number of thumbnails that tend to exist per item)? Or is each item still represented by a single thumbnail on the Community/Collection homepage?  Do you have links you could share with us regarding these themes, so that we can better understand the use case here?  

Obviously, we don't want to include changes in DSpace 6.3 that would improve performance but also break specific theme use cases.

Thanks

- Tim

Ying Jin

Apr 12, 2018, 2:02:10 AM
to DSpace Technical Support
Hi Tim,

I have the fileSec updated so it only retrieves THUMBNAILs, and it significantly improved the page loading. Though some of our static thumbnails are broken, I am sure I can find another way to fix that.

Cheers,
Ying

Bill Tantzen

Sep 28, 2018, 11:11:36 AM
to elsa...@gmail.com, DSpace Technical Support
I'm very tardy jumping back into this party, but here's an update: I just upgraded my dev site to version 6.3, and the speed problems persist.  I have made sure to optimize the mets.xml requests in my theme, but that's actually a moot point, because the slowness is equally apparent even in the Default Reference Theme and the Atmire Mirage2 Theme (which I assume has been optimized in v6.3, correct?).

For /community-list with ~1500 collections + communities and 62K items, the page load time is minutes.  For our main community page, which contains the bulk of those sub-communities and collections (maybe 90%), the page load time is about 15 seconds.  That's still too slow to put into production, but makes me wonder why the difference in speed is so vast compared with the number of children in each case.

Also, in one respect, the performance is even worse now.  If you dig up an earlier thread that I started, you will remember that the speed is just fine when logged in as an admin.  This is no longer the case, as there is no difference in speed between admin and anonymous users.

Naturally, this is causing us to sit on v5.x for now, but since the REST half of v7 is based on v6, we need to figure this out!

Cheers,
Bill

Tim Donohue

Sep 28, 2018, 11:47:06 AM
to Bill Tantzen, elsa...@gmail.com, DSpace Technical Support
Hi Bill,

The issue you are seeing doesn't sound like the same one originally reported in this thread.  The behavior fixed in 6.3 was one where the *Collection homepage* loaded extremely slowly if you had Items with large numbers of bitstreams/thumbnails.  The fix was to be much smarter about when to load thumbnails/bitstreams into memory (and it was a fix to the Java code underlying the XMLUI, so it applied to all themes): https://github.com/DSpace/DSpace/pull/2016

The behavior you seem to be describing sounds different... you are talking about the /community-list page with a large number of collections/communities, and a Community page with a large number of sub-communities/collections.  So, I'm not surprised that this specific 6.3 optimization had no effect -- as it was only for Collection homepages.

I cannot say for certain what is behind the behavior you are seeing -- it's very possible that it's semi-related, in that DSpace may be loading too many objects into memory at once (which slows down the application, especially with the addition of Hibernate in 6.x).  But in your situation, it sounds like DSpace may be loading too many Communities/Collections at once (whereas the fixed bug was about ensuring DSpace didn't load too many bitstreams into memory at once).

All in all, I think we may need to create a new ticket & begin a new investigation into why this other performance behavior is occurring.  Hopefully it's something we can track down and fix in a 6.4 release.

- Tim

--
All messages to this mailing list should adhere to the DuraSpace Code of Conduct: https://duraspace.org/about/policies/code-of-conduct/
---
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech...@googlegroups.com.
To post to this group, send email to dspac...@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.
--

kar...@gmail.com

Oct 31, 2018, 4:53:55 PM
to DSpace Technical Support
Hi all,

As Ying suggests in his second post here, this seems to affect search (i.e. /discover) pages as well. The fix mentioned by Tim (https://github.com/DSpace/DSpace/pull/2016) doesn't seem to affect search pages.

I've encountered this on both 5.9 and 6.3. I created another post (https://groups.google.com/forum/#!topic/dspace-tech/PbKbDfQlhok) before I saw this one.

Jacob


On Wednesday, February 21, 2018 at 4:02:06 PM UTC-6, Ying Jin wrote:

Bill Tantzen

Nov 1, 2018, 10:40:11 AM
to kar...@gmail.com, dspac...@googlegroups.com
In my case, I see no performance problem on discovery pages. It is
only on pages with a large number of records, for instance
/community-list, or on community pages with a large number of
sub-communities and collections.

For me, it is most apparent for /community-list, which takes a couple
minutes to display.
-- Bill

kar...@gmail.com

Nov 1, 2018, 10:45:22 AM
to DSpace Technical Support
Hi Bill,

Thanks for your note. A quick follow up question:

On the discovery pages you tested, do most of the search result items have just 1 to 3 bitstreams (e.g., an ORIGINAL file, a PREVIEW, and a THUMBNAIL), or do they have many?

I think my issue is that many of my search result items have dozens, sometimes hundreds, of bitstreams associated with them, and the discovery page is loading metadata individually for each of these bitstreams (which for a search results page with 20 items could easily be many thousands).

Thanks,

Jacob