performance tuning for Islandora

Hardy Pottinger

Mar 23, 2012, 6:32:21 PM
to isla...@googlegroups.com
Hi, we're kicking the tires of an Islandora installation, based on the vbox image (I think I may have mentioned that earlier, but I don't think it would hurt to mention it again here). Several people who have used the system have commented on the slow performance, and I've done what I can to improve things. We've thrown more CPU and memory at the VM, tweaked the JAVA_OPTS to get more memory to Tomcat, and installed APC (Alternative PHP Cache) for PHP. Things have improved a bit, but are still slow. The most noticeably slow thing is the speed at which thumbnails render. Is there anything I can do to either pre-generate thumbnails or otherwise speed up their display? Is there a single resource that someone can point me to that lists things I can try to improve the performance of a new installation? Thanks!

Chuck Schoppet

Mar 24, 2012, 8:28:21 AM
to isla...@googlegroups.com


I am also interested in hearing how to tweak Islandora for larger repositories (200,000-plus items). Our Mulgara triplestore has begun running out of Java memory during more complex queries.
Thanks 

Richard Wincewicz

Mar 24, 2012, 9:45:42 AM
to isla...@googlegroups.com
Hi Hardy

Performance tuning is always a tricky area as it depends a lot on the
hardware you have available and your use cases. I don't think we have
any specific documentation on how to improve performance of an
islandora install but I can pass on some of the experiences we've had
at the Robertson Library in setting up and maintaining our servers.

Firstly, the VM we distribute is only meant to give users a taste of
what islandora can do and so is designed to work on as many systems as
possible. This means that performance was sacrificed to allow people
to run the VM on low powered systems. If you are planning to
thoroughly test islandora and put it into production then installing
from scratch would be a good way to go. This will give you complete
control over how things are set up, a deeper understanding of how the
stack works and a better idea of where performance improvements can be
made.

A Java heap size of 4-8 GB is usually plenty for a production Tomcat,
and setting both the upper and lower limits (-Xmx and -Xms) to the same
value means that all of that memory is allocated at the start.
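Here is a minimal sketch of that setting. It assumes Tomcat is started through its own catalina.sh, which sources bin/setenv.sh if the file exists. On the VM that file would presumably be /usr/local/fedora/tomcat/bin/setenv.sh. HEAP_SIZE is a hypothetical variable, and the 4g default assumes a 64-bit OS and JVM:

```shell
#!/bin/sh
# Sketch of Tomcat's bin/setenv.sh (sourced by catalina.sh when present).
# HEAP_SIZE is a hypothetical variable; 4g is the low end of the 4-8 GB
# range and assumes a 64-bit OS and JVM.
HEAP_SIZE="${HEAP_SIZE:-4g}"

# Same value for the initial (-Xms) and maximum (-Xmx) heap, so the JVM
# starts at full size instead of growing the heap step by step under load.
JAVA_OPTS="$JAVA_OPTS -Xms$HEAP_SIZE -Xmx$HEAP_SIZE"
export JAVA_OPTS
```

If Tomcat is launched from an init script that builds its own JAVA_OPTS, the flags may need to go in that script instead.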

APC certainly helps with the drupal side of things but keep an eye on
the amount of cache being used. We use munin
(http://munin-monitoring.org/) to monitor the fragmentation and memory
usage, as well as various other aspects of the server (load, memory
usage, apache stats, etc).

Another gain can be made by splitting the different components of the
stack out. We often have apache, mysql and tomcat/fedora running on
separate machines which spreads the load and allows each individual
server to be set up for a specific task. We have also noticed an
improvement by putting tomcat/fedora on bare metal rather than a VM.
The bottleneck for fedora is often I/O and performance is improved by
going to a real server from a virtual one.

One way around the problem with slow loading thumbnails is to store
them outside of fedora. This raises the issue of whether you want the
object to be completely self-contained or are happy to have some of
the content stored elsewhere. For thumbnails like the folder icon used
for collections an external reference datastream can be used to point
at an image on the webserver. As this will be the same file for each
occurrence it will mostly be served out of cache rather than loading it
each time. For thumbnails that are created upon ingest from the
initial file it is more difficult.

Richard

Pottinger, Hardy J.

Mar 24, 2012, 3:33:35 PM
to isla...@googlegroups.com
Richard, thank you. I understand the design concerns that went into
development of the vbox image. We are working towards a much more serious
test, using our own Fedora and Drupal stacks, but in the meantime, we
need to convince some people that the direction we're going in is an
appropriate one. Running a demo from the vbox image, set up on our campus
virtual infrastructure, seemed like a fast way to do that. Unfortunately,
the demo we've ended up with, while adequate for experimentation, is not
quite convincing. It's almost there. If there is anything I can do to
quickly increase the performance of thumbnail rendering, I'd appreciate
it. I've been reading a bit about Djatoka cache settings, but I have not
yet figured out how to implement Djatoka caching. If you have a writeup on
how to implement serving certain thumbnails outside of fedora, I'd love to
read it. Thanks!
--
HARDY POTTINGER <potti...@umsystem.edu>
University of Missouri Library Systems
http://lso.umsystem.edu/~pottingerhj/
https://MOspace.umsystem.edu/
"I am always doing that which I cannot do, in order that I may learn how
to do it." --Pablo Picasso

Richard Wincewicz

Mar 26, 2012, 7:17:12 AM
to isla...@googlegroups.com
Hi Hardy

I don't have a writeup but it's not too difficult to get it set up for
a test. In the Fedora admin client
(http://<ip-address>:8080/fedora/admin) you can create datastreams
manually. To replace the thumbnails with externally referenced images
you would need to delete the existing TN datastreams and create new
ones with a control group of either 'External reference (E)' or
'Redirect (R)'. You add the URL to your resource in the Location field
and then Fedora will serve the thumbnail from the external location
rather than from Fedora. External reference still uses Fedora to
process the request whereas Redirect will redirect the browser to the
external URL. In order to do this automatically on ingest you would
have to change the code for the particular solution pack to save the
generated thumbnail somewhere outside of Fedora. When creating the
Fedora object you would have to check that the thumbnail had been made
correctly and associate the external file with the TN datastream.
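The same delete-and-recreate steps can also be scripted against the Fedora 3.x REST API: purgeDatastream first, then addDatastream with controlGroup and dsLocation. This is a sketch that has not been tested against the VM. The Fedora URL, the fedoraAdmin credentials, the PID, the image URL and the helper names run and redirect_tn are all placeholders. When DRY_RUN is set, the script prints the curl commands instead of sending them:

```shell
#!/bin/sh
# Sketch: swap an object's TN datastream for a Redirect (R) datastream via
# the Fedora 3.x REST API. All names and URLs below are placeholders.
FEDORA_URL="${FEDORA_URL:-http://localhost:8080/fedora}"
FEDORA_USER="${FEDORA_USER:-fedoraAdmin}"
FEDORA_PASS="${FEDORA_PASS:-changeme}"

# With DRY_RUN set, print each command instead of running it.
run() {
  if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# redirect_tn PID IMAGE_URL (hypothetical helper)
redirect_tn() {
  pid="$1"; image_url="$2"
  # 1. Delete the existing TN datastream (purgeDatastream).
  run curl -s -u "$FEDORA_USER:$FEDORA_PASS" -X DELETE \
    "$FEDORA_URL/objects/$pid/datastreams/TN"
  # 2. Re-create TN with control group R pointing at the external image
  #    (addDatastream). -G moves the --data-urlencode pairs into the query
  #    string; -X POST keeps the method addDatastream expects. Use
  #    controlGroup=E instead to have Fedora fetch and serve the image itself.
  run curl -s -u "$FEDORA_USER:$FEDORA_PASS" -X POST -G \
    --data-urlencode "controlGroup=R" \
    --data-urlencode "dsLocation=$image_url" \
    --data-urlencode "mimeType=image/jpeg" \
    --data-urlencode "dsLabel=Thumbnail" \
    "$FEDORA_URL/objects/$pid/datastreams/TN"
}
```

For example, DRY_RUN=1 redirect_tn islandora:123 http://www.example.org/thumbs/123.jpg shows the two requests. Drop DRY_RUN to actually send them.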

Another thing that occurred to me after I sent my previous email was
that, while Virtual Box works well for a free program, there are
commercial offerings that perform better. If you have access to
something like VMWare Fusion then you will probably see a performance
increase over Virtual Box.

Richard

Gervais de Montbrun

Mar 26, 2012, 9:36:19 AM
to isla...@googlegroups.com
Hi Hardy,

Everything Richard says about the VM we distribute is true... One more thing to note is that, in an effort to make it as portable as possible, it is running a 32-bit OS/hardware. Trying to make it perform well by increasing the RAM may not be helpful. The OS can't see more than 4GB anyway.
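Before adding RAM or raising the Java heap, you can confirm the guest's word size. getconf LONG_BIT reports whether the userland is 32- or 64-bit, and uname -m reports the kernel architecture (e.g. i686 or x86_64):

```shell
#!/bin/sh
# Report the guest's word size before adding RAM or raising -Xmx.
bits=$(getconf LONG_BIT)   # userland word size: 32 or 64
arch=$(uname -m)           # kernel architecture, e.g. i686 or x86_64
echo "userland: ${bits}-bit, kernel: $arch"
if [ "$bits" -eq 32 ]; then
  # A 32-bit process, Tomcat's JVM included, cannot address more than 4 GB.
  echo "32-bit: extra RAM beyond 4 GB will not give Tomcat a bigger heap"
fi
```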

Cheers,
Gervais

Phil Redmon

Mar 26, 2012, 9:49:46 AM
to isla...@googlegroups.com
I have a feeling part of this speed issue is Drupal. The thumbs seem
to load a little faster (6 secs) as opposed to about 12 secs when I'm
logged out. Comments on Drupal.org say this has something to do with
the authentication system.

--
phil

Mark Leggott

Mar 26, 2012, 12:50:20 PM
to isla...@googlegroups.com
This reminds me of an issue we had with some version of the VM and it was related to the network settings in Drupal, or something along those lines? I know Paul Pound had experienced this and had a solution. Thumbnails should load in milliseconds, not 12 seconds! There is definitely something wrong with this setup.

Mark

Pottinger, Hardy J.

Mar 26, 2012, 1:39:40 PM
to isla...@googlegroups.com
Hi, just a quick point of clarification, while we're using the disk image
from the vbox demo, we are using our campus virtual infrastructure, which
is based on VMWare Fusion. So, we've just used the vbox image as a quick
starting place, so we can hit the ground running. It *is* good to be
reminded that this is a 32 bit OS, so we don't need to try throwing more
than 4GB at the VM, though.

Richard Wincewicz

Mar 26, 2012, 5:53:50 PM
to isla...@googlegroups.com
Getting objects from Fedora while logged in will take longer because
not only is there authentication through Drupal but the credentials
then have to get passed to Fedora before it will deliver the content.
This difference in response time should be minimal and I agree with
Mark, 6-12 seconds is far too long to be waiting for thumbnails to
load even on the Virtual Box image.

Richard

Phil Redmon

Mar 27, 2012, 9:41:02 AM
to isla...@googlegroups.com
We're still looking into this issue. The large image tiles take a
while to render as well, maybe 8 secs for the full size image. I'm
thinking maybe this has something to do with the image toolkit?

--
phil

Mark Leggott

Mar 27, 2012, 9:57:20 AM
to isla...@googlegroups.com
Hi Phil,

A possible solution from one of our tech team:

"If the VM is slow, it may be that Fedora is doing a DNS lookup and there is a 5 second timeout for this, so if Fedora can't resolve the DNS, things slow way down. One way around this is to configure Islandora with an IP address and/or add DNS entries to the VM image's hosts file. By adding the DNS entry to the hosts file, the DNS lookup will always resolve quickly."

Let me know if that helps.
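The hosts-file fix can be sketched as follows. ensure_hosts_entry is a hypothetical helper. It appends an "IP hostname" line only when the name is not already listed, and it defaults to /etc/hosts, which needs root to edit:

```shell
#!/bin/sh
# Sketch: make a hostname resolve locally so lookups never wait on DNS.
# ensure_hosts_entry IP NAME [HOSTS_FILE]  (hypothetical helper)
ensure_hosts_entry() {
  ip="$1"; name="$2"; hosts="${3:-/etc/hosts}"
  # Is NAME already a hostname field (field 2 onwards) on a non-comment line?
  if awk -v n="$name" '
      /^[[:space:]]*#/ { next }
      { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }' "$hosts"; then
    return 0
  fi
  printf '%s\t%s\n' "$ip" "$name" >> "$hosts"
}
```

On the VM you would run it as root with the VM's own address and name, e.g. ensure_hosts_entry 192.0.2.10 "$(hostname)". Here 192.0.2.10 stands in for the real IP.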

Mark

Phil Redmon

Mar 27, 2012, 10:08:32 AM
to isla...@googlegroups.com
Yep, we've added the IP/DNS entry to the hosts file. It did speed it
up from the initial 30 seconds or so of page loading.

We've also turned on all the caching in Drupal. The page loading
seems to be loading fine, as all the admin pages are snappy. It's
only when there are calls to the repo that it starts to bog down.
Even when we browse to the ~fedora/repository page, it takes around
4-6 secs.

Pottinger, Hardy J.

Mar 28, 2012, 3:35:33 PM
to isla...@googlegroups.com
Hi, we appear to have resolved the performance issues. I used PsiProbe [1]
to diagnose the issue (not enough memory, and no memory tuning getting
through to the fedora application).
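Alongside PsiProbe, a quick command-line check can show whether the memory flags actually reached the JVM. heap_flags is a hypothetical helper that picks the -Xms, -Xmx and PermSize flags out of a Java command line. On Linux, ps -o args= -C java (procps) supplies the live command line:

```shell
#!/bin/sh
# Sketch: list heap-related flags on a JVM command line (hypothetical helper).
# Matches -Xms, -Xmx, -XX:PermSize= and -XX:MaxPermSize=.
heap_flags() {
  printf '%s\n' "$1" | tr ' ' '\n' | grep -E '^-(Xm[sx]|XX:(Max)?PermSize=)'
}

# On a live system, feed it the running java command line:
#   heap_flags "$(ps -o args= -C java)"
```

If it prints nothing for the Tomcat process, the JAVA_OPTS you set are most likely not making it onto the command line.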


Two groups of changes have made a big impact:

1) I set the JAVA_OPTS directly in the fedora start script
(/etc/init.d/fedora), instead of relying on that script to pick up the
environment variables from the fedora user's bash_profile. Here are the
values I am using (borrowed from our production DSpace instance's start
script):

# recommended settings for a production DSpace environment
JAVA_OPTS="-Xmx1024M -Xms768M"
JAVA_OPTS="$JAVA_OPTS -XX:MaxPermSize=128M"
JAVA_OPTS="$JAVA_OPTS -XX:PermSize=32M"

# tweak: use the parallel garbage collector
JAVA_OPTS="$JAVA_OPTS -XX:+UseParallelGC"

# make UTF-8 the JVM's default charset (Tomcat's URL decoding is set
# separately, via URIEncoding on the Connector)
JAVA_OPTS="$JAVA_OPTS -Dfile.encoding=UTF-8"

# turn on the JMX remote management interface so PsiProbe can work
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote"

export JAVA_OPTS

2) I changed the caching settings for Djatoka, in this file:

/usr/local/fedora/tomcat/webapps/adore-djatoka/WEB-INF/classes/djatoka.properties

I am using these values:

OpenURLJP2KService.cacheSize=200000
OpenURLJP2KService.cacheImageMaxPixels=600000

Increasing the cacheSize helped with the tiling image viewer's
performance, as did increasing the cacheImageMaxPixels (up from 10000) to
accommodate caching thumbnails (which clock in at 200 x 200 pixels = 40000
pixels).
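Here is a sketch for applying those two values and sanity-checking the pixel arithmetic. It assumes GNU sed (for -i) and plain key=value lines in djatoka.properties. set_prop and fits_cache are hypothetical helpers. Treating cacheImageMaxPixels as an inclusive width x height limit is an assumption based on the explanation above:

```shell
#!/bin/sh
# set_prop FILE KEY VALUE: rewrite KEY's line if present, otherwise append it.
set_prop() {
  file="$1"; key="$2"; value="$3"
  esc_key=$(printf '%s' "$key" | sed 's/\./\\./g')  # make dots literal in regexes
  if grep -q "^$esc_key=" "$file"; then
    sed -i "s|^$esc_key=.*|$key=$value|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# fits_cache WIDTH HEIGHT MAX_PIXELS: succeed if WIDTH*HEIGHT <= MAX_PIXELS.
fits_cache() {
  [ $(( $1 * $2 )) -le "$3" ]
}

# A 200 x 200 thumbnail (40000 pixels) against the old and new limits.
for max in 10000 600000; do
  if fits_cache 200 200 "$max"; then verdict="cacheable"; else verdict="too big"; fi
  echo "200x200 vs cacheImageMaxPixels=$max: $verdict"
done
```

For example: set_prop /usr/local/fedora/tomcat/webapps/adore-djatoka/WEB-INF/classes/djatoka.properties OpenURLJP2KService.cacheSize 200000. Djatoka most likely needs a Tomcat restart to pick up the new values.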

Sharing this info to the list in case it helps anyone else.

[1] http://code.google.com/p/psi-probe/

--

"I think I like disruptive technology because
it makes the whole world a bit fuzzy, my
normal state of mind."
-- Robert Llewellyn (aka Kryten)

Phil Redmon

Mar 28, 2012, 4:30:23 PM
to isla...@googlegroups.com
In addition, we installed the devel module from drupal to see if it
wasn't in fact something in the drupal install that was bogging the
load times down. Devel showed that the pages were loading quickly but
the total page load completion time was much larger, leading us to
conclude that the problem wasn't within Drupal but somewhere else.

This in addition to the other tweaks mentioned above make the site
almost snappy. Thanks for the help!

Priscilla Caplan

Mar 28, 2012, 4:37:52 PM
to isla...@googlegroups.com
Thank you both for sharing this information.

Priscilla Caplan
FCLA

Richard Wincewicz

Mar 28, 2012, 4:42:09 PM
to isla...@googlegroups.com
If you want to delve deeper into the code and see what is causing
issues you can use something like Webgrind
(https://github.com/jokkedk/webgrind), which uses xdebug to profile
PHP code. This may be overkill in this case but is useful to pinpoint
whether the issue is a database call, fedora or something else.

Richard

Cameron Kerr

Oct 24, 2014, 4:58:49 AM
to isla...@googlegroups.com
(For the benefit of the long tail...)

Note: I'm not using any virtual appliance, but I am in the process of tuning a similar install for a client.

I found it most useful to deploy a caching layer in front of the web server. I used Nginx as a reverse proxy to Apache (you could also use Varnish or similar), and had Nginx selectively cache on disk just the thumbnail images (which are public data). This made a huge improvement to site usability (especially searches, where thumbnails were also shown). I also had Nginx set some sane Cache-Control headers for the thumbnails, hoping to stave off even freshness queries. You could pass more such things off to the cache... but I was putting this in at a later stage, and wanted minimal change.

Here's my Nginx config (notice that I moved Apache to ports 81 and 444)

proxy_cache_path /var/cache/nginx/proxy_cache keys_zone=one:10m;
proxy_temp_path  /var/cache/nginx/proxy_temp;

server {
    listen       80;
    server_name  example.com;
    proxy_cache_key $scheme$host$uri$is_args$args;

    location / {
        proxy_cache off;
        proxy_set_header Host       $host;
        proxy_pass_request_headers on;
        proxy_pass_request_body on;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_pass   http://127.0.0.1:81$request_uri;
    }

    location ~ ^/islandora/object/.*/datastream/TN/view$ {
        proxy_cache  one;
        proxy_cache_valid 200 10m;
        proxy_cache_valid 404 1m;
        add_header Cache-Control "public, max-age=36000, s-maxage=36000";
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        proxy_set_header Host       $host;
        proxy_pass_request_headers on;
        proxy_pass_request_body on;
        proxy_pass   http://127.0.0.1:81$request_uri;
    }
}

... similarly for server 443 ssl
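To see which requests the TN regex location would send to the cache, here is a small check. is_cached_tn is a hypothetical helper. It drops the query string first, because nginx matches locations against the URI path, and grep -E treats this particular pattern the same way as nginx's PCRE:

```shell
#!/bin/sh
# Sketch: which request paths does the TN location regex match?
TN_RE='^/islandora/object/.*/datastream/TN/view$'

is_cached_tn() {
  path="${1%%\?*}"   # strip any query string before matching, as nginx does
  printf '%s\n' "$path" | grep -Eq "$TN_RE"
}

for p in /islandora/object/islandora:1/datastream/TN/view \
         /islandora/object/islandora:1/datastream/OBJ/view; do
  if is_cached_tn "$p"; then echo "cached:   $p"; else echo "uncached: $p"; fi
done
```

To confirm cache hits on the live server, one option is to temporarily add add_header X-Cache-Status $upstream_cache_status; to the TN location, then look for HIT on a repeated request.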


Next I'll be looking at tuning OpenURLJP2KService.cacheSize etc. for Djatoka.
