GC memory leak with java.lang.ref.Finalizer / BaseAnnouncementService


Austin

Feb 7, 2018, 3:15:14 PM
to sakai-dev
Hello Sakai Devs,

Yesterday we experienced an issue where GC was looping over and over, causing performance issues on one of our servers.

We're running:

Sakai 11.4
Tomcat 8.0.47
Java 1.8.0_151
MySQL 5.6
RHEL 7

with these Java opts (not all of them, just the memory-related ones):

-Xmx10280m 
-Xms10280m 
-XX:MaxMetaspaceSize=1024m 
-Xmn2500m 
-XX:+UseConcMarkSweepGC 
-XX:+UseParNewGC 
-XX:CMSInitiatingOccupancyFraction=80 
-XX:+DisableExplicitGC 
-XX:+DoEscapeAnalysis 
-verbose:gc 
-XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps 

The GC log entries that keep looping over and over are:

47863.032: [CMS-concurrent-preclean: 5.037/5.037 secs] [Times: user=5.35 sys=0.04, real=5.04 secs] 
47863.032: [CMS-concurrent-abortable-preclean-start]
47863.032: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
47863.033: [GC (CMS Final Remark) [YG occupancy: 1542954 K (2304000 K)]47863.033: [Rescan (parallel) , 0.3890201 secs]47863.423: [weak refs processing, 0.0000592 secs]47863.423: [class unloading, 0.2006707 secs]47863.623: [scrub symbol table, 0.0485713 secs]47863.672: [scrub string table, 0.0086044 secs][1 CMS-remark: 7966719K(7966720K)] 9509674K(10270720K), 0.6472521 secs] [Times: user=2.06 sys=0.02, real=0.65 secs] 
47863.682: [CMS-concurrent-sweep-start]
47865.226: [CMS-concurrent-sweep: 1.544/1.544 secs] [Times: user=1.58 sys=0.01, real=1.54 secs] 
47865.226: [CMS-concurrent-reset-start]
47865.241: [CMS-concurrent-reset: 0.015/0.015 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
06-Feb-2018 16:40:56.125 DEBUG [ajp-bio-8009-exec-309] edu.gatech.sakai.user.impl.GaTechUserDirectoryServiceImpl.authorizeUserLogin authorizeUserLogin; user.eid; gasconm; user.email: gas...@hawaii.edu
47867.243: [GC (CMS Initial Mark) [1 CMS-initial-mark: 7966695K(7966720K)] 9965851K(10270720K), 0.3508320 secs] [Times: user=0.93 sys=0.00, real=0.35 secs] 
47867.594: [CMS-concurrent-mark-start]
47871.695: [CMS-concurrent-mark: 4.099/4.101 secs] [Times: user=8.48 sys=0.03, real=4.10 secs] 
47871.695: [CMS-concurrent-preclean-start]
47871.739: [CMS-concurrent-preclean: 0.041/0.044 secs] [Times: user=0.05 sys=0.00, real=0.05 secs] 
47871.739: [CMS-concurrent-abortable-preclean-start]
47871.739: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
47871.740: [GC (CMS Final Remark) [YG occupancy: 2083498 K (2304000 K)]47871.741: [Rescan (parallel) , 0.5782041 secs]47872.319: [weak refs processing, 0.0000354 secs]47872.319: [class unloading, 0.1000239 secs]47872.419: [scrub symbol table, 0.0407029 secs]47872.460: [scrub string table, 0.0067250 secs][1 CMS-remark: 7966695K(7966720K)] 10050194K(10270720K), 0.7260444 secs] [Times: user=2.95 sys=0.01, real=0.72 secs] 

While the problem was happening, I took a heap dump with jmap, and upon analyzing it with Eclipse Memory Analyzer, it reported a leak with

java.lang.ref.Finalizer

and it was taking up 61% of the heap.  Digging a little deeper, the first 10 or 20 of those objects (out of 8 million?) were coming from

BaseAnnouncementService

and of those 10 or 20, they all seemed to be coming from the same site, which appeared to have an announcement that included <img> tag(s) with base64 image data.  I'm not sure if the content of the announcement had anything to do with it or if the instructor did something specific and the content is irrelevant.

Has anyone seen anything like this?  Is it a memory leak in the Announcement tool?  Or do we need to tune our Memory settings to improve the GC?

Thanks,

Austin
memory_leak1.png
memory_leak2.png
memory_leak3.png

Austin

Feb 7, 2018, 6:52:03 PM
to sakai-dev
Looking further into the memory dump, it wasn't just the one site that was loading up its announcements.  It seemed like announcements from random sites were included, even very old announcements.  Is there some process, user action, quartz job, etc. that would load up tons of announcements from different sites?  Or, over the course of the day, could announcements get loaded into memory but not removed, until some breaking point causes the GC issue?

Sam Ottenhoff

Feb 7, 2018, 7:09:12 PM
to Austin, sakai-dev
Hi Austin,

Eclipse MAT is showing you the retained heap locations at your snapshot time, but it's not accurate to say there is a leak in java.lang.ref.Finalizer. There is a lot of good info on Finalizer on the web. Here is one good resource:


Here are two important paragraphs from that doc:

Any instances of classes that implement the finalize() method are often called finalizable objects. They will not be immediately reclaimed by the Java garbage collector when they are no longer referenced. Instead, the Java garbage collector appends the objects to a special queue for the finalization process. Usually it's performed by a special thread called a "Reference Handler" on some Java Virtual Machines. During this finalization process, the "Finalizer" thread will execute each finalize() method of the objects. Only after successful completion of the finalize() method will an object be handed over for Java garbage collection to get its space reclaimed by "future" garbage collection. I did not say "current," which means at least two garbage collection cycles are required to reclaim the finalizable object. Sounds like it has some overhead? You got it. We need several shots to get the space recycled.

Finalizer threads are not given maximum priorities on systems. If a "Finalizer" thread cannot keep up with the rate at which higher priority threads cause finalizable objects to be queued, the finalizer queue will keep growing and cause the Java heap to fill up. Eventually the Java heap will get exhausted and a java.lang.OutOfMemoryError will be thrown.
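If you want to check whether that backlog is actually building on your servers, the JDK exposes an approximate count of objects waiting for finalization through JMX. A minimal sketch (the class name and threshold are just placeholders; you'd want to log this periodically rather than run it as a one-off main):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class FinalizerBacklogCheck {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        // Approximate number of objects whose finalize() has not been run yet
        int pending = memory.getObjectPendingFinalizationCount();
        System.out.println("Objects pending finalization: " + pending);
        // A number that keeps climbing suggests the Finalizer thread is falling behind (or stuck)
        if (pending > 100_000) {  // placeholder threshold
            System.out.println("WARNING: finalizer queue looks backed up");
        }
    }
}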




Hendrik Steller

Feb 7, 2018, 7:49:31 PM
to saka...@apereo.org, aust...@hawaii.edu
---------- Forwarded Message ----------
Subject: Re: [sakai-dev] GC memory leak with java.lang.ref.Finalizer / BaseAnnouncementService
Date: Thursday, 8 February 2018, 00:51:06 CET

On Wednesday, 7 February 2018 21:15:09 CET Austin wrote:
> Has anyone seen anything like this? Is it a memory leak in the
> Announcement tool? Or do we need to tune our Memory settings to improve
> the GC?

java.lang.Object has a method called finalize().
If a class overrides this (AFAIK empty by default) method, it makes garbage
collection more tricky. Those objects can't be destroyed immediately when the
JVM thinks that they're ready to be garbage collected, because their
finalize()-method has to be called first.
To that end, they're stuffed into a queue and a rather low-priority
"finalizer"-thread walks by and pops elements off that queue and calls their
finalize-method.

Potential Problem 1:
Objects can be queued faster than the low-priority thread removes them.
Potential Problem 2:
People can do stupid sh*t inside their finalize() method which might delay or
block the finalizer thread; think deadlocks, infinite loops, hanging
connections [1], ..

So there seem to be a couple of basic rules regarding overriding "finalize":
1. Don't do it
2. If you do it anyway, the implementation should return quickly and must not hang, so it's probably a good idea to avoid expensive stuff, stuff that needs locking, or stuff that might have to wait for network connections or other systems (a sketch of the usual alternative follows below).
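For completeness, the usual way around all of this is to not rely on finalize() at all and make the cleanup explicit via AutoCloseable / try-with-resources. Just a generic sketch of the pattern (not Sakai's actual code):

public class ExplicitCleanupSketch implements AutoCloseable {

    private boolean closed = false;

    public void doWork() {
        if (closed) throw new IllegalStateException("already closed");
        // ... use the underlying resource (db connection, lock, socket, ...) here ...
    }

    @Override
    public void close() {
        // Runs deterministically on the caller's thread, not on the Finalizer thread
        if (!closed) {
            closed = true;
            // release the db connection / lock / socket here
        }
    }

    public static void main(String[] args) {
        try (ExplicitCleanupSketch edit = new ExplicitCleanupSketch()) {
            edit.doWork();
        } // close() is guaranteed to run here, even if doWork() throws
    }
}

The caller then releases things itself instead of leaving the work to a low-priority background thread.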

So I did a quick grep over the Sakai 11.4 source for "void finalize" [2]
and found that Sakai seems to do all of that:
several classes override finalize(), and some seem to do things which look rather
dangerous to do in a finalize.
For example, I still had the AssignmentService open in Eclipse and tried
following the calls in the finalize() implementations of one of the inner edit
classes. This took me to
org.sakaiproject.util.BaseDbSingleStorage#cancelResource(Edit)

Without trying to really understand it or thinking it through, there seem to
be at least three things happening which would make me hesitate to call them
from an overridden finalize():
* using synchronization (accessing a hashtable in this case)
* making database calls, likely across a network in many cases
* doing something with locks; I didn't actually look if those edit locks being
released there are just some flags sakai sets in one of its own tables or if
there are "actual" db locks involved (like explicitly doing "lock table"s etc)


In case the problem occurs again, it might be worth a try to deploy a button
which calls System.runFinalization() (or just copy a "secret" JSP with a
scriptlet into an existing webapp or the ROOT context and call that JSP).
Should the JVM start a separate finalizer thread for that call, and
should the problem be that the original finalizer thread is blocked by a
single object (e.g. waiting for a hanging connection) rather than some general
problem, then that second finalizer thread might be able to empty the
queue.

Regards,
Hendrik

[1]
"Fun" possibly relataed fact which I learned the hard way once:
by default, java.net.URLConnection uses an "infinite" read timeout, i.e. if
you don't take special care, your program can end up waiting forever for a
ressource which isn't available anymore.

[2]
If there's more than one match in one class file, it's likely because there
are inner classes overriding finalize() - for example, in case of the
AssignmentService, there seem to be several inner "AssignmentEdit"-classes

$> grep -r "void\s*finalize\s*(" | grep .java | grep -iv test
kernel/kernel-impl/src/main/java/org/sakaiproject/user/impl/BasePreferencesService.java: protected void finalize()
kernel/kernel-impl/src/main/java/org/sakaiproject/user/impl/BaseUserDirectoryService.java: protected void finalize()
kernel/kernel-impl/src/main/java/org/sakaiproject/alias/impl/BaseAliasService.java: protected void finalize()
kernel/kernel-impl/src/main/java/org/sakaiproject/db/impl/BasicSqlService.java: protected void finalize() {
kernel/kernel-impl/src/main/java/org/sakaiproject/event/impl/BaseNotificationService.java: protected void finalize()
kernel/kernel-impl/src/main/java/org/sakaiproject/content/impl/BaseContentService.java: protected void finalize()
kernel/kernel-impl/src/main/java/org/sakaiproject/content/impl/BaseContentService.java: protected void finalize()
kernel/kernel-impl/src/main/java/org/sakaiproject/email/impl/BaseDigestService.java: protected void finalize()
message/message-util/util/src/java/org/sakaiproject/message/util/BaseMessage.java: protected void finalize()
message/message-util/util/src/java/org/sakaiproject/message/util/BaseMessage.java: protected void finalize()
assignment/assignment-impl/impl/src/java/org/sakaiproject/assignment/impl/BaseAssignmentService.java: protected void finalize()
assignment/assignment-impl/impl/src/java/org/sakaiproject/assignment/impl/BaseAssignmentService.java: protected void finalize()
assignment/assignment-impl/impl/src/java/org/sakaiproject/assignment/impl/BaseAssignmentService.java: protected void finalize()
chat/chat-tool/tool/src/java/org/sakaiproject/chat2/tool/PresenceObserverHelper.java: protected void finalize()
providers/jldap/src/java/edu/amc/sakai/user/PooledLDAPConnection.java: protected void finalize() throws LDAPException {
common/archive-api/src/main/java/org/sakaiproject/importer/api/ResetOnCloseInputStream.java: public void finalize() throws IOException{




Austin

Feb 7, 2018, 8:04:19 PM
to Hendrik Steller, sakai-dev
Thanks Sam and Hendrik,

By coincidence (or stroke of bad luck), the same problem happened again today.  Actually, I believe we see these quite often, but usually it occurs at night so our users aren't impacted as much.  After taking a heap dump, Eclipse MAT showed the same issue with Finalizer, and most of the first 20 or so objects listed had to do with Announcements again... and they looked like the exact same announcements as yesterday.  The list of objects referenced by Finalizer is quite long, so I can't say for certain that most of them have to do with announcements, but it sure is suspicious that the first bunch are announcements.

Creating a "secret JSP" that runs System.runFinalization() sounds interesting, but I might need a little help getting started with that.

However, while clearing out the finalizer queue with a secret JSP would be a great workaround, I'd prefer to find the root cause.  I would like to know how our system is getting into that state.  Is a user doing something to suddenly trigger it?  Or is it an accumulation over time?  I suspect it's user triggered because it usually only happens on one server at a time (but not always), it doesn't happen every day, and we restart Tomcat every morning, so anything accumulated would have been cleared by then.

Sam Ottenhoff

Feb 7, 2018, 8:09:22 PM
to Austin, Hendrik Steller, sakai-dev
How many CPUs do you have allocated to this machine? A 10GB heap is large and will need lots of processing to clean up. Why is CMSInitiatingOccupancyFraction set?

Austin

Feb 7, 2018, 8:17:32 PM
to Sam Ottenhoff, Hendrik Steller, sakai-dev
Hello Sam,

Each of our 6 user-facing servers has:

6 vCPUs / 16 GB RAM

We copied the CMSInitiatingOccupancyFraction setting from:  https://confluence.sakaiproject.org/display/PROD/Rutgers+performance+tuning+experience

Ashley Willis

Feb 7, 2018, 8:24:48 PM
to Sakai Development
I think you might be on to something with the Base 64 image stuff. We have been having similar issues for the last 2 weeks. We had a server run out of heap this afternoon. GC seemed to be in a loop with near-constant major collections that were getting longer and longer. It looked to us like a problem with the Blog tool. Looking at the heap, roughly 50% of it appears to be Strings that were copy/pasted, probably from Word (they were full of styles), into the Blog tool, and they contain img tags <img alt="" src="data:image/png;base64,iVBOR.....

The reason this sounds really suspicious to me is that we saw something similar with a tool a year or 2 ago, I think it was Melete. The tool had a method to sanitize Strings by removing tags etc. One professor was using the tool for class photos. I don't remember all the details but it was essentially trying to process the images as Strings and running out of heap. I wonder if something similar is going on here.

We are on
Sakai 11.2
java 1.8.0_144
Tomcat 8.0.32
I don't have the GC settings handy but I can get them if needed.

Ashley

Hendrik Steller

Feb 7, 2018, 9:02:16 PM
to Austin, sakai-dev
On Thursday, 8 February 2018 02:04:16 CET Austin wrote:

> Creating a "secret JSP" that runs System.runFinalization() sounds
> interesting, but I might need a little help getting started with that.

1. Copy the code below into a file with a top secret name of your choice
ending in ".jsp" (e.g. "thats_totally_secure.jsp")

2. Copy that file into the root folder of a webapp of your choice (e.g.
tomcat/webapps/foobar)

Now you (well, and everyone else..) can call the JSP directly by opening the
URL http://server/foobar/thats_totally_secure.jsp
Or, if you copy the file into tomcat/webapps/ROOT, you can call it via
http://server/thats_totally_secure.jsp

(3.)
Delete the file again / move it out of the webapp's folder once you don't need
it anymore.

Hendrik

Code (also attached):


<%@page import="java.util.Date"%>
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<%!
    private static Date lastFinalize;
    private static Date lastGC;
%>
<%
    if ("post".equalsIgnoreCase(request.getMethod())) {
        final String action = request.getParameter("whatdo");
        if ("finalize".equals(action)) {
            lastFinalize = new Date();
            System.runFinalization();
        } else if ("gc".equals(action)) {
            lastGC = new Date();
            System.gc();
        }
    }
%>
<html>
<head><title>Top secret JSP</title></head>
<body>
    <div>
        <form action="" method="POST">
            <span>Last manual finalize run: <%= lastFinalize != null ? lastFinalize : "never" %></span>
            <input type="submit" value="Run finalize">
            <input type="hidden" name="whatdo" value="finalize">
        </form>
    </div>
    <div>
        <form action="" method="POST">
            <span>Last manual garbage collection: <%= lastGC != null ? lastGC : "never" %></span>
            <input type="submit" value="Force GC">
            <input type="hidden" name="whatdo" value="gc">
        </form>
    </div>
</body>
</html>
finalize.jsp

Austin

Feb 7, 2018, 9:15:45 PM
to Hendrik Steller, sakai-dev
Thanks Hendrik!

Austin

Feb 8, 2018, 1:20:14 PM
to Hendrik Steller, sakai-dev
Does anyone know which comes first (or how I can find out) ... 

Does the heap fill up with Finalizer objects first, so that [CMS-concurrent-reset-start] is unable to process them and keeps looping?

Or...

does [CMS-concurrent-reset-start] fail first (for some other reason), making the heap fill up with Finalizer objects?

Matthew Buckett

Feb 8, 2018, 2:05:32 PM
to Austin, Hendrik Steller, sakai-dev
Are you able to look at a thread dump and see the state of the finalizer thread(s)?  It sounds like they may have become stuck, and although the announcement objects are in the queue, are they the ones currently being processed?

As Hendrik mentioned Sakai does lots of scary stuff in finalizers (like making database calls) so this could well be where it's got stuck.

--
  Matthew Buckett, Senior VLE Developer

  Systems and Applications, Technology Enhanced Learning
  IT Services, University of Oxford
  13 Banbury Road, OX2 6NN
  Tel: 01865 283349

Austin

Feb 8, 2018, 2:36:56 PM
to Matthew Buckett, Hendrik Steller, sakai-dev
Hello Matthew,

Sorry, I'm new to MAT.  I'm not sure if I'm looking in the right place...

The Finalizer thread stats show:

Type|Name       |Value
-----------------------------------------------------------
ref |lock       |java.lang.Object @ 0x5e0e9aad0
ref |unfinalized|java.lang.ref.Finalizer @ 0x55ab72458
ref |queue      |java.lang.ref.ReferenceQueue @ 0x5d9c07f90
-----------------------------------------------------------

Also, the thread details say that the thread name is NULL.  I'm not sure if that's expected?

Thanks,

Austin

thread detail.png
thread stats.png

Matthew Jones

Feb 8, 2018, 3:08:53 PM
to Austin, Matthew Buckett, Hendrik Steller, sakai-dev
Can you expand the window there with the class names?  What's the inner class in org.sakaiproject.announcement.impl.BaseAnnouncementService$Base  ???


Austin

Feb 8, 2018, 3:21:43 PM
to Matthew Jones, Matthew Buckett, Hendrik Steller, sakai-dev
The inner classes are:  java.lang.String, BaseAnnouncementMessageHeaderEdit, BaseResourcePropertiesEdit

The 2nd class in Finalizer has something to do with TikaInputStream?  I saw a few instances of that, but the majority of the objects had to do with Announcements.

Thanks for helping us look into this!

- Austin

BaseAnnouncementMessageHeaderEdit.png

Austin

Feb 8, 2018, 3:27:24 PM
to Matthew Jones, Matthew Buckett, Hendrik Steller, sakai-dev
Also, the message_date on that announcement object is quite old: 2014-04-14 11:18:34.  I asked the user if he was doing anything with his site / announcements, and he said he was not.  So it's pretty strange that such an old announcement is getting loaded up and an 'edit' object created for it.

Austin

Feb 9, 2018, 2:28:07 PM
to Matthew Jones, Matthew Buckett, Hendrik Steller, sakai-dev
Another thing I was wondering: do announcements get loaded into memory whenever a user is loaded into memory?  The reason I ask is because we have a site that has 30,000 users, which includes a ton of "old" users (which I've also suspected in the past of causing problems).  And since some of the announcements I inspected were also old... this is kind of a leap, but is there any chance that even though these old users are not actually logging in, they are somehow getting loaded into memory (I suspect that if one person logs in and views that 30K site, every user from that site also gets loaded into memory), and that announcements they would have been seeing are also getting loaded into memory?

Hendrik Steller

Feb 9, 2018, 5:36:13 PM
to sakai-dev

On Friday, 9 February 2018 20:28:04 CET Austin wrote:

> Another thing I was wondering: do announcements get loaded into memory
> whenever a user is loaded into memory?  The reason I ask is because we have
> a site that has 30,000 users, which includes a ton of "old" users (which
> I've also suspected in the past of causing problems).  And since some of the
> announcements I inspected were also old... this is kind of a leap, but is
> there any chance that even though these old users are not actually logging
> in, they are somehow getting loaded into memory (I suspect that if one
> person logs in and views that 30K site, every user from that site also gets
> loaded into memory), and that announcements they would have been seeing are
> also getting loaded into memory?

I don't think so; sounds rather unlikely.

Also, I just had a look by enabling the MySQL general query log for my local development Sakai instance and then:

started the server,
logged in with an admin account,
clicked around a bit in the Announcements tool in my Home site,
went to a course site,
looked at the announcements,
edited one up to "Preview",
canceled the preview,
logged out of Sakai without canceling the edit, and
shut down the server again.

Only a few queries were made which included "%announcement%", and nothing in them looked like what you were suspecting (see queries below); then again, your setup, tools, customizations etc. are going to be different.

I also didn't find what I was suspecting (or not really suspecting, because I hadn't found any evidence of it happening in the code) and was actually looking for:

In your screenshots, those were "edit" objects which were waiting for finalization. That seems like there's more happening than e.g. a user or a search engine's crawler opening and reading an old announcement.

I found ~160 abandoned records which never got cleaned up in my sakai_locks table, with the oldest ones from February 2014.

If I were really interested in where those old announcement edits are originating from, I would go into the announcement service where those edit objects for announcements are being created and then add something like

if (switchToEnableThis && "thatOldAnnouncementsId".equals(announcementId)) {
    dumpRequestAndSessionInfo();
    throwAndCatchSomeDummyExceptionAndLogItsStacktrace();
    notifyMe();
}
Regards,

Hendrik

-------

Query log:
MariaDB [sakai11]> select argument from mysql.general_log where argument like '%announcement%' and argument not like '%mysql.general_log%' and command_type!= 'Prepare';

CREATE TABLE ANNOUNCEMENT_CHANNEL ( CHANNEL_ID VARCHAR (255) NOT NULL, NEXT_ID INT, XML LONGTEXT )
select XML from ANNOUNCEMENT_CHANNEL where ( CHANNEL_ID = '/announcement/channel/!site/motd' )
select XML from ANNOUNCEMENT_MESSAGE where (CHANNEL_ID = '/announcement/channel/!site/motd' and DRAFT = '0') order by MESSAGE_DATE desc
select XML from ANNOUNCEMENT_CHANNEL where ( CHANNEL_ID = '/announcement/channel/~hsteller/main' )
select XML from ANNOUNCEMENT_MESSAGE where (CHANNEL_ID = '/announcement/channel/~hsteller/main' ) order by MESSAGE_DATE asc
select XML from ANNOUNCEMENT_CHANNEL where ( CHANNEL_ID = '/announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main' )
select XML from ANNOUNCEMENT_MESSAGE where (CHANNEL_ID = '/announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main' ) order by MESSAGE_DATE asc
select SAKAI_ALIAS.ALIAS_ID,SAKAI_ALIAS.TARGET,SAKAI_ALIAS.CREATEDBY,SAKAI_ALIAS.MODIFIEDBY,SAKAI_ALIAS.CREATEDON,SAKAI_ALIAS.MODIFIEDON from SAKAI_ALIAS where TARGET = '/announcement/announcement/30f49e34-5224-49aa-9099-9f11fa490654' order by SAKAI_ALIAS.ALIAS_ID
select SAKAI_ALIAS.ALIAS_ID,SAKAI_ALIAS.TARGET,SAKAI_ALIAS.CREATEDBY,SAKAI_ALIAS.MODIFIEDBY,SAKAI_ALIAS.CREATEDON,SAKAI_ALIAS.MODIFIEDON from SAKAI_ALIAS where TARGET = '/announcement/announcement/30f49e34-5224-49aa-9099-9f11fa490654' order by SAKAI_ALIAS.ALIAS_ID
select XML from ANNOUNCEMENT_CHANNEL where ( CHANNEL_ID = '/announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main' )
select XML from ANNOUNCEMENT_CHANNEL where ( CHANNEL_ID = '/announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main' )
select XML from ANNOUNCEMENT_MESSAGE where (CHANNEL_ID = '/announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main' ) order by MESSAGE_DATE asc
select XML from ANNOUNCEMENT_CHANNEL where ( CHANNEL_ID = '/announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main' )
select XML from ANNOUNCEMENT_MESSAGE where (CHANNEL_ID = '/announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main' ) order by MESSAGE_DATE asc
select XML from ANNOUNCEMENT_CHANNEL where ( CHANNEL_ID = '/announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main' )
select XML from ANNOUNCEMENT_MESSAGE where (CHANNEL_ID = '/announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main' ) and ( MESSAGE_ID = 'f419dcea-72ef-4e8b-8f63-3a98b969628a' )
select XML from ANNOUNCEMENT_MESSAGE where (CHANNEL_ID = '/announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main' ) and ( MESSAGE_ID = 'f419dcea-72ef-4e8b-8f63-3a98b969628a' )
insert into SAKAI_LOCKS (TABLE_NAME,RECORD_ID,LOCK_TIME,USAGE_SESSION_ID) values ('ANNOUNCEMENT_MESSAGE', '1275768022 - /announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main/f419dcea-72ef-4e8b-8f63-3a98b969628a', TIMESTAMP'2018-02-09 21:29:56.459000', 'd78979c3-f937-48a9-8edc-bcad916955f6')
select XML from ANNOUNCEMENT_CHANNEL where ( CHANNEL_ID = '/announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main' )
select count(1) from SAKAI_REALM_RL_FN,SAKAI_REALM force index (AK_SAKAI_REALM_ID) where SAKAI_REALM_RL_FN.REALM_KEY = SAKAI_REALM.REALM_KEY and SAKAI_REALM.REALM_ID IN ('/announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main','/site/30f49e34-5224-49aa-9099-9f11fa490654','!site.helper') and FUNCTION_KEY in (select FUNCTION_KEY from SAKAI_REALM_FUNCTION where FUNCTION_NAME = 'annc.read') and (ROLE_KEY in (select ROLE_KEY from SAKAI_REALM_RL_GR where ACTIVE = '1' and USER_ID = NULL and REALM_KEY in (select REALM_KEY from SAKAI_REALM where SAKAI_REALM.REALM_ID IN ('/announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main','/site/30f49e34-5224-49aa-9099-9f11fa490654','!site.helper'))) or ROLE_KEY in (1) )
select XML from ANNOUNCEMENT_CHANNEL where ( CHANNEL_ID = '/announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main' )
select XML from ANNOUNCEMENT_CHANNEL where ( CHANNEL_ID = '/announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main' )
delete from SAKAI_LOCKS where TABLE_NAME = 'ANNOUNCEMENT_MESSAGE' and RECORD_ID = '1275768022 - /announcement/channel/30f49e34-5224-49aa-9099-9f11fa490654/main/f419dcea-72ef-4e8b-8f63-3a98b969628a'

Matthew Jones

Feb 9, 2018, 6:41:31 PM
to Austin, Matthew Buckett, Hendrik Steller, sakai-dev
It's possible this happens whenever a user logs in, via the Announcement Synoptic Widget. By default it only shows the last 10 days, but since the data is stored as XML I could imagine that it actually has to read through the database records and build objects just to display the tool in every case. I wouldn't know without research.

You could try to remove that tool registration from the system. (Removing the xml from webapps/sakai-announcement-tool/WEB-INF/tools/sakai.synoptic.announcement.xml) and restarting should make it not load up anymore.


Austin

Feb 13, 2018, 2:56:35 PM
to Matthew Jones, Matthew Buckett, Hendrik Steller, sakai-dev
Ashley Willis mentioned, "I think you might be on to something with the Base 64 image stuff"

I checked some old heap dumps from last semester (when we were on Sakai 10.7) when we experienced similar [CMS-concurrent...] looping.  And while I did not see the BaseAnnouncementMessageEdit leaks, I was seeing tons of instances of

org.apache.jasper.runtime.BodyContentImpl
org.apache.jasper.runtime.PageContextImpl
org.apache.jasper.runtime.BodyContentImpl[]

and inspecting them, the content seemed to have embedded data, e.g. "...ppt/slides/_rels/slide28.xml.rels...", or other encoded data.  In particular, they had to do with the Forums tool, perhaps large ppt attachments of over 100MB?  Anyway, I didn't see this Forums error in my recent heap dumps from Sakai 11.4... maybe whatever forums issue caused it in 10.7 is now fixed in Sakai 11.4, or perhaps it was fixed after we upgraded to MySQL 5.6.  But in any case, I thought it was a weird coincidence that back then as well as now, we're seeing stuck objects that include embedded data: images, ppt, etc...


Austin

Feb 21, 2018, 8:58:54 PM
to Matthew Jones, Matthew Buckett, Hendrik Steller, sakai-dev
Hello All,

Unfortunately, we've hit this problem 4 or 5 more times in the last week.

We haven't tried removing the Announcement synoptic tool yet because I'd think a lot of users find it useful.  We may revisit removing it later if we can't find some other solution.

Another thing I discovered while analyzing heaps is that in addition to Announcements taking up a lot of space in memory, something to do with BaseCollectionEdit is also taking up a lot of space... millions of objects.  What would cause the content tool / BaseCollectionEdit to load up millions of objects?  We thought maybe the Admin Workspace -> Resources tool, but after removing that tool from the Admin Workspace (in case one of our admins was accidentally opening it), the problem still happened.  Digging a little further, here is a list of classes within the Finalizer queue that have tons of objects:

# objects    class
---------    ------
  245,941    org.sakaiproject.announcement.impl.BaseAnnouncementService$BaseAnnouncementMessageEdit
2,171,252    org.sakaiproject.content.impl.BaseContentService$BaseCollectionEdit
  444,086    org.sakaiproject.content.impl.BaseContentService$BaseResourceEdit
  248,318    java.util.zip.ZipFile$ZipFileInputStream
  246,652    java.util.zip.Inflater
  248,030    java.util.zip.ZipFile$ZipFileInflaterInputStream
  242,115    sun.net.www.protocol.jar.URLJarFile
   39,572    com.sun.crypto.provider.PBEKey
  341,342    java.util.jar.JarFile
  103,226    org.sakaiproject.user.impl.BaseUserDirectoryService$BaseUserEdit 
   92,685    java.io.FileInputStream
    5,780    com.novell.ldap.LDAPConnection
   67,453    org.apache.commons.fileupload.disk.DiskFileItem
            .
            .
            .


So we're still wondering if there's something specific a user is doing to cause all those content and announcement objects to fill up memory, or if it's just some gradual buildup that happens over the course of the day.  I wouldn't think it's gradual because, as I mentioned before, if that were the case I'd think we'd see this problem every day on every server; it seems to happen very randomly.  Also, when I check the dump files' biggest announcement objects, the top few always seem to be the same announcement.  So either someone really is doing something specific, or maybe it's just normal user(s) doing normal things, and since the same users are online every day, it makes sense that their "large" announcements float to the top?

Another thing we noticed was that as the GC loop was happening and the CPU load was high, the OS's system memory dropped down to nearly 0 (223MB of 16G).  See attached screenshot.  And actually, the OS's system memory looks like it started to decline about 4 hours before the GC loop started and the CPU load spiked.  Seeing that, our Systems Admin increased the memory on all servers from 16G to 24G.  That didn't stop the GC issue from happening, but it did seem to keep the system memory from dropping too low.

Anyone have any ideas on how we might further debug this problem?  Should we try to tune our heap settings?  Or switch from CMS to G1 collection?

Thanks,

Austin

GC issue.png

Austin

Feb 27, 2018, 9:47:58 PM
to Matthew Jones, Matthew Buckett, Hendrik Steller, sakai-dev
Hello All,

Sorry to keep bugging you all with this problem.  But here's some additional info from our heap dumps.

After comparing the java.lang.ref.Finalizer object between a bad heap dump and a normal one (i.e. when the system is running normally), the number of objects retained in Finalizer is inflated by the thousands (note these are just a handful of classes, I haven't listed them all).  That makes sense, given what Sam Ottenhoff quoted:

"If a "Finalizer" thread cannot keep up with the rate at which higher priority threads cause finalizable objects to be queued, the finalizer queue will keep growing and cause the Java heap to fill up. Eventually the Java heap will get exhausted and a java.lang.OutOfMemoryError will be thrown"

However, we don't get an OutOfMemoryError when this happens; the GC just keeps looping over and over.  But you can see below how many more objects are in the Finalizer when the problem happens.

+-----------+----------+------------------------------------------------------------------------------------------+
|  BAD GC   |  GOOD GC |                                                                                          |
| #objects  | #objects |  Class                                                                                   |
+-----------+----------+------------------------------------------------------------------------------------------+
|   245,941 |   18,875 |  org.sakaiproject.announcement.impl.BaseAnnouncementService$BaseAnnouncementMessageEdit
| 2,171,252 |    6,289 |  org.sakaiproject.content.impl.BaseContentService$BaseCollectionEdit
|   444,086 |    2,443 |  org.sakaiproject.content.impl.BaseContentService$BaseResourceEdit
|   248,318 |      200 |  java.util.zip.ZipFile$ZipFileInputStream
|   246,652 |    1,715 |  java.util.zip.Inflater
|   248,030 |      200 |  java.util.zip.ZipFile$ZipFileInflaterInputStream
|   242,115 |      207 |  sun.net.www.protocol.jar.URLJarFile
|    39,572 |       18 |  com.sun.crypto.provider.PBEKey
|   341,342 |    3,236 |  java.util.jar.JarFile
|   103,226 |   57,231 |  org.sakaiproject.user.impl.BaseUserDirectoryService$BaseUserEdit
|    92,685 |       49 |  java.io.FileInputStream
|     5,780 |        0 |  com.novell.ldap.LDAPConnection
|    67,453 |       11 |  org.apache.commons.fileupload.disk.DiskFileItem
+-----------+----------+------------------------------------------------------------------------------------------+

The 2 million org.sakaiproject.content.impl.BaseContentService$BaseCollectionEdit objects above seemed suspicious, so looking into that object, there were 400K+ instances each of Content Collection objects that referenced "/" and "/group/".  And normal site references seemed to have thousands more references than usual (see screenshots).  However, I don't know whether this increased number of "/" and "/group/" objects is CAUSING the Finalizer to keep growing, or is a SYMPTOM of it growing.

We also tried adding these to our Java options, but the problem still happened (I didn't get a chance to take a heap dump when the problem happened with these settings):

-XX:CMSMaxAbortablePrecleanTime=10000
-XX:+CMSScavengeBeforeRemark

Thanks,

Austin
BaseCollectionEdit-BAD.png
BaseCollectionEdit-NORMAL.png

Hendrik Steller

Feb 28, 2018, 5:32:55 AM
to saka...@apereo.org, Austin

Hi Austin,

after glancing over

http://grepcode.com/file/repo1.maven.org/maven2/com.novell.ldap/jldap/4.3/com/novell/ldap/LDAPConnection.java

I would take a closer look at those:

On Wednesday, 28 February 2018 03:47:54 CET Austin wrote:

> |     5,780 |        0 |  com.novell.ldap.LDAPConnection

Because:

a) Zero-versus-anything: it doesn't have to mean anything, but being the only one like that, it looks a bit suspicious.

b) It does something over the network, i.e. there's a chance of something hanging.

c) To add to b), looking at the source linked above: unless explicitly told otherwise, that class uses "0" as the default timeout for the sockets it opens.

d) Among other fun things - like potentially making the finalizer thread wait to join() another thread [1] - it also does this in its finalize() method [2]:

while( true ) {
    try {
        extending_vector_ie_heavily_synchronized.remove(0);
    }
    catch( ArrayIndexOutOfBoundsException ex) {
        break;
    }
}

It might be worth adding a debug message at the beginning and end of that class's finalize() method and seeing if/when it doesn't leave.

Or maybe change the default socket timeout from 0 ms to something like 5 minutes and see what happens.
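Something along these lines, just as a sketch (plain java.util.logging here, and the class name is a placeholder for wherever the real finalize() lives):

import java.util.logging.Logger;

public class TimedFinalizeSketch {

    private static final Logger LOG = Logger.getLogger(TimedFinalizeSketch.class.getName());

    @Override
    protected void finalize() throws Throwable {
        long start = System.currentTimeMillis();
        LOG.info("finalize() entered for " + this);
        try {
            // ... the existing cleanup code would go here ...
        } finally {
            // If this line never shows up for some instance, that instance is what's
            // blocking the Finalizer thread.
            LOG.info("finalize() left for " + this + " after " + (System.currentTimeMillis() - start) + " ms");
            super.finalize();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        new TimedFinalizeSketch();   // immediately unreachable
        System.gc();
        System.runFinalization();
        Thread.sleep(500);           // give the Finalizer thread a moment to log
    }
}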

 

Hendrik

[1]
http://grepcode.com/file/repo1.maven.org/maven2/com.novell.ldap/jldap/4.3/com/novell/ldap/Connection.java#984

[2]
http://grepcode.com/file/repo1.maven.org/maven2/com.novell.ldap/jldap/4.3/com/novell/ldap/Connection.java#929

Mark Triggs

Feb 28, 2018, 5:33:54 AM
to Austin, sakai-dev

Hi Austin,

Have you been able to get a thread dump out of Tomcat when it has run out of memory? I think that might be the missing piece here (I read back through the archives and didn't see one, but apologies if I missed it).

When an object with a finalize() method is to be garbage collected, a java.lang.ref.Finalizer object is created for it and added to a queue. That queue is processed by a "Finalizer" background thread, which takes care of running the appropriate finalize() method for each of those objects.

All it takes is for one object to have a badly implemented finalize() method for the whole thing to come crashing down. As a perverse example, creating an object with a method like:

 class UhOh {
     protected void finalize() throws Throwable {
         while (true) {}
     }
 }

is going to permanently lock up that Finalizer thread. At that point you're leaking memory and it's only a matter of time until you get an OOM error (which may take a while to manifest if you're running with a large heap).

The important thing here is that the counts of objects waiting to be finalized don't necessarily tell you the root cause: there may just be millions of ready-to-be-collected objects stuck behind a single bad object, so you've got to find the needle in the haystack. My above example is a bit contrived, but I've seen this happen with single LDAP connections deadlocking, for instance.

If you can get a thread dump from the JVM, that will let you see what the Finalizer thread is actually doing, and hopefully give a hint as to what's blocking the queue. While the JVM is running, you can do:

 kill -3 <java pid>

and it will log a thread dump to your catalina.out file. Look for an entry called "Finalizer"--something like this:

 "Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007faac04d9000 nid=0x347a in Object.wait() [0x00007faaadf15000]
    java.lang.Thread.State: WAITING (on object monitor)
         at java.lang.Object.$$YJP$$wait(Native Method)
         at java.lang.Object.wait(Object.java)
         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
         - locked <0x00000000800466d0> (a java.lang.ref.ReferenceQueue$Lock)
         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

Mine's currently doing nothing, which in a healthy system is probably a good thing. If yours looks different, try running multiple thread dumps a few seconds apart and see how they change. That should give you more of an idea of where that thread is spending its time.
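If pulling kill -3 output out of catalina.out is awkward in production, roughly the same information for just that one thread can be grabbed from inside the JVM with standard APIs. A minimal sketch (it could even be bolted onto Hendrik's secret JSP; the class name is just a placeholder):

import java.util.Map;

public class FinalizerThreadDump {
    public static void main(String[] args) {
        // Walk all live threads and print the stack of the one named "Finalizer"
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            if ("Finalizer".equals(t.getName())) {
                System.out.println("Finalizer thread state: " + t.getState());
                for (StackTraceElement frame : entry.getValue()) {
                    System.out.println("    at " + frame);
                }
            }
        }
    }
}

Taking several samples a few seconds apart, as above, will show whether that stack ever changes.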

Cheers,

Mark

-- 
Mark Triggs
<ma...@dishevelled.net>

Mark Triggs

Feb 28, 2018, 5:38:21 AM
to Hendrik Steller, saka...@apereo.org, Austin

Hi Hendrik,

I see our last messages crossed in the post :) Just to note: NYU had issues with exactly this code, where LDAP connections would deadlock in their finalize method waiting on another thread.

NYU is now running Sam's Unboundid LDAP implementation from here:

https://github.com/sakaiproject/sakai/pull/1924

with the modifications described further down in that PR thread.

Mark

-- 
Mark Triggs
<ma...@dishevelled.net>

Hendrik Steller

Feb 28, 2018, 6:01:27 AM
to saka...@apereo.org
On Wednesday, 28 February 2018 11:38:10 CET Mark Triggs wrote:
> Hi Hendrik,
>
> I see our last messages crossed in the post :)

:)

> Just to note: NYU had
> issues with exactly this code, where LDAP connections would deadlock in
> their finalize method waiting on another thread.

And looking further at that Novell implementation, the thread it potentially
waits for forever via join() without a timeout...
happens to be the one waiting for responses from the LDAP server on that
socket, which also has no timeout by default.
So changing the default timeout, or adding one to the join() call at
http://grepcode.com/file/repo1.maven.org/maven2/com.novell.ldap/jldap/4.3/com/novell/ldap/Connection.java#986
might improve things, too.
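For illustration only (nothing to do with the actual Novell code), the difference is just which join() overload gets used; a tiny self-contained sketch, with the reader thread and the 5-second demo timeout as stand-ins:

import java.util.concurrent.TimeUnit;

public class JoinTimeoutSketch {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a reader thread that is blocked forever on a socket read
        Thread readerThread = new Thread(() -> {
            try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException ignored) { }
        });
        readerThread.setDaemon(true);
        readerThread.start();

        // readerThread.join();                           // no timeout: would block here forever
        readerThread.join(TimeUnit.SECONDS.toMillis(5));  // bounded wait (minutes in production, seconds for this demo)
        System.out.println("bounded join returned; reader state: " + readerThread.getState());
    }
}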

Hendrik


Austin

Mar 2, 2018, 7:28:01 PM
to sakai-dev
Hello Again,

Thanks Hendrik and Mark.  I looked into the LDAP code you mentioned, and I see what you mean about the join() call not using a timeout parameter.  But I don't quite feel comfortable at this time modifying the Novell code and compiling that into our Sakai instance.  In the meantime, on half of our servers, I've increased the jldap-beans.xml timeout from the default 5s to 60s.  I know that probably doesn't have anything to do with the Novell join() issue, but I figured it'd be worth a shot.

Another strange thing that just happened last night (on a server that did not have the jldap-beans.xml timeout increased) is that we hit the same [CMS-concurrent-...] looping that's been associated with this issue, but upon inspecting the heap dump... there weren't any stuck Finalizers!

Is there more than one cause of this CMS/GC looping problem?  Or did something change?  Possibly....

Since the last time we saw the issue with the stuck Finalizers, we increased the OS's system memory from 16G to 24G... it could be a coincidence, but I wonder if the lower 16G of memory was involved in the Finalizer getting stuck?  And now that we have more memory, while the CMS/GC problem still happens, we're not seeing as bad of a performance problem and maybe the heap dumps will now show something closer to the root of the problem? (hopefully that's it and this is not a separate way to hit the GC issue)

So analyzing the recent heap dumps, Eclipse MAT indicated that the main leak suspects are

org.apache.tomcat.util.threads.TaskThread
org.apache.jasper.runtime.PageContextImpl
org.apache.jasper.runtime.BodyContentImpl

Inspecting TaskThread, which actually contained BodyContentImpl, it contained some HTML, JavaScript, and... embedded data.  210MB worth of stuff.  And that's just one of the TaskThread objects.  In the 2 or 3 I checked, they all contained large amounts of embedded data.  I mentioned this before, and Ashley Willis also mentioned something similar:

"I think you might be on to something with the Base 64 image stuff. We have been having similar issues the last 2 weeks. We had a sever run out of heap this afternoon. GC seemed to be in a loop with near constant major collections that were getting longer and longer. It looked to us like a problem with the Blog tool. Looking at the heap, roughly 50% of the heap appears to be Strings that are copy/paste probably from Word (they were full of styles) into the Blog tool and they contain img tags <img alt="" src="data:image/png;base64,iVBOR.....

The reason this sounds really suspicious to me is that we saw something similar with a tool a year or 2 ago, I think it was Melete. The tool had a method to sanitize Strings by removing tags etc. One professor was using the tool for class photos. I don't remember all the details but it was essentially trying to process the images as Strings and running out of heap. I wonder if something similar is going on here."

Then after further inspecting the contents of that BodyContentImpl object, it looked as though the first part of it was coming from msgcntr's synMainLite.jsp, then it listed some Forums (msgcntr) JSON objects, then the embedded data filled up the rest of the content.  However, what was strange is that the Forums JSON objects appeared to come from several sites.  I checked to see if any users were participants common to both of the forums' sites, but there were none.  So I thought that maybe some admin page could be loading this data... but inspecting the embedded data, it appeared to be a .ppt file.  And looking up that file in the DB's content_resource table, it was coming from Melete!  Which Ashley mentioned above...  Unless someone copied the Melete attachment URL and put it into a Forum message, I couldn't think of anything like a synoptic tool that might show both Forums and Melete content in the same place, especially when logged in as an Admin.  Another thing that's strange is that this embedded content seems to start "out of the blue".  It isn't nested under any HTML or JS content; it just seems to appear out of nowhere... all of this just doesn't add up.  Is there any chance that the memory is so messed up that it's mashing up content from separate places?

Here's a short snippet of what this bad content looks like:

.
.
.
// SAK-11014 revert }
// SAK-11014 revert var height = getFrameHeight(frame);
// here we fudge to get a little bigger
var newHeight = height + 40;
// but not too big!
if (newHeight > 32760) newHeight = 32760;
// capture my current scroll position
var scroll = findScroll();
// resize parent frame (this resets the scroll as well)
objToResize.height=newHeight + "px";
// reset the scroll, unless it was y=0)
if (scroll[1] > 0)
{
var position = findPosition(frame);
parent.window.scrollTo(position[0]+scroll[0], position[1]+scroll[1]);
}
}
}
</script>
IS? \"Playing Favorites\"",
"forumid": 62036,
"forumtitle": "Week 12; Chapter 11"
},
{
"topicid": 1535185,
"topictitle": "HOW WOULD YOU HANDLE THIS? \"Union Busting\"",
"forumid": 62037,
"forumtitle": "Week 13; Chapter 12"
},
{
"topicid": 1535186,
"topictitle": "Internet; posting your article to \"Forums.\"",
"forumid": 62038,
"forumtitle": "Week 14; Chapter 13"
},
.
.
.
  "forumid": 63386,
"forumtitle": "Summary"
},
{
"forumid": 63387,
"forumtitle": "Grilled Cheese ePortfolio Forum"
},
{
"forumid": 63388,
"forumtitle": "ENG-100-0 [KAP.33007.SP16] Forum"
}
]}
]
</div>
vA ˇ˝ˇ˝ˇ˝ˇ˝ˇ˝ ˇ˝Zˇ˝u% ˇ˝ ˆHˇ˝;ˇ˝\ˇ˝ˇ˝CNzˇ˝% ˇ˝ ˇ˝:ˇ˝S<ˇ˝ˇ˝ ˇ˝ˇ˝ˇ˝mˇ˝Fˇ˝Qˇ˝ˇ˝l<ˇ˝'ˇ˝, ˇ˝ˇ˝^ ˇ˝ˇ˝8
ˇ˝A ˇ˝gˇ˝ˇ˝bˇ˝ˇ˝ Zˇ˝kˇ˝ˇ˝ˇ˝ xnˇ˝qˇ˝ˇ˝%ˇ˝eˇ˝J].ˇ˝ERˇ˝
ˇ˝[7**ˇ˝U ˇ˝ˇ˝ˇ˝ˇ˝uˇ˝ˇ˝ˇ˝" ˇ˝ ˇ˝ˇ˝ˇ˝ ˇ˝laˇ˝jˇ˝ ˇ˝dgˇ˝C&+ˇ˝sˇ˝5 ˇ˝ ] ˇ˝ ˇ˝.Yˇ˝3ˇ˝hˇ˝> 6ˇ˝ˇ˝ˇ˝ ˇ˝ ˇ˝]ˇ˝ˇ˝-ˇ˝ ˇ˝ˇ˝ˇ˝Ct-bˇ˝Iˇ˝ˇ˝Iˇ˝r ]ˇ˝X_ˇ˝ˇ˝|ˇ˝])m~ eˇ˝ˇ˝ˇ˝ˇ˝: M,Gˇ˝ ˇ˝ˇ˝iW/ˇ˝ ªˇ˝ ˇ˝ˇ˝ˇ˝ˇ˝ˇ˝ˇ˝<ˇ˝ˇ˝ ˇ˝ˇ˝ˇ˝ˇ˝ˇ˝ ˇ˝ˇ˝ˇ˝j^ ˇ˝U ˇ˝ ˇ˝ˇ˝Rˇ˝|\ ˇ˝ˇ˝ˇ˝ˇ˝Eˇ˝ ˇ˝ˇ˝ @4ˇ˝ˇ˝ˇ˝ˇ˝X3ˇ˝/ˇ˝Iˇ˝Qˇ˝ˇ˝ 7ˇ˝ˇ˝ˇ˝$ˇ˝vˇ˝ˇ˝0nˇ˝ˇ˝ˇ˝x ˇ˝ˇ˝^s ãˇ˝q?ˇ˝FSˇ˝ ˇ˝6Z x6ˇ˝ˇ˝? ˇ˝ˇ˝xˇ˝ˇ˝ˇ˝uˇ˝1ˇ˝ˇ˝ˇ˝ˇ˝ˇ˝*fˇ˝ Kˇ˝ ˇ˝hˇ˝ ˇ˝ˇ˝ˇ˝Pˇ˝=<~,S4ˇ˝|cJ ˇ˝ˇ˝C} ˇ˝ˇ˝hy ˇ˝ˇ˝?ˇ˝ˇ˝zˇ˝ˇ˝O+ˇ˝ˇ˝Qˇ˝ˇ˝ˇ˝oˇ˝A->,(-d ˇ˝vˇ˝ˇ˝ˇ˝ˇ˝ ˇ˝ˇ˝=9ˇ˝As0ˇ˝ˇ˝ˇ˝q ˇ˝ˇ˝sxk6 À)ˇ˝ˇ˝"ˇ˝ufˇ˝ˇ˝t:ˇ˝ˇ˝ˇ˝ˇ˝2MEA ¸ˇ˝Yˇ˝ ˇ˝ˇ˝ Acˇ˝ˇ˝ˇ˝ˇ˝' :ˇ˝3ˇ˝~ˇ˝xˇ˝ˇ˝ ˇ˝ˇ˝>&yˇ˝ˇ˝9 "ˇ˝ˇ˝ˇ˝ˇ˝ \tˇ˝ˇ˝0^ pˇ˝kˇ˝ˇ˝f4ˇ˝ ˇ˝ˇ˝A<[ [ˇ˝A ˇ˝ˇ˝Vˇ˝ˇ˝Q0ˇ˝ˇ˝ˇ˝ˇ˝2ıCL.1C@ˇ˝ˇ˝D
.
.
.
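
One possibility, and this is purely an assumption on my part rather than anything confirmed above: maybe it isn't memory corruption at all, but buffer reuse.  If the JSP runtime keeps a per-thread character buffer that only grows and is never cleared between requests, then a heap dump of that buffer would show the current request's output followed by stale leftovers from whatever earlier requests happened to run on the same thread, which would look exactly like unrelated content mashed together.  A toy sketch of that effect (hypothetical class, not Jasper's actual code):

// Toy sketch (assumption, not Jasper's actual implementation): a per-thread,
// grow-only buffer keeps old request data in the tail of its char[], so a heap
// dump shows content from earlier, unrelated requests appearing "out of the blue".
public class ReusedBuffer {
    private char[] cb = new char[256];
    private int len = 0;

    void reset() { len = 0; }                // old chars stay in the array

    void write(String s) {
        if (len + s.length() > cb.length) {
            char[] bigger = new char[Math.max(cb.length * 2, len + s.length())];
            System.arraycopy(cb, 0, bigger, 0, len);
            cb = bigger;                     // never shrinks again
        }
        s.getChars(0, s.length(), cb, len);
        len += s.length();
    }

    public static void main(String[] args) {
        ReusedBuffer perThread = new ReusedBuffer();
        perThread.write("<html>...giant .ppt attachment rendered for site A...</html>");
        perThread.reset();
        perThread.write("{\"forumtitle\": \"site B\"}");
        // Dumping the whole cb array shows site B's JSON immediately followed by
        // the leftover tail of site A's content, even though no single request
        // ever rendered both.
        System.out.println(new String(perThread.cb, 0, 60));
    }
}

If that's what's happening, the Forums JSON and the Melete .ppt bytes wouldn't need to have ever appeared on the same page; they'd just have been written by different requests that reused the same thread's buffer.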



Earle Nietzel

unread,
Mar 5, 2018, 10:03:55 AM3/5/18
to Austin, sakai-dev
You may be running into this JSF leak:

Since you're running melete, it may not be configured correctly to avoid this.


Austin

unread,
Mar 5, 2018, 1:52:10 PM3/5/18
to Earle Nietzel, sakai-dev
Hello Earle,

Thanks. Yes, looking into the code for both Etudes jforum and melete, I see:

jforum->jforum-tool->src-webapp->WEB-INF->web.xml
<context-param>
        <param-name>javax.faces.STATE_SAVING_METHOD</param-name>
        <param-value>server</param-value>
</context-param>


melete->melete-admin->src-webapp->WEB-INF->web.xml
<context-param>
        <param-name>javax.faces.STATE_SAVING_METHOD</param-name>
        <param-value>server</param-value>
</context-param>


melete->melete-app->src-webapp->WEB-INF->web.xml
<context-param>
        <param-name>javax.faces.STATE_SAVING_METHOD</param-name>
        <param-value>client</param-value>
</context-param>


And looking at the patch in SAK-25677, it changes everything to 'client'.  And SAK-29126 added the sakai property and basically just removed that param from web.xml.  Just double-checking... is that all that's needed: removing this context-param from web.xml so the tool picks up the defaults from sakai.properties?

Thanks,

Austin


Earle Nietzel

unread,
Mar 5, 2018, 4:05:48 PM3/5/18
to Austin, sakai-dev
Yep I believe that is all you need!


Austin

unread,
Mar 5, 2018, 5:34:41 PM3/5/18
to Earle Nietzel, sakai-dev
Thanks Earle,

We'll give that patch a try with both jforum and melete the next time we push changes to our production servers in a few weeks.



Also, this probably belongs in a different email thread, but I've been keeping a fork of the Etudes dependencies, jforum, and melete in my GitHub account with compatibility and bug fixes for Sakai 11.


If anyone is interested in forking them to the sakaicontrib github, that'd be fine with me ;)
(note, I did not include mneme because that project already exists in sakaicontrib... though that version might not be compatible with the dependencies in my forks) 



Alex Balleste

unread,
Sep 19, 2018, 8:30:17 AM9/19/18
to Sakai Development, aust...@hawaii.edu, sa...@stellers.net

Hi, everyone. We've been experiencing this memory leak. Using YourKit, we discovered that the problem is the same one Austin described in the previous emails. We have a lot of org.sakaiproject.announcement.impl.BaseAnnouncementService$BaseAnnouncementMessageEdit objects pending finalization. I've been looking at the content of those announcements, and they are really old. That made me think those messages are loaded when somebody accesses the Workspace announcements tool, which loads all the messages (we have multiple sites that are available to all students and have had announcements since 2008).

Some announcements are especially big (they embed base64 images), and Java retains around 2MB for each object. That means we rapidly accumulate 4GB of memory just for announcements when a lot of users access the announcements tool.

I was wondering if anyone has found a solution to this problem.

Thanks in advance.

Àlex





-- 
Alexandre Ballesté Crevillén alexandre.balleste at udl.cat
====================
Universitat de Lleida

Àrea de sistemes d'Informació i Comunicacions

Analista/Programador


University of Lleida

Information and Communication Systems Service


Tlf: +34 973 702148

Fax: +34 973 702130

=====================

Avís legal / Aviso legal / Avertiment legal / Legal notice <http://www.imatge.udl.cat/avis_legal_lopd.html>

Stephen Marquard

unread,
Sep 19, 2018, 8:34:18 AM9/19/18
to Alex Balleste, Sakai Development, aust...@hawaii.edu, sa...@stellers.net

With Longsight's help, we recently resolved (we think) an issue with GC related to Finalizers by switching from the default JLDAP provider to the new unboundid LDAP provider.

https://jira.sakaiproject.org/browse/SAK-23630

Before that, we would see one or two app servers run out of memory and into excessive, repeated Full GCs every week. Since that change, it hasn't happened again.

Regards

Stephen

---
Stephen Marquard, Learning Technologies Co-ordinator,
Centre for Innovation in Learning and Teaching (CILT)
University of Cape Town
http://www.cilt.uct.ac.za
stephen....@uct.ac.za
Phone: +27-21-650-5037 Cell: +27-83-500-5290

 


Alex Balleste

unread,
Sep 19, 2018, 8:57:48 AM9/19/18
to Stephen Marquard, Sakai Development, aust...@hawaii.edu, sa...@stellers.net

Do you think both problems are related, LDAP and Announcements? Is LDAP somehow preventing those BaseAnnouncementEdit objects from being finalized? I would try it, because we are using the LDAP provider too.

Thanks.

Àlex



Stephen Marquard

unread,
Sep 19, 2018, 9:13:16 AM9/19/18
to Alex Balleste, Sakai Development, aust...@hawaii.edu, sa...@stellers.net

Hi Àlex,

The analysis of heap dumps and thread dumps taken when the server was in extreme low-memory conditions suggested that the finalizer queue was being blocked by JLDAP objects, so the GC was never able to run the finalizers and thus free up the BaseAnnouncementEdit objects.

 

Sam Ottenhof from Longsight may be able to explain that better.

Hendrik Steller

unread,
Sep 19, 2018, 9:25:01 AM9/19/18
to Alex Balleste, saka...@apereo.org
Hi Alex,

if I recall the problem and the LDAP code I looked at correctly, the problem
is that there is, by default, only one finalizer thread for all objects in the
JVM.
So due to Sakai's inner workings, there might be a ton of announcements queued
up and waiting to have their finalize() method called, but the object
actually blocking the finalizer thread from doing its work might be something
completely unrelated to all those announcements.

I vaguely remember seeing some rather suspicious code in the finalize()
methods of the LDAP libraries being used - it might have been something along
the lines of opening a connection to the LDAP server with Java's default URL
connection settings, i.e. with the connection timeouts set to "unlimited",
which means that this code, and thus the finalizer thread, will hang forever if
it can't reach the LDAP server for whatever reason at that exact moment.
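
A minimal sketch of that failure mode (hypothetical classes, not the JLDAP or Sakai code itself): one object whose finalize() never returns is enough to starve the finalizers of everything else in the JVM.

// Minimal sketch (hypothetical classes): a single blocking finalize() stalls the
// JVM's one "Finalizer" thread, so unrelated objects queue up behind it and are
// never finalized or freed.
public class FinalizerStarvation {

    static class BlockingResource {
        @Override
        protected void finalize() {
            // Stands in for a finalize() that opens a connection with no timeout:
            // the Finalizer thread parks here and never comes back.
            try {
                synchronized (this) {
                    wait(); // nothing ever notifies
                }
            } catch (InterruptedException ignored) {
            }
        }
    }

    static class Announcement {
        @Override
        protected void finalize() {
            // Never reached once the Finalizer thread is stuck above.
            System.out.println("announcement finalized");
        }
    }

    public static void main(String[] args) throws Exception {
        new BlockingResource();   // immediately unreachable
        System.gc();              // queue it for finalization
        Thread.sleep(1000);       // give the Finalizer thread time to get stuck

        for (int i = 0; i < 100_000; i++) {
            new Announcement();   // these pile up behind the blocked finalize()
        }
        System.gc();
        Thread.sleep(1000);
        // Expect no "announcement finalized" output and a growing heap.
    }
}

Finalization timing is up to the JVM, so treat this as an illustration rather than a reliable test, but it has the same shape as what the heap dumps suggest: a ton of BaseAnnouncementMessageEdit objects waiting behind one stuck LDAP finalize().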

Hendrik



Alexandre Ballesté

unread,
Sep 19, 2018, 10:15:33 AM9/19/18
to Hendrik Steller, saka...@apereo.org
Good, so I'll go for the LDAP changes to prevent the Finalizer from being blocked. Thanks for the information.

Àlex.



Earle Nietzel

unread,
Sep 19, 2018, 10:27:43 AM9/19/18
to Hendrik Steller, Alex Balleste, saka...@apereo.org
Hi Everyone,

After looking over the heap dumps provided by Stephen, the JVM's finalizer thread was in a deadlock, and the locked class was JLDAP's Connection.finalize, which eventually calls acquireWriteSemaphore:

at com.novell.ldap.Connection.acquireWriteSemaphore(Connection.java:286)
 - locked <0x000000068166d2b0> (a java.lang.Object)
 at com.novell.ldap.Connection.shutdown(Connection.java:956)
 at com.novell.ldap.Connection.finalize(Connection.java:912)

When the finalizer gets stuck it's pretty much game over, as nothing else in the JVM can be finalized and memory just fills up.

Here is the relevant code:
private int    writeSemaphoreOwner = 0;
...
synchronized( writeSemaphore) {
    if( id == 0) {
        ephemeralId = ((ephemeralId == Integer.MIN_VALUE)
            ? (ephemeralId = -1) : --ephemeralId);
        id = ephemeralId;
    }
    while( true) {
        if( writeSemaphoreOwner == 0) {
            // we have acquired the semahpore
            writeSemaphoreOwner = id;
            break;
        } else {
            if( writeSemaphoreOwner == id) {
                // we already own the semahpore
                break;
            }
            try {
                // Keep trying for the lock
                writeSemaphore.wait();
                continue;
            } catch( InterruptedException ex) {
                // Keep trying for the lock
                continue;
            }
        }
    }
    writeSemaphoreCount++;
}
I think the issue has to do with one thread changing writeSemaphoreOwner while another thread is in the wait() for the same Connection object. Because the value is copied to the thread's stack, when writeSemaphoreOwner is changed for that Connection object, the other thread waiting in the same object is unaware of the change. A possible fix might simply be the change:
private volatile int    writeSemaphoreOwner = 0;
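
To see whether a node is heading into this state, the stock MemoryMXBean exposes a count of objects waiting for finalization; a small sketch using only the standard JMX API (not code from this thread):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Small monitoring sketch: if the Finalizer thread is blocked, the number of
// objects pending finalization keeps growing between samples instead of
// dropping back toward zero.
public class FinalizerQueueWatch {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (true) {
            // Approximate count of objects whose finalize() has not run yet.
            System.out.println("pending finalization: "
                    + memory.getObjectPendingFinalizationCount());
            Thread.sleep(60_000); // sample once a minute
        }
    }
}

The same counter should also be visible in JConsole/VisualVM under the java.lang:type=Memory MBean, so it can be watched without deploying anything.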



Austin

unread,
Sep 19, 2018, 2:20:49 PM9/19/18
to sakai-dev
Great work tracking this down!  Thanks!

We've been running the 'finalizer' code that Hendrik provided earlier in this thread to manually clear the finalizer queue.  The problem was happening quite frequently during the first couple weeks of the semester.  The workaround works great, but yes, a better solution would be awesome.

Is there, or will there be, a fix for the default Novell LDAP provider?  Or is using the unboundid LDAP provider the preferred fix?  And will the unboundid LDAP fix be merged into 12.4 / 12.x?

Thanks,

Austin





Alex Balleste

unread,
Sep 20, 2018, 2:03:39 AM9/20/18
to Earle Nietzel, Hendrik Steller, saka...@apereo.org

Hi, thanks for all that information. I'll definitely switch to the new unboundid LDAP provider. It's true that in the logs I can see problems with semaphores related to LDAP.

I'll keep you posted on whether it resolves all my problems.

Thanks again.
Àlex.


Earle Nietzel

unread,
Sep 20, 2018, 11:14:46 AM9/20/18
to Austin, sakai-dev
Hi Austin, at this point the JLDAP lib hasn't seen many updates and the unboundid lib has a lot more activity, so I think moving to unboundid is the clear choice.

The unboundid LDAP provider will be in 19, and I assume that at some point in the future, once more people have used it, it will be merged back to 12.x.

One potential workaround for the Novell JLDAP provider is to avoid using pooling, as this appears to be where the issue has surfaced.



Alex Balleste

unread,
Sep 27, 2018, 8:24:27 AM9/27/18
to Earle Nietzel, Hendrik Steller, saka...@apereo.org

Well, we've applied the new LDAP provider implementation as you suggested, and it has really improved the system's behavior. It has been running for a week and the number of GC alerts has decreased drastically.

Thanks for the suggestion and the work you did.

Àlex.

PS. It would be nice to have the new provider in the next 12 release too.

Austin

unread,
Oct 18, 2018, 5:35:52 PM10/18/18
to sakai-dev, Earle Nietzel, Hendrik Steller
Hello All,

Unfortunately, I haven't been able to get the unboundid library working yet... but in the meantime, we've been using Hendrik's code to force finalization when we detect GC trouble.  However, that only seems to work about half the time (which is still a big improvement!).  When the finalizer code doesn't work, sometimes the GC will resolve itself after 30-120 minutes; other times, it will go on forever until we reboot the system.
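
For what it's worth, the GC-detection part can be done with the stock GarbageCollectorMXBeans; here's a rough sketch of the idea (an illustration, not the code we're actually running):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Rough sketch (illustration only): sample the cumulative GC time reported by
// the collector MBeans and flag when the JVM spends most of the interval
// collecting, which is what the looping CMS cycles look like from the outside.
public class GcPressureWatch {
    public static void main(String[] args) throws InterruptedException {
        long lastGcMillis = 0;
        long lastSampleMillis = System.currentTimeMillis();
        while (true) {
            Thread.sleep(60_000);
            long gcMillis = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                long t = gc.getCollectionTime(); // cumulative ms, -1 if unsupported
                if (t > 0) {
                    gcMillis += t;
                }
            }
            long now = System.currentTimeMillis();
            double gcFraction = (double) (gcMillis - lastGcMillis) / (now - lastSampleMillis);
            if (gcFraction > 0.5) {
                System.out.println("GC is using " + (int) (gcFraction * 100)
                        + "% of wall time; time to intervene");
            }
            lastGcMillis = gcMillis;
            lastSampleMillis = now;
        }
    }
}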

In those cases, inspecting the heap dump, we typically see that the biggest object dominator is SessionComponent, but drilling down there are a bunch of ConcurrentHashMaps, MySession, MyLittleSession (see attached)... is that related to the LDAP / unboundid stuff?  Other times, it looks like NotifyingMemoryStore is the biggest hog, where I think I'm seeing lots of instances of sites that have base64 images in their site descriptions (note that the base64 thing was mentioned earlier in this thread re: stuck announcement objects).

However, last night we got one that showed the biggest dominator was TaskThread, which contained VelocityPortalRenderContext and then a bunch of hash and tree maps (see attached).  And besides the normal GC looping, I was also seeing tons of "FULL GC" events in the Tomcat logs.

Also, Earle mentioned, "After looking over heap dumps provided by Stephen, the JVM's finalizer thread was in a deadlock and the locked class was JLDAP's Connection.finalize which eventually calls acquireWriteSemaphore:"
What are you looking at in the heap dump to determine that?  Or can that only be determined when connected to a live system?  I've tried both MAT and YourKit, but I don't see anything obvious; then again, I don't know either of those apps well enough to find deadlocks.  Looking in our Tomcat logs, though, I'm definitely seeing more errors with LDAP semaphores than usual when the GC occurs.

- Austin




SessionComponent.png
TaskThread.png