Memory leak in grizzly

832 views
Skip to first unread message

Ryan de Laplante

unread,
Jun 24, 2021, 2:48:04 PM6/24/21
to Payara Forum
For years we've had to occasionally kill the GlassFish/Payara java.exe process because it ran out of memory and became completely unresponsive, using 95% CPU.  I originally thought it was related to JDBC connections going bad because the server.log ALWAYS starts with this:

Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.7.7.payara-p2): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: I/O Error: Read timed out
Error Code: 0

Which eventually leads to a bunch of these:

java.sql.SQLException: java.lang.reflect.UndeclaredThrowableException
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.sql.SQLException: Invalid state, the Connection object is closed.

We're using the jTDS JDBC driver v1.3.1 with MSSQL server over the internal network.  We have the connection pools configured to verify connections before use and to close all connections on discovery of a failure. I'd expect it to recover automatically, but it doesn't.

The issue happened again today.  This time I used jmap to get a heap dump and ran it through Visual VM to get some clues.   What I found is that there are over 4 million (each) of org.glassfish.grizzly.http.util.CharChunk/BufferChunk/ByteChunk/DataChunk objects using almost 1.5 GB of memory.  I've attached a screenshot.  This leads me to believe there is a memory leak in grizzly, the HTTP(S) web server component in the Payara application server.

The website hosted in Payara is a high traffic website. When the DB becomes unusable, I'm sure many people are clicking over and over while more new users connect to the web server.  That should not cause Payara to go to 95% CPU solid, cause the memory to balloon, or make the website become completely unresponsive.   When this issue happens, we test the DB and find that it is up and running fine.  When we restart Payara, the DB connections work without issue.

Any suggestions to help further diagnose and fix this issue would be greatly appreciated.

This server is running  Payara 5.2020.5 and OpenJDK 11 on a Windows Server.  We've been using GlassFish/Payara since 2008 and I think we've been having this problem occasionally with all versions.
 
Thanks,
Ryan

payara profiler.png

Fabio Luis - Vanguarda TI

unread,
Jun 24, 2021, 3:39:35 PM6/24/21
to Ryan de Laplante, Payara Forum
Hi Ryan.

I had a similar problem in the past, and it's cost me some months too.

In my case it was a client application that did not close the connections. As its query frequency was at each 5 minutos, in one week we had 2016 locked connections, giving us the famous OutOfMemory.

I don't know why a timeout didn't occur and closed the idle connections, but my guess is that it has to do with TCP idle connections in Operating System (my case Linux), as the keep alive timeout is set up.

We asked the client to fix his application and close the connections before making new ones, and it solved the problem.

Kind Regards
Fabio Silva


--
You received this message because you are subscribed to the Google Groups "Payara Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to payara-forum...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/payara-forum/16b976a0-daa7-4a58-89b1-81b40836636cn%40googlegroups.com.

Ryan de Laplante

unread,
Jun 24, 2021, 3:42:41 PM6/24/21
to Payara Forum
Another interesting clue... based on application session data in the heap dump, I think there were 46 users at the time of the crash. The heap dump shows 235,516 instances of org.glassfish.grizzly.http.Cookie which seems odd.   I've attached another screenshot that shows only org.glassfish.grizzly objects.
payara profiler2.png

Ryan de Laplante

unread,
Jun 24, 2021, 3:49:51 PM6/24/21
to Payara Forum
Thank you very much for the tip Fabio. I will look into that.  I'm not so sure that is the issue here because the application uses injected JPA PersistenceContext in Spring beans, so I think the connection management happens at the container level.  Also, we use JDBC connection pools in Payara, so I would think we would see clues early on if we exhausted all connections in the pool.  But who knows, maybe there is some manual JDBC code lurking somewhere that might be opening connections and not closing them.

Will Hartung

unread,
Jun 24, 2021, 5:26:27 PM6/24/21
to Ryan de Laplante, Payara Forum
On Thu, Jun 24, 2021 at 11:48 AM Ryan de Laplante <ry...@ryandelaplante.ca> wrote:
For years we've had to occasionally kill the GlassFish/Payara java.exe process because it ran out of memory and became completely unresponsive, using 95% CPU.  I originally thought it was related to JDBC connections going bad because the server.log ALWAYS starts with this:

Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.7.7.payara-p2): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: I/O Error: Read timed out
Error Code: 0

Which eventually leads to a bunch of these:

java.sql.SQLException: java.lang.reflect.UndeclaredThrowableException
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.sql.SQLException: Invalid state, the Connection object is closed.

We're using the jTDS JDBC driver v1.3.1 with MSSQL server over the internal network.  We have the connection pools configured to verify connections before use and to close all connections on discovery of a failure. I'd expect it to recover automatically, but it doesn't.

On the surface, this all sounds like an issue with the DB server or the network. On the client side, a socket is a socket. It's pretty brute force stupid. The I/O Error is your server not responding. If your DB server (or the network) is sick, who knows what the automatic refreshing of connections will do. You don't have stack traces showing where you're getting the Invalid state errors, they could well be coming from the connection pool logic trying to reconnect.
 
The issue happened again today.  This time I used jmap to get a heap dump and ran it through Visual VM to get some clues.   What I found is that there are over 4 million (each) of org.glassfish.grizzly.http.util.CharChunk/BufferChunk/ByteChunk/DataChunk objects using almost 1.5 GB of memory.  I've attached a screenshot.  This leads me to believe there is a memory leak in grizzly, the HTTP(S) web server component in the Payara application server.

I'm not saying there isn't a memory leak in Grizzly but...there isn't a memory leak in Grizzly. That code is pushing 15 years old now. If there's a resource leak (memory, connections, whatever), it's in your application. Unfortunately, you have to be savvy on your application to figure out memory leaks.

When analyzing heap dumps for leaks, don't look for system classes (like org.glassfish.*), look for your application classes. Odds are, somewhere, somehow, something from your application is "growing" and not being properly released, and THOSE are hanging on to something from the system. You could be holding on to GBs of String, Byte Arrays, etc. but those are all derivative of your classes. Look at those first, the rest is noise.
 
The website hosted in Payara is a high traffic website. When the DB becomes unusable, I'm sure many people are clicking over and over while more new users connect to the web server.  That should not cause Payara to go to 95% CPU solid, cause the memory to balloon, or make the website become completely unresponsive.   When this issue happens, we test the DB and find that it is up and running fine.  When we restart Payara, the DB connections work without issue.

When underlying parts of the infrastructure get sick, it's not untoward for Payara to behave badly. Most of the time, if the underlying, external cause can be cleaned up, Payara will recover just fine. But it may take some thrashing to do it. Other times, kicking the server is simply more expedient. But that doesn't necessarily mean there's a root cause in the app server itself, it's something external driven by your application and its traffic.
 
Any suggestions to help further diagnose and fix this issue would be greatly appreciated.

If you do indeed suffer an Out of Memory exception, the SAFEST approach is to kick the server. OOM errors are unrecoverable, they can happen ANYWHERE (like during static class initialization) and thus lead to "impossible" things. Obviously use your judgement based on the stack traces and where it's happening, but if it's more than one or two (even if the server "recovered"), it's safer to kick the server. OOM servers may as well be bombarded with alpha radiation -- anything goes. (And I've seen that before too, "impossible" things happening in Java system classes).

You have to have intimate knowledge of your application to hunt down memory leaks. You have to know what looks right and wrong but the GBs of noise in the heap. The best thing is to take several heap dumps over time. Obviously, it helps to have some idea when this may happen. Heap dumps aren't cheap. But if you take one every 5 to 10 minutes, you can then compare and see what classes are growing, which are not, and you need to be able to explain what those are and why they're doing it. For those that you can't, those are obviously suspicious.

And as I said earlier, ignore any of the java or Glassfish/Payara classes out the gate. Always assume that it's your code that's wrong, 99% of the time, you're right. All of those other classes are derivative of your application doing its work.

In my time diagnosing these things, I've never had any luck with actual heap dumps. They're stupid expensive, the tools are terrible (they've always been terrible), and slow as mollases. Your tool of choice is the jmap histogram. These are live snapshots of the classes in your heap. These are much, much cheaper to run, and can be run in production (they do add load, of course). But they're human digestible.

During times of stress hunting stuff like that down, I've had jobs that captured one every 5 minutes. Let it run for a week. When something gets sick, you now have a history you can look back on during the post mortem. Even if the server went extra stupid and you just have to kick it, you have history. I don't run those all the time, just when something like this is going weird. On the other hand, I do run thread dumps solid, 24x7, 5 minutes apart, day in and day out. Those are cheap enough. Gives a good snapshot of what your server is doing at any one time. I'd rather have 2 months of unread thread dumps lying idle on the file system, than not have one the few minutes before my server kicked the bucket. Put them on a 60 day clean up timer.

"Hey, my memory is slammed. HEY, I have 700 com.example.MainServlet running in the thread dump! Maybe that's important!"

These are things you can do today that don't cost you anything.

Regards,

Will Hartung

Gregor Kovač

unread,
Jun 24, 2021, 6:49:42 PM6/24/21
to Will Hartung, Ryan de Laplante, Payara Forum

Hi!

Will, nice write-up. Gave me some ideas how to solve one of our long standing problems with Glassfish.

Just couple of questions:

- can you please provide the command-line options of jmap you used to find your memory leak?

- when I try to run jmap against my GlassFish like "jmap PID" I get "Error attaching to process: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process". I can attach with my NetBeans debugger to Glassfish. What am I missing with jmap?

Best regards,

    Gregor

Will Hartung je 24. 06. 21 ob 23:26 napisal:
--
You received this message because you are subscribed to the Google Groups "Payara Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to payara-forum...@googlegroups.com.

Ryan de Laplante (Personal)

unread,
Jun 24, 2021, 8:56:12 PM6/24/21
to Payara Forum
Thank you William for the detailed reply.   I have a very deep, intimate knowledge of the application since I wrote it and have been maintaining and supporting it for the past 16+ years.  I also have experience troubleshooting and fixing at least a few memory leaks in this application. jmap, heap dumps and Visual VM have been invaluable for diagnosing and fixing these leaks.  And yes, they were all my application code's fault. I've been programming professionally for 23 years and know my own application code is always the first suspect when diagnosing these types of issues. No need to tell me that, thank you.

This application processes millions of transactions.  Some customers experience this issue maybe 1 - 3 times per year, while others experience it every month or two. When it happens, it's quick to go down, not a slow buildup.  I've always suspected a DB issue (such as full backups) or a flaky network issue.   Other applications hosted on other servers that communicate with the same DB do not have an issue, although they are .NET and PHP apps. 

I know Grizzly is a mature product. My application has been running on Payara since it was called Sun Java System Application Server, before Grizzly even existed. Just because Grizzly is a mature product doesn't mean there isn't an obscure bug or memory leak lurking in there somewhere triggered by a rare, strange network issue. It's always a possibility and I was hoping someone might be willing and able to help me look into that possibility a bit deeper.   If it really is Payara's issue, finding and fixing it will benefit everyone. If it's my application code's fault, I'm happy to admit it and share the gained knowledge with everyone.

I've attached a partial log that begins where we started seeing issues today.   There are many warnings about tasks being delayed and a "huge system clock jump". That's a big clue. I'm pretty sure this customer is hosting on VMs.  If the VM becomes very sluggish, that may be triggering these issues, whatever they are.


Thanks,
Ryan
db and clock issue.txt

Will Hartung

unread,
Jun 25, 2021, 12:12:57 AM6/25/21
to Ryan de Laplante (Personal), Payara Forum
On Thu, Jun 24, 2021 at 5:56 PM Ryan de Laplante (Personal) <ry...@ryandelaplante.ca> wrote:
I've attached a partial log that begins where we started seeing issues today.   There are many warnings about tasks being delayed and a "huge system clock jump". That's a big clue. I'm pretty sure this customer is hosting on VMs.  If the VM becomes very sluggish, that may be triggering these issues, whatever they are.

I can't speak to the Hazelcast chatter. Someone else may be able to help with that.

I summarized the first event in the log, with ellided stack traces, below

Was this a busy server? You must keep very clean logs.

This appears to all be the same "transaction", stemming from SearchCommand.

Apparently you log your SQL errors, but don't honor them (when I get these things, I just blow out the whole stack frame). You appear to press on.

This happened over the period of several minutes. The first failure at 8:31:58, and the last 8:35:09. The gap between the first two is quite long, almost 2 minutes. You get 2 timeouts, then the DB Connection (vs the socket) is closed. Inevitably, it tries to rollback against a dead DB Connection.

The socket timeouts say it's network or DB. It's obviously not too severe of a network issue, as the socket is still alive (the socket connection wasn't dropped), but doesn't mean data is passing.

Did all of the DB calls fail during this transaction? Or just these?

The socket timeouts are interesting for a DB connection. Did you make them aggressive? I never set mine, honestly, in case we get a long query (which would also cause a timeout). But given a long enough query and a short enough timeout, you'll get the same result (a read timeout).

But you can see you had a timeout at 31:58, then the next one at 33:47 -- almost 2m later -- 109s. So, what may have been happening in between there? It's an odd timeout value (despite being decimal creatures, we tend to set timeouts at minute fractions (1/4, 1/2, full minute, etc.). 109s isn't anything "regular". Minute based, 100 second based. So, just an odd number.

I can't speak to what closed the connection. The driver may have done it after see the other errors, the DB may have done it. In Postgres, if you make any kind of error, it blows the transaction. The connection is still good, but any other work in the transaction results in an error (another reason we just blow out SQL Exceptions -- get one, and the transaction is dead anyway). But, the issue with the DB Connection Pool not resetting connections does not apply here. You're still in the same transaction, the pool has not had a chance to release and reuse it yet.

Now, you mentioned VMs and such, which is a whole other kettle of fish. The Hazelcast alerts may speak to something hinky at the VM level, but the times mentioned in the log don't correlate to the times of the events, plus they're all of different threads (which doesn't necessarily indicate anything, but it's interesting). The VM can very well be "losing time", which messes everything up.

I've done things like that in the past as well, had a continually running job that just logged the time to a file, every 15s. And, yea, sometimes, we'd see gaps. Lines that were more than 15s apart. We think it had something to do with the storage and the kernel.

So. Nothing (obviously) memory related here. If you suspect memory, you would need to check GC logs, see if you can correlate crushing GC activity with these anomalies between the server and the DB.

If it were me, as I said, I'd be running regular thread dumps, and log the GC. Use them all in a post mortem of the event. Even if there's a sudden onset, a 5m thread dump can catch a lot of things.

Regards,

Will Hartung


[2021-06-24T08:31:58.952-0700] [Payara 5.2020.5] [WARNING] [] [org.eclipse.persistence.session./file:/D:/payara/payara5/glassfish/domains/domain1/applications/WebCheckInOut/WEB-INF/classes/_WebCicoPU] [tid: _ThreadID=4694 _ThreadName=http-thread-pool::http-listener-2(28)] [timeMillis: 1624548718952] [levelValue: 900] [[
Local Exception Stack:
Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.7.7.payara-p2): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: I/O Error: Read timed out
        at com.ijws.webcico.dao.impl.GenericDaoJpaImpl.findByNamedQuery(GenericDaoJpaImpl.java:170)
        at com.ijws.webcico.dao.impl.StatsDaoJpaImpl.findStatsForNewTransaction(StatsDaoJpaImpl.java:85)
        at com.ijws.webcico.service.impl.TransactionHistoryServiceImpl.loadOrCreateStatsEntity(TransactionHistoryServiceImpl.java:359)
        at com.ijws.webcico.service.impl.TransactionHistoryServiceImpl.updateStats(TransactionHistoryServiceImpl.java:291)
        at com.ijws.webcico.service.impl.TransactionHistoryServiceImpl.recordTransaction(TransactionHistoryServiceImpl.java:213)
        at com.ijws.webcico.service.impl.WebCheckInOutServiceImpl.recordError(WebCheckInOutServiceImpl.java:2786)
        at com.ijws.webcico.service.impl.WebCheckInOutServiceImpl.searchReservations(WebCheckInOutServiceImpl.java:495)
        at com.ijws.webcico.web.ui.screens.processingcommands.SearchCommand.execute(SearchCommand.java:163)
Caused by: java.sql.SQLException: I/O Error: Read timed out
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.base/java.net.SocketInputStream.socketRead0(Native Method)
        at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
        at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
        at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
        at java.base/java.io.DataInputStream.readFully(DataInputStream.java:200)
        at java.base/java.io.DataInputStream.readFully(DataInputStream.java:170)
        at net.sourceforge.jtds.jdbc.SharedSocket.readPacket(SharedSocket.java:850)

[2021-06-24T08:33:47.842-0700] [Payara 5.2020.5] [SEVERE] [] [] [tid: _ThreadID=4694 _ThreadName=http-thread-pool::http-listener-2(28)] [timeMillis: 1624548827842] [levelValue: 1000] [[
  javax.persistence.PessimisticLockException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.7.7.payara-p2): org.eclipse.persistence.exceptions.DatabaseException

Internal Exception: java.sql.SQLException: I/O Error: Read timed out
        at com.ijws.webcico.dao.impl.GenericDaoJpaImpl.findByNamedQuery(GenericDaoJpaImpl.java:170)
        at com.ijws.webcico.dao.impl.StatsDaoJpaImpl.findStatsForNewTransaction(StatsDaoJpaImpl.java:85)
        at com.ijws.webcico.service.impl.TransactionHistoryServiceImpl.loadOrCreateStatsEntity(TransactionHistoryServiceImpl.java:359)
        at com.ijws.webcico.service.impl.TransactionHistoryServiceImpl.updateStats(TransactionHistoryServiceImpl.java:291)
        at com.ijws.webcico.service.impl.TransactionHistoryServiceImpl.recordTransaction(TransactionHistoryServiceImpl.java:213)
        at com.ijws.webcico.service.impl.WebCheckInOutServiceImpl.recordError(WebCheckInOutServiceImpl.java:2786)
        at com.ijws.webcico.service.impl.WebCheckInOutServiceImpl.searchReservations(WebCheckInOutServiceImpl.java:495)
        at com.ijws.webcico.web.ui.screens.processingcommands.SearchCommand.execute(SearchCommand.java:163)
Caused by: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.7.7.payara-p2): org.eclipse.persistence.exceptions.DatabaseException

Internal Exception: java.sql.SQLException: I/O Error: Read timed out
Caused by: java.sql.SQLException: I/O Error: Read timed out
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.base/java.net.SocketInputStream.socketRead0(Native Method)
        at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
        at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
        at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
        at java.base/java.io.DataInputStream.readFully(DataInputStream.java:200)
        at java.base/java.io.DataInputStream.readFully(DataInputStream.java:170)
        at net.sourceforge.jtds.jdbc.SharedSocket.readPacket(SharedSocket.java:850)

[2021-06-24T08:34:30.149-0700] [Payara 5.2020.5] [WARNING] [] [org.eclipse.persistence.session./file:/D:/payara/payara5/glassfish/domains/domain1/applications/WebCheckInOut/WEB-INF/classes/_WebCicoPU] [tid: _ThreadID=4694 _ThreadName=http-thread-pool::http-listener-2(28)] [timeMillis: 1624548870149] [levelValue: 900] [[
Local Exception Stack:
Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.7.7.payara-p2): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: java.lang.reflect.UndeclaredThrowableException
        at com.ijws.webcico.dao.impl.GenericDaoJpaImpl.findByNamedQuery(GenericDaoJpaImpl.java:170)
        at com.ijws.webcico.dao.impl.GenericDaoJpaImpl.findObjectInstanceByNamedQuery(GenericDaoJpaImpl.java:347)
        at com.ijws.webcico.dao.impl.TransactionDaoJpaImpl.findTotalAttemptsForResByClassification(TransactionDaoJpaImpl.java:205)
        at com.ijws.webcico.service.impl.TransactionHistoryServiceImpl.calcTotalRecordsByOutcome(TransactionHistoryServiceImpl.java:430)
        at com.ijws.webcico.service.impl.TransactionHistoryServiceImpl.updateStats(TransactionHistoryServiceImpl.java:304)
        at com.ijws.webcico.service.impl.TransactionHistoryServiceImpl.recordTransaction(TransactionHistoryServiceImpl.java:213)
        at com.ijws.webcico.service.impl.WebCheckInOutServiceImpl.recordError(WebCheckInOutServiceImpl.java:2786)
        at com.ijws.webcico.service.impl.WebCheckInOutServiceImpl.searchReservations(WebCheckInOutServiceImpl.java:495)
        at com.ijws.webcico.web.ui.screens.processingcommands.SearchCommand.execute(SearchCommand.java:163)
Caused by: java.sql.SQLException: java.lang.reflect.UndeclaredThrowableException
Caused by: java.lang.reflect.UndeclaredThrowableException
Caused by: java.lang.reflect.InvocationTargetExceptionCaused by: java.sql.SQLException: Invalid state, the Connection object is closed.
        at net.sourceforge.jtds.jdbc.JtdsConnection.checkOpen(JtdsConnection.java:1744)
        at net.sourceforge.jtds.jdbc.JtdsConnection.prepareStatement(JtdsConnection.java:2486)
        at net.sourceforge.jtds.jdbcx.proxy.ConnectionProxy.prepareStatement(ConnectionProxy.java:466)

[2021-06-24T08:34:31.556-0700] [Payara 5.2020.5] [SEVERE] [] [] [tid: _ThreadID=4694 _ThreadName=http-thread-pool::http-listener-2(28)] [timeMillis: 1624548871556] [levelValue: 1000] [[
  javax.persistence.PersistenceException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.7.7.payara-p2): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: java.lang.reflect.UndeclaredThrowableException
         at org.eclipse.persistence.internal.jpa.QueryImpl.getDetailedException(QueryImpl.java:391)
         at com.ijws.webcico.dao.impl.GenericDaoJpaImpl.findByNamedQuery(GenericDaoJpaImpl.java:170)
         at com.ijws.webcico.dao.impl.GenericDaoJpaImpl.findObjectInstanceByNamedQuery(GenericDaoJpaImpl.java:347)
         at com.ijws.webcico.dao.impl.TransactionDaoJpaImpl.findTotalAttemptsForResByClassification(TransactionDaoJpaImpl.java:205)
         at com.ijws.webcico.service.impl.TransactionHistoryServiceImpl.calcTotalRecordsByOutcome(TransactionHistoryServiceImpl.java:430)
         at com.ijws.webcico.service.impl.TransactionHistoryServiceImpl.updateStats(TransactionHistoryServiceImpl.java:304)
         at com.ijws.webcico.service.impl.TransactionHistoryServiceImpl.recordTransaction(TransactionHistoryServiceImpl.java:213)
         at com.ijws.webcico.service.impl.WebCheckInOutServiceImpl.recordError(WebCheckInOutServiceImpl.java:2786)
         at com.ijws.webcico.service.impl.WebCheckInOutServiceImpl.searchReservations(WebCheckInOutServiceImpl.java:495)
         at com.ijws.webcico.web.ui.screens.processingcommands.SearchCommand.execute(SearchCommand.java:163)
         at com.ijws.webcico.web.ui.screens.ProcessingScreen.executeCommand(ProcessingScreen.java:340)
         at com.ijws.webcico.web.ui.screens.ProcessingScreen.beginButton_action(ProcessingScreen.java:177)
         at com.ijws.webcico.web.servlets.filters.HTTPSecurityHeadersFilter.doFilter(HTTPSecurityHeadersFilter.java:74)
         at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:118)
Caused by: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.7.7.payara-p2): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: java.lang.reflect.UndeclaredThrowableException
Caused by: java.sql.SQLException: java.lang.reflect.UndeclaredThrowableException
Caused by: java.lang.reflect.UndeclaredThrowableException

Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.sql.SQLException: Invalid state, the Connection object is closed.
        at net.sourceforge.jtds.jdbc.JtdsConnection.checkOpen(JtdsConnection.java:1744)
        at net.sourceforge.jtds.jdbc.JtdsConnection.prepareStatement(JtdsConnection.java:2486)
        at net.sourceforge.jtds.jdbcx.proxy.ConnectionProxy.prepareStatement(ConnectionProxy.java:466)
        ... 140 more

[2021-06-24T08:34:52.487-0700] [Payara 5.2020.5] [WARNING] [] [org.eclipse.persistence.session./file:/D:/payara/payara5/glassfish/domains/domain1/applications/WebCheckInOut/WEB-INF/classes/_WebCicoPU] [tid: _ThreadID=4694 _ThreadName=http-thread-pool::http-listener-2(28)] [timeMillis: 1624548892487] [levelValue: 900] [[
Local Exception Stack:
Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.7.7.payara-p2): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: java.lang.reflect.UndeclaredThrowableException
        at org.eclipse.persistence.exceptions.DatabaseException.sqlException(DatabaseException.java:342)
        at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.processExceptionForCommError(DatabaseAccessor.java:1650)
        at com.ijws.webcico.dao.impl.GenericDaoJpaImpl.flush(GenericDaoJpaImpl.java:573)
        at com.ijws.webcico.service.impl.TransactionHistoryServiceImpl.recordTransaction(TransactionHistoryServiceImpl.java:237)
        at com.ijws.webcico.service.impl.WebCheckInOutServiceImpl.recordError(WebCheckInOutServiceImpl.java:2786)
        at com.ijws.webcico.service.impl.WebCheckInOutServiceImpl.searchReservations(WebCheckInOutServiceImpl.java:495)
        at com.ijws.webcico.web.ui.screens.processingcommands.SearchCommand.execute(SearchCommand.java:163)
        at com.ijws.webcico.web.ui.screens.ProcessingScreen.executeCommand(ProcessingScreen.java:340)
        at com.ijws.webcico.web.ui.screens.ProcessingScreen.beginButton_action(ProcessingScreen.java:177)
        at com.ijws.webcico.web.servlets.filters.HTTPSecurityHeadersFilter.doFilter(HTTPSecurityHeadersFilter.java:74)
        at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:118)
Caused by: java.sql.SQLException: java.lang.reflect.UndeclaredThrowableException
Caused by: java.lang.reflect.UndeclaredThrowableException
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.sql.SQLException: Invalid state, the Connection object is closed.
        at net.sourceforge.jtds.jdbc.JtdsConnection.checkOpen(JtdsConnection.java:1744)
        at net.sourceforge.jtds.jdbc.JtdsConnection.prepareStatement(JtdsConnection.java:2486)
        at net.sourceforge.jtds.jdbcx.proxy.ConnectionProxy.prepareStatement(ConnectionProxy.java:466)

[2021-06-24T08:34:58.364-0700] [Payara 5.2020.5] [WARNING] [] [org.eclipse.persistence.session./file:/D:/payara/payara5/glassfish/domains/domain1/applications/WebCheckInOut/WEB-INF/classes/_WebCicoPU] [tid: _ThreadID=4694 _ThreadName=http-thread-pool::http-listener-2(28)] [timeMillis: 1624548898364] [levelValue: 900] [[
Local Exception Stack:
Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.7.7.payara-p2): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: java.lang.reflect.UndeclaredThrowableException
        at org.eclipse.persistence.exceptions.DatabaseException.sqlException(DatabaseException.java:342)
        at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.processExceptionForCommError(DatabaseAccessor.java:1650)
        at com.ijws.webcico.dao.impl.GenericDaoJpaImpl.findAll(GenericDaoJpaImpl.java:121)
        at com.ijws.webcico.service.impl.ErrorNotificationServiceImpl.getAffectedNotification(ErrorNotificationServiceImpl.java:130)
        at com.ijws.webcico.service.impl.ErrorNotificationServiceImpl.notifyError(ErrorNotificationServiceImpl.java:93)
        at com.ijws.webcico.service.impl.WebCheckInOutServiceImpl.recordError(WebCheckInOutServiceImpl.java:2812)
        at com.ijws.webcico.service.impl.WebCheckInOutServiceImpl.searchReservations(WebCheckInOutServiceImpl.java:495)
        at com.ijws.webcico.web.ui.screens.processingcommands.SearchCommand.execute(SearchCommand.java:163)
        at com.ijws.webcico.web.ui.screens.ProcessingScreen.executeCommand(ProcessingScreen.java:340)
        at com.ijws.webcico.web.ui.screens.ProcessingScreen.beginButton_action(ProcessingScreen.java:177)
        at com.ijws.webcico.web.servlets.filters.HTTPSecurityHeadersFilter.doFilter(HTTPSecurityHeadersFilter.java:74)
        at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:118)
Caused by: java.sql.SQLException: java.lang.reflect.UndeclaredThrowableException
Caused by: java.lang.reflect.UndeclaredThrowableException
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.sql.SQLException: Invalid state, the Connection object is closed.
        at net.sourceforge.jtds.jdbc.JtdsConnection.checkOpen(JtdsConnection.java:1744)
        at net.sourceforge.jtds.jdbc.JtdsConnection.prepareStatement(JtdsConnection.java:2486)
        at net.sourceforge.jtds.jdbcx.proxy.ConnectionProxy.prepareStatement(ConnectionProxy.java:466)

[2021-06-24T08:35:09.688-0700] [Payara 5.2020.5] [WARNING] [] [javax.enterprise.system.core.transaction.com.sun.jts.jtsxa] [tid: _ThreadID=4694 _ThreadName=http-thread-pool::http-listener-2(28)] [timeMillis: 1624548909688] [levelValue: 900] [[
javax.transaction.xa.XAException: Invalid state, the Connection object is closed.
        at net.sourceforge.jtds.jdbc.XASupport.raiseXAException(XASupport.java:620)
        at net.sourceforge.jtds.jdbc.XASupport.xa_rollback(XASupport.java:452)
        at net.sourceforge.jtds.jdbcx.JtdsXAResource.rollback(JtdsXAResource.java:95)
        at net.sourceforge.jtds.jdbc.XASupport.raiseXAException(XASupport.java:620)
        at com.ijws.webcico.web.ui.screens.processingcommands.SearchCommand.execute(SearchCommand.java:163)
        at com.ijws.webcico.web.ui.screens.ProcessingScreen.executeCommand(ProcessingScreen.java:340)
        at com.ijws.webcico.web.ui.screens.ProcessingScreen.beginButton_action(ProcessingScreen.java:177)
        at com.ijws.webcico.web.servlets.filters.HTTPSecurityHeadersFilter.doFilter(HTTPSecurityHeadersFilter.java:74)
        at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:118)

Christoph John

unread,
Jun 25, 2021, 5:09:53 AM6/25/21
to Ryan de Laplante (Personal), Payara Forum
Did you already suspect the JDBC driver? JTDS seems to be unmaintained since 8 years or so?! Just briefly checked SourceForge but looks like that...

https://sourceforge.net/p/jtds/bugs/772/

Sounds a little like your issue with reads timing out. Can't it be that since connections are unavailable that other connections are created which will increase memory eventually?

Just making a guess...

Cheers,
Chris.
--
You received this message because you are subscribed to the Google Groups "Payara Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to payara-forum...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/payara-forum/426a089c-16a2-f9d0-7801-f5dc2b0ec272%40ryandelaplante.ca.

-- 
Christoph John
Software Engineering
T +49 241 557080-28
christo...@macd.com

MACD GmbH
Oppenhoffallee 103
52066 Aachen, Germany
www.macd.com

Amtsgericht Aachen: HRB 8151 
Ust.-Id: DE 813021663
Geschäftsführer: George Macdonald
Message has been deleted

Christoph John

unread,
Jun 25, 2021, 7:11:53 AM6/25/21
to Ryan de Laplante (Personal), Payara Forum
Also, do you *only* have problems on machines with clock drift? If yes, I would strongly encourage you to fix that problem first because it can lead to all kind of follow-up problems, e.g. wrong timeouts.

Chris.

Christoph John

unread,
Jun 25, 2021, 7:16:30 AM6/25/21
to Payara Forum
Hi Gregor,

that Exception occured also for me on my Ubuntu machines. See this: https://stackoverflow.com/q/38710457/4962355

Cheers,
Chris.

Will Hartung

unread,
Jun 25, 2021, 10:11:15 AM6/25/21
to Gregor Kovač, Ryan de Laplante, Payara Forum
On Thu, Jun 24, 2021 at 3:49 PM Gregor Kovač <kov...@gmail.com> wrote:
Just couple of questions:

- can you please provide the command-line options of jmap you used to find your memory leak?

- when I try to run jmap against my GlassFish like "jmap PID" I get "Error attaching to process: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process". I can attach with my NetBeans debugger to Glassfish. What am I missing with jmap.


As for the attaching error, That's a "recent" JDK problem (like JDK 7), and I don't have a definitive answer, but I think it's related to permissions. I've also had issues where I've accidentally used the wrong JDKs version of jmap, which can also cause the problem.

I would ensure that you're the same user as the java process, and, if not, perhaps try "root".

Regards,

Will Hartung

Will Hartung

unread,
Jun 25, 2021, 10:13:00 AM6/25/21
to christo...@macd.com, Ryan de Laplante (Personal), Payara Forum
On Fri, Jun 25, 2021 at 4:11 AM 'Christoph John' via Payara Forum <payara...@googlegroups.com> wrote:
Also, do you *only* have problems on machines with clock drift? If yes, I would strongly encourage you to fix that problem first because it can lead to all kind of follow-up problems, e.g. wrong timeouts.

I've not personally encountered that in the wild, but I can easily see something like that causing insidious problems especially with things like a timeout.

Regards,

Will Hartung

Ryan de Laplante (Personal)

unread,
Jun 25, 2021, 10:22:22 AM6/25/21
to payara...@googlegroups.com
Yes I have suspected the jTDS JDBC driver might have something to do with it and am in the process of testing Microsoft's JDBC driver again.  The reason we use the jTDS driver instead of Microsoft's driver is because of an issue that goes back to 2006-2008.  Our customers provide the servers for the application, and almost all of them choose Windows.  We were experiencing blue screen of death errors on a regular basis. Sometimes multiple times in an hour, other times once every week or two.  We worked closely with Sun Microsystems and tracked it down to "NP Pool count" of the java.exe process suddenly going up and up and up until BSOD.  Sun eventually escalated to Microsoft who acknowledged some sort of low level networking issue in Windows.  If I remember correctly, Microsoft provided an experimental patch for Windows which we tried, but it did not solve the issue and we never heard from Microsoft about it again. At the time we had some customers running our application on Linux and they did not experience these sorts of issues. Eventually I tried the jTDS JDBC driver and the problem went away.  We've never had a BSOD since.  Well, except two years later when we decided to try the latest version of Microsoft's JDBC driver again hoping the latest version might have fixed the issue.  Right away we were back to getting BSOD errors, so we reverted back to jTDS and have been afraid of trying the Microsoft driver ever since.  When a server goes down, sometimes thousands of people are affected depending on time of day and size of the customer's business. This issue is not something we've ever been able to reproduce in development or test environments, only production. 

Recently I've been experimenting with Microsoft's latest JDBC driver on my dev environment.  We're making plans to test it with one customer soon, in case the issue is the jTDS JDBC driver.

Regarding BSOD, a couple years ago we had another interesting issue.  A customer had a security device installed on their network which intercepted all traffic between the firewall and our server, decrypted and looked for potential malicious issues related to mobile devices.  I really don't know anything about it other than it somehow only touched traffic for iOS and possibly Android clients.  Our web app worked fine for everyone except iPhone users.  Somehow when an iPhone user accessed the website, the JVM memory usage immediately began to balloon and would crash Windows within 2 minutes requiring the power to be cycled.  We never figured out why but suspected the network security device might be messing with low level TCP/IP packets and screwing something up.  We tried re-installing on other VMs and other physical machines, same issue.  Because of this issue, the customer did not go live until this year when we tried again.  We did a full re-install of everything on a fresh VM, and this time it works fine.  Crazy.
You received this message because you are subscribed to a topic in the Google Groups "Payara Forum" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/payara-forum/KTgFuW1zsg4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to payara-forum...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/payara-forum/e4a1fb58-344e-4da8-4bef-f973d1d9bc52%40macd.com.

Will Hartung

unread,
Jun 25, 2021, 10:30:10 AM6/25/21
to Ryan de Laplante (Personal), Payara Forum
On Fri, Jun 25, 2021 at 7:22 AM Ryan de Laplante (Personal) <ry...@ryandelaplante.ca> wrote:
Recently I've been experimenting with Microsoft's latest JDBC driver on my dev environment.  We're making plans to test it with one customer soon, in case the issue is the jTDS JDBC driver.

Is the MS Driver a JNI driver or pure Java? I haven't used a JNI driver in forever, and that was an early Oracle driver. As a rule I avoid JNI drivers because when a Java driver spits something up, you get a stack trace. When a JNI driver spits something up, you get a core dump and a dead server. I've had other "not so good" experiences with SQL Server over the years, enough to keep me away from it. But these were from just so long ago you'd like to think they're just memories now. My main reason for avoiding Windows is just my utter ignorance of the platform today.

Regards,

Will Hartung


Ryan de Laplante (Personal)

unread,
Jun 25, 2021, 10:34:56 AM6/25/21
to Will Hartung, christo...@macd.com, Payara Forum
Yes the clock drift issue has shown up on multiple customer servers that encounter this issue, and might be the root cause.  We will need to look at this closer.   Perhaps it has something to do with multiple VMs sharing the same hardware. 


Thanks,
Ryan


Will Hartung

unread,
Jun 25, 2021, 10:40:39 AM6/25/21
to Ryan de Laplante (Personal), christo...@macd.com, Payara Forum
On Fri, Jun 25, 2021 at 7:34 AM Ryan de Laplante (Personal) <ry...@ryandelaplante.ca> wrote:
Yes the clock drift issue has shown up on multiple customer servers that encounter this issue, and might be the root cause.  We will need to look at this closer.   Perhaps it has something to do with multiple VMs sharing the same hardware. 

The advent of the modern VM really rocks the world of what underlying assumptions we can make about what "reality" is as viewed by the software.

Also, not your problem I'm sure, but anecdotal, on my Mac, the Netbeans IDE will complain about "detected slowdown" when the computer goes asleep. I think it's in a somnambulant state of borderline consciousness that the IDE sees as clock drift, and it makes shoutouts about it.

Regards,

Will Hartung

Ryan de Laplante (Personal)

unread,
Jun 25, 2021, 11:12:16 AM6/25/21
to Payara Forum
The driver is a pure Java driver, unless you need to use XA transactions, in which case there is a DLL that goes with it.  However, the DLL must be installed on MSSQL Server itself along with the creation of some tables in the master database, assignment of special roles to users, creation of some special stored procedures, configuration of DTC, etc.  I had been using XA DataSources because we're using JTA and multiple databases, but I was able to get it working with regular DataSources recently.  I'm not done testing, but if it works well then I will consider dumping jTDS because it is no longer maintained (last released in 2013).

I used to use PostgreSQL with no issues, but customers have MSSQL DBAs, existing clusters and licenses, etc. so we go with what they want.

Thanks,
Ryan




Will Hartung

unread,
Jun 25, 2021, 11:28:17 AM6/25/21
to Ryan de Laplante (Personal), Payara Forum
On Fri, Jun 25, 2021 at 8:12 AM Ryan de Laplante (Personal) <ry...@ryandelaplante.ca> wrote:
The driver is a pure Java driver, unless you need to use XA transactions, in which case there is a DLL that goes with it.  However, the DLL must be installed on MSSQL Server itself along with the creation of some tables in the master database, assignment of special roles to users, creation of some special stored procedures, configuration of DTC, etc.  I had been using XA DataSources because we're using JTA and multiple databases, but I was able to get it working with regular DataSources recently.  I'm not done testing, but if it works well then I will consider dumping jTDS because it is no longer maintained (last released in 2013).

I used to use PostgreSQL with no issues, but customers have MSSQL DBAs, existing clusters and licenses, etc. so we go with what they want.

XA is amazing. We used that between Oracle and Postgres. Black magic voodoo right there. Under appreciated.

My nit with SQL Server started early when I was able to send a routine query to it that would crash the machine. Not the DB server, the entire machine. It's annoying that something like that would crash a DB server, it's more annoying when it takes the box with it. I appreciate the potential performance benefits etc. of having something like the DB server that tighty integrated into the kernel, but always chafed at how dangerous I considered it. But, arguably, from the mindset at the time, the DB server was likely dedicated to the application so, whether the DB server was down, or the OS was down, what difference did it make. Coming from a background of mini computers hosting several services, my mind never really thought that way.

Regards,

Will Hartung

Christoph John

unread,
Jun 29, 2021, 6:26:24 PM6/29/21
to Will Hartung, Ryan de Laplante (Personal), Payara Forum
I took a brief look at the jTDS 1.3 source and they are using System.currentTimeMillis() for timeouts all over the place (class util/TimerThread). In the experimental 2.0 branch they are at least using java.util.Timer threads but they are likely to cause the same problems since they are also not measuring via System.nanoTime().

Some references:
https://stackoverflow.com/questions/409932/java-timer-vs-executorservice
https://stackoverflow.com/questions/17588167/what-should-timertask-scheduleatfixedrate-do-if-the-clock-changes
https://stackoverflow.com/questions/18803695/how-is-the-timer-class-in-java-sensitive-to-the-system-clock
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4290274

So if you really only have problems on the systems with clock drift I'd wager that the jTDS driver and its outdated usage of timers is the culprit.

Cheers,
Chris.

Ryan de Laplante (Personal)

unread,
Jun 29, 2021, 9:41:30 PM6/29/21
to Payara Forum
Wow that's very interesting, thank you Chris! We'll give Microsoft's current JDBC driver a try in production once we can get it scheduled. There's a bit of a process to get changes like that rolled out (could take weeks.)

We've been investigating the load of VMs on the shared host, the possibility VM live migration causing pauses, as well as tools the customer might be using to test the web server is up and running.  In the past we've seen load balancers doing strange things like connecting and doing SSL handshaking every five seconds then closing the socket before a response is ever sent.  Just loading the "home page" of the web app causes a JSESSIONID cookie to be created. They expire after 30 minutes. The heap dump we took the other day had hundreds of thousands of cookies which is suspicious. Possible DDOS?   I should generate periodic thread dumps like Will suggested.


Thanks,
Ryan

Ryan de Laplante

unread,
Jun 30, 2021, 1:16:11 PM6/30/21
to Payara Forum
The other day I captured a heap dump from a "crashed" production Payra server. I am able to browse the threads using VisualVM, and focused on the http-listener-2 threads (SSL).  Below is a summary breakdown of what I could see from the thread dump. We increase the max thread pool size of http-thread-pool from 5 to 100 at installation because I built a proprietary JCA connector that communicates with many EIS systems, and when in use it blocks the HTTP thread for at least a few seconds.  So, we have 100 http-listener-2 threads:

79 SSL handshaking - I think (blocked)
4 SSL handshaking - I think (runnable)
4 DB access (runnable)
1 proprietary JCA connector access (runnable)
10 rendering JSF page (runnable)
1 rendering JSF page (blocked)
1 payara logging an exception (runnable)

All of that SSL handshaking probably contributed to the 95% CPU usage of the java.exe process.

I also looked at a random sampling of the over 235,000 cookies in memory.  I'm seeing all kinds of cookies that are not used by our application. Cookies for various analytics platforms that we don't use.  I'm starting to think this could be a DoS/DDoS attack on CPU and memory. Perhaps seemingly real cookies are sent in the request just to fill memory. If it is DoS, perhaps rate limiting at the firewall might help if not already in place. Note that killing the java.exe process and starting it back up again resolves the issue until next time though.

Below are a couple stack dumps from threads doing what I think is SSL handshaking:

"http-thread-pool::http-listener-2(2)" daemon prio=5 tid=75 BLOCKED
    at org.glassfish.grizzly.http2.AlpnSupport.setConnection(AlpnSupport.java:77)
       local variable: sun.security.ssl.SSLEngineImpl#7894
       local variable: org.glassfish.grizzly.nio.transport.TCPNIOConnection#8308
       local variable: java.util.WeakHashMap#537
    at org.glassfish.grizzly.http2.AlpnSupport.access$500(AlpnSupport.java:46)
    at org.glassfish.grizzly.http2.AlpnSupport$1.onStart(AlpnSupport.java:146)
       local variable: sun.security.ssl.SSLEngineImpl#7894
       local variable: org.glassfish.grizzly.http2.AlpnServerNegotiatorImpl#1
    at org.glassfish.grizzly.ssl.SSLBaseFilter.notifyHandshakeStart(SSLBaseFilter.java:996)
       local variable: org.glassfish.grizzly.nio.transport.TCPNIOConnection#8308
       local variable: java.util.concurrent.ConcurrentHashMap$KeyIterator#77
    at org.glassfish.grizzly.ssl.SSLBaseFilter$SSLTransportFilterWrapper.handleRead(SSLBaseFilter.java:1119)
       local variable: org.glassfish.grizzly.ssl.SSLBaseFilter$SSLTransportFilterWrapper#2
       local variable: org.glassfish.grizzly.filterchain.FilterChainContext#128
       local variable: org.glassfish.grizzly.ssl.SSLConnectionContext#7889
    at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:95)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:260)
       local variable: org.glassfish.grizzly.filterchain.ExecutorResolver$9#1
       local variable: org.glassfish.grizzly.ssl.SSLBaseFilter$SSLTransportFilterWrapper#2
       local variable: org.glassfish.grizzly.filterchain.FilterChainContext#128
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:177)
       local variable: org.glassfish.grizzly.filterchain.DefaultFilterChain#7
       local variable: org.glassfish.grizzly.filterchain.FilterChainContext#128
       local variable: org.glassfish.grizzly.filterchain.ExecutorResolver$9#1
       local variable: org.glassfish.grizzly.filterchain.DefaultFilterChain$FiltersState#23303
       local variable: org.glassfish.grizzly.ssl.SSLBaseFilter$SSLTransportFilterWrapper#2
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:109)
       local variable: org.glassfish.grizzly.filterchain.DefaultFilterChain#7
       local variable: org.glassfish.grizzly.filterchain.FilterChainContext#128
       local variable: org.glassfish.grizzly.filterchain.ExecutorResolver$9#1
       local variable: org.glassfish.grizzly.filterchain.DefaultFilterChain$FiltersState#23303
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:88)
    at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:53)
       local variable: org.glassfish.grizzly.filterchain.InternalContextImpl#128
    at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:524)
    at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:89)
       local variable: org.glassfish.grizzly.nio.transport.TCPNIOConnection#8308
       local variable: java.util.logging.Logger#28
    at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:94)
    at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:33)
    at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:114)
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569)
       local variable: org.glassfish.grizzly.threadpool.SyncThreadPool$SyncThreadWorker#4
       local variable: org.glassfish.grizzly.threadpool.DefaultWorkerThread#9
       local variable: org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable#115
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549)
       local variable: org.glassfish.grizzly.threadpool.SyncThreadPool$SyncThreadWorker#4
    at java.lang.Thread.run(Thread.java:834)
       local variable: org.glassfish.grizzly.threadpool.DefaultWorkerThread#9


"http-thread-pool::http-listener-2(17)" daemon prio=5 tid=639 RUNNABLE
    at java.util.regex.Pattern$Branch.match(Pattern.java:4749)
       local variable: java.util.regex.Pattern$Branch#26
       local variable: java.util.regex.Matcher#1
       local variable: java.lang.String#1067456
    at java.util.regex.Pattern$Start.match(Pattern.java:3619)
       local variable: java.util.regex.Pattern$Start#22
       local variable: java.util.regex.Matcher#1
       local variable: java.lang.String#1067456
    at java.util.regex.Matcher.search(Matcher.java:1729)
       local variable: java.util.regex.Matcher#1
    at java.util.regex.Matcher.find(Matcher.java:746)
    at java.util.regex.Pattern.split(Pattern.java:1264)
       local variable: java.lang.String#1067456
       local variable: java.util.ArrayList#596756
       local variable: java.util.regex.Matcher#1
    at java.util.regex.Pattern.split(Pattern.java:1334)
    at sun.security.util.AlgorithmDecomposer.decomposeImpl(AlgorithmDecomposer.java:60)
       local variable: java.util.HashSet#16675
       local variable: java.lang.String[]#32201
    at sun.security.util.AlgorithmDecomposer.decompose(AlgorithmDecomposer.java:87)
    at sun.security.ssl.SSLAlgorithmDecomposer.decomposes(SSLAlgorithmDecomposer.java:142)
       local variable: sun.security.ssl.SSLCipher#2
       local variable: java.util.HashSet#16674
       local variable: java.util.HashSet#16674
    at sun.security.ssl.SSLAlgorithmDecomposer.decompose(SSLAlgorithmDecomposer.java:245)
       local variable: sun.security.ssl.SSLAlgorithmDecomposer#2
       local variable: sun.security.ssl.SSLCipher#2
       local variable: sun.security.ssl.CipherSuite$MacAlg#2
       local variable: sun.security.ssl.CipherSuite$HashAlg#2
       local variable: java.util.HashSet#16672
       local variable: java.util.HashSet#16672
    at sun.security.ssl.SSLAlgorithmDecomposer.decompose(SSLAlgorithmDecomposer.java:271)
    at sun.security.util.AbstractAlgorithmConstraints.checkAlgorithm(AbstractAlgorithmConstraints.java:95)
       local variable: java.lang.String#144366
       local variable: sun.security.ssl.SSLAlgorithmDecomposer#2
       local variable: java.lang.String[]#637
       local variable: java.lang.String#146735
    at sun.security.util.DisabledAlgorithmConstraints.permits(DisabledAlgorithmConstraints.java:110)
       local variable: sun.security.util.DisabledAlgorithmConstraints#2
       local variable: java.lang.String#144366
    at sun.security.ssl.SSLAlgorithmConstraints.permits(SSLAlgorithmConstraints.java:159)
       local variable: sun.security.ssl.SSLAlgorithmConstraints#714
       local variable: java.util.RegularEnumSet#19930
       local variable: java.lang.String#144366
    at sun.security.ssl.HandshakeContext.isActivatable(HandshakeContext.java:514)
       local variable: sun.security.ssl.CipherSuite#4
       local variable: sun.security.ssl.SSLAlgorithmConstraints#714
       local variable: java.util.EnumMap#123
    at sun.security.ssl.HandshakeContext.getActiveCipherSuites(HandshakeContext.java:338)
       local variable: java.util.Collections$UnmodifiableRandomAccessList#31133
       local variable: sun.security.ssl.SSLAlgorithmConstraints#714
       local variable: java.util.LinkedList#89983
       local variable: java.util.EnumMap#123
       local variable: java.util.Collections$UnmodifiableCollection$1#2
       local variable: sun.security.ssl.CipherSuite#4
    at sun.security.ssl.HandshakeContext.<init>(HandshakeContext.java:184)
       local variable: sun.security.ssl.ServerHandshakeContext#712
       local variable: sun.security.ssl.TransportContext#8228
       local variable: sun.security.ssl.ServerHandshakeContext#712
    at sun.security.ssl.ServerHandshakeContext.<init>(ServerHandshakeContext.java:62)
       local variable: sun.security.ssl.ServerHandshakeContext#712
    at sun.security.ssl.TransportContext.kickstart(TransportContext.java:223)
       local variable: sun.security.ssl.TransportContext#8228
       local variable: sun.security.ssl.TransportContext#8228
       local variable: sun.security.ssl.ServerHandshakeContext#712
    at sun.security.ssl.SSLEngineImpl.beginHandshake(SSLEngineImpl.java:103)
       local variable: sun.security.ssl.SSLEngineImpl#8224
    at org.glassfish.grizzly.ssl.SSLBaseFilter$SSLTransportFilterWrapper.handleRead(SSLBaseFilter.java:1117)
       local variable: org.glassfish.grizzly.ssl.SSLBaseFilter$SSLTransportFilterWrapper#2
       local variable: org.glassfish.grizzly.filterchain.FilterChainContext#196
       local variable: org.glassfish.grizzly.nio.transport.TCPNIOConnection#7955
       local variable: org.glassfish.grizzly.ssl.SSLConnectionContext#8223
       local variable: sun.security.ssl.SSLEngineImpl#8224
    at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:95)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:260)
       local variable: org.glassfish.grizzly.filterchain.ExecutorResolver$9#1
       local variable: org.glassfish.grizzly.ssl.SSLBaseFilter$SSLTransportFilterWrapper#2
       local variable: org.glassfish.grizzly.filterchain.FilterChainContext#196
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:177)
       local variable: org.glassfish.grizzly.filterchain.DefaultFilterChain#7
       local variable: org.glassfish.grizzly.filterchain.FilterChainContext#196
       local variable: org.glassfish.grizzly.filterchain.ExecutorResolver$9#1
       local variable: org.glassfish.grizzly.filterchain.DefaultFilterChain$FiltersState#22943
       local variable: org.glassfish.grizzly.ssl.SSLBaseFilter$SSLTransportFilterWrapper#2
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:109)
       local variable: org.glassfish.grizzly.filterchain.DefaultFilterChain#7
       local variable: org.glassfish.grizzly.filterchain.FilterChainContext#196
       local variable: org.glassfish.grizzly.filterchain.ExecutorResolver$9#1
       local variable: org.glassfish.grizzly.filterchain.DefaultFilterChain$FiltersState#22943
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:88)
    at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:53)
       local variable: org.glassfish.grizzly.filterchain.InternalContextImpl#196
    at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:524)
    at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:89)
       local variable: org.glassfish.grizzly.nio.transport.TCPNIOConnection#7955
       local variable: java.util.logging.Logger#28
    at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:94)
    at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:33)
    at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:114)
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569)
       local variable: org.glassfish.grizzly.threadpool.SyncThreadPool$SyncThreadWorker#27
       local variable: org.glassfish.grizzly.threadpool.DefaultWorkerThread#35
       local variable: org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable#20
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549)
       local variable: org.glassfish.grizzly.threadpool.SyncThreadPool$SyncThreadWorker#27
    at java.lang.Thread.run(Thread.java:834)


Ryan

Christoph John

unread,
Jun 30, 2021, 7:06:47 PM6/30/21
to Ryan de Laplante, Payara Forum
BTW, are you really requiring HTTP/2 to be turned on? Did you try without? There really seems to be
a leak that looks a little like the one you observed:

https://github.com/eclipse-ee4j/glassfish/issues/22539
https://github.com/eclipse-ee4j/grizzly/pull/2015
https://github.com/eclipse-ee4j/glassfish/pull/23003

Looks like it will not be resolved before Payara 6, so disabling HTTP/2 seems to be the only option.

Cheers,
Chris.

Ryan de Laplante

unread,
Jul 1, 2021, 10:20:59 AM7/1/21
to Payara Forum
Wow Chris, this is a great find, thank you!   As far as I know we do not need HTTP/2, so I will try disabling it and see if that resolves the issue.  Hopefully the other issues we've been seeing in the logs are just symptoms of this memory leak.

Regarding the cookies issue, I noticed that there are actually only 150 javax.servlet.http.Cookie instances.  It is org.glassfish.grizzly.http.Cookie that has 235,516 instances.  Hoping that is also somehow related to the memory leak.


Thanks,
Ryan

Steve Millidge

unread,
Jul 1, 2021, 10:37:59 AM7/1/21
to Payara Forum
Just a heads up. Those upstream patches were contributed by Payara to Grizzly. We maintain a patched version of Grizzly in Payara so those fixes will be in the latest versions no need to wait for Grizzly 3.0.0. We aren't forked from Eclipse GlassFish so no need to wait for Payara 6. 

As an aside I would recommend Eclipse Memory Analyzer Open Source Project | The Eclipse Foundation for loading your heap dump and then look in the dominator tree to see what is retaining the most heap. Alternatively look in the Leak Suspects report.

Steve



Ryan de Laplante (Personal)

unread,
Jul 1, 2021, 10:58:12 AM7/1/21
to payara...@googlegroups.com
Thank you Steve.  This server is on Payara 5.2020.5.    I'll see if we can get permission to upgrade Payara, otherwise we'll try disabling HTTP/2 for now. 

Thanks for the recommendation of the Eclipse Memory Analyzer.  I will give that a try and get back to you tomorrow (today is a holiday in Canada)


Thanks,
Ryan
--
You received this message because you are subscribed to a topic in the Google Groups "Payara Forum" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/payara-forum/KTgFuW1zsg4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to payara-forum...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/payara-forum/3f89a173-c7a3-4472-99a8-7c1a491234fan%40googlegroups.com.

Steve Millidge

unread,
Jul 1, 2021, 11:11:16 AM7/1/21
to Payara Forum
The referenced fix is quite old. I suspect you will have it in that version though can't be certain without tracking back through the commits.

5.2020.5 ships with Grizzly  2.4.4.payara-p4 this is our maintenance branch  payara/patched-src-grizzly at 2.4.4.payara-maintenance (github.com) 

Ryan de Laplante

unread,
Jul 2, 2021, 9:33:18 AM7/2/21
to Payara Forum
Hi Steve,

I ran Eclipse Memory Analyzer on the heap dump and looked at the leak suspects. There was only one suspect.  I pasted some of the info below, and attached a PDF with more detail, including the dominator tree.

----
Problem Suspect 1

One instance of java.util.WeakHashMap loaded by <system class loader> occupies 1,788,579,160 (83.96%) bytes. The instance is referenced by org.glassfish.grizzly.threadpool.DefaultWorkerThread @ 0x82e86c28 http-thread-pool::http-listener-2(1) , loaded by org.apache.felix.framework.BundleWiringImpl$BundleClassLoader @ 0x82368338.

The thread org.glassfish.grizzly.threadpool.DefaultWorkerThread @ 0x82e86c28 http-thread-pool::http-listener-2(1) keeps local variables with total size 246,592 (0.01%) bytes.

The memory is accumulated in one instance of java.util.TreeMap$Entry, loaded by <system class loader>, which occupies 1,786,697,736 (83.87%) bytes.
----

Thread Stack
http-thread-pool::http-listener-2(1) at org.glassfish.grizzly.http2.AlpnSupport.setConnection(Ljavax/net/ssl/SSLEngine;Lorg/glassfish/grizzly/Connection;)V (AlpnSupport.java:77) at org.glassfish.grizzly.http2.AlpnSupport.access$500(Ljavax/net/ssl/SSLEngine;Lorg/glassfish/grizzly/Connection;)V (AlpnSupport.java:46) at org.glassfish.grizzly.http2.AlpnSupport$1.onStart(Lorg/glassfish/grizzly/Connection;)V (AlpnSupport.java:146) at org.glassfish.grizzly.ssl.SSLBaseFilter.notifyHandshakeStart(Lorg/glassfish/grizzly/Connection;)V (SSLBaseFilter.java:996) at org.glassfish.grizzly.ssl.SSLBaseFilter$SSLTransportFilterWrapper.handleRead(Lorg/glassfish/grizzly/filterchain/FilterChainContext;)Lorg/glassfish/grizzly/filterchain/NextAction; (SSLBaseFilter.java:1119) at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(Lorg/glassfish/grizzly/filterchain/Filter;Lorg/glassfish/grizzly/filterchain/FilterChainContext;)Lorg/glassfish/grizzly/filterchain/NextAction; (ExecutorResolver.java:95) at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(Lorg/glassfish/grizzly/filterchain/FilterExecutor;Lorg/glassfish/grizzly/filterchain/Filter;Lorg/glassfish/grizzly/filterchain/FilterChainContext;)Lorg/glassfish/grizzly/filterchain/NextAction; (DefaultFilterChain.java:260) at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(Lorg/glassfish/grizzly/filterchain/FilterChainContext;Lorg/glassfish/grizzly/filterchain/FilterExecutor;IILorg/glassfish/grizzly/filterchain/DefaultFilterChain$FiltersState;)Lorg/glassfish/grizzly/filterchain/DefaultFilterChain$FilterExecution; (DefaultFilterChain.java:177) at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(Lorg/glassfish/grizzly/filterchain/FilterChainContext;)Lorg/glassfish/grizzly/ProcessorResult; (DefaultFilterChain.java:109) at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(Lorg/glassfish/grizzly/Context;)Lorg/glassfish/grizzly/ProcessorResult; (DefaultFilterChain.java:88) at org.glassfish.grizzly.ProcessorExecutor.execute(Lorg/glassfish/grizzly/Context;)V (ProcessorExecutor.java:53) at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(Lorg/glassfish/grizzly/IOEvent;Lorg/glassfish/grizzly/Connection;Lorg/glassfish/grizzly/IOEventLifeCycleListener;)V (TCPNIOTransport.java:524) at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(Lorg/glassfish/grizzly/Connection;Lorg/glassfish/grizzly/IOEvent;Lorg/glassfish/grizzly/IOEventLifeCycleListener;Ljava/util/logging/Logger;)V (AbstractIOStrategy.java:89) at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(Lorg/glassfish/grizzly/Connection;Lorg/glassfish/grizzly/IOEvent;Lorg/glassfish/grizzly/IOEventLifeCycleListener;)V (WorkerThreadIOStrategy.java:94) at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(Lorg/glassfish/grizzly/Connection;Lorg/glassfish/grizzly/IOEvent;Lorg/glassfish/grizzly/IOEventLifeCycleListener;)V (WorkerThreadIOStrategy.java:33) at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run()V (WorkerThreadIOStrategy.java:114) at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork()V (AbstractThreadPool.java:569) at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run()V (AbstractThreadPool.java:549) at java.lang.Thread.run()V (Thread.java:834)


Thanks,
Ryan
leak suspect 2021-06-24.pdf
Reply all
Reply to author
Forward
0 new messages