Memory leak with OORT?


Will Glass-Husain

Sep 8, 2022, 12:46:52 AM
to cometd-dev
Hi,

We're running a system that embeds CometD, 2 nodes, with Oort.  (CometD 5.0.9, though I understand we need to upgrade.)  We use the Ack extension in CometD, though this is disabled in the Oort config.
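
For context, the server side is wired up roughly like this (a simplified sketch, not our exact code):

    import org.cometd.server.BayeuxServerImpl;
    import org.cometd.server.ext.AcknowledgedMessagesExtension;

    BayeuxServerImpl bayeuxServer = new BayeuxServerImpl();
    // Server-side half of the ack extension; browser clients add the
    // corresponding AckExtension on the client side.
    bayeuxServer.addExtension(new AcknowledgedMessagesExtension());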

We are seeing a heap that grows over time, forcing us to restart the CometD node every few days.  Investigating with a profiler, we see that most of the heap is retained by a single instance of ServerSessionImpl.

[attachment: heap1.png]

When I dig into each object, the large 1 GB session has a null user agent field, while the others have typical "Mozilla" etc. user agents.

One quirk of our system is that we recently discovered that, due to a misconfiguration in our load balancer, almost all of our outside traffic goes to only one of the two CometD nodes.  (In other words, it acts as a failover, not a true load balancer.)

Any suggestions for where to go next in investigating this issue?  Are there any known problems with Oort that might result in uncontrolled growth in heap size?  Could a ServerSessionImpl without a user agent indicate that it's a connection using Oort from one comet node to another?

Thanks in advance,

WILL

Simone Bordet

Sep 9, 2022, 3:11:46 AM
to comet...@googlegroups.com
Hi,

On Thu, Sep 8, 2022 at 6:46 AM Will Glass-Husain <wgl...@forio.com> wrote:
>
> Hi,
>
> We're running a system that embeds CometD, 2 nodes, with Oort. (CometD 5.0.9, though I understand we need to upgrade.) We use the Ack extension in CometD, though this is disabled in the Oort config.

Is there a specific reason to disable the ack extension for Oort?

> We are seeing a heap that grows over time, forcing us to restart the CometD node every few days. Investigating with a profiler, we see that most of the heap is retained by a single instance of ServerSessionImpl.
>
> When I dig into each object, the large 1 GB session has a null user agent field, while the others have typical "Mozilla" etc. user agents.
>
> One quirk of our system is that we recently discovered that, due to a misconfiguration in our load balancer, almost all of our outside traffic goes to only one of the two CometD nodes. (In other words, it acts as a failover, not a true load balancer.)

Worth remembering that the load balancer should be sticky, see
https://docs.cometd.org/current/reference/#_java_oort.

> Any suggestions for where to go next in investigating this issue?

Would be interesting to know the session id of the faulty session, to
figure out if it is a remote session or not.
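
Something like this can log that (a sketch; assumes you have the BayeuxServer reference, and casts to ServerSessionImpl for the user agent):

    import org.cometd.bayeux.server.ServerSession;
    import org.cometd.server.ServerSessionImpl;

    for (ServerSession session : bayeuxServer.getSessions()) {
        // Sessions created by remote Oort comets are server-side sessions
        // like any remote client's; isLocalSession() is true only for
        // sessions local to this node.
        System.err.printf("%s local=%b ua=%s%n",
                session.getId(),
                session.isLocalSession(),
                ((ServerSessionImpl)session).getUserAgent());
    }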

> Are there any known problems with Oort that might result in uncontrolled growth in heap size?

Not that I know of.
We recently fixed a very rare issue that might lead to the same
problem (a never-expiring session that accumulates messages in its
queue), but I doubt it's your case.

> Could a ServerSessionImpl without a user agent indicate that it's a connection using Oort from one comet node to another?

Or a client that does not send the `User-Agent` string.
Oort nodes use Jetty's HttpClient that does send the `User-Agent` string.

Another thing that you can do is set up JMX integration:
https://docs.cometd.org/current/reference/#_java_server_jmx.

Once you have that, you can dump() the `BayeuxServerImpl` object and
get more detailed information.
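
For example (a sketch; call it from wherever you hold the BayeuxServer reference, e.g. a debug servlet or a JMX operation):

    BayeuxServerImpl impl = (BayeuxServerImpl)bayeuxServer;
    // Produces a textual tree of sessions, channels, subscriptions, etc.
    System.err.println(impl.dump());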

--
Simone Bordet
---
Finally, no matter how good the architecture and design are,
to deliver bug-free software with optimal performance and reliability,
the implementation technique must be flawless. Victoria Livschitz

Will Glass-Husain

Sep 9, 2022, 10:55:16 AM
to cometd-dev
Thanks for the detailed note.  Some responses:

(1)
There are two large ServerSessionImpl instances.  One has a retained heap size of 1 GB, the other 30 MB.  The others (about 200) are 5 KB-20 KB.

The _id fields of the two big ones include "oort", e.g. "oort_51lk8lnbdid2j11lx1srjzb4xgw".

(2)
Regarding the fact that Oort isn't configured with the ack extension: there's no good reason for this.  The default is to have "ack" off in the Oort servlet.  We didn't catch that it needed to be configured in both the CometD servlet and the Oort servlet.

(3) Can you point me to a ticket or commit for this issue?

> We recently fixed a very rare issue that might lead to the same
> problem (a never-expiring session that accumulates messages in its
> queue), but I doubt it's your case.

Thanks, WILL

Simone Bordet

Sep 11, 2022, 12:27:01 PM
to comet...@googlegroups.com
Hi,

On Fri, Sep 9, 2022 at 4:55 PM Will Glass-Husain <wgl...@forio.com> wrote:
>
> Thanks for the detailed note. Some responses:
>
> (1)
> There are two large ServerSessionImpl instances. One has a retained heap size of 1 GB, the other 30 MB. The others (about 200) are 5 KB-20 KB.
>
> The _id fields of the two big ones include "oort", e.g. "oort_51lk8lnbdid2j11lx1srjzb4xgw".

This is strange. CometD/Oort does not use this session as a subscriber
to channels, so it should not receive any messages.
Are you using this session to subscribe to some channel in your application?

> (2)
> Regarding the fact that Oort isn't configured with the ack extension: there's no good reason for this. The default is to have "ack" off in the Oort servlet. We didn't catch that it needed to be configured in both the CometD servlet and the Oort servlet.

Then I suggest you restore it to default.
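
If you configure Oort programmatically rather than via the OortServlet, that is roughly (a sketch; the URL is just an example):

    Oort oort = new Oort(bayeuxServer, "http://node1.example.com/cometd");
    // true is the default: OortComet connections between nodes
    // will use the ack extension.
    oort.setAckExtensionEnabled(true);
    oort.start();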

> (3) Can you point me to a ticket or commit for this issue?

https://github.com/cometd/cometd/issues/1215

I would dump the BayeuxServer to gather more information.

Will Glass-Husain

Sep 11, 2022, 11:55:30 PM
to cometd-dev
Hi Simone,

Thanks again.

We were having Oort observe all channels…

oort.observeChannel("/**");

…which may have been a misunderstanding on our part, but the documentation for the method seems to indicate this is needed.  Can you clarify if this is needed?

/**
 * <p>Observes the given channel, registering to receive messages from
 * the Oort comets connected to this Oort instance.</p>
 * <p>Once observed, all OortComet instances subscribe
 * to the channel and will repeat any messages published to
 * the local channel (with loop prevention), so that the
 * messages are distributed to all Oort comet servers.</p>
 */

This documentation suggests that the Oort instance needs to observe a channel in order to repeat its messages to the other Oort comets; therefore, messages on unobserved channels will not be repeated, and will not be seen by clients connected to other comets.  Since we want messages from all channels to reach clients subscribed to those channels on any comet in the cluster, and we have no way, a priori, to determine which channels will be created, it seemed best to observe all channels in this way.  However, this is the only channel interaction we have; we do not specifically subscribe Oort to any channels beyond this observation.

As a correction regarding the acknowledgement extension -- I made a mistake in my earlier comments.  The default for the OortServlet is to have acknowledgement *on*, as specified in the documentation, and we have not changed this default.

Best, WILL

Simone Bordet

Sep 14, 2022, 4:21:24 PM
to comet...@googlegroups.com
Hi,

On Mon, Sep 12, 2022 at 5:55 AM Will Glass-Husain <wgl...@forio.com> wrote:
>
> Hi Simone,
>
> Thanks again.
>
> We were having Oort observe all channels…
>
> oort.observeChannel("/**");
>
> …which may have been a misunderstanding on our part, but the documentation for the method seems to indicate this is needed. Can you clarify if this is needed?
>
> /**
> * <p>Observes the given channel, registering to receive messages from
> * the Oort comets connected to this Oort instance.</p>
> * <p>Once observed, all OortComet instances subscribe
> * to the channel and will repeat any messages published to
> * the local channel (with loop prevention), so that the
> * messages are distributed to all Oort comet servers.</p>
> */
>
> This documentation suggests that the Oort instance needs to observe a channel in order to repeat its messages to the other Oort comets; therefore, messages on unobserved channels will not be repeated, and will not be seen by clients connected to other comets. Since we want messages from all channels to reach clients subscribed to those channels on any comet in the cluster, and we have no way, a priori, to determine which channels will be created, it seemed best to observe all channels in this way. However, this is the only channel interaction we have; we do not specifically subscribe Oort to any channels beyond this observation.
>

Your interpretation is correct and should not cause the problems you are seeing.
I'm worried you might be using Oort.getOortSession() for something else.

Will Glass-Husain

Sep 14, 2022, 4:41:54 PM
to cometd-dev
Hi,

Thanks for the assistance.     

We've recently set up a new system that does not have this problem.  When comparing the two, we found some problems with the load balancing.  Oort is behind a pool of proxy servers, and the IP hash we are using for load balancing is inconsistent.  Most of our users run with WebSockets, which means that when a user refreshes the browser and re-establishes the connection, it may go to a different server.  I'm not exactly sure how this causes the memory problem, but it's the only difference we see.

We're going to go back to the original system, fix the load balancing, and then see if that solves the problem.  I'll report back if this is successful.

Thanks,
WILL

Will Glass-Husain

Sep 28, 2022, 2:02:19 PM
to cometd-dev
Hi --

I wanted to follow up on this.  We went back to our standard configuration, with corrected load balancing.  This seems to have solved our problems.

We're using IP hashing in nginx to evenly distribute incoming connections.  We have a load-balanced proxy server in front of nginx, and our hashing was incorrectly based on the proxy server's address, not on the original client IP address.  This resulted in inconsistent load balancing for Oort.  We fixed this by hashing on the X-Forwarded-For header, which carries the original client IP address.
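
For reference, the corrected nginx upstream looks roughly like this (a sketch; host names are placeholders, and the front proxy must set X-Forwarded-For):

    upstream cometd_nodes {
        # Hash on the original client IP carried in X-Forwarded-For;
        # plain ip_hash would hash the front proxy's address instead,
        # sending nearly all traffic to one node.
        hash $http_x_forwarded_for consistent;
        server node1.example.com:8080;
        server node2.example.com:8080;
    }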

I'm not sure of the mechanism by which the inconsistent load balancing caused the memory leak, but we are happy to declare victory based on a week's worth of analysis after this change.

Simone Bordet

Sep 28, 2022, 2:23:14 PM
to comet...@googlegroups.com
Hi,

On Wed, Sep 28, 2022 at 8:02 PM Will Glass-Husain <wgl...@forio.com> wrote:
>
> Hi --
>
> I wanted to follow up on this. We went back to our standard configuration, with corrected load balancing. This seems to have solved our problems.
>
> We're using IP hashing in nginx to evenly distribute incoming connections. We have a load-balanced proxy server in front of nginx, and our hashing was incorrectly based on the proxy server's address, not on the original client IP address. This resulted in inconsistent load balancing for Oort. We fixed this by hashing on the X-Forwarded-For header, which carries the original client IP address.
>
> I'm not sure of the mechanism by which the inconsistent load balancing caused the memory leak, but we are happy to declare victory based on a week's worth of analysis after this change.

Great to hear!

Thanks for this follow-up!