STOMP clients looses channel on server if LDAP connection is lost

439 views
Skip to first unread message

John F

unread,
Jan 23, 2015, 8:54:13 AM1/23/15
to rabbitm...@googlegroups.com
Our setup:
A STOMP client connects to RabbitMQ and RabbitMQ authenticates operations (using LDAP plugin) against LDAP.
RabbitMQ will check client credentials and create a connection and a channel.
Client and server starts sending heartbeats back and forth to each other to maintain connection.
Client starts publishing to an exchange.

All is fine so far.

Client publish a new message as before, but now RabbitMQ (for any reason) can't validate user permission against LDAP (it might be down), so it sends client the following STOMP frame:

ERROR
message:Processing error
content-type:text/plain
version:1.0,1.1,1.2
content-length:16

Processing error.

and shuts down the auto-generates channel that previously was created when the connection was set up.
Client and server still sends heartbeats back and forth to each other so connection is still maintained.
Client still thinks it is connected to RabbitMQ, however, now it retries publishing message but it never gets to RabbitMQ.

My questions:

a) Should a STOMP client always close connection and reconnect if it receives an ERROR frame (maybe only "Processing error"s?

b) Is it an error inside the STOMP plugin on RabbitMQ? Should it have also shutdown the connection or should it not have shutdown the channel at all?

c) .... other?

Michael Klishin

unread,
Jan 23, 2015, 9:13:07 AM1/23/15
to rabbitm...@googlegroups.com, John F
Yes but clients ideally should have a sensible number of attempts before
they give up.

> b) Is it an error inside the STOMP plugin on RabbitMQ? Should it
> have also shutdown the connection or should it not have shutdown
> the channel at all?

If RabbitMQ cannot authorise an operation, it should
notify the client (in protocols that have authorisation notifications for clients, e.g. STOMP)
or drop the connection (in less, ahem, well thought out protocols, e.g. MQTT).

However, per STOMP 1.2 spec [1] the server must close connection. If that doesn't happen
or RabbitMQ does not release its underlying "internal connection", those are definitely considered
bugs by our team.

Please take a look at RabbitMQ log files, there should be errors logged. Understanding what exactly
they are would help us reproduce the issue quicker.

Thank you for reporting this.

1. https://stomp.github.io/stomp-specification-1.2.html#ERROR
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

John F

unread,
Jan 26, 2015, 5:21:45 AM1/26/15
to rabbitm...@googlegroups.com, john.f...@gmail.com

> a) Should a STOMP client always close connection and reconnect  
> if it receives an ERROR frame (maybe only "Processing error"s?  

Yes but clients ideally should have a sensible number of attempts before
they give up.

Okay. We will look into a better solution for retries.

 
> b) Is it an error inside the STOMP plugin on RabbitMQ? Should it  
> have also shutdown the connection or should it not have shutdown  
> the channel at all?

If RabbitMQ cannot authorise an operation, it should
notify the client (in protocols that have authorisation notifications for clients, e.g. STOMP)
or drop the connection (in less, ahem, well thought out protocols, e.g. MQTT).

However, per STOMP 1.2 spec [1] the server must close connection. If that doesn't happen
or RabbitMQ does not release its underlying "internal connection", those are definitely considered
bugs by our team.

Please take a look at RabbitMQ log files, there should be errors logged. Understanding what exactly
they are would help us reproduce the issue quicker.

We're currently running RabbitMQ 3.3.5 with Erlang R16B03 and we can recreate this issue just by shutting down the LDAP server when a STOMP client has successfully connected.

I haven't been able to find a release note for the RabbitMQ versions after 3.3.5 so I'm not able to see if this connection problem should have been solved in a newer version. Do you know where to find more information about fixes than what is written on the /news.html page?

Michael Klishin

unread,
Jan 26, 2015, 5:24:30 AM1/26/15
to rabbitm...@googlegroups.com, John F
On 26 January 2015 at 13:21:47, John F (john.f...@gmail.com) wrote:
> Do you know where to find more information about fixes than what
> is written on the /news.html page?

http://www.rabbitmq.com/changelog.html

John F

unread,
Jan 26, 2015, 7:12:34 AM1/26/15
to rabbitm...@googlegroups.com, john.f...@gmail.com
http://www.rabbitmq.com/changelog.html

Thanks for the link. I have checked all versions from 3.3.5 and up to the latest and none of them seems to fix this error.

We have tried this on version 3.3.5 and also 3.4.1 (our most current version).

Michael Klishin

unread,
Jan 26, 2015, 1:35:30 PM1/26/15
to rabbitm...@googlegroups.com, John F


On 26 January 2015 at 15:12:36, John F (john.f...@gmail.com) wrote:
> > Thanks for the link. I have checked all versions from 3.3.5 and
> up to the latest and none of them seems to fix this error.
>
> We have tried this on version 3.3.5 and also 3.4.1 (our most current
> version).

John,

I so far cannot reproduce the problem (I'm not using LDAP but there are ways to inject exceptions
around authorisation without it). What's in your log files (both regular and SASL) when the issue happens?

Thanks.

John F

unread,
Jan 27, 2015, 2:24:28 AM1/27/15
to rabbitm...@googlegroups.com, john.f...@gmail.com

I so far cannot reproduce the problem (I'm not using LDAP but there are ways to inject exceptions
around authorisation without it). What's in your log files (both regular and SASL) when the issue happens?

This is from the log:

=INFO REPORT==== 23-Jan-2015::12:11:47 ===
accepting STOMP connection <0.12481.59> (105.167.7.6:41677 -> 192.168.30.8:15677)

=INFO REPORT==== 23-Jan-2015::12:11:47 ===
LDAP CHECK: login for 0080F48F80A1

=INFO REPORT==== 23-Jan-2015::12:11:47 ===
        LDAP filling template "macAddress=${username},ou=group,o=acme" with
            [{username,<<"0080F48F80A1">>}]

=INFO REPORT==== 23-Jan-2015::12:11:47 ===
        LDAP template result: "macAddress=0080F48F80A1,ou=group,o=acme"

=ERROR REPORT==== 23-Jan-2015::12:11:49 ===
STOMP error frame sent:
Message: "Processing error"
Detail: "Processing error"
Server private detail: {noproc,
                           {gen_server,call,
                               [<0.19248.58>,next_publish_seqno,infinity]}}

=ERROR REPORT==== 23-Jan-2015::12:11:50 ===
STOMP error frame sent:
Message: "Processing error"
Detail: "Processing error"
Server private detail: {noproc,
                           {gen_server,call,
                               [<0.13667.56>,next_publish_seqno,infinity]}}

=ERROR REPORT==== 23-Jan-2015::12:11:51 ===
STOMP error frame sent:
Message: "Processing error"
Detail: "Processing error"
Server private detail: {noproc,
                           {gen_server,call,
                               [<0.22517.57>,next_publish_seqno,infinity]}}

=ERROR REPORT==== 23-Jan-2015::12:11:52 ===
STOMP error frame sent:
Message: "Processing error"
Detail: "Processing error"
Server private detail: {noproc,
                           {gen_server,call,
                               [<0.19248.58>,next_publish_seqno,infinity]}}

=ERROR REPORT==== 23-Jan-2015::12:11:52 ===
STOMP error frame sent:
Message: "Processing error"
Detail: "Processing error"
Server private detail: {noproc,
                           {gen_server,call,
                               [<0.9976.59>,next_publish_seqno,infinity]}}

=ERROR REPORT==== 23-Jan-2015::12:11:53 ===
STOMP error frame sent:
Message: "Processing error"
Detail: "Processing error"
Server private detail: {noproc,
                           {gen_server,call,
                               [<0.8894.57>,next_publish_seqno,infinity]}}

=ERROR REPORT==== 23-Jan-2015::12:11:55 ===
STOMP error frame sent:
Message: "Processing error"
Detail: "Processing error"
Server private detail: {noproc,
                           {gen_server,call,
                               [<0.19248.58>,next_publish_seqno,infinity]}}

=INFO REPORT==== 23-Jan-2015::12:11:56 ===
closing STOMP connection <0.12481.59> (105.167.7.6:41677 -> 192.168.30.8:15677)

There is nothing in the SASL log.

John F

unread,
Jan 27, 2015, 2:27:57 AM1/27/15
to rabbitm...@googlegroups.com, john.f...@gmail.com
Also found this:

=INFO REPORT==== 23-Jan-2015::12:12:45 ===
accepting STOMP connection <0.12557.59> (105.167.7.6:41693 -> 192.168.30.8:15677)

=INFO REPORT==== 23-Jan-2015::12:12:45 ===
LDAP CHECK: login for 0080F48F80A1

=INFO REPORT==== 23-Jan-2015::12:12:45 ===
        LDAP filling template "macAddress=${username},ou=group,o=acme" with
            [{username,<<"0080F48F80A1">>}]

=INFO REPORT==== 23-Jan-2015::12:12:45 ===
        LDAP template result: "macAddress=0080F48F80A1,ou=group,o=acme"

=ERROR REPORT==== 23-Jan-2015::12:12:46 ===
STOMP error frame sent:
Message: "Processing error"
Detail: "Processing error"
Server private detail: {noproc,
                           {gen_server,call,
                               [<0.19248.58>,next_publish_seqno,infinity]}}

=ERROR REPORT==== 23-Jan-2015::12:12:49 ===
STOMP error frame sent:
Message: "Processing error"
Detail: "Processing error"
Server private detail: {noproc,
                           {gen_server,call,
                               [<0.19248.58>,next_publish_seqno,infinity]}}

=ERROR REPORT==== 23-Jan-2015::12:12:50 ===
STOMP error frame sent:
Message: "Processing error"
Detail: "Processing error"
Server private detail: {noproc,
                           {gen_server,call,
                               [<0.13667.56>,next_publish_seqno,infinity]}}

=ERROR REPORT==== 23-Jan-2015::12:12:51 ===
STOMP error frame sent:
Message: "Processing error"
Detail: "Processing error"
Server private detail: {noproc,
                           {gen_server,call,
                               [<0.22517.57>,next_publish_seqno,infinity]}}

=ERROR REPORT==== 23-Jan-2015::12:12:52 ===
STOMP error frame sent:
Message: "Processing error"
Detail: "Processing error"
Server private detail: {noproc,
                           {gen_server,call,
                               [<0.19248.58>,next_publish_seqno,infinity]}}

=ERROR REPORT==== 23-Jan-2015::12:12:52 ===
STOMP error frame sent:
Message: "Processing error"
Detail: "Processing error"
Server private detail: {noproc,
                           {gen_server,call,
                               [<0.9976.59>,next_publish_seqno,infinity]}}

=INFO REPORT==== 23-Jan-2015::12:12:53 ===
    LDAP connect error: {error,"connect failed"}

=INFO REPORT==== 23-Jan-2015::12:12:53 ===
LDAP DECISION: login for 0080F48F80A1: {error,"connect failed"}

=WARNING REPORT==== 23-Jan-2015::12:12:53 ===
STOMP login failed for user "0080F48F80A1"

=ERROR REPORT==== 23-Jan-2015::12:12:53 ===
STOMP error frame sent:
Message: "Bad CONNECT"
Detail: "Access refused for user '0080F48F80A1'\n"
Server private detail: none

=ERROR REPORT==== 23-Jan-2015::12:12:53 ===
STOMP error frame sent:
Message: "Processing error"
Detail: "Processing error"
Server private detail: {noproc,
                           {gen_server,call,
                               [<0.8894.57>,next_publish_seqno,infinity]}}

=INFO REPORT==== 23-Jan-2015::12:12:54 ===
closing STOMP connection <0.12557.59> (105.167.7.6:41693 -> 192.168.30.8:15677)

Michael Klishin

unread,
Jan 27, 2015, 4:17:08 AM1/27/15
to John F, rabbitm...@googlegroups.com
This means the error was handled and an ERROR frame was sent. For me the connection STOMP plugin uses is then closed (per management UI). Are you observations different or you observe that STOMP *client*'s TCP connection is still open (I know, this is a bit confusing: RabbitMQ STOMP plugin builds on top of RabbitMQ's core protocol and thus acts as a "smart proxy").

Thanks.

MK

John F

unread,
Jan 27, 2015, 6:10:56 AM1/27/15
to rabbitm...@googlegroups.com, john.f...@gmail.com
 
This means the error was handled and an ERROR frame was sent. For me the connection STOMP plugin uses is then closed (per management UI). Are you observations different or you observe that STOMP *client*'s TCP connection is still open (I know, this is a bit confusing: RabbitMQ STOMP plugin builds on top of RabbitMQ's core protocol and thus acts as a "smart proxy").

Yes, the connection from STOMP client to RabbitMQ remains open (we can see it in HTTP Management interface) and heartbeats are still being sent from client to server and vice versa.
When the LDAP connection is lost we can see that when client sends a message, right after it will receive the ERROR frame and the channel (not the connection) will disappear from the HTTP Management interface).
This leaves the STOMP client in a state where it continues receiving heartbeats and can still send heartbeats (otherwise RabbitMQ would close the connection if it doesn't receive it), but no messages are delivered since the channel is down (which the client does not know anything about).

Michael Klishin

unread,
Jan 27, 2015, 6:17:28 AM1/27/15
to rabbitm...@googlegroups.com, John F
On 27 January 2015 at 14:10:57, John F (john.f...@gmail.com) wrote:
> Yes, the connection from STOMP client to RabbitMQ remains open
> (we can see it in HTTP Management interface) and heartbeats are
> still being sent from client to server and vice versa.
> When the LDAP connection is lost we can see that when client sends
> a message, right after it will receive the ERROR frame and the
> channel (not the connection) will disappear from the HTTP Management
> interface).

So on the 2nd exception the connection disappears from the management UI…

> This leaves the STOMP client in a state where it continues receiving
> heartbeats and can still send heartbeats (otherwise RabbitMQ
> would close the connection if it doesn't receive it), but no messages
> are delivered since the channel is down (which the client does
> not know anything about).

…but even then the client TCP connection remains open.

Thanks, we'll investigate more.

John F

unread,
Jan 27, 2015, 7:27:57 AM1/27/15
to rabbitm...@googlegroups.com, john.f...@gmail.com
Thanks, we'll investigate more.

Thanks :o)

Please do not hesitate to ask for further information. Also I'm able to change logging etc. in one of our environments if needed to log even further information. 

John F

unread,
Jan 28, 2015, 8:12:31 AM1/28/15
to rabbitm...@googlegroups.com, john.f...@gmail.com
We have talked about this further here at our company and there are two things that pops up:

1) Why does RabbitMQ even shutdown the channel when it can't anymore connect to LDAP to verify client has access to the given queue that it has subscribed for? Wouldn't it be better to just send out an ERROR frame with message "Unauthorized" or "Authorization failed"? I understand (and agrees) that it should not tell client anything about LDAP, just send an error.

2) Whenever RabbitMQ shuts down the channel it also removes the queue that the client was subscribed for which indicates to us that RabbitMQ either:
a) removes the queue because the channel is taken down, or
b) removes the channel because authorization is not allowed on the specific queue, so it is removed from the client (and thus also shuts down channel because there are no more bindings on it).

John F

unread,
Jan 28, 2015, 8:21:24 AM1/28/15
to rabbitm...@googlegroups.com, john.f...@gmail.com
1) Why does RabbitMQ even shutdown the channel when it can't anymore connect to LDAP to verify client has access to the given queue that it has subscribed for? Wouldn't it be better to just send out an ERROR frame with message "Unauthorized" or "Authorization failed"? I understand (and agrees) that it should not tell client anything about LDAP, just send an error.

Ahemm... I just looked at the STOMP 1.2 specification and can see that if an ERROR frame is sent to the client then server MUST close the connection:


However it SHOULD contain a short description of the error.

Michael Klishin

unread,
Jan 28, 2015, 8:23:12 AM1/28/15
to John F, rabbitm...@googlegroups.com
Channel process terminates with an error, that's why you see noproc (no process) in the error message. Per STOMP spec every error reported is followed by connection closure.

I cannot say what happens to the queue as it differs from destination to destination but likely the queue is exclusive so it is removed whet the connection that was using it is closed.

So what we need to do is to handle the exception from ldap gracefully and make sure connections on both ends are closed.

This is my understanding to date, I'm only coming back to work on this tomorrow.

MK

Michael Klishin

unread,
Jan 29, 2015, 10:22:04 PM1/29/15
to John F, rabbitm...@googlegroups.com
On 28 January 2015 at 16:23:09, Michael Klishin (mkli...@pivotal.io) wrote:
> Channel process terminates with an error, that's why you see
> noproc (no process) in the error message. Per STOMP spec every
> error reported is followed by connection closure.
>
> I cannot say what happens to the queue as it differs from destination
> to destination but likely the queue is exclusive so it is removed
> whet the connection that was using it is closed.
>
> So what we need to do is to handle the exception from ldap gracefully
> and make sure connections on both ends are closed.

John,

Would it be possible for you to build the plugin from source using branch
bug26553? It should properly notify a client with an ERROR frame. The error message
is currently not very specific ("AMQP channel has died" or similar) but that possibly can be improved
once we see that we're on the right track.

To build the plugin from bug26553, make sure you have Mercurial and then

 * hg clone http://hg.rabbitmq.com/rabbitmq-public-umbrella/
 * cd rabbitmq-public-umbrella
 * make up
 * make named_update BRANCH=stable
 * cd rabbitmq-stomp
 * hg checkout bug26553
 * make

More info can be found on http://www.rabbitmq.com/plugin-development.html.

If building is too much an effort but you can give this development version a try in your dev/QA
environment, I can also upload a compiled version to the list.

Otherwise I'll try setting up LDAP locally to properly reproduce the issue with and without our changes.

Thank you. 

John F

unread,
Jan 30, 2015, 9:46:08 AM1/30/15
to rabbitm...@googlegroups.com, john.f...@gmail.com
Hi Michael,


If building is too much an effort but you can give this development version a try in your dev/QA
environment, I can also upload a compiled version to the list.

if I could have you compile it and give me a URL to download it from I will be able to install it on monday or tuesday and test the same day and give feedback. That would be awsome.
 
Otherwise I'll try setting up LDAP locally to properly reproduce the issue with and without our changes.

No need for that right now.
 
Thank you.

No, thank you! :o)

Michael Klishin

unread,
Feb 3, 2015, 12:14:18 AM2/3/15
to rabbitm...@googlegroups.com, john.f...@gmail.com
On Friday, January 30, 2015 at 5:46:08 PM UTC+3, John F wrote:
Hi Michael,

If building is too much an effort but you can give this development version a try in your dev/QA
environment, I can also upload a compiled version to the list.

if I could have you compile it and give me a URL to download it from I will be able to install it on monday or tuesday and test the same day and give feedback. That would be awsome. 

Attaching to this thread. This is build against 3.4.x (current stable branch) with a possible fix.

Please give it a try. Thank you.

MK 
rabbitmq_stomp-0.0.0.ez

Michael Klishin

unread,
Feb 5, 2015, 5:57:52 AM2/5/15
to john.f...@gmail.com, rabbitm...@googlegroups.com
 On 3 February 2015 at 08:14:20, Michael Klishin (mkli...@pivotal.io) wrote:
> Attaching to this thread. This is build against 3.4.x (current
> stable branch) with a possible fix.

John,

Have you had a chance to try this build?

I'm setting up a LDAP environment to reproduce the issue but it takes time. We'd like to fix this
before releasing 3.4.4, which is due soon (say, early next week).

Thanks. 

John F

unread,
Feb 6, 2015, 2:52:26 AM2/6/15
to rabbitm...@googlegroups.com, john.f...@gmail.com
Have you had a chance to try this build?

Not yet, sorry!

I have talked to our server guys and they will upgrade our development environment from 3.3.5 to 3.4.3 today and then we will be able to try the plugin. So - hopefully - later today I can give you some feedback on it.

John F

unread,
Feb 6, 2015, 12:18:56 PM2/6/15
to rabbitm...@googlegroups.com, john.f...@gmail.com
Hi Michael,

we have now tested this STOMP plugin and it seems to work on our RabbitMQ 3.4.1. Unfortunately I haven't been allowed to upgrade our production environment from 3.4.1 to 3.4.3 so I can't say if it works on the latest release or not.

Thanks soo mucn for providing us with this plugin which we will keep in development and we will then upgrade all environments to the 3.4.4 (or what the next release will be called) release when it becomes available, if you can somehow acknowledge that this fix will be in there? ;o)

Thanks,
John

Michael Klishin

unread,
Feb 6, 2015, 12:31:58 PM2/6/15
to rabbitm...@googlegroups.com, John F
 On 6 February 2015 at 20:18:58, John F (john.f...@gmail.com) wrote:
> we have now tested this STOMP plugin and it seems to work on our
> RabbitMQ 3.4.1. Unfortunately I haven't been allowed to upgrade
> our production environment from 3.4.1 to 3.4.3 so I can't say
> if it works on the latest release or not.

3.4.1 and 3.4.3 should be identical as far as the fix for your particular issue goes.
So if the problem is resolved, don't rush the upgrade to 3.4.3.

> Thanks soo mucn for providing us with this plugin which we will
> keep in development and we will then upgrade all environments
> to the 3.4.4 (or what the next release will be called) release
> when it becomes available, if you can somehow acknowledge that
> this fix will be in there? ;o)

If it works for you it will. We were debating it just today. I'm still hopeful I can finish setting up
a LDAP environment to verify  by the time we release 3.4.4 next week.

John F

unread,
Feb 16, 2015, 4:13:34 AM2/16/15
to rabbitm...@googlegroups.com, john.f...@gmail.com
Hi Michael,
did get a chance of testing this plugin fix? Did it made it for the 3.4.4 release?

Michael Klishin

unread,
Feb 16, 2015, 4:15:43 AM2/16/15
to rabbitm...@googlegroups.com, John F
 On 16 February 2015 at 12:1 3:35, John F (john.f...@gmail.com) wrote:
> did get a chance of testing this plugin fix?

Yes, no connections left open for me.

> Did it made it for
> the 3.4.4 release?

It did but I see that it wasn't included into change log :/

Michael Klishin

unread,
Feb 16, 2015, 4:25:14 AM2/16/15
to rabbitm...@googlegroups.com, John F
  On 16 February 2015 at 12:15:41, Michael Klishin (mkli...@pivotal.io) wrote:
> It did but I see that it wasn't included into change log :/

Updated: http://www.rabbitmq.com/release-notes/README-3.4.4.txt.

John F

unread,
Feb 16, 2015, 6:15:32 AM2/16/15
to rabbitm...@googlegroups.com, john.f...@gmail.com

Yes, no connections left open for me.

Great. I assume that you also tested it with previous version where it kept the connection open.

John F

unread,
Feb 16, 2015, 6:17:01 AM2/16/15
to rabbitm...@googlegroups.com, john.f...@gmail.com

Updated: http://www.rabbitmq.com/release-notes/README-3.4.4.txt.

Cool. Thanks! I read the release note earlier today and was really unsure if the fix was included. So happy that it did. Thanks a lot for your quick fix and great communication here in the forum!
Reply all
Reply to author
Forward
0 new messages