I mixed and matched pretty much all the versions, and the same thing
happens almost every single time on the Windows 7 32-bit box.
I used the Tracer tool: with it the test cases don't hang, but the
following exception is sometimes thrown:
1320191848535: <Tracer-3> ch#2 <- {#method<channel.close-ok>(),null,""}
1320191848535: <Tracer-3> ch#0 -> {#method<connection.close>(reply-code=200, reply-text=OK, class-id=0, method-id=0),null,""}
1320191848535: <Tracer-3> ch#0 <- {#method<connection.close-ok>(),null,""}
1320191848535: <Tracer-3> uncaught java.io.EOFException
at java.io.DataInputStream.readUnsignedByte(Unknown Source)
at com.rabbitmq.client.impl.Frame.readFrom(Frame.java:99)
at com.rabbitmq.tools.Tracer$DirectionHandler.readFrame(Tracer.java:271)
at com.rabbitmq.tools.Tracer$DirectionHandler.doFrame(Tracer.java:305)
at com.rabbitmq.tools.Tracer$DirectionHandler.run(Tracer.java:345)
at java.lang.Thread.run(Unknown Source)
The entire tracer log is here: http://pastebin.com/ecp9ymsH
I am pretty sure this is causing the hang, but how can I catch this
exception in the connection close method and not deadlock?
Thanks,
Benjamin Bennett
_______________________________________________
rabbitmq-discuss mailing list
rabbitmq...@lists.rabbitmq.com
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2011-November/015968.html
Do you think a good workaround would be to set a connection timeout?
I was thinking about adding one, but I didn't know exactly what it does,
and I don't want to risk breaking other code.
Thanks,
Benjamin Bennett
After investigating our code, I'm no longer convinced that
your hangs occur because we haven't trapped the EOFException
correctly. We ought to be handling it properly.
I cannot reproduce this problem here yet.
Could you please do some more investigation for us? What would
be helpful is for you to take a java stack dump (use jstack) when
the hang occurs.
jstack -l <pid>
ought to be sufficient, with <pid> the process number of the client JVM.
That will help us to find out what is hung where.
(It would be most helpful if you could do this on the
latest release -- RabbitMQ 2.7.0, but an older one is fine.)
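If it is awkward to catch the hang in time to run jstack by hand, the test harness could dump every thread's stack itself, for example from a watchdog that fires shortly before the JUnit timeout. A minimal sketch using only the JDK's Thread.getAllStackTraces(); ThreadDump is a hypothetical helper name. Unlike jstack -l it does not report which locks each thread holds, so jstack remains the better tool whenever it can be used.

```java
import java.util.Map;

// Hypothetical helper: a rough, in-process stand-in for "jstack <pid>".
class ThreadDump {
    // Builds a jstack-like text dump of every live thread in this JVM.
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(t.isDaemon() ? " daemon" : "")
              .append(" state=").append(t.getState()).append('\n');
            // One line per frame, innermost call first, as jstack prints them.
            for (StackTraceElement frame : e.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

A thread stuck the way the pastebin trace shows would appear as state=RUNNABLE with java.net.SocketInputStream.socketRead0 as its top frame.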
Steve Powell
st...@rabbitmq.com
[wrk: +44-2380-111-528] [mob: +44-7815-838-558]
On 3 Nov 2011, at 21:40, Benjamin Bennett wrote:
> http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2011-November/015980.html
The tests hang for 20 minutes, then the JUnit runner kills the JVM with
a timeout exception.
The RabbitMQ log just says the connection closed abruptly at
the same time the JUnit runner killed the unit tests.
I was going to try putting a timeout on the connection close method,
but that really would be a hack.
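For reference, a timeout on close can be bolted on from the outside without changing the client library: run the blocking close on a helper thread and stop waiting after a deadline. A minimal sketch, assuming the object being closed can be wrapped as an AutoCloseable; closeWithTimeout is a hypothetical helper, not part of the RabbitMQ or Spring AMQP API. It only stops the caller from waiting forever: the helper thread may stay blocked in its socket read, and the close handshake with the server is not completed. As far as I know, the connection timeout on the Java client's ConnectionFactory bounds establishing the TCP connection, not closing it, so it would not help with this particular hang.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedClose {
    /**
     * Calls resource.close() on a daemon helper thread and waits at most timeoutMillis.
     * Returns true if close() finished in time, false if we gave up waiting.
     * Hypothetical helper for illustration only.
     */
    static boolean closeWithTimeout(AutoCloseable resource, long timeoutMillis) throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true); // a stuck close must not keep the JVM alive
            return t;
        });
        try {
            Future<?> f = exec.submit(() -> {
                resource.close();
                return null;
            });
            f.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // close() is still running (or stuck) on the helper thread
        } catch (ExecutionException e) {
            // close() itself failed: rethrow the original cause to the caller
            Throwable cause = e.getCause();
            if (cause instanceof Exception) throw (Exception) cause;
            if (cause instanceof Error) throw (Error) cause;
            throw e;
        } finally {
            // Interrupts the helper; a close blocked in a socket read may ignore the interrupt.
            exec.shutdownNow();
        }
    }
}
```

When it returns false, the application still has to decide what to do with the stuck connection, which is why this remains a workaround rather than a fix.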
The stack trace is exactly the same as this one:
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2011-August/014789.html
http://pastebin.com/u1Z0Qyrr
Also, before the test cases are run I reinstall RabbitMQ, delete all
vhosts and recreate them, so there should not be any memory issues.
The only thread in the RUNNABLE state is the one at the beginning of
the stack trace. I realize this is causing the hang, but I don't
understand why no reply is getting through from Rabbit.
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
Searching the interwebs points to the remote server not responding.
http://javaeesupportpatterns.blogspot.com/2011/04/javanetsocketinputstreamsocketread0.html
I can hook up Wireshark next to try to diagnose the problem, but I
thought the Tracer class would be sufficient.
On 13/11/11 20:50, Benjamin Bennett wrote:
> Here is the stacktrace from the running test cases. I just ran the
> test cases 4 times without a hang on the same vm , then last 3 times
> it hangs.
>
> The tests hang for 20 minutes
When that happens, please run 'rabbitmqctl report' and send us the output.
> I can hook up wireshark next to try to diagnose the problem , but I
> thought tracer class would be sufficient.
Since you were unable to reproduce the problem with the tracer,
wireshark is the next logical step.
Regards,
Matthias.
I was able to reproduce it with the tracer: see the beginning of the thread, where the tracer threw an exception.
On 14/11/11 00:24, Benjamin Bennett wrote:
> I was able to with tracer see beginning of thread, tracer threw an exception
That's not unusual. Neither the client nor the tracer got stuck, though,
which is the problem we are trying to investigate here.
The same stack trace is happening; the only difference is that a different
test case is hanging.
I ran the same test cases at the exact same git revision on all our
build agents and they ran without issues; everything passed.
It is a crapshoot whether it happens.
It is an SDC machine, so it has all kinds of extra security GPOs and
services. I am going to try to shut down all the other services that
I can and see if that has any benefit.
Ah, thank you. After some poking, that genuinely looks like a server
bug. Damn.
In order for it to happen you need the last channel close / close_ok to
overlap with the connection close / close_ok. With the Java client you
have to invoke Channel.close() and Connection.close() from different
threads to get this to happen, and still be unlucky.
You should be allowed to do this, but right now it's racy.
> I was going to attempt to put a timeout on the connection close method
> but that really would be a hack.
Indeed! Other slightly less hacky workarounds until we get this fixed:
* Invoke Channel.close() and Connection.close() from the same thread, or
otherwise ensure they don't overlap.
* Don't invoke Channel.close() if you know you're going to invoke
Connection.close() anyway.
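Both workarounds can be combined in application code, assuming you control the code that closes channels and the connection (with Spring AMQP that may not be the case). A minimal sketch: every close is handed to one worker thread, so a channel close and the connection close can never overlap, and a channel close that arrives after the connection close is skipped, since closing the connection closes its channels anyway. SerializedCloser is a hypothetical name, and the targets are typed as AutoCloseable only to keep the sketch self-contained; the real Channel and Connection objects may need a small wrapper.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: funnels all close calls through a single thread.
class SerializedCloser {
    // One worker thread: closes run strictly one after another, in submission order.
    private final ExecutorService closer = Executors.newSingleThreadExecutor();
    // Only read and written inside tasks, i.e. on the worker thread.
    private boolean connectionClosed = false;

    Future<?> closeChannel(AutoCloseable channel) {
        return closer.submit(() -> {
            // Closing the connection closes its channels, so a late channel close is skipped.
            if (!connectionClosed) {
                channel.close();
            }
            return null;
        });
    }

    Future<?> closeConnection(AutoCloseable connection) {
        return closer.submit(() -> {
            connectionClosed = true;
            connection.close();
            return null;
        });
    }

    // Lets the (non-daemon) worker thread exit once all submitted closes have run.
    void shutdown() {
        closer.shutdown();
    }
}
```

Callers on any thread can submit closes; the returned Future lets them wait for a particular close to finish if they need to.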
Cheers, Simon
--
Simon MacMullen
RabbitMQ, VMware
I am using the Spring AMQP library, and it does the connection closing when the Spring context is closed. I do not think it has a property that would let me inject the hack. Also, if you know any of the Spring AMQP developers, a request from you to check that it closes things the way you have described will carry much more authority than one from me.
I will probably hack the Spring AMQP library for now.
I was going to change it in one of the following ways:
1) Remove the infinite wait on the connection close.
Or
2) Put a BlockingQueue around the channel closes during the close call,
then synchronize on the BlockingQueue for the connection close, so
that it cannot close the connection while a channel is currently being
closed.
The second is more code, but it would keep the infinite timeout in
place while working around the deadlock issue.
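Option 2 boils down to a gate: channel closes may proceed, but the connection close must wait until no channel close is in progress, and no new channel close may start while the connection is closing. A minimal sketch of that rule, swapping in a ReentrantReadWriteLock for the BlockingQueue because a read/write lock expresses "many channel closes or one connection close" directly. CloseGate is a hypothetical name; this is not Spring AMQP code.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical helper: channel closes share a read lock, the connection close takes the write lock.
class CloseGate {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Channel closes hold the read lock, so several can run concurrently with each other.
    void closeChannel(AutoCloseable channel) throws Exception {
        lock.readLock().lock();
        try {
            channel.close();
        } finally {
            lock.readLock().unlock();
        }
    }

    // The connection close takes the write lock: it waits for in-flight channel closes
    // to finish and keeps new ones out until it is done. The wait itself is unbounded.
    void closeConnection(AutoCloseable connection) throws Exception {
        lock.writeLock().lock();
        try {
            connection.close();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Like option 2, this keeps the infinite wait: if a channel close itself never returns, the connection close blocks forever too.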
It would save a lot of pain if people are using 2.7.1 and below.
In general we do accept patches, although due to our Corporate Overlords
you would have to sign a contributor agreement.
I would be reluctant to merge such a patch though, since:
* It's to work around a server bug which will be fixed in the next release
* This bug has been around for most of a year without attracting much
attention
* Option 1) is ugly; option 2) is (somewhat) complicated.
BTW, I got a reply from the spring-amqp maintainer, Dave Syer:
> Spring AMQP doesn't explicitly invoke close() in different threads.
> There's nothing to stop it happening (as is the case with the Java
> client itself I suppose), but we actually hardly ever call
> Channel.close() so it is pretty unlikely. I would be interested to
> hear of a way to tickle a normal app into this behaviour.
So I still wonder if the threading thing is something you are doing in
your app.
Cheers, Simon
The stack traces show no other threads running except the Spring AMQP
threads BlockingQueueConsumer and CachingConnectionFactory.
My only comment about attracting attention is that it is a race condition,
and race conditions are the most painful bugs to have to deal with.
On my dev box I have seen the issue only once in 200 runs that ran
overnight.
On our Windows 7 test VMs it happens 50% of the time (sample size: 40 runs).
On our Windows XP test VMs it has never happened.
In production we get a "The service failed to close in timely
manner" error and the service doesn't shut down.
Finally, if you are fixing it in the server, what about the previous
versions? I don't even think Spring AMQP has migrated to the 2.7 series
Java client, because it is backwards incompatible.
Thanks,
Benjamin Bennett
I'm afraid I can't really comment on spring-amqp - I don't know if Dave can?
But for the purpose of your tests maybe you could work around this by
not closing the channels at all?
> My only comment about attracting attention it is a race condition ,
> and race conditions are the most painful bugs to have to deal with.
Oh, absolutely.
> On my dev box I have only seen the issue one time in 200 runs that ran
> over night.
> On our test windows 7 vms it happens 50% of the time, sample size was 40 runs.
> On our windows xp test vms it has never happened.
>
> In our production we get a "The service failed to close in timely
> manner" and doesn't shut down.
>
> Finally if you are fixing it in the server , what about the previous
> versions? I don't even think spring-amqp has migrated to 2.7 series
> java client because it is backwards incompatible.
You don't need to use the same versions of client and server - just
upgrading the server should be fine.
I have a hack in place for now: I just added a timeout to the Spring AMQP
library where it closes the connection.
Thanks for all your help. I will be looking for the release that fixes
this issue.
--
Sincerely,
Benjamin Bennett
314.246.0645
benbe...@gmail.com
"For a successful technology, reality must take precedence over public
relations, for Nature cannot be fooled."
Richard Feynman
Yes, it was fixed in 2.7.1.
Cheers, Simon
--
Simon MacMullen
RabbitMQ, VMware