Getting through severe connection breakage

Laurent Caillette

May 22, 2013, 11:44:42 AM
to cometd...@googlegroups.com
Hi all,

I'd like my Java CometD client to reconnect after some kind of network outage, and I have some questions about that.

I'm using CometD 2.6.0, the Java client and WebSocket (but long polling is expected to work, too).

Authentication relies on a `SecurityPolicy` that verifies the user's identity at handshake time, and it works well.

I support optional two-factor authentication. At the CometD level it relies on handshake message customization (as the documentation kindly explains). Two-factor authentication makes things special, because a reconnection after a session loss must pop up a dialog asking for a new code.
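A minimal sketch of what such a handshake customization can look like on the client, assuming the credentials travel in the handshake message's `ext` field. The `HandshakeFields` class, the `com.example.auth` key and the field names are hypothetical. With the CometD 2.x Java client, a map like this would be passed to `BayeuxClient.handshake(Map<String, Object>)`, and the server's `SecurityPolicy.canHandshake(...)` would read the same key back from the handshake message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: the "com.example.auth" key and field names are illustrative only.
final class HandshakeFields {
    private HandshakeFields() {}

    /**
     * Builds the extra fields to merge into the handshake message.
     * secondFactorCode may be null when two-factor authentication is disabled.
     */
    static Map<String, Object> build(String user, String password, String secondFactorCode) {
        Map<String, Object> auth = new HashMap<>();
        auth.put("user", user);
        auth.put("password", password);
        if (secondFactorCode != null) {
            // One-time code: after a session loss the user must be asked for a new one.
            auth.put("code", secondFactorCode);
        }
        Map<String, Object> ext = new HashMap<>();
        ext.put("com.example.auth", auth); // the server-side SecurityPolicy must read this same key
        Map<String, Object> fields = new HashMap<>();
        fields.put("ext", ext);
        return fields;
    }
}
```

Keeping the second factor optional in one builder means the same code path serves both the plain and the two-factor handshake.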

I'd like to simulate network outages on my laptop. First I shut down the loopback interface (`sudo ifconfig lo0 down`). That's not completely unrealistic (I seem to get the same errors with a remote server when turning Wi-Fi off), but I felt I was missing the more frequent kinds of errors, and this kind of low-level hacking doesn't fit in a JUnit test suite.

I decided to simulate the outage using DonsProxy (http://donsproxy.moneybender.com). DonsProxy looks good because it's a full Java proxy. I know that network problem simulators aren't as vicious as platform-dependent network stacks, but this one might be a good start. These are the network outages I produce interactively:
- Stop the proxy for one second, send a CometD message, then restart it.
- Stop the proxy for one minute, then restart it.
- Close the connection for one second, send a CometD message, then reopen it.
- Close the connection for one minute, then reopen it.
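For a JUnit suite, stop/restart scenarios like these can also be scripted without an external tool. A minimal pure-Java sketch, assuming plain TCP forwarding on the loopback interface is enough for the test: `ToggleableProxy` is a hypothetical helper, not part of DonsProxy or CometD. Its `stop()` both refuses new connections and closes the open ones, so it models "stop the proxy" rather than a silently half-open link.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical test helper (not DonsProxy): a loopback TCP proxy that can be
// stopped, cutting every open connection, and restarted on the same port.
final class ToggleableProxy {
    private final String targetHost;
    private final int targetPort;
    private final List<Socket> open = new ArrayList<>(); // guarded by this
    private ServerSocket server;                         // guarded by this
    private int port;                                    // 0 until start() picks a free port

    ToggleableProxy(int port, String targetHost, int targetPort) {
        this.port = port;
        this.targetHost = targetHost;
        this.targetPort = targetPort;
    }

    /** Starts listening (on the same port as before after a stop) and returns the port. */
    synchronized int start() throws IOException {
        if (server != null && !server.isClosed()) throw new IllegalStateException("already running");
        ServerSocket ss = new ServerSocket();
        ss.setReuseAddress(true); // rebind even if old connections linger in TIME_WAIT
        ss.bind(new InetSocketAddress("127.0.0.1", port));
        port = ss.getLocalPort();
        server = ss;
        daemon(() -> acceptLoop(ss));
        return port;
    }

    /** Simulates an outage: new connections are refused and open ones are closed. */
    synchronized void stop() {
        closeQuietly(server);
        for (Socket s : open) closeQuietly(s);
        open.clear();
    }

    private void acceptLoop(ServerSocket ss) {
        try {
            while (true) connect(ss, ss.accept());
        } catch (IOException e) {
            // accept() fails once stop() has closed this server socket: end the loop
        }
    }

    private void connect(ServerSocket ss, Socket client) {
        final Socket upstream;
        try {
            upstream = new Socket(targetHost, targetPort);
        } catch (IOException e) {
            closeQuietly(client); // target unreachable: drop only this client
            return;
        }
        synchronized (this) {
            if (ss.isClosed()) { // stop() ran while we were connecting upstream
                closeQuietly(client);
                closeQuietly(upstream);
                return;
            }
            open.add(client);
            open.add(upstream);
        }
        daemon(() -> pump(client, upstream));
        daemon(() -> pump(upstream, client));
    }

    // Copies bytes one way until either side closes, then closes both sockets.
    private static void pump(Socket from, Socket to) {
        byte[] buffer = new byte[8192];
        try {
            InputStream in = from.getInputStream();
            OutputStream out = to.getOutputStream();
            for (int n; (n = in.read(buffer)) != -1; ) out.write(buffer, 0, n);
        } catch (IOException ignored) {
            // a socket was closed, possibly by stop()
        } finally {
            closeQuietly(from);
            closeQuietly(to);
        }
    }

    private static void daemon(Runnable task) {
        Thread t = new Thread(task, "toggleable-proxy");
        t.setDaemon(true);
        t.start();
    }

    private static void closeQuietly(Closeable c) {
        try { if (c != null) c.close(); } catch (IOException ignored) { }
    }
}
```

In a test, the CometD client would point at the proxy's port instead of the server's, and the test would call `stop()`, wait or send a message, then call `start()` again.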

I'm using CometD default options (as far as I know the timeout is 30 s).

CometD doesn't recover after a one-minute outage nor after a proxy interruption.

After closing the connection from DonsProxy and sending a CometD message, I get `META_CONNECT` notifications with `successful->false`. When I reopen the connection I get a lot of `META_CONNECT` notifications with `successful->true` (about one every 10 s), but any further attempt to send messages to the server fails. Of course, the debugger may distort the timings.

Before hacking something stupid, I'm asking: how can I get notified that the network is available again, //but// the session has expired? With such a hook I could disconnect and run a full reconnection, with or without two-factor authentication.

By the way, does my proxy-based network error simulation make sense? If not, how to automate deterministic network outages in full Java?

Thanks for this great product, and all the discussion on this list.

c.

Simone Bordet

May 22, 2013, 12:29:01 PM
to cometd-users
Hi,
I am surprised.
We have tests that simulate this behavior in the test suite.
I would need many more details, such as client and server logs.

> After closing the connection from DonsProxy and sending a CometD message, I
> get `META_CONNECT` notifications with `successful->false`. When I reopen the
> connection I get a lot of `META_CONNECT' notifications with
> `successful->true` (about one every 10 s) but any further attempt to send
> messages to the server fails. Of course the debugger may break the timings.
>
> Before hacking stupid things here I'm asking: How to get notified that
> network is available again, //but// the session has expired? With such a
> hook I could disconnect and run a full reconnection,with or without
> two-factor authentication.

For this, you flip a flag when you see a failed /meta/connect, and when
you then see a successful /meta/connect, the connection is up again.
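The flag described above is a one-bit state machine. A minimal sketch, assuming the caller forwards every `/meta/connect` outcome; `ConnectionWatcher` is hypothetical, not a CometD class. With the CometD 2.x Java client, the outcome would typically come from a `ClientSessionChannel.MessageListener` added to `client.getChannel(Channel.META_CONNECT)` that calls `onMetaConnect(message.isSuccessful())`.

```java
// Hypothetical helper (not a CometD class): remembers whether /meta/connect
// has failed, so that the next successful one can be reported as a recovery.
final class ConnectionWatcher {
    private boolean broken; // guarded by this; listeners may run on transport threads

    /** Feed every /meta/connect outcome; returns true when the connection is up again. */
    synchronized boolean onMetaConnect(boolean successful) {
        if (!successful) {
            broken = true; // flip the flag on failure
            return false;
        }
        boolean recovered = broken;
        broken = false;
        return recovered;
    }

    synchronized boolean isBroken() {
        return broken;
    }
}
```

A `true` return fires once per outage, which is where a client could decide whether a full reconnection (and a new two-factor prompt) is needed.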

> By the way, does my proxy-based network error simulation make sense? If not,
> how to automate deterministic network outages in full Java?

We have tests where we stop the server.

In a proxy case where you close the upstream connections but not the
downstream ones, the client may think it is connected when in fact
it is not.
However, in this case the response will never arrive, and the
maxNetworkDelay mechanism will kick in and try to recover
connectivity.
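Why the half-open case still recovers can be modelled as a per-request deadline. This is a simplified, hypothetical model, not CometD's implementation; it assumes, as far as I know the CometD clients behave, that for a `/meta/connect` the server `timeout` is added to `maxNetworkDelay`, because the server is allowed to hold that request for up to `timeout` milliseconds.

```java
// Simplified, hypothetical model of a maxNetworkDelay watchdog; not CometD's code.
final class RequestDeadline {
    private RequestDeadline() {}

    /**
     * Time after which a request with no response is treated as failed.
     * A /meta/connect may legitimately be held by the server for up to
     * serverTimeoutMillis, so that delay is added to the allowance.
     */
    static long deadlineMillis(long sentAtMillis, boolean metaConnect,
                               long serverTimeoutMillis, long maxNetworkDelayMillis) {
        long allowance = maxNetworkDelayMillis + (metaConnect ? serverTimeoutMillis : 0L);
        return sentAtMillis + allowance;
    }

    static boolean isExpired(long nowMillis, long deadlineMillis) {
        return nowMillis > deadlineMillis;
    }
}
```

With `timeout=10000` and, say, `maxNetworkDelay=5000`, a `/meta/connect` sent at t=0 would be declared failed after 15 s even if the proxy silently swallowed it; that failure then feeds the normal reconnection logic.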

I think CometD is covered pretty well in case of disconnections, but
perhaps you have a special case that we need to cover better.
Please send more detailed information.

--
Simone Bordet
----
http://cometd.org
http://webtide.com
http://intalio.com
Developer advice, training, services and support
from the Jetty & CometD experts.
Intalio, the modern way to build business applications.

Laurent Caillette

May 23, 2013, 2:08:13 PM
to cometd...@googlegroups.com
Thanks Simone for the answer.

>> network is available again, //but// the session has expired? With such a
>> hook I could disconnect and run a full reconnection,with or without
>> two-factor authentication.

> For this you flip a flag when you see a /meta/connect failed, and when
> you see a /meta/connect successful, then the connection is up again.

OK, I just wanted to be sure I wasn't missing some part of the API.


>> By the way, does my proxy-based network error simulation make sense? If not,
>> how to automate deterministic network outages in full Java?

> We have tests where we stop the server.

> In a proxy case where you close the upstream connections, but not the
> downstream ones, the client may think it is connected when in fact
> it's not.

I think there is an exception breaking something, in a way that leaves CometD in a bad state.

I'm attaching the logs. The server runs on port 8888 and DonsProxy on port 8889. After one successful "Hello, world" message (the client calls a service that echoes the parameter), I close the connection going through DonsProxy at around 19:46:10. I wait for one minute, then reopen the connection at around 19:47:20. Reconnection happens (`State update: CONNECTED -> CONNECTED`), but I can't send any more "Hello" messages, and I get many `o.e.jetty.client.AsyncHttpConnection - finally null` lines in the client log.

I think a server restart is gentler than a network problem (and "problem" is vague enough). Friends tell me that network problems can be platform-dependent, but one thing at a time.

I started mavenizing DonsProxy. I'll try to reproduce the behavior above programmatically.

Regards,

c.
shoemaker-client.log
shoemaker-server.log

Simone Bordet

May 25, 2013, 12:44:26 PM
to cometd-users
Hi,

On Thu, May 23, 2013 at 8:08 PM, Laurent Caillette
<laurent....@gmail.com> wrote:
> I think there is an exception breaking something, in a way that leaves
> CometD in a bad state.

That's correct.
From the client logs I see that the initial connection used the
WebSocket transport.
Then the connection broke and reconnect attempts happened, until one of
them succeeded using the WebSocket transport, at 19:47:32.377.
This connect attempt was answered with 402::Unknown client, as
expected, at 19:47:32.379, with advice to re-handshake.
Then the client re-handshook, but using the LongPolling transport
instead of the WebSocket transport.
From there on, I see /meta/connect messages that return immediately
(not held, despite timeout=10000) with interval=2000, which you
seem to set in an extension (not sure why?).
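The reading of these log lines hinges on the `advice` field of Bayeux replies: `reconnect` (`retry`, `handshake` or `none`) and `interval`. A simplified, hypothetical interpretation, not CometD's code; the `HANDSHAKE` case is the one where the server no longer knows the session (as with the 402::Unknown client reply), which is also a natural spot for the "session has expired" hook asked about earlier in the thread.

```java
import java.util.Map;

// Hypothetical, simplified reading of a Bayeux "advice" map; not CometD's code.
final class AdviceAction {
    enum Kind { RETRY, HANDSHAKE, NONE }

    final Kind kind;
    final long delayMillis; // how long to wait before the next attempt

    private AdviceAction(Kind kind, long delayMillis) {
        this.kind = kind;
        this.delayMillis = delayMillis;
    }

    static AdviceAction from(Map<String, Object> advice, long defaultIntervalMillis) {
        Object reconnect = advice.get("reconnect");
        Kind kind;
        if ("handshake".equals(reconnect)) {
            kind = Kind.HANDSHAKE; // session unknown to the server: start over
        } else if ("none".equals(reconnect)) {
            kind = Kind.NONE;      // server asks the client not to reconnect
        } else {
            kind = Kind.RETRY;     // "retry" or no advice: send the next /meta/connect
        }
        Object interval = advice.get("interval");
        long delay = interval instanceof Number ? ((Number) interval).longValue() : defaultIntervalMillis;
        return new AdviceAction(kind, delay);
    }
}
```

Under this model, `interval=2000` makes the client pause 2 s between `/meta/connect` requests, while `timeout` only bounds how long the server may hold each one.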

So the switch from WebSocket to LongPolling is definitely wrong, but I
am not sure that this is the cause of the /meta/connect not being
held, or the other problem you were mentioning.
I filed http://bugs.cometd.org/browse/COMETD-431.

I would fix this bug first, see whether it was the root cause, and
then look at other problems, if any remain.

Will you be able to try a 2.6.1-SNAPSHOT after I fix this problem?

Simone Bordet

May 27, 2013, 4:39:46 AM
to cometd-users, laurent....@gmail.com
Hi,

I now have a test for the WebSocket transport too that reproduces a
server restart, and it works fine.
CometD sees the failure via a callback on
WebSocketTransport$CometDWebSocket.onClose(), and triggers the
automatic reconnection mechanism.
My test and your logging code follow the same path so far.

The difference is that in my test, when the WebSocket connection can
be established again after the failure, the protocols available for
negotiation are [websocket, long-polling], while in your logs there is only
[long-polling] (see the line at 19:47:32.381).
Since you are wrapping CometD's BayeuxClient with your own class, I
suspect you are doing something that excludes the WebSocket transport. Is
that the case?
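The effect of a shortened list can be seen in a toy model of negotiation: the transport used is the first of the client's preferred transports that also appears in the list it is negotiated against. This is a simplified, hypothetical sketch, not CometD's code; it only shows why a list reduced to `[long-polling]`, as in the 19:47:32.381 log line, leaves long polling as the only possible outcome even when the client prefers WebSocket.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of transport negotiation; not CometD's code.
final class TransportNegotiation {
    private TransportNegotiation() {}

    /** Returns the client's transports, in the client's preference order, that are also allowed. */
    static List<String> negotiate(List<String> clientPreferred, List<String> allowed) {
        List<String> result = new ArrayList<>();
        for (String transport : clientPreferred) {
            if (allowed.contains(transport)) {
                result.add(transport); // keep the client's order: the first entry wins
            }
        }
        return result;
    }
}
```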

Laurent Caillette

May 27, 2013, 8:08:23 AM
to cometd...@googlegroups.com
> I now have a test for the WebSocket transport too that reproduces a
> server restart, and it works fine.
> CometD sees the failure via a callback on
> WebSocketTransport$CometDWebSocket.onClose(), and triggers the
> automatic reconnection mechanism.
> My test and your logging code follow the same path so far.

Great!

> Since you are wrapping CometD's BayeuxClient with your own class, I
> suspect you are doing something to exclude the WebSocket transport.
> Is that the case ?

I don't think so. I'm extending BayeuxClient:
- to enrich the handshake with my own fields;
- to install the acknowledge extension (I should have mentioned this);
- to get a place to put hooks.


<<<
public class CometdClient extends BayeuxClient {
  public CometdClient( ..., url, httpClient ) {
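    // WebSocket listed first, long polling as the alternative transport.
    super(
        url,
        WebSocketTransport.create( null, createWebSocketClientFactory() ),
        LongPollingTransport.create( null, httpClient )
    ) ;
    // Optional acknowledged-messages extension.
    if( CometdConstants.USE_ACKNOWLEDGE_EXTENSION ) {
      addExtension( new AckExtension() ) ;
    }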
    
    ...
  }


  private static WebSocketClientFactory createWebSocketClientFactory() {
    final WebSocketClientFactory webSocketClientFactory = new WebSocketClientFactory() ;
    webSocketClientFactory.setBufferSize( CometdConstants.WEBSOCKET_BUFFER_SIZE_BYTES );
    return webSocketClientFactory ;
  }

  ...
}
>>>

Simone Bordet

May 27, 2013, 8:26:22 AM
to cometd-users, laurent....@gmail.com
Hi,

I tried to reproduce it but could not; it seems to work for me.

At this point I need a test case from you that reproduces the problem,
as simple as you can make it.

Thanks

Laurent Caillette

Sep 14, 2013, 2:20:09 PM
to Simone Bordet, cometd-users
I removed some stupid error handling I had introduced into my custom
CometD client. After that I couldn't reproduce the problem anymore:
reconnection happens as it should.

CometD rocks. Sorry for the noise!

c.