meta/handshake and meta/connect intermittently failing

kdavie

Apr 11, 2011, 8:06:11 PM
to aspc...@googlegroups.com
I'm having intermittent issues with meta/handshake and meta/connect. I have set breakpoints in the appropriate handlers to get an idea of what is going on, but I don't see any exceptions (there are occasionally some unrecognized clientId errors, but I believe that's another problem entirely). I've also watched the requests and responses go out and come in, and they look good. For example, here is a request/response pair that results in a "meta/handshake failing" message:

Request Body:

[{"ext":{"authentication":{"user":"Kevin","credentials":"password"}},"version":"1.0","minimumVersion":"0.9","channel":"/meta/handshake","supportedConnectionTypes":["long-polling"],"id":5}]


Response Body:

[{,"version":"1.0","channel":"/meta/handshake","supportedConnectionTypes":["long-polling"],"id":"5","clientId":"lYL47yDpnWqM63xXc8vk","advice":{"reconnect":"retry"},"successful":true}]

As you can see, the response indicates that the handshake was successful; however, the meta/unsuccessful callback is being raised.

I am able to consistently reproduce this by connecting, disconnecting, and reconnecting to the server (by repeatedly opening and closing new tabs in my browser). The number of times I have to go through this workflow before everything starts failing varies, but it does seem to happen eventually every time.

I'm also noticing that the client repository seems to be arbitrarily removing clients. This results in the aforementioned unrecognized clientId error. I can't figure out what is triggering this behavior.

If anyone can provide feedback that can help I would really appreciate it. I was very excited to find an open-source ASP.NET implementation of a Comet server and am hopeful that it can be used (maybe with some minor tweaks) in a production environment.

Thanks.

Neil Mosafi

Apr 11, 2011, 9:30:12 PM
to aspc...@googlegroups.com
Hi

Thanks for the interest in the project.  I believe it is being used in production in some places, so hopefully these issues can be resolved.

So can you provide a bit more information about how you have aspComet configured?  Which version of IIS and OS are you using?

When you say the meta/unsuccessful callback is being raised, what do you mean? Is this on the client side or the server? If it's on the client, which client-side library are you using (jQuery, Dojo, etc.)?

You mention the client repository dropping clients.  Is it losing all clients in one go or just some of them?

If it's all of them, this would indicate that the process is being recycled by IIS. You could implement your own IClientRepository which stores the clients in some kind of persistent storage rather than using the default InMemoryClientRepository. I don't think it's THAT important that the clients stay around forever, since the browser will reinitiate the handshake if they do get lost. The only reason I thought you might need to roll your own would be in web farm scenarios.

If it's just some of them, then it seems strange, as the only place this could happen is the DeleteById method. Try setting a breakpoint there or adding some logging.

Hope this helps
Neil

Kevin Davie

Apr 11, 2011, 9:42:58 PM
to aspc...@googlegroups.com

Hi Neil,

Thanks for the quick response. I am developing on a Windows 7 machine with IIS 7. The callback is being raised client-side in the chat.js sample you included. I am on a train right now so I don't have the source in front of me.

I am only losing one client at a time. I attempted to walk through the code and did break on DeleteClientById, but it wasn't obvious to me what state was causing it to be called.
If this isn't enough info I will post again once I am at a computer.

Thanks again for the quick response and all your hard work.

(Sent from my phone, please excuse brevity and typos)

Greg Thomas

Apr 12, 2011, 3:04:37 AM
to aspc...@googlegroups.com
On the chat example, there is some code to arbitrarily fail
connections as a demonstration of how to implement a handler. Could it
be that which you're seeing?

Greg

Kevin Davie

Apr 12, 2011, 4:22:42 AM
to aspc...@googlegroups.com
I actually implemented my own server using the example as guidance and
omitted the arbitrary fail. I used the client-side piece of the
example to validate that everything was working with my implementation.

Austin

Apr 12, 2011, 7:31:25 AM
to aspComet
I'm having the same problem as Kevin. Sometimes I can connect
successfully to the server, but eventually, even though a successful
response is sent back from the server, the client outputs the
message:

System: Handshake complete. Successful? false
System: Request on channel /meta/handshake failed: No message

And the response that I'm receiving from the server:

[{,"version":"1.0","channel":"/meta/handshake","supportedConnectionTypes":["long-polling"],"id":"6","clientId":"n8V89QzyAzeOT2zGPXcR","advice":{"reconnect":"retry"},"successful":true}]

It has "successful":true, but the client library says failed. I'm
running this on Windows 7 using IIS Express and using the
jquery.cometd.js that is in the sample AspComet solution.


Kevin Davie

Apr 12, 2011, 3:08:16 PM
to aspc...@googlegroups.com, Austin
I am making a little headway on this issue. I can see that every time this happens, xhr.onerror is called, which in turn causes the transport failure to fire. This mostly occurs on connect, after the handshake, although occasionally it happens on the handshake itself. The xhr error message is: "Expected identifier, string or number." It doesn't provide any further context. I am going to keep debugging and hopefully will get to the bottom of it soon. I will post back any progress.

-Kevin 

kdavie

Apr 14, 2011, 7:58:18 PM
to aspComet
Had to focus my attention on another project for a few days but I'm
back on this now. I think I've gotten past the handshake issue. It
appears to have been a bug I introduced in my custom handshake
authenticator. I mistakenly omitted the event cancellation on bad
requests. If the cancellation is omitted it seems to hold on to this
bad state on all subsequent (good) requests. Since I've made the
change I haven't run into the problem again. Fingers crossed that it
doesn't happen anymore.

I tracked the dropped connections down to the client timeout. It seems
that my (C# desktop) client library is having trouble reconnecting
after the timeout, which results in the loss of incoming messages.
This is strange because my library works perfectly with the CometD Java
server, as well as with a proprietary server written by some former
colleagues of mine. It's been a while since I wrote the client library,
so I am planning on revisiting the spec and making sure nothing funny
is happening on my end, though considering it works with other
server implementations I'm leaning towards it being a problem with
aspComet. Hopefully I will get to the bottom of it soon :)

-Kevin

kdavie

Apr 15, 2011, 1:29:25 PM
to aspComet

Sweet. I have figured it out. It turns out that the advice from the
server only contains an interval; the reconnect action is omitted. My
client was interpreting this as reconnect action = none, which, per
the spec, meant that it could not try to connect or handshake again.
I added a new value to my ReconnectionAdvice enum called noAdvice. Now,
when the server doesn't provide a reconnect action, my connect
callback handles that case and performs the retry operation. If the
server provides reconnect advice in the future, my client will respect
that. So I think I'm in line with the spec.
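
A minimal sketch of that mapping, for anyone writing their own client. Only the ReconnectionAdvice enum and the idea of a no-advice value come from my client; the AdviceParser class, its method names, and the enum member spellings are made up for illustration. The three reconnect strings are the values the Bayeux spec defines.

```csharp
using System;

// NoAdvice means the server sent no "reconnect" field at all.
enum ReconnectionAdvice { NoAdvice, Retry, Handshake, None }

// Hypothetical helper, not part of aspComet or any published client.
static class AdviceParser
{
    // Maps the optional Bayeux "reconnect" string to an enum value.
    public static ReconnectionAdvice Parse(string reconnect)
    {
        switch (reconnect)
        {
            case "retry": return ReconnectionAdvice.Retry;
            case "handshake": return ReconnectionAdvice.Handshake;
            case "none": return ReconnectionAdvice.None;
            default: return ReconnectionAdvice.NoAdvice; // missing or unknown
        }
    }

    // Only an explicit "none" forbids further connect/handshake attempts.
    public static bool MayReconnect(ReconnectionAdvice advice)
    {
        return advice != ReconnectionAdvice.None;
    }
}

class Program
{
    static void Main()
    {
        // Advice containing only an interval: the reconnect field is null.
        ReconnectionAdvice advice = AdviceParser.Parse(null);
        Console.WriteLine(advice);                            // NoAdvice
        Console.WriteLine(AdviceParser.MayReconnect(advice)); // True

        advice = AdviceParser.Parse("none");
        Console.WriteLine(AdviceParser.MayReconnect(advice)); // False
    }
}
```

Treating a missing field as its own case, rather than folding it into "none", is what keeps the client retrying when the server sends only an interval.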

Austin

Apr 19, 2011, 9:11:18 AM
to aspComet
I was looking through the issues on github because I was still having
this intermittent issue, and Andrey fixed this in Issue 14,
https://github.com/nmosafi/aspComet/issues/14.

It looks like the NullRegex that is supposed to clean up the nulls in
the JSON string was assuming that the serializer would serialize the
properties in the same order that GetProperties() returns them.
There was an edge case: when the last property defined in the
NullRegex was the first property serialized, the regex left a leading
comma in the JSON string, which generated a parseError. It looks like
Andrey's solution fixes the problem (in MessageConverter.cs):

Line 56: stringBuilder.AppendFormat(@"(""{0}"":null)|(,""{0}"":null)", properties[properties.Length - 1].Name);

I think it should be replaced with:

stringBuilder.AppendFormat(@"(""{0}"":null,)|(""{0}"":null)|(,""{0}"":null)", properties[properties.Length - 1].Name);

If anyone has a better solution, or can confirm that this doesn't
introduce any other problems, I would be interested to hear it.
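
To illustrate the edge case, here is a small standalone sketch. It does not use the aspComet code: the property name "data" and the sample JSON are hypothetical, and only the two format strings are taken from the lines above.

```csharp
using System;
using System.Text.RegularExpressions;

class NullRegexDemo
{
    static void Main()
    {
        const string name = "data"; // hypothetical property name

        // Pattern as it was on line 56, and the proposed replacement.
        string oldPattern = string.Format(
            @"(""{0}"":null)|(,""{0}"":null)", name);
        string newPattern = string.Format(
            @"(""{0}"":null,)|(""{0}"":null)|(,""{0}"":null)", name);

        // The null property happens to be serialized first.
        string json = @"{""data"":null,""version"":""1.0""}";

        // Old pattern strips the property but leaves its trailing comma.
        Console.WriteLine(Regex.Replace(json, oldPattern, ""));
        // prints {,"version":"1.0"}

        // New pattern consumes the trailing comma as well.
        Console.WriteLine(Regex.Replace(json, newPattern, ""));
        // prints {"version":"1.0"}
    }
}
```

The first output has the same leading `{,` seen in the failing handshake responses earlier in this thread.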

Austin

Bryan

May 26, 2011, 3:41:29 PM
to aspComet
I'm currently using this fix. It looks to me like it was simply a
typo or copy-paste error. So far we have not had any problems with
it, but I will post back here if we do encounter any problems.


Matt

May 27, 2011, 3:40:53 PM
to aspComet
We've also seen the loss of one client at a time and tracked it down
to a deadlock on the server. The deadlock looked like this:

Thread 1 call stack:
[blocked while invoking callback]
CometAsyncResult.CompleteRequestWithMessages()
Client.FlushQueue() <----- acquires lock on syncRoot
ForwardingHandler.SendMessageToRecipients()
ForwardingHandler.HandleMessage()
MessagesProcessor.Process(Message)
MessagesProcessor.Process(IEnumerable<Message>)
MessageBus.CreateProcessorAndProcess()
MessageBus.HandleMessages()
CometHttpHandler.BeginProcessRequest()
CometHttpHandler.BeginProcessRequest()


Thread 2 call stack:
[waiting for lock on syncRoot held by Thread 1]
Client.Enqueue()
ClientExtensions.Enqueue()
ForwardingHandler.SendMessageToRecipients()
ForwardingHandler.HandleMessage()
MessagesProcessor.Process(Message)
MessagesProcessor.Process(IEnumerable<Message>)
MessageBus.CreateProcessorAndProcess()
MessageBus.HandleMessages()
CometHttpHandler.BeginProcessRequest()
CometHttpHandler.BeginProcessRequest()


The changes that appear to fix this are in Client.cs:

1) Change GetMessages() to iterate over list immediately:
private IEnumerable<Message> GetMessages()
{
    IList<Message> result = new List<Message>();

    while (this.messageQueue.Count > 0)
        result.Add(this.messageQueue.Dequeue());

    return result;
}

2) Release lock on syncRoot earlier in FlushQueue(), before the
callback:

public void FlushQueue()
{
    if (this.messageQueue.Count > 0 && this.CurrentAsyncResult != null)
    {
        IEnumerable<Message> response = null;
        ICometAsyncResult asyncResult = null;

        lock (syncRoot) // double checked lock
        {
            if (this.messageQueue.Count > 0 && this.CurrentAsyncResult != null)
            {
                response = this.GetMessages();
                asyncResult = CurrentAsyncResult;
                this.CurrentAsyncResult = null;
            }
        }

        // Invoke the callback only after the lock has been released.
        if (response != null)
        {
            asyncResult.CompleteRequestWithMessages(response);
        }
    }
}
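
A note on why change 1 matters, assuming the previous GetMessages() was written as a lazy iterator (an assumption on my part; the old method isn't shown here). With `yield return`, nothing is dequeued until the caller enumerates the result, so the dequeuing would happen wherever the enumeration runs, which may be outside the lock. The sketch below is standalone and uses hypothetical names (LazyVersusEager, DrainLazily, DrainEagerly), not aspComet types.

```csharp
using System;
using System.Collections.Generic;

class LazyVersusEager
{
    static readonly Queue<string> queue = new Queue<string>();

    // Lazy: the loop body runs only while the result is being enumerated.
    static IEnumerable<string> DrainLazily()
    {
        while (queue.Count > 0)
            yield return queue.Dequeue();
    }

    // Eager: the queue is emptied before the method returns.
    static IList<string> DrainEagerly()
    {
        var result = new List<string>();
        while (queue.Count > 0)
            result.Add(queue.Dequeue());
        return result;
    }

    static void Main()
    {
        queue.Enqueue("a");
        IEnumerable<string> lazy = DrainLazily();
        Console.WriteLine(queue.Count); // 1, nothing dequeued yet
        foreach (string message in lazy) { }
        Console.WriteLine(queue.Count); // 0, drained during enumeration

        queue.Enqueue("b");
        IList<string> eager = DrainEagerly();
        Console.WriteLine(queue.Count); // 0, drained inside the call
        Console.WriteLine(eager.Count); // 1
    }
}
```

With the eager version, everything that touches the queue finishes inside the lock, and the callback then works on a private copy.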