client reconnect - server unsubscribe interaction

Eric van der Maarel

unread,

May 9, 2012, 8:48:32 AM5/9/12

to gran...@googlegroups.com

As I understand it, a client should periodically reconnect to the server to let the server know he's still alive.
I guess the interval used for this is the reconnect-interval-millis that we can set in the gravity configuration in granite-config.xml and the timeout before the server will unsubscribe the client with channel-idle-timeout-millis.

For testing puproses I've set the channel idle timeout to 40 sec. and the reconnect interval to 10 sec. Now after 40 sec of idle time the Gravity implementation (DefaultGravity) receives an unsubscribe (from the server's idle timeout thread). I don't expect to see that since the reconnect timeout was set to 10 sec.

What's more, millis later the Gravity impl. receives a subscribe from the client, which would of course restore the connection. With this I see two problems. There is a small period between the unsubscribe and the subscribe where message may get lost. Second, the subscribe that arrives does not have the selector information that we provided with the original subscription when we created the consumer.

Are my assumptions about the meaning of the reconnect interval and the idle timeout correct?
Is there something wrong with my 40 sec / 10 sec test configuration so problems I described cannot actually happen?

Regards,
Eric van der Maarel

Franck Wolff

unread,

May 9, 2012, 10:28:43 AM5/9/12

to gran...@googlegroups.com

Eric,

See my responses below:

Le mercredi 9 mai 2012 08:48:32 UTC-4, Eric van der Maarel a écrit :

As I understand it, a client should periodically reconnect to the server to let the server know he's still alive.

It is not working this way: after a certain amount of time (default is 30s with Tomcat), the server closes the connect request from the client and lets it reconnect (this is how long-polling is working). If you want to modify this default amount of time, you need to configure the underlying servlet container and this has nothing to do with the "channel-idle-timeout-millis" option. It's implementation dependent and, for Tomcat+APR, you cannot change the default value, due to a limitation of APR... The right option for this is "long-polling-timeout-millis". On the other hand, the "channel-idle-timeout-millis" option is meant to ensure that Gravity won't keep gost subscriptions for disconnected clients (network failure, browser shutdown, etc.) So, after a certain amount of time, Gravity sends unsubscribe message in order to cleanup memory.

I guess the interval used for this is the reconnect-interval-millis that we can set in the gravity configuration in granite-config.xml and the timeout before the server will unsubscribe the client with channel-idle-timeout-millis.

The "reconnect-interval-millis", together with the "reconnect-max-attempts" option, is a client-side advice: it basically means that if the client cannot reconnect to the server (server shutdown, network failure, etc.), it will retry every "reconnect-interval-millis", with a maximum of "reconnect-max-attempts" times, to reestablish the connection. For example, if you shutdown the server, all connected clients will try reconnect and, if you restart the server, they will be able to reconnect and resubscribe, provided that the server was restarted before the expiration delay configured with the two options.

For testing puproses I've set the channel idle timeout to 40 sec. and the reconnect interval to 10 sec. Now after 40 sec of idle time the Gravity implementation (DefaultGravity) receives an unsubscribe (from the server's idle timeout thread). I don't expect to see that since the reconnect timeout was set to 10 sec.

See above, your are misunderstanding the meaning of these options.

What's more, millis later the Gravity impl. receives a subscribe from the client, which would of course restore the connection. With this I see two problems. There is a small period between the unsubscribe and the subscribe where message may get lost. Second, the subscribe that arrives does not have the selector information that we provided with the original subscription when we created the consumer.

It should. All (re)connections are normally done with the topic and selector headers... This maybe a bug, please confirm this issue after new tests.

Are my assumptions about the meaning of the reconnect interval and the idle timeout correct?

No.

Is there something wrong with my 40 sec / 10 sec test configuration so problems I described cannot actually happen?

Yes and no, this is just misunderstanding about the options: it's wrong and it can happen ;)

Regards,
Eric van der Maarel

Regards.

Eric van der Maarel

unread,

May 9, 2012, 3:58:22 PM5/9/12

to gran...@googlegroups.com

This is very helpful thanks.

So the 'keep alive' that William mentioned in
https://groups.google.com/forum/?fromgroups#!searchin/graniteds/reconnect/graniteds/GYKoPiOreBM/niQ9Og0HVIoJ
is implemented based in the long polling mechanism.

In any way the purpose of this mechanism is for Gravity to not issue an unsubscribe command when the client is still connected, right? Having the idle timeout at 40 sec, and since the default Tomcat+APR long polling timeout is 30 sec (and I have the long-polling-timeout-millis even at 20 sec), I wouldn't expect to see an unsubscribe command coming from the Gravity ChannelTimerTask after 40 sec. (the idle timeout as configured), or should I?

I'll have a closer look into the reconnect subscribe to confirm whether is uses the selector.

Thanks,
Eric

Franck Wolff

unread,

May 9, 2012, 6:48:01 PM5/9/12

to gran...@googlegroups.com

So the 'keep alive' that William mentioned in
https://groups.google.com/forum/?fromgroups#!searchin/graniteds/reconnect/graniteds/GYKoPiOreBM/niQ9Og0HVIoJ
is implemented based in the long polling mechanism.

Yes, they are basically the same (do not confuse, of course, the APR keep alive option with the Gravity thread pool option, which is completely different).

In any way the purpose of this mechanism is for Gravity to not issue an unsubscribe command when the client is still connected, right? Having the idle timeout at 40 sec, and since the default Tomcat+APR long polling timeout is 30 sec (and I have the long-polling-timeout-millis even at 20 sec), I wouldn't expect to see an unsubscribe command coming from the Gravity ChannelTimerTask after 40 sec. (the idle timeout as configured), or should I?

The default timeout for Tomcat+APR is actually 20s, not 30s (I was wrong on this one). Here is how it should work: whenever the client is sending a request to the server, the underlying channel is marked as accessed (see the servlet implementation). "Accessed" means that a one-time timer task, configured with the channel-idle-timeout-millis value, is canceled and a new one is created and scheduled with the same delay (see DefaultGravity.access(...))

So, you're right, the ChannelTimerTask should be cancelled and rescheduled every 20s and the channel removal should not occur... I have tried to reproduce your problem without success (Chat sample application / APR / Tomcat 7 / Servlet 3)...

Could you enable logging for DefaultGravity and ChannelTimerTask? Normally, you should see something like that after each connect request (every 20s, without doing anything on the client, just automatic reconnecting after long-polling timeout):

May 9, 2012 6:28:41 PM org.granite.logging.JdkLogger debug
FINER: Canceling TimerTask: org.granite.gravity.ChannelTimerTask {channelId=1DCCC4D3-9D58-4CD3-92E8-0F43304FA029}
May 9, 2012 6:28:41 PM org.granite.logging.JdkLogger debug
FINER: Scheduling TimerTask: org.granite.gravity.ChannelTimerTask {channelId=1DCCC4D3-9D58-4CD3-92E8-0F43304FA029} for 40000 ms.

(...)

May 9, 2012 6:36:23 PM org.granite.logging.JdkLogger debug
FINER: Canceling TimerTask: org.granite.gravity.ChannelTimerTask {channelId=1DCCC4D3-9D58-4CD3-92E8-0F43304FA029}
May 9, 2012 6:36:23 PM org.granite.logging.JdkLogger debug
FINER: Scheduling TimerTask: org.granite.gravity.ChannelTimerTask {channelId=1DCCC4D3-9D58-4CD3-92E8-0F43304FA029} for 40000 ms.

And, if you close your browser tab, after 40s (6:36:23 + 40s = 6:37:03):

May 9, 2012 6:37:03 PM org.granite.logging.JdkLogger debug
FINER: Removing channel: "1DCCC4D3-9D58-4CD3-92E8-0F43304FA029"...
May 9, 2012 6:37:03 PM org.granite.logging.JdkLogger debug
FINER: >> Finding serviceAdapter for messageType: "flex.messaging.messages.AsyncMessage" and destinationId: "gravity"
May 9, 2012 6:37:03 PM org.granite.logging.JdkLogger debug
FINER: >> Found adapterRef: "default"
May 9, 2012 6:37:03 PM org.granite.logging.JdkLogger debug
FINER: >> Found a cached serviceAdapter for ref: org.granite.config.flex.Adapter@a9509d
May 9, 2012 6:37:03 PM org.granite.logging.JdkLogger debug
FINER: << Returning serviceAdapter: org.granite.gravity.adapters.SimpleServiceAdapter@fa21a4
May 9, 2012 6:37:03 PM org.granite.logging.JdkLogger debug
FINER: Channel: "1DCCC4D3-9D58-4CD3-92E8-0F43304FA029" removed.

My configuration is:

<granite-config>
<gravity channel-idle-timeout-millis="40000" />
</granite-config>

You didn't set 40, did you?

I'll have a closer look into the reconnect subscribe to confirm whether is uses the selector.

Thanks,
Eric

--
Franck Wolff
Granite Data Services Inc.
CEO & Founder

Eric van der Maarel

unread,

May 10, 2012, 5:46:38 PM5/10/12

to gran...@googlegroups.com

Here is some log. There are schedule/cancel pairs some millis apart. Next thing, a scheduled task is not canceled and so after 40s the channel gets removed. Of course I also use the number 40000 as idle timeout setting :-)
Can you make anything out of this Franck?

Regards,
Eric

22:45:28,957 DEBUG [org.granite.gravity.Gravity] Saving channel id in distributed data: "2C96ED50-CC04-433F-BCF4-AE27DBEB5949"
22:45:28,957 DEBUG [org.granite.gravity.Gravity] Scheduling TimerTask: org.granite.gravity.ChannelTimerTask {channelId=2C96ED50-CC04-433F-BCF4-AE27DBEB5949} for 40000 ms.
22:45:28,958 DEBUG [org.granite.gravity.Gravity] handshake.handle: reply=flex.messaging.messages.AcknowledgeMessage {
22:45:29,142 DEBUG [org.granite.gravity.Gravity] Saving new subscription message for channel: "2C96ED50-CC04-433F-BCF4-AE27DBEB5949" - flex.messaging.messages.CommandMessage {
22:45:29,142 DEBUG [org.granite.gravity.Gravity] Canceling TimerTask: org.granite.gravity.ChannelTimerTask {channelId=2C96ED50-CC04-433F-BCF4-AE27DBEB5949}
22:45:29,143 DEBUG [org.granite.gravity.Gravity] Scheduling TimerTask: org.granite.gravity.ChannelTimerTask {channelId=2C96ED50-CC04-433F-BCF4-AE27DBEB5949} for 40000 ms.
22:45:29,221 DEBUG [org.granite.gravity.Gravity] Canceling TimerTask: org.granite.gravity.ChannelTimerTask {channelId=2C96ED50-CC04-433F-BCF4-AE27DBEB5949}
22:45:29,221 DEBUG [org.granite.gravity.Gravity] Scheduling TimerTask: org.granite.gravity.ChannelTimerTask {channelId=2C96ED50-CC04-433F-BCF4-AE27DBEB5949} for 40000 ms.

22:46:09,221 DEBUG [org.granite.gravity.ChannelTimerTask] Removing channel: "2C96ED50-CC04-433F-BCF4-AE27DBEB5949"...
22:46:09,247 DEBUG [org.granite.gravity.ChannelTimerTask] Channel: "2C96ED50-CC04-433F-BCF4-AE27DBEB5949" removed.
22:46:09,279 DEBUG [org.granite.gravity.Gravity] Found channel id in distributed data: "2C96ED50-CC04-433F-BCF4-AE27DBEB5949"
22:46:09,279 DEBUG [org.granite.gravity.Gravity] Resubscribing channel: "2C96ED50-CC04-433F-BCF4-AE27DBEB5949" - flex.messaging.messages.CommandMessage {
22:46:09,306 DEBUG [org.granite.gravity.Gravity] Scheduling TimerTask: org.granite.gravity.ChannelTimerTask {channelId=2C96ED50-CC04-433F-BCF4-AE27DBEB5949} for 40000 ms.
22:46:09,307 DEBUG [org.granite.gravity.Gravity] Canceling TimerTask: org.granite.gravity.ChannelTimerTask {channelId=2C96ED50-CC04-433F-BCF4-AE27DBEB5949}
22:46:09,307 DEBUG [org.granite.gravity.Gravity] Scheduling TimerTask: org.granite.gravity.ChannelTimerTask {channelId=2C96ED50-CC04-433F-BCF4-AE27DBEB5949} for 40000 ms.

....

22:48:49,602 DEBUG [org.granite.gravity.ChannelTimerTask] Removing channel: "2C96ED50-CC04-433F-BCF4-AE27DBEB5949"...
22:48:49,640 DEBUG [org.granite.gravity.ChannelTimerTask] Channel: "2C96ED50-CC04-433F-BCF4-AE27DBEB5949" removed.
22:48:49,663 DEBUG [org.granite.gravity.Gravity] Found channel id in distributed data: "2C96ED50-CC04-433F-BCF4-AE27DBEB5949"
22:48:49,663 DEBUG [org.granite.gravity.Gravity] Resubscribing channel: "2C96ED50-CC04-433F-BCF4-AE27DBEB5949" - flex.messaging.messages.CommandMessage {
22:48:49,702 DEBUG [org.granite.gravity.Gravity] Scheduling TimerTask: org.granite.gravity.ChannelTimerTask {channelId=2C96ED50-CC04-433F-BCF4-AE27DBEB5949} for 40000 ms.
22:48:49,703 DEBUG [org.granite.gravity.Gravity] Canceling TimerTask: org.granite.gravity.ChannelTimerTask {channelId=2C96ED50-CC04-433F-BCF4-AE27DBEB5949}
22:48:49,704 DEBUG [org.granite.gravity.Gravity] Scheduling TimerTask: org.granite.gravity.ChannelTimerTask {channelId=2C96ED50-CC04-433F-BCF4-AE27DBEB5949} for 40000 ms.

closing browser, after 40s

22:49:29,704 DEBUG [org.granite.gravity.ChannelTimerTask] Removing channel: "2C96ED50-CC04-433F-BCF4-AE27DBEB5949"...
22:49:29,731 DEBUG [org.granite.gravity.ChannelTimerTask] Channel: "2C96ED50-CC04-433F-BCF4-AE27DBEB5949" removed.

Franck Wolff

unread,

May 10, 2012, 6:34:54 PM5/10/12

to gran...@googlegroups.com

I'm afraid I won't be able to do anything else without more information about your setup/application. You can use my personal email if anything is sensible...

2012/5/10 Eric van der Maarel <ericvd...@gmail.com>

John Fletcher

unread,

May 11, 2012, 5:15:57 AM5/11/12

to gran...@googlegroups.com

"sensible" in French and probably all the romanic languages = "sensitive" in English :-)

John

2012/5/11 Franck Wolff <franck...@graniteds.org>

Eric van der Maarel

unread,

May 11, 2012, 6:29:57 AM5/11/12

to gran...@googlegroups.com

We're not revealing any secrets when I explain you what we're doing.
Franck, I'll send the relevant code in an email.

Our Flex client application can subscribe for application events to e.g., display them in a monitor.
The client subscribes with a filter known by id, which is conveyed to the server as selector text.
The server has a singleton service that dispatches events to subscribed clients (dispatcher).
This service has a ref to a (the?) Gravity instance, which it gets injected from a servlet
attribute listener at system startup. See a former discussion on that at

https://groups.google.com/forum/#!searchin/graniteds/maarel/graniteds/ow4vDgVKzRQ/0ETRSsb67CoJ

The dispatcher service receives an event to be dispatched, so it uses the Gravity instance to send
an event message to interested clients.

It uses a selector of ours that is based on the Gravity subscription id (i.e., the
AsyncMessage.DESTINATION_CLIENT_ID_HEADER property) to decide where the event
message should go.
As always, the selector is constructed by at subscription time in the gravity Subscription object.
In the ServiceAdapter of our own (that extends your SimpleServiceAdapter) we take the selector text
from the subscribe command message to tell another service of ours what event filter must be used
and we set the selector text in the subscription command to contain the subscription id like this

String sid = (String)msg.getHeader(AsyncMessage.DESTINATION_CLIENT_ID_HEADER);
msg.setHeader(CommandMessage.SELECTOR_HEADER, "SubscriptionId = '" + sid + "'");

where msg is the subscribe command. This selector text is later on used by our selector to match
the subscribers.

This scheme has worked with GDS 2.2.1. It also works with GDS 2.3.1. The difference between the two is
in the connects that stay away and therefore subsequent resubscribe that we see.

I have played a little more with the gravity config. We use Tomcat+APR with JBoss-4.2.3. It appears that
there is no CONNECT command coming in. Quite possibly because it isn't sent. Which is likely to happen because
the long polling request isn't ended. I've tested with the default settings: after half an hour no CONNECT
requests have come in, so Gravity submits an unsubscribe command, upon which the client resubscribes, as can
also be seen in the log in the previous post in this thread.

Thanks,
Eric

Franck Wolff

unread,

May 11, 2012, 8:12:34 AM5/11/12

to gran...@googlegroups.com

You could have say that when using my personal email address... But, yes, of course, you're right, I meant sensitive ;)

2012/5/11 John Fletcher <flet...@gmail.com>

John Fletcher

unread,

May 14, 2012, 5:42:03 AM5/14/12

to gran...@googlegroups.com

Yes... but I wanted to clarify your message for Eric etc because "sensible" has another meaning in English. It sounded a bit like what he wrote so far was not sensible, which was evidently not the impression you wanted to bring across :-).

Eric van der Maarel

unread,

May 15, 2012, 10:22:29 AM5/15/12

to gran...@googlegroups.com

Thanks John, we got it :-)

Eric

Eric van der Maarel

unread,

May 25, 2012, 8:49:22 AM5/25/12

to gran...@googlegroups.com

I've tracked the problem down. It's related to having a security manager (java.lang.SecurityManager) installed in the JVM, as we have.
The problem boils down to a problem in the org.apache.catalina.core.ApplicationFilterFactory class as bundled in jbossweb with jboss-4.2.3. In this factory setting a filter chain is prohibited when there is a security manager installed and the request is a comet request, as can be seen in the ApplicationFilterFactory#createFilterChain code, which has

            comet = req.isComet();
            if (Globals.IS_SECURITY_ENABLED) {
                // Security: Do not recycle
                filterChain = new ApplicationFilterChain();
                if (comet) {
                    req.setFilterChain(filterChain);
                }
            } else {
                filterChain = (ApplicationFilterChain) req.getFilterChain();
                if (filterChain == null) {
                    filterChain = new ApplicationFilterChain();
                    req.setFilterChain(filterChain);
                }
            }

Creation of the the filter chain seems to be requested when preparing where the request does not have the comet flag set. Consequently, a filter chain is created when there is no security manager installed and not when there is no security manager. This doesn't seem to make a lot of sense.
It is reported that jbossweb 2.0.1 that comes with jboss 4.2.3 is based in tomcat 6.0.13. However, the original tomcat 6.0.13 code has

            if (Globals.IS_SECURITY_ENABLED) {
                // Security: Do not recycle
                filterChain = new ApplicationFilterChain();
            } else {
                filterChain = (ApplicationFilterChain) req.getFilterChain();
                if (filterChain == null) {
                    filterChain = new ApplicationFilterChain();
                    req.setFilterChain(filterChain);
                }
            }

which makes a lot more sense.
I've changed the jbossweb-2.0.1 accordingly. Now the client server interaction is as expected, and similar to that of the GDS chat example.

Btw, the problem can be perfectly reproduced with the chat example and jboss-4.2.3 by setting the java.security.manager system property when starting jboss.

Regards,
Eric

Franck Wolff

unread,

May 25, 2012, 8:58:14 AM5/25/12

to gran...@googlegroups.com

Phew ;) Thanks for sharing your inquiry results.

Reply all

Reply to author

Forward