GCM Messages delayed - how to debug? how to solve?

3,116 views
Skip to first unread message

Andreas Be

unread,
Jul 11, 2013, 4:08:52 PM7/11/13
to andro...@googlegroups.com
I am developing an app that receives messages via GCM.

Some customers complain that messages are late, using WIFI
I found the app Push notification fixer click which explains and solves the problem (tcp timeouts on some routers)
Unfortunately, that requires root which not everybody wants to unlock.

Is there any way to debug push notifications to find out why the message was late?
I already found the GTALK service menu, which shows the last heartbeat signal, but is it also possible to read that information from some log file?

  1. Is GOOGLE planning to reduce that refresh (heartbeat) interval? Seems to be a massive problem for some users... A smaller refresh rate looks like an easy and promising solution to me.
  2. Is there a way to manually send a heartbeat on non-rooted phones?

Regards
Andy

Kenneth Corbin

unread,
Jul 13, 2013, 11:42:41 AM7/13/13
to andro...@googlegroups.com
Google would no doubt not approve of the situation, but adding a user option that requests a new registration ID from Google servers seems to do a pretty good job of fixing this problem.   The registration ID returned is usually the same as what we already have on file, and when it is I don't bother sending it to the server side of the app.  It has always been a mystery why this works, but now it starts to make sense. It sounds like the request initiated by the device side performs the same function as the client initiate heartbeat message.

This has been so successful that I recently added logic to initiate the re-register request automatically if no incoming push is received in 24 hours.  I'm planing on checking timestamps and doing it when an incoming push take more than 30 seconds to be delivered.

Andreas Be

unread,
Jul 14, 2013, 5:36:34 PM7/14/13
to andro...@googlegroups.com
I am not sure if I get the meaning of your post. How is a re-registration useful every 24 hours? My app needs push messages to be received after a maximum of 2 minutes.
And how can you force the device to re-register if the push isn't delivered after 30 seconds? How can you control that, when no message is received? O_o
Sorry, I don't get that :-)

A heartbeat is only useful if it is sent regularly...

Andreas

Tomcat

unread,
Jul 15, 2013, 8:34:43 AM7/15/13
to andro...@googlegroups.com
Unfortunately, C2DM/GCM is not reliable enough to meet your needs. If you search well, the developer of C2DM/GCM admitted this fact and recommended the use of SMS. You can't find solution with C2DM/GCM to meet your requirement: push messages to be received after a maximum of 2 minutes. When everything is working fine, it's seconds, but known problem delays it to 20 or 30 minutes easily. If you insist, don't use C2DM/GCM.

Tomcat

2013年7月15日月曜日 6時36分34秒 UTC+9 Andreas Be:

Costin Manolache

unread,
Jul 15, 2013, 4:22:50 PM7/15/13
to andro...@googlegroups.com
Keep in mind that reducing the heartbeat interval will greatly reduce battery life.

Most message delays are caused by a weak connection or bad routers not sending FIN/RST. 
There is also a window after the device connects - a message sent within few seconds of the 
connect may be delayed until next refresh.
In both cases the message will still be delivered after the connection stabilizes.

Costin
 

On Thursday, July 11, 2013 1:08:52 PM UTC-7, Andreas Be wrote:

Kenneth Corbin

unread,
Jul 16, 2013, 6:12:20 PM7/16/13
to andro...@googlegroups.com
I have no idea why a re-registration helps.  But  it apparently does.  Most users receiving the GCM pushes within seconds of being sent.  But occasional users have trouble with the push pages being delay to to an hour.  Or not arriving at all.  It has been observed that advising these user to perform a reconnect operation (which basically just sends Google another registration request) almost always fixes their problem.  If it is true that routers between the device and Google have gotten their routing confused, it may be the case that sending some kind of TCP message being sent from the device to Google servers and getting a response back was all that was required to straighten out the routing tables.  Or it may not.  All I know is that it usually fixes things.

Current logic performs this re-register automatically if no push is received in the last 24 hours.  That is basically an attempt to cut down on my support requirements.  I figure that will fix a good percentage of the users not getting push messages before they have to call me and have told to do the manual reconnect procedure.

The only thing that this doesn't cover is users who periodically get pages 30 min late.  Since they are still getting the, the automatic re-register when 24 hours go by without a push message doesn't kick in.  So the next step is going to be to check the server time stamp to determine how long it took the message to arrive.  If it takes too long, probably more than 30 sec, the automatic re-register logic kicks in.

Costin Manolache

unread,
Jul 16, 2013, 7:44:02 PM7/16/13
to andro...@googlegroups.com
On Tue, Jul 16, 2013 at 3:12 PM, Kenneth Corbin <kencor...@gmail.com> wrote:
I have no idea why a re-registration helps.  But  it apparently does.  Most users receiving the GCM pushes within seconds of being sent.  But occasional users have trouble with the push pages being delay to to an hour.  Or not arriving at all.  It has been observed that advising these user to perform a reconnect operation (which basically just sends Google another registration request) almost always fixes their problem.  If it is true that routers between the device and Google have gotten their routing confused, it may be the case that sending some kind of TCP message being sent from the device to Google servers and getting a response back was all that was required to straighten out the routing tables.  Or it may not.  All I know is that it usually fixes things.

Do you happen to use 'delay while idle' when sending the message ?

Can you reproduce this on any device ? If yes - could you send me the test device registration ID and some time interval when this happened ? Would making a request to a random URL also help ?
Also some logcats/bugreport from a device would help. 

I'm pretty sure there is absolutely no connection between the registration request and the message delivery on the server side - by the time you register again the message is already queued. I can imagine some TCP magic happening, but not sure when. 

 

Current logic performs this re-register automatically if no push is received in the last 24 hours.  That is basically an attempt to cut down on my support requirements.  I figure that will fix a good percentage of the users not getting push messages before they have to call me and have told to do the manual reconnect procedure.

  

The only thing that this doesn't cover is users who periodically get pages 30 min late.  Since they are still getting the, the automatic re-register when 24 hours go by without a push message doesn't kick in.  So the next step is going to be to check the server time stamp to determine how long it took the message to arrive.  If it takes too long, probably more than 30 sec, the automatic re-register logic kicks in.


Do you have any server-side logs on when the devices received the message ( i.e. do you start a sync or other server request on receiving the message ) ? 

I need some info to debug this - lots of google apps are using GCM and they register only on install, without retries - and so far I haven't heard of this kind of issue. If it can be narrowed down to specific device or operator it may help. 


Costin
 

On Sunday, July 14, 2013 2:36:34 PM UTC-7, Andreas Be wrote:
I am not sure if I get the meaning of your post. How is a re-registration useful every 24 hours? My app needs push messages to be received after a maximum of 2 minutes.
And how can you force the device to re-register if the push isn't delivered after 30 seconds? How can you control that, when no message is received? O_o
Sorry, I don't get that :-)

A heartbeat is only useful if it is sent regularly...

Andreas

Am Samstag, 13. Juli 2013 17:42:41 UTC+2 schrieb Kenneth Corbin:
Google would no doubt not approve of the situation, but adding a user option that requests a new registration ID from Google servers seems to do a pretty good job of fixing this problem.   The registration ID returned is usually the same as what we already have on file, and when it is I don't bother sending it to the server side of the app.  It has always been a mystery why this works, but now it starts to make sense. It sounds like the request initiated by the device side performs the same function as the client initiate heartbeat message.

This has been so successful that I recently added logic to initiate the re-register request automatically if no incoming push is received in 24 hours.  I'm planing on checking timestamps and doing it when an incoming push take more than 30 seconds to be delivered.


On Thursday, July 11, 2013 1:08:52 PM UTC-7, Andreas Be wrote:
I am developing an app that receives messages via GCM.

Some customers complain that messages are late, using WIFI
I found the app Push notification fixer click which explains and solves the problem (tcp timeouts on some routers)
Unfortunately, that requires root which not everybody wants to unlock.

Is there any way to debug push notifications to find out why the message was late?
I already found the GTALK service menu, which shows the last heartbeat signal, but is it also possible to read that information from some log file?

  1. Is GOOGLE planning to reduce that refresh (heartbeat) interval? Seems to be a massive problem for some users... A smaller refresh rate looks like an easy and promising solution to me.
  2. Is there a way to manually send a heartbeat on non-rooted phones?

Regards
Andy

--
You received this message because you are subscribed to the Google Groups "android-gcm" group.
To unsubscribe from this group and stop receiving emails from it, send an email to android-gcm...@googlegroups.com.
To post to this group, send email to andro...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/android-gcm/3dc00958-9d89-4ab1-bb92-05c0e9183a46%40googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.
 
 

Kenneth Corbin

unread,
Jul 22, 2013, 1:10:17 PM7/22/13
to andro...@googlegroups.com
See below...


On Tuesday, July 16, 2013 4:44:02 PM UTC-7, Costin Manolache wrote:
On Tue, Jul 16, 2013 at 3:12 PM, Kenneth Corbin <kencor...@gmail.com> wrote:
I have no idea why a re-registration helps.  But  it apparently does.  Most users receiving the GCM pushes within seconds of being sent.  But occasional users have trouble with the push pages being delay to to an hour.  Or not arriving at all.  It has been observed that advising these user to perform a reconnect operation (which basically just sends Google another registration request) almost always fixes their problem.  If it is true that routers between the device and Google have gotten their routing confused, it may be the case that sending some kind of TCP message being sent from the device to Google servers and getting a response back was all that was required to straighten out the routing tables.  Or it may not.  All I know is that it usually fixes things.

Do you happen to use 'delay while idle' when sending the message ?

Cannot say for certain.  The senders are separate organizations and both have been out of touch over the weekend.  I have told them they need to specify delay_while_idle=false, but I need to positively confirm that they have done so.  Still working on that.
 
Can you reproduce this on any device ? If yes - could you send me the test device registration ID and some time interval when this happened ? Would making a request to a random URL also help ?
Also some logcats/bugreport from a device would help. 

Unfortunately no.  This is a problem that hits a small number of devices for no reason that we can determine.  Once it starts happening, it tends to keep happening until the device goes through the re-register sequence.  And workaround logic I have put in place does seem to have had the effect of reducing the user reports of this problem to not quite zero.  I have observed it on my phone maybe once or twice a year, before the workaround logic went into effect.  I have been putting the word out to all users that we would like to catch a hard example of a push message that did not arrive or was significantly delayed.  I will let you know when we get one. 

I'm pretty sure there is absolutely no connection between the registration request and the message delivery on the server side - by the time you register again the message is already queued. I can imagine some TCP magic happening, but not sure when. 

The latest intriguing theory is this is caused by some kind of routing confusing between the device and your servers.  And that it isn't the actual re-registration process that is fixing things.  That just sending a simple ping from the device to your servers and getting a response back would have had the same effect.  If I knew the URL or IP address that GCM tries to communicate with from the server side, I would like to try sending a ping request every hour instead of a re-register request once a day and see if that improves things.

I have one user reporting that all of the Android phone users in his department running our app seem to run into this problem sooner or later.  But the IPhone users in the department running a similar app that performs the same function never experience the delays or missed pages.  The implication is that, if this is some kind of routing problem, the Apple push protocol is doing something to avoid it.  Which might be a simple as sending periodic heartbeat messages.
 
Current logic performs this re-register automatically if no push is received in the last 24 hours.  That is basically an attempt to cut down on my support requirements.  I figure that will fix a good percentage of the users not getting push messages before they have to call me and have told to do the manual reconnect procedure.

  

The only thing that this doesn't cover is users who periodically get pages 30 min late.  Since they are still getting the, the automatic re-register when 24 hours go by without a push message doesn't kick in.  So the next step is going to be to check the server time stamp to determine how long it took the message to arrive.  If it takes too long, probably more than 30 sec, the automatic re-register logic kicks in.


Do you have any server-side logs on when the devices received the message ( i.e. do you start a sync or other server request on receiving the message ) ? 

Yes, in most cases.  But the server request goes to another system and may have to be tracked down.  And when the end user sends in a trouble report, that will usually show a time stamp when the message was actually received.   Once we identify a solid report of this happening we should have not trouble documenting the message receive time.

I need some info to debug this - lots of google apps are using GCM and they register only on install, without retries - and so far I haven't heard of this kind of issue. If it can be narrowed down to specific device or operator it may help. 

I understand.  We will do what we can to get you the information you need.

This has been a fairly common complaint on this group.  Maybe another developer can provide a hard data point you can work with.
 

Costin

Thanks a bunch,
-Ken 

Costin Manolache

unread,
Jul 23, 2013, 2:39:46 AM7/23/13
to andro...@googlegroups.com
On Mon, Jul 22, 2013 at 10:10 AM, Kenneth Corbin <kencor...@gmail.com> wrote:
See below...

On Tuesday, July 16, 2013 4:44:02 PM UTC-7, Costin Manolache wrote:
On Tue, Jul 16, 2013 at 3:12 PM, Kenneth Corbin <kencor...@gmail.com> wrote:
I have no idea why a re-registration helps.  But  it apparently does.  Most users receiving the GCM pushes within seconds of being sent.  But occasional users have trouble with the push pages being delay to to an hour.  Or not arriving at all.  It has been observed that advising these user to perform a reconnect operation (which basically just sends Google another registration request) almost always fixes their problem.  If it is true that routers between the device and Google have gotten their routing confused, it may be the case that sending some kind of TCP message being sent from the device to Google servers and getting a response back was all that was required to straighten out the routing tables.  Or it may not.  All I know is that it usually fixes things.

Do you happen to use 'delay while idle' when sending the message ?

Cannot say for certain.  The senders are separate organizations and both have been out of touch over the weekend.  I have told them they need to specify delay_while_idle=false, but I need to positively confirm that they have done so.  Still working on that.
 
Can you reproduce this on any device ? If yes - could you send me the test device registration ID and some time interval when this happened ? Would making a request to a random URL also help ?
Also some logcats/bugreport from a device would help. 

Unfortunately no.  This is a problem that hits a small number of devices for no reason that we can determine.  Once it starts happening, it tends to keep happening until the device goes through the re-register sequence.  And workaround logic I have put in place does seem to have had the effect of reducing the user reports of this problem to not quite zero.  I have observed it on my phone maybe once or twice a year, before the workaround logic went into effect.  I have been putting the word out to all users that we would like to catch a hard example of a push message that did not arrive or was significantly delayed.  I will let you know when we get one. 

A registration Id of a device having this issue and some time interval when it happened would be nice. 
 

I'm pretty sure there is absolutely no connection between the registration request and the message delivery on the server side - by the time you register again the message is already queued. I can imagine some TCP magic happening, but not sure when. 

The latest intriguing theory is this is caused by some kind of routing confusing between the device and your servers.  And that it isn't the actual re-registration process that is fixing things.  That just sending a simple ping from the device to your servers and getting a response back would have had the same effect.  If I knew the URL or IP address that GCM tries to communicate with from the server side, I would like to try sending a ping request every hour instead of a re-register request once a day and see if that improves things.

I have one user reporting that all of the Android phone users in his department running our app seem to run into this problem sooner or later.  But the IPhone users in the department running a similar app that performs the same function never experience the delays or missed pages.  The implication is that, if this is some kind of routing problem, the Apple push protocol is doing something to avoid it.  Which might be a simple as sending periodic heartbeat messages.

GCM does send periodic heartbeat messages ( interval depends on network type / operator ), and if the hearbeat fails it would reconnect. You can get some information (if you can reproduce the issue ) using tcpdump, on a developer device.

A known issue that may delay messages is a Wifi access point / NAT with too many connections, and which is dropping connections it can't handle without sending RST. 

Note that the registration is done on different servers than connection, and the messages you receive are already queued ( since you receive them after the dummy registration call, I assume sent earlier ), the new registration is not involved. It looks like it's just the waking up the network that helps you.


 
 
Current logic performs this re-register automatically if no push is received in the last 24 hours.  That is basically an attempt to cut down on my support requirements.  I figure that will fix a good percentage of the users not getting push messages before they have to call me and have told to do the manual reconnect procedure.

  

The only thing that this doesn't cover is users who periodically get pages 30 min late.  Since they are still getting the, the automatic re-register when 24 hours go by without a push message doesn't kick in.  So the next step is going to be to check the server time stamp to determine how long it took the message to arrive.  If it takes too long, probably more than 30 sec, the automatic re-register logic kicks in.


Do you have any server-side logs on when the devices received the message ( i.e. do you start a sync or other server request on receiving the message ) ? 

Yes, in most cases.  But the server request goes to another system and may have to be tracked down.  And when the end user sends in a trouble report, that will usually show a time stamp when the message was actually received.   Once we identify a solid report of this happening we should have not trouble documenting the message receive time.

If you can't get a specific device - at least general information on device version / type, or if there is any IP pattern ( operators, or if it's wifi)
 

I need some info to debug this - lots of google apps are using GCM and they register only on install, without retries - and so far I haven't heard of this kind of issue. If it can be narrowed down to specific device or operator it may help. 

I understand.  We will do what we can to get you the information you need.

This has been a fairly common complaint on this group.  Maybe another developer can provide a hard data point you can work with.

So far I'm not aware of any reproducible issue. We know that some firewalls may block the connection, bad routers/NATs may break it - but neither would apply to what you describe, the messages will either never be received regardless of registration requests, or the keepalive would trigger a reconnect and then messages would be delivered. 

Keep in mind that any 'periodic ping' costs battery life and network bytes.

Costin
 
 

Costin

Thanks a bunch,
-Ken 

--
You received this message because you are subscribed to the Google Groups "android-gcm" group.
To unsubscribe from this group and stop receiving emails from it, send an email to android-gcm...@googlegroups.com.
To post to this group, send email to andro...@googlegroups.com.

Kenneth Corbin

unread,
Jul 30, 2013, 11:30:20 AM7/30/13
to andro...@googlegroups.com
We identified a one time case where a GCM push was not received.

Alert was pushed out to a group of devices on 7/29/2013 at 8:57 PDT.  Alert was received by multiple devices, including
registration_id: APA91bF8nluS4b3_D-70vsnxMLgsHmQGAJxaqDmOvjMRYT3EzBQYzxXA3QCaW33D4bgkDgvUzyJx4qDMEZa9ScOQ8j5e5WFVlqgX7svJmOoGte02mTX8ie_aEPaqCU87NKKIfgbdZCpOFJGe6wx0vhSoHQb7qSuzyw

It was not received by device 
registration_id: APA91bH2jUYHvhkIM8NxM-t9e98vZNXQQHlA_TAir4GjaxF7xRKAS2P9FEfo6v0P-uKQcm1oeyPJhEabZXeMC-20S5-uWY-5awyccGYKZhPkQe07KfLi-ECIoC8Qk6trjObfPtjjIsYsoV2ejpWGXdxTRDXfVX4K7w

The device was connected through a home WiFi router at the time.  The alert was never received.

I do not have the server logs available.  Still working getting them, should get some information by this afternoon.

This may not be a typical case, the device in question did receive push alerts the day before and twice later that day.  It is just this one that went missing.  When this was a common problem it was more typical for the lost or delayed push problem to continue impacting that device until we did something to address it.

Thanks a bunch,
-Ken

Kenneth Corbin

unread,
Aug 2, 2013, 10:56:09 AM8/2/13
to andro...@googlegroups.com
Another push failure has been reported.

Some time before  12:25PM (PDT) on 8/1, server pushed a confirmation message to
Registration_id: APA91bG1XjopJzd1efYPqY_h3WXzrqYgQ75kJsyholsdPa5_KqLIdLKr22-CAzEAp9-OvJMpPi5VjVDoF2MSAD_tM1w2R5ebgnq7ycBygGy39YQ9b_EXYhkCFDYnpF9HdoiSyqYCkVmuQhoRObvoOOb-oRRPLN36cA

Message was never received by device.  This was the first attempt to push a message to this device, so possible setup or configuration problems with device may be responsible.

Kenneth Corbin

unread,
Aug 5, 2013, 11:50:45 AM8/5/13
to andro...@googlegroups.com
More info....
The server operator reports that this actually happened on 7/31, but he didn't give me any specific times.
User report that restarting the device cleared up the problem

Kenneth Corbin

unread,
Aug 5, 2013, 12:09:29 PM8/5/13
to andro...@googlegroups.com
Another report of a delayed GCM push
Push message sent to
registration_id: APA91bHEbvyE8cdNCpe0vxvGkN9oEWbeuUfmITseXgt1G7Db0nWRGeCkkjx8Wyz4A-6avP8DzgebvECAOGlQh6RGzU1Tfa1kLd6N9H3d_e92nKtIwumqM0jdTTOyF4ECB3tG3Ofh9E8hX2yLRRWw4Z8Rbc_Rzg7xkw
at 2013-08-05 02:38:19 (UMT)
was received at 02:44:09

On Thursday, July 11, 2013 1:08:52 PM UTC-7, Andreas Be wrote:

Costin Manolache

unread,
Aug 5, 2013, 2:54:56 PM8/5/13
to andro...@googlegroups.com
Looks like something was closing the device connection every 5 minutes, may be an overloaded router or something else. When the connection is not stable there are some delays, in particular if multiple data centers are involved - both because of the phone backing off on reconnect ( to avoid overwhelming the network or server ), and due to replication delays. 

Costin







--
You received this message because you are subscribed to the Google Groups "android-gcm" group.
To unsubscribe from this group and stop receiving emails from it, send an email to android-gcm...@googlegroups.com.
To post to this group, send email to andro...@googlegroups.com.

Kenneth Corbin

unread,
Aug 5, 2013, 3:58:07 PM8/5/13
to andro...@googlegroups.com
Is there anything we can do about that?

Is it plausible that the new registration request we initiate from the device would clear up the routing issue some way?

Thanks for checking on this....
-Ken

Kenneth Corbin

unread,
Aug 20, 2013, 10:37:57 AM8/20/13
to andro...@googlegroups.com
Had another delayed push report.
registration_id: APA91bFeUEI_bpuVMt0GSLJvhgCJ0WehBiP0ZyOSaiwhn1wwObUqB4m-0OK5UJ-1Bh78ovXDT0VcwWWofhfpXAT5u6jABohIOLJGgLz5hsrnHCEqyLI-9Byjx2FjpMNtmvcGiK7ak7RHugm6uuwhIOm35xc3H7VSyA
Push message was sent to the device on 2013-08-19 15:00:29 (UMT) was received about 11 minutes later.

Complaints about delayed push messages are much much less common than they used to be.  I do not know if you changed anything, or my automatic re-register logic is responsible, or if there was some random Internet wierdness going on at the time.

On Thursday, July 11, 2013 1:08:52 PM UTC-7, Andreas Be wrote:

Kenneth Corbin

unread,
Aug 21, 2013, 5:49:04 PM8/21/13
to andro...@googlegroups.com
Two  more delayed push reports.  Both by same user
Registration ID: 
APA91bHEbvyE8cdNCpe0vxvGkN9oEWbeuUfmITseXgt1G7Db0nWRGeCkkjx8Wyz4A-6avP8DzgebvECAOGlQh6RGzU1Tfa1kLd6N9H3d_e92nKtIwumqM0jdTTOyF4ECB3tG3Ofh9E8hX2yLRRWw4Z8Rbc_Rzg7xkw

First message push at 2013-08-21 18:27:35 (UMT) arrived at 18:32:26
Second message push at 2013-08-21 12:55:09 (UMT) arrived at 13:04.06

The reconnect fix does not seem to be working for this user.  I will ask if the problem seems to be network or location specific.

On Thursday, July 11, 2013 1:08:52 PM UTC-7, Andreas Be wrote:

Kenneth Corbin

unread,
Aug 22, 2013, 11:35:35 AM8/22/13
to andro...@googlegroups.com
User reports that problem is not network specific.  He has experienced delays on the carrier network, his own WiFi network, and other WiFi networks.

Aaron Drydale

unread,
Aug 22, 2013, 10:51:05 PM8/22/13
to andro...@googlegroups.com
I had this issue and it was resolved by changing the power saving options (this was on an HTC One). The "Power saver" was turning off the data connection when the screen was turned off and only syncing up at odd intervals (sometimes 5+ minutes after the GCM push was sent). After preventing the power saver option from killing the data connection GCM messages are received much, much faster (within about 15 seconds at the worst). Not sure if it's the same issue but it corrected mine (and I was getting a little annoyed by it!)
Reply all
Reply to author
Forward
0 new messages