Timeouts while sending notifications


Prashant Chaubey

Oct 2, 2020, 6:55:24 AM
to pushy
Hello,

We use Pushy in a high-throughput environment with multiple instances. Within a single instance, if we use a singleton ApnsClient, everything works fine (we join the future immediately with a timeout because we don't want to deal with futures upstream). However, as soon as we use this singleton instance inside an executor, with the number of executor threads matched to the number of threads in the client's event loop group, we start getting a large number of timeouts while joining the response future.

My thinking is that, in theory, because the client now has the same number of connections and event loop threads as the executor has threads, using it from the executor should work fine.

One more thing: it could be that being on 0.13.10 is the cause, but I'm also wondering why this started so suddenly.

PS:
java: 11.0.7
pushy: 0.13.10

Arvind

Oct 2, 2020, 7:00:24 PM
to pushy
Hello,

We started experiencing similar timeouts in our push service recently, starting around Sep 19th. Our service is also on Pushy 0.13.10.
Looking at the trace logs from ApnsClientHandler, we're not receiving response headers from the APNs gateway, so all we see are the "Wrote headers on stream..." and "Wrote payload on stream..." logs.
Pings sent right after these failures get a response just fine, so the connection appears to be healthy.

Has anyone run into this before? Before I enable netty level tracing, I wanted to ask if there are any traces in Pushy that can help diagnose further.

Thanks!

Jon Chambers

Oct 2, 2020, 7:13:58 PM
to pushy
This is all sounding very, very strange.

Can you folks please clarify what you mean by "timeouts?"

Regarding concurrency, @Prashant, I'm not quite sure if I understand your setup, but ApnsClient instances are thread-safe. You may find the best practices wiki page helpful for general guidance, but generally speaking, it shouldn't matter how many threads you're using from outside of a client.

Arvind, thank you for the added detail about the timing of this change; it does sound like SOMETHING has changed upstream, but it's not clear what. To that end, yes, I'd recommend turning on frame-level logging via ApnsClientBuilder#setFrameLogger.
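For anyone who hasn't wired that up before, a minimal sketch looks something like this (package names assume Pushy 0.14.x; adjust the imports to whatever Netty packages your Pushy version pulls in, and configure the signing key or certificate however you already do):

import io.netty.handler.codec.http2.Http2FrameLogger;
import io.netty.handler.logging.LogLevel;

final ApnsClient apnsClient = new ApnsClientBuilder()
        .setApnsServer(ApnsClientBuilder.PRODUCTION_APNS_HOST)
        .setSigningKey(signingKey) // or however you already configure your credentials
        .setFrameLogger(new Http2FrameLogger(LogLevel.DEBUG))
        .build();

You'll also need DEBUG enabled for the frame logger's category in your logging configuration for the frames to actually show up.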

-Jon

Arvind

Oct 2, 2020, 9:51:17 PM
to pushy
Thanks Jon, I'll use the frame logger to gather more details.
Regarding timeouts: our push server lets consumers send a batch of notifications in one request. We use a CountDownLatch to track the status of the batch, counting down as each send operation completes, and we wait for the latch to reach zero with a 10-second timeout. Our batch sizes are fairly small (<= 50), and we had never hit this 10-second timeout until recently.
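Roughly, the pattern is something like this (a simplified sketch rather than our exact code; it assumes Pushy 0.14.x, where the send future is a CompletableFuture; on 0.13.10 we attach a listener to the Netty future instead, but the idea is the same):

final CountDownLatch latch = new CountDownLatch(notifications.size());

for (final SimpleApnsPushNotification notification : notifications) {
    apnsClient.sendNotification(notification).whenComplete((response, cause) -> {
        // Count down whether the send succeeded or failed, so the batch can complete.
        latch.countDown();
    });
}

if (!latch.await(10, TimeUnit.SECONDS)) {
    // This is the 10-second timeout we've recently started hitting.
    log.warn("Timed out waiting for a batch of {} notifications", notifications.size());
}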

Prashant Chaubey

Oct 5, 2020, 4:53:56 AM
to pushy
Hi Jon,

Thanks for the reply!

Regarding my setup, I have a singleton instance of ApnsClient with, let's say, X connections and the same X threads in the event loop group assigned to it. I use this client inside an Executor to batch-send notifications (multiple threads use the same client). I set the executor to the same X threads so that, in theory, it can fully utilize the X connections/threads assigned to the client.

Regarding timeouts: ApnsClient returns a PushNotificationFuture, and we call pushNotificationFuture.get(Y, TimeUnit.SECONDS) to get the response straight away. We have an executor on top and a good number of servers doing the same thing, so throughput isn't a problem, and we don't want to deal with futures in the upstream code.

The problem is that recently we started getting a TimeoutException from pushNotificationFuture.get(Y, TimeUnit.SECONDS). We're on Pushy 0.13.10.
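In code, the setup is roughly this (an illustrative sketch with placeholder names rather than our actual code; X and Y stand for the numbers described above):

final EventLoopGroup eventLoopGroup = new NioEventLoopGroup(X);

final ApnsClient apnsClient = new ApnsClientBuilder()
        .setApnsServer(ApnsClientBuilder.PRODUCTION_APNS_HOST)
        .setSigningKey(signingKey)
        .setConcurrentConnections(X)
        .setEventLoopGroup(eventLoopGroup)
        .build();

final ExecutorService executor = Executors.newFixedThreadPool(X);

executor.submit(() -> {
    try {
        // This get() call is where we now see the TimeoutExceptions.
        apnsClient.sendNotification(notification).get(Y, TimeUnit.SECONDS);
    } catch (final TimeoutException e) {
        // this is the failure we've started seeing recently
    } catch (final InterruptedException | ExecutionException e) {
        // other failures (e.g., connection problems)
    }
});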

Michele Lorenzini

Oct 7, 2020, 5:12:37 AM
to pushy
Hi all, we have experienced a similar problem starting from September 21.
At that point our setup was Pushy 0.13.10, and we also called get() on the PushNotificationFuture (without specifying a timeout); the application hung, and from the JVM thread dump we could see that threads were waiting in DefaultPromise.await().
After that we upgraded to Pushy 0.14.1 and Netty 4.1.52.Final, and also added a 10-second timeout to the get() call.
Now we see some threads occasionally reaching the timeout and exiting the get() call.
After further investigation, we see that:
- some timeouts occur many times for the same device/token across a sequence of sends, even on different instances of the application;
- from a packet trace, the responses from Apple for the sends where the thread timed out after 10 seconds differ in length from the others:

16:49:17.792914 IP xxx.xxx.xxx.xxx > 17.188.134.24.443: Flags [P.], seq 815271:815862, ack 128037, win 32848, options [nop,nop,TS val 1606204565 ecr 3349113116], length 591
16:49:17.894623 IP 17.188.134.24.443 > xxx.xxx.xxx.xxx: Flags [P.], seq 128037:128079, ack 815271, win 1452, options [nop,nop,TS val 3349113323 ecr 1606204565], length 42
16:49:17.954627 IP 17.188.134.24.443 > xxx.xxx.xxx.xxx: Flags [P.], seq 128079:128157, ack 815862, win 1452, options [nop,nop,TS val 3349113383 ecr 1606204565], length 78
16:49:18.095096 IP xxx.xxx.xxx.xxx > 17.188.134.24.443: Flags [.], ack 128157, win 32848, options [nop,nop,TS val 1606204566 ecr 3349113323], length 0
16:49:21.539622 IP xxx.xxx.xxx.xxx > 17.188.134.24.443: Flags [P.], seq 815862:816465, ack 128157, win 32848, options [nop,nop,TS val 1606204572 ecr 3349113323], length 603
16:49:21.701444 IP 17.188.134.24.443 > xxx.xxx.xxx.xxx: Flags [P.], seq 128157:128235, ack 816465, win 1452, options [nop,nop,TS val 3349117129 ecr 1606204572], length 78
16:49:21.815076 IP xxx.xxx.xxx.xxx > 17.188.134.24.443: Flags [.], ack 128235, win 32848, options [nop,nop,TS val 1606204573 ecr 3349117129], length 0
16:49:21.828051 IP xxx.xxx.xxx.xxx > 17.188.134.24.443: Flags [P.], seq 816465:817054, ack 128235, win 32848, options [nop,nop,TS val 1606204573 ecr 3349117129], length 589
16:49:21.989825 IP 17.188.134.24.443 > xxx.xxx.xxx.xxx: Flags [P.], seq 128235:128313, ack 817054, win 1452, options [nop,nop,TS val 3349117418 ecr 1606204573], length 78
16:49:22.025028 IP xxx.xxx.xxx.xxx > 17.188.134.24.443: Flags [.], ack 128313, win 32848, options [nop,nop,TS val 1606204573 ecr 3349117418], length 0
16:49:23.865899 IP xxx.xxx.xxx.xxx > 17.188.134.24.443: Flags [P.], seq 817054:817656, ack 128313, win 32848, options [nop,nop,TS val 1606204577 ecr 3349117418], length 602
16:49:24.027753 IP 17.188.134.24.443 > xxx.xxx.xxx.xxx: Flags [P.], seq 128313:128391, ack 817656, win 1452, options [nop,nop,TS val 3349119456 ecr 1606204577], length 78
16:49:24.061083 IP xxx.xxx.xxx.xxx > 17.188.134.24.443: Flags [P.], seq 817656:818051, ack 128391, win 32848, options [nop,nop,TS val 1606204577 ecr 3349119456], length 395
16:49:24.222903 IP 17.188.134.24.443 > xxx.xxx.xxx.xxx: Flags [P.], seq 128391:128469, ack 818051, win 1452, options [nop,nop,TS val 3349119651 ecr 1606204577], length 78
16:49:24.295059 IP xxx.xxx.xxx.xxx > 17.188.134.24.443: Flags [.], ack 128469, win 32848, options [nop,nop,TS val 1606204578 ecr 3349119651], length 0
16:49:28.789315 IP xxx.xxx.xxx.xxx > 17.188.134.24.443: Flags [P.], seq 818051:818736, ack 128469, win 32848, options [nop,nop,TS val 1606204587 ecr 3349119651], length 685
16:49:28.954463 IP 17.188.134.24.443 > xxx.xxx.xxx.xxx: Flags [P.], seq 128469:128547, ack 818736, win 1452, options [nop,nop,TS val 3349124382 ecr 1606204587], length 78
16:49:29.075023 IP xxx.xxx.xxx.xxx > 17.188.134.24.443: Flags [P.], seq 818736:819399, ack 128547, win 32848, options [nop,nop,TS val 1606204587 ecr 3349124382], length 663
16:49:29.236960 IP 17.188.134.24.443 > xxx.xxx.xxx.xxx: Flags [P.], seq 128547:128625, ack 819399, win 1452, options [nop,nop,TS val 3349124665 ecr 1606204587], length 78
16:49:29.305051 IP xxx.xxx.xxx.xxx > 17.188.134.24.443: Flags [.], ack 128625, win 32848, options [nop,nop,TS val 1606204588 ecr 3349124665], length 0

2020-10-06 16:49:27,734 ERROR - APNS Timeout

It seems that in some cases the I/O threads cannot handle Apple's response properly, so the waiting thread is never woken up.
Thanks,
M.

Jon Chambers

Oct 7, 2020, 10:45:13 AM
to pushy
Thanks, Michele! This is a really important clue.

It sounds very much like there was some kind of upstream change somewhere around September 19-21, which closely coincides with the release of iOS 14. I hate to ask for more, but is there any chance you can capture some frame logs as described earlier in this thread? That would be an enormous help in figuring out what's changed.

For what it's worth, while I do see some updated recommendations around connection management in the docs, I don't see any protocol-level changes that would explain this behavior.

Thanks kindly!

-Jon

Petr Dvořák

Oct 8, 2020, 3:40:18 AM
to pushy
Hello everyone, could someone please assist us with collecting the frame logs? We see a similar issue but do not have much experience with Netty debugging. Could we use our logger instance (org.slf4j.LoggerFactory.getLogger(Clazz.class)) for this somehow?

Douglas Oliveira

Oct 8, 2020, 1:59:25 PM
to pushy
Hey guys,
Along with Prashant, we're also experiencing this issue, and we've collected a bit more information, including the HTTP/2 frame logs, to help understand it a bit better.

Here is some more evidence:

- We are on Pushy 0.13.10 and will try to update to 0.14.x soon, but it looks like the problem is the same as described in the other messages in this thread;
- To make the issue easier to troubleshoot and debug, we are now sending the APNs notifications sequentially (not in parallel), joining the send future straight after triggering it, to help isolate the issue;
- Even sending the notifications one at a time, we are still seeing the client hang, which started around 2-3 weeks ago;
- Because of this we also removed the event loop group and concurrent connections configuration from our client, again to help isolate the problem. We used to set them, but when these issues started we removed parallelization altogether to make troubleshooting easier. We still hit the situation where the client gets stuck completing the promise for a notification (we set a timeout so it doesn't block forever).

These are all the properties we set when building our client as a singleton:

new ApnsClientBuilder()
.setApnsServer(host)
.setFrameLogger(new Http2FrameLogger(LogLevel.DEBUG))
.setSigningKey(ApnsSigningKey.loadFromInputStream(stream, team, keyId))
.build();
(we removed the event loop group and concurrent connections settings while troubleshooting this issue, since we are sending notifications one at a time now)

and then with that client we send notifications like this:

apnsClient
.sendNotification(notification)
.get(150, TimeUnit.SECONDS);
(we added the timeout after the client started hanging; we left it a bit on the high side at 150s)

- Another important piece of information: once we reach the state where the client gets stuck, because we now have a timeout and move on to the next notification, we end up in a very unstable state where many notifications continue to hang and time out, one after another. Once it hangs the first time, we see a pattern where the next 5-7 notifications go through, then the next one hangs and times out, then a few more (say another 5-7) are sent successfully, and then we get a new timeout. It's almost as if the client is in a bad state. Once we restart our application and a new singleton client is created from scratch, everything works fine for the next few hours, until the client starts hanging and timing out again. So this looks to me more like a bad state in the client than like something specific to the notification data we are sending or to APNs itself, given that a restart fixes it for the next few hours.

Now, moving on to the important part: the logs, with a bit of tracing and the HTTP/2 frame logs, from when the operation hangs:


2020-10-08 16:37:48.495 [nioEventLoopGroup-4-1] DEBUG p.i.n.h.codec.http2.Http2FrameLogger - [id: xxxxxxx, L:/xxx.xxx.xxx.xxx:xxx - R:api.push.apple.com/17.188.140.26:443] OUTBOUND HEADERS: streamId=34841 headers=DefaultHttp2Headers[:method: POST, :authority: api.push.apple.com, :path: /3/device/XXXXXXXXXXXXXXXXXXXXXXXXXXXX, :scheme: https, apns-expiration: 0, apns-priority: 10, apns-push-type: alert, apns-topic: xxxxxxxx, authorization: bearer xxxxxxxxxxx] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
2020-10-08 16:37:48.495 [nioEventLoopGroup-4-1] TRACE c.turo.pushy.apns.ApnsClientHandler - Wrote headers on stream 34841: DefaultHttp2Headers[:method: POST, :authority: api.push.apple.com, :path: /3/device/xxxxxxxxxx, :scheme: https, apns-expiration: 0, apns-priority: 10, apns-push-type: alert, apns-topic: xxxxxxx, authorization: bearer xxxxxxxxxxxxxxx]
2020-10-08 16:37:48.495 [nioEventLoopGroup-4-1] TRACE c.turo.pushy.apns.ApnsClientHandler - Wrote payload on stream 34841: {"sourceId":"xxxx","aps":{"alert":{"body":"xxxx"},"sound":"default","content-available":1},"signature":"xxxxxx","idAtSource":"xxxx","source":"xxxx","title":"xxxx" ","priority":"xxxx","userId":"xxx","url":"xxxxxx"}
2020-10-08 16:37:48.495 [nioEventLoopGroup-4-1] DEBUG p.i.n.h.codec.http2.Http2FrameLogger - [id: xxxxxxx, L:/xxx.xxx.xxx.xxx:xx - R:api.push.apple.com/17.188.140.26:443] OUTBOUND DATA: streamId=34841 padding=0 endStream=true length=488 bytes=xxxxxxxxxxxxxxxxxxxxxxxxxx...
2020-10-08 16:38:01.668 [nioEventLoopGroup-3-1] TRACE c.turo.pushy.apns.ApnsClientHandler - Sending ping due to inactivity.
2020-10-08 16:38:01.668 [nioEventLoopGroup-3-1] DEBUG p.i.n.h.codec.http2.Http2FrameLogger - [id: xxxxxxx, L:/xxx.xxx.xxx.xxx:xx - R:api.push.apple.com/17.188.132.182:443] OUTBOUND PING: ack=false bytes=1602175081668
2020-10-08 16:38:01.730 [nioEventLoopGroup-3-1] DEBUG p.i.n.h.codec.http2.Http2FrameLogger - [id: xxxxxxx, L:/xxx.xxx.xxx.xxx:xx - R:api.push.apple.com/17.188.132.182:443] INBOUND PING: ack=true bytes=1602175081668
2020-10-08 16:38:01.730 [nioEventLoopGroup-3-1] TRACE c.turo.pushy.apns.ApnsClientHandler - Received reply to ping.
2020-10-08 16:38:48.459 [nioEventLoopGroup-4-1] TRACE c.turo.pushy.apns.ApnsClientHandler - Sending ping due to inactivity.
2020-10-08 16:38:48.459 [nioEventLoopGroup-4-1] DEBUG p.i.n.h.codec.http2.Http2FrameLogger - [id: xxxxxxxx, L:/xxx.xxx.xxx.xxx:xx - R:api.push.apple.com/17.188.140.26:443] OUTBOUND PING: ack=false bytes=1602175128459
2020-10-08 16:38:48.521 [nioEventLoopGroup-4-1] DEBUG p.i.n.h.codec.http2.Http2FrameLogger - [id: xxxxxxxxx, L:/xxx.xxx.xxx.xxx:xx - R:api.push.apple.com/17.188.140.26:443] INBOUND PING: ack=true bytes=1602175128459
2020-10-08 16:38:48.521 [nioEventLoopGroup-4-1] TRACE c.turo.pushy.apns.ApnsClientHandler - Received reply to ping.
2020-10-08 16:39:01.730 [nioEventLoopGroup-3-1] TRACE c.turo.pushy.apns.ApnsClientHandler - Sending ping due to inactivity.
2020-10-08 16:39:01.730 [nioEventLoopGroup-3-1] DEBUG p.i.n.h.codec.http2.Http2FrameLogger - [id: xxxxxxx, L:/xxx.xxx.xxx.xxx:xx - R:api.push.apple.com/17.188.132.182:443] OUTBOUND PING: ack=false bytes=1602175141730
2020-10-08 16:39:01.793 [nioEventLoopGroup-3-1] DEBUG p.i.n.h.codec.http2.Http2FrameLogger - [id: xxxxxxx, L:/xxx.xxx.xxx.xxx:xx - R:api.push.apple.com/17.188.132.182:443] INBOUND PING: ack=true bytes=1602175141730
2020-10-08 16:39:01.793 [nioEventLoopGroup-3-1] TRACE c.turo.pushy.apns.ApnsClientHandler - Received reply to ping.
2020-10-08 16:39:48.521 [nioEventLoopGroup-4-1] TRACE c.turo.pushy.apns.ApnsClientHandler - Sending ping due to inactivity.
2020-10-08 16:39:48.522 [nioEventLoopGroup-4-1] DEBUG p.i.n.h.codec.http2.Http2FrameLogger - [id: xxxxxxx, L:/xxx.xxx.xxx.xxx:xx - R:api.push.apple.com/17.188.140.26:443] OUTBOUND PING: ack=false bytes=1602175188521
2020-10-08 16:39:48.584 [nioEventLoopGroup-4-1] DEBUG p.i.n.h.codec.http2.Http2FrameLogger - [id: xxxxxxxxxx, L:/xxx.xxx.xxx.xxx:xx - R:api.push.apple.com/17.188.140.26:443] INBOUND PING: ack=true bytes=1602175188521
2020-10-08 16:39:48.584 [nioEventLoopGroup-4-1] TRACE c.turo.pushy.apns.ApnsClientHandler - Received reply to ping.

Basically, it seems like we send the full outbound payload and never get an inbound frame log back for it; then all we see are the inactivity pings repeating (which do get inbound replies) until the operation times out. The pattern then repeats again and again until we restart the service.

Douglas Oliveira

Oct 8, 2020, 2:09:57 PM
to pushy
When the application is running fine and sending notifications normally, without hanging, it looks like this:


2020-10-08 18:02:06.458 [nioEventLoopGroup-4-1] DEBUG p.i.n.h.codec.http2.Http2FrameLogger - [id: xxxxxxxx, L:/xxx.xxx.xxx.xxx:xx - R:api.push.apple.com/17.188.130.151:443] OUTBOUND HEADERS: streamId=26515 headers=DefaultHttp2Headers[:method: POST, :authority: api.push.apple.com, :path: /3/device/xxxxxxxx, :scheme: https, apns-expiration: 0, apns-priority: 10, apns-push-type: alert, apns-topic: xxxxxx, authorization: bearer xxxxxxxxxxx] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
2020-10-08 18:02:06.458 [nioEventLoopGroup-4-1] TRACE c.turo.pushy.apns.ApnsClientHandler - Wrote headers on stream 26515: DefaultHttp2Headers[:method: POST, :authority: api.push.apple.com, :path: /3/device/xxxxxxxxxxxx, :scheme: https, apns-expiration: 0, apns-priority: 10, apns-push-type: alert, apns-topic: xxxxxxxxx, authorization: bearer xxxxxxxxxx]
2020-10-08 18:02:06.458 [nioEventLoopGroup-4-1] TRACE c.turo.pushy.apns.ApnsClientHandler - Wrote payload on stream 26515: {"sourceId":"xxx","aps":{"alert":{"body":"xxxxxx"},"sound":"xxxx","content-available":1},"signature":"xxxxxx","idAtSource":"xxxx","source":"xxxx","title":"xxxxx"","priority":"xxxx","userId":"xxxx","url":"xxxxxx"}
2020-10-08 18:02:06.458 [nioEventLoopGroup-4-1] DEBUG p.i.n.h.codec.http2.Http2FrameLogger - [id: xxxxxxx, L:/xxx.xxx.xxx.xxx:xx - R:api.push.apple.com/17.188.130.151:443] OUTBOUND DATA: streamId=26515 padding=0 endStream=true length=514 bytes=xxxxxxxxxx...
2020-10-08 18:02:06.522 [nioEventLoopGroup-4-1] DEBUG p.i.n.h.codec.http2.Http2FrameLogger - [id: xxxxxx, L:/xxx.xxx.xxx.xxx:xxx - R:api.push.apple.com/17.188.130.151:443] INBOUND HEADERS: streamId=26515 headers=DefaultHttp2Headers[:status: 200, apns-id: xxxxxxxxxxxx] padding=0 endStream=true
2020-10-08 18:02:06.522 [nioEventLoopGroup-4-1] TRACE c.turo.pushy.apns.ApnsClientHandler - Received headers from APNs gateway on stream 26515: DefaultHttp2Headers[:status: 200, apns-id: xxxxxxx]

In comparison, when the client hangs and gets into a bad state, what we see is that we don't receive the INBOUND HEADERS response after sending the payload (as we do in the example above when things are working fine); then we're stuck with just the pings until the timeout, and the same thing happens again and again for subsequent notifications until we restart the service.

Jon Chambers

Oct 8, 2020, 2:15:35 PM
to pushy
Douglas,

Thank you; these logs are EXTREMELY helpful.

It looks like the server simply isn't sending a reply to some notifications, which is EXTREMELY strange. This is definitely looking like an upstream bug to me. I'm not sure what to make of the "bad state" observation; it could be a result of stream starvation where the server's refusal to release streams forces the client to buffer new notifications indefinitely, but I think it'd take a while for that to happen.

One thing that's surprising to me here given your description of your threading setup is seeing multiple threads (nioEventLoopGroup-3-1 and -4-1) in the logs. Can you please confirm that those log messages are coming from the same channel? Also, can you share the UUID (apns-id) for one of the "stuck" messages? Please feel free to contact me directly if you're not comfortable sharing that on this list, though I don't think there's any danger in doing so.

Michele, I suspect the shorter-than-expected frame in your logs might be a PING frame as shown here.

-Jon

Douglas Oliveira

Oct 8, 2020, 2:42:24 PM
to pushy
So, these logs are from one single node running my service, sending one notification at a time; they all come from a single source. I'm not sure what you're referring to regarding different channels?

One thing I noticed: for the hours before the hang, while push notifications were being sent fine, absolutely every message for which we got a proper response ran on nioEventLoopGroup-4-1.

Then, when it hangs, the main payload is also sent on nioEventLoopGroup-4-1, followed by an initial inactivity ping also on nioEventLoopGroup-4-1; after that I see it alternating between nioEventLoopGroup-4-1 and nioEventLoopGroup-3-1 for all subsequent ping activity, one at a time. I'm not sure where this is coming from, but I'm sure I didn't set any event loop groups (my client builder is as shown above), and it only seems to switch between 4-1 and 3-1 once we get to the inactivity pings; otherwise, when things are working, everything happens on nioEventLoopGroup-4-1.

One more thing I noticed: in the log lines with the INBOUND and OUTBOUND headers, I see different channel IDs and different ports listed when it switches between nioEventLoopGroup-4-1 and nioEventLoopGroup-3-1 for the inactivity pings; that is, the "[id: xxxxxx, L:/xxx.xxx.xxx.xxx:xxx ...]" part of lines like "2020-10-08 18:02:06.522 [nioEventLoopGroup-4-1] DEBUG p.i.n.h.codec.http2.Http2FrameLogger - ..." shows different values depending on whether it's nioEventLoopGroup-4-1 or -3-1.

I will check what we can do about the APNs ID for an example notification that failed and get back to you, maybe tomorrow. Let me just make sure it's not a privacy concern for the customer first.


Jon Chambers

Oct 8, 2020, 7:07:37 PM
to pushy
Thank you!

And please let me expand this call to everybody who's experiencing this issue: if you can share the UUID (`apns-id`) of a notification that times out and the approximate time of the timeout, that would be an enormous help in debugging this issue.

Cheers, folks!

-Jon

Jon Chambers

Oct 9, 2020, 10:16:25 AM
to pushy
Also, I've consolidated a number of related issues into https://github.com/jchambers/pushy/issues/816. Let's certainly keep the discussion going here, but I did want to call out that there's now an "official home" for this issue.

Thanks!

-Jon

Sam Lai

Oct 14, 2020, 10:07:27 PM
to pushy
I'm facing the same issue on 0.14.1, too.

Apple's documentation mentions that "If your provider server opens and closes its connection to APNs repeatedly, APNs may treat it as a denial-of-service attack and temporarily block your server from connecting."
May I know if a restart is the only way to temporarily work around this issue for now? I'm worried that if we restart the server too often, APNs will block our server's IP. Does anyone have any experience with this? Thanks a lot.

Jon Chambers

Oct 24, 2020, 11:08:55 AM
to pushy
Folks,

I've received word that this problem may have been fixed upstream. Could you please check whether the problem seems to be resolved for you and report back?

Thanks kindly!

-Jon

Arvind

Oct 24, 2020, 5:09:29 PM
to pushy
Hi Jon,

The most recent failure for our server was today at 10:10:55 AM CST. Any idea if we will need to re-establish the connection(s) in order to benefit from the upstream fix?

Thanks,
Arvind.

Jon Chambers

Oct 24, 2020, 5:14:04 PM
to pushy
I don't know specifically, but I'd say reconnecting certainly couldn't hurt.
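If it helps, the rough idea would just be to close the existing client and build a fresh one in its place (an illustrative sketch only; adapt it to however you manage your client instance and credentials):

apnsClient.close().get();

apnsClient = new ApnsClientBuilder()
        .setApnsServer(ApnsClientBuilder.PRODUCTION_APNS_HOST)
        .setSigningKey(signingKey)
        .build();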

-Jon

Michele Lorenzini

Oct 26, 2020, 3:39:54 AM
to pushy
Last timeout received for us:  
24/10/2020 08:31 UTC

We will monitor for further cases.
Thanks for your investigation.
M.

Michele Lorenzini

Oct 27, 2020, 11:46:01 AM
to pushy
Since then we have had more timeouts, starting yesterday evening (26.10):

26/10/2020 17:39:42.989
26/10/2020 17:39:42.818
26/10/2020 17:39:40.803
26/10/2020 17:39:32.783
26/10/2020 17:39:32.679
26/10/2020 17:39:32.622
26/10/2020 17:39:32.521
26/10/2020 17:39:31.710
26/10/2020 17:39:30.626
26/10/2020 17:39:30.534
26/10/2020 17:39:30.521
26/10/2020 17:39:29.326
26/10/2020 17:39:28.688
26/10/2020 17:39:20.543
26/10/2020 17:39:19.150
26/10/2020 17:39:18.472
26/10/2020 17:39:17.769 

26/10/2020 23:16:21.336
26/10/2020 23:16:06.257
26/10/2020 23:16:02.230
26/10/2020 23:16:00.905
26/10/2020 23:16:00.157
26/10/2020 23:15:57.935
26/10/2020 23:15:57.787
26/10/2020 23:15:57.250
26/10/2020 23:15:53.889
26/10/2020 23:15:51.797
26/10/2020 23:15:45.101
26/10/2020 23:15:41.034
26/10/2020 23:15:40.854
26/10/2020 23:15:39.231
26/10/2020 23:15:38.504
26/10/2020 23:15:37.365
26/10/2020 23:15:34.250
26/10/2020 23:15:33.811
26/10/2020 23:15:33.374
26/10/2020 23:15:32.426
26/10/2020 23:15:30.557
26/10/2020 23:15:23.350
26/10/2020 23:15:13.314
26/10/2020 23:15:12.999
26/10/2020 23:15:09.908
26/10/2020 23:15:09.166
26/10/2020 23:15:06.233
26/10/2020 23:15:05.524
26/10/2020 23:15:05.279
26/10/2020 23:15:03.857
26/10/2020 23:15:02.636
26/10/2020 23:14:59.819
26/10/2020 23:14:59.356
26/10/2020 23:14:58.637
26/10/2020 23:14:57.178
26/10/2020 23:14:54.309
26/10/2020 23:14:43.712
26/10/2020 23:14:43.290
26/10/2020 23:14:38.419
26/10/2020 23:14:38.412
26/10/2020 23:14:36.773
26/10/2020 23:14:34.987
26/10/2020 23:14:33.901
26/10/2020 23:14:33.135
26/10/2020 23:14:32.031
26/10/2020 23:14:30.091
26/10/2020 23:14:25.403
26/10/2020 23:14:24.105
26/10/2020 23:14:24.056
26/10/2020 23:14:19.923
26/10/2020 23:14:19.920
26/10/2020 23:14:19.915
26/10/2020 23:14:16.898
26/10/2020 23:14:14.451
26/10/2020 23:14:11.539
26/10/2020 23:14:11.063
26/10/2020 23:14:05.703

27/10/2020 03:18:47.758

(UTC+1)

Jon Chambers

Oct 27, 2020, 3:54:25 PM
to pushy
Michele,

Mirroring some discussion from the GitHub issue, we believe the problem was that Apple was silently dropping some HTTP/2 streams; messages before and after the problematic streams on the same connection worked just fine. Can you verify that's what's happening in your case? If so, can you provide the apns-id values (UUIDs) for the affected notifications?

Thank you!

-Jon