gRPC Android stub creation best practice? DEADLINE_EXCEEDED but no request made to server


davis....@gmail.com

Jan 16, 2019, 8:30:42 PM1/16/19
to grpc.io

I believe I may not understand something about how gRPC channels, stubs, and transports work. I have an Android app that creates a channel and a single blocking stub, and injects them with Dagger when the application is initialized. When I need to make a gRPC call, my client has a method that makes the call with that stub. After the app has been idle for a while, all of my calls return DEADLINE_EXCEEDED errors, though no calls show up in the server logs.

@Singleton
@Provides
fun providesMyClient(app: Application): MyClient {
    val channel = AndroidChannelBuilder
            .forAddress("example.com", 443)
            .overrideAuthority("example.com")
            .context(app.applicationContext)
            .build()
    return MyClient(channel)
}

Where my client class has a function to return a request with a deadline:

class MyClient(channel: ManagedChannel) {
    private val blockingStub: MyServiceGrpc.MyServiceBlockingStub = MyServiceGrpc.newBlockingStub(channel)

    fun getStuff(): StuffResponse =
            blockingStub
                    .withDeadlineAfter(7, TimeUnit.SECONDS)
                    .getStuff(stuffRequest())

    fun getOtherStuff(): StuffResponse =
            blockingStub
                    .withDeadlineAfter(7, TimeUnit.SECONDS)
                    .getOtherStuff(stuffRequest())
}

I make the calls to the server inside a LiveData class in my repository, where the call looks like this: myClient.getStuff()

I am guessing that the channel loses its connection at some point, and then all of the subsequent stubs simply can't connect, but I don't see anything in the AndroidChannelBuilder documentation about how to handle this (I believed it reconnected automatically). Is it possible that the channel I use to create my blocking stub gets stale, and I should be creating a new blocking stub each time I call getStuff()? Any help in understanding this would be greatly appreciated.

davis....@gmail.com

Jan 17, 2019, 3:31:01 PM1/17/19
to grpc.io

After researching a bit, I believe the issue was that the proxy in front of the server was closing the connection after a few minutes of idle time, and the client ManagedChannel didn't automatically detect that and reconnect. When constructing the ManagedChannel, I added an idleTimeout, which proactively tears the connection down when it's idle and reestablishes it when it's needed again; this seems to solve the problem. The new channel construction looks like this:

@Singleton
@Provides
fun providesMyClient(app: Application): MyClient {
    val channel = AndroidChannelBuilder
            .forAddress("example.com", 443)
            .overrideAuthority("example.com")
            .context(app.applicationContext)
            .idleTimeout(60, TimeUnit.SECONDS)
            .build()
    return MyClient(channel)
}
To anyone who might see this, does that seem like a plausible explanation?

robert engels

Jan 17, 2019, 3:32:26 PM1/17/19
to davis....@gmail.com, grpc.io
Yes, though it might be more efficient to use keep-alives rather than destroying and rebuilding the connection - but that will depend on your setup/usage.
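
For anyone comparing the two approaches, a keep-alive configuration might look something like the sketch below. keepAliveTime() and keepAliveTimeout() are standard ManagedChannelBuilder options, but the 30s/10s durations here are illustrative assumptions, not recommendations:

import android.app.Application
import io.grpc.ManagedChannel
import io.grpc.android.AndroidChannelBuilder
import java.util.concurrent.TimeUnit

// Sketch: keep the connection alive with HTTP/2 PINGs instead of tearing it down.
fun buildChannelWithKeepAlive(app: Application): ManagedChannel =
        AndroidChannelBuilder
                .forAddress("example.com", 443)
                .overrideAuthority("example.com")
                .context(app.applicationContext)
                .keepAliveTime(30, TimeUnit.SECONDS)    // PING if an active connection is quiet this long
                .keepAliveTimeout(10, TimeUnit.SECONDS) // treat an unacked PING as a dead connection
                .build()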


robert engels

Jan 17, 2019, 3:48:33 PM1/17/19
to Bryant Davis, grpc.io

It should - you just need a keep-alive time lower than the proxy timeout.

Which is better depends a lot on how many simultaneous connections you expect to the server (i.e. how many client processes/machines). If that number is small, it would be much more efficient and provide better latency to use keep-alives rather than rebuilding the connection. If a lot of requests are coming in, it won't matter either way.

On Jan 17, 2019, at 2:41 PM, Bryant Davis <davis....@gmail.com> wrote:

Thanks for your response, Robert! I wasn't clear on how the keepalives work, and saw some warnings in the docs about increasing load on servers, but perhaps that's better than redoing the TLS handshake each time? There seem to be three options: keepAliveTime, keepAliveTimeout, and keepAliveWithoutCalls. I suppose I would use keepAliveTime, and that would prevent the connection from closing?

Eric Gribkoff

Jan 17, 2019, 6:51:57 PM1/17/19
to davis....@gmail.com, grpc.io
On Thu, Jan 17, 2019 at 12:31 PM <davis....@gmail.com> wrote:

After researching a bit, I believe the issue was that the proxy in front of the server was closing the connection after a few minutes of idle time, and the client ManagedChannel didn't automatically detect that and reconnect. When constructing the ManagedChannel, I added an idleTimeout, which proactively tears the connection down when it's idle and reestablishes it when it's needed again; this seems to solve the problem. The new channel construction looks like this:

@Singleton
@Provides
fun providesMyClient(app: Application): MyClient {
    val channel = AndroidChannelBuilder
            .forAddress("example.com", 443)
            .overrideAuthority("example.com")
            .context(app.applicationContext)
            .idleTimeout(60, TimeUnit.SECONDS)
            .build()
    return MyClient(channel)
}
To anyone who might see this, does that seem like a plausible explanation?


The explanation seems plausible, but I would generally expect that when the proxy closes the connection, this would be noticed by the gRPC client. For example, if the TCP socket is closed by the proxy, then the managed channel will see this and try to reconnect. Can you provide some more details about what proxy is in use, and how you were able to determine that the proxy is closing the connection?

If you can deterministically reproduce the DEADLINE_EXCEEDED errors from the original email, it may also be helpful to ensure that you observe the same behavior when using OkHttpChannelBuilder directly instead of AndroidChannelBuilder. AndroidChannelBuilder is only intended to respond to changes in the device's internet state, so it should be irrelevant to detecting (or failing to detect) server-side disconnections, but it's a relatively new feature, so it would be worth ruling out as a source of the problem.
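
For anyone trying that comparison, swapping the builders might look something like this sketch (OkHttpChannelBuilder uses TLS by default; useTransportSecurity() is shown only to make that explicit):

import io.grpc.ManagedChannel
import io.grpc.okhttp.OkHttpChannelBuilder

// Sketch: the same channel without AndroidChannelBuilder's network-state monitoring.
val channel: ManagedChannel = OkHttpChannelBuilder
        .forAddress("example.com", 443)
        .overrideAuthority("example.com")
        .useTransportSecurity() // TLS, already the default for OkHttp
        .build()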

Thanks,

Eric

robert engels

Jan 17, 2019, 7:13:51 PM1/17/19
to Eric Gribkoff, davis....@gmail.com, grpc.io
A lot of proxies - at least the firewall kind - don't properly close the TCP connection on an idle timeout - they just silently drop their own side of the mapping.

When you attempt to send via that connection later on, you will get the error at the TCP layer.

Eric Anderson

Jan 22, 2019, 8:08:09 PM1/22/19
to robert engels, Eric Gribkoff, davis....@gmail.com, grpc.io
On Thu, Jan 17, 2019 at 4:13 PM robert engels <ren...@earthlink.net> wrote:
A lot of proxies - at least the firewall kind - don't properly close the TCP connection on an idle timeout - they just silently drop their own side of the mapping.

When you attempt to send via that connection later on, you will get the error at the TCP layer.

Yes, but that shouldn't trigger a DEADLINE_EXCEEDED unless the RPC was already in progress. And if the RPC was already in progress, then idleTimeout() wouldn't have helped.

So here's the design for handling network-level disconnections:
  1. While at least one RPC is in progress, you want HTTP/2-level keepalives. This is controlled by keepAliveTime() on the builder.
  2. When no RPCs are in progress, you don't want to do keepalives as they are just a waste. Instead, you want to simply disconnect the connection if it is unused for a duration. This is controlled by idleTimeout() on the builder.
It is possible to do keepalives in case 2 via keepAliveWithoutCalls() on the builder, but it is generally discouraged unless you have very low latency requirements.

So if idleTimeout() fixed the problem for you... that's unexpected for a DEADLINE_EXCEEDED problem. But in any case, you probably want to set both idleTimeout() and keepAliveTime() on the builder.
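
Combining both recommendations with the provider from earlier in the thread might look like this sketch (the durations are illustrative assumptions; the keep-alive time should sit below the proxy's idle timeout):

@Singleton
@Provides
fun providesMyClient(app: Application): MyClient {
    val channel = AndroidChannelBuilder
            .forAddress("example.com", 443)
            .overrideAuthority("example.com")
            .context(app.applicationContext)
            .keepAliveTime(30, TimeUnit.SECONDS) // HTTP/2 keepalives while RPCs are in flight
            .idleTimeout(60, TimeUnit.SECONDS)   // drop the connection once it goes unused
            .build()
    return MyClient(channel)
}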

davis....@gmail.com

Jan 23, 2019, 4:38:30 PM1/23/19
to grpc.io
Hi Eric, thank you very much; I really appreciate your help and explanation. I will make sure to set both of those options on my builder.

Best,

Bryant