--
You received this message because you are subscribed to the Google Groups "net-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to net-dev+unsubscribe@chromium.org.
To post to this group, send email to net...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/net-dev/CAEkFr074d%2BSpSQ0VHESMsFQ_2OMYuLmem12GSBKkq2%2BFx-C22A%40mail.gmail.com.
Thanks everyone for the feedback!

The goal is to investigate whether //net can open fewer sockets when //net knows a server supports QUIC and the QUIC connection establishment is waiting on a DNS result.

A bit of background: This is seen in a Cronet embedded app. When DNS takes a long time, //net opens 6 TCP sockets in addition to a QUIC connection. All these connect attempts are waiting on DNS.

My question is: Do we really need to keep issuing ConnectJobs to the same origin when the previous attempts (including a QUIC connection attempt) are waiting on DNS?

It looks to me that issuing backup ConnectJobs is unnecessary in this case. Once DNS completes, these backup ConnectJobs will waste CPU and network. I don't have any data on Chrome (only Cronet so far, since the use case there is simpler) and I agree we need more data.

The problem seems to lie in two areas:

(1) The backup TCP jobs' timeout doesn't take DNS resolution into account. If DNS takes a long time, we will create one backup TCP job every 250ms. All these ConnectJobs will be bound to the same host resolver job. We gain nothing by kicking off these backup jobs.

(2) When //net knows a server supports QUIC, we try to establish a QUIC connection. If that doesn't succeed within a period of time, we kick off a TCP connection attempt. However, that timeout logic doesn't take DNS into account.

Problem (1) makes (2) worse. If DNS takes on the order of seconds, we will have one QUIC connection and 6 TCP connections.

A naive solution that came to mind is to lift DNS out of (1) and out of (2), so our timeout logic works even when DNS is taking a long time. A side benefit is that we will unify QUIC and non-QUIC DNS resolution paths.

Does this make sense? Thoughts?
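To make the arithmetic behind the "one QUIC connection and 6 TCP connections" observation concrete, here is a rough model. It is only a sketch, not Chromium code: `BackupPolicy`, `TcpJobsStartedDuringDns`, and the `dns_aware` flag are hypothetical names, the 250ms interval is the value quoted in this thread, and the cap of 6 is assumed from the number of sockets observed. It also assumes the first TCP ConnectJob already exists at time zero.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model, not Chromium code: estimates how many TCP ConnectJobs
// a socket group has started by the time a slow DNS lookup completes.
struct BackupPolicy {
  int retry_interval_ms = 250;    // backup interval quoted in the thread
  int max_sockets_per_group = 6;  // assumed cap, matches the 6 sockets seen
  bool dns_aware = false;         // if true, no backups while DNS is pending
};

// Returns the number of TCP ConnectJobs started while DNS is still pending,
// counting the initial job.
int TcpJobsStartedDuringDns(int dns_duration_ms, const BackupPolicy& policy) {
  const int initial_jobs = 1;
  if (policy.dns_aware) {
    // A DNS-aware timer would not fire backups until resolution finishes.
    return initial_jobs;
  }
  // One backup job per elapsed retry interval (timer fires at 250, 500, ...).
  const int backups = dns_duration_ms / policy.retry_interval_ms;
  return std::min(initial_jobs + backups, policy.max_sockets_per_group);
}
```

Under these assumptions, a 2000ms lookup saturates the group at 6 TCP jobs with the current behaviour, while the DNS-aware variant stays at a single job until resolution completes.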
> My question is: Do we really need to keep issuing ConnectJobs to the same origin when the previous attempts (including a QUIC connection attempt) are waiting on DNS?
>
> It looks to me that issuing backup ConnectJobs is unnecessary in this case. Once DNS completes, these backup ConnectJobs will waste CPU and network. I don't have any data on Chrome (only Cronet so far, since the use case there is simpler) and I agree we need more data.

Just a slight challenge - the 'wasted CPU and network' isn't necessarily (or, one would think, generally) true for Chrome users, and in general, these are optimizations that help reduce the TTFB and latency, especially on slow connections. This is because the penalty (of additional connections) is only paid if the server supports H/2 or QUIC, and either it's our first observation (meaning it's quickly amortised for that connection) or it's a previous attempt with a slow DNS server and no cache. I would suspect that for most Chromium-based users, this doesn't hold, and so these serve as valuable optimizations.

Could you clarify which backup job? When I first read this, I thought you meant the IPv4 vs IPv6 backup job, which happens post-resolution.

> A naive solution that came to mind is to lift DNS out of (1) and out of (2), so our timeout logic works even when DNS is taking a long time. A side benefit is that we will unify QUIC and non-QUIC DNS resolution paths.
>
> Does this make sense? Thoughts?

It makes sense, but I'm personally struggling with whether it's the right layering approach, given that the resolution path is not consistent between all sockets. Unifying it at a layer above will seemingly involve plumbing details down into it. I'm wondering whether an alternative approach - allowing the socket to signal whether resolution is happening and when it ends - might be enough to allow backoffs by the layer above, without having to code in specific knowledge about whether or not a socket will do resolution as part of its connection process.
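The signalling alternative could look roughly like the sketch below. Everything in it is hypothetical: `ResolutionObserver` and `BackupJobGate` are illustrative names, not existing //net interfaces, and the real socket pool would need to wire this into its own timer handling.

```cpp
#include <cassert>

// Hypothetical interface: a connect attempt reports resolution progress to
// the layer above, which needs no knowledge of which socket types resolve.
class ResolutionObserver {
 public:
  virtual ~ResolutionObserver() = default;
  virtual void OnResolutionStarted() = 0;
  virtual void OnResolutionFinished() = 0;
};

// Hypothetical stand-in for the layer that owns the backup timer.
class BackupJobGate : public ResolutionObserver {
 public:
  void OnResolutionStarted() override { ++pending_resolutions_; }

  void OnResolutionFinished() override {
    if (pending_resolutions_ > 0) --pending_resolutions_;
  }

  // Called when the backup timer fires. Returns true if a backup job should
  // start now, false if the timer should just be re-armed.
  bool ShouldStartBackupJob() const { return pending_resolutions_ == 0; }

 private:
  int pending_resolutions_ = 0;  // attempts currently waiting on DNS
};
```

A socket type that never resolves a hostname would simply never call the observer, so the gate stays open for it and the existing backup behaviour is unchanged.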
Thanks a lot, Ryan! Response inline.

>> My question is: Do we really need to keep issuing ConnectJobs to the same origin when the previous attempts (including a QUIC connection attempt) are waiting on DNS?
>>
>> It looks to me that issuing backup ConnectJobs is unnecessary in this case. Once DNS completes, these backup ConnectJobs will waste CPU and network. I don't have any data on Chrome (only Cronet so far, since the use case there is simpler) and I agree we need more data.
>
> Just a slight challenge - the 'wasted CPU and network' isn't necessarily (or, one would think, generally) true for Chrome users, and in general, these are optimizations that help reduce the TTFB and latency, especially on slow connections. This is because the penalty (of additional connections) is only paid if the server supports H/2 or QUIC, and either it's our first observation (meaning it's quickly amortised for that connection) or it's a previous attempt with a slow DNS server and no cache. I would suspect that for most Chromium-based users, this doesn't hold, and so these serve as valuable optimizations.

You are absolutely right that when Chrome doesn't know whether a server supports H2 or QUIC, we shouldn't throttle connection establishments to the same origin. Those extra connection establishments are very important to TTFB and latency. We should preserve those optimizations.

The use case that I am interested in is where Chrome already knows a server supports QUIC. The linked NetLog (sorry, googlers-only) shows QUIC server support in HttpServerProperties. If we know we are going to use QUIC, can we be less aggressive in kicking off TCP/TLS connection establishments when the previous one is stuck in DNS? I think making our timeouts DNS-aware is a good thing to do.

For DNS resolution to the same hostname, Miriam commented on the doc that these host resolver requests will be attached to the same host resolver job. So if we have a "previous attempt with a slow DNS server" that hasn't completed, subsequent attempts to the same origin will be bound to the previous attempt's host resolver job. Hence my argument that we don't gain anything by kicking off backup TCP ConnectJobs when the previous one is stuck in DNS.

> Could you clarify which backup job? When I first read this, I thought you meant the IPv4 vs IPv6 backup job, which happens post-resolution.

The backup TCP ConnectJob code is ClientSocketPoolBaseHelper::Group::StartBackupJobTimer(), which is called in ClientSocketPoolBaseHelper::RequestSocketInternal(). The timeout is currently a hardcoded value of ClientSocketPool::kMaxConnectRetryIntervalMs = 250ms.

>> A naive solution that came to mind is to lift DNS out of (1) and out of (2), so our timeout logic works even when DNS is taking a long time. A side benefit is that we will unify QUIC and non-QUIC DNS resolution paths.
>>
>> Does this make sense? Thoughts?
>
> It makes sense, but I'm personally struggling with whether it's the right layering approach, given that the resolution path is not consistent between all sockets. Unifying it at a layer above will seemingly involve plumbing details down into it. I'm wondering whether an alternative approach - allowing the socket to signal whether resolution is happening and when it ends - might be enough to allow backoffs by the layer above, without having to code in specific knowledge about whether or not a socket will do resolution as part of its connection process.

I thought about this, but it seems that going down this path would get complicated very quickly. I agree on the layering concern. Matt Menke also mentioned that with this approach we wouldn't be able to implement the new Happy Eyeballs.
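To illustrate why extra ConnectJobs gain nothing while DNS is pending, here is a toy model of requests for the same hostname attaching to one in-flight resolver job. It is a sketch under stated assumptions: `ToyHostResolver` and its methods are hypothetical and do not reflect the real HostResolver API; only the coalescing behaviour described in this thread is modelled.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of request coalescing; not the real HostResolver API.
class ToyHostResolver {
 public:
  // Attaches a request for `hostname` to the in-flight job, creating the job
  // only if none exists. Returns the number of requests now attached.
  int Resolve(const std::string& hostname) { return ++jobs_[hostname]; }

  // Number of distinct resolver jobs currently in flight.
  int in_flight_jobs() const { return static_cast<int>(jobs_.size()); }

  // Completes the job for `hostname` and returns how many waiting requests
  // are released at the same moment.
  int Complete(const std::string& hostname) {
    auto it = jobs_.find(hostname);
    if (it == jobs_.end()) return 0;
    const int waiters = it->second;
    jobs_.erase(it);
    return waiters;
  }

 private:
  std::map<std::string, int> jobs_;  // hostname -> attached request count
};
```

In this model, one QUIC attempt plus six TCP attempts to the same host still produce a single resolver job, and all seven are released together when it completes, so none of the later attempts gets its answer any sooner than the first.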
On Mon, Sep 25, 2017 at 7:26 PM Ryan Sleevi <rsl...@chromium.org> wrote:
On Tue, Sep 26, 2017 at 7:47 AM, Helen Li <xunj...@chromium.org> wrote: