mmenke@: PTAL. Sorry for this large change; it became much bigger than I expected.
// TODO(crbug.com/457478038): Should we treat SpdySession creation failure
mmenke@: What do you think about this? If we treat SpdySession creation failure as fatal, we may be able to merge OnHttpBasicStreamReady() and OnSpdySessionReady() into one callback method (e.g. OnStreamSocketReady), removing more delegate methods like CreateBasicStream() and CreateNewSpdySession().
TEST_F(HttpStreamPoolAttemptTest, AllEndpointsFailed) {I plan to add more tests in this CL, after confirming the overall approach looks good.
I have enough on my plate today that I probably won't get to it, unfortunately.
Sorry, may not get to this today, due to a GRAD review.
Attempt(Delegate* delegate, HttpStreamPool* pool);I'd rather we avoid depending on the pool if we reasonably can. Can we just pass in the StreamAttemptParams* instead? That makes tests much more self-contained.
this, previous_failure.value_or(ERR_NAME_NOT_RESOLVED));This isn't right. If the A lookup completes with some IPs, all attempts to connect to them fail, and only then the AAAA lookup completes, returning no IPs, this would result in incorrectly returning ERR_NAME_NOT_RESOLVED, instead of the connection errors, since we don't record the old error messages if MaybeAttempt() does nothing (no ip endpoints to attempt yet, not all attempts failed).
StartAttempt(std::move(*ip_endpoint));Comment that `this` may be deleted?
std::optional<IPEndPoint> HttpStreamPool::Attempt::GetIPEndPointToAttempt() {I guess we can't just copy the endpoints we want to try into IPv4/IPv6 destination lists, and remove from them, since we may want to try IPs from both the A/AAAA and HTTPS requests, so IPs of both types may come from either source?
I guess ProcessServiceEndpointChanges() could extract a list, and we could keep a list of tried endpoints - though that is fairly complicated and may have more overhead (unless we try all IPs, in which case it has less)...So maybe not worth doing. This seems like it should be possible to do in a simpler manner, but not really seeing a simpler option.
// Don't attempt more when the first attempt already exists and it's not slow.
const bool has_attempt = ipv4_attempt_ || ipv6_attempt_;
if (has_attempt && slow_timer_.IsRunning()) {
return std::nullopt;
}Let's move this first, to avoid the virtual function call when not needed.
for (const auto& ip_endpoint : ip_endpoints) {Wonder if it's worth doing:
```
std::optional<size_t> index = base::index_of_if(
    ip_endpoints, [&](const IPEndPoint& ip_endpoint) {
      return !base::Contains(attempted_endpoints_, ip_endpoint);
    });
if (index) {
  return ip_endpoints[*index];
}
```
Guess it's not really much of a simplification.
for (const auto& ip_endpoint : ip_endpoints) {
if ((ipv4_attempt_ && ipv4_attempt_->ip_endpoint() == ip_endpoint) ||
(ipv6_attempt_ && ipv6_attempt_->ip_endpoint() == ip_endpoint) ||
base::Contains(failed_endpoints_, ip_endpoint)) {Rather than tracking failed endpoints, we can track attempted endpoints, update them on attempt start, and avoid the need to check the live attempts here.
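A minimal sketch of that alternative, assuming a hypothetical `attempted_ip_endpoints_` member (e.g. a `base::flat_set<IPEndPoint>`) that StartAttempt() inserts into; the names are not from the CL:
```
// Sketch only: anything we've already attempted, whether still live or
// already failed, is skipped here, so the live attempts no longer need to
// be checked separately.
for (const auto& ip_endpoint : ip_endpoints) {
  if (base::Contains(attempted_ip_endpoints_, ip_endpoint)) {
    continue;
  }
  return ip_endpoint;
}
return std::nullopt;
```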
MaybeStopSlowTimer();The call here and in OnTcpAttemptComplete seem wrong to me.
Events A: Consider events in this order:
1. We start an attempt for an SSL server.
2. We get AAAA addresses and start connecting to one.
3. Tcp handshake completes.
4. We stop the slow TCP timer.
5. We get A addresses.
6. MaybeAttempt() starts an IPv4 connection attempt, since the slow timer isn't running.
Events B: Now consider another series of events:
1. We start an attempt for an SSL server.
2. We get AAAA addresses and start connecting to one.
3. We get A addresses. We don't start an IPv4 connection attempt, as the slow timer is running.
4. Tcp handshake completes.
5. We don't start an IPv4 connection attempt, as MaybeAttempt() is not called (though I guess if we get HTTPS records late, we may start one then?).
Events C: And then there's also:
1. We start an attempt for an SSL server.
2. We get AAAA addresses and start connecting to one.
3. Slow timer triggers. Nothing happens.
4. We get A addresses, and start connecting to one. (Fine if 3&4 are flipped)
5. AAAA Tcp handshake completes.
6. We continue letting the A attempt connect.
It seems like we should definitely be doing the same thing with respect to an A attempt in event orderings A and B, but we're not. It also seems like we should probably make event ordering C consistent, w.r.t. whether we keep the A attempt around, though I think that's a bit less concerning.
Also unclear what happens if the SSL negotiation fails. Is it ok to start A attempts just because it failed? In ordering B, we'll only start one attempt on the error (for the next AAAA address, presumably), and still not have an A attempt, but in ordering A, we will thereafter have A and AAAA attempts running at once.
// TODO(crbug.com/457478038): Should we treat SpdySession creation failure
mmenke@: What do you think about this? If we treat SpdySession creation failure as fatal, we may be able to merge OnHttpBasicStreamReady() and OnSpdySessionReady() into one callback method (e.g. OnStreamSocketReady), removing more delegate methods like CreateBasicStream() and CreateNewSpdySession().
I think it's reasonable to treat this as fatal (for the Attempt, or even for all attempts). Looks like this only happens with SSL configurations we don't like, so this should both be pretty corner case-y, and doesn't seem like an error we should bury.
if (rv != OK) {
HandleSingleFailure(std::move(completed_attempt), rv);Suggest making HandleSingleFailure take only the destination. Does mean that if we want to avoid a dangling pointer, we may have to extract the endpoint, but still think it's better than passing around the attempt. I'd actually suggest unconditionally extracting the stream and the destination at the start of this function, and then deleting the attempt rather than moving it, for simplicity. Guess we'd need connect timing information, too, which may be a bit much.
Also fine if we remove the raw_ptr in Attempt instead, but that's more complicated, unless we just switch to weak pointers (which I'm fine with).
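Roughly, the suggestion could look like this at the top of the completion handler (ExtractAttempt() and ReleaseStreamSocket() are hypothetical names, and connect timing is ignored for brevity):
```
// Sketch only: pull out everything we still need, then delete the attempt,
// so nothing downstream can touch a dangling pointer.
std::unique_ptr<TcpAttempt> completed_attempt = ExtractAttempt(raw_attempt);
const IPEndPoint ip_endpoint = completed_attempt->ip_endpoint();
std::unique_ptr<StreamSocket> stream_socket =
    completed_attempt->ReleaseStreamSocket();
completed_attempt.reset();

if (rv != OK) {
  HandleSingleFailure(ip_endpoint, rv);
  return;
}
// ... success path continues with `stream_socket` ...
```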
// TODO(crbug.com/457478038): Shouldn't `prefer_ipv6_` be updated here?Bit confused about this TODO. Does this mean we should reconsider this, or we should not?
explicit TestAttempter(HttpStreamPool* pool, std::string_view destination)nit: explicit not needed.
I wasn't able to address all comments today; replying for further discussion.
I'd rather we avoid depending on the pool if we reasonably can. Can we just pass in the StreamAttemptParams* instead? That makes tests much more self-contained.
Done.
I used HttpStreamPool because we will also need to access HttpNetworkSession to get supported ALPNs in the subsequent CL. Passing a copy of NextProtoVector may not be efficient, so I think I will add a delegate method to get supported ALPNs.
this, previous_failure.value_or(ERR_NAME_NOT_RESOLVED));This isn't right. If the A lookup completes with some IPs, all attempts to connect to them fail, and only then the AAAA lookup completes, returning no IPs, this would result in incorrectly returning ERR_NAME_NOT_RESOLVED, instead of the connection errors, since we don't record the old error messages if MaybeAttempt() does nothing (no ip endpoints to attempt yet, not all attempts failed).
You're right. Added a `most_recent_attempt_failure_` field and used it to notify the delegate. Does that make sense? Added a test.
Comment that `this` may be deleted?
Done
std::optional<IPEndPoint> HttpStreamPool::Attempt::GetIPEndPointToAttempt() {I guess we can't just copy the endpoints we want to try into IPv4/IPv6 destination lists, and remove from them, since we may want to try IPs from both the A/AAAA and HTTPS requests, so IPs of both types may come from either source?
I guess ProcessServiceEndpointChanges() could extract a list, and we could keep a list of tried endpoints - though that is fairly complicated and may have more overhead (unless we try all IPs, in which case it has less)...So maybe not worth doing. This seems like it should be possible to do in a simpler manner, but not really seeing a simpler option.
Yeah, for now copying endpoints into destination lists to try may work. I considered that approach before, but didn't take it for the reasons you mentioned. Also, ServiceEndpointRequest could remove endpoints while updating (this was a design choice when I implemented the API with Eric), so copying endpoints could make things more complicated.
// Don't attempt more when the first attempt already exists and it's not slow.
const bool has_attempt = ipv4_attempt_ || ipv6_attempt_;
if (has_attempt && slow_timer_.IsRunning()) {
return std::nullopt;
}Let's move this first, to avoid the virtual function call when not needed.
Done
for (const auto& ip_endpoint : ip_endpoints) {Wonder if it's worth doing:
```
std::optional<size_t> index = base::index_of_if(
    ip_endpoints, [&](const IPEndPoint& ip_endpoint) {
      return !base::Contains(attempted_endpoints_, ip_endpoint);
    });
if (index) {
  return ip_endpoints[*index];
}
```
Guess it's not really much of a simplification.
Hmm, let me keep the for-loop style for now.
for (const auto& ip_endpoint : ip_endpoints) {
if ((ipv4_attempt_ && ipv4_attempt_->ip_endpoint() == ip_endpoint) ||
(ipv6_attempt_ && ipv6_attempt_->ip_endpoint() == ip_endpoint) ||
base::Contains(failed_endpoints_, ip_endpoint)) {Rather than tracking failed endpoints, we can track attempted endpoints, update them on attempt start, and avoid the need to check the live attempts here.
I thought about that idea. I didn't take the approach mostly because we probably need to distinguish attempted endpoints from failed endpoints. We need to plumb failed endpoints via HttpStreamRequest::connection_attempts() (and I think we don't plumb endpoints that don't complete). We may also want to expose failed endpoints for testing.
However, we can revisit this. Changed to manage attempted endpoints for now.
MaybeStopSlowTimer();The call here and in OnTcpAttemptComplete seem wrong to me.
Events A: Consider events in this order:
1. We start an attempt for an SSL server.
2. We get AAAA addresses and start connecting to one.
3. Tcp handshake completes.
4. We stop the slow TCP timer.
5. We get A addresses.
6. MaybeAttempt() starts an IPv4 connection attempt, since the slow timer isn't running.
Events B: Now consider another series of events:
1. We start an attempt for an SSL server.
2. We get AAAA addresses and start connecting to one.
3. We get A addresses. We don't start an IPv4 connection attempt, as the slow timer is running.
4. Tcp handshake completes.
5. We don't start an IPv4 connection attempt, as MaybeAttempt() is not called (though I guess if we get HTTPS records late, we may start one then?).
Events C: And then there's also:
1. We start an attempt for an SSL server.
2. We get AAAA addresses and start connecting to one.
3. Slow timer triggers. Nothing happens.
4. We get A addresses, and start connecting to one. (Fine if 3&4 are flipped)
5. AAAA Tcp handshake completes.
6. We continue letting the A attempt connect.
It seems like we should definitely be doing the same thing with respect to an A attempt in event orderings A and B, but we're not. It also seems like we should probably make event ordering C consistent, w.r.t. whether we keep the A attempt around, though I think that's a bit less concerning.
Also unclear what happens if the SSL negotiation fails. Is it ok to start A attempts just because it failed? In ordering B, we'll only start one attempt on the error (for the next AAAA address, presumably), and still not have an A attempt, but in ordering A, we will thereafter have A and AAAA attempts running at once.
Good points. I wrote this logic based on the discussion in the email thread (more aligned with HEv1; IIUC, HEv1 only considers TCP handshake delays w.r.t. slow timing). I left a TODO comment in TcpAttempt::OnTcpHandshakeComplete() about whether we should include TLS handshake time in slow attempt timing, but probably I should have put the TODO comment here.
The current code (HttpStreamPool::TcpBasedAttempt), which will be replaced with new classes, stops the slow timer when TCP handshake completes and then resumes the timer once it's ready for TLS handshake. Probably we want to have the same behavior here?
// TODO(crbug.com/457478038): Should we treat SpdySession creation failure
mmenke@: What do you think about this? If we treat SpdySession creation failure as fatal, we may be able to merge OnHttpBasicStreamReady() and OnSpdySessionReady() into one callback method (e.g. OnStreamSocketReady), removing more delegate methods like CreateBasicStream() and CreateNewSpdySession().
I think it's reasonable to treat this as fatal (for the Attempt, or even for all attempts). Looks like this only happens with SSL configurations we don't like, so this should both be pretty corner case-y, and doesn't seem like an error we should bury.
Thanks! I'll try to see if it simplifies the logic.
if (rv != OK) {
HandleSingleFailure(std::move(completed_attempt), rv);Suggest making HandleSingleFailure take only the destination. Does mean that if we want to avoid a dangling pointer, we may have to extract the endpoint, but still think it's better than passing around the attempt. I'd actually suggest unconditionally extracting the stream and the destination at the start of this function, and then deleting the attempt rather than moving it, for simplicity. Guess we'd need connect timing information, too, which may be a bit much.
Also fine if we remove the raw_ptr in Attempt instead, but that's more complicated, unless we just switch to weak pointers (which I'm fine with).
Preventing a dangling pointer is one reason, but the main reason is that we need to extract more information from the attempt based on its failure (e.g. the attempt requires client auth or hit a certificate error). I have a WIP CL to handle these.
Comment that `this` may be deleted?
Done
// TODO(crbug.com/457478038): Shouldn't `prefer_ipv6_` be updated here?Bit confused about this TODO. Does this mean we should reconsider this, or we should not?
I just wasn't sure we should update the flag here, as the existing code doesn't update the flag when a TCP-based attempt fails. But I think it's better to update the flag here. Removed the comment. I also wrote a test for this behavior.
explicit TestAttempter(HttpStreamPool* pool, std::string_view destination)nit: explicit not needed.
Done
this, previous_failure.value_or(ERR_NAME_NOT_RESOLVED));This isn't right. If the A lookup completes with some IPs, all attempts to connect to them fail, and only then the AAAA lookup completes, returning no IPs, this would result in incorrectly returning ERR_NAME_NOT_RESOLVED, instead of the connection errors, since we don't record the old error messages if MaybeAttempt() does nothing (no ip endpoints to attempt yet, not all attempts failed).
You're right. Added a `most_recent_attempt_failure_` field and used it to notify the delegate. Does that make sense? Added a test.
That seems reasonable to me.
std::optional<IPEndPoint> HttpStreamPool::Attempt::GetIPEndPointToAttempt() {I guess we can't just copy the endpoints we want to try into IPv4/IPv6 destination lists, and remove from them, since we may want to try IPs from both the A/AAAA and HTTPS requests, so IPs of both types may come from either source?
I guess ProcessServiceEndpointChanges() could extract a list, and we could keep a list of tried endpoints - though that is fairly complicated and may have more overhead (unless we try all IPs, in which case it has less)...So maybe not worth doing. This seems like it should be possible to do in a simpler manner, but not really seeing a simpler option.
Yeah, for now copying endpoints into destination lists to try may work. I considered that approach before, but didn't take it for the reasons you mentioned. Also, ServiceEndpointRequest could remove endpoints while updating (this was a design choice when I implemented the API with Eric), so copying endpoints could make things more complicated.
Acknowledged
Good points. I wrote this logic based on the discussion in the email thread (more aligned with HEv1; IIUC, HEv1 only considers TCP handshake delays w.r.t. slow timing). I left a TODO comment in TcpAttempt::OnTcpHandshakeComplete() about whether we should include TLS handshake time in slow attempt timing, but probably I should have put the TODO comment here.
The current code (HttpStreamPool::TcpBasedAttempt), which will be replaced with new classes, stops the slow timer when TCP handshake completes and then resumes the timer once it's ready for TLS handshake. Probably we want to have the same behavior here?
I think that for now, we should think of the slow timer as a one-time, per Attempt thing, and we should consider it to cover all requests.
e.g., if we start an IPv4 attempt, and it completes and fails to get a connection before the timer expires, we should not restart the timer when we start the next attempt.
So, if we have a single attempt, and then get a connection, and then get a TLS session...maybe we should restart the timer once, and then never again, regardless of errors resulting in retries?
We could also consider treating the slow timer as expired on the first error, in which case we suddenly start allowing multiple attempts. Or maybe the second error (which would allow both v4 and v6 attempts to have been tried before the timer).
I'm entirely open to other ideas here.
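For concreteness, the "treat the first error as if the timer expired" variant might look something like this (`attempt_is_slow_` is a hypothetical member; `OnTcpAttemptComplete()` and `slow_timer_` are from the CL):
```
// Sketch only: a single, one-shot notion of "slow" for the whole Attempt.
void HttpStreamPool::Attempt::OnTcpAttemptComplete(TcpAttempt* attempt,
                                                   int rv) {
  if (rv != OK && slow_timer_.IsRunning()) {
    // First error: consider the Attempt slow from now on. The timer is never
    // restarted, so MaybeAttempt() is free to run a second attempt.
    slow_timer_.Stop();
    attempt_is_slow_ = true;
  }
  // ... existing completion handling ...
}
```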
if (rv != OK) {
HandleSingleFailure(std::move(completed_attempt), rv);Suggest making HandleSingleFailure take only the destination. Does mean that if we want to avoid a dangling pointer, we may have to extract the endpoint, but still think it's better than passing around the attempt. I'd actually suggest unconditionally extracting the stream and the destination at the start of this function, and then deleting the attempt rather than moving it, for simplicity. Guess we'd need connect timing information, too, which may be a bit much.
Also fine if we remove the raw_ptr in Attempt instead, but that's more complicated, unless we just switch to weak pointers (which I'm fine with).
Preventing a dangling pointer is one reason, but the main reason is that we need to extract more information from the attempt based on its failure (e.g. the attempt requires client auth or hit a certificate error). I have a WIP CL to handle these.
I think that for now, we should think of the slow timer as a one-time, per Attempt thing, and we should consider it to cover all requests.
I see. I guess I would consider having separate slow timers in the future (especially if we handle QUIC attempts in this class) since it seems closer to the behavior the HEv3 draft specifies. However, it may mean we allow multiple attempts for an address family, which brings the same problem again w.r.t the pool/group socket limits. So for now, I agree that we should stick to a single timer.
I modified to call MaybeAttempt() when TCP handshake completes and the timer is running. Diff: https://chromium-review.googlesource.com/c/chromium/src/+/7181963/16..17
Does this make events A and B consistent? What do you think? We could also defer the logic to wait for the TLS handshake instead of the TCP handshake, but I'm not sure which one is better.
I think treating the slow timer as expired (completed) on the first error is simple and works reasonably. Since we complete the slow timer in OnTcpAttemptComplete(), we are already doing that, IIUC.
// TODO(crbug.com/457478038): Should we treat SpdySession creation failure
mmenke@: What do you think about this? If we treat SpdySession creation failure as fatal, we may be able to merge OnHttpBasicStreamReady() and OnSpdySessionReady() into one callback method (e.g. OnStreamSocketReady), removing more delegate methods like CreateBasicStream() and CreateNewSpdySession().
I think it's reasonable to treat this as fatal (for the Attempt, or even for all attempts). Looks like this only happens with SSL configurations we don't like, so this should both be pretty corner case-y, and doesn't seem like an error we should bury.
Thanks! I'll try to see if it simplifies the logic.
Merged OnHttpBasicStreamReady() and OnSpdySessionReady() into OnStreamSocketReady(). Diff: https://chromium-review.googlesource.com/c/chromium/src/+/7181963/14..16
Now this class isn't responsible for creating SpdySession or HttpStream. AttemptManager will be responsible for creating them. Removed the relevant delegate methods.
TEST_F(HttpStreamPoolAttemptTest, AllEndpointsFailed) {I plan to add more tests in this CL, after confirming the overall approach looks good.
Done
We could restart the timer on, say, the first TLS handshake attempt and never again. If we also considered the timer expired on the first error (and never restart it afterwards), then we can just check if it's running on TLS handshake start, and if so, restart it. That seems simple enough, and should make behavior consistent as well, if done carefully.
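A tiny sketch of that "restart once for TLS, then never again" idea, assuming the timer lives on the Attempt and the hook name is hypothetical:
```
// Sketch only: the timer is armed once at the start of the Attempt and may be
// re-armed at most once, when the first TLS handshake begins.
void HttpStreamPool::Attempt::OnTlsHandshakeStarted() {
  if (slow_timer_.IsRunning()) {
    // Restart with the same delay and callback. If the timer has already
    // fired (or an error marked the attempt slow), it stays expired.
    slow_timer_.Reset();
  }
}
```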
I plan to take another look at this CL later today, just want to make sure I publish this comment.
I tried to pause/resume the slow timer for TLS. Diff is https://chromium-review.googlesource.com/c/chromium/src/+/7181963/17..19
Does this make sense?
if (has_attempt && slow_timer_.IsRunning()) {mmenke@: Hmm, should we also return std::nullopt when `slow_timer_state_` is kPausedForTlsHandshakeReady? Was that your suggestion?
The current CL behaves somewhat strangely. We allow the second attempt while waiting for TLS handshake readiness after the first TCP handshake completes, but we disallow the second attempt after the TLS handshake is ready and the first one is still in-flight.
Or, maybe we should revert to PS#17 where we only consider TCP handshake for slow timer?
Regarding when to stop slow timer, I think we have three approaches:
1. Stop the timer after the first TCP handshake. This is PS #17: https://chromium-review.googlesource.com/c/chromium/src/+/7181963/17/
2. Pause the timer after TCP handshake, resume the timer after TLS handshake is ready, and stop the timer after the first TLS handshake. Allow the second attempt while waiting for HTTPS RR resolution. This is PS #19: https://chromium-review.googlesource.com/c/chromium/src/+/7181963/19/
3. Similar to 2, but disallow the second attempt while waiting for HTTPS RR resolution. This is PS #20: https://chromium-review.googlesource.com/c/chromium/src/+/7181963/20/
I lean toward 1, since it is simpler and more aligned with HEv1. Thoughts?
I think we should go with the simplest approach, barring evidence that we should go with another approach. Going from allowing two requests back to allowing one request, in the case that we don't have enough information to make two requests yet and we're ready for the TCP handshake, seems very complicated to me, for unclear gains.
So I'm in agreement that 1 makes the most sense of those options.
Another option, which I would be fine with, but I don't think we have any information on whether it's better than 1 or not, would be to just not stop the slow timer while waiting on the HTTPS record (at least I think that's what the handshake ready stuff does). If it's taking too long to retrieve, create another connection, since things are going slowly.
We could probably field trial that option alongside 1, if we thought it worth the effort (for now, I think probably not worth it).
mmenke@: Hmm, should we also return std::nullopt when `slow_timer_state_` is kPausedForTlsHandshakeReady? Was that your suggestion?
The current CL behaves somewhat strangely. We allow the second attempt while waiting for TLS handshake readiness after the first TCP handshake completes, but we disallow the second attempt after the TLS handshake is ready and the first one is still in-flight.
Or, maybe we should revert to PS#17 where we only consider TCP handshake for slow timer?
Done
While writing tests, I realized that 1 would be inefficient in typical cases. One is:
1. AAAA answered, starting the first connection.
2. TCP handshake completes, stop the timer.
3. A answered, starting the second connection since the timer isn't running.
4. The first attempt's TLS handshake completes reasonably fast, discarding the second connection.
Another is:
1. AAAA/A are already known.
2. Starting the first connection.
3. TCP handshake completes reasonably fast, stop the timer.
4. HTTPS RR answered, starting the second connection since the timer isn't running.
5. The first attempt's TLS handshake completes reasonably fast, discarding the second connection.
I think these could happen very often. So, I think 3 or your suggestion (not stopping the timer) are the only viable options. I think we should go with 3, even if it looks somewhat complex.
I tentatively updated this CL not to stop the timer, to limit its size. I'm working on the second CL (https://crrev.com/c/7237673) to implement 3. Could you take another look?
Instead of using the timer not running as a signal to start the second request, we could use whether the timer has triggered to start the A request in your example case.
Updated to stop the timer after TCP handshake and check if the timer is triggered when getting endpoint. What do you think?
(I think I still would like to pause/resume the timer to wait for TLS ready, but we could discuss it later.)
Two significant comments about the timer stuff, the rest are nits.
GetServiceEndpointForTlsHandshake(const IPEndPoint& ip_endpoint);const? Looks like the corresponding methods on HttpStreamAttemptManager are not const, but should be?
std::optional<IPEndPoint> GetIPEndPointToAttempt();const?
// The slow timer is triggered but not fired. This is another end state.
kNotFired,I don't understand what this means. GetIPEndPointToAttempt() doesn't start another request in this state, and MaybeStopSlowTimer() transitions to this state, but I have no understanding of why.
It looks like if the first TcpAttempt fails quickly, we enter this state, and then never start the slow timer again for the life of the attempt, no matter how slow future attempts are, which seems wrong.
Or am I missing something?
enum class SlowTimerState {Rather than focusing on the state of the timer, I think it would be better to express this as whether the request is slow (largely because kFired isn't quite so obviously "this is slow" as I'd prefer, and that's what we most care about).
e.g., something like:
```
SlowState {
kTimerNotStarted,
kTimerRunning,
kAttemptIsSlow
}
```
(Could go with AttemptState or AttemptIsSlowState, etc)
// Callback methods: Only one of these methods will be called.Let's document that the Attempt must immediately be deleted, or it will enter an undefined state.
virtual HostResolver::ServiceEndpointRequest*
const HostResolver::ServiceEndpointRequest&? All calls depend on this returning a non-null value.
public:
// An interface to abstract dependencies so that we can have unittests without
// depending on AttemptManager. In production code AttemptManager will
// implement this interface.Let's document that, unless otherwise specified, the Attempt must not be deleted by any callback.
if (owner_->using_tls_) {
attempt_ = std::make_unique<TlsStreamAttempt>(
owner_->stream_attempt_params_, ip_endpoint_, owner_->track_,
HostPortPair::FromSchemeHostPort(
delegate()->GetHttpStreamKey().destination()),
delegate()->GetBaseSSLConfig(),
/*delegate=*/this);
} else {
attempt_ = std::make_unique<TcpStreamAttempt>(
owner_->stream_attempt_params_, ip_endpoint_, owner_->track_);
}Suggest moving this stuff into the constructor call, and making stream_attempt() return a reference, since using pointers when things are definitively non-null is generally discouraged.
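Something along these lines, assuming a small hypothetical helper (`CreateStreamAttempt()`) that keeps the TLS/TCP branch out of the member initializer list:
```
// Sketch only: `attempt_` is built in the initializer list and is never null,
// so stream_attempt() can return a reference.
HttpStreamPool::Attempt::TcpAttempt::TcpAttempt(Attempt* owner,
                                                IPEndPoint ip_endpoint)
    : owner_(owner),
      ip_endpoint_(std::move(ip_endpoint)),
      attempt_(CreateStreamAttempt(owner_, ip_endpoint_, this)) {}

StreamAttempt& HttpStreamPool::Attempt::TcpAttempt::stream_attempt() {
  return *attempt_;
}
```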
const bool is_ipv4 = ip_endpoint.address().IsIPv4();
const bool should_start_slow_timer =
!ipv4_attempt_ && !ipv6_attempt_ &&
slow_timer_state_ == SlowTimerState::kNotStarted;Seems weird to have this up here, based on the state before we start the attempt, rather than the state after we try to start an attempt - suggest moving the logic down to 157, and then it becomes checking whether one of the attempts is null and the timer isn't started.
Also, we don't start this on immediate success / failures, so it seems weird to say we "should" if we're not doing so in that case.
if (has_attempt && (slow_timer_state_ == SlowTimerState::kRunning ||
slow_timer_state_ == SlowTimerState::kNotFired)) {May be clearer to check if the slow_timer_state_ is not fired? Not thinking about performance or lines of code, just clarity.
MaybeStopSlowTimer();I do think that stopping the timer here is a mistake. If we keep on failing 1 millisecond after the slow timer expires, we won't treat it as slow, despite the request being slow.
Could either not restart the timer and still let it run (though stopping it on TCP connect would still be an issue) or just treat it as having elapsed here, and try to start two attempts (if the first attempt doesn't complete synchronously, either with a success or failure, of course).
NextProto negotiated_protocol = stream_socket->GetNegotiatedProtocol();This seems unnecessary - we're passing the StreamSocket to `delegate_`, which should be just as capable as this class of calling this method, no? At least I don't see any casting to a less specific type here that would make that not feasible.
completed_attempt.reset();
delegate_->OnStreamSocketReady(this, std::move(stream_socket),
negotiated_protocol);
// `this` is deleted.We may want to CHECK that a weak pointer to this is null. If it's not destroyed, we enter an indeterminate state. We could also do this in MaybeAttempt's final failure path, though I think it's more important here.
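The check could look roughly like this, assuming the class has (or gains) a base::WeakPtrFactory:
```
// Sketch only: the delegate is required to destroy `this` synchronously, so a
// surviving weak pointer indicates a contract violation.
base::WeakPtr<Attempt> weak_this = weak_ptr_factory_.GetWeakPtr();
delegate_->OnStreamSocketReady(this, std::move(stream_socket),
                               negotiated_protocol);
CHECK(!weak_this);
```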
// - Certificate errors
CHECK_NE(rv, ERR_SSL_CLIENT_AUTH_CERT_NEEDED);optional: Could also check that rv is an error: != ERR_IO_PENDING, != OK, or < 0 instead of the latter.
Commit-Queue +1
Thank you for reviewing!
The slow timer thing is getting complex. For this CL, I think we probably want to do one of the following:
1. Don't stop the slow timer when the TCP handshake completes. Instead, stop it when the TLS handshake completes. This is mostly PS#22. Consider the timer completed on the first attempt completion (including fast failure). Follow up on pausing/resuming the timer while waiting for HTTPS RR in subsequent CLs.
2. Similar to 1, but stop the timer when the TCP handshake completes (the latest PS#25).
3. Have a separate slow timer for each TcpAttempt to (re)try on fast failure. Restarting the global slow timer seems tricky.
Thoughts?
GetServiceEndpointForTlsHandshake(const IPEndPoint& ip_endpoint);const? Looks like the corresponding methods on HttpStreamAttemptManager are not const, but should be?
Done
std::optional<IPEndPoint> GetIPEndPointToAttempt();const?
Done
// The slow timer is triggered but not fired. This is another end state.
kNotFired,I don't understand what this means. GetIPEndPointToAttempt() doesn't start another request in this state, and MaybeStopSlowTimer() transitions to this state, but I have no understanding of why.
It looks like if the first TcpAttempt fails quickly, we enter this state, and then never start the slow timer again for the life of the attempt, no matter how slow future attempts are, which seems wrong.
Or am I missing something?
We also enter this state when the first TCP handshake succeeds quickly. I added this state not to trigger a second attempt while waiting for the TLS handshake. Without this state, we could start a second attempt even if the first TCP handshake succeeds quickly, which seems wrong to me?
Regarding the failure case, yeah, we never restart the slow timer after the first failure regardless of how slow the failure was. This behavior may not be ideal, but handling it correctly would be complex, I guess?
enum class SlowTimerState {Rather than focusing on the state of the timer, I think it would be better to express this as whether the request is slow (largely because kFired isn't quite so obviously "this is slow" as I'd prefer, and that's what we most care about).
e.g., something like:
```
SlowState {
kTimerNotStarted,
kTimerRunning,
kAttemptIsSlow
}
```
(Could go with AttemptState or AttemptIsSlowState, etc)
Changed the names, but I feel the last two are not consistent. Actually, I had named them that way once, but it didn't feel great, specifically when distinguishing between the end states.
// Callback methods: Only one of these methods will be called.Let's document that the Attempt must immediately be deleted, or it will enter an undefined state.
Done
virtual bool IsSvcbOptional() = 0;const?
Done
const HostResolver::ServiceEndpointRequest&? All calls depend on this returning a non-null value.
Good point. There is one case (GetIPEndPointToAttempt()) where we check if it's null. I added a test for this case (HttpStreamPoolAttemptTest.NoServiceEndpointRequest) since AttemptManager may destroy ServiceEndpointRequest. Probably we should cancel Attempts when AttemptManager destroys ServiceEndpointRequest?
Changed to a reference. I couldn't use const since I couldn't make all call-site methods const.
public:
// An interface to abstract dependencies so that we can have unittests without
// depending on AttemptManager. In production code AttemptManager will
// implement this interface.Let's document that, unless otherwise specified, the Attempt must not be deleted by any callback.
Done
if (owner_->using_tls_) {
attempt_ = std::make_unique<TlsStreamAttempt>(
owner_->stream_attempt_params_, ip_endpoint_, owner_->track_,
HostPortPair::FromSchemeHostPort(
delegate()->GetHttpStreamKey().destination()),
delegate()->GetBaseSSLConfig(),
/*delegate=*/this);
} else {
attempt_ = std::make_unique<TcpStreamAttempt>(
owner_->stream_attempt_params_, ip_endpoint_, owner_->track_);
}Suggest moving this stuff into the constructor call, and making stream_attempt() return a reference, since using pointers when things are definitively non-null is generally discouraged.
Done
const bool is_ipv4 = ip_endpoint.address().IsIPv4();
const bool should_start_slow_timer =
!ipv4_attempt_ && !ipv6_attempt_ &&
slow_timer_state_ == SlowTimerState::kNotStarted;Seems weird to have this up here, based on the state before we start the attempt, rather than the state after we try to start an attempt - suggest moving the logic down to 157, and then it becomes checking whether one of the attempts is null and the timer isn't started.
Also, we don't start this on immediate success / failures, so it seems weird to say we "should" if we're not doing so in that case.
Done
if (has_attempt && (slow_timer_state_ == SlowTimerState::kRunning ||
slow_timer_state_ == SlowTimerState::kNotFired)) {May be clearer to check if the slow_timer_state_ is not fired? Not thinking about performance or lines of code, just clarity.
I think we want to distinguish:
?
MaybeStopSlowTimer();I do think that stopping the timer here is a mistake. If we keep on failing 1 millisecond after the slow timer expires, we won't treat it as slow, despite the request being slow.
Could either not restart the timer and still let it run (though stopping it on TCP connect would still be an issue) or just treat it as having elapsed here, and try to start two attempts (if the first attempt doesn't complete synchronously, either with a success or failure, of course).
Do you mean the failure comes 1ms after the timer is fired? In that case, MaybeStopSlowTimer() doesn't update the state, IIUC?
I guess restarting the slow timer could be tricky. If we want to have some restarting logic, I think we should consider moving the timer to each TcpAttempt, instead of sticking to the single slow timer.
NextProto negotiated_protocol = stream_socket->GetNegotiatedProtocol();This seems unnecessary - we're passing the StreamSocket to `delegate_`, which should be just as capable as this class of calling this method, no? At least I don't see any casting to a less specific type here that would make that not feasible.
Good catch. I forgot to remove this when I simplified the Delegate interface. Removed.
completed_attempt.reset();
delegate_->OnStreamSocketReady(this, std::move(stream_socket),
negotiated_protocol);
// `this` is deleted.We may want to CHECK that a weak pointer to this is null. If it's not destroyed, we enter an indeterminate state. We could also do this in MaybeAttempt's final failure path, though I think it's more important here.
Done
// - Certificate errors
CHECK_NE(rv, ERR_SSL_CLIENT_AUTH_CERT_NEEDED);optional: Could also check that rv is an error: != ERR_IO_PENDING, != OK, or < 0 instead of the latter.
// The slow timer is triggered but not fired. This is another end state.
kNotFired,I don't understand what this means. GetIPEndPointToAttempt() doesn't start another request in this state, and MaybeStopSlowTimer() transitions to this state, but I have no understanding of why.
It looks like if the first TcpAttempt fails quickly, we enter this state, and then never start the slow timer again for the life of the attempt, no matter how slow future attempts are, which seems wrong.
Or am I missing something?
We also enter this state when the first TCP handshake succeeds quickly. I added this state not to trigger a second attempt while waiting for the TLS handshake. Without this state, we could start a second attempt even if the first TCP handshake succeeds quickly, which seems wrong to me?
Regarding the failure case, yeah, we never restart the slow timer after the first failure regardless of how slow the failure was. This behavior may not be ideal, but handling it correctly would be complex, I guess?
I'm not seeing why we can't return to kTimerNotStarted in that case. We can then restart it when starting the TLS handshake (or in the next StartAttempt() call, where we're starting a new attempt after the first attempt has failed, and want to check if that new attempt is slow - or we could treat a failure as slow).
Another, simpler option: Just start the timer and let it run. Let it run through waiting for the HTTPS record, keep running it through TLS negotiation, keep running it through errors. Then when/if it triggers, start a second request. That's by far the simplest approach.
Even if we ultimately want to do something else, that also gets doing something else out of the critical path of getting this code landed and fully wired up.
MaybeStopSlowTimer();I do think that stopping the timer here is a mistake. If we keep on failing 1 millisecond after the slow timer expires, we won't treat it as slow, despite the request being slow.
Could either not restart the timer and still let it run (though stopping it on TCP connect would still be an issue) or just treat it as having elapsed here, and try to start two attempts (if the first attempt doesn't complete synchronously, either with a success or failure, of course).
Do you mean the failure comes 1ms later the timer is fired? In that case MaybeStopSlowTimer() doesn't update the state, IIUC?
I guess restarting the slow timer could be tricky. If we want to have some restarting logic, I think we should consider moving the timer to each TcpAttempt, instead of sticking to the single slow timer.
enum class SlowTimerState {Rather than focusing on the state of the timer, I think it would be better to express this as whether the request is slow (largely because kFired isn't quite so obviously "this is slow" as I'd prefer, and that's what we most care about).
e.g., something like:
```
SlowState {
kTimerNotStarted,
kTimerRunning,
kAttemptIsSlow
}
```
(Could go with AttemptState or AttemptIsSlowState, etc)
Changed names but I feel the last two ones are not consistent. Actually I had named them that way once, but it didn't feel great, specifically to distinguish between the end states.
Another option would be to have an is_slow_ bool, separate from the timer state. The issue here is that I find these names and how they affect things very confusing.
// The slow timer is triggered but not fired. This is another end state.
kNotFired,I don't understand what this means. GetIPEndPointToAttempt() doesn't start another request in this state, and MaybeStopSlowTimer() transitions to this state, but I have no understanding of why.
It looks like if the first TcpAttempt fails quickly, we enter this state, and then never start the slow timer again for the life of the attempt, no matter how slow future attempts are, which seems wrong.
Or am I missing something?
We also enter this state when the first TCP handshake succeeds quickly. I added this state not to trigger a second attempt while waiting for the TLS handshake. Without this state, we could start a second attempt even if the first TCP handshake succeeds quickly, which seems wrong to me?
Regarding the failure case, yeah, we never restart the slow timer after the first failure regardless of how slow the failure was. This behavior may not be ideal, but handling it correctly would be complex, I guess?
I'm not seeing why we can't return to kTimerNotStarted in that case. We can then restart it when starting the TLS handshake (or in the next StartAttempt() call, where we're starting a new attempt after the first attempt has failed, and want to check if that new attempt is slow - or we could treat a failure as slow).
Another, simpler option: Just start the timer and let it run. Let it run through waiting for the HTTPS record, keep running it through TLS negotiation, keep running it through errors. Then when/if it triggers, start a second request. That's by far the simplest approach.
Even if we ultimately want to do something else, that also gets doing something else out of the critical path of getting this code landed and fully wired up.
Another, simpler option ...
I think that's 1. of https://chromium-review.googlesource.com/c/chromium/src/+/7181963/comments/e1315319_bce1befb ?
Even if we ultimately want to do something else, ...
Agreed. I will update the CL.
mmenke@: Thank you for the suggestion. Could you take another look?
// The slow timer is triggered but not fired. This is another end state.
kNotFired,I don't understand what this means. GetIPEndPointToAttempt() doesn't start another request in this state, and MaybeStopSlowTimer() transitions to this state, but I have no understanding of why.
It looks like if the first TcpAttempt fails quickly, we enter this state, and then never start the slow timer again for the life of the attempt, no matter how slow future attempts are, which seems wrong.
Or am I missing something?
We also enter this state when the first TCP handshake succeeds quickly. I added this state not to trigger a second attempt while waiting for the TLS handshake. Without this state, we could start a second attempt even if the first TCP handshake succeeds quickly, which seems wrong to me?
Regarding the failure case, yeah, we never restart the slow timer after the first failure regardless of how slow the failure was. This behavior may not be ideal, but handling it correctly would be complex, I guess?
I'm not seeing why we can't return to kTimerNotStarted in that case. We can then restart it when starting the TLS handshake (or in the next StartAttempt() call, where we're starting a new attempt after the first attempt has failed, and want to check if that new attempt is slow - or we could treat a failure as slow).
Another, simpler option: Just start the timer and let it run. Let it run through waiting for the HTTPS record, keep running it through TLS negotiation, keep running it through errors. Then when/if it triggers, start a second request. That's by far the simplest approach.
Even if we ultimately want to do something else, that also gets doing something else out of the critical path of getting this code landed and fully wired up.
Another, simpler option ...
I think that's 1. of https://chromium-review.googlesource.com/c/chromium/src/+/7181963/comments/e1315319_bce1befb ?
Even if we ultimately want to do something else, ...
Agreed. I will update the CL.
Updated not to stop the timer after TCP handshake.
Also added a TODO comment regarding fast failure case.
I just noticed that you're going OOO, sorry if this bothers you.
// The slow timer is triggered but not fired. This is another end state.
kNotFired,I don't understand what this means. GetIPEndPointToAttempt() doesn't start another request in this state, and MaybeStopSlowTimer() transitions to this state, but I have no understanding of why.
It looks like if the first TcpAttempt fails quickly, we enter this state, and then never start the slow timer again for the life of the attempt, no matter how slow future attempts are, which seems wrong.
Or am I missing something?
We also enter this state when the first TCP handshake succeeds quickly. I added this state not to trigger a second attempt while waiting for the TLS handshake. Without this state, we could start a second attempt even if the first TCP handshake succeeds quickly, which seems wrong to me?
Regarding the failure case, yeah, we never restart the slow timer after the first failure regardless of how slow the failure was. This behavior may not be ideal, but handling it correctly would be complex, I guess?
I'm not seeing why we can't return to kTimerNotStarted in that case. We can then restart it when starting the TLS handshake (or in the next StartAttempt() call, where we're starting a new attempt after the first attempt has failed, and want to check if that new attempt is slow - or we could treat a failure as slow).
Another, simpler option: Just start the timer and let it run. Let it run through waiting for the HTTPS record, keep running it through TLS negotiation, keep running it through errors. Then when/if it triggers, start a second request. That's by far the simplest approach.
Even if we ultimately want to do something else, that also gets doing something else out of the critical path of getting this code landed and fully wired up.
Another, simpler option ...
I think that's 1. of https://chromium-review.googlesource.com/c/chromium/src/+/7181963/comments/e1315319_bce1befb ?
Even if we ultimately want to do something else, ...
Agreed. I will update the CL.
Updated not to stop the timer after TCP handshake.
Also added a TODO comment regarding fast failure case.
I just noticed that I missed your suggestion to keep running the timer for failures. Updated the code to keep the timer running on failures.
In the next CL I want to pause/resume the slow timer while waiting for HTTPS RR. What do you think?
https://crrev.com/c/7237673
I just noticed that you're going OOO, sorry if this bothers you.
Not OOO yet. Sorry I've stopped doing daily passes on this - keep getting sidetracked (admittedly, also de-prioritizing it a bit since you're OOO, though the fact I'll be OOO soon isn't great).
void HttpStreamPool::Attempt::OnSlowTimerFired(TcpAttempt* attempt) {I don't think we need this argument. `prefer_ipv6_` no longer really matters after the first slow attempt, since we'll now always have two attempts going at once (and even if we needed it, since there's only one live attempt when we're running the timer, we can just flip it).
I guess if we wanted to keep the long tail of slow requests alive, instead of just one IPv4 or one IPv6 attempt, it might be useful again, but that's not something we're doing here.
slow_timer_completed_ = true;I think "completed" here is confusing. I'd go with "expired" or even better, just `is_slow_`.
}
CHECK(completed_attempt);CHECK_EQ(attempt, completed_attempt.get());
bool HttpStreamPool::Attempt::FailedAllAttempts() const {
return delegate_->IsServiceEndpointRequestFinished() && !ipv4_attempt_ &&
!ipv6_attempt_;
}I don't think this is right. GetIPEndPointToAttempt() should also be returning null. In this method, we're implicitly depending on the caller having already verified that.
So I think we should actually inline this logic in MaybeAttempt(), since we probably don't want to call GetIPEndPointToAttempt() a second time.
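A rough sketch of inlining that check in MaybeAttempt(), so GetIPEndPointToAttempt() is only called once (the failure-notification method name is hypothetical):
```
// Sketch only.
std::optional<IPEndPoint> ip_endpoint = GetIPEndPointToAttempt();
if (!ip_endpoint.has_value()) {
  if (delegate_->IsServiceEndpointRequestFinished() && !ipv4_attempt_ &&
      !ipv6_attempt_) {
    // Nothing left to try and nothing in flight: report the final failure.
    // `this` may be deleted by the delegate.
    delegate_->OnAttemptFailed(
        this, most_recent_attempt_failure_.value_or(ERR_NAME_NOT_RESOLVED));
  }
  return;
}
StartAttempt(std::move(*ip_endpoint));  // `this` may be deleted here, too.
```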
class TestAttempter finalMaybe call this TestAttemptDelegate?
auto* resolver_ = static_cast<FakeServiceEndpointResolver*>(Remove underscore
auto* resolver_ = static_cast<FakeServiceEndpointResolver*>(Do we need to muck with a FakeServiceEndpointResolver at all here? Seems like all we need is the FakeServiceEndpointRequest, that we can modify. No notion of completion needed.
FakeServiceEndpointRequest* service_endpoint_request() {Given that GetServiceEndpointRequest() is also a method, suggest adding a fake_ to the name here.
// HttpStreamPool::Attempt::Delegate implementations:implementations -> implementation
// HttpStreamPool::Attempt::Delegate implementations:
const HttpStreamKey& GetHttpStreamKey() const override { return *key_; }Blank line here, to indicate that the comment applies to more than just the next line.
SetRemoteIPEndPointFromStreamSocket(*stream);
negotiated_protocol_ = stream->GetNegotiatedProtocol();Seems like this should go in SetRemoteIPEndPointFromStreamSocket
void OnServiceEndpointsUpdated() override {Would we be better off not driving a fake request? All we need is a FakeServiceEndpointRequest that we can directly manipulate. No need for it to invoke any callbacks itself, pretending to really resolve stuff.
TestAttempter attempter = CreateAttempter("http://a.test");
attempter.service_endpoint_request()->add_endpoint(
ServiceEndpointBuilder().add_v4("192.0.2.1", 80).endpoint());Would it make sense to
TEST_F(HttpStreamPoolAttemptTest, Ipv4AnsweredAfterIpv6TcpHandshake) {How about a version of this test where IPv4 answers first? (Which I think would make us wait for IPv4 to fail before trying IPv6, and if IPv4 completed fine, we'd never try IPv6 - suggesting two tests.)
More generally, think we want more IPv4-first test coverage.
TEST_F(HttpStreamPoolAttemptTest, Ipv6FailIpv4Ok) {Suggest a multiple-failure test that checks we keep jumping between IPv6 and IPv4 (using set_expected_addresses, and maybe checking socket() for each SequencedSocketData or adding a helper to MockClientSocketFactory to see if all sockets have been used).
TEST_F(HttpStreamPoolAttemptTest, Ipv6SlowOk) {Suggested test: Cumulatively slow:
Have IPv4 and IPv6 IPs.
First IPv6 starts. Half of slow time later, it fails.
First (only) IPv4 starts. Half of slow time later, second IPv6 starts, which succeeds.
TEST_F(HttpStreamPoolAttemptTest, Ipv6SlowOk) {Suggest two variants of this test:
Start with only IPv6 IPs.
Then IPv6 is slow.
Then get IPv4 IP.
Should start connecting to IPv4 (which we can check).
Same as above, but with the IPv6 attempt failing, and starting on the next IPv6 IP before we get IPv4 IPs (which we should start connecting to immediately).
TEST_F(HttpStreamPoolAttemptTest, Ipv6SlowOk) {Suggested test variant: IPv4 completes first.
FastForwardBy(HttpStreamPool::GetConnectionAttemptDelay());
ASSERT_FALSE(attempter.result().has_value());Check that we're attempting IPv4 here?
We may not have a great way to do it, but checking SequencedSocketData::socket() is non-null, and maybe setting expected_addresses as well seems to get us close enough.
}Another test suggestion: One that requires most_recent_attempt_failure_. We get partial results (just an IPv4 or IPv6 address). Then it fails. Then later we learn there are no more addresses. We should get the error code the attempt failed with before the DNS resolution completed.