Adding HTTP retry logic


Jonny Rein Eriksen

Jun 24, 2014, 4:40:45 AM
to net...@chromium.org
If you are connected to the internet through a connection with a high error rate, you can often experience TCP connections being reset. At times completing a page load can be a challenge, and you have to reload a page multiple times to get the page loaded completely.

Opera Presto always did well in the reliability category of Tom's Hardware Web Browser Grand Prix: http://www.tomshardware.com/reviews/windows-7-chrome-20-firefox-13-opera-12,3228-13.html

I always assumed this was due to the way we retried requests that were reset, and based on this I wanted to implement the same behavior in Chromium.

At the moment I have this working. Chromium will either issue a Range request if the server supports it, or issue a normal GET if not and skip data up to where the connection was aborted, before feeding data to the consumer as if nothing had happened. Currently I have set it to retry 5 times, just like we used to do in Presto.
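The two resume paths described above could be sketched roughly like this (illustrative Python with hypothetical names, not the actual Chromium implementation):

```python
# Sketch of the retry strategy: if the server advertised Range support,
# ask only for the missing tail; otherwise re-issue a plain GET and
# discard the bytes already delivered to the consumer.

def build_retry_request(url, bytes_delivered, server_accepts_ranges):
    """Return (extra_headers, bytes_to_skip) for the retry attempt."""
    if server_accepts_ranges:
        # Resume exactly where the connection was reset.
        return {"Range": "bytes=%d-" % bytes_delivered}, 0
    # No Range support: fetch from the start and skip what we already have.
    return {}, bytes_delivered

MAX_RETRIES = 5  # matches the Presto behavior mentioned above
```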

My plan is to do this only for smaller(?) resources, only if ETag/Last-Modified matches, and maybe not for the main document? I guess it should be possible to do it for images/JS/CSS even without ETag/Last-Modified. I want to be careful here, but I believe we will get most of the benefit anyway.

What I want to avoid is merging together resources that have been modified between retries, triggering bugs that cannot be reproduced. Hence I am thinking of matching the last/next X bytes before merging two responses together if ETag/Last-Modified is missing.
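The overlap-matching idea might look like this (a hypothetical helper, assuming the already-delivered bytes are still buffered somewhere, which the real implementation would need to arrange):

```python
def merge_responses(received, refetched, overlap=64):
    """Splice a refetched full body onto already-delivered bytes, but only
    if the last `overlap` delivered bytes match the refetched stream at the
    same offset; return None on mismatch so the caller can fail the load
    instead of silently mixing two versions of the resource."""
    if len(refetched) < len(received):
        return None  # refetched copy is shorter; clearly a different version
    start = max(0, len(received) - overlap)
    if refetched[start:len(received)] != received[start:]:
        return None  # resource changed between retries; do not merge
    return received + refetched[len(received):]
```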

Cheers,
Jonny Rein Eriksen
Opera Software

Randy Smith

Jun 24, 2014, 11:56:51 AM
to Jonny Rein Eriksen, net...@chromium.org
This sounds like a nice feature.  We've thought about something similar for downloads (and in fact are planning to do it; it just keeps getting superseded by higher priorities).  We're also doing auto-retry when we present an error page to the user, but retrying at the URLRequest layer whenever a request was broken off partway through is more general.

My concerns for doing this generally are:

* I believe there are times when servers break the spec and GETs are state-changing.  I'm not sure how to measure this, but it would be nice to have a handle on how often it happens, and what the consequences of auto-retrying such requests are.  (As a side note that you're probably aware of: you can't do this with POSTs, because they are state changing.)

* I'm not certain whether this behavior goes against Chrome's simplicity design principle.  From the user's perspective, it's certainly simple behavior (modulo the point above), but the on-the-wire behavior of the browser isn't very simple.  I'm inclined to think it should be implemented above the chrome/content line, with appropriate hooks in content, and I'm not sure if we have the appropriate hooks in place at the moment.  How did you implement your prototype?
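For reference, the POST caveat above follows from HTTP method semantics (RFC 7231 defines GET/HEAD/OPTIONS/TRACE as safe); a conservative eligibility check might look like this sketch (illustrative only, not Chromium code):

```python
# Only "safe" methods (no state change by definition) are candidates for
# automatic retry; POST is neither safe nor idempotent and must be excluded.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})

def may_auto_retry(method):
    """True if the request method is safe to replay automatically."""
    return method.upper() in SAFE_METHODS
```

Even this is optimistic given the spec-breaking servers mentioned above, which is why measuring real-world consequences first makes sense.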

-- Randy



--
You received this message because you are subscribed to the Google Groups "net-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to net-dev+u...@chromium.org.
To post to this group, send email to net...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/net-dev/53A93983.10803%40opera.com.

Matt Menke

Jun 24, 2014, 12:06:22 PM
to Randy Smith, Jonny Rein Eriksen, net...@chromium.org
I did investigate retrying when a GET request fails before receiving a full set of headers (that includes after a successful redirect), which is a point at which we can retry without worrying about partial GETs or receiving a different response than before.  I only retried on "fast" failures, so no request timeouts; I think I only did it if the failure happened in less than 10 seconds.  In my experiment, under those very restrictive conditions, we had about a 3% recovery rate, mostly in the ERR_NAME_NOT_RESOLVED and ERR_CONNECTION_RESET cases, if I recall correctly.
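The restrictive conditions described could be summarized in a small predicate like the following (names and thresholds taken from the description above; the helper itself is hypothetical):

```python
# Retry only when (a) no response headers were received yet, so there is no
# partial body to reconcile, (b) the failure was "fast" rather than a
# timeout, and (c) the error is one of the recoverable network errors.
FAST_FAILURE_SECONDS = 10
RETRIABLE_ERRORS = {"ERR_NAME_NOT_RESOLVED", "ERR_CONNECTION_RESET"}

def should_retry_before_headers(error, elapsed_seconds, headers_received):
    if headers_received:
        return False  # past the safe point; retry could mix two responses
    if elapsed_seconds >= FAST_FAILURE_SECONDS:
        return False  # slow failures (e.g. timeouts) were excluded
    return error in RETRIABLE_ERRORS
```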

I'd support trying to retry under more general conditions, but am definitely concerned about side effects and document mismatches.  If we decide to adopt the feature, I'd suggest initially calculating a simple hash of all response bodies as they're received; then, when we retry, always re-request the entire document, compare hashes, log when there's a mismatch, and return the original error.  We can try to be smarter about the second request if the hash failure rate looks good (and get rid of the hash checks, of course).
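One way to read the hash-check proposal: hash the partial first body and compare it against the same-length prefix of the full retry response. A minimal sketch (the choice of SHA-256 is arbitrary; "a simple hash" could be anything cheap):

```python
import hashlib

def prefix_matches(partial_body, refetched_body):
    """Compare a hash of the partial first response against the same-length
    prefix of the full retry response. A mismatch means the document changed
    between attempts: log it and surface the original error rather than a
    spliced body. (Sketch only, not the proposed Chromium code.)"""
    h1 = hashlib.sha256(partial_body).hexdigest()
    h2 = hashlib.sha256(refetched_body[:len(partial_body)]).hexdigest()
    return h1 == h2
```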


Jonny Rein Eriksen

Jun 24, 2014, 6:04:51 PM
to net...@chromium.org

On 24.06.2014 17:56, Randy Smith wrote:
> This sounds like a nice feature.  We've thought about something similar for downloads (and in fact are planning to do it; it just keeps getting superseded by higher priorities).  We're also doing auto-retry when we present an error page to the user, but retrying at the URLRequest layer whenever a request was broken off partway through is more general.
>
> My concerns for doing this generally are:
>
> * I believe there are times when servers break the spec and GETs are state-changing.  I'm not sure how to measure this, but it would be nice to have a handle on how often it happens, and what the consequences of auto-retrying such requests are.  (As a side note that you're probably aware of: you can't do this with POSTs, because they are state changing.)

That is a possibility, and a change like this will probably always carry a risk. I am trying to minimize that risk, though, and am currently considering only retrying when there is a Last-Modified/ETag, not retrying if the cache is set to always expire, etc. Your input on such limitations is of course much appreciated. I think a conservative first implementation makes sense, and adding the ability to detect how often this happens would be interesting. I noticed that Matt Menke added logging in his retry code in http://code.google.com/p/chromium/issues/detail?id=143425.
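Those conservative limits could be expressed as a gate like this (an illustrative heuristic; the header names are standard, the function and the exact cache-control checks are assumptions, not Chromium's logic):

```python
def conservative_retry_allowed(headers):
    """Only consider resumption when the response carries a validator
    (ETag or Last-Modified) and is not marked as always stale.
    `headers` is assumed to be a dict with lowercase header names."""
    has_validator = "etag" in headers or "last-modified" in headers
    cache_control = headers.get("cache-control", "")
    always_expires = "no-store" in cache_control or "max-age=0" in cache_control
    return has_validator and not always_expires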



> * I'm not certain whether this behavior goes against Chrome's simplicity design principle.  From the user's perspective, it's certainly simple behavior (modulo the point above), but the on-the-wire behavior of the browser isn't very simple.  I'm inclined to think it should be implemented above the chrome/content line, with appropriate hooks in content, and I'm not sure if we have the appropriate hooks in place at the moment.  How did you implement your prototype?

Mostly in http_stream_parser.* and http_network_transaction.*, building on HttpNetworkTransaction::ShouldResendRequest and modifying HttpStreamParser::DoReadBody and HttpStreamParser::DoReadBodyComplete. I will of course post a CL when I am more comfortable with the code quality.

Jonny




Jonny Rein Eriksen

Jun 24, 2014, 6:25:30 PM
to Matt Menke, Randy Smith, net...@chromium.org

On 24.06.2014 18:06, Matt Menke wrote:
> I did investigate retrying when a GET request fails before receiving a full set of headers (that includes after a successful redirect), which is a point at which we can retry without worrying about partial GETs or receiving a different response than before.  I only retried on "fast" failures, so no request timeouts; I think I only did it if the failure happened in less than 10 seconds.  In my experiment, under those very restrictive conditions, we had about a 3% recovery rate, mostly in the ERR_NAME_NOT_RESOLVED and ERR_CONNECTION_RESET cases, if I recall correctly.

Hello Matt. I did look at and test your retry implementation, thanks. It is always good to see others interested in the same functionality.

A rate of 3% makes a lot of sense. At the same time it is not simple to get accurate numbers, since users who really suffer from this will probably not be browsing at all, because their browser does not work under these circumstances. At least that is the impression I have gotten from personal emails from Opera users who were very happy and surprised that Opera Presto worked when other browsers failed to load the same pages. I am not sure how many are affected by this, but it would be interesting if we could find out.



> I'd support trying to retry under more general conditions, but am definitely concerned about side effects and document mismatches.  If we decide to adopt the feature, I'd suggest initially calculating a simple hash of all response bodies as they're received; then, when we retry, always re-request the entire document, compare hashes, log when there's a mismatch, and return the original error.  We can try to be smarter about the second request if the hash failure rate looks good (and get rid of the hash checks, of course).

Thanks, that might make sense. Are you suggesting keeping track of the success rate per server and then switching to Range requests after successfully retrieving full-body responses that match the hash on retry?

Jonny

Matt Menke

Jun 24, 2014, 6:49:36 PM
to Jonny Rein Eriksen, Randy Smith, net...@chromium.org
I was thinking of using it for data-collection purposes, to figure out whether trying to do Range requests makes sense, before landing any code to actually do them.  Depending on the success rate, recording per-server information on the client may make sense, though if success rates look good enough, I'd rather not keep the hash logic long term if we don't need it.
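The per-server bookkeeping floated here might be sketched as follows (hypothetical class and thresholds; the real decision would presumably be driven by aggregated metrics, not client-side state):

```python
from collections import defaultdict

class RetryStats:
    """Track, per host, how often a full-body retry hashed identically to
    the first attempt, and only consider switching that host to Range-based
    resumption once the match rate over enough samples looks good."""

    def __init__(self, min_samples=20, min_match_rate=0.99):
        self.counts = defaultdict(lambda: [0, 0])  # host -> [matches, total]
        self.min_samples = min_samples
        self.min_match_rate = min_match_rate

    def record(self, host, matched):
        entry = self.counts[host]
        if matched:
            entry[0] += 1
        entry[1] += 1

    def range_requests_ok(self, host):
        matches, total = self.counts[host]
        return total >= self.min_samples and matches / total >= self.min_match_rate
```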

Jonny Rein Eriksen

Jul 21, 2014, 11:04:57 AM
to Matt Menke, Randy Smith, net...@chromium.org
Based on this discussion, I ended up with this as a first take on how to handle retries:

https://codereview.chromium.org/403393003/

Regards,
Jonny