Standard library ETag/Last-Modified conditional request logic and best practices?

mark...@gmail.com

Aug 6, 2019, 08:14:40
– golang-nuts
Hello,

I'm using Go's standard library reverse proxy and I'm trying to figure out if the standard library HTTP web server (e.g. http.ListenAndServe) implements the relevant conditional request handling logic for ETag/Last-Modified headers.

I did some Googling and noticed the HTTP file system request handler (https://golang.org/src/net/http/fs.go) does implement that logic, but I couldn't find the same for the HTTP web server.

I also couldn't find any examples of setting ETags/Last-Modified (other than this basic implementation for setting ETags: https://github.com/go-http-utils/etag/blob/master/etag.go).

What's confusing me there is the concept of "strong" and "weak" validation and how certain scenarios might influence whether an ETag is marked as either strong or weak (depending on the implementation -- see https://developer.mozilla.org/en-US/docs/Web/HTTP/Conditional_requests#Validators).

So to recap, my questions are (and I appreciate some of these are outside the scope of just Go -- so apologies if that's not allowed in this forum):

1. Should I set ETag/Last-Modified in a proxy? Last-Modified feels like it's not the responsibility of the proxy but of the origin, whereas an ETag is something I feel is "ok" to set in the proxy as a 'fallback' (as we already set 'serve stale' caching headers on behalf of our origins if they neglect to include them).

2. Do I need to implement `If-None-Match` and `If-Modified-Since` behaviours myself (i.e. is it not provided by the Go standard library's HTTP web server)?

3. I was planning on setting an ETag header on the response from within httputil.ReverseProxy#ModifyResponse but wasn't sure if that would be the correct place to set it.

4. What constitutes a strong/weak validator (e.g. would a simple hash function generating a digest of the URL path + response body suffice)?

Thanks for any help/insights/opinions y'all can share with me.

Kind regards,
Mark

Devon H. O'Dell

Aug 6, 2019, 10:48:49
– mark...@gmail.com, golang-nuts
Hi Mark,

Whether or not your proxy is caching, you may find RFC7234[1] relevant
in addressing some of your questions (as well as many you may later
encounter). I think you may find section 5.2 to be of particular
interest, though any proxy author should be familiar with the full
text.

On Tue, Aug 6, 2019 at 05:14, <mark...@gmail.com> wrote:
>
> Hello,
>
> I'm using Go's standard library reverse proxy and I'm trying to figure out if the standard library HTTP web server (e.g. http.ListenAndServe) implements the relevant conditional request handling logic for ETag/Last-Modified headers.
>
> I did some Googling and noticed the HTTP file system request handler (https://golang.org/src/net/http/fs.go) does implement that logic, but I couldn't find the same for the HTTP web server.
>
> I also couldn't find any examples of setting ETags/Last-Modified (other than this basic implementation for setting ETags: https://github.com/go-http-utils/etag/blob/master/etag.go).
>
> What's confusing me there is the concept of "strong" and "weak" validation and how certain scenarios might influence whether an ETag is marked as either strong or weak (depending on the implementation -- see https://developer.mozilla.org/en-US/docs/Web/HTTP/Conditional_requests#Validators).
>
> So to recap, my questions are (and I appreciate some of these are outside the scope of just Go -- so apologies if that's not allowed in this forum):

I think this is a fine question for this list, which isn't necessarily
constrained to questions about Go, but also for how to achieve things
while using Go. Lines get blurred since many technologies touch each
other. I don't think any apologies are necessary :).

> 1. Should I set ETag/Last-Modified in a proxy? Last-Modified feels like it's not the responsibility of the proxy but of the origin, whereas an ETag is something I feel is "ok" to set in the proxy as a 'fallback' (as we already set 'serve stale' caching headers on behalf of our origins if they neglect to include them).

ETag and Last-Modified should be sent by the origin to any proxy to
let the proxy know when the content is stale (assuming the proxy is
caching). The only case in which a proxy might set these things is if
there are configurations provided by the content owner that allow the
proxy to determine what the lifetime of the response object is outside
of response headers. This is most useful in cases where the content is
synthetically generated by the proxy as a result of the content
owner's configuration. If you don't have such a system in place, your
proxy should never be generating these response headers, and you
should be working with your customers / users to help them understand
when to set cache control headers.

> 2. Do I need to implement `If-None-Match` and `If-Modified-Since` behaviours myself (i.e. is it not provided by the Go standard library's HTTP web server)?

Unless you're serving from the filesystem handler (which does
implement IMS/INM), you'll need to implement these yourself.
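
For a handler that is not backed by the filesystem handler, a minimal sketch of the If-None-Match side might look like the following. The names `etagMatches`, `serveWithETag` and `statusFor` are made up for illustration, and the comma split is a simplification of the header grammar. If the content is available as an io.ReadSeeker, http.ServeContent also handles If-None-Match when the ETag response header has been set before calling it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// etagMatches reports whether an If-None-Match header value matches etag.
// If-None-Match uses weak comparison, so a leading "W/" is ignored.
// Simplification: splitting on commas assumes tags contain no commas.
func etagMatches(ifNoneMatch, etag string) bool {
	if ifNoneMatch == "" {
		return false
	}
	want := strings.TrimPrefix(etag, "W/")
	for _, candidate := range strings.Split(ifNoneMatch, ",") {
		candidate = strings.TrimSpace(candidate)
		if candidate == "*" || strings.TrimPrefix(candidate, "W/") == want {
			return true
		}
	}
	return false
}

// serveWithETag (hypothetical helper) always sends the ETag header and
// answers 304 without a body when the client already holds that version.
func serveWithETag(w http.ResponseWriter, r *http.Request, etag string, body []byte) {
	w.Header().Set("ETag", etag)
	isRead := r.Method == http.MethodGet || r.Method == http.MethodHead
	if isRead && etagMatches(r.Header.Get("If-None-Match"), etag) {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	w.Write(body)
}

// statusFor is a demo helper: it sends one GET with the given
// If-None-Match value and returns the resulting status code.
func statusFor(ifNoneMatch string) int {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	serveWithETag(rec, req, `"abc"`, []byte("hello"))
	return rec.Code
}

func main() {
	fmt.Println(statusFor(""), statusFor(`"abc"`), statusFor(`"other"`))
}
```

If-Modified-Since would need a similar check against a modification time, which is omitted here.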

Note that you _could_ simply proxy this to the origin and let it
handle the validation. This is often overkill for what people actually
need, but it is guaranteed to work.

One trick that many CDN providers leverage is to offer their customers
the option to serve the stale object while revalidating it. If that
option is set, an asynchronous revalidation request is spawned -- new
requests are blocked on the completion of that request -- and the
potentially stale content is served to the original requester without
blocking that request on revalidation.

> 3. I was planning on setting an ETag header on the response from within httputil.ReverseProxy#ModifyResponse but wasn't sure if that would be the correct place to set it.

It's unclear to me why you should be setting an etag header if you're a proxy.

> 4. What constitutes a strong/weak validator (e.g. would a simple hash function generating a digest of the URL path + response body suffice)?

A hash function over the body of the response would constitute strong
validation. I'm not sure why you'd need to mix in the path; there's
nothing wrong with serving the exact same content between two
endpoints, and the ETag is tied to a response object.
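
As a rough sketch of that idea, a strong tag can be derived from nothing but the body bytes. SHA-256 and the function name `strongETag` are choices made for this example only; any digest that changes whenever the bytes change would serve the same purpose. Hashing the bytes as they are actually sent also means a gzip-encoded body and an unencoded one end up with different tags.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// strongETag returns a quoted entity tag computed only from the body.
// Identical bodies get identical tags regardless of the URL path.
func strongETag(body []byte) string {
	sum := sha256.Sum256(body)
	return `"` + hex.EncodeToString(sum[:]) + `"`
}

func main() {
	a := strongETag([]byte("hello"))
	b := strongETag([]byte("hello"))
	c := strongETag([]byte("hello!"))
	fmt.Println(a == b, a == c)
}
```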

Weak validation is signified by an additional "W/" in the etag
identifier. In practice, this means that you mustn't use weak
identifiers for serving byte-range requests. Weak identifiers may be
more useful for dynamically generated content where you might for
example have a date added in, or an ad server link that is rotated
each time the page is served, or a counter, or something like this. An
example of weak validation would be something that is version and
encoding based -- each time the content changes materially, you'd
increment the version, and some identifier for the content-encoding
would also be mixed in.
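
To make the version-plus-encoding idea concrete, here is a small sketch. The tag layout (`v<version>-<encoding>`) and the function names are assumptions for illustration. The two comparison helpers follow the weak and strong comparison rules from RFC 7232 section 2.3.2.

```go
package main

import (
	"fmt"
	"strings"
)

// weakETag builds a weak entity tag from a content version and the
// content-encoding. The "v<version>-<encoding>" layout is hypothetical.
func weakETag(version int, encoding string) string {
	if encoding == "" {
		encoding = "identity"
	}
	return fmt.Sprintf(`W/"v%d-%s"`, version, encoding)
}

// weakMatch implements weak comparison: the opaque parts must be equal,
// and either tag may carry the W/ prefix.
func weakMatch(a, b string) bool {
	return strings.TrimPrefix(a, "W/") == strings.TrimPrefix(b, "W/")
}

// strongMatch implements strong comparison: neither tag may be weak and
// both must be identical. This is the comparison range requests need.
func strongMatch(a, b string) bool {
	return !strings.HasPrefix(a, "W/") && !strings.HasPrefix(b, "W/") && a == b
}

func main() {
	tag := weakETag(42, "gzip")
	fmt.Println(tag)
	fmt.Println(weakMatch(tag, tag), strongMatch(tag, tag))
}
```

A weak tag matches itself under weak comparison but never under strong comparison, which is why it cannot be used to validate byte-range requests.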

> Thanks for any help/insights/opinions y'all can share with me.
>
> Kind regards,
> Mark
>

Hope that helps!

Kind regards,

--dho

[1]: https://tools.ietf.org/html/rfc7234

mark...@gmail.com

Aug 6, 2019, 12:11:01
– golang-nuts
Thanks Devon!

So just to clarify our request flow is:

Client > CDN > Go Reverse Proxy > Origin

Our Go Reverse Proxy has historically been responsible for adding caching headers (e.g. Cache-Control and Surrogate-Control) when the origins have failed to do so (as a way to ensure things are cached appropriately).
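
To illustrate that fallback behaviour, a sketch of the header logic could look like this. The directive values in `defaultCacheControl` are placeholders rather than our real policy, and the function names are made up for the example.

```go
package main

import (
	"fmt"
	"net/http"
)

// defaultCacheControl is a hypothetical fallback policy; the numbers are
// placeholders chosen only for this example.
const defaultCacheControl = "max-age=60, stale-while-revalidate=30, stale-if-error=86400"

// setCacheDefaults adds Cache-Control only when the origin sent none, so
// an explicit origin policy is never overwritten.
func setCacheDefaults(h http.Header) {
	if h.Get("Cache-Control") == "" {
		h.Set("Cache-Control", defaultCacheControl)
	}
}

// cacheControlAfter is a demo helper: it simulates an origin response
// with the given Cache-Control value and returns what the proxy would send.
func cacheControlAfter(origin string) string {
	h := http.Header{}
	if origin != "" {
		h.Set("Cache-Control", origin)
	}
	setCacheDefaults(h)
	return h.Get("Cache-Control")
}

func main() {
	fmt.Println(cacheControlAfter(""))
	fmt.Println(cacheControlAfter("no-store"))
}
```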

> It's unclear to me why you should be setting an etag header if you're a proxy.

That's why when it came to looking at setting serve stale defaults for our origins (e.g. stale-while-revalidate and stale-if-error) I realized that somewhere along the chain an appropriate ETag/Last-Modified should be set and that's why I started wondering if our proxy should be responsible for setting them.

Even then I felt like setting Last-Modified was way outside the responsibility of our proxy, but that maybe setting of ETag would have sufficed.

> Unless you're serving from the filesystem handler (which does
> implement IMS/INM), you'll need to implement these yourself.

I think your other related answers might explain to me why the go reverse proxy doesn't support conditional requests, in that it's NOT a 'caching proxy' and so being able to handle that revalidation logic wouldn't make sense.
 
> Note that you _could_ simply proxy this to the origin and let it
> handle the validation. This is often overkill for what people actually
> need, but it is guaranteed to work.

OK, so we are indeed just proxying the request pretty much 'as is' to the origin, i.e. the CDN makes the conditional revalidation request when our stale-while-revalidate TTL expires. I appreciate this is the 'basics' of how a proxy works, but I want to talk it through in case I'm mistaken in any way! I'm guessing the Go proxy will transparently pass that request through so the origin can respond with the appropriate ETag/Last-Modified, and the Go proxy will again transparently pass the origin's response back to the CDN, which then updates its cache if it got a `200 OK` from the origin or continues serving the stale object if the origin returned a `304 Not Modified`. In either case I expect the origin should send ETag/Last-Modified headers regardless of the 200/304 status.

> A hash function over the body of the response would constitute strong
> validation. I'm not sure why you'd need to mix in the path; there's
> nothing wrong with serving the exact same content between two
> endpoints, and the ETag is tied to a response object.

Ah OK, so I was thinking along these lines, but was getting confused between content that is cached and content that is rendered at 'runtime'. For example, I was thinking of a response containing a <script> tag that might dynamically change the adverts on the page depending on the client, and wondering whether that meant just hashing the server response body wasn't "strong" validation. I guess that thinking is redundant, because the hash is compared against the actual cached content and not against whatever the client-side scripting modifies.

Devon H. O'Dell

Aug 6, 2019, 12:54:00
– mark...@gmail.com, golang-nuts
On Tue, Aug 6, 2019 at 09:10, <mark...@gmail.com> wrote:
>
> Thanks Devon!

You're welcome!

> So just to clarify our request flow is:
>
> Client > CDN > Go Reverse Proxy > Origin
>
> Our Go Reverse Proxy has historically been responsible for adding caching headers (e.g. Cache-Control and Surrogate-Control) when the origins have failed to do so (as a way to ensure things are cached appropriately).
>
>> It's unclear to me why you should be setting an etag header if you're a proxy.
>
> That's why when it came to looking at setting serve stale defaults for our origins (e.g. stale-while-revalidate and stale-if-error) I realized that somewhere along the chain an appropriate ETag/Last-Modified should be set and that's why I started wondering if our proxy should be responsible for setting them.
>
> Even then I felt like setting Last-Modified was way outside the responsibility of our proxy, but that maybe setting of ETag would have sufficed.

Ah, I see. So you're still the content owner; you're just further
offloading work from between your origin and the CDN. Assuming you're
not still multi-tenant behind your proxy (i.e. your proxy only serves
_your_ assets), then I think it's probably reasonable for you to make
that determination at your proxy. And from that perspective, I agree
that you'd be more interested in ETag/INM than LM/IMS on your proxy.

>> Unless you're serving from the filesystem handler (which does
>> implement IMS/INM), you'll need to implement these yourself.
>
> I think your other related answers might explain to me why the go reverse proxy doesn't support conditional requests, in that it's NOT a 'caching proxy' and so being able to handle that revalidation logic wouldn't make sense.

Right -- it boils down to whether a proxy is transparent or not. A
transparent proxy observes traffic and makes no changes to the
protocol or the discussion over it. The only impact it can really have
is if it stops servicing requests. A transparent proxy assumes that
both sides of the connection are speaking the same protocol, and so it
doesn't really have to know about protocol semantics.

A caching proxy isn't transparent. It looks like it because it ends up
having very good knowledge of the protocol it's proxying, but every
request isn't passed through unmodified, so it's by definition opaque.

>> Note that you _could_ simply proxy this to the origin and let it
>> handle the validation. This is often overkill for what people actually
>> need, but it is guaranteed to work.
>
> OK, so we are indeed just proxying the request pretty much 'as is' to the origin, i.e. the CDN makes the conditional revalidation request when our stale-while-revalidate TTL expires. I appreciate this is the 'basics' of how a proxy works, but I want to talk it through in case I'm mistaken in any way! I'm guessing the Go proxy will transparently pass that request through so the origin can respond with the appropriate ETag/Last-Modified, and the Go proxy will again transparently pass the origin's response back to the CDN, which then updates its cache if it got a `200 OK` from the origin or continues serving the stale object if the origin returned a `304 Not Modified`. In either case I expect the origin should send ETag/Last-Modified headers regardless of the 200/304 status.

It's been a while since I was working at a CDN (Fastly) so I may be a
bit fuzzy here; what you've written sounds like a correct
understanding. Again, as the proxy is transparent, its knowledge of
the protocol is really not meaningful; as long as your CDN and origin
both implement the protocol correctly, your transparent proxy will
also be by definition correct. (Though I'd note that one could argue
that X-Forwarded-For makes most HTTP proxies not strictly transparent;
it's also not super meaningful anyway except for logging to make sure
you understand your topology when things go wrong.)

As you've already got a CDN in the picture, it seems to me (especially
if you're using origin shielding) that it won't be super helpful for
you to implement LM/IMS or ETag/INM in your proxy. Lack of explicit
support for this in Go is therefore hopefully not an issue for you
because a CDN supporting stale-while-revalidate and stale-if-error
will already be shielding your origins from heaps of revalidation
traffic.

However, it sounds like your origin isn't setting ETag or other cache
control headers everywhere it could. Adding strong ETags at your proxy
should be reasonably cheap since the CDN is shielding you from
revalidation storms, and it can also save you on your CDN's bandwidth
bill as it will allow your CDN to respond with 304s rather than 200s
for more objects.
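
A sketch of what that could look like as a ModifyResponse hook follows. It assumes Go 1.16 or later (for io.ReadAll and io.NopCloser), uses SHA-256 as an arbitrary digest, and buffers the whole body in memory, which gives up streaming and may be unsuitable for large responses. `addETag` and `demoETag` are names invented for this example.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"strconv"
)

// addETag has the signature of httputil.ReverseProxy.ModifyResponse. It
// adds a strong ETag to 200 responses whose origin did not set one.
func addETag(resp *http.Response) error {
	if resp.StatusCode != http.StatusOK || resp.Header.Get("ETag") != "" {
		return nil // leave non-200s and origin-provided tags untouched
	}
	if resp.Request != nil && resp.Request.Method != http.MethodGet {
		return nil // e.g. HEAD responses carry no body to hash
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	if err := resp.Body.Close(); err != nil {
		return err
	}
	sum := sha256.Sum256(body)
	resp.Header.Set("ETag", `"`+hex.EncodeToString(sum[:])+`"`)

	// Put the buffered bytes back so the proxy can still send them.
	resp.Body = io.NopCloser(bytes.NewReader(body))
	resp.ContentLength = int64(len(body))
	resp.Header.Set("Content-Length", strconv.Itoa(len(body)))
	return nil
}

// demoETag is a demo helper: it fakes an origin response with the given
// body and optional existing ETag, then returns the tag after addETag.
func demoETag(body, existing string) (string, error) {
	resp := &http.Response{
		StatusCode: http.StatusOK,
		Header:     http.Header{},
		Body:       io.NopCloser(bytes.NewReader([]byte(body))),
	}
	if existing != "" {
		resp.Header.Set("ETag", existing)
	}
	if err := addETag(resp); err != nil {
		return "", err
	}
	return resp.Header.Get("ETag"), nil
}

func main() {
	tag, err := demoETag("hello", "")
	fmt.Println(tag, err)
}
```

Wiring it up would be a matter of assigning `proxy.ModifyResponse = addETag` on an httputil.ReverseProxy. Answering If-None-Match with a 304 at the proxy would still be a separate step.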

>> A hash function over the body of the response would constitute strong
>> validation. I'm not sure why you'd need to mix in the path; there's
>> nothing wrong with serving the exact same content between two
>> endpoints, and the ETag is tied to a response object.
>
> Ah OK, so I was thinking along these lines, but was getting confused between content that is cached and content that is rendered at 'runtime'. For example, I was thinking of a response containing a <script> tag that might dynamically change the adverts on the page depending on the client, and wondering whether that meant just hashing the server response body wasn't "strong" validation. I guess that thinking is redundant, because the hash is compared against the actual cached content and not against whatever the client-side scripting modifies.

Yeah, this is more to do with counters, timestamps, and ad links that
are inserted at page construction (e.g. an iframe that might contain
different URLs for an ad) than it is to do with scripts that modify
the DOM to pick a different ad service or a "proxy URL" that is
capable of serving ads from different vendors. It's really more to do
with whether the semantics of the ETag are purely based on content, or
whether they're based on something more abstract like a version. (And
then why you'd pick one over another has to do with those previously
mentioned points.)

--dho