Sharded CDN requests not making unsharded origin requests


aro...@webscalenetworks.com

Sep 29, 2017, 5:42:18 AM
to mod-pagespeed-discuss
We have pagespeed configured with ShardDomain to make use of a CDN for the optimized static assets.  However, we've noticed that when an incoming uncached pagespeed url is received, pagespeed will first look in the file cache for the unsharded asset and then it'll end up making a serf request using the sharded URL.  This doesn't make sense to me -- why doesn't it request the unsharded resource?

That is, we see the following sequence of operations:
[info] Starting fetch of                     https://SHARDED/URL/path/to/A.file.css.pagespeed.cf.SRCIGJU8Dx.css
[info] FetchResource:                        https://SHARDED/URL/path/to/A.file.css.pagespeed.cf.SRCIGJU8Dx.css
[info] Fetching                              https://SHARDED/URL/path/to/A.file.css.pagespeed.cf.SRCIGJU8Dx.css
[info] Finding in cache:                     https://SHARDED/URL/path/to/A.file.css.pagespeed.cf.SRCIGJU8Dx.css
[info] Unsharding:                           https://SHARDED/URL/path/to/A.file.css.pagespeed.cf.SRCIGJU8Dx.css ->
                                             https://ORIGIN/path/to/A.file.css.pagespeed.cf.SRCIGJU8Dx.css
[info] Finding                               https://ORIGIN/path/to/A.file.css.pagespeed.cf.SRCIGJU8Dx.css in cache
[info] FileCache::Get ret=0 (0B)             v3/ORIGIN/https://ORIGIN/path/to/A.file.css.pagespeed.cf.SRCIGJU8Dx.css ->
                                             /var/cache/pagespeed/CNAME/v3/ORIGIN/https,3A/,2FORIGIN/path/to/A.file.css.pagespeed.cf.SRCIGJU8Dx.css,
[info] Done: Not found, not written:         https://SHARDED/URL/path/to/A.file.css.pagespeed.cf.SRCIGJU8Dx.css
[info] Starting filter fetch:                https://SHARDED/URL/path/to/A.file.css.pagespeed.cf.SRCIGJU8Dx.css
[info] Finding                               https://SHARDED/URL/path/to/file.css in cache
[info] SUCCEEDED fetch of                    https://SHARDED/URL/path/to/A.file.css.pagespeed.cf.SRCIGJU8Dx.css
[info] FileCache::Get ret=0 (0B)             v3/ORIGIN/https://SHARDED/URL/path/to/file.css ->
                                             /var/cache/pagespeed/CNAME/v3/ORIGIN/https,3A/,2FSHARDED/URL/path/to/file.css,
[info] === SerfUrlAsyncFetcher::Fetch:       https://SHARDED/URL/path/to/file.css
[info] Initiating fetch to                   https://SHARDED/URL/path/to/file.css
[info] Actually starting the fetch itself to https://SHARDED/URL/path/to/file.css
[info] Started serf fetch:                   https://SHARDED/URL/path/to/file.css
[info] FileCache::Put (433573B)              v3/ORIGIN/https://SHARDED/URL/path/to/file.css ->
                                             /var/cache/pagespeed/CNAME/v3/ORIGIN/https,3A/,2FSHARDED/URL/path/to/file.css,

Why does the filter fetch and cache the original resource under the sharded domain?
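
For reference, the behavior expected here can be sketched in a few lines of Python. This is only an illustration of the question, not mod_pagespeed's actual code: the `SHARD_TO_ORIGIN` table and the `unshard` helper are hypothetical names, and the prefixes are the placeholders from the log above.

```python
# Hypothetical shard-prefix -> origin-prefix table, mirroring the placeholders
# in the log above. It is not read from any real mod_pagespeed configuration.
SHARD_TO_ORIGIN = {
    "https://SHARDED/URL/": "https://ORIGIN/",
}


def unshard(url, shard_to_origin=SHARD_TO_ORIGIN):
    """Return url with a known shard prefix replaced by its origin prefix."""
    for shard_prefix, origin_prefix in shard_to_origin.items():
        if url.startswith(shard_prefix):
            return origin_prefix + url[len(shard_prefix):]
    return url  # not a sharded URL, so leave it unchanged


# The input resource that the log shows being fetched and cached:
observed = "https://SHARDED/URL/path/to/file.css"
# The URL one would expect to be fetched and cached instead:
expected = unshard(observed)
print(expected)
```

In the log, this mapping is applied to the .pagespeed. URL in the "Unsharding" step, but the input resource is then looked up and fetched with the sharded prefix still in place.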

- Augusto

Joshua Marantz

Sep 29, 2017, 8:23:12 AM
to mod-pagespeed-discuss
Can you also paste the sharding configuration?


--
You received this message because you are subscribed to the Google Groups "mod-pagespeed-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mod-pagespeed-discuss+unsub...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mod-pagespeed-discuss/930406ac-93f8-46ec-a295-d69a9df5ab36%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

aro...@webscalenetworks.com

Sep 29, 2017, 12:02:42 PM
to mod-pagespeed-discuss

Otto van der Schaaf

Sep 29, 2017, 3:35:11 PM
to mod-pagespeed-discuss
I'm curious, do you see evidence of the sharded hostname landing on the server in the access log? 
(mod_pagespeed's resource fetches will come in with a user-agent that mentions "mod_pagespeed")
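
A rough way to pull those fetches out of an access log is sketched below. It assumes a log format in which the quoted request line is followed, some fields later, by the quoted user agent. The regular expression, the `pagespeed_fetches` helper, and the sample lines are all made up for illustration, so the pattern will need adjusting for a real LogFormat.

```python
import re

# Matches the quoted request line and the next quoted field after it, which is
# assumed to be the user agent. This is an assumption about the log layout.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) +(?P<path>\S+) +HTTP/[\d.]+"[^"]*"(?P<agent>[^"]*)"'
)


def pagespeed_fetches(lines):
    """Yield (path, user_agent) for requests issued by mod_pagespeed itself."""
    for line in lines:
        match = LINE_RE.search(line)
        if match and "mod_pagespeed" in match.group("agent"):
            yield match.group("path"), match.group("agent")


# Shortened, made-up sample lines (not taken from a real server).
sample = [
    '"GET /path/to/file.css HTTP/1.1" 200 - 1097 77108 M "Serf/1.3.8 (mod_pagespeed/1.12.34.2-0)"',
    '"GET /index.html HTTP/1.1" 200 - 900 1200 P "Mozilla/5.0"',
]

for path, agent in pagespeed_fetches(sample):
    print(path, agent)
```

If the sharded path prefix shows up among the paths this prints, the sharded hostname is being fetched by mod_pagespeed rather than only by the CDN.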

Otto


aro...@webscalenetworks.com

Sep 29, 2017, 3:50:37 PM
to mod-pagespeed-discuss
Yes.  The relevant request logs look like:

# First request that is actually satisfied is the deepest, unsharded pagespeed request.  This is sent to the actual origin ip chosen by our loadbalancer config in the vhost.
CNAME 443 127.0.0.1    ORIGIN-IP [2017-09-27T22:28:44.620Z] 876812  "GET /path/to/file.css     HTTP/1.1" 200 - 1097 77108 M "Serf/1.3.8 (mod_pagespeed/1.12.34.2-0)" "ORIGIN-DOMAIN" 127.0.0.1 - 35.197.110.236 "text/css" TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 -
# Second request is the sharded pagespeed request made to the sharded domain.  The server host header specifies the target domain, but this came in on the sharded domain's vhost, hence the "/URL" prefix that reverses the sharding.
CNAME 443 127.0.0.1    -         [2017-09-27T22:28:44.620Z] 881638  "GET /URL/path/to/file.css HTTP/1.1" 200 - 632  77686 M "Serf/1.3.8 (mod_pagespeed/1.12.34.2-0)" "ORIGIN-DOMAIN" 127.0.0.1 - 35.197.110.236 "text/css" TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 -
# Third request is the original pagespeed request back to the original client.
CNAME 443 REQUESTER-IP -         [2017-09-27T22:28:44.694Z] 1097869 "GET /URL/path/to/A.file.css.pagespeed.cf.SRCIGJU8Dx.css HTTP/1.1" 200 - 1094 77724 P "Amazon CloudFront" "ORIGIN-DOMAIN" REQUESTER-IP - 35.197.110.236 "text/css" TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 US

To be clear, we have two vhosts in our configuration: one for the target domain and another for the CDN domain.  It looks like CDN requests come in on the CDN vhost and are then proxied back into Apache on the CDN vhost again, this time without the pagespeed file extension.  That request is in turn redirected back into Apache on the target domain's vhost, which finally goes through to the origin server.  I expected the middle hop to be skipped.
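
The three hops can be told apart from the request path and user agent alone. The sketch below is only a way to label log entries like the ones above. The `classify_hop` helper, the hop names, and the "/URL" shard prefix (the placeholder used in this thread) are hypothetical.

```python
def classify_hop(path, user_agent, shard_prefix="/URL"):
    """Label a request as one of the three hops described above."""
    from_pagespeed = "mod_pagespeed" in user_agent
    if ".pagespeed." in path:
        # Optimized-resource URL requested by the CDN on behalf of a client.
        return "client request (cdn vhost)"
    if from_pagespeed and path.startswith(shard_prefix + "/"):
        # Input resource fetched by mod_pagespeed with the shard prefix intact.
        return "middle hop (cdn vhost)"
    if from_pagespeed:
        # Input resource fetched by mod_pagespeed from the unsharded path.
        return "origin fetch (target vhost)"
    return "other"


serf = "Serf/1.3.8 (mod_pagespeed/1.12.34.2-0)"
requests = [
    ("/path/to/file.css", serf),
    ("/URL/path/to/file.css", serf),
    ("/URL/path/to/A.file.css.pagespeed.cf.SRCIGJU8Dx.css", "Amazon CloudFront"),
]
for path, agent in requests:
    print(classify_hop(path, agent), path)
```

Skipping the middle hop would mean no Serf request ever carries the shard prefix.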

- Augusto

Otto van der Schaaf

Sep 29, 2017, 4:11:26 PM
to mod-pagespeed-discuss
One more question: does the cdn-vhost share the same ModPagespeedShardDomain configuration?

Otto



Augusto Roman

Sep 29, 2017, 4:44:34 PM
to mod-pagespeed-discuss
yes


Otto van der Schaaf

Sep 29, 2017, 5:04:16 PM
to mod-pagespeed-discuss
To make sure, the configuration you posted is the complete mod_pagespeed configuration for both vhosts?
(There's no allow/disallow configuration involved, right?)

If so, it may be worth tracing what happens in RewriteDriver::CreateInputResourceUnchecked().

I think that is where the framework figures out how to reverse any incoming domains that result from domain-rewriting (like sharding).

Otto

