pagespeed css request times out with 404 while optimizing

aro...@webscalenetworks.com

Oct 11, 2017, 5:48:17 PM
to mod-pagespeed-discuss
I've been hunting down 404 problems with css files for a while now and I finally have it nailed down.  I originally thought it was related to sharding (and to be sure, we had problems with our configuration there), but now I'm pretty sure it's a bug in pagespeed:

Version: 1.12.34.2

While optimizing a css file that references many images, pagespeed is 404'ing subsequent requests for that resource.

Specifically:
* Start pagespeed with a clean cache.
* Make a request for A.megahuge.css.pagespeed.cf.Ut0KGaDYUK.css
* Pagespeed fetches the original megahuge.css and starts downloading and optimizing the dependent resources.
* The original request for A.megahuge.css.pagespeed.cf.Ut0KGaDYUK.css times out after a few ms and returns the original resource with a short cache expiration.
--- so far, so good ---
* A second request for A.megahuge.css.pagespeed.cf.Ut0KGaDYUK.css comes in.  That gets stuck waiting in ResourceFetch::BlockingFetch for the callback to complete.
* Pagespeed continues working on optimizing the dependent resources from the first request.
* After 5 seconds, the BoundedWaitFor call in ResourceFetch::BlockingFetch gives up, and pagespeed returns a 404 (!) for the resource.
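
The timeline above can be sketched as a toy model. This is only an illustration of the reported behavior, not PageSpeed code: the function names and the fixed request interval are assumptions made for the example, and the first request (which gets the original resource) is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the reported timeline; none of these names are PageSpeed APIs.
// All times are milliseconds measured from the first request, which starts
// the rewrite.

// HTTP status seen by a later request that arrives at `arrival_ms` and blocks
// for at most `timeout_ms` while the rewrite finishes at `rewrite_done_ms`.
int StatusForLaterRequest(std::int64_t arrival_ms, std::int64_t rewrite_done_ms,
                          std::int64_t timeout_ms) {
  const std::int64_t deadline_ms = arrival_ms + timeout_ms;
  return rewrite_done_ms <= deadline_ms ? 200 : 404;
}

// Counts the 404s among requests that arrive every `interval_ms`, starting at
// `interval_ms` and ending at `rewrite_done_ms` (inclusive).
int CountNotFound(std::int64_t rewrite_done_ms, std::int64_t timeout_ms,
                  std::int64_t interval_ms) {
  int not_found = 0;
  for (std::int64_t t = interval_ms; t <= rewrite_done_ms; t += interval_ms) {
    if (StatusForLaterRequest(t, rewrite_done_ms, timeout_ms) == 404) {
      ++not_found;
    }
  }
  return not_found;
}
```

Under these assumptions, a rewrite that finishes after 180000 ms with a 5000 ms timeout and one request per second gives `CountNotFound(180000, 5000, 1000) == 174`: every request that arrives more than 5 seconds before the rewrite completes is answered with a 404.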

In this case, the css file takes several minutes to optimize, so we have tons of these 404's until somebody gets lucky and the CDN caches the optimized result.

It's even worse if the css file takes so long to optimize that pagespeed decides to refresh the content before serving the optimized result.


So, I've traced the code through and through to nail it down to this BoundedWaitFor call.  How can I fix the problem?  I want the same behavior as the original request: If the callback hasn't completed within a few ms, then return the un-optimized resource.

- Augusto

Joshua Marantz

Oct 11, 2017, 10:47:37 PM
to mod-pagespeed-discuss
Nicely diagnosed.  This sounds like something I'm going to have to dig into.  Three questions:
  1. Do you know if this issue is new for 1.12?  Code related to this changed, I think, between 1.11 and 1.12.
  2. Can you work around this short term with:
        ModPagespeedDisallow *megahuge.css
  3. Would you put this into a bug report on https://github.com/pagespeed/mod_pagespeed/issues

Thanks,
-Josh

--
You received this message because you are subscribed to the Google Groups "mod-pagespeed-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mod-pagespeed-discuss+unsub...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mod-pagespeed-discuss/87d9d410-3c23-4095-b79a-4faea20f3672%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Joshua Marantz

Oct 12, 2017, 11:03:55 AM
to d...@pagespeed.incubator.apache.org, aro...@webscalenetworks.com, mod-pagespeed-discuss
This bug report was sent to mod-pagespeed-discuss.  It references code from instaweb_handler.cc:

void InstawebHandler::HandleAsPagespeedResource() {
  // ...
  if (ResourceFetch::BlockingFetch(stripped_gurl_, server_context_, driver,
                                   callback)) {
    // ...
  } else {
    // BlockingFetch did not succeed (for example, it timed out): report a 404.
    server_context_->ReportResourceNotFound(original_url_, request_);
  }
  // ...
}

BlockingFetch has an undocumented timeout (default 5 seconds, settable in pagespeed.conf).  There is a TODO to document it...
   ModPagespeedBlockingFetchTimeoutMs 10000
would set it to 10 seconds.

If a .pagespeed. resource can't be fetched and rewritten within the timeout, a 404 is returned and the web page breaks.  We should, for non-combined resources, either redirect to the origin resource, or just serve the origin resource with private/300 TTL.  I think a temp redirect would be easier to implement.
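
The two fallbacks proposed for single-input resources can be sketched as follows. This is a minimal illustration, not the PageSpeed implementation: `FallbackResponse`, `FallbackMode` and `MakeTimeoutFallback` are hypothetical names, and "private/300 TTL" is read here as `Cache-Control: private, max-age=300`, which is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of the response sent when the rewrite misses its
// deadline; not a PageSpeed type.
struct FallbackResponse {
  int status;
  std::string location;       // Set only for the redirect variant.
  std::string cache_control;  // Set only when the origin bytes are served.
};

enum class FallbackMode { kTemporaryRedirect, kServeOrigin };

// Builds the fallback for a single-input .pagespeed. resource whose origin
// lives at `origin_url`.
FallbackResponse MakeTimeoutFallback(FallbackMode mode,
                                     const std::string& origin_url) {
  if (mode == FallbackMode::kTemporaryRedirect) {
    // 302 Found: the client re-requests the unoptimized origin URL.
    return {302, origin_url, ""};
  }
  // Serve the origin bytes under the .pagespeed. URL with a short, private
  // TTL so that shared caches do not pin the unoptimized content.
  return {200, "", "private, max-age=300"};
}
```

Neither variant works for combined resources, because there is no single origin URL to redirect to or serve.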

For combined resources, I think we should have a separate timeout, with a much higher default value.  Another option for CSS is to respond with a body containing CSS @import statements for the components, but I'm not sure if that's technically correct 100% of the time.  And for combined JS and sprites that would be a lot harder.  So I think it might be better to plumb in a higher timeout for combined resources, but ultimately we'd have to respond with an error if the timeout is exceeded.  There's also the question of rewritten/combined CSS files: the outermost pagespeed URL encoding will look like it has a single input, but transitively it's a combined resource and could not be solved with a redirect.
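
The @import idea for combined CSS could look roughly like the sketch below. `BuildImportFallback` is a hypothetical helper, not a PageSpeed function, and, as noted above, it is not established that such a body is equivalent to the combined stylesheet in every case.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds a stylesheet body that imports each component URL in order.
// Assumption: the URLs contain no double quotes, backslashes or newlines;
// real code would have to escape them.
std::string BuildImportFallback(const std::vector<std::string>& component_urls) {
  std::string body;
  for (const std::string& url : component_urls) {
    body += "@import url(\"" + url + "\");\n";
  }
  return body;
}
```

For the components `a.css` and `b.css` this produces two lines, `@import url("a.css");` followed by `@import url("b.css");`, leaving the browser to fetch the unoptimized components itself.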

I thought in fact we *had* code somewhere for serving origin content for single-input .pagespeed. resources.  But maybe that was for a different server?

@aroman two more questions:
   1.   You know your way around our code.  You have a great test case.  With our guidance, do you want to take a shot at doing the fix yourself?
   2.   Now that PageSpeed is in Apache incubation, the right mailing list to subscribe to is d...@pagespeed.incubator.apache.org, which you can subscribe to by sending mail to 

-Josh

Augusto Roman

Oct 12, 2017, 12:37:37 PM
to Joshua Marantz, d...@pagespeed.incubator.apache.org, mod-pagespeed-discuss
>  1. Do you know if this issue is new for 1.12?  Code related to this changed, I think, between 1.11 and 1.12.

I don't know.  I can try to establish this if it's relevant.

>  2. Can you work around this short term with:
>        ModPagespeedDisallow *megahuge.css

Yes, disabling optimization for the affected css files works.

>  3. Would you put this into a bug report on https://github.com/pagespeed/mod_pagespeed/issues



>   1.   You know your way around our code.  You have a great test case.  With our guidance, do you want to take a shot at doing the fix yourself?

Yes.  I'd prefer to serve the original resource with private/300 TTL, but I'm willing to do a redirect if that's too hard.

>   2.   Now that PageSpeed is in Apache incubation, the right mailing list to subscribe to is d...@pagespeed.incubator.apache.org, which you can subscribe to by sending mail to 

Thanks!  I'm subscribed.

For combined resources, just increasing the timeout isn't going to help.  These offending css files reference an absurd number of dependencies.  Wouldn't the correct result be to simply concatenate the original resources if we can't serve the optimized/combined version in time?
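
A naive version of that concatenation fallback might look like this. It is a sketch only: `ConcatenateOriginals` is a hypothetical name, and it assumes the component bodies are already fetched and contain no `@charset` or `@import` rules and no relative URLs, any of which could change meaning once the files are joined and served from a different URL.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Joins the unoptimized component bodies in order, making sure each one ends
// with a newline so that adjacent files cannot run together.
std::string ConcatenateOriginals(const std::vector<std::string>& bodies) {
  std::string combined;
  for (const std::string& body : bodies) {
    combined += body;
    if (!body.empty() && body.back() != '\n') {
      combined += '\n';
    }
  }
  return combined;
}
```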

- Augusto

Joshua Marantz

Oct 12, 2017, 1:19:44 PM
to Augusto Roman, d...@pagespeed.incubator.apache.org, mod-pagespeed-discuss
On Thu, Oct 12, 2017 at 12:37 PM, Augusto Roman <aro...@webscalenetworks.com> wrote:
>  1. Do you know if this issue is new for 1.12?  Code related to this changed, I think, between 1.11 and 1.12.

> I don't know.  I can try to establish this if it's relevant.

After having looked into the problem, I don't think it is relevant anymore.  Sorry for the distraction :)
Thanks!  I've followed up there. 
