Re: Varnish, mod_pagespeed, additional header for reverse proxy caching fully optimized html


Jeff Kaufman

Dec 10, 2012, 10:30:53 AM
to mod-pagesp...@googlegroups.com, Ilya Grigorik
On Sat, Dec 8, 2012 at 2:54 PM, Łukasz Rysiak <lu...@critical.pl> wrote:
>
> My problem comes from a slow backend - I have a Java app, which responds in
> 6-20 sec, and I can't throw any more hardware at it

This is a little weird, but could you put Varnish between the Java app
and mod_pagespeed? While mod_pagespeed won't cache HTML, you could set
Varnish to cache it if you know it's safe. This should dramatically reduce the
number of requests to your Java app while still keeping
mod_pagespeed's optimizations.
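A minimal sketch of one way to decide "if you know it's safe", assuming that responses which set a cookie or carry `Cache-Control: private`, `no-store`, or `no-cache` are per-user or uncacheable and must not be shared. The function name and the rules are illustrative only; they are not part of Varnish or mod_pagespeed, and a real site may need stricter checks.

```shell
#!/bin/sh
# Hypothetical heuristic (not part of Varnish or mod_pagespeed): decide whether
# an HTML response from the origin looks safe to store in a shared cache.
# Reads raw HTTP response headers on stdin; exit status 0 = safe, 1 = not safe.
is_safe_to_cache() {
  local line lower
  while IFS= read -r line || [ -n "$line" ]; do
    # Drop the CR of CRLF line endings and lower-case the header line.
    lower=$(printf '%s' "$line" | tr -d '\r' | tr '[:upper:]' '[:lower:]')
    case "$lower" in
      set-cookie:*)
        return 1 ;;   # per-user state: never share it
      cache-control:*private*|cache-control:*no-store*|cache-control:*no-cache*)
        return 1 ;;   # origin asks for private or uncached handling
    esac
  done
  return 0
}
```

For example, `curl -s -D - -o /dev/null http://YOURSITE/page.html | is_safe_to_cache && echo cacheable` checks a live page (YOURSITE is a placeholder).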

Jeff

Ilya Grigorik

Dec 10, 2012, 8:26:19 PM
to mod-pagesp...@googlegroups.com, Ilya Grigorik
> This is a little weird, but could you put Varnish between the Java app
> and mod_pagespeed? While mod_pagespeed won't cache HTML, you could set
> Varnish to cache it if you know it's safe. This should dramatically reduce the
> number of requests to your Java app while still keeping
> mod_pagespeed's optimizations.

That would work, but, I'm guessing, not nearly as well as with Varnish at the front...

Lukasz, thinking about your case some more... Here's a different proposal; curious to hear your thoughts:

1) Effectively, you're using Varnish to mask the hideous response time of the app server
2) You want to add MPS in between to also optimize the output
3) The problem is that you don't know when (2) is "done" with its optimizations.

Assuming you just blindly cache the MPS output of the first request, technically you shouldn't be any worse off: some optimizations may be applied, others may still be in progress, but there is still a small win compared to no optimizations. However, images will likely remain uncached.

So, to address the above case, you need an indicator for when MPS "is done". With that in mind:

a) You still likely want to cache even the "not fully optimized" response for some time, though probably only for a short period.
b) You can probably use a combination of "grace" and "saint" modes in Varnish to keep serving the response from (a) while you dispatch a request for the fully optimized page. Once this completes, you can cache it for the full amount.

So, I think it would actually make sense to do the opposite and return an "I'm not done yet" header from MPS if some optimizations are still running. This way, a fully optimized page does not incur any additional headers, and you still have the necessary signal to distinguish between the cases.
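The decision described above can be sketched as a small function, assuming MPS added a marker header while rewrites are still pending. No such header exists in mod_pagespeed; `X-MPS-Not-Done` and both TTL values below are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch of the proposed caching rule. mod_pagespeed has no "not done"
# header; X-MPS-Not-Done is a hypothetical name used only for illustration.
# Reads response headers on stdin and prints how many seconds to cache for.
SHORT_TTL=${SHORT_TTL:-60}      # hypothetical TTL for a partially optimized page
FULL_TTL=${FULL_TTL:-86400}     # hypothetical TTL for a fully optimized page

choose_ttl() {
  local line lower
  while IFS= read -r line || [ -n "$line" ]; do
    lower=$(printf '%s' "$line" | tr '[:upper:]' '[:lower:]')
    case "$lower" in
      x-mps-not-done:*)
        echo "$SHORT_TTL"       # rewrites still in flight: cache briefly
        return 0 ;;
    esac
  done
  echo "$FULL_TTL"              # no marker: treat the page as fully optimized
}
```

In Varnish itself this choice would live in VCL next to grace mode, so the short-lived copy keeps being served while the fully optimized page is fetched; that part is not shown.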

But all of the above is still a bit of a problem for MPS since, as Shawn pointed out, knowing upfront whether this header should be set creates some problems for MPS...

ig

Joshua Marantz

Dec 11, 2012, 8:15:51 AM
to mod-pagespeed-discuss, Ilya Grigorik
We were just talking about this problem today. Depending on the site, how often it changes, and the sizing of the caches, mod_pagespeed may never fully optimize the site. We were thinking about tracking a few stats so that progress could be measured even if it never fully settles:
   # of resources optimized
   # of resources that are not optimizable (cache-control private/nocache, unauthorized domain, disallowed by wildcard, etc)
   # of resources whose optimization is in flight (http-fetch in progress or rewrite in progress)
In general we don't know these numbers until the *end* of the HTML, so we might not be able to write them to the response headers. Would it work for you to have this data come out in a comment at the end of the HTML body? If your system never calls flush(), however, then we could update the response headers with the full set of data.
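One possible way to turn those three counters into a progress figure, assuming progress means optimized resources divided by all resources that can still be optimized (optimized plus in flight), and that "settled" means nothing is in flight. The function name and output format are hypothetical; mod_pagespeed did not report these numbers at the time of this thread.

```shell
#!/bin/sh
# Sketch: summarize the three counters described above. The function name and
# the progress formula are assumptions for illustration only.
#   $1 = resources optimized
#   $2 = resources that are not optimizable (private/no-cache, unauthorized domain, ...)
#   $3 = resources whose fetch or rewrite is still in flight
optimization_progress() {
  local optimized="$1" unoptimizable="$2" inflight="$3" candidates pct state
  candidates=$((optimized + inflight))     # unoptimizable resources never count
  pct=100                                  # nothing left to optimize counts as done
  if [ "$candidates" -gt 0 ]; then
    pct=$((optimized * 100 / candidates))  # integer percentage
  fi
  if [ "$inflight" -gt 0 ]; then state=in-progress; else state=settled; fi
  echo "optimized=${pct}% unoptimizable=${unoptimizable} state=${state}"
}
```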
  
Of course for your particular site it might fully settle, and that would come out in those stats as well.

By '5 minute window' are you referring to the implicit cache lifetime of resources that lack any caching headers?  We could add a flag for that but we thought it was usually pretty easy to modify that at the origin for the resources in question.  But if it's not (e.g. you are proxying a site whose configuration you do not control) then it wouldn't be hard for us to make that configurable in pagespeed.conf.


On Tue, Dec 11, 2012 at 3:22 AM, Łukasz Rysiak <lu...@critical.pl> wrote:
Jeff, Ilya,


On Tuesday, December 11, 2012 2:26:19 AM UTC+1, Ilya Grigorik wrote:

>> This is a little weird, but could you put Varnish between the Java app
>> and mod_pagespeed? While mod_pagespeed won't cache HTML, you could set
>> Varnish to cache it if you know it's safe. This should dramatically reduce the
>> number of requests to your Java app while still keeping
>> mod_pagespeed's optimizations.
>
> That would work, but, I'm guessing, not nearly as well as with Varnish at the front...

Yes, this solution is my last resort.
 
[...]
> But all of the above is still a bit of a problem for MPS since, as Shawn pointed out, knowing upfront whether this header should be set creates some problems for MPS...
>
> ig

Well, I'm overriding the TTL of the HTML here, so I decide when the HTML should be updated. You have to understand that, from my point of view, images are the most important thing to optimize. done/not_done is fine with me, as long as it works and I get a signal that MPS has finished.

BTW, is there a way to extend this 5-minute window of MPS?

 

Joshua Marantz

Dec 11, 2012, 9:43:41 AM
to mod-pagespeed-discuss, Ilya Grigorik
That makes sense. The flush window is determined by whatever module is generating the HTML content that mod_pagespeed is optimizing. In PHP, a flush window is defined by a call to flush(). I'm not entirely sure whether mod_proxy will induce a Flush. In fact, we were talking about whether it might be possible for mod_pagespeed to add more Flush commands itself between Apache buckets, based on heuristics of elapsed time and byte count.
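The size-and-time heuristic mentioned above could look roughly like this; the 16 KB and 200 ms thresholds and the function name are made up for illustration, and this is not mod_pagespeed code. The trade-off is that every extra flush closes a flush window, and resources in different windows (for example CSS files that could otherwise be combined) may end up optimized separately.

```shell
#!/bin/sh
# Sketch of a size/time flush heuristic. The thresholds and the function
# name are hypothetical, not mod_pagespeed code.
FLUSH_BYTES=${FLUSH_BYTES:-16384}   # flush once this many bytes are buffered...
FLUSH_MS=${FLUSH_MS:-200}           # ...or once this many ms have elapsed

# should_flush BYTES_SINCE_LAST_FLUSH MS_SINCE_LAST_FLUSH
# Exit status 0 means "emit a flush now".
should_flush() {
  [ "$1" -ge "$FLUSH_BYTES" ] || [ "$2" -ge "$FLUSH_MS" ]
}
```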

Thanks for the doc typo report -- will fix that in the next rev.

I have another idea about how to deliver completed pages to Varnish.  Starting with 1.1.23.1 there is a pagespeed.conf option to force mod_pagespeed to fully optimize each page no matter how long it takes.

   ModPagespeedBlockingRewriteKey YOUR_KEY

If you set this option, and send request-header "X-PSA-Blocking-Rewrite: YOUR_KEY" to mod_pagespeed, then mod_pagespeed will always deliver fully optimized pages.  The downside is that on an image rich page and a cold server cache, this may take multiple seconds, so you would wind up punishing users occasionally if this was used to service an end-user request after the Varnish cache expired.

However, you might be able to construct a system with scripts, wget, Varnish & mod_pagespeed where end-user requests were *always* served by Varnish, and you periodically freshened the Varnish cache by running 'wget --header="X-PSA-Blocking-Rewrite: YOUR_KEY" YOURSITE'. You might be able to convince Varnish not to send a response when that header is set, but to use it to *write* into the cache. I'm not sure how easy this would be to put together.
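A minimal sketch of the freshening request, assuming the key is held in `PAGESPEED_KEY` (a hypothetical variable) and matches `ModPagespeedBlockingRewriteKey` in pagespeed.conf. It uses wget's `--header` option; note that `-H` in wget means `--span-hosts`, not "header". `WGET` can be overridden (for example with `echo`) for a dry run.

```shell
#!/bin/sh
# Sketch of the cache-freshening request: fetch a page with the
# blocking-rewrite header so mod_pagespeed returns it fully optimized.
# PAGESPEED_KEY must equal ModPagespeedBlockingRewriteKey in pagespeed.conf.
# The function and variable names are hypothetical.
WGET=${WGET:-wget}                  # overridable, e.g. WGET=echo for a dry run

warm_url() {
  if [ -z "${PAGESPEED_KEY:-}" ]; then
    echo "warm_url: set PAGESPEED_KEY to your blocking-rewrite key" >&2
    return 1
  fi
  # --header adds a request header (wget's -H means --span-hosts, not header).
  # -q silences progress output; -O /dev/null discards the page body.
  "$WGET" -q -O /dev/null --header="X-PSA-Blocking-Rewrite: $PAGESPEED_KEY" "$1"
}
```

Run it from cron for each URL, e.g. after `PAGESPEED_KEY=YOUR_KEY`, call `warm_url http://YOURSITE/page.html`. The request must pass through mod_pagespeed on its way to the backend, and getting Varnish to store this response rather than answer it from the existing cache entry is the open part of the outline above; in Varnish 3, `req.hash_always_miss` is one mechanism that forces a fresh fetch, but the VCL for this is not shown.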

-Josh



On Tue, Dec 11, 2012 at 9:13 AM, Łukasz Rysiak <lu...@critical.pl> wrote:
Varnish doesn't have access to the response body - it'd be overkill to check the source for a string.

I've seen this flush window in the docs regarding CSS and JS concatenation/minification. Can I modify the flush window size? I haven't found anything on this topic in the docs. I'm not sure if it's an Apache variable, an MPS variable, or defined somewhere else.

By "5min window" I mean, e.g., this part of the docs:
> If the site owner changes the logo, then mod_pagepseed will notice within 5 minutes and begin serving a different URL to users. But if the content does not change, then the hash will not change, and the copy in each user's browser will still be valid and reachable.
(There's a typo in the docs: "mod_pagepseed".)

Looks like I missed this part:
> mod_pagespeed uses the origin cache time-to-live (TTL), in this case 300 seconds, to periodically re-examine the content to see if it's changed. If it changes, then the hash of the content will also change. Thus it's safe to serve the hashed URL with a long timeout -- mod_pagespeed uses one year.

As long as it respects the origin TTL, it's fine.
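To illustrate why a hashed URL can safely carry a one-year lifetime, here is a sketch that builds a URL from a checksum of the file's bytes. The `.pagespeed.ic.` layout only imitates the URLs seen in this thread, and `cksum` (a POSIX CRC, picked for portability) is not the hash mod_pagespeed actually uses.

```shell
#!/bin/sh
# Illustration of the content-hash idea in the quoted docs: the rewritten URL
# embeds a checksum of the resource bytes, so it changes only when the content
# changes. The URL layout and the use of cksum are assumptions, not
# mod_pagespeed's actual naming scheme.
hashed_url() {
  local name="$1" file="$2" sum ext
  sum=$(cksum < "$file" | cut -d ' ' -f 1)   # checksum of the file contents
  ext=${name##*.}                            # keep the original extension
  echo "${name}.pagespeed.ic.${sum}.${ext}"
}
```

With the 300-second origin TTL from the docs, a changed logo is noticed within about 5 minutes, while each hashed URL can be cached for a year (365 × 86,400 = 31,536,000 seconds), since a given hashed URL never points at different bytes.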

Joshua Marantz

Dec 11, 2012, 12:28:34 PM
to mod-pagespeed-discuss, Ilya Grigorik
I just tried this with my own setup and it seemed to work for me. I'm not familiar with the details of the Varnish language, but this is what I did. I started mod_pagespeed on port 8080 on the examples directory with the blocking key 'psatest' and ran this sequence:

  url="http://localhost:8080/mod_pagespeed_example/rewrite_images.html?ModPagespeedFilters=rewrite_images,-inline_images"

  touch "$PAGESPEED_CACHE_DIR/cache.flush"
  sleep 5
  wget -q -O - --header=X-PSA-Blocking-Rewrite:psatest "$url" | grep -c pagespeed.ic
  3

This shows all 3 optimizable images as optimized.  Now try it without that header:
  
  touch "$PAGESPEED_CACHE_DIR/cache.flush"
  sleep 5
  wget -q -O - "$url" | grep -c pagespeed.ic
  0

Is it possible that your request header isn't quite reaching mod_pagespeed? Also, just to sanity-check: did you restart mod_pagespeed after adding the option to pagespeed.conf?



On Tue, Dec 11, 2012 at 11:13 AM, Łukasz Rysiak <lu...@critical.pl> wrote:

On Tuesday, December 11, 2012 3:43:41 PM UTC+1, jmarantz wrote:
> I have another idea about how to deliver completed pages to Varnish.  Starting with 1.1.23.1 there is a pagespeed.conf option to force mod_pagespeed to fully optimize each page no matter how long it takes.
>
>    ModPagespeedBlockingRewriteKey YOUR_KEY
>
> If you set this option, and send request-header "X-PSA-Blocking-Rewrite: YOUR_KEY" to mod_pagespeed, then mod_pagespeed will always deliver fully optimized pages.  The downside is that on an image rich page and a cold server cache, this may take multiple seconds, so you would wind up punishing users occasionally if this was used to service an end-user request after the Varnish cache expired.

MPS version (X-Mod-Pagespeed header):
    1.1.23.2-2191
pagespeed.conf:
    ModPagespeedBlockingRewriteKey lurykeyone1
Varnish VCL change:
    sub vcl_recv {
        unset req.http.Cookie;
        set req.http.X-PSA-Blocking-Rewrite = "lurykeyone1";
    }
varnishlog -i
   23 TxHeader     b X-PSA-Blocking-Rewrite: lurykeyone1
   15 TxHeader     b X-PSA-Blocking-Rewrite: lurykeyone1
   15 TxHeader     b X-PSA-Blocking-Rewrite: lurykeyone1
   14 TxHeader     b X-PSA-Blocking-Rewrite: lurykeyone1
   14 TxHeader     b X-PSA-Blocking-Rewrite: lurykeyone1
Services restarted.
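Since the blocking-rewrite option only exists from 1.1.23.1 on, a quick check of the `X-Mod-Pagespeed` value helps rule out an old build. A sketch, assuming the header is present and in the `VERSION-BUILD` form shown above; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: read X-Mod-Pagespeed (e.g. "1.1.23.2-2191") from response headers on
# stdin and succeed if the version is at least 1.1.23.1, the release named in
# this thread as the first with ModPagespeedBlockingRewriteKey.
MIN_VERSION=1.1.23.1

mps_supports_blocking_rewrite() {
  local line lower version lowest
  while IFS= read -r line || [ -n "$line" ]; do
    lower=$(printf '%s' "$line" | tr -d '\r' | tr '[:upper:]' '[:lower:]')
    case "$lower" in
      x-mod-pagespeed:*)
        # "x-mod-pagespeed: 1.1.23.2-2191" -> "1.1.23.2"
        version=$(printf '%s' "${lower#*:}" | tr -d ' ' | cut -d - -f 1)
        # Numeric sort on each dotted field; if MIN_VERSION sorts first (or
        # ties), the installed version is new enough.
        lowest=$(printf '%s\n%s\n' "$MIN_VERSION" "$version" |
                 sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n | head -n 1)
        [ "$lowest" = "$MIN_VERSION" ]
        return ;;
    esac
  done
  return 1    # no X-Mod-Pagespeed header in the response
}
```

For example: `curl -s -D - -o /dev/null http://YOURSITE/ | mps_supports_blocking_rewrite && echo ok`. This only confirms the installed version; it cannot show whether a request header added by Varnish actually reaches mod_pagespeed, which depends on the order of the proxies.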

And still, on the first request I get unoptimized images.

Maybe there's an issue with the file name? There is no file extension here, but MPS creates a .jpg file out of it after the 2nd/3rd hard refresh.

Also, when I check /mod_pagespeed_message I don't see any info for image_gallery or any of the parameters until pagespeed tries to fetch the optimized version. Is that expected? I'd rather expect to see the first fetch of the unoptimized image, then info that it's being optimized, and then the fetch of the optimized version.

--

Lukas

Joshua Marantz

Dec 11, 2012, 4:58:06 PM
to mod-pagespeed-discuss, Ilya Grigorik
The option was not documented because we didn't anticipate that it would be useful for users. We put it in for testing purposes. In fact, while I've given you an outline of a solution, we still don't have a complete solution that freshens the cache periodically using a non-user-facing request.

PSA = Page Speed Automatic, which is the previous name for PSOL (PageSpeed Optimization Libraries), the platform-independent optimization infrastructure underlying mod_pagespeed, ngx_pagespeed, and PageSpeed Service.  We are (slowly) trying to migrate from 'psa' to 'psol' in various places in our system.

If you are able to provide scripts for a complete solution using this, then we'll doc it.

-Josh


On Tue, Dec 11, 2012 at 4:02 PM, Łukasz Rysiak <lu...@critical.pl> wrote:
OK, it works, thank you.

There were two problems:
  1. I forgot that I had switched the setup to apache->varnish->backend, so the header I added in Varnish went straight to the backend, bypassing MPS.
  2. The image I was checking was a gallery image loaded by JavaScript, so the parser didn't know its URL.
I tested on pages with 20+ images, and it works great.

Why isn't this option documented? It would have saved us this whole discussion, since I had read every documentation page before posting my question.
I found this header only in your github messages and in the source here: http://modpagespeed.googlecode.com/svn/trunk/src/net/instaweb/http/meta_data.cc
You could at least copy this commit message to the docs: http://code.google.com/p/modpagespeed/source/detail?r=1639

It looks like none of the "PSA" headers are documented. What does PSA stand for? Advanced? :)
Also, X-Psa-Load-Shed has "Psa" instead of "PSA".

Once again thank you for your help!

--

Regards,
Lukas

Adrian Boston

Dec 12, 2012, 9:31:55 PM
to mod-pagesp...@googlegroups.com

6-20 seconds from a Java app. Heehee.

I've had exactly the same horrendous experience with a J2EE n-tier EJB/SQL distributed app.

Your immediate, amazing, and utterly cheap solution is:

Cache your entire site using wget every night.

wget acts as your first user, the one who experiences this enterprise-level 6-20 sec response time.

Output the HTML to the SAN.
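A minimal sketch of that nightly crawl, assuming GNU wget; the function name, the `WGET` override, and the paths are hypothetical, and the output directory stands in for whatever shared storage your front-end server reads from.

```shell
#!/bin/sh
# Sketch of the nightly pre-render: wget crawls the whole site, so wget
# (not a user) waits on the slow backend, and the HTML lands on disk.
# Function name, WGET override, and paths are hypothetical.
WGET=${WGET:-wget}

mirror_site() {
  # --mirror           recursion, infinite depth, timestamping
  # --no-parent        never climb above the starting path
  # --adjust-extension save text/html pages with an .html suffix
  # --directory-prefix where the copies are written
  "$WGET" --mirror --no-parent --adjust-extension \
      --directory-prefix="$2" "$1"
}
```

A crontab entry such as `0 3 * * * /usr/local/bin/mirror-site.sh` (hypothetical script path) would run it nightly. The copies are only as fresh as the last crawl, so this suits pages that change rarely.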


I used exactly this solution with a fibre-connected EMC. Mind-blowing performance.

Good luck