Help understanding why the rule "ModPagespeedMapOriginDomain localhost *.example.com" does not cause an infinite loop

Bruno Negrão Guimarães Zica

Apr 11, 2017, 1:52:29 PM
to mod-pagespeed-discuss
Hi all,

I'd like to understand what happens behind the scenes when I use the setting below:

ModPagespeedMapOriginDomain localhost *.example.com

Why doesn't that generate an infinite loop?

The documentation says "Pagespeed must fetch those resources using HTTP, using the URL reference specified on the HTML page." So let's say my Apache server responding for *.example.com receives a request for http://www.example.com/file1.html. Will pagespeed make a new HTTP request to localhost to get that file? If so, why doesn't it end up in an infinite loop?

Joshua Marantz

Apr 11, 2017, 3:55:40 PM
to mod-pagespeed-discuss
Hi -- this is a great question.  The reason this doesn't loop is that mod_pagespeed does not do HTTP fetches in response to requests for an origin resource.  For example, let's say you have index.html, which consists solely of the line:
  <img src="foo.png"/>

mod_pagespeed installs an output filter in Apache that scans that HTML, finds the reference to foo.png, maps its URL to http://localhost/foo.png, and initiates a background fetch of it with an HTTP fetcher that runs in a separate thread.
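
To make the mapping step concrete, here is a minimal Python sketch of how a reference found in the HTML could be resolved and rewritten to the mapped origin. It is not mod_pagespeed's code: the `ORIGIN_MAPPINGS` table and the `map_to_origin` function are hypothetical names, and shell-style wildcard matching via `fnmatch` is assumed as a stand-in for the directive's `*` pattern.

```python
from fnmatch import fnmatch
from urllib.parse import urljoin, urlsplit, urlunsplit

# Hypothetical table mirroring "ModPagespeedMapOriginDomain localhost *.example.com":
# (origin host to fetch from, wildcard pattern for the public host).
ORIGIN_MAPPINGS = [("localhost", "*.example.com")]


def map_to_origin(page_url, resource_ref):
    """Resolve a reference found in the HTML, then swap in the mapped origin host."""
    absolute = urljoin(page_url, resource_ref)
    parts = urlsplit(absolute)
    for origin, pattern in ORIGIN_MAPPINGS:
        if fnmatch(parts.hostname or "", pattern):
            return urlunsplit((parts.scheme, origin, parts.path, parts.query, ""))
    return absolute  # no mapping applies, so the public URL is fetched as-is


print(map_to_origin("http://www.example.com/index.html", "foo.png"))
# prints: http://localhost/foo.png
```

Only the URL used for the background fetch changes; the reference in the HTML served to the browser is a separate matter.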

This fetch gets routed back to Apache, which looks for a handler for "foo.png". mod_pagespeed's resource handler wakes up, looks at the URL and other properties of the Apache request object, does not see any patterns that it recognizes, and passes the request on to the next handler. If the URL leaf were in the specific format FILE.pagespeed.FILTER.HASH.EXT, then mod_pagespeed would handle the request.

Apache might then fall-through to a file-based handler or a proxy-based handler.

So there is no further fetch done by mod_pagespeed.
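
The handler decision described above can be sketched roughly as follows. This is an illustration only: the regular expression is a guess at the FILE.pagespeed.FILTER.HASH.EXT shape, and `is_pagespeed_resource` and `dispatch` are hypothetical names, not functions from mod_pagespeed.

```python
import re

# Assumed shape of a rewritten leaf: FILE.pagespeed.FILTER.HASH.EXT.
# The character classes are a guess made for illustration.
PAGESPEED_LEAF = re.compile(r"^.+\.pagespeed\.[A-Za-z0-9]+\.[^./]+\.[A-Za-z0-9]+$")


def is_pagespeed_resource(url_path):
    """Check only the last path segment (the URL leaf) against the pattern."""
    leaf = url_path.rsplit("/", 1)[-1]
    return PAGESPEED_LEAF.match(leaf) is not None


def dispatch(url_path):
    """Say who would serve the request in this simplified model."""
    if is_pagespeed_resource(url_path):
        return "pagespeed resource handler"
    # Declining starts no new fetch, which is why the loop never forms.
    return "declined: next Apache handler (file or proxy)"


print(dispatch("/foo.png"))
print(dispatch("/foo.png.pagespeed.ic.HASH.png"))
```

The background fetch for /foo.png takes the second branch, so the request ends at whichever handler actually produces the file.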

Also note that mod_pagespeed has some other loop protection in the request headers of the fetches it initiates (e.g. "pagespeed" as part of the user-agent), but I think that's more of a fallback. I think it also sends "Pagespeed:off" in the request headers.
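
As a rough illustration of that fallback, a header check could look like the Python sketch below. It relies only on the two tentative details mentioned above (a user-agent containing "pagespeed" and a "Pagespeed: off" request header); the function name and the sample header values are made up, and the real module's checks may differ.

```python
def looks_like_pagespeed_fetch(headers):
    """Return True if the request headers suggest a fetch started by mod_pagespeed.

    Assumption: header names are compared case-insensitively, and either
    signal on its own is enough to skip rewriting.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    user_agent = normalized.get("user-agent", "").lower()
    pagespeed_flag = normalized.get("pagespeed", "").strip().lower()
    return "pagespeed" in user_agent or pagespeed_flag == "off"


# Made-up header sets for illustration.
print(looks_like_pagespeed_fetch({"User-Agent": "ExampleBrowser/1.0"}))
print(looks_like_pagespeed_fetch({"User-Agent": "example-fetcher (mod_pagespeed)"}))
print(looks_like_pagespeed_fetch({"PageSpeed": "off"}))
```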

- - - - - - - - - - - - - - - - - - - - IMPORTANT NOTICE - - - - - - - - - - - - - - - - - - - - - -

This message may include confidential information, and only the intended addressee has the right to use it or any part of it. An incorrect transmission does not void its confidentiality. If you have received it by mistake, please notify the sender and delete it from your system immediately. No one other than the addressee may use, disclose, distribute, or copy any part of this message. This communication environment is monitored.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

--
You received this message because you are subscribed to the Google Groups "mod-pagespeed-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mod-pagespeed-discuss+unsub...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mod-pagespeed-discuss/536f1dd3-e22d-4951-b7d2-15afad305eb6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bruno Negrão Guimarães Zica

Apr 12, 2017, 11:03:47 AM
to mod-pagespeed-discuss
Hi jmarantz, thank you for the answer. Now two more questions to complete my understanding.

When mod_pagespeed does that "internal fetch" you mention, would that fetch be logged to the Apache access_log? (I guess it won't be logged.)

I would also like to know how the "In-place Resource Optimization (IPRO)" feature changes the workflow you just described.

You said:

 If the URL leaf were in the specific format FILE.pagespeed.FILTER.HASH.EXT, then mod_pagespeed would handle the request.

When IPRO is "on", mod_pagespeed will handle requests for ordinary URLs that don't have mod_pagespeed's signature. How does that work?

Joshua Marantz

Apr 12, 2017, 11:19:06 AM
to mod-pagespeed-discuss
On Wed, Apr 12, 2017 at 11:03 AM, Bruno Negrão Guimarães Zica <bruno...@infoglobo.com.br> wrote:
Hi jmarantz, thank you for the answer. Now two more questions to complete my understanding.

When mod_pagespeed does that "internal fetch" you mention, would that fetch be logged to the Apache access_log? (I guess it won't be logged.)

Yes, it will be logged.

I would also like to know how the "In-place Resource Optimization (IPRO)" feature changes the workflow you just described.

I was debating in my first reply whether to talk about the IPRO flow.  I was thinking of keeping it simple but let's dive in now :)

In-place rewriting doesn't rely on HTML parsing or HTTP fetching to rewrite resources. Instead, it records the contents of those resources in an output filter, stores them in cache, and then optimizes them in a later pass. It usually takes about three requests for a resource before it is delivered IPRO-optimized:

1. Collect the contents as they stream through.
2. Find the entry in cache, initiate a background rewrite (which will probably miss the deadline for an image rewrite), and store the optimized result in cache.
3. Deliver the optimized resource.
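
The three-request pattern can be modeled with a small Python sketch. This is a toy model, not the module's implementation: `IproCacheSketch`, `serve_origin`, and `optimize` are hypothetical names, and the background rewrite is modeled as finishing just after the second response is chosen, so that response still carries the original bytes.

```python
class IproCacheSketch:
    """Toy model of the record / rewrite / deliver sequence."""

    def __init__(self):
        self.origin = {}     # url -> bytes recorded as they streamed through
        self.optimized = {}  # url -> bytes produced by the rewrite

    def handle(self, url, serve_origin, optimize):
        if url in self.optimized:
            # Request 3 and later: deliver the optimized resource from cache.
            return self.optimized[url], "optimized"
        body = serve_origin(url)  # Apache's normal handler produces the bytes
        if url in self.origin:
            # Request 2: the recorded copy is in cache, so a rewrite starts.
            # It is assumed to miss the deadline, so the original is returned.
            self.optimized[url] = optimize(self.origin[url])
            return body, "original (rewrite started)"
        # Request 1: record the contents; no HTTP fetch is involved.
        self.origin[url] = body
        return body, "original (recorded)"


def serve_origin(url):
    return b"ORIGINAL BYTES"


def optimize(body):
    return body.lower()  # stand-in for a real image or CSS optimization


cache = IproCacheSketch()
states = [cache.handle("/foo.png", serve_origin, optimize)[1] for _ in range(3)]
print(states)
# prints: ['original (recorded)', 'original (rewrite started)', 'optimized']
```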

You said:

 If the URL leaf were in the specific format FILE.pagespeed.FILTER.HASH.EXT, then mod_pagespeed would handle the request.

When IPRO is "on", mod_pagespeed will handle requests for ordinary URLs that don't have mod_pagespeed's signature. How does that work?

However, to your original point, this never results in an HTTP fetch. Instead, if the resource is not in cache (either optimized or origin), mod_pagespeed installs a 'recorder' output filter to save the resource as Apache streams it through.



