Reached end of document without finding <body>

Nigel

Nov 9, 2010, 8:50:44 PM
to mod-pagespeed-discuss
I am finding a lot of messages in /var/log/apache2/error.log of the
form

[Wed Nov 10 02:35:03 2010] [error] ...: Reached end of document
without finding <body>

These are wrong without fail. Every file I've looked at that the
message claims has no <body> does in fact contain a <body>
tag.

Regards,

-Nigel

Joshua Marantz

Nov 9, 2010, 8:56:20 PM
to mod-pagesp...@googlegroups.com
Actually this is likely indicating that something is wrong in the mod_pagespeed setup that we should improve.  mod_pagespeed is likely trying to process as HTML some bytes that are not HTML.  Can you provide more log fragments?

In this case it might be useful to increase the logging to include 'info' level -- I think that will tell us which URLs mod_pagespeed is trying to process. There might be something wrong in the guard logic for mod_pagespeed, which tries to look at the mime type in the Apache request structure and turn off mod_pagespeed if it's not html.
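
For reference, raising Apache's log verbosity as suggested here might look like the sketch below. `LogLevel` is a core Apache directive; where it belongs (the main server config or a particular virtual host) depends on the installation, so the placement shown is an assumption.

```apache
# Assumed placement: main server config, or inside the relevant <VirtualHost>.
# Apache's default level is "warn"; "info" also records informational messages,
# which are written to the same error log (e.g. /var/log/apache2/error.log).
LogLevel info
```

Apache needs to be reloaded or restarted for the change to take effect, and the level can be set back to `warn` once the extra log fragments have been collected.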

-Josh

Jan-Willem Maessen

Nov 9, 2010, 11:04:04 PM
to mod-pagesp...@googlegroups.com
Note that I have managed to repro this locally on my own deployment,
so there's some hope of getting a handle on it.

-Jan

Nigel

Nov 10, 2010, 10:21:00 AM
to mod-pagespeed-discuss
On Nov 9, 8:56 pm, Joshua Marantz <jmara...@google.com> wrote:
> Actually this is likely indicating that something is wrong in the
> mod_pagespeed setup that we should improve.  mod_pagespeed is likely trying
> to process as HTML some bytes that are not HTML.  Can you provide more log
> fragments?

All of the files are HTML files.

The only information I haven't included from the log is the name of
the html file, because I didn't think it would help - you can't get
to the original HTML file from outside; you'll get the file *after*
mod_pagespeed processing.

> -Josh

Regards,

-Nigel

Danielc1234

Jan 3, 2011, 9:28:57 AM
to mod-pagespeed-discuss
I am having this same issue. We are calling a php file on another
site. The php is actually calling some js script for a chat software.

I was thinking that mod_pagespeed didn't like ours pointing to an HTTPS
file, but I don't think that will help.

Is there a setting to bypass or exclude a url?

Jan-Willem Maessen

Jan 3, 2011, 9:51:02 AM
to mod-pagesp...@googlegroups.com
There is in recent versions (0.9.11.3 and later).  Below is snipped from:


There are some examples of usage there.

-Jan

Restricting Resource Rewriting Via Wildcards

Note: New feature as of 0.9.11.3

By default, all resources (css, images, javascript) found in HTML files whose origin matches the HTML file, or whose origin is authorized via ModPagespeedDomain, will be rewritten. However, this can be restricted by using wildcards, using the directives:

  ModPagespeedAllow wildcard_spec
  ModPagespeedDisallow wildcard_spec

These directives are evaluated in sequence for each resource, to determine whether the resource should be considered for rewriting. This is best considered with an example.
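
The example from the original documentation did not survive in this post. A minimal sketch of what such a configuration might look like follows. It assumes that when several directives match the same resource, the later one takes precedence (the text above says only that they are evaluated in sequence), and the file and directory names are hypothetical.

```apache
# Hypothetical example: rewrite everything except one script and one directory.
# Assumption: a later matching directive overrides an earlier one.
ModPagespeedAllow *
ModPagespeedDisallow */jquery-ui.min.js
ModPagespeedDisallow */chat/*
```

Under that assumption, a resource such as `http://www.example.com/chat/widget.js` would match the last `ModPagespeedDisallow` line and be left unrewritten, while other resources fall through to the initial `ModPagespeedAllow *`.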

Jan-Willem Maessen

Jan 3, 2011, 10:04:46 AM
to mod-pagesp...@googlegroups.com
Oh, dear, I should clarify. The directives quoted in my previous message work for *resources* (images, css, javascript) in 0.9.11.3, but only work for *html* in recent svn versions. So you'll need to install from source if (as I suspect) the problem is that Apache is telling instaweb that a non-html file has content-type text/html.

Sorry for the confusion.

-Jan