Issues with Busy workers and Max Clients after Pagespeed implementation


Kumaresh S

Dec 1, 2014, 7:49:53 AM
to mod-pagesp...@googlegroups.com
Hi

After implementing PageSpeed on one of our online websites, we started seeing serious performance issues.

1. CPU usage on the web servers is spiking to peak levels.
2. The number of busy workers has gone up sharply.
3. Today we hit the MaxClients limit and the entire website went down because of it (across both servers).

I need recommendations on how to proceed with PageSpeed, as it has been causing performance bottlenecks since we implemented it.

Here are our infrastructure details:

1. Apache web server x2 (2-core VMs), Apache/2.2.15
2. Two Apache instances running on each server (one for the website and another for the m-site; both have PageSpeed configured independently)
3. RHEL: Red Hat Enterprise Linux Server release 6.4 (Santiago)
4. PageSpeed: mod_pagespeed 1.8.31.4-4056
5. mod_cluster 1.0.x
6. Backend: JBoss EAP 5.1.1

It uses the httpd worker MPM, and here is the configuration:

This configuration is tuned for high throughput and performed well during performance benchmarks without PageSpeed.

<IfModule worker.c>
ServerLimit          100
StartServers          2
MaxClients          5000
MinSpareThreads      50
MaxSpareThreads      50 
ThreadsPerChild      50
MaxRequestsPerChild   0
</IfModule>

Currently the error log keeps growing due to fetch failures. I've attached the error logs, including the MaxClients exceeded error.


I noticed the below error for the first time after implementing PageSpeed:

terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_S_construct NULL not valid

I've attached today's log; the MaxClients error can be seen after 05:30 PM (I've masked the actual domain with example.com).



Hoping for a quick response. Thanks in advance for your help!





apache_error_log.txt
pagespeed_stats_log.txt

Joshua Marantz

Dec 1, 2014, 8:24:13 AM
to mod-pagespeed-discuss
Hi.  Can you share your pagespeed.conf?  Do you have something like this in it?

ModPagespeedLoadFromFile http://www.example.com/static/ /applic/media/htdocs/example/website/media

If so, can you verify that this is the correct path?  Can you also tell me what kind of file system is mounted on /applic?

-Josh

Kumaresh S

Dec 1, 2014, 8:11:12 PM
to mod-pagesp...@googlegroups.com
Hi Josh,

Thanks for the reply. It is an NFS file system mounted on /applic/.
I've attached the configuration for your reference.

My further finding is that busy workers shot up to 5000, which is the maximum allowed by the configuration, although the site did not receive that many requests. This happened exactly after I noticed the below error message in the logs:


terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_S_construct NULL not valid


I've attached the Graphs taken from Introscope for your reference.


I'm not sure if PageSpeed triggered this error and the busy-worker spike, as this happened for the first time only after the PageSpeed implementation.

Kumaresh
Busy Workers.png
pagespeed - example.conf
Request Per Interval.png

Kumaresh S

Dec 1, 2014, 9:20:27 PM
to mod-pagesp...@googlegroups.com
Hi

My further analysis reveals that the spike in busy workers is only a symptom: it is triggered exactly after the log condition below is observed, and my suspicion is that it is being triggered by the PageSpeed module.

Can you please help identify the root cause? It is a trading site, and we are being hurt by this behavior.

Joshua Marantz

Dec 1, 2014, 9:23:30 PM
to mod-pagespeed-discuss
Please turn off ModPagespeedLoadFromFile when the filesystem is on NFS.  This is because mod_pagespeed will stat each resource file every time it handles an HTML request.  This uses a blocking operation on the file system, which is fine if it's local, but a disaster at scale with NFS.

Instead you want to use HTTP fetches to get the resources into mod_pagespeed, which is the default.  This is much better in your use-case because it will use the origin TTL from the HTML request to determine how frequently to refetch resources to see if they've changed.
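For concreteness, a sketch of the change in pagespeed.conf, reusing the placeholder path and domain from the example above (verify them against your actual config):

# Comment out (or remove) the file-based mapping so resources are fetched over HTTP, the default:
# ModPagespeedLoadFromFile http://www.example.com/static/ /applic/media/htdocs/example/website/media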

With LoadFromFile and enough traffic, the blocking stat() calls can slow down over NFS and delay the HTML requests.  Because the HTML requests start taking a long time, Apache spins up more subprocesses to handle incoming requests, compounding the problem.

I suspect the std::logic_error exception is simply the system running out of memory when enough subprocesses are created.

-Josh


Kumaresh S

Dec 1, 2014, 11:05:37 PM
to mod-pagesp...@googlegroups.com
Thanks Josh, I will disable this flag (ModPagespeedLoadFromFile) in the configuration tonight and let you know how it goes over the next few days.

Joshua Marantz

Dec 1, 2014, 11:09:11 PM
to mod-pagespeed-discuss
Super.  It's worth doing a quick check that the HTTP fetching path (a) works, and (b) has an appropriate cache TTL configured.

Since the origin of your resources is on a different physical machine you should probably identify that machine in a ModPagespeedMapOriginDomain command, so mod_pagespeed can do the fetches without going out to the internet and back in.
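A hedged sketch of what that directive might look like (the hostname is a placeholder; point it at whichever machine actually serves the resources):

ModPagespeedMapOriginDomain localhost https://www.example.com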

-Josh


Kumaresh

Dec 1, 2014, 11:18:37 PM
to mod-pagesp...@googlegroups.com
Hi Josh,

Although the media (static) files are on NFS, the share is mounted as a file system, so to Apache the drive appears to be a local drive. Is it still required that we map the origin domain?

Can you please evaluate the warnings below, which I started getting in UAT after disabling ModPagespeedLoadFromFile?

Thanks for the help!

Kumaresh


errorlogs_uat_after_change.txt

Joshua Marantz

Dec 1, 2014, 11:24:27 PM
to mod-pagespeed-discuss
Ideally you should tell PageSpeed to fetch the files from the physical machine where they reside.  However if the TTL is high enough it should be OK.

Based on the error messages you provided, you might need to do something like:

   ModPagespeedMapOriginDomain localhost https://uat2.example.com

However it would be preferable to access the files over HTTP and not have the Apache server read from NFS.  The problem with NFS is that it presents a blocking filesystem interface to servers that need to be responsive to external requests.  At scale that spells trouble.

-Josh


Kumaresh

Dec 2, 2014, 8:25:35 AM
to mod-pagesp...@googlegroups.com
Josh,

I'm a bit confused here.

Here is the pain point from my end.

Our site works only on HTTPS, and any request that arrives over HTTP is forwarded to the HTTPS port.

Here our HTTP port is 90 and our HTTPS port is 453 (non-standard ports; please don't worry about that :) ).

As per your recommendation, I added the following to the configuration:

   ModPagespeedMapOriginDomain localhost:90 https://uat2.example.com   

(I think localhost:453 doesn't work, as HTTPS fetching is not supported.)

So this fetch will have to go to HTTP first and will then be forwarded to HTTPS.

After I added the above, I found no errors in the configuration, but here is how the access log looks:

Access logs (HTTP):

"-" - - [02/Dec/2014:15:42:44 +1100] "GET /media/javascript/dm/jquery-1.3.2.min.js?18 HTTP/1.1" 301 281 494 "Serf/1.1.0 mod_pagespeed/1.8.31.4-4056"


Kindly consider the above and provide a recommendation based on it.





Kumaresh


Kumaresh

Dec 2, 2014, 8:25:35 AM
to mod-pagesp...@googlegroups.com
Hi Josh,

Further to that, I've disabled LoadFromFile in PROD, and I've attached the latest statistics.

I still see a lot of fetch failures (although there are no load-from-file stat failures), and there are a lot of Serf failures.

Our website is an e-commerce site that is constantly hit by millions of HTTP requests, so any performance improvement is a bonus; I'm looking for some help to get this improved.

Thank you!



Kumaresh

pagespeed+stats_03122014.txt

Joshua Marantz

Dec 2, 2014, 8:33:00 AM
to mod-pagespeed-discuss
Sorry, my recommendations are based only on the information I have about your setup :)

Can you clarify something for me?  Are the files physically on a hard-disk in the same box as the web server?  Or are they on a different box?

If they are on the same physical box as the web server, then I think you should go back to using LoadFromFile but change the path to reflect the local mount, and not the NFS mount you were using earlier.

If they are on a different physical box in the same intranet, and you have to access it via HTTPS, then you can enable HTTPS fetching in mod_pagespeed.  See https://developers.google.com/speed/pagespeed/module/https_support.
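As a rough sketch of the second option (HTTPS fetching, per that doc; whether certificate checking needs relaxing depends on your setup):

ModPagespeedFetchHttps enable
# With HTTPS fetching enabled, ModPagespeedMapOriginDomain can also point at an https:// origin.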

-Josh

Kumaresh

Dec 2, 2014, 8:48:17 AM
to mod-pagesp...@googlegroups.com

Hi

Here is an overview of our architecture:

Two Apache servers, say wbp1005 and wbp1006, running on 4-core virtual machines.
Apache is installed locally on each virtual machine.
All the media/static files are located on an NFS share, say media1301, and the share is mounted as a file system on both Apache servers.
This way, any media configuration or static file changes can be made in one place instead of locally on both Apache servers.

Please let me know if you need more details.

Joshua Marantz

Dec 2, 2014, 8:54:49 AM
to mod-pagespeed-discuss
Sounds like you should get those files over HTTP or HTTPS depending on your requirements.  Please read https://developers.google.com/speed/pagespeed/module/https_support to set up HTTPS fetching from mod_pagespeed, if you can't set up a listener on an HTTP port that can access those files.

I just noticed as well that you put your PageSpeed file-cache on NFS:
ModPagespeedFileCachePath "/applic/apache/httpd/logs/dm/website/mod_pagespeed/cache/"
That's not going to work well.  Please use a local hard drive for the file-cache.  It's OK for your two servers to have separate caches.  Or you can configure memcached as described in our doc.  But don't share cache over NFS.
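For illustration, the two alternatives might look like this (the local path and memcached hostnames are placeholders, not values from your config):

# File cache on a local disk:
ModPagespeedFileCachePath "/var/cache/mod_pagespeed/"

# Or a shared cache via memcached instead of NFS:
# ModPagespeedMemcachedServers memcached1:11211,memcached2:11211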

-Josh


Kumaresh

Dec 2, 2014, 9:11:59 AM
to mod-pagesp...@googlegroups.com

Hi

I'm sorry for the confusion

/applic/apache is a local file system and /applic/media is an NFS file system.

So the cache is still on a local file system.

Do you recommend that I allow localhost on the HTTP port to allow local fetches? If you think that's the right approach, I can work on redirect rules that prevent the 301s to HTTPS just for PageSpeed requests, as currently our entire site runs on HTTPS only and any HTTP request is 301-redirected to the load balancer URL https://example.com over HTTPS.

Looking forward to your recommendation. Tomorrow morning I will send you the Apache error logs to show what kind of errors we are getting now after removing the LoadFromFile parameter.
The problem is that since the PageSpeed implementation the error log has grown to 1 GB a day; previously it was not even 10 MB.

Thank you so much for helping

Joshua Marantz

Dec 2, 2014, 9:13:48 AM
to mod-pagespeed-discuss
Yes I think it's preferable to do the loopback fetches over HTTP, so avoiding the 301s would be great.

Note that you can specify a subdirectory in the MapOriginDomain directive if that helps.
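One possible way to avoid those 301s, assuming the redirect is implemented with mod_rewrite, is to exempt mod_pagespeed's fetcher, which identifies itself with a Serf/mod_pagespeed User-Agent (visible in the access log earlier in the thread). This is only a sketch and needs adapting to however your redirect is actually configured:

RewriteEngine On
# Skip the HTTPS redirect for mod_pagespeed's loopback fetches (Serf user agent):
RewriteCond %{HTTP_USER_AGENT} !Serf [NC]
RewriteRule ^/?(.*) https://www.example.com/$1 [R=301,L]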

-Josh


Kumaresh

Dec 2, 2014, 10:34:53 PM
to mod-pagesp...@googlegroups.com
Hi Josh,

After the LoadFromFile removal, I'm getting a different set of errors in the logs. I've summarized them here and attached the logs for your reference:

Any help in tuning PageSpeed and improving performance is much appreciated, as our site gets a huge number of hits from users every day. Thank you!



On HTTP Error Logs
1.
[Wed Dec 03 01:32:04 2014] [warn] [mod_pagespeed 1.8.31.4-4056 @19867] Invalid value for PageSpeed: png (should be on, off, unplugged, or noscript)
[Wed Dec 03 01:32:04 2014] [warn] [mod_pagespeed 1.8.31.4-4056 @19867] Invalid PageSpeed query params or headers for request https://www.example.com/media/DM/Product/128x160/727507_0_9999_sml_v1_m56577569854690977.png. Serving with default options.

2.

[Wed Dec 03 01:31:19 2014] [warn] [mod_pagespeed 1.8.31.4-4056 @19867] 342477_0_9999_sml_v1_m56577569854691577.png:0: Resource based on https://www.example.com/media/DM/Product/128x160/342477_0_9999_sml_v1_m56577569854691577.png but cannot access the original

3. Serf Connection failures

[Wed Dec 03 01:32:27 2014] [error] [mod_pagespeed 1.8.31.4-4056 @19867] https://media/DM/Media/static/sign-up-drop-left.png:0: Error status=670002 (Name or service not known) serf_connection_create2

(It seems it doesn't pick up the request's referrer host name from the header.)

SSL Error Logs
=============

[Wed Dec 03 11:28:52 2014] [error] [client 10.168.9.149] request failed: error reading the headers


Kumaresh


http-error.txt
https-error.txt
pagespeed.conf

Kumaresh S

Dec 4, 2014, 6:43:00 AM
to mod-pagesp...@googlegroups.com
Hi Josh,

Even after removing the LoadFromFile parameter, I'm still getting the below error in PROD. Can you please shed some light?

terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_S_construct NULL not valid



Joshua Marantz

Dec 4, 2014, 8:49:00 AM
to mod-pagespeed-discuss
My suspicion is that this is caused by running out of memory, which can happen if Apache spins up too many processes.  You should mitigate this by reducing the maximum number of child processes you allow Apache to spawn.

-Josh



Kumaresh

Dec 5, 2014, 8:06:46 AM
to mod-pagesp...@googlegroups.com
Hey Josh,

I performed a capacity analysis and memory looks perfectly okay. This behaviour happened again today. I'm not sure what is triggering/causing it, it is creating a huge business impact for us, and I'm not sure how we can continue in this state through the Christmas trading period :(

Kumaresh



Joshua Marantz

Dec 5, 2014, 8:08:13 AM
to mod-pagespeed-discuss
It sounds like this is important to fix.  How is it going with removing the redirect for static resources requested by PageSpeed?

-Josh


Kumaresh S

Dec 17, 2014, 12:47:28 AM
to mod-pagesp...@googlegroups.com
Hi Josh,

I fixed the redirects yesterday in PROD and local fetching using HTTP is working fine.

However, one of our web servers stopped working today. This time the symptom is not busy workers; the web server simply stopped serving traffic.

This is our PROD environment, we are now worried about Apache's performance, and we are in the trading period. Can you please treat this as a priority and assist me in fixing this issue?

I've attached the logs. As per my analysis of our monitoring system, this issue started after 03:27 PM; restarts were carried out around 04:11 PM.

Kumaresh



error-dm-website_2014-12-17 - community case.log

Joshua Marantz

Dec 17, 2014, 9:36:18 AM
to mod-pagespeed-discuss
I didn't get much from your logs, but let's follow up on your other thread about the server being too busy to rewrite images.


Kumaresh S

Dec 17, 2014, 6:05:58 PM
to mod-pagesp...@googlegroups.com
Hi

In my opinion the logs indicate the issue happened after a fetch timed out; can you please re-evaluate?

And do you know whether the server being too busy could be related to image rewrites?

Kumaresh



Joshua Marantz

Dec 17, 2014, 10:28:24 PM
to mod-pagespeed-discuss
Generally if a server is overloaded in CPU, you will start getting timeouts.

Mostly the reason we would overload a server is when doing too many image rewrites at once.  There are two things to configure differently to provide relief:
  • Try to get the system to optimize fewer images
  • Prevent the system from optimizing too many images at once
Getting the system to optimize fewer images:
The system may need to optimize lots of images when it's coming online for the first time, when new images are added to the site, or if there are a lot of images with the same origin cache TTL that expire at the same time.

This can also be caused by having a cache that's too small.

There are a lot of graphs and statistics and charts in YOURSITE/pagespeed_admin, especially if you are using 1.9, which can help you get a sense for what mod_pagespeed is doing.

Get the system to optimize fewer images at once:
There are three pagespeed.conf settings to adjust:
ModPagespeedImageMaxRewritesAtOnce NumImages
ModPagespeedNumExpensiveRewriteThreads NumThreads
ModPagespeedNumRewriteThreads NumThreads
See the mod_pagespeed documentation for details.  On the other thread you said that you had 4 virtual cores.  You didn't say how much memory you have, or how many other users might be sharing those cores.

Probably you want to reduce all of these numbers from their defaults, which I believe for Worker MPM are 8, 4 and 4 (in the order specified above).  Try 2, 1, 4 to start.

The effect of reducing these numbers is that it might take longer for new images to get optimized on your system, but your system might stay healthier.
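A sketch of what that starting point might look like in pagespeed.conf (using the full directive names; 2, 1 and 4 are just the conservative starting values suggested above):

ModPagespeedImageMaxRewritesAtOnce 2
ModPagespeedNumExpensiveRewriteThreads 1
ModPagespeedNumRewriteThreads 4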

Hope this helps
-Josh





Kumaresh S

Dec 18, 2014, 5:51:24 PM
to mod-pagesp...@googlegroups.com
HI Josh,

I don't see the CPU being used heavily, not even close to 70%, but this error keeps coming and image rewrites are still happening. I've attached information taken from the pagespeed_admin console of one server.

We have 3 web servers and a CDN sitting in front of them.

Please analyse them and advise whether I should proceed with tuning or whether anything else should be checked. Note: we use mod_pagespeed 1.8.31.4-4056.
...
statistics.txt
Too_Busy_to_Write.txt
configuration.txt
PageSpeed Console.pdf

Joshua Marantz

Dec 18, 2014, 6:11:27 PM
to mod-pagespeed-discuss
I suggest:

ModPagespeedImageMaxRewritesAtOnce 2
ModPagespeedNumExpensiveRewriteThreads 1

-Josh


Kumaresh

Dec 23, 2014, 9:32:02 AM
to mod-pagesp...@googlegroups.com
Thanks Josh!
But before making this change, I wanted to understand the possible root cause for this behavior.
I spent several hours analysing capacity and I don't see a problem with CPU utilization or memory.

Can you please point out other possibilities for this behavior?

As I have mentioned before, our infrastructure looks like this:

1. CDN
2. F5
3. 3 Apache web servers (virtual, 4 cores, 8 GB): Apache runs on local disk, and static images are stored on an NFS share mounted on all 3 Apache web servers.
4. Application server cluster (all virtual)



Kumaresh



Joshua Marantz

Dec 23, 2014, 9:48:02 AM
to mod-pagespeed-discuss
In the steady state you might have enough CPU and memory to take your load with mod_pagespeed running.  But if mod_pagespeed learns it needs to optimize (or re-optimize) 100 new images, you don't want it to do all of them at once because the load spike can destabilize Apache.  If mod_pagespeed tries to do more concurrent image optimizations than you have physical cores, then they'll take longer as the OS time-slices each CPU.  During this time, Apache will see that your processes are slow, and depending on how you have it configured, will start spawning more processes to handle incoming requests (rather than sending 500s to clients).  This assumption that Apache is making is fine for situations where its existing child processes are I/O bound.  But image optimization is compute-bound.  And (if we don't throttle), mod_pagespeed might just wake up and do more image optimizations with the new processes that Apache starts.

So the threading parameters I suggested will help smooth the load over time, and avoid swamping your machine with a spike in image-optimization requests.

-Josh

Kumaresh

Dec 23, 2014, 1:53:32 PM
to mod-pagesp...@googlegroups.com

Thanks Josh!  Your reply is certainly convincing.  Can I ask you to list what other configuration settings require tuning for a large-scale website like ours, so I can push the changes together?
Thanks heaps!

Joshua Marantz

Dec 23, 2014, 2:35:22 PM
to mod-pagespeed-discuss
That's hard to say.  It certainly seems worthwhile to scan over our docs to see what looks relevant to you and your site.  It does seem like you might benefit from adding hardware if your site is that large.

I would also recommend making changes one at a time so you can see which change had which effect, rather than batching changes together.

-Josh


Kumaresh S

Feb 21, 2015, 8:59:47 PM
to mod-pagesp...@googlegroups.com
Hi

I've implemented the thread tuning, but I'm still getting "Too busy to rewrite image." messages in the PageSpeed logs. Please suggest how this can be fixed.
...

Kumaresh S

Mar 3, 2015, 12:36:33 AM
to mod-pagesp...@googlegroups.com
Any updates?
...

Joshua Marantz

Mar 3, 2015, 12:46:26 AM
to mod-pagespeed-discuss
Sorry for the delay in response.

Can you clarify whether there is a negative symptom other than seeing messages in the log?  I know that earlier you were getting allocation failures, which is bad because it will cause requests to be dropped by Apache.  But I'm hoping your tuning has eliminated that symptom.

A log message saying "too busy to rewrite images" is not necessarily wrong if you have a burst of image-rewriting load that exceeds the capacity of your allocated CPUs.  Given that you only have one VM with limited CPU capacity, the aggregate load of your clients across different client subprocesses is the root cause of this issue.  The rewrites will be dropped by mod_pagespeed to keep the system healthy, and they should be retried again later.  Once all your images are optimized and in cache, the system should yield fully optimized web pages until the cache entries expire and/or the cache fills up and images are evicted.

-Josh


Kumaresh

Mar 3, 2015, 6:24:52 PM
to mod-pagesp...@googlegroups.com
Thanks for the response Josh!

I've not seen any negative symptoms since the change (10 days ago). I will monitor and keep you posted if the problem happens again.

Kumaresh



Kumaresh S

Mar 18, 2015, 8:37:14 PM
to mod-pagesp...@googlegroups.com
Hi Josh,

Unfortunately the error appeared again and one of our web servers stopped responding. The following is the error:

Although "terminate called after throwing an instance of 'std::logic_error'" happens often in the error logs but webserver stopped responding always co-relates with "Fetch timed out" error.

Please help us fix the issue permanently, as the tuning exercises so far have not resolved it.

terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct NULL not valid
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct NULL not valid
[Tue Mar 17 14:05:45 2015] [notice] child pid 18198 exit signal Aborted (6)
[Tue Mar 17 14:05:45 2015] [notice] child pid 18258 exit signal Aborted (6)
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct NULL not valid
[Tue Mar 17 14:05:47 2015] [notice] child pid 18370 exit signal Aborted (6)
[Tue Mar 17 14:06:01 2015] [warn] [mod_pagespeed 1.8.31.4-4056 @18429] Fetch timed out for https://www.example.com.au/media/ex/Product/128x160/128x160x916746_0_9999_sml_v1_m56577569854679787.png.pagespeed.ic.xsi5oS7Ksk.png
[Tue Mar 17 14:06:01 2015] [warn] [mod_pagespeed 1.8.31.4-4056 @18429] https://www.example.com.au/media/ex/Product/128x160/128x160x916746_0_9999_sml_v1_m56577569854679787.png.pagespeed.ic.xsi5oS7Ksk.png resource_404_count: not found (404)


Kumaresh S

Mar 23, 2015, 7:06:23 AM
to mod-pagesp...@googlegroups.com
Hi Josh,
Any updates? This issue has happened again
...

Joshua Marantz

Mar 23, 2015, 10:01:35 AM
to mod-pagespeed-discuss
I suspect that the 'std::logic_error' messages are a result of running out of memory in your server.  This may happen if Apache spins up too many processes.  Can you tell me more about your Apache setup?  Which MPM?  How much memory is available to your VMs?  What are your settings for the maximum number of child processes that Apache can start up?

-Josh



Kumaresh S

Mar 23, 2015, 11:58:23 PM
to mod-pagesp...@googlegroups.com
Hi Josh,

We use the worker MPM; here is the thread configuration:

<IfModule worker.c>
ServerLimit          100
StartServers          2
MaxClients          5000
MinSpareThreads      50
MaxSpareThreads      50 
ThreadsPerChild      50
MaxRequestsPerChild   0
</IfModule>


VM Memory: 8 GB
[jbossadm@ncdlmorwbp1008 ~]$ free -m
             total       used       free     shared    buffers     cached
Mem:          7872       7712        160          0        569       4597
-/+ buffers/cache:       2545       5326
Swap:         8015          3       8012

CPU: 4 Cores


Also, importantly, we had another issue in which a fetch timeout happened around 11 AM last Sunday (22nd March). We did not get any monitoring alerts (I believe our monitoring server failed to alert), and the server went unresponsive for a while. When we checked yesterday evening, Apache had an open-file count of 100,000+, and then the VM itself became unresponsive.
I'm sure this correlates with the fetch timeout problem, but I need your help to get this fixed.

Joshua Marantz

Mar 24, 2015, 11:01:52 AM
to mod-pagespeed-discuss
I don't know all the other settings that are relevant in your system.  But it seems likely that the system is very sensitive to load-spikes and we should tune mod_pagespeed to optimize less aggressively to avoid inducing load-spikes.

Really it would be a lot better if mod_pagespeed did this in a more automated way, but that's not something we're going to be able to fix quickly.

Scanning back over the threads I saw I suggested this:

ModPagespeedImageMaxRewritesAtOnce 2
ModPagespeedNumExpensiveRewriteThreads 1

Did you do that?  If not, please do it.  If so, can you decrease MaxRewritesAtOnce to 1?

There may be other parameters to tune, but let's try those first.
-Josh


Kumaresh

Mar 24, 2015, 7:26:53 PM
to mod-pagesp...@googlegroups.com
HI Josh,

We have already made the recommended settings, as below:

ModPagespeedImageMaxRewritesAtOnce 2
ModPagespeedNumExpensiveRewriteThreads 1

Are you recommending changing ModPagespeedImageMaxRewritesAtOnce from 2 to 1? What other recommendations do you have? This has become an everyday issue now. Please help us get it fixed.



Kumaresh



Joshua Marantz

Mar 24, 2015, 8:52:43 PM
to mod-pagespeed-discuss
Yes, I'm recommending changing ModPagespeedImageMaxRewritesAtOnce from 2 to 1.  I'm trying to minimize the amount of CPU that can be consumed by mod_pagespeed while still making progress optimizing your site.  My theory is that your system is underpowered for the load (including image optimizations), and you need to trade off how quickly the site can optimize new content against stability.  Once you have set all the related mod_pagespeed settings to 1, I'm not sure how much more mod_pagespeed-specific tuning you can do, so we had better look at those Apache directives.

To be honest, I'm not an expert in tuning Apache.  From http://httpd.apache.org/docs/2.2/mod/mpm_common.html#serverlimit :

Special care must be taken when using this directive. If ServerLimit is set to a value much higher than necessary, extra, unused shared memory will be allocated. If both ServerLimit and MaxClients are set to values higher than the system can handle, Apache may not start or the system may become unstable.

This sounds like what might be happening, and it makes sense to me.  I'd start cranking some of these numbers down slowly until you can achieve stability.
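As a very rough illustration only (the safe values depend on the per-child memory footprint, which is worth measuring): the current settings allow up to ServerLimit 100 x ThreadsPerChild 50 = 5000 threads across up to 100 child processes on an 8 GB VM. A more conservative worker configuration might look like:

<IfModule worker.c>
ServerLimit           20
StartServers           2
MaxClients          1000
MinSpareThreads       50
MaxSpareThreads      100
ThreadsPerChild       50
MaxRequestsPerChild    0
</IfModule>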

One common approach is to put Varnish between Apache and the outside world.  If you let Varnish just handle the cacheable resources that should help improve your throughput.  For the most part, Apache will only need to handle HTML traffic.  However I see that you are using a CDN, so this may not be necessary.  Can you describe your CDN setup in more detail?

-Josh 

Kumaresh

Mar 24, 2015, 9:10:10 PM
to mod-pagesp...@googlegroups.com
HI Josh,

The CDN configuration is:

The CNAME record for www.example.com points to the CDN provider's DNS name, and the CDN is configured to connect to Apache via the front-end load balancers.

The CDN fetches images from Apache when they are requested for the first time and stores them in CDN storage. Future requests for the media folders are served by the CDN, and anything dynamic is passed through to Apache via the CDN network.



Kumaresh


Joshua Marantz

Mar 24, 2015, 9:13:52 PM
to mod-pagespeed-discuss
So you should be getting close to zero traffic for static resources, and serving only HTML.  Is that right?

One other question: what is the cache lifetime you set for your static resources?  mod_pagespeed of course will cache-extend them by signing the URLs and serve them with a 1-year TTL.  But your system load is definitely affected by how often mod_pagespeed must re-fetch and (potentially) re-optimize resources.
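For reference, a hedged sketch of how an origin TTL for static images might be set in Apache with mod_expires (a generic example, not taken from your attached config; pick lifetimes that match how often your images actually change):

<IfModule mod_expires.c>
ExpiresActive On
# Longer origin TTLs mean mod_pagespeed re-fetches and re-optimizes less often.
ExpiresByType image/png "access plus 1 week"
ExpiresByType image/jpeg "access plus 1 week"
</IfModule>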

-Josh

Joshua Marantz

Mar 25, 2015, 8:50:30 AM
to mod-pagespeed-discuss
More thoughts: earlier in the thread I think you indicated you were using mod_pagespeed 1.8.  You should upgrade to 1.9 because I believe we have added some http-fetch throttling in that version.  1.9 is both our stable and our beta branch, so 1.8 is history at this point.

Also earlier in the thread, you were using ModPagespeedLoadFromFile, but the file-system was an NFS mount.  Your distribution strategy for 3 web servers is to have one shared file server where the content lives.  Would you consider replicating the files on each of your 3 servers via rdist or some other mechanism?  Then you could use ModPagespeedLoadFromFile more effectively.
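If you did replicate the content locally, the mapping might look something like this (the URL prefix follows your media paths from the logs; the local directory is a placeholder for wherever the replicated copy would live):

ModPagespeedLoadFromFile "https://www.example.com/media/" "/var/www/media/"

That keeps the fast stat()-based loading while avoiding the blocking NFS round trips.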

-Josh