Nginx Worker 100% CPU

u...@zoey.com

Feb 11, 2017, 2:14:00 PM
to ngx-pagespeed-discuss
Hello,

We recently upgraded about 800 servers to ngx_pagespeed 1.11.33.4 with Nginx 1.10.3, and on some servers nginx gets stuck in a loop where 100% of the CPU is used and is never released. I've searched the archives and have seen some mentions of this, but nothing I found was helpful for my problem. I've attached the backtrace of the process as well as some other information; please let me know what more I can share to help debug this. I will note that on some servers this was happening very frequently, and after we rolled back the kernel from 4.4.0-62-generic (Ubuntu) to 4.2.0-27-generic (Ubuntu) the hangs went away entirely. As you can see, this server is running 3.19.0-79, but we cannot roll that kernel back due to bug fixes; we have other servers running 3.19.0-25-generic that do not exhibit this problem at all. I am not saying it is a kernel issue, but it is the only thing I have found that makes it "go away". For reference, all servers are running the same version of Nginx/PageSpeed.

If you need me to post any configuration files please let me know.

Thanks,
Uri


root@xx# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.3 LTS
Release:        14.04
Codename:       trusty

root@xx# uname -r
3.19.0-79-generic

root@xx:~# nginx -V
nginx version: nginx/1.10.3
built by gcc 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.1)
built with OpenSSL 1.0.2j  26 Sep 2016
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-http_auth_request_module --with-mail --with-mail_ssl_module --with-file-aio --with-http_v2_module --with-cc-opt='-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro' --add-module=/root/ngx_pagespeed-release-1.11.33.4-beta --with-ipv6


(gdb) bt
#0  0x00007fe4e9a4135d in write () at ../sysdeps/unix/syscall-template.S:81
#1  0x00000000004c1d92 in net_instaweb::NgxEventConnection::WriteEvent (this=0x20d1100, type=<optimized out>, sender=sender@entry=0x21e8a30) at /root/ngx_pagespeed-release-1.11.33.4-beta/src/ngx_event_connection.cc:142
#2  0x00000000004c124c in net_instaweb::NgxBaseFetch::RequestCollection (this=0x21e8a30, type=<optimized out>) at /root/ngx_pagespeed-release-1.11.33.4-beta/src/ngx_base_fetch.cc:290
#3  0x00000000004c12ae in RequestCollection (type=70 'F', this=<optimized out>) at /root/ngx_pagespeed-release-1.11.33.4-beta/src/ngx_base_fetch.cc:317
#4  net_instaweb::NgxBaseFetch::HandleFlush (this=<optimized out>, handler=<optimized out>) at /root/ngx_pagespeed-release-1.11.33.4-beta/src/ngx_base_fetch.cc:318
#5  0x00000000004c6fc2 in net_instaweb::(anonymous namespace)::ps_send_to_pagespeed (in=in@entry=0x2228470, cfg_s=<optimized out>, ctx=<optimized out>, ctx=<optimized out>, r=<optimized out>)
    at /root/ngx_pagespeed-release-1.11.33.4-beta/src/ngx_pagespeed.cc:2087
#6  0x00000000004c7147 in net_instaweb::(anonymous namespace)::html_rewrite::ps_html_rewrite_body_filter (r=<optimized out>, in=0x2228470) at /root/ngx_pagespeed-release-1.11.33.4-beta/src/ngx_pagespeed.cc:2309
#7  0x0000000000481753 in ngx_http_ssi_body_filter (r=0x21dd6e0, in=<optimized out>) at src/http/modules/ngx_http_ssi_filter_module.c:447
#8  0x00000000004846b7 in ngx_http_charset_body_filter (r=0x21dd6e0, in=0x2228470) at src/http/modules/ngx_http_charset_filter_module.c:647
#9  0x0000000000486787 in ngx_http_addition_body_filter (r=0x21dd6e0, in=0x2228470) at src/http/modules/ngx_http_addition_filter_module.c:166
#10 0x0000000000486dfc in ngx_http_gunzip_body_filter (r=0x21dd6e0, in=0x2228470) at src/http/modules/ngx_http_gunzip_filter_module.c:185
#11 0x000000000042dae7 in ngx_output_chain (ctx=ctx@entry=0x2214f70, in=in@entry=0x21cf640) at src/core/ngx_output_chain.c:214
#12 0x0000000000488d5c in ngx_http_copy_filter (r=0x21dd6e0, in=0x21cf640) at src/http/ngx_http_copy_filter_module.c:152
#13 0x000000000045c88b in ngx_http_output_filter (r=r@entry=0x21dd6e0, in=<optimized out>) at src/http/ngx_http_core_module.c:1970
#14 0x0000000000470489 in ngx_http_upstream_output_filter (data=0x21dd6e0, chain=<optimized out>) at src/http/ngx_http_upstream.c:3587
#15 0x00000000004479c3 in ngx_event_pipe_write_to_downstream (p=0x21df188) at src/event/ngx_event_pipe.c:690
#16 ngx_event_pipe (p=0x21df188, do_write=do_write@entry=0) at src/event/ngx_event_pipe.c:33
#17 0x0000000000470c7b in ngx_http_upstream_process_upstream (r=0x21dd6e0, u=0x21ded10) at src/http/ngx_http_upstream.c:3727
#18 0x000000000046f639 in ngx_http_upstream_handler (ev=<optimized out>) at src/http/ngx_http_upstream.c:1117
#19 0x000000000044d857 in ngx_epoll_process_events (cycle=<optimized out>, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:822
#20 0x0000000000445b13 in ngx_process_events_and_timers (cycle=cycle@entry=0x1f1c390) at src/event/ngx_event.c:242
#21 0x000000000044ba71 in ngx_worker_process_cycle (cycle=cycle@entry=0x1f1c390, data=data@entry=0x16) at src/os/unix/ngx_process_cycle.c:753
#22 0x000000000044a544 in ngx_spawn_process (cycle=cycle@entry=0x1f1c390, proc=proc@entry=0x44b9f0 <ngx_worker_process_cycle>, data=data@entry=0x16, name=name@entry=0xa6a72f "worker process", respawn=respawn@entry=-3)
    at src/os/unix/ngx_process.c:198
#23 0x000000000044bc34 in ngx_start_worker_processes (cycle=cycle@entry=0x1f1c390, n=32, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:358
#24 0x000000000044c638 in ngx_master_process_cycle (cycle=cycle@entry=0x1f1c390) at src/os/unix/ngx_process_cycle.c:130
#25 0x0000000000426779 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:367
(gdb)

u...@zoey.com

Feb 11, 2017, 2:47:47 PM
to ngx-pagespeed-discuss
I forgot to add the output of strace, which is basically just:

write(64, "F\0\0\0\0\0\0\0\220\231y\2\0\0\0\0\260AJ\2\0\0\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)

repeated indefinitely.

u...@zoey.com

Feb 11, 2017, 2:54:22 PM
to ngx-pagespeed-discuss
Sorry for the multiple replies. I do see in the source that there is a comment:

TODO(oschaaf): should we worry about spinning here?

That seems to be what's happening: the write() in the strace output returns -1 with EAGAIN, and we appear to just keep retrying forever, waiting for it to change. Having said that, I am not sure what triggers this condition to begin with. I'll await a reply here; hopefully there is a suggestion. Perhaps the Linux kernel version affects how often we get thrown into this endless loop by whatever is causing it?
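
For illustration, here is a minimal C sketch of the pattern the TODO and the strace output suggest. It is not the actual ngx_pagespeed source and the helper name is hypothetical; it only shows how retrying a non-blocking write in a tight loop on EAGAIN pegs a core for as long as the pipe stays full:

/* Illustrative sketch only -- not the actual ngx_pagespeed code.
 * When a non-blocking pipe's buffer is full, write() fails with
 * EAGAIN; retrying immediately in a tight loop burns 100% of a CPU
 * core until the reader drains the pipe (or forever, if it never does). */
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep retrying until the event is written. */
static ssize_t write_spinning(int fd, const void *buf, size_t len) {
    for (;;) {
        ssize_t n = write(fd, buf, len);
        if (n >= 0) {
            return n;                  /* event (or part of it) written */
        }
        if (errno != EAGAIN && errno != EINTR) {
            return -1;                 /* real error: give up */
        }
        /* EAGAIN: pipe buffer full.  Spinning here is what strace shows
         * as an endless stream of write(...) = -1 EAGAIN.  A fix would
         * register a write event with the event loop and return instead. */
    }
}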

Otto van der Schaaf

Feb 12, 2017, 4:34:38 AM
to ngx-pagespeed-discuss
You may have diagnosed the problem correctly. I suspect the SSI module may also be a required element to reproduce this.

As a quick workaround, increasing the send buffer size for the pipe used to communicate between nginx and ngx_pagespeed may lessen or even eliminate the problem.
But a better solution would probably be a mechanism that reschedules events to be sent later when we cannot send them right away (because write() returned EAGAIN).
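
For illustration, on Linux the kernel buffer of a pipe can be enlarged with fcntl(F_SETPIPE_SZ), available since kernel 2.6.35. The sketch below only demonstrates that mechanism; it is an assumption about how such a patch could work, not the actual ngx_pagespeed change:

/* Sketch: enlarge a pipe's kernel buffer on Linux with F_SETPIPE_SZ.
 * Only an illustration of the "bigger send buffer" workaround, not
 * the actual ngx_pagespeed patch. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) != 0) {
        perror("pipe");
        return 1;
    }

    long before = fcntl(fds[1], F_GETPIPE_SZ);          /* typically 64 KiB */
    long after  = fcntl(fds[1], F_SETPIPE_SZ, 1 << 20); /* request 1 MiB */
    printf("pipe buffer: %ld -> %ld bytes\n", before, after);

    /* A larger buffer allows more queued events before write() starts
     * returning EAGAIN; it lessens the spinning, but rescheduling the
     * write is still needed if the buffer ever fills up completely. */
    close(fds[0]);
    close(fds[1]);
    return 0;
}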


Otto





u...@zoey.com

Feb 12, 2017, 6:32:33 PM
to ngx-pagespeed-discuss
Hi Otto,

Thanks for the reply. A few things:
  1. It seems that the issue happens consistently on specific servers, which is good because we can replicate it, but bad because I have no idea why some exhibit it and some don't. For reference, these are identical containers running the same PHP codebase, PHP-FPM, and Nginx configuration.

  2. I compared our old build of Nginx with the new one; I do not see any configuration parameters indicating that SSI was previously disabled and is now enabled, which means it has always been running. Nevertheless, on one of the servers exhibiting this behavior I added "ssi off;" in the main server {} block. I will let you know if it happens again.

  3. You mention that a workaround would be to increase the send buffer size. Nginx has multiple buffers; which one would we increase? We proxy all requests, so is it the proxy buffer, the nginx instance itself, or both? Which directive specifically would you modify, and how would you compute the value to use? For reference, looking at every setting that contains "buffer", here is what we are working with today:

    • Nginx inside container (where the processes get stuck at 100% CPU):
        • client_body_buffer_size 128k; (nginx.conf)
        • ssl_buffer_size 4k; (sites-enabled)
        • fastcgi_buffer_size 4k; (sites-enabled)
        • fastcgi_buffers 512 4k; (sites-enabled)
        • fastcgi_busy_buffers_size 8k; (sites-enabled)
    • Nginx proxy:
        • client_body_buffer_size 128k; (nginx.conf)
        • proxy_buffering on; (sites-enabled)
        • proxy_buffer_size 4k; (sites-enabled)
        • proxy_buffers 4 32k; (sites-enabled)
        • proxy_busy_buffers_size 64k; (sites-enabled)
        • ssl_buffer_size 4k; (sites-enabled)

  4. I appreciate you creating the issue. What is the general timeline/process I should expect for it to be fixed?
Thanks,
Uri


Otto van der Schaaf

Feb 13, 2017, 4:48:56 PM
to ngx-pagespeed-discuss
On 1): Does the output of "ulimit -a" look identical on the servers? Has there been any sysctl tuning?

On 2): If the problem still occurs without SSI involvement, that would be useful to know.

On 3): A small patch is required to increase the relevant buffers. I'm hopeful that applying the following may improve or work around the issue:

I can't comment on a timeline (sorry). 

Otto



Uri Foox

Feb 13, 2017, 5:36:37 PM
to ngx-pagesp...@googlegroups.com
1) Nearly identical

Server without problem:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 515362
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1000000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 515362
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Server with problem:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 2063264
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1000000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 2063264
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

2) Since adding "ssi off;" we have not seen the problem occur again. The true test will be to reboot one of these machines into the newest kernel, confirm the problem happens, and then turn off SSI to see if it goes away. I will need to find a time to do that and get back to you.

3) Since I compile the code and do not use a package, can I just apply this patch and recompile?

4) Understood.


u...@zoey.com

Feb 14, 2017, 6:30:23 PM
to ngx-pagespeed-discuss
Unfortunately the container just pegged nginx again at 100% CPU with the same backtrace. Setting "ssi off;" in the nginx server {} block did not solve it.

What is the recommendation from here since a fix is not scheduled yet? 

Otto van der Schaaf

Feb 15, 2017, 9:50:48 AM
to ngx-pagespeed-discuss
Since you are compiling from source, you could apply the patch and see whether increasing the pipe's buffer sizes offers a remedy.

Otto


u...@zoey.com

Sep 19, 2017, 9:50:20 AM
to ngx-pagespeed-discuss
Hello,

I apologize for the very delayed reply to this thread. As we run thousands of instances, these types of upgrades need to be planned. Nevertheless, we have successfully upgraded our infrastructure to the following config with the patch provided at https://gist.github.com/oschaaf/2382c735e29f4c960b1e3ca1dacc22fd. Twenty-four hours later we are not seeing any hung nginx processes, so perhaps it is fixed. I will update the thread in a week to provide a larger sample size.

Please let me know if there is anything else I can help with in troubleshooting this issue.

Thanks,
Uri

nginx version: nginx/1.12.1
built by gcc 4.8.4
built with OpenSSL 1.1.0f  25 May 2017
TLS SNI support enabled
configure arguments: [redacted] --add-module=/root/ngx_pagespeed-1.12.34.2-stable


Otto van der Schaaf

Sep 19, 2017, 10:13:54 AM
to ngx-pagespeed-discuss
Thanks for the update. If you can let us know whether the patch improves the situation, that more or less corners the issue.

Otto



u...@zoey.com

Sep 24, 2017, 11:37:08 AM
to ngx-pagespeed-discuss
It's been about a week and we have not seen a single instance of a stuck nginx process since upgrading to nginx 1.12.1 / ngx_pagespeed 1.12.34.2 with the patch applied.

I believe the issue can be marked closed.

Please advise whether you plan to merge this into mainline or if we need to keep this patch as part of our build process.

Thanks,
Uri


Otto van der Schaaf

Sep 25, 2017, 4:51:32 AM
to ngx-pagespeed-discuss
I created https://github.com/pagespeed/ngx_pagespeed/pull/1481; let's hear what others think about merging it. (You can subscribe on that page to receive updates.)

Thanks for updating this thread with your findings!

Otto
