nginx worker process high CPU usage


Florin Dragos

Jun 10, 2012, 8:01:24 AM
to vcap-dev
Hi everyone,

We're having some problems with nginx worker process that takes up too
much CPU on vcap router.
Only component running on the VM is the router. After deployment,
nginx worker process starts using more and more CPU, until reaching
100%. After killing the worker process, everything returns to normal.
Component is set up using vcap_dev_setup.

Any idea what might cause this high CPU usage and how to fix it?
Thanks

Kwon-Han Bae

Jun 10, 2012, 8:19:25 AM
to vcap...@cloudfoundry.org
What's your nginx version?

Upgrade it to the latest.



--
배권한
KwonHan Bae
Kris Bae
http://iz4u.net/blog
linux, python, php, ruby developer

Yongkun Anfernee Gui

Jun 10, 2012, 12:42:15 PM
to vcap...@cloudfoundry.org
Hi,

Can that be reliably reproduced? I have never seen that. Is it 100% reproducible
even without serving any requests?

Could you check the nginx logs under $DEPLOY/devbox/log/nginx.*.log,
or run strace to find which syscalls nginx is spending CPU time on?
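
For reference, a minimal way to do that (a sketch; it assumes a single worker whose PID you look up first):

ps aux | grep 'nginx: worker'    # find the worker PID
strace -c -p <pid>               # attach; Ctrl-C prints the per-syscall time summary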

thanks,
anfernee

Florin Dragos

Jun 12, 2012, 3:08:59 AM
to vcap-dev
Nginx version is 1.2.0.
It reproduces only while serving requests.
I ran strace while the server was being accessed. During this time CPU
even reached 100%.
This is the output:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 80.44    0.105620           4     26281           brk
  5.64    0.007410           0     53047           writev
  4.73    0.006208           0     55666           epoll_wait
  3.96    0.005198           0     23636           mremap
  2.07    0.002716           0     35218     17609 connect
  1.74    0.002289           2      1128           munmap
  0.29    0.000385           0     17410           sendto
  0.28    0.000374           0     35606           close
  0.28    0.000371           0     17611           write
  0.26    0.000336           0    136365     65479 recvfrom
  0.17    0.000217           0     35218           socket
  0.04    0.000056           0     36006           epoll_ctl
  0.03    0.000040           0     17609       241 readv
  0.03    0.000035           0     35218           getsockopt
  0.02    0.000020           0      1128           mmap
  0.02    0.000020           0     35218           ioctl
  0.00    0.000000           0         1           open
  0.00    0.000000           0        31           pwrite
  0.00    0.000000           0         6           sendfile
  0.00    0.000000           0       389           shutdown
  0.00    0.000000           0       395           setsockopt
  0.00    0.000000           0         1           unlink
  0.00    0.000000           0       394           accept4
------ ----------- ----------- --------- --------- ----------------
100.00    0.131295                563582     83329 total



Yongkun Anfernee Gui

Jun 12, 2012, 3:45:01 AM
to vcap...@cloudfoundry.org
First thing, we are officially using nginx 0.8.54 in Cloud Foundry, though I think
1.2.0 should work the same.

Next, other than upgrading nginx, did you do anything else special, like changing
the nginx config file?

Next, what is the output of the following: uname -a, nginx -V, lsb_release? I know
it works very well on Ubuntu 10.04, x86_64/i686.

Did your requests fail or become slow when CPU went to 100%? Is there anything
abnormal in the nginx access and error logs? Can you try a simple config, to
isolate the nginx problem?

Thanks,
Anfernee 

Chunjie Zhu

Jun 12, 2012, 5:14:26 AM
to vcap...@cloudfoundry.org
From the strace output, it seems the brk system call consumes most of the CPU time.

As we all know, brk is the system call glibc's malloc uses to grow the heap. So a naive guess is that the process's heap space runs out, and the kernel struggles to reclaim memory and reallocate it.

Please check /proc/<pid>/maps to inspect the process's virtual memory layout and find out whether the heap space runs out when this problem happens again.
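
A quick sketch of that check (<pid> is the worker's PID; [heap] is the label Linux prints for the heap mapping):

grep heap /proc/<pid>/maps    # run repeatedly; if the end address keeps climbing, the heap keeps expanding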

Regards,
Chunjie


From: "Yongkun Anfernee Gui" <ag...@rbcon.com>
To: vcap...@cloudfoundry.org
Sent: Tuesday, June 12, 2012 3:45:01 PM
Subject: Re: [vcap-dev] Re: nginx worker process high CPU usage

Yongkun Anfernee Gui

Jun 12, 2012, 5:36:01 AM
to vcap...@cloudfoundry.org
Also, can you try google-perftools for more detailed profiling of nginx?
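
For what it's worth, nginx ships an optional google_perftools module; a sketch of using it (assumes gperftools is installed and nginx is rebuilt with the flag below):

./configure --with-google_perftools_module    # plus the existing configure flags

then in nginx.conf:

google_perftools_profiles /tmp/ngx_profile;   # each worker writes its own profile file suffixed with its PID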

- anfernee

Florin Dragos

Jun 12, 2012, 11:07:19 AM
to vcap-dev
I'm not sure how to interpret /proc/<pid>/maps.
The output right now is

01ac2000-023f7000 rw-p 00000000 00:00 0 [heap]

Before serving any requests it stayed at 01ac2000-01e7c000; while
serving requests, the second number keeps changing.

the requested outputs:

uname -a: Linux test-ubuntu1 2.6.32-33-server #70-Ubuntu SMP Thu Jul 7
22:28:30 UTC 2011 x86_64 GNU/Linux

nginx -V: nginx version: nginx/1.2.0
built by gcc 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1)
configure arguments: --prefix=/home/cfuser/.deployments/deployment/deploy/nginx/nginx-1.2.0
  --with-pcre=../pcre-8.21 --add-module=../nginx_upload_module-2.2.0
  --add-module=../agentzh-headers-more-nginx-module-5fac223
  --add-module=../simpl-ngx_devel_kit-bc97eea
  --add-module=../chaoslawful-lua-nginx-module-204ce2b

lsb_release: No LSB modules are available.



Florin Dragos

Jun 12, 2012, 3:46:29 PM
to vcap-dev
There was a warning when running nginx -t:

[warn] 2048 worker_connections exceed open file resource limit: 1024

Reducing worker_connections to 1024 seems to solve the issue. At
least for now, CPU seems stable.
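
For reference, that setting lives in the events block of nginx.conf, along these lines (a sketch; surrounding directives omitted):

events {
    # keep this at or below the worker's open-file limit (ulimit -n)
    worker_connections  1024;
}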

Yongkun Anfernee Gui

Jun 12, 2012, 8:23:29 PM
to vcap...@cloudfoundry.org
Glad your problem is fixed. 

FYI: 
adding this to nginx.conf will increase the number of open files allowed in a worker:

worker_rlimit_nofile 2048;
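
Together with the connection setting from earlier in the thread, the two directives would line up like this (a sketch, not the deployment's actual config):

worker_rlimit_nofile 2048;   # raise the per-worker open-file limit

events {
    worker_connections 2048; # now safe: no longer exceeds the file limit
}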

Thanks,
Anfernee

Chunjie Zhu

Jun 12, 2012, 10:20:03 PM
to vcap...@cloudfoundry.org
Most likely this is because the default limit on open file descriptors per Linux process is 1024, and network sockets also count against it.

chunjie@ubuntu:~$ ulimit -n
1024


However, this value is a soft limit, not a hard limit. If the soft limit is exceeded, the kernel will still try to accommodate the request (see the alloc_fd function in fs/file.c in the Linux kernel), and it may get trapped in a loop of "expand fd array -> error -> repeat -> expand fd array". If the hard limit or sysctl_nr_open is exceeded, an error is returned to userland applications directly.

So, besides Anfernee's suggestion, we can also use ulimit to raise the open-fd rlimit (ulimit -n 2048; additional configuration is needed if we want it to take effect at boot). At a low level, both approaches end up calling the setrlimit system call. Normally we do not need to touch sysctl_nr_open, because its value is already large enough.
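
A sketch of both forms (the limits.conf lines assume Ubuntu's pam_limits mechanism; <user> is a placeholder for the account running nginx):

ulimit -n 2048                       # one-shot, current shell and its children

# persistent across boots: append to /etc/security/limits.conf
#   <user>  soft  nofile  2048
#   <user>  hard  nofile  4096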

NOTE: the Linux kernel checks these limits in order: soft limit -> hard limit -> sysctl_nr_open.

Regards,
Chunjie


From: "Yongkun Anfernee Gui" <ag...@rbcon.com>
To: vcap...@cloudfoundry.org
Sent: Wednesday, June 13, 2012 8:23:29 AM
Subject: Re: [vcap-dev] Re: nginx worker process high CPU usage

yssk22

Aug 16, 2012, 5:29:40 AM
to vcap...@cloudfoundry.org
Hi,

I also encountered the same issue. I could reproduce it even after setting worker_connections to 1024. The reproduction procedure is just to stress the nginx server like this:

- restart the nginx server
- run httperf as 'httperf --hog --server={nginx-ip} --port=80 --uri=/ --num-conns=300000 --rate=500 --timeout 5 --send-buffer=4096 --recv-buffer=16384 --server-name=non-existent.example.com'
(num-conns and rate depend on your system)
- at first, CPU usage is around 10% (depending on your system and the --rate param) and it works well.
- after a while, CPU usage reaches 100%.
  - strace reports heavy 'brk' system call usage, as mentioned.
- if I kill httperf and run it again, CPU usage hits 100% again quickly.

I found a string leak on 'package.path' and 'package.cpath' in the nginx setup cookbook:


This appends strings to package.(c)path without bound on every request. I guess this causes the large number of 'malloc' calls and the brk usage seen above.

It should be removed; we should use the 'lua_package_path' directive to set the path rather than append to it, which should solve this issue.
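
A sketch of that fix (the paths are placeholders; lua_package_path and lua_package_cpath are lua-nginx-module directives that set the search path once at configuration time):

http {
    # set the Lua search path once instead of appending to package.path on every request
    lua_package_path  '/path/to/router/lib/?.lua;;';
    lua_package_cpath '/path/to/router/lib/?.so;;';
}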

Thanks.


Yongkun Anfernee Gui

Aug 16, 2012, 7:20:39 AM
to vcap...@cloudfoundry.org
Hi Yohei,

Thanks for reporting and analyzing the issue. It only appears in
dev_setup, and the cause is exactly as you said. The fix is
submitted here:

Thanks again,
Anfernee

yssk22

Aug 16, 2012, 10:54:17 AM
to vcap...@cloudfoundry.org
Thanks! It seems fine.
