Memory leak in 1.9.15.1?

Hamish Forbes

Aug 18, 2016, 11:40:15 AM
to openresty-en
Hi,

We're seeing some strange memory leaks in a couple of production systems. I'm not 100% sure this is actually the cause of the problem, but it's odd enough that I thought I'd make a post anyway...

Essentially what I'm seeing is openresty/1.9.15.1 (and 1.11.2rc1) workers not releasing memory when connections are closed.
However, equivalent versions of vanilla nginx don't exhibit the same behaviour, nor does OpenResty 1.9.7.5.

The idea is to make requests wait so that there are many concurrent open requests, then hit the config with high concurrency: `wrk -c 9000 -t 100 http://localhost:80/ -d 30 --timeout 15`
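
The config itself is just a handler that sleeps; a minimal sketch of that kind of setup, using echo_sleep from the echo module (the port and delay here are illustrative), looks like:

server {
    listen 80;

    location / {
        # hold each request open so connections pile up
        echo_sleep 10;
        echo "ok";
    }
}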

What I see is worker memory usage climbing as expected, and `ss` showing around 20k open connections.
With OpenResty > 1.9.7.5 the memory usage never drops off again.
If I halt the test and wait for `ss` to show normal connection counts again, there is no change in the worker memory usage.

Repeating the same test with vanilla nginx 1.11.2 (with the echo module compiled in) shows worker memory usage dropping back to normal very quickly.

I've also tried using ngx_lua rather than echo_sleep; same behaviour.
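
That is, replacing the echo_sleep handler with something along these lines (the sleep time is illustrative):

location / {
    content_by_lua_block {
        -- the same delay implemented in ngx_lua
        ngx.sleep(10)
        ngx.say("ok")
    }
}
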
And as a last resort I tried using proxy_pass to a separate OpenResty instance; in that case the instance listening on port 80 does not use any non-standard modules, but it still exhibits the same behaviour.

Test system is
CentOS 6.8
Kernel 2.6.32-573.3.1.el6.x86_64

nginx version: openresty/1.9.15.1
built by gcc 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)
built with OpenSSL 1.0.1e-fips 11 Feb 2013
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt=-O2 --add-module=../ngx_devel_kit-0.3.0 --add-module=../echo-nginx-module-0.59 --add-module=../xss-nginx-module-0.05 --add-module=../ngx_coolkit-0.2rc3 --add-module=../set-misc-nginx-module-0.30 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.05 --add-module=../srcache-nginx-module-0.31 --add-module=../ngx_lua-0.10.5 --add-module=../ngx_lua_upstream-0.05 --add-module=../headers-more-nginx-module-0.30 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.17 --add-module=../redis2-nginx-module-0.13 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.14 --add-module=../rds-csv-nginx-module-0.07 --with-ld-opt=-Wl,-rpath,/usr/local/openresty/luajit/lib --with-pcre=/root/pcre-8.38 --with-pcre-jit --with-http_geoip_module --with-http_realip_module --with-http_gunzip_module --with-http_ssl_module

Any ideas or pointers on where to go with debugging this would be much appreciated!

Thanks
Hamish

Robert Paprocki

Aug 18, 2016, 11:45:32 AM
to openre...@googlegroups.com
Have you tried running valgrind with the no-pool patch for each test case?
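
E.g. something along these lines; the paths are illustrative, and nginx needs `daemon off;` and `master_process off;` in its config to stay in the foreground under valgrind:

valgrind --leak-check=full --show-reachable=yes \
    /usr/local/openresty/nginx/sbin/nginx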

Sent from my iPhone

Hamish Forbes

Aug 18, 2016, 11:51:23 AM
to openresty-en
Not yet, that was next on my list.
But I have to figure out how to do that first :)

Hamish Forbes

Aug 18, 2016, 12:26:04 PM
to openresty-en

So 1.9.7.5 compiled with --with-no-pool-patch shows the same behaviour as 1.9.15.1: it still holds memory after connections have closed.

Valgrind doesn't show any lost memory (just running `valgrind --leak-check=full ./nginx-1.9.7.5-nopool`; I assume that's right?)

Hamish Forbes

Aug 22, 2016, 6:38:20 AM
to openresty-en
I've tried compiling openresty 1.11.2.1rc1 with --without-http_lua_upstream_module --without-http_lua_module, and this shows the same behaviour as vanilla nginx (memory usage returns to very low after connections close).
Did something change in ngx_lua between 1.9.7.5 and 1.9.15.1 that persists some memory allocations per connection?

Yichun Zhang (agentzh)

Aug 25, 2016, 2:31:04 AM
to openresty-en
Hello!

On Thu, Aug 18, 2016 at 8:40 AM, Hamish Forbes wrote:
>
> Any ideas or pointers on where to go with debugging this would be much
> appreciated!
>

Since you are on Linux, could you try generating a memory leak flame
graph against your leaking nginx worker processes? Please see

https://github.com/openresty/stapxx#sample-bt-leaks
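
Roughly, per that README (the PID and sampling time below are illustrative):

# sample leaked allocations in the worker process for 5 seconds
./samples/sample-bt-leaks.sxx -x <worker pid> --arg time=5 \
    -D MAXACTION=100000 > a.bt

# render the result with Brendan Gregg's FlameGraph tools
stackcollapse-stap.pl a.bt > a.cbt
flamegraph.pl --countname=bytes a.cbt > a.svg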

Thanks!

Best regards,
-agentzh

Hamish Forbes

Aug 25, 2016, 5:14:39 AM
to openresty-en
Hi,

Couldn't get the scripts to work properly on CentOS 6, but no problems on CentOS 7.

I compiled openresty-1.11.2.1rc1 and ran the test, which resulted in this flame graph: http://hamish.codes/leak.svg
This run's nginx process started out at ~6MB resident and ended up at ~16MB, but it never freed that memory.

I re-compiled using --without-http_lua_upstream_module --without-http_lua_module and got this: http://hamish.codes/noleak.svg
This setup also started at ~6MB resident and peaked at ~16MB during the test, but fell back to ~6MB afterwards.

root@c7-test01:/usr/local/openresty/nginx/sbin# ./nginx-nolua -V
nginx version: openresty/1.11.2.1rc1
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)
built with OpenSSL 1.0.1e-fips 11 Feb 2013
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt=-O2 --add-module=../ngx_devel_kit-0.3.0 --add-module=../echo-nginx-module-0.60 --add-module=../xss-nginx-module-0.05 --add-module=../ngx_coolkit-0.2rc3 --add-module=../set-misc-nginx-module-0.31 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.06 --add-module=../srcache-nginx-module-0.31 --add-module=../headers-more-nginx-module-0.31 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.17 --add-module=../redis2-nginx-module-0.13 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.14 --add-module=../rds-csv-nginx-module-0.07 --with-pcre-jit --with-http_realip_module --with-http_gunzip_module --with-http_ssl_module

root@c7-test01:/usr/local/openresty/nginx/sbin# ./nginx-1.11.2.1rc1 -V
nginx version: openresty/1.11.2.1rc1
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)
built with OpenSSL 1.0.1e-fips 11 Feb 2013
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt=-O2 --add-module=../ngx_devel_kit-0.3.0 --add-module=../echo-nginx-module-0.60 --add-module=../xss-nginx-module-0.05 --add-module=../ngx_coolkit-0.2rc3 --add-module=../set-misc-nginx-module-0.31 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.06 --add-module=../srcache-nginx-module-0.31 --add-module=../ngx_lua-0.10.6 --add-module=../ngx_lua_upstream-0.06 --add-module=../headers-more-nginx-module-0.31 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.17 --add-module=../redis2-nginx-module-0.13 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.14 --add-module=../rds-csv-nginx-module-0.07 --with-ld-opt=-Wl,-rpath,/usr/local/openresty/luajit/lib --with-pcre-jit --with-http_realip_module --with-http_gunzip_module --with-http_ssl_module


Thanks!
Hamish

Yichun Zhang (agentzh)

Aug 25, 2016, 3:56:01 PM
to openresty-en
Hello!

On Thu, Aug 25, 2016 at 2:14 AM, Hamish Forbes wrote:
> I compiled openresty-1.11.2.1rc1 and ran the test which results in this
> flamegraph: http://hamish.codes/leak.svg
> This results in an nginx process that started out at ~6MB resident and ended
> up at ~16MB but never freed that memory up
>

Does it grow further without bound? Or does it just max out at 16MB?

> I re-compiled using --without-http_lua_upstream_module
> --without-http_lua_module and got this: http://hamish.codes/noleak.svg
> This setup also started around ~6MB resident, peaked to 16MB during the test
> but fell back to 6MB afterwards
>

There are no real leaks at all according to your flame graphs. The
things shown in your flame graphs are just request pool allocations,
which should just be false positives, or "noise". You can confirm
that further by running the following tool:

https://github.com/openresty/nginx-systemtap-toolkit#ngx-leaked-pools
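
Usage is along these lines (see its README; the PID below is illustrative):

# check the worker process for leaked nginx memory pools over 10 seconds
./ngx-leaked-pools -p <worker pid> -t 10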

So I think we can conclude that it's just a "false leak" due to memory
fragmentation, which is completely normal. Memory management at the
system level can be very complicated, so the RSS metric reported by
the system needs to be interpreted with special care before calling
something a leak. Compiling the Lua modules into your nginx just
changes the memory layout of your nginx processes a bit, which may
explain the differences.

I'm only interested in real leaks that can eat up memory without bound.

Regards,
-agentzh

Yichun Zhang (agentzh)

Aug 25, 2016, 4:11:21 PM
to openresty-en
Hello!

On Thu, Aug 18, 2016 at 8:40 AM, Hamish Forbes wrote:
> Essentially what I'm seeing is openresty/1.9.15.1 (and 1.11.2rc1) workers
> not releasing memory when connections are closed.
> However equivalent versions of vanilla nginx don't exhibit the same
> behaviour, nor does Openresty 1.9.7.5
>

One important change since 1.9.15.1 is that ngx_lua forces glibc to
allocate memory using the mmap() syscall instead of sbrk(), so as to
preserve as much of the low 2GB address space as possible for LuaJIT's
GC-managed memory. glibc's allocator implementation may *cache* freed
memory pages allocated by mmap() in user space, so as to reuse the
pages for later allocations for the best performance.

As I've said, system-level memory management is complicated due to
many lower-level optimizations in glibc and even the Linux kernel. If
you cannot make the process's RSS grow without bound, then you are
very likely just seeing a false positive due to lower-level memory
block caching in glibc, which is beyond our control at the application
level.

Regards,
-agentzh

Hamish Forbes

Aug 26, 2016, 3:57:26 AM
to openresty-en
Hi,

In this specific test I cannot make it grow without bound.
I agree with you: this is not a 'leak' as such, and feels like a change in allocation behaviour more than anything else.
I was trying to narrow down the problem we are seeing on live systems as much as possible.

In our production systems, however, I do not know whether memory use grows without bound or not.
We are currently forced to HUP openresty every couple of days to prevent the system from running out of memory and invoking the OOM killer.

With OpenResty 1.9.7.5 we would still, obviously, see memory grow beyond the initial amount, but with 1.9.15.1 RSS is 5-10x higher.
Given that the traffic, config and OS haven't changed, this is a fairly significant problem for us.
Traffic levels that were easily handled on a 4GB VM now cause memory exhaustion.

Now that I've got all the stapxx/systemtap scripts working on CentOS 7, I'm going to try replaying prod traffic at a test system and see if I can find anything more useful!

Is the mmap/sbrk change something that I can disable or easily revert in order to rule that change in or out?

Thanks for your help!

Hamish

Yichun Zhang (agentzh)

Aug 30, 2016, 4:55:25 PM
to openresty-en
Hello!

On Fri, Aug 26, 2016 at 12:57 AM, Hamish Forbes wrote:
>
> Is the mmap/sbrk change something that I can disable or easily revert in
> order to rule that change in or out?
>

It should be harmless, since glibc just preserves those memory pages
for later use. Even on a small box, those memory pages can get swapped
out if activity has been low since the last traffic peak.

We're working on a patch to make glibc more conservative about keeping
freed memory for later use. Until then, you could apply the following
patch (against ngx_lua's ./config file) to disable this change on your side:

diff --git a/config b/config
index 0f2749d..e20c844 100644
--- a/config
+++ b/config
@@ -495,7 +495,7 @@ exit(1);
SAVED_CC_TEST_FLAGS="$CC_TEST_FLAGS"
CC_TEST_FLAGS="-Werror -Wall $CC_TEST_FLAGS"

-. auto/feature
+#. auto/feature

CC_TEST_FLAGS="$SAVED_CC_TEST_FLAGS"

@@ -515,7 +515,7 @@ ngx_feature_test="exit(a);"
SAVED_CC_TEST_FLAGS="$CC_TEST_FLAGS"
CC_TEST_FLAGS="-Werror -Wall $CC_TEST_FLAGS"

-. auto/feature
+#. auto/feature

CC_TEST_FLAGS="$SAVED_CC_TEST_FLAGS"
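
(To apply: save the diff as, say, disable-mmap.patch, run `patch -p1 < disable-mmap.patch` inside the ngx_lua source directory, then re-run ./configure and rebuild; the patch file name is just illustrative.)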

Regards,
-agentzh

Yichun Zhang (agentzh)

Oct 2, 2016, 3:21:21 PM
to openresty-en
Hello!

On Thu, Aug 25, 2016 at 1:11 PM, Yichun Zhang (agentzh) wrote:
> One important change since 1.9.15.1 is that ngx_lua enforces glibc to
> allocate memory using the mmap() syscall instead of sbrk(), so as to
> preserve as much as the low 2GB address space to LuaJIT's GC-managed
> memory. The glibc's allocator implementation may *cache* freed memory
> pages allocated by mmap() in the user space so as to reuse the pages
> for later allocations for the best performance.
>

Okay, this is a problem (or limitation) of glibc on Linux: it holds on
to too much freed memory when it fails to use brk() to allocate memory
in the heap. The workaround is to call malloc_trim() periodically to
release such free memory blocks back to the OS.

I've just implemented the lua_malloc_trim N config directive in the
malloc-trim github branch of this repo, which calls malloc_trim() every
N main requests processed by the nginx core. By default, N is 1000. You
can tune the number to fit your use cases (for testing purposes, you
can set N to 1). Mind you, N = 0 disables the periodic trimming
altogether.
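
For example (assuming the directive goes in the http {} block, with N tuned to taste):

http {
    # call malloc_trim() after every 1000 main requests
    lua_malloc_trim 1000;
    ...
}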

For more details, please see

https://github.com/openresty/lua-nginx-module/tree/malloc-trim

https://github.com/openresty/lua-nginx-module/commit/f0b45946d

Please try this branch and directive out on your side. Feedback
welcome! If it works well for the community, I'll merge this branch
into master.

BTW, this change also helps in the case where glibc *can* use brk()
to allocate memory by moving the program break ;)

Thanks a lot!

Best regards,
-agentzh

P.S. I must thank @yangshuxin for the workaround and suggestion here :)