Memory leak: Need help to debug root cause


Nishant Kumar
Feb 26, 2018, 5:15:08 AM
to openresty-en
Hi,

I am using OpenResty 1.13.6.1 (compiled from source) on CentOS 7 (a 40-core, 64 GB RAM bare-metal machine).

$ cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)

$ uname -a
Linux hostname 4.13.1-1.el7.elrepo.x86_64 #1 SMP Sun Sep 10 11:11:27 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

$ nginx -V
nginx version: openresty/1.13.6.1
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)
built with OpenSSL 1.0.2n  7 Dec 2017
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt=-O2 --add-module=../ngx_devel_kit-0.3.0 --add-module=../echo-nginx-module-0.61 --add-module=../xss-nginx-module-0.05 --add-module=../ngx_coolkit-0.2rc3 --add-module=../set-misc-nginx-module-0.31 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.07 --add-module=../srcache-nginx-module-0.31 --add-module=../ngx_lua-0.10.11 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.33 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.18 --add-module=../redis2-nginx-module-0.14 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.15 --add-module=../rds-csv-nginx-module-0.08 --add-module=../ngx_stream_lua-0.0.3 --with-ld-opt=-Wl,-rpath,/usr/local/openresty/luajit/lib --sbin-path=/usr/sbin --with-http_geoip_module --pid-path=/var/run/nginx.pid --with-http_stub_status_module --with-dtrace-probes --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-openssl=/home/platform/installers/openresty-1.13.6.1/../../openssl-1.0.2n --with-stream --with-stream_ssl_module

I am noticing a memory leak. On server start, RAM usage is around 14 GB; it grows to 64 GB within 20-24 hours and causes an OOM.

If I restart the workers in between (sudo nginx -s reload), RAM usage drops back to 14-16 GB.

I am making heavy use of local Lua tables and shared dicts across workers.
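Since the leak resets on reload, per-worker state is a likely suspect: module-level Lua tables live for the whole life of a worker process, while shared-dict entries can be given an expiry and are capped by the dict's declared size. A minimal sketch of bounding both (the dict name, sizes, and helper are hypothetical, not from my actual code):

```lua
-- Hypothetical sketch: bounding caches so they cannot grow without limit.
-- Assumes a shared dict declared in nginx.conf, e.g.:
--   lua_shared_dict my_cache 100m;

local my_cache = ngx.shared.my_cache

-- Shared dict: pass an exptime (seconds) so stale entries are reclaimed;
-- the dict is also capped by its declared size and evicts old keys (LRU).
local ok, err = my_cache:set("some_key", "some_value", 300)
if not ok then
    ngx.log(ngx.ERR, "shared dict set failed: ", err)
end

-- Module-level table: lives as long as the worker process, so cap its
-- size explicitly; otherwise it grows until the worker is reloaded.
local local_cache, local_count = {}, 0
local MAX_ENTRIES = 10000

local function cache_put(key, value)
    if local_count >= MAX_ENTRIES then
        -- crude full reset; an LRU eviction would be gentler
        local_cache, local_count = {}, 0
    end
    if local_cache[key] == nil then
        local_count = local_count + 1
    end
    local_cache[key] = value
end
```

If the module-level tables are unbounded, memory growth until reload is exactly the symptom you would see.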

I tried the memory-leak detection tool provided by OpenResty, but I am not able to interpret its output.

[root@machine stapxx]# ./samples/sample-bt-leaks.sxx -x 4457 --arg time=5 -D STP_NO_OVERLOAD -D MAXMAPENTRIES=10000 > a.bt

I have attached the flamegraph.

Below is the output of the ngx-leaked-pools script.

[root@machine openresty-systemtap-toolkit]# ./ngx-leaked-pools -p 4457
Tracing 4457 (/usr/sbin/nginx)...
Hit Ctrl-C to end.
^C
586 pools leaked at backtrace 0x4546da 0x47153a 0x4791f7 0x470d4b 0x477362 0x475e44 0x4777c4 0x4781cb 0x452052 0x7f64a9bb3c05 0x4523ae
31 pools leaked at backtrace 0x4546da 0x515616 0x51640c 0x7f64aa9096ea 0x507a79 0x50ace6 0x50b05b 0x50a83c 0x488b1e 0x483705 0x48dee3 0x48e75b 0x4791f7 0x470d4b 0x477362 0x475e44 0x4777c4 0x4781cb 0x452052 0x7f64a9bb3c05
1 pools leaked at backtrace 0x4546da 0x520827 0x471153 0x470d77 0x477362 0x475e44 0x4777c4 0x4781cb 0x452052 0x7f64a9bb3c05 0x4523ae
1 pools leaked at backtrace 0x4546da 0x515616 0x51640c 0x7f64aa9096ea 0x507a79 0x50ace6 0x50b05b 0x50a83c 0x488b1e 0x483705 0x48dee3 0x48e75b 0x471220 0x470d84 0x477362 0x475e44 0x4777c4 0x4781cb 0x452052 0x7f64a9bb3c05

Run the command "./ngx-backtrace -p 4457 <backtrace>" to get details.
For total 25009 pools allocated.


[root@machine openresty-systemtap-toolkit]# ./ngx-backtrace -p 4457 0x4546da 0x47153a 0x4791f7 0x470d4b 0x477362 0x475e44 0x4777c4 0x4781cb 0x452052 0x7f64a9bb3c05 0x4523ae
ngx_create_pool
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/core/ngx_palloc.c:43
ngx_event_accept
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/event/ngx_event_accept.c:162
ngx_epoll_process_events
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/event/modules/ngx_epoll_module.c:902
ngx_process_events_and_timers
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/event/ngx_event.c:259
ngx_worker_process_cycle
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/os/unix/ngx_process_cycle.c:817
ngx_spawn_process
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/os/unix/ngx_process.c:205
ngx_start_worker_processes
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/os/unix/ngx_process_cycle.c:399 (discriminator 2)
ngx_master_process_cycle
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/os/unix/ngx_process_cycle.c:252
main
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/core/nginx.c:384
??
??:0
_start
??:?


[root@machine openresty-systemtap-toolkit]#  ./ngx-backtrace -p 4457 0x4546da 0x515616 0x51640c 0x7f64aa9096ea 0x507a79 0x50ace6 0x50b05b 0x50a83c 0x488b1e 0x483705 0x48dee3 0x48e75b 0x4791f7 0x470d4b 0x477362 0x475e44 0x4777c4 0x4781cb 0x452052 0x7f64a9bb3c05
ngx_create_pool
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/core/ngx_palloc.c:43
ngx_http_lua_socket_resolve_retval_handler
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/../ngx_lua-0.10.11/src/ngx_http_lua_socket_tcp.c:1101
ngx_http_lua_socket_tcp_connect
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/../ngx_lua-0.10.11/src/ngx_http_lua_socket_tcp.c:699
??
??:0
ngx_http_lua_run_thread
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/../ngx_lua-0.10.11/src/ngx_http_lua_util.c:1014
ngx_http_lua_content_by_chunk
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/../ngx_lua-0.10.11/src/ngx_http_lua_contentby.c:122
ngx_http_lua_content_handler_file
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/../ngx_lua-0.10.11/src/ngx_http_lua_contentby.c:285
ngx_http_lua_content_handler
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/../ngx_lua-0.10.11/src/ngx_http_lua_contentby.c:223
ngx_http_core_content_phase
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/http/ngx_http_core_module.c:1173
ngx_http_core_run_phases
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/http/ngx_http_core_module.c:864
ngx_http_process_request
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/http/ngx_http_request.c:1953
ngx_http_process_request_line
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/http/ngx_http_request.c:1051
ngx_epoll_process_events
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/event/modules/ngx_epoll_module.c:902
ngx_process_events_and_timers
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/event/ngx_event.c:259
ngx_worker_process_cycle
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/os/unix/ngx_process_cycle.c:817
ngx_spawn_process
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/os/unix/ngx_process.c:205
ngx_start_worker_processes
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/os/unix/ngx_process_cycle.c:399 (discriminator 2)
ngx_master_process_cycle
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/os/unix/ngx_process_cycle.c:252
main
/home/platform/installers/openresty-1.13.6.1/build/nginx-1.13.6/src/core/nginx.c:384
??
??:0

Thanks.
a.svg

Robert Paprocki
Feb 26, 2018, 5:34:08 AM
to openre...@googlegroups.com
Can you post a full, complete, minimal nginx config and Lua source code representation of the environment so that someone can try to reproduce the issue?




Nishant Kumar
Feb 26, 2018, 6:32:15 AM
to openresty-en
Sharing the Lua source code is a little difficult, but I will try.

Here is my nginx.conf

# user as which the nginx worker processes will spawn
user admin;

# number of worker processes
worker_processes auto;

#worker_cpu_affinity auto;

# maximum number of files which can be opened at the same time
worker_rlimit_nofile 999999;

# TODO: need to explain this
events {
  accept_mutex off;
  multi_accept on;
  worker_connections 65536;
  use epoll;
}

# environment variables
env PLATFORM_ENV=production;

http {
  variables_hash_max_size 1024;
  variables_hash_bucket_size 1024;

  sendfile         on;
  tcp_nopush       on;
  tcp_nodelay      on;

  # to hide nginx version from the header
  server_tokens off;

  # GeoIP dat files
  geoip_org "data/geoip/GeoIPISP.dat";
  geoip_city "data/geoip/GeoIPCity.dat";
  geoip_proxy 0.0.0.0/0;
  geoip_proxy_recursive off;

  # initialization constants
  init_by_lua_file "config/environments/production/init.lua";

  # include common config. Init shared DICT and lua path
  include "../common.conf";

  # include exchange config
  include "exchange.conf";
}


exchange.conf

log_format ACCESS_LOG $access_log;
log_format KYT_ACCESS_LOG $kyt_access_log;
server
{
  listen 80;
  listen 443 ssl;
  lua_code_cache on;

  ssl_dhparam /home/user/a.pem;
  ssl_prefer_server_ciphers on;

  ssl_certificate     /etc/ssl/certs/a.crt;
  ssl_certificate_key /etc/ssl/certs/a.key;

  ssl_buffer_size 16k;

  ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-GCM-SHA384:ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS;

  ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

  ssl_session_cache shared:SSL:20m;
  ssl_session_timeout 180m;

  client_body_in_single_buffer on;
  client_body_buffer_size 24k;
  client_max_body_size 24k;

  keepalive_requests 1000000;
  keepalive_timeout 120s;

  server_name ~^([a-z0-9]*)\.test\.com$;

  # change server name in header
  more_set_headers 'Server: Prod/1.0s';

  # log
  set $access_log "";
  access_log "logs/req-acc.log" ACCESS_LOG;
  error_log "logs/production.error.log" error;

  set $kyt_access_log "";
  access_log "logs/production.kyt-access.log" KYT_ACCESS_LOG;

  set $V $1;
  set $EXCHANGE_REQUEST_URL "http://${V}.hostname.com/";


  location / {
    content_by_lua_file 'code/controllers/request.lua';
    #content_by_lua "ngx.exit(204)";
  }

  location /compressed {
    gzip on;
    gzip_vary on;
    gzip_types 'application/json' 'text/plain';
    content_by_lua_file 'code/controllers/request.lua';
    #content_by_lua "ngx.exit(204)";
  }

  include "../ignore.conf";
}

tokers
Feb 26, 2018, 10:38:17 PM
to openresty-en
Hello!

> I tried the memory-leak detection tool provided by OpenResty, but I am not able to interpret its output.
> [root@machine stapxx]# ./samples/sample-bt-leaks.sxx -x 4457 --arg time=5 -D STP_NO_OVERLOAD -D MAXMAPENTRIES=10000 > a.bt

This script traces the code paths that allocate memory, so the code paths in the flamegraph show where allocations happen.

I glanced through the flamegraph; all of the code paths look normal except this one:

...
_Z20substring_to_exampleP2vwP7example9substring
_ZN2VW12read_exampleER2vwPc
getValueFromVW
lj_BC_FUNCC
ngx_http_lua_run_thread
ngx_http_lua_socket_tcp_resume_helper
ngx_http_lua_socket_tcp_read
ngx_http_lua_socket_tcp_handler
....

Do you use any third-party dynamic libraries (.so files) in your Lua code?

Nishant Kumar
Feb 26, 2018, 11:18:08 PM
to openresty-en
Yes, we are using the Vowpal Wabbit C library.

tokers
Feb 27, 2018, 12:42:22 AM
to openresty-en
Hello!

> Yes, we are using the Vowpal Wabbit C library.

Did you test this library on its own? Maybe you can check it with Valgrind.

Nishant Kumar
Mar 1, 2018, 1:28:11 AM
to openresty-en
There was a memory leak in our code: we were not releasing all resources of the third-party library. I have attached the latest flamegraph. Do you see more scope for memory/performance optimization?
a1.svg
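For reference, when C resources are obtained through the LuaJIT FFI, attaching a finalizer with ffi.gc is one way to ensure the library's release function runs even on error paths. A rough sketch only — the vw_* names and the library name below are illustrative placeholders, not the real Vowpal Wabbit C API:

```lua
-- Illustrative sketch: the cdefs, function names, and library name are
-- hypothetical placeholders, not the actual Vowpal Wabbit C API.
local ffi = require "ffi"

ffi.cdef[[
typedef struct vw_handle vw_handle;
vw_handle *vw_initialize(const char *args);
void vw_finish(vw_handle *vw);
]]

local lib = ffi.load("vw_c_wrapper")  -- hypothetical shared object

local function new_model(args)
    local handle = lib.vw_initialize(args)
    if handle == nil then
        return nil, "vw_initialize failed"
    end
    -- Attach a finalizer: if the Lua reference is collected without an
    -- explicit release (e.g. an error aborted the request), vw_finish
    -- still runs, so the C-side allocation is not leaked.
    return ffi.gc(handle, lib.vw_finish)
end

local function free_model(handle)
    ffi.gc(handle, nil)   -- detach the finalizer to avoid a double free
    lib.vw_finish(handle) -- release deterministically
end
```

The detach-then-release pattern in free_model matters: releasing without removing the finalizer would call vw_finish a second time when the cdata is collected.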

Nishant Kumar
Mar 1, 2018, 6:58:56 AM
to openresty-en
Sadly that didn't fix it. My fix introduced a segfault, and the workers keep restarting, so I am no longer seeing the leak :(

tokers
Mar 4, 2018, 10:29:39 PM
to openresty-en
Hi!

The new flamegraph suggests the memory allocation paths are healthy.

> Sadly that didn't fix it. My fix introduced a segfault, and the workers keep restarting, so I am no longer seeing the leak :(

Have you fixed the segmentation fault? I'm looking forward to the result of the memory-leak fix :)