Getting backtraces "double free or corruption (!prev)"


erikwi...@gmail.com

Oct 17, 2013, 8:58:32 PM
to openre...@googlegroups.com
Hi,

I'm running a fresh install of OpenResty 1.4.2.9 on Ubuntu 12.04.

I have a content_by_lua_file script running that produces an error in the logs saying:
*** glibc detected *** nginx: worker process: double free or corruption (!prev): 0x00000000022a4c50 ***

It sometimes also dumps a full backtrace into the logs -- here is a sample: https://gist.github.com/erikcw/7033102

This started as a Lapis application, but when I started getting all these errors in testing, I stripped it down to a plain Lua/Moonscript script file just to rule out Lapis.

I've narrowed the error down to a call to ngx.location.capture which initiates a proxy_pass to some backend application servers.  I then modify the response with Lua before returning it to the client.

Here is the Moonscript that seems to be causing the problem:

        headers = ngx.req.get_headers()
        headers['CF_VISITOR'] = ngx.var.x_scheme_json
   
        -- this header will help the app server figure out which host this is for
        --@req.headers['X-Forwarded-Host'] = ngx.var.host
        headers['X-Forwarded-Host'] = ngx.var.host

        res = ngx.location.capture('/proxy', {
            ctx: {headers: headers},
            vars: {_url: url},
        })

        body = res.body

Here is the Moonscript compiled to Lua just in case that makes getting to the bottom of this easier: https://gist.github.com/erikcw/7034863
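
For anyone who doesn't read MoonScript, that snippet compiles down to roughly the following Lua (hand-simplified from the gist above, so treat it as an approximation; url is a variable set earlier in the script):

    local headers = ngx.req.get_headers()
    headers['CF_VISITOR'] = ngx.var.x_scheme_json

    -- this header will help the app server figure out which host this is for
    headers['X-Forwarded-Host'] = ngx.var.host

    local res = ngx.location.capture('/proxy', {
        ctx = { headers = headers },
        vars = { _url = url },
    })

    local body = res.body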


From my nginx.conf, here is the /proxy location:

    location /proxy {
            # this location is used by Lapis for http.simple
            internal;
            rewrite_by_lua "
              local req = ngx.req

              for k,v in pairs(req.get_headers()) do
                if k ~= 'content-length' then
                  req.clear_header(k)
                end
              end

              if ngx.ctx.headers then
                for k,v in pairs(ngx.ctx.headers) do
                  req.set_header(k, v)
                end
              end
            ";

            resolver 8.8.8.8;
            proxy_http_version 1.1;
            proxy_pass $_url;
        }

If I remove the ngx.location.capture call, the error goes away.

Any thoughts on what the issue might be?

Thanks!
Erik

Yichun Zhang (agentzh)

Oct 18, 2013, 3:06:04 PM
to openresty-en
Hello!

On Thu, Oct 17, 2013 at 5:58 PM, <erikwi...@gmail.com> wrote:
>
> I've narrowed the error down to a call to ngx.location.capture which
> initiates a proxy_pass to some backend application servers. I then modify
> the response with Lua before returning it to the client.
>
[...]
>
> If I remove the ngx.location.capture call, the error goes away.
>

It looks like a bug in ngx_lua's subrequest scheduler. Thank you for the report!

Here are some suggestions for tracing this further:

1. If possible, please prepare a self-contained minimal example that
can reproduce the issue relatively reliably, preferably without
MoonScript. If I can reproduce the issue on my side, it'll be trivial
for me to fix it :)

2. Please make your nginx produce a core dump file and then use gdb to
get a full backtrace (your backtrace lacks a lot of info). See here
for more details: http://wiki.nginx.org/Debugging#Core_dump
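
For example, something like this at the top level of nginx.conf is
usually enough (the output directory below is just a placeholder),
plus running "ulimit -c unlimited" in the shell before starting nginx:

    worker_rlimit_core  500M;
    working_directory   /tmp/;   # core files will be written here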

3. If possible, please use valgrind memcheck to run your nginx to see
if it can catch any memory issues. You need to turn off
"master_process" and "daemon" in your nginx.conf when using valgrind
memcheck.
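
That is, with something like

    daemon           off;
    master_process   off;

in nginx.conf, you can run nginx directly under valgrind, e.g.:

    valgrind --tool=memcheck /usr/local/openresty/nginx/sbin/nginx -p `pwd`/ -c nginx.conf

(the nginx path and the -p/-c arguments above just mirror however you
are already starting it).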

Just as a side note: in the long run, I'd recommend using Lua
libraries based entirely on ngx_lua's cosocket API for this kind of
thing (like lua-resty-http, for example). Fiddling with nginx's
subrequests is no fun at all :)
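
Just to give a taste of the cosocket API, a raw request looks roughly
like this (a bare sketch: the host, port, and path here are made up,
and response parsing and most error handling are omitted):

    local sock = ngx.socket.tcp()
    sock:settimeout(1000)  -- milliseconds

    local ok, err = sock:connect("backend.example.com", 8080)
    if not ok then
        ngx.log(ngx.ERR, "failed to connect: ", err)
        return
    end

    -- send a raw HTTP/1.1 request
    sock:send("GET /some/path HTTP/1.1\r\n"
        .. "Host: backend.example.com\r\n"
        .. "Connection: keep-alive\r\n\r\n")

    local status_line, err = sock:receive()  -- reads a single line
    -- ... read the response headers and body here ...

    sock:setkeepalive(10000, 100)  -- return the connection to the pool

Libraries like lua-resty-http are essentially wrappers around this
kind of logic.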

Thanks!
-agentzh

Erik Wickstrom

Oct 18, 2013, 5:29:13 PM
to openre...@googlegroups.com
Hi agentzh,

Thanks for your quick response!

On Friday, October 18, 2013 12:06:04 PM UTC-7, agentzh wrote:
> Hello!
>
> On Thu, Oct 17, 2013 at 5:58 PM,  <erikwi...@gmail.com> wrote:
> >
> > I've narrowed the error down to a call to ngx.location.capture which
> > initiates a proxy_pass to some backend application servers.  I then modify
> > the response with Lua before returning it to the client.
> >
> [...]
> >
> > If I remove the ngx.location.capture call, the error goes away.
> >
>
> It looks like a bug in ngx_lua's subrequest scheduler. Thank you for the report!
>
> Here are some suggestions for tracing this further:
>
> 1. If possible, please prepare a self-contained minimal example that
> can reproduce the issue relatively reliably, preferably without
> MoonScript. If I can reproduce the issue on my side, it'll be trivial
> for me to fix it :)

Here is a self-contained version of the Lua and nginx.conf: https://gist.github.com/7048443  Note the cjson import at the top of the Lua file -- I'm not using cjson in any of that code, but if I remove it I get the error in the comments at the top of the file.  Might be another bug...


> 2. Please make your nginx produce a core dump file and then use gdb to
> get a full backtrace (your backtrace lacks a lot of info). See here
> for more details: http://wiki.nginx.org/Debugging#Core_dump

I recompiled OpenResty with --with-debug and added the config parameters to nginx.conf, but I can't get nginx to drop the core dump into my working directory.  Not sure why that isn't working.  Will my test code be enough to reproduce the problem on your machine?

> 3. If possible, please use valgrind memcheck to run your nginx to see
> if it can catch any memory issues. You need to turn off
> "master_process" and "daemon" in your nginx.conf when using valgrind
> memcheck.

This is getting a bit out of my area of expertise... 

> Just as a side note: in the long run, I'd recommend using Lua
> libraries based entirely on ngx_lua's cosocket API for this kind of
> thing (like lua-resty-http, for example). Fiddling with nginx's
> subrequests is no fun at all :)

Would you advise using lua-resty-http now?  Or is this "in the long run" only? :)  I googled and there appear to be 3 different projects with that name.  Any thoughts on which are stable currently?


> Thanks!
> -agentzh

Thanks!
Erik (erikcw) 

Erik Wickstrom

Oct 18, 2013, 6:20:37 PM
to openre...@googlegroups.com
Updates...

I was able to get Nginx to produce a core dump -- here is the output:

(gdb) bt
#0  0x000000000050f888 in ngx_http_lua_socket_tcp_handler (ev=0xe8da18) at ../ngx_lua-0.9.0/src/ngx_http_lua_socket_tcp.c:2028
#1  0x000000000044b02b in ngx_epoll_process_events (cycle=0xdf6f40, timer=1000, flags=1) at src/event/modules/ngx_epoll_module.c:714
#2  0x000000000043b3d4 in ngx_process_events_and_timers (cycle=0xdf6f40) at src/event/ngx_event.c:249
#3  0x0000000000448e1f in ngx_worker_process_cycle (cycle=0xdf6f40, data=0x0) at src/os/unix/ngx_process_cycle.c:813
#4  0x0000000000445600 in ngx_spawn_process (cycle=0xdf6f40, proc=0x448c59 <ngx_worker_process_cycle>, data=0x0, name=0x541fcb "worker process", respawn=-3)
    at src/os/unix/ngx_process.c:198
#5  0x0000000000447bcc in ngx_start_worker_processes (cycle=0xdf6f40, n=1, type=-3) at src/os/unix/ngx_process_cycle.c:364
#6  0x0000000000447207 in ngx_master_process_cycle (cycle=0xdf6f40) at src/os/unix/ngx_process_cycle.c:136
#7  0x0000000000416732 in main (argc=5, argv=0x7fff05faa718) at src/core/nginx.c:412

(gdb) backtrace full
#0  0x000000000050f888 in ngx_http_lua_socket_tcp_handler (ev=0xe8da18) at ../ngx_lua-0.9.0/src/ngx_http_lua_socket_tcp.c:2028
        c = 0xe43a20
        r = 0xe01ca0
        ctx = 0x2917882ad920910b
        u = 0x41733de0
#1  0x000000000044b02b in ngx_epoll_process_events (cycle=0xdf6f40, timer=1000, flags=1) at src/event/modules/ngx_epoll_module.c:714
        events = 1
        revents = 5
        instance = 0
        i = 0
        level = 7920512
        err = 0
        rev = 0xe73a08
        wev = 0xe8da18
        queue = 0xe73960
        c = 0xe43c60
#2  0x000000000043b3d4 in ngx_process_events_and_timers (cycle=0xdf6f40) at src/event/ngx_event.c:249
        flags = 1
        timer = 1000
        delta = 1382134470469
#3  0x0000000000448e1f in ngx_worker_process_cycle (cycle=0xdf6f40, data=0x0) at src/os/unix/ngx_process_cycle.c:813
        worker = 0
        i = 4284512
        c = 0x0
#4  0x0000000000445600 in ngx_spawn_process (cycle=0xdf6f40, proc=0x448c59 <ngx_worker_process_cycle>, data=0x0, name=0x541fcb "worker process", respawn=-3)
    at src/os/unix/ngx_process.c:198
        on = 1
        pid = 0
        s = 0
#5  0x0000000000447bcc in ngx_start_worker_processes (cycle=0xdf6f40, n=1, type=-3) at src/os/unix/ngx_process_cycle.c:364
        i = 0
        ch = {command = 1, pid = 0, slot = 0, fd = 0}
#6  0x0000000000447207 in ngx_master_process_cycle (cycle=0xdf6f40) at src/os/unix/ngx_process_cycle.c:136
        title = 0xe3e79c "master process /usr/local/openresty/nginx/sbin/nginx -p /home/cc/clickcertain-nginx/ -c nginx.conf.direct"
        p = 0xe3e805 ""
        size = 106
        i = 5
        n = 140416990712373
        sigio = 4
        set = {__val = {0 <repeats 16 times>}}
        itv = {it_interval = {tv_sec = 140416964924664, tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 0}}
        live = 140416988470640
        delay = 6
        ls = 0x0
        ccf = 0xdf7f10
#7  0x0000000000416732 in main (argc=5, argv=0x7fff05faa718) at src/core/nginx.c:412
        i = 66
        log = 0x78b980
        cycle = 0xdf6f40
        init_cycle = {conf_ctx = 0x0, pool = 0xdf6330, log = 0x78b980, new_log = {log_level = 0, file = 0x0, connection = 0, handler = 0, data = 0x0, action = 0x0}, files = 0x0, 
          free_connections = 0x0, free_connection_n = 0, reusable_connections_queue = {prev = 0x0, next = 0x0}, listening = {elts = 0x0, nelts = 0, size = 0, nalloc = 0, pool = 0x0}, 
          paths = {elts = 0x0, nelts = 0, size = 0, nalloc = 0, pool = 0x0}, open_files = {last = 0x0, part = {elts = 0x0, nelts = 0, next = 0x0}, size = 0, nalloc = 0, pool = 0x0}, 
          shared_memory = {last = 0x0, part = {elts = 0x0, nelts = 0, next = 0x0}, size = 0, nalloc = 0, pool = 0x0}, connection_n = 0, files_n = 0, connections = 0x0, 
          read_events = 0x0, write_events = 0x0, old_cycle = 0x0, conf_file = {len = 45, data = 0xdf6380 "\002"}, conf_param = {len = 0, data = 0x0}, conf_prefix = {len = 28, 
            data = 0xdf6380 "\002"}, prefix = {len = 28, data = 0x7fff05fac8d0 ""}, lock_file = {len = 0, data = 0x0}, hostname = {len = 0, data = 0x0}}
        ccf = 0xdf7f10

Hope this helps... :)
Erik

Yichun Zhang (agentzh)

Oct 18, 2013, 7:21:56 PM
to openresty-en
Hello!

On Fri, Oct 18, 2013 at 2:29 PM, Erik Wickstrom wrote:
>
> Here is a self-contained version of the Lua and nginx.conf. Note the cjson
> import at the top of the Lua file -- I'm not using cjson in any of that
> code, but if I remove it I get the error in the comments at the top of the
> file. Might be another bug...
>
> https://gist.github.com/7048443

I cannot reproduce any crash on my side with your configuration and
Lua file, no matter how hard I load my Nginx worker with tools like ab.

Even after removing the cjson line, I still get nothing. Everything
works as expected.

Also, I ran your example under Valgrind memcheck and still got nothing.

Several questions:

1. What steps do you take to reproduce the crash with your configuration above?

2. Are you using a 32-bit system or 64-bit? I've been testing on Linux x86_64.

3. Your gdb backtrace points at something (the cosocket API) that you
never use in your minimal example above. Are you sure your minimal
example can reproduce similar backtraces?

Thank you for your patience!

>
> Would you advise using lua-resty-http now? Or is this "in the long run"
> only? :) I googled and there appear to be 3 different projects with that
> name. Any thoughts on which are stable currently?
>

I meant "in the long term", literally :)

There is no official lua-resty-http library yet. And I cannot comment
on the existing 3rd-party implementations because I have no experience
with them :) Maybe you can get help from their authors :)

Best regards,
-agentzh

Erik Wickstrom

Oct 21, 2013, 2:41:57 PM
to openre...@googlegroups.com
Hi agentzh,


On Friday, October 18, 2013 4:21:56 PM UTC-7, agentzh wrote:
> Hello!
>
> On Fri, Oct 18, 2013 at 2:29 PM, Erik Wickstrom wrote:
> >
> > Here is a self-contained version of the Lua and nginx.conf.  Note the cjson
> > import at the top of the Lua file -- I'm not using cjson in any of that
> > code, but if I remove it I get the error in the comments at the top of the
> > file.  Might be another bug...
> >
> > https://gist.github.com/7048443
>
> I cannot reproduce any crash on my side with your configuration and
> Lua file, no matter how hard I load my Nginx worker with tools like ab.
>
> Even after removing the cjson line, I still get nothing. Everything
> works as expected.
>
> Also, I ran your example under Valgrind memcheck and still got nothing.
>
> Several questions:
>
> 1. What steps do you take to reproduce the crash with your configuration above?

I've updated the gist to reflect my latest testing setup. I've also included fresh backtraces and gdb output.

I just do the following:

a) /usr/local/openresty/nginx/sbin/nginx -p `pwd`/ -c nginx.conf.debug
b) Hit '/' in Chrome
c) Refresh 3 times.
Bam!  Stack trace every time.


> 2. Are you using a 32-bit system or 64-bit? I've been testing on Linux x86_64.

I'm testing on a 64-bit Micro EC2 instance.  uname -a reports:
Linux ip-10-73-139-251 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

On Friday I was using Ubuntu 12.04.  I just built a fresh server running Ubuntu 13.10 to see if anything in the stack might have been causing these issues.  I'm still able to reproduce the issue easily.


> 3. Your gdb backtrace points at something (the cosocket API) that you
> never use in your minimal example above. Are you sure your minimal
> example can reproduce similar backtraces?

Since my example config and Lua are stripped down, I went ahead and added a fresh backtrace to the gist.  My main application code just hits Redis to check for cache entries and updates some stats before returning to the client.

~Erik 

Erik Wickstrom

Oct 23, 2013, 8:08:03 PM
to openre...@googlegroups.com
Hi agentzh,

I've kept testing, and noticed that the issue only presents itself when certain headers are present in the subrequest.

For example -- the following headers seem to produce a crash very consistently.

curl -H "Cache-Control: max-age=0"\
    -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"\
    -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36"\
    -H "Accept-Encoding: gzip,deflate,sdch"\
    -H "Accept-Language: en-US,en;q=0.8"\
    -H "Cookie: test=cookie;"\

However, if I remove the Accept-Encoding AND Accept-Language AND Cookie headers, the errors seem to go away.

curl -H "Cache-Control: max-age=0"\
    -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"\
    -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36"\


Perhaps this explains why you weren't able to reproduce the errors when using ab?  Does this offer any clues into what the issue may be?  It seems I should be able to proxy those headers upstream, no?

Thanks!
Erik

Yichun Zhang (agentzh)

Oct 25, 2013, 6:45:44 PM
to openresty-en
Hello!

On Wed, Oct 23, 2013 at 5:08 PM, Erik Wickstrom wrote:
> I've kept testing, and noticed that the issue only presents itself when
> certain headers are present in the subrequest.
>
> For example -- the following headers seem to produce a crash very
> consistently.
>
[...]
>
> However, if I remove the Accept-Encoding AND Accept-Language AND Cookie
> headers, the errors seem to go away.

Though I still can't reproduce the crash with your example, I was able
to produce a crash with a code example provided by Bjørnar Ness that
is similar to yours.

I've committed a patch to ngx_lua's master branch that fixes the crash
for Bjørnar Ness's example:

https://github.com/chaoslawful/lua-nginx-module/commit/1f65751

Could you please try this patch out on your side?

Thanks!
-agentzh

Erik Wickstrom

Oct 28, 2013, 6:26:15 PM
to openre...@googlegroups.com
Hi agentzh,


I've been running the patched build for the last few hours and so far so good.  No errors!

I'll post back if the issue comes back up -- but I think that did the trick.

When do you estimate releasing an updated OpenResty with this patch?

Thanks!
Erik

Yichun Zhang (agentzh)

Oct 28, 2013, 6:43:25 PM
to openresty-en
Hello Erik!

On Mon, Oct 28, 2013 at 3:26 PM, Erik Wickstrom wrote:
> I've been running the patched build for the last few hours and so far so good. No errors!
>
> I'll post back if the issue comes back up -- but I think that did the trick.
>

Glad to hear that! :)

> When do you estimate releasing an update OpenResty with this patch?
>

Very soon :) Just in the next two days or so :)

Thanks!
-agentzh