Unbounded memory increase in v0.10.12 until the program crashes


Sumit Agarwal

Jul 5, 2013, 3:41:19 AM
to nod...@googlegroups.com

We have 10 Node.js HTTP servers running on v0.10.12. The servers basically act as a gateway: they pipe the response of an HTTP client request they make back to the original HTTP request.

The code caches around 500k-600k 10-digit integer keys in a dictionary for around 24 hours before releasing them (notice the 2nd line of the memoryUsage output below, where that memory is freed). However, after that point heapTotal starts growing unboundedly, even though heapUsed does not grow, until the program crashes. This has happened 5 days in a row, but not on all servers, and it's pretty random which server it happens to.

Linux version 2.6.32-279.14.1.el6.x86_64 (mock...@c6b8.bsys.dev.centos.org)
Running the process with the --max_old_space_size=1700 flag or without it did not change anything.

Here's the code snippet that caches the data. We create random ids for MySQL rows so that the request is non-blocking, and we cache them to avoid collisions. Also, we move to a new table every day, so at midnight the entire cache is deleted.

// add to cache code
// Note: table_name, callback, and recentRequestIds come from the enclosing scope.

var generateUniqueRandom = function() {
    // recentRequestIds is a dictionary of dictionaries keyed by table_name.
    // The biggest entry is for the current day's table: it grows to
    // 500k-600k keys and is expected to reach 1 million per day soon.
    var recentRequestIdsForTable = recentRequestIds[table_name];
    var attempts = 0;
    var random;

    while (attempts < 5) {
        // 10-digit random id
        random = Math.floor(Math.random() * (9999999999 - 1000000000) + 1000000000);
        if (!(random in recentRequestIdsForTable)) {
            recentRequestIdsForTable[random] = true;
            return callback(null, table_name + ":" + random);
        }
        attempts++;
    }

    return callback("Could not generate requestID", null);
};

// delete from cache code

exports.clearRecentRequestIds = function(table_name) {
    if (table_name in recentRequestIds) {
        // Drop the entire cached dictionary for this table_name.
        delete recentRequestIds[table_name];
        logger.info('cleared all recentRequestIds for table ' + table_name);
        return;
    }
    logger.warn(table_name + ' not found in recentRequestIds');
};

Here's the memory usage. It increased to the max in around 1.5 hours. During that time the server was functioning normally and had no CPU spikes. It is weird: around 500k keys should mean at most around 5-10 MB of data, even in the worst case where the keys are stored as strings. However they take up over 100 MB (freeing them reduced heapUsed from 217 MB to 76 MB). Maybe it has something to do with that?

{"rss":"368.0078125 MB","heapTotal":"291.7858581542969 MB","heapUsed":"217.14698028564453 MB"}
{"rss":"357.16796875 MB","heapTotal":"287.84947204589844 MB","heapUsed":"76.96298217773438 MB"}
{"rss":"475.1796875 MB","heapTotal":"409.88169860839844 MB","heapUsed":"66.50176239013672 MB"}
{"rss":"577.9921875 MB","heapTotal":"514.1995697021484 MB","heapUsed":"83.56008911132812 MB"}

- and eventually to

{"rss":"1463.27734375 MB","heapTotal":"1381.2594375610352 MB","heapUsed":"95.49384307861328 MB"}
{"rss":"1500.05078125 MB","heapTotal":"1413.7357559204102 MB","heapUsed":"93.21699523925781 MB"}

Also, we did experience an issue in 0.10.10 (a busy loop in the net module that led to unbounded memory increase and CPU spikes a couple of times), which is why we upgraded to 0.10.12. Maybe this issue is related? Though this one looks more like a V8 issue.

Please let me know if anyone has any idea or hunch about why this could be happening. I'm not sure how to come up with a scenario to reproduce it, but I suspect the sudden release of a large amount of memory leaves the memory allocator in a bad state or something. I could roll back to 0.8.21 and see how that behaves.

Thanks

- Sumit

Bert Belder

Jul 5, 2013, 10:03:38 AM
to nod...@googlegroups.com

I tried to reproduce the leak with a small test reconstructed from your description, and I couldn't reproduce it. Can you create a standalone test case that I can run locally?

It might also help to log the output of `node --trace-gc yourscript.js` to get an idea of what the garbage collector is doing.

> We create random ids for MySQL rows so that the request is non-blocking

I'm not sure I understand what you mean, but nothing magically becomes non-blocking here.
If you're concerned about doing multiple concurrent INSERTs, I'd suggest using an AUTO_INCREMENT key; you can get the automatically generated key from the insert result. The current approach seems exceptionally brittle and non-scalable.
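
For example, with the node-mysql driver (an assumption; adjust for whatever client you use — table_name, payload, and callback below are illustrative, taken from the shape of your snippet):

var mysql = require('mysql');
var connection = mysql.createConnection({ /* your connection settings */ });

// Sketch: let MySQL assign the id via AUTO_INCREMENT and read it back
// from result.insertId instead of generating random keys yourself.
var insertAndGetId = function(table_name, payload, callback) {
    connection.query('INSERT INTO ' + table_name + ' (payload) VALUES (?)',
        [payload],
        function(err, result) {
            if (err) return callback(err, null);
            // result.insertId is the AUTO_INCREMENT value assigned to this row.
            return callback(null, table_name + ':' + result.insertId);
        });
};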

There are also a few important issues with your code:
  •   You're making synchronous callbacks, which is entirely inappropriate here. I don't see why you'd want these functions to appear asynchronous, but you should at least defer the callback with process.nextTick() (see the sketch below).
  •   recentRequestIds[table_name] is never initialized with an object, and clearRecentRequestIds deletes the key instead of setting it to an empty object.
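
A minimal sketch of that deferral, keeping the shape of your snippet:

// Defer the callback with process.nextTick() so it always runs from a
// fresh stack instead of synchronously inside generateUniqueRandom.
if (!(random in recentRequestIdsForTable)) {
    recentRequestIdsForTable[random] = true;
    return process.nextTick(function() {
        callback(null, table_name + ":" + random);
    });
}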

- Bert

Sumit Agarwal

Jul 5, 2013, 10:34:10 AM
to nod...@googlegroups.com
Yeah, it's quite random, but I can come up with a standalone script. Basically I feel the object needs to be in memory long enough to be moved to old-generation space or something. Anyway, I will try to repro this outside of my application.


> I'm not sure I understand what you mean, but nothing magically becomes non-blocking here.
> If you're concerned about doing multiple concurrent INSERTs, I'd suggest using an AUTO_INCREMENT key; you can get the automatically generated key from the insert result. The current approach seems exceptionally brittle and non-scalable.
- I meant non-blocking for the requester: I can respond immediately with a random_id rather than wait for MySQL's auto-incremented id. Especially useful during db slowdowns or failures.

> You're making synchronous callbacks, which is entirely inappropriate here. I don't see why you'd want these functions to appear asynchronous, but you should at least defer the callback with process.nextTick().
- I don't want the function to be async; it is completely synchronous in generating a random id for me, except when the key "table_name" does not exist in the dictionary, at which point it has to query the db to load all previously created keys (this can only happen on process restarts). generateUniqueRandom is just an inner function of the actual function, which takes care of loading if needed.

> recentRequestIds[table_name] is never initialized with an object, and clearRecentRequestIds deletes the key instead of setting it to an empty object.
- It is initialized by generateUniqueRandom's enclosing function if needed. I delete the key because it is for the previous day and will never be reused. The table name is different each day: data is sharded per day, and there is no limit to how many days into the future the data can go, so I create a new table for each day with the date appended to the name.
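
Roughly, the enclosing function looks like this (a simplified sketch; getUniqueRequestId and loadExistingIdsFromDb are illustrative names, not our actual code):

// Sketch: lazily initialize the per-table dictionary (reloading existing
// keys from the db after a restart), then run the inner generator.
var getUniqueRequestId = function(table_name, callback) {
    var generateUniqueRandom = function() {
        // ...as posted above...
    };

    if (table_name in recentRequestIds) {
        return generateUniqueRandom();
    }
    // Only happens after a process restart: rebuild the cache from the db.
    loadExistingIdsFromDb(table_name, function(err, ids) {
        if (err) return callback(err, null);
        recentRequestIds[table_name] = ids;
        generateUniqueRandom();
    });
};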

Anyway, let me try to come up with a standalone script that mimics our application. The only thing that will be hard to incorporate is the high number of http requests to the app, if that's even relevant.

-Sumit

Sumit Agarwal

Jul 5, 2013, 12:11:52 PM
to nod...@googlegroups.com
Attached is a test that adds 20 random ids every second. The keys are deleted every 24 hours. You can configure the numbers to add more keys or delete more frequently. I'll keep this running on a few boxes separately and see if it's reproducible. It definitely happens on our live servers every day, though they are under much more stress than this script.
gc-test.js
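
(The attachment is not reproduced here; based on the description above, it presumably looks something like this sketch:)

// Sketch reconstructing gc-test.js from its description: add 20 random
// 10-digit ids per second, drop the whole dictionary every 24 hours.
var cache = {};

setInterval(function() {
    for (var i = 0; i < 20; i++) {
        var random = Math.floor(Math.random() * (9999999999 - 1000000000) + 1000000000);
        cache[random] = true;
    }
}, 1000);

setInterval(function() {
    cache = {}; // release everything at once, as the app does at midnight
    console.log(JSON.stringify(process.memoryUsage()));
}, 24 * 60 * 60 * 1000);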

Bert Belder

Jul 5, 2013, 2:12:38 PM
to nod...@googlegroups.com
> I meant non-blocking for the requester: I can respond immediately with a random_id rather than wait for MySQL's auto-incremented id. Especially useful during db slowdowns or failures.

If you like your job, don't.

> I don't want the function to be async; it is completely synchronous in generating a random id for me, except when the key "table_name" does not exist in the dictionary, at which point it has to query the db to load all previously created keys (this can only happen on process restarts). generateUniqueRandom is just an inner function of the actual function, which takes care of loading if needed.

The general way node does it is to guarantee that callbacks are always made from a fresh stack (except when .emit() is used). Your function will, in that case, only *sometimes* return from a fresh stack.
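
A contrived sketch of why that is dangerous (the names here are made up):

// A callback that is sometimes synchronous, sometimes asynchronous.
// Callers cannot know whether code after the call runs before or after
// the callback, which causes intermittent, hard-to-debug bugs.
var recentRequestIds = {};
var loadFromDb = function(table_name, cb) { setTimeout(cb, 10); }; // stub

function getId(table_name, callback) {
    if (table_name in recentRequestIds) {
        return callback(null, 42); // synchronous: runs on the caller's stack
    }
    loadFromDb(table_name, function() {
        callback(null, 42); // asynchronous: runs on a fresh stack
    });
}

var done = false;
getId('requests_20130705', function() {
    // Prints false when the callback was synchronous, true when asynchronous.
    console.log('done flag:', done);
});
done = true;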


> Attached is a test that adds 20 random ids every second. The keys are deleted every 24 hours. You can configure the numbers to add more keys or delete more frequently. I'll keep this running on a few boxes separately and see if it's reproducible. It definitely happens on our live servers every day, though they are under much more stress than this script.

I don't want to test something for 24 hours, unless you are certain that it does not happen within minutes. Why are you only generating 20 keys per second? Also, you don't even know whether this reproduces the issue yourself, since 24 hours haven't passed since I asked for a reduced test case.

- Bert

Sumit Agarwal

Jul 5, 2013, 3:03:35 PM
to nod...@googlegroups.com
Correct, I don't know if this will reproduce it, because the live servers are doing much more than just id generation. I chose the values I did in order to mimic the servers as closely as possible. More than load, I feel the issue is the time the data spends in memory. Regardless of the application or code, the fact that heapUsed stays almost constant below 100 MB while heapTotal keeps increasing rapidly until the program crashes (in around an hour) is alarming to me. Anyway, I will put this script on a few boxes with different parameters and see if I can repro it in the next few days. Will post if I do. Thanks

Sumit

Yury Michurin

Jul 30, 2013, 6:21:35 AM
to nod...@googlegroups.com
Hi,
Do you have any update on this? Have you managed to solve it? If so, I'd be glad if you could elaborate.

Thanks,
Yury.