Is it possible to break v8's memory limitation?

NStal Loit

Aug 20, 2013, 5:06:48 AM

to nod...@googlegroups.com

I've googled for a while, but failed to find a way to break v8's heap limit. I need to handle a very big key-value object, and nodejs's object manipulation is very fast (about 10x faster than a python dict), so is it possible to break through v8's ~2GB heap limit at the current stage or in the near future?

Edmond Meinfelder

Aug 20, 2013, 4:51:38 PM

to nod...@googlegroups.com

By any chance are you using a 32-bit version of v8? What version of node are you using?

-Edmond

NStal Loit

Aug 20, 2013, 9:36:48 PM

to nod...@googlegroups.com

I'm using node 0.10.16 on 64-bit Ubuntu 13.04.

If there is no way to break through this, then I have to split the data across several processes using fork. That approach is trickier and maybe slower, but still much faster than a python dict.

Edmond Meinfelder

Aug 20, 2013, 11:04:25 PM

to nod...@googlegroups.com

I recall that in the early versions of 0.6.x the default was to compile v8 for 32 bits, and that led to a heap limit of 1.4GB. In 64-bit v8, I know of no restrictions on heap size.

Have you considered using Buffer objects? Buffers exist outside of v8 and can hold a gigabyte each.
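A minimal sketch of the Buffer suggestion, assuming a modern Node.js release (`Buffer.alloc` and the `external` field of `process.memoryUsage()` did not exist in the 0.10 line discussed in this thread). It allocates one Buffer and compares the V8 heap figure with the external-memory figure; the 64 MB size is arbitrary and chosen only for illustration.

```javascript
// Assumes a modern Node.js release; SIZE is an arbitrary illustrative value.
const SIZE = 64 * 1024 * 1024; // 64 MB

const before = process.memoryUsage();
const buf = Buffer.alloc(SIZE); // zero-filled; the bytes live outside the V8 heap
const after = process.memoryUsage();

const toMB = (bytes) => (bytes / (1024 * 1024)).toFixed(1);
console.log("heapUsed delta (MB):", toMB(after.heapUsed - before.heapUsed));
console.log("external delta (MB):", toMB(after.external - before.external));

// Fixed-width records can be packed into the buffer by byte offset.
buf.writeUInt32LE(12345, 0);
console.log(buf.readUInt32LE(0)); // 12345
```

The catch is that a Buffer only stores raw bytes, so any index over those bytes (which key lives at which offset) still has to be kept somewhere.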

-Edmond

Ben Noordhuis

Aug 21, 2013, 7:40:32 AM

to nod...@googlegroups.com

On Wed, Aug 21, 2013 at 5:04 AM, Edmond Meinfelder
<edmond.m...@gmail.com> wrote:
> In 64-bit v8, I know of no restrictions on heap size.

It's about 1.9 GB.

Juraj Kirchheim

Aug 21, 2013, 9:57:01 AM

to nod...@googlegroups.com

This may be a stupid question, but how about distributing the workload
to multiple node processes?

This should sidestep the memory barrier and also happens to leverage
all cores. IPC is pretty straightforward with node
(http://nodejs.org/api/child_process.html#child_process_child_send_message_sendhandle).
And should the amount of data ever grow to surpass the capabilities of
one machine, you can swap out the IPC for some network protocol and
run the stuff on multiple machines.

Without knowing more about the problem, I would suggest aiming for a
map-reduce-ish information flow between the distinct processes. Maybe
this is a good starting point: http://www.mapjs.org/
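A rough sketch of that multi-process idea, assuming string keys and JSON-serializable values: each forked worker keeps one slice of the key space in its own V8 heap, and the parent routes each request by hashing the key. The names `shardFor`, `startShards`, `createClient` and the `shard-worker` argument are made up for this illustration; `child_process.fork`, `child.send` and `process.on("message")` are the standard Node IPC calls.

```javascript
const { fork } = require("child_process");

// Hypothetical helper: pick a worker for a key using an FNV-1a string hash.
function shardFor(key, shardCount) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) % shardCount;
}

// Worker side: runs only when this file is forked with the "shard-worker" argument.
if (process.argv[2] === "shard-worker") {
  const store = Object.create(null); // this worker's slice, in its own heap
  process.on("message", (msg) => {
    if (msg.op === "set") store[msg.key] = msg.value;
    if (msg.op === "get") process.send({ id: msg.id, value: store[msg.key] });
  });
}

// Parent side: fork `count` copies of this file as workers.
function startShards(count) {
  const workers = [];
  for (let i = 0; i < count; i++) {
    workers.push(fork(__filename, ["shard-worker"]));
  }
  return workers;
}

// Parent side: route set/get calls to the worker that owns the key.
function createClient(workers) {
  let nextId = 0;
  const pending = new Map(); // request id -> resolve callback
  workers.forEach((worker) => {
    worker.on("message", (msg) => {
      const resolve = pending.get(msg.id);
      pending.delete(msg.id);
      if (resolve) resolve(msg.value);
    });
  });
  return {
    set(key, value) {
      workers[shardFor(key, workers.length)].send({ op: "set", key, value });
    },
    get(key) {
      return new Promise((resolve) => {
        const id = nextId++;
        pending.set(id, resolve);
        workers[shardFor(key, workers.length)].send({ op: "get", id, key });
      });
    },
    close() {
      workers.forEach((worker) => worker.kill());
    },
  };
}

// Usage (parent only, so the workers do not fork again):
// if (process.argv[2] !== "shard-worker") {
//   const client = createClient(startShards(4));
//   client.set("some-key", { n: 1 });
//   client.get("some-key").then((value) => { console.log(value); client.close(); });
// }
```

Every lookup now pays for an IPC round trip and JSON serialization, so this trades per-access speed for total capacity; whether that is acceptable depends on the access pattern.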

Regards,
Juraj

NStal Loit

Aug 21, 2013, 10:42:33 PM

to nod...@googlegroups.com

Well, using Buffers is not a good choice for my use case, because in my project the bottleneck is not the data itself but the data index. I use a v8 object as a sort of in-memory db.
If I have to use Buffers or manage the index/hash myself, then I'm likely to do this in C++ or with redis. But that probably won't give better or even equivalent performance compared to v8.

@back2dos your suggestion is in fact my first choice if it's unlikely that I can break through the heap limit. A logic-level map-reduce is not possible because my data structure is a net, but down at the in-memory db layer I can split the data and load it into several v8 processes. It actually works without much penalty, and may even give a performance gain when the CPU becomes the bottleneck.

Another question: why does v8 have this hard heap size limit?

By the way, during my test I found two interesting (and weird) things.

// Build a random lowercase string of exactly `length` characters.
var randomStringByLength = function(length) {
    var str = "";
    for (var i = 0; i < length; i++) {
        var offset = 97 + Math.floor(Math.random() * 26);
        str += String.fromCharCode(offset);
    }
    return str;
};
var memberLength = 100 * 1000 * 1000;
console.time("total");
console.log("init big object of " + memberLength + ", with random string at length 32 as key and an object as value");
console.time("1/100");
var obj = {};
for (var index = 0; index < memberLength; index++) {
    // Report the elapsed time after every 1% (1M keys) of the run.
    if (index % (memberLength / 100) === 0) {
        console.timeEnd("1/100");
        console.time("1/100");
        console.log(index / (memberLength / 100));
    }
    obj[randomStringByLength(32)] = {};
}
console.log("build time:");
console.timeEnd("total");
//END

It ends after about 10M (10/100) empty objects have been assigned to the object (at about 1.9G of memory use, no surprise there).

But look at the result.
init big object of 100000000, with random string at length 32 as key and an object as value
1/100: 0ms
0
1/100: 2981ms
1
1/100: 2647ms
2
1/100: 4254ms
3
1/100: 2348ms
4
1/100: 2393ms
5
1/100: 3880ms
6
1/100: 2488ms
7
1/100: 2552ms
8
1/100: 4740ms
9
1/100: 2673ms
10
1/100: 8279ms
11
FATAL ERROR: CALL_AND_RETRY_0 Allocation failed - process out of memory
Here is the first interesting thing.
It's not a big surprise that the average time to add 1M members to an object shows no significant growth as the member count increases. V8 presumably uses a hash table or something similar to optimize this.
But the time for each 1M batch is quite stable from run to run. Say, when adding the 4th 1M batch to the object, it's always about 2200ms~2600ms. And on a slower machine, the times scale by a roughly constant factor.
The result hints that the time taken to add a value is affected by the member count, but does not simply grow with it. It seems to be decided by the index of the value being added: say I'm adding the 10000th value to the object using a random 32-byte key, then the time taken is fixed, no matter what the previously added keys are.
The hash strategy behind the scenes must be interesting.

Then comes the second interesting, and weird, thing.
When the value is null instead of an empty object:
//obj[randomStringByLength(32)] = {};
obj[randomStringByLength(32)] = null;
the program becomes extraordinarily (200x) slow at 5M-6M but recovers at 7M.
Maybe assigning null to a member causes an unwanted GC walk-through? I'm just wondering.

Peter Rust

Aug 22, 2013, 1:04:17 AM

to nod...@googlegroups.com

If all you need is a hash table, there are a lot of hash table C/C++ libraries that you could integrate with Node (like Judy) and there are a few that have already been integrated, like https://npmjs.org/package/hashtable (that specifically lives outside of v8's memory constraints) as well as quite a few others on npm: https://npmjs.org/search?q=hashtable.

> The hash strategy behind the scenes must be interesting

I found this page that describes (at least some of) what's happening behind the scenes: https://developers.google.com/v8/design?hl=sv&csw=1#prop_access. v8 is creating a new dynamic behind-the-scenes class when you add properties. As someone on the v8 users list says: "in general JS objects in V8 are not optimized to be used as large hash tables. Most likely, it would be best to implement your own hash table data structure using an array as a backing store instead of using a JS object as a hash table".
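To make the "array as a backing store" suggestion concrete, here is a minimal open-addressing hash table with linear probing, assuming string keys and values that are never `undefined`. The class name `ArrayHashTable` and its methods are hypothetical, and this is only an illustration of the idea, not what V8 or any npm package does internally. Note that the arrays still live in the V8 heap, so this changes how the data is laid out, not the heap limit itself.

```javascript
// Illustrative sketch: keys and values are kept in two flat arrays
// instead of being stored as properties of one huge object.
class ArrayHashTable {
  constructor(initialCapacity = 16) {
    this.capacity = initialCapacity; // must be a power of two
    this.size = 0;
    this.keys = new Array(this.capacity).fill(undefined);
    this.values = new Array(this.capacity).fill(undefined);
  }

  // FNV-1a string hash, reduced to a slot index.
  _hash(key) {
    let h = 0x811c9dc5;
    for (let i = 0; i < key.length; i++) {
      h ^= key.charCodeAt(i);
      h = Math.imul(h, 0x01000193);
    }
    return (h >>> 0) & (this.capacity - 1);
  }

  // Find the slot holding `key`, or the first empty slot on its probe path.
  _slot(key) {
    let i = this._hash(key);
    while (this.keys[i] !== undefined && this.keys[i] !== key) {
      i = (i + 1) & (this.capacity - 1);
    }
    return i;
  }

  set(key, value) {
    // Keep the load factor at or below 50% so probing always finds a free slot.
    if ((this.size + 1) * 2 > this.capacity) this._grow();
    const i = this._slot(key);
    if (this.keys[i] === undefined) {
      this.keys[i] = key;
      this.size++;
    }
    this.values[i] = value;
  }

  get(key) {
    const i = this._slot(key);
    return this.keys[i] === key ? this.values[i] : undefined;
  }

  // Double the capacity and re-insert every existing entry.
  _grow() {
    const oldKeys = this.keys;
    const oldValues = this.values;
    this.capacity *= 2;
    this.size = 0;
    this.keys = new Array(this.capacity).fill(undefined);
    this.values = new Array(this.capacity).fill(undefined);
    for (let i = 0; i < oldKeys.length; i++) {
      if (oldKeys[i] !== undefined) this.set(oldKeys[i], oldValues[i]);
    }
  }
}

const table = new ArrayHashTable();
table.set("alpha", 1);
table.set("beta", 2);
console.log(table.get("alpha"), table.get("missing")); // 1 undefined
```

Deletion is left out on purpose; with linear probing it needs tombstones or re-insertion of the following entries, which would roughly double the size of the sketch.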

Best of luck!

-- peter

Peter Rust

Aug 22, 2013, 1:23:21 AM

to nod...@googlegroups.com

> as well as quite a few others on npm: https://npmjs.org/search?q=hashtable.

Ok, maybe not "quite a few" -- most of the other ones are implemented in Javascript. The only C++ ones I could find were

A github search for "hash table" returns 107 C hits and 64 C++ hits, so there are a lot to choose from, you would just need to write node bindings. Here are some pages that may be helpful if you're interested in picking & choosing:

a great breakdown of the performance characteristics of the top hash table implementations
a detailed comparison of a Judy Array and a fine-tuned hash table
a different comparison of hash table libraries.

-- peter

NStal Loit

Aug 22, 2013, 2:48:01 AM

to nod...@googlegroups.com

Thank you Peter. A C++ binding is another choice I would consider, if it turns out that the v8 heap limit can't be removed right now or in the near future and splitting into processes has unacceptable overhead.

Marcel Laverdet

Aug 22, 2013, 12:36:54 PM

to nodejs

Why hasn't anyone posted --max-old-space-size=<size in MB> yet? Just do that w/ 8192 for instance, for 8gb.

Peter Rust

Aug 22, 2013, 1:54:05 PM

to nod...@googlegroups.com

Marcel,

Good point. That was mentioned in the v8-users discussion I linked to (https://groups.google.com/forum/#!topic/v8-users/jrw3Eh2cO4U) -- I should have called attention to it. It doesn't seem like an optimal solution (there is a garbage collection performance penalty, v8 is creating static classes under the hood, etc), but it should be a quick fix...

-- peter

NStal Loit

Aug 22, 2013, 10:30:13 PM

to nod...@googlegroups.com

Marcel,

I forgot to mention that. According to node's manpage this option is actually --max_old_space_size, but it does not work for me with values greater than ~2G.

During my search and testing, I found this option was only useful for removing the soft limit in the old days (say node 0.6). That is likely because the old version of v8 had a 512MB limit on 32-bit and a 1G limit on 64-bit.
But the latest 64-bit v8 no longer seems to have this soft limit. So it may not work for me, unless I'm missing something.

My test commands are:
node --max-old-space-size=8192 speedTestRaw.js
node --max_old_space_size=8192 speedTestRaw.js

Neither makes a difference; both runs terminate at about 10M entries.
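One way to check whether the flag is being picked up at all is to print the heap limit V8 reports at startup. This is a sketch that assumes a newer Node.js release with the built-in `v8` module; that module was not available in the 0.10 line used in the tests above, so it does not explain those results. `heapLimitMB` is a hypothetical helper name.

```javascript
const v8 = require("v8");

// heap_size_limit is reported in bytes; convert to MB for readability.
function heapLimitMB() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log("V8 heap limit: " + heapLimitMB() + " MB");
```

Running the script once without the flag and once with `--max-old-space-size=8192` shows whether the reported limit actually changes on a given build.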
