Object as hash with 7 million entries slows V8

801 views
Skip to first unread message

Joran Greef

unread,
May 11, 2012, 3:51:44 AM5/11/12
to nod...@googlegroups.com
I have posted this in v8-users but perhaps someone else here will also be familiar with this:

I am using V8 as part of Node and have written a Javascript implementation of Bitcask, using a Javascript object as a hash to keep file offsets in memory.

This object has 7 million entries and I'm noticing that while the JS code is resting, doing nothing, V8 is hitting 100% CPU every few seconds and doing this continually.

Attached is the full result of running V8 with --prof.

And of particular interest:

[C++]:
   ticks  total  nonlib   name
  73615   43.1%   43.1%  v8::internal::StaticMarkingVisitor::VisitUnmarkedObjects
  68436   40.1%   40.1%  _accept$NOCANCEL
   4796    2.8%    2.8%  v8::internal::FlexibleBodyVisitor<v8::internal::StaticMarkingVisitor, v8::internal::JSObject::BodyDescriptor, void>::VisitSpecialized<40>

Should I be using many smaller hashes to keep this overhead down? i.e. some sort of sparse hash implementation? Or using key mod 1000 to determine the hash it should be in?

Does V8 have limits on hash table sizes?

Thanks.
profile.txt

Jérémy Lal

unread,
May 11, 2012, 3:57:47 AM5/11/12
to nod...@googlegroups.com
Isn't that gc doing its work ?
As a workaround, you can turn it off and run it manually
node --nouse_idle_notification --expose_gc
> global.gc();

Regards,
J�r�my.
> --
> Job Board: http://jobs.nodejs.org/
> Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> You received this message because you are subscribed to the Google
> Groups "nodejs" group.
> To post to this group, send email to nod...@googlegroups.com
> To unsubscribe from this group, send email to
> nodejs+un...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/nodejs?hl=en?hl=en

Joran Greef

unread,
May 11, 2012, 4:20:26 AM5/11/12
to nod...@googlegroups.com, hol...@gmail.com
The thing is the JS is doing nothing, the huge hash is just sitting there.


On Friday, May 11, 2012 9:57:47 AM UTC+2, kapouer wrote:
Isn't that gc doing its work ?
As a workaround, you can turn it off and run it manually
node --nouse_idle_notification --expose_gc
> global.gc();

Regards,
J�r�my.

On 11/05/2012 09:51, Joran Greef wrote:
> I have posted this in v8-users but perhaps someone else here will also be familiar with this:
>
> I am using V8 as part of Node and have written a Javascript implementation of Bitcask, using a Javascript object as a hash to keep file offsets in memory.
>
> This object has 7 million entries and I'm noticing that while the JS code is resting, doing nothing, V8 is hitting 100% CPU every few seconds and doing this continually.
>
> Attached is the full result of running V8 with --prof.
>
> And of particular interest:
>
> [C++]:
>    ticks  total  nonlib   name
>   73615   43.1%   43.1%  v8::internal::StaticMarkingVisitor::VisitUnmarkedObjects
>   68436   40.1%   40.1%  _accept$NOCANCEL
>    4796    2.8%    2.8%  v8::internal::FlexibleBodyVisitor<v8::internal::StaticMarkingVisitor, v8::internal::JSObject::BodyDescriptor, void>::VisitSpecialized<40>
>
> Should I be using many smaller hashes to keep this overhead down? i.e. some sort of sparse hash implementation? Or using key mod 1000 to determine the hash it should be in?
>
> Does V8 have limits on hash table sizes?
>
> Thanks.
>
> --
> Job Board: http://jobs.nodejs.org/
> Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> You received this message because you are subscribed to the Google
> Groups "nodejs" group.
> To post to this group, send email to nod...@googlegroups.com
> To unsubscribe from this group, send email to
> nodejs+unsubscribe@googlegroups.com

Jérémy Lal

unread,
May 11, 2012, 4:43:08 AM5/11/12
to Joran Greef, nod...@googlegroups.com
Idle -> GC -> visiting objects (?)

Hence my suggestion : control gc() calls yourself.
> > Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines <https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines>
> > You received this message because you are subscribed to the Google
> > Groups "nodejs" group.
> > To post to this group, send email to nod...@googlegroups.com <mailto:nod...@googlegroups.com>
> > To unsubscribe from this group, send email to
> > nodejs+un...@googlegroups.com <mailto:nodejs%2Bunsu...@googlegroups.com>
> > For more options, visit this group at
> > http://groups.google.com/group/nodejs?hl=en?hl=en <http://groups.google.com/group/nodejs?hl=en?hl=en>
>

Marcel Laverdet

unread,
May 11, 2012, 4:57:20 AM5/11/12
to nod...@googlegroups.com, Joran Greef
There is an issue in v8 where idle tick GC does not pick up where the old GC left off and leads to lots of time wasted. See v8 issue #1458. This is fixed in bleeding_edge, but hasn't landed in node yet, not even 0.7.x. Try Jeremy's suggestion or you could also try using bleeding_edge v8 on node 0.7.x. I imagine both would lead to improvements.

Joran Greef

unread,
May 11, 2012, 5:35:58 AM5/11/12
to nod...@googlegroups.com, Joran Greef
Jeremy, I was trying to understand why GC was spending time when there should be no work to do.

Marcel, I came across your blog post on the subject an hour ago and spotted the v8 issue as well just before your post here.

Thanks for your suggestions.
>     > To post to this group, send email to nod...@googlegroups.com <mailto:nodejs@googlegroups.com>

>     > To unsubscribe from this group, send email to
>

--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com
To unsubscribe from this group, send email to

Tim Caswell

unread,
May 11, 2012, 10:26:00 AM5/11/12
to nod...@googlegroups.com, Joran Greef
So this will finally get fixed!  This exact case was one of the blockers that made me abandon nStore.  I was storing offsets in a js object and the GC spun out of control around 1 million entries. (The other blocker was node's fs interface was way too slow to implement a disk-based database, I clocked 90% CPU time in mutex locks)

Joran Greef

unread,
May 11, 2012, 12:07:47 PM5/11/12
to nod...@googlegroups.com, Joran Greef
Ja it was becoming a problem around 3 million entries.

Tried multiple smaller hashes, but no help, the GC was still visiting every key in every hash.

With nouse_idle_notification, there's no problem, works great now.

Tim, I was looking at nStore's file implementation and saw you were serializing reads. Were you doing this only to use a shared read buffer, or does this also work with the disk better? I was thinking allowing multiple concurrent readers MVCC style would work better together with the filesystem cache.

Ben Noordhuis

unread,
May 11, 2012, 12:15:38 PM5/11/12
to nod...@googlegroups.com
On Fri, May 11, 2012 at 6:07 PM, Joran Greef <jo...@ronomon.com> wrote:
> Ja it was becoming a problem around 3 million entries.
>
> Tried multiple smaller hashes, but no help, the GC was still visiting every
> key in every hash.
>
> With nouse_idle_notification, there's no problem, works great now.

As a general FYI, we're aware that the current 'gc when idle' approach
doesn't work well in some cases. We're revisiting the algorithm in an
upcoming release, keep an eye on the changelog.

Tim Caswell

unread,
May 11, 2012, 12:17:25 PM5/11/12
to nod...@googlegroups.com


On Fri, May 11, 2012 at 11:07 AM, Joran Greef <jo...@ronomon.com> wrote:

Tim, I was looking at nStore's file implementation and saw you were serializing reads. Were you doing this only to use a shared read buffer, or does this also work with the disk better? I was thinking allowing multiple concurrent readers MVCC style would work better together with the filesystem cache.
 
I was doing that to prevent the fs threads from locking too much.  Initially I was doing everything concurrent and spent all my time in mutex locks.  I'm no expert in threads and filesystem interaction.  I'm sure it could have been done better, but I was unable to figure it out at the time.

Jorge

unread,
May 11, 2012, 4:05:17 PM5/11/12
to nod...@googlegroups.com
On May 11, 2012, at 6:17 PM, Tim Caswell wrote:
>
>
> I was doing that to prevent the fs threads from locking too much. Initially I was doing everything concurrent and spent all my time in mutex locks. I'm no expert in threads and filesystem interaction. I'm sure it could have been done better, but I was unable to figure it out at the time.

Were you testing that in OSX or in a Linux ?
--
Jorge.

Tim Caswell

unread,
May 12, 2012, 8:39:38 AM5/12/12
to nod...@googlegroups.com
I think it was OSX.  I remember using the system dtrace tool to do the profiling.

--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com
To unsubscribe from this group, send email to

Ben Noordhuis

unread,
May 12, 2012, 9:15:14 AM5/12/12
to nod...@googlegroups.com
On Sat, May 12, 2012 at 2:39 PM, Tim Caswell <t...@creationix.com> wrote:
> I think it was OSX.  I remember using the system dtrace tool to do the
> profiling.

All writes are serialized on OS X due to a bug in its write() and
pwrite() syscall implementations, that's why you were seeing lots of
lock contention.

Ryan Schmidt

unread,
May 12, 2012, 1:58:49 PM5/12/12
to nod...@googlegroups.com

On May 12, 2012, at 08:15, Ben Noordhuis wrote:

> All writes are serialized on OS X due to a bug in its write() and
> pwrite() syscall implementations, that's why you were seeing lots of
> lock contention.

What has been Apple's response to the bug report that was filed with them about this?


Jorge

unread,
May 12, 2012, 5:04:48 PM5/12/12
to nod...@googlegroups.com
On May 12, 2012, at 2:39 PM, Tim Caswell wrote:
>> On Fri, May 11, 2012 at 3:05 PM, Jorge <jo...@jorgechamorro.com> wrote:
>>> On May 11, 2012, at 6:17 PM, Tim Caswell wrote:
>>>
>>> I was doing that to prevent the fs threads from locking too much. Initially I was doing everything concurrent and spent all my time in mutex locks. I'm no expert in threads and filesystem interaction. I'm sure it could have been done better, but I was unable to figure it out at the time.
>>
>> Were you testing that in OSX or in a Linux ?
>
> I think it was OSX. I remember using the system dtrace tool to do the profiling.

write()s are serialized in OSX by these patches:

https://github.com/joyent/node/commit/6aa92d5289996780834ebd5e9317718b3e55408c
https://github.com/joyent/libuv/commit/cbbb254e575422206558672b8de85cc5fef1ed86

The bug doesn't exist in Linux-es.
--
Jorge.

Jorge

unread,
May 12, 2012, 5:08:20 PM5/12/12
to nod...@googlegroups.com
This:

> Hello Jorge,
>
> This is a follow up to Bug ID# 8682751. After further investigation it has been determined that this is a known issue, which is currently being investigated by engineering. This issue has been filed in our bug database under the original Bug ID# 6847912. The original bug number being used to track this duplicate issue can be found in the State column, in this format: Duplicate/OrigBug#.
>
> Thank you for submitting this bug report. We truly appreciate your assistance in helping us discover and isolate bugs.
>
> Best Regards,
>
> Developer Support
> Apple Worldwide Developer Relations

The bug exists too in read()s, but it's been fixed only for write()s.
--
Jorge.

Jorge

unread,
May 12, 2012, 5:10:04 PM5/12/12
to nod...@googlegroups.com
On May 12, 2012, at 11:08 PM, Jorge wrote:
>
>
> The bug exists too in read()s, but it's been fixed only for write()s.

Fixed *in*nodejs* I mean, not in OSX...
--
Jorge.

Jorge

unread,
May 19, 2012, 7:08:40 AM5/19/12
to nod...@googlegroups.com, Joran Greef
If this is not fixed yet, you could move the hash to a thread_a_gogo so that the hiccups won't happen in node's event loop thread, but in the thread_a_gogo. You will still see delays when accessing the hash keys, but as the access to the hash will be asynchronous they won't be blocking node.

<https://gist.github.com/2730481>

Supply a 2nd argument to run with threads_a_gogo, let it run for a while and see that you still get a sane figure in "event loop ticks per second":

$ node keys.js yes
Multi thread
...
...
***** Event loop ticks per second -> 366547, keys per second -> 1360
...

Unlike when you run it single-threaded because the GC hiccups are blocking it:

$ node keys.js
Single thread
...
...
***** Event loop ticks per second -> 4138, keys per second -> 4138
...

Threads_a_gogo calls the GC every 2000 or so turns IIRC, that's perhaps too often, and that's why the "keys per second" figure is much lower. But over a thousand per second on average might be plenty depending on your application.

https://github.com/xk/node-threads-a-gogo
--
Jorge.
> > > To post to this group, send email to nod...@googlegroups.com <mailto:nod...@googlegroups.com>
> > > To unsubscribe from this group, send email to
> > > nodejs+un...@googlegroups.com <mailto:nodejs%2Bunsu...@googlegroups.com>
> > > For more options, visit this group at
> > > http://groups.google.com/group/nodejs?hl=en?hl=en <http://groups.google.com/group/nodejs?hl=en?hl=en>
> >
>
> --
> Job Board: http://jobs.nodejs.org/
> Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> You received this message because you are subscribed to the Google
> Groups "nodejs" group.
> To post to this group, send email to nod...@googlegroups.com
> To unsubscribe from this group, send email to
> nodejs+un...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/nodejs?hl=en?hl=en
>
>
> --
> Job Board: http://jobs.nodejs.org/
> Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> You received this message because you are subscribed to the Google
> Groups "nodejs" group.
> To post to this group, send email to nod...@googlegroups.com
> To unsubscribe from this group, send email to
> nodejs+un...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/nodejs?hl=en?hl=en
>
>
> --
> Job Board: http://jobs.nodejs.org/
> Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> You received this message because you are subscribed to the Google
> Groups "nodejs" group.
> To post to this group, send email to nod...@googlegroups.com
> To unsubscribe from this group, send email to
> nodejs+un...@googlegroups.com

Joran Greef

unread,
May 19, 2012, 8:05:00 AM5/19/12
to nod...@googlegroups.com, Joran Greef
Thanks Jorge, I turned off idle notifications and exposed the gc which runs every 30 minutes or so. That's had a good impact on CPU across the whole application. Then for the hash I switched to a Buffer backed hash implementation. I can keep the per key memory overhead much lower like that and the access times are on par with the native JS hash. It allows for shorter keys, for example 16 bytes binary is enough for 128 bit keys, whereas with the JS hash you would need about 24 bytes of Base62 to get the same. Also, since it's just a buffer, and the keys are coming from disk on startup, you can copy the keys from disk directly into the hash without having to do thousands of mem copies, so it's a few times faster than the original implementation. Another advantage is you can predict the size of the hash in advance and avoid expensive hash resizes (tens of seconds for a few million keys using the JS hash) for a much faster startup time. Have tested up to 60 million keys and it handles fine and starts up in about 60 seconds parsing and loading the keys from disk. Memory overhead is much more predictable (exactly 25 bytes per key all included). JS strings alone on 64 bit are 24 bytes excluding JS hash overhead and value overhead. Just from this small exercise I think TypedArrays are really going to start making a huge difference for JS as a dynamic language.
> >     > To post to this group, send email to nod...@googlegroups.com <mailto:nodejs@googlegroups.com>
> >     > To unsubscribe from this group, send email to
> >     > nodejs+unsubscribe@googlegroups.com <mailto:nodejs%2Bunsubscribe@googlegroups.com>
> >     > For more options, visit this group at
> >     > http://groups.google.com/group/nodejs?hl=en?hl=en <http://groups.google.com/group/nodejs?hl=en?hl=en>
> >
>
> --
> Job Board: http://jobs.nodejs.org/
> Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> You received this message because you are subscribed to the Google
> Groups "nodejs" group.
> To post to this group, send email to nod...@googlegroups.com
> To unsubscribe from this group, send email to
> nodejs+unsubscribe@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/nodejs?hl=en?hl=en
>
>
> --
> Job Board: http://jobs.nodejs.org/
> Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> You received this message because you are subscribed to the Google
> Groups "nodejs" group.
> To post to this group, send email to nod...@googlegroups.com
> To unsubscribe from this group, send email to
> nodejs+unsubscribe@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/nodejs?hl=en?hl=en
>
>
> --
> Job Board: http://jobs.nodejs.org/
> Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> You received this message because you are subscribed to the Google
> Groups "nodejs" group.
> To post to this group, send email to nod...@googlegroups.com
> To unsubscribe from this group, send email to
> nodejs+unsubscribe@googlegroups.com
Reply all
Reply to author
Forward
0 new messages