String.replace() performance issues

547 views
Skip to first unread message

Victor Grishchenko

unread,
Aug 23, 2011, 8:34:03 AM8/23/11
to nodejs
Hi guys!

I am currently investigating some performance issues caused by weird
behavior of WebKit regex engine(s). Long story short, in some
situations (one-pass) String.replace() is much slower than manual
iteration.
I made one observation which might be a good lead, so I am interested
in any reflections on that:

gritzko@ubuntu:~/Projects/node/deps/v8$ d8 test-d8.js
started: String.replace performance test
1mln char array composed in 23ms
array joined in 112ms <<<<< HERE
1mln symbol string: all symbols replaced in 93ms
too smart optimizations excluded 1000000
gritzko@ubuntu:~/Projects/node/deps/v8$ node test-node.js
started: String.replace performance test
1mln char array composed in 29ms
array joined in 3ms <<< AND HERE
1mln symbol string: all symbols replaced in 97ms
too smart optimizations excluded 1000000

Mysteriously, v8 in node.js is joining a 1mln-symbol array 30 times
faster than vanilla v8 (the same v8, right from deps/v8). WebKit
browsers are showing results closer to vanilla v8.
What is a possible reason to that?
I did not find any custom node.js String implementation or suchlike.
I'm puzzled.


test-node.js:
> console.log("started: String.replace performance test");
>
> var start_t = (new Date()).getTime();
> var a = [];
> for(var i=0;i<1000000;i++)
> a.push('a');
> var comp_t = (new Date()).getTime();
> var b = a.join('');
> var join_t = (new Date()).getTime();
> var c = b.replace(/a/g,'c');
> var repl_t = (new Date()).getTime();
>
> console.log("1mln char array composed in "+(comp_t-start_t)+"ms");
> console.log("array joined in "+(join_t-comp_t)+"ms");
> console.log("1mln symbol string: all symbols replaced in "+(repl_t-join_t)+"ms ");
> console.log("too smart optimizations excluded "+a.length+' '+b.length+' '+c.length);

test-d8.js: the same, except console.log() changed to print()

Vyacheslav Egorov

unread,
Aug 24, 2011, 7:52:21 AM8/24/11
to nod...@googlegroups.com
Your d8 and node are probably of different architectures. Try:

file `which node`
file `which d8`

node build defaults to x64 on x64 hosts while v8 build always defaults to ia32.

Fastest join happens when v8 manages to fit result of the join into
new space allocated string. See

http://code.google.com/p/v8/source/browse/trunk/src/array.js#411
http://code.google.com/p/v8/source/browse/trunk/src/ia32/full-codegen-ia32.cc#3372

On x64 new space is larger so %_FastAsciiArrayJoin succeeds, but on
ia32 it fails to allocate the result.

--
Vyacheslav Egorov

> --
> Job Board: http://jobs.nodejs.org/
> Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> You received this message because you are subscribed to the Google
> Groups "nodejs" group.
> To post to this group, send email to nod...@googlegroups.com
> To unsubscribe from this group, send email to
> nodejs+un...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/nodejs?hl=en?hl=en
>

Victor Grishchenko

unread,
Aug 29, 2011, 12:32:42 AM8/29/11
to nod...@googlegroups.com
On 24 August 2011 17:52, Vyacheslav Egorov <veg...@chromium.org> wrote:
> Your d8 and node are probably of different architectures. Try:
> node build defaults to x64 on x64 hosts while v8 build always defaults to ia32.

Indeed, that was exactly the case.

> Fastest join happens when v8 manages to fit result of the join into
> new space allocated string. See

> On x64 new space is larger so %_FastAsciiArrayJoin succeeds, but on
> ia32 it fails to allocate the result.

Indeed. With a x64 binary, I may also break FastAsciiArrayJoin by
adding an accented character to the string (like in "café"). Then, it
falls back to the JavaScript impl.

Regarding replace(/a/g,'b'), IMO the problem is the
ReplacementStringBuilder class. It first builds an array of *objects*
which are either strings or encoded offsets within strings. Then, it
merges it all into the actual result string.
In case we replace large spans within a string, that technique
probably optimizes something. If we just replace 'a' with 'b', then
all that typecasts, double-checks, encodings and decodings eat quite a
lot, compared to the actual work :)

--
Victor

Victor Grishchenko

unread,
Aug 29, 2011, 1:24:29 AM8/29/11
to nod...@googlegroups.com
Sorry, but the v8 profiler disagrees with me. Regarding
String.replace(), only 25% of cycles went to StringBuilder-related
activities, while 43% is eaten by RegExpImpl::AtomExec. That is a
little bit strange, because AtomExec is doing dispatching/conversion,
no "real" work at all. Unless SearchString is inlined to it, but that
is unlikely, because I see StringSearch::SingleCharSearch in the
profile and it ate 1.7% :))
So, basically, it all goes into the overhead in very complicated ways.
Sorry for OT, I will put this on v8 bug tracker (hope they read it).

On 29 August 2011 10:32, Victor Grishchenko

Reply all
Reply to author
Forward
0 new messages