charAt vs fastCodeAt vs regex: neko vs cpp => Benchmarking JSON

Marc Weber

Dec 18, 2011, 10:41:22 AM
To: haxelang
I recently tried reading some JSON-based log files using neko, which
failed because hxJson2 is much too slow, so I wrote a new HaXe JSON
library for neko (proof of concept).

Benchmarks [1] (iterating over 380 characters):
a) charAt
b) StringTools.fastCodeAt
c) regex

d) JSON (my library)
e) hxJson2
f) native JSON decoding (json_decode in PHP, the built-in JSON object in JS)

neko: (seconds)

a) 2.73544788360596e-04
b) 2.53748893737793e-05 (about 10x faster than a)
c) 3.22461128234863e-06 (about 85x faster than a)

d) 1.24084615707397 (my JSON)
e) 9.21192598342896 (hxJson2)

CPP
a) 1.25002861e-05
b) 4.768371582e-09
c) 1.574754715e-06

d) 0.5905880928 (my JSON)
e) 0.4547591209 (hxJson2)


PHP
a) 0.00026384472846985
b) 0.0002096700668335
c) 1.2679100036621E-5

d) 4.8874289989471 (my JSON)
e) 4.9091980457306 (hxJson2)
f) 0.062371015548706


JS (chrome):
a) 0.000004999637603759765 (e-6)
b) 0.000004999637603759765 (e-6)
c) 0.000005000829696655274 (e-6)

d) 0.9459998607635498 (my JSON)
e) 0.20399999618530273 (hxJson2)
f) 0.9320001602172852 (!! native JSON decoding is slower than hxJson2?!)

results:
=============

PHP's native json_decode is much faster than everything else
(compare f) with d) and e) on all other targets).

There is some hope that JSON or hxJson2 can be made faster by
using a custom parser and fastCodeAt.
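A custom parser along those lines would mostly consist of small fastCodeAt loops. A minimal sketch, assuming a parser that tracks a position into the input: it finds the end of a JSON string literal and skips backslash escapes. The class Scan and the function stringEnd are hypothetical names, not part of JSON or hxJson2.

```haxe
using StringTools;

// Hypothetical helper, not part of JSON or hxJson2.
class Scan {
    // Returns the index of the closing '"' of a JSON string literal whose
    // body starts at pos (just after the opening quote), or -1 if the
    // literal is unterminated. A backslash skips the next character, so
    // \" does not end the literal.
    public static function stringEnd(s:String, pos:Int):Int {
        var i = pos;
        var n = s.length;
        while (i < n) {
            var c = s.fastCodeAt(i);
            if (c == "\"".code)
                return i;
            i += (c == "\\".code) ? 2 : 1;
        }
        return -1;
    }

    static function main() {
        var s = "\"ab\\\"c\", 42"; // the text: "ab\"c", 42
        trace(stringEnd(s, 1)); // 6
    }
}
```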

Regexes are always a good choice. Only on CPP do you have a chance of
being faster with fastCodeAt.


neko: the regex easily outperforms every kind of loop.
php: charAt and fastCodeAt are similarly fast.
cpp: charAt is about as fast as fastCodeAt on neko, and fastCodeAt is a
lot faster than charAt.

js (chrome): everything is upside down?

a, b and c are roughly equally fast! Only the CPP version of fastCodeAt
is a little faster (but not enough to care about).
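If bench reports the average time per call over its 200 runs, each JS value for a), b) and c) corresponds to about 0.001 s in total (200 × 4.9996e-06 ≈ 0.0010), i.e. a single 1 ms tick. That suggests those runs finish within the resolution of the timer rather than measuring the loops themselves. A sketch of a benchmark helper that avoids this by doubling the number of calls until one batch takes at least a minimum wall-clock time. It assumes haxe.Timer.stamp() (seconds as a Float); AdaptiveBench and benchUntil are hypothetical names, not the bench() used in [1].

```haxe
// Hypothetical helper; not the bench() used in [1].
class AdaptiveBench {
    // Times one batch of n calls to f, in seconds.
    static function timeBatch(n:Int, f:Void->Void):Float {
        var start = haxe.Timer.stamp();
        for (i in 0...n)
            f();
        return haxe.Timer.stamp() - start;
    }

    // Doubles the batch size until a batch takes at least minSeconds, then
    // returns the average time per call. With a 1 ms timer tick and
    // minSeconds = 0.1, the quantization error is about 1% or less.
    public static function benchUntil(minSeconds:Float, f:Void->Void):Float {
        var n = 1;
        var elapsed = timeBatch(n, f);
        while (elapsed < minSeconds) {
            n *= 2;
            elapsed = timeBatch(n, f);
        }
        return elapsed / n;
    }

    static function main() {
        var s = "{\"x\":\"xxxx\",\"y\":1234}";
        trace(benchUntil(0.1, function() {
            for (i in 0...s.length)
                StringTools.fastCodeAt(s, i);
        }));
    }
}
```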

CPP already showed that hxJson2 can be a little faster than my regex-based
JSON; here the regex version is significantly slower than hxJson2!

And the real-world tests are fast here: e) (hxJson2) beats every other
target, and d) is second only to CPP.
Surprisingly, JSON.decode is slower than hxJson2.


Applying the regex idea to JSON parsing yields:
https://github.com/MarcWeber/haxe-json
For quoting and unquoting I use the EReg.customReplace function.
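The unquoting half can be sketched with one global regex and a callback per escape sequence. This sketch uses EReg.map, the Haxe 3+ name of what Haxe 2 called EReg.customReplace; the Unquote class is hypothetical and is not the code in haxe-json.

```haxe
// Hypothetical sketch; not the unquoting code in haxe-json.
class Unquote {
    // One JSON escape: \" \\ \/ \b \f \n \r \t or \uXXXX.
    static var escapeRe = ~/\\(["\\\/bfnrt]|u[0-9a-fA-F]{4})/g;

    // Replaces every escape in the body of a JSON string literal.
    // With the g flag, EReg.map calls the function once per match.
    public static function unquote(body:String):String {
        return escapeRe.map(body, function(r:EReg):String {
            var e = r.matched(1);
            return switch (e) {
                case "n": "\n";
                case "t": "\t";
                case "r": "\r";
                case "b": String.fromCharCode(8);
                case "f": String.fromCharCode(12);
                case "\"" | "\\" | "/": e;
                // \uXXXX: code points above 0x7F depend on the target's string encoding
                case _: String.fromCharCode(Std.parseInt("0x" + e.substr(1)));
            };
        });
    }

    static function main() {
        trace(unquote("say \\\"hi\\\"\\u0021")); // say "hi"!
    }
}
```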

On my test case, a file containing 7400 JSON dictionaries (each about 250
characters long), it easily outperforms hxJson2, by a factor of about 7.4
(d vs e in the neko results above).
Two notes: I have to use substr often, because I can't apply a regex to a
string starting at a given character position; and the dictionaries did not
contain much escaping, such as "\u2430".
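For the substr problem, later Haxe versions (3.0 and up, so not available when this was written) added EReg.matchSub(s, pos, len), which starts the search at pos without copying the rest of the string. matchSub searches forward rather than anchoring at pos, so this sketch also checks where the match begins. It assumes matchedPos() reports offsets into the whole string s; matchAt is a hypothetical helper.

```haxe
// Hypothetical helper; EReg.matchSub needs Haxe 3.0 or later.
class MatchAt {
    // Returns the text r matches if the match starts exactly at pos, else null.
    // matchSub searches from pos onward without copying the tail via substr.
    // Assumption: matchedPos() is relative to the whole string s, so a match
    // that starts later than pos is rejected.
    public static function matchAt(r:EReg, s:String, pos:Int):Null<String> {
        if (!r.matchSub(s, pos))
            return null;
        return r.matchedPos().pos == pos ? r.matched(0) : null;
    }

    static function main() {
        var number = ~/-?[0-9]+(\.[0-9]+)?/;
        var json = "{\"x\":1234,\"y\":5}";
        trace(matchAt(number, json, 5)); // 1234
        trace(matchAt(number, json, 4)); // null: the number starts at 5, not 4
    }
}
```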

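PHP's native json_decode function is still far faster than everything
else: about 3x faster than the best Haxe result (hxJson2 on JS) and about
78x faster than either Haxe library on PHP.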

Conclusion: HaXe makes it easy to target many systems, but optimizing an
implementation requires individual code for each target.
Neko is nice for prototyping, but not for speed if you can use existing
native code. Chrome's JS engine does an incredible amount of optimization.
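Haxe's conditional compilation (#if cpp ... #else ... #end) is one way to keep such per-target code in a single source file. A hypothetical sketch that follows the measurements above: a fastCodeAt loop on cpp, where it was the fastest variant, and a regex on the other targets. PerTarget and leadingWhitespace are illustrative names.

```haxe
using StringTools;

// Hypothetical example of per-target code in one source file.
class PerTarget {
    // Number of leading JSON whitespace characters in s.
    public static function leadingWhitespace(s:String):Int {
        #if cpp
        // cpp: a fastCodeAt loop was the fastest variant measured.
        var i = 0;
        while (i < s.length && isSpace(s.fastCodeAt(i)))
            i++;
        return i;
        #else
        // neko, php, js: let the regex engine do the scanning.
        var r = ~/^[ \t\r\n]*/;
        r.match(s); // always succeeds, possibly with an empty match
        return r.matchedPos().len;
        #end
    }

    // Space (32), tab (9), carriage return (13) or line feed (10).
    static inline function isSpace(c:Int):Bool {
        return c == 32 || c == 9 || c == 13 || c == 10;
    }

    static function main() {
        trace(leadingWhitespace(" \t\n{\"x\":1}")); // 3
    }
}
```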

Marc Weber

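[1]:
// s is the ~380-character test string; bench(n, f) is the timing helper
// from the benchmark code linked further down (not shown here).

// a) charAt
trace(bench(200, function() {
    for (i in 0...s.length)
        s.charAt(i);
}));

// b) StringTools.fastCodeAt
trace(bench(200, function() {
    for (i in 0...s.length)
        StringTools.fastCodeAt(s, i);
}));

// c) regex: one match over the whole string, then fetch the captured group
var r = ~/^(.*)$/;
trace(bench(200, function() {
    r.match(s);
    r.matched(1);
}));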

Justin Donaldson

Dec 18, 2011, 1:51:37 PM
To: haxe...@googlegroups.com
Thanks for this, it is interesting.  In the case of chrome, is it somehow factoring out the string?  I.E., are you generating the string dynamically, or is it something static?

-Justin
--
Justin Donaldson, BigML, Inc.
o: 313-31BIGML | c: 919-BUZZJJD

Marc Weber

Dec 18, 2011, 1:58:39 PM
To: haxelang
Excerpts from Justin Donaldson's message of Sun Dec 18 19:51:37 +0100 2011:

> Thanks for this, it is interesting. In the case of chrome, is it somehow
> factoring out the string? I.E., are you generating the string dynamically,
> or is it something static?
The pattern repeats about 7000 times and looks like this (I replaced a-zA-Z0-9 with x):
var s = "{\"xx\":\"xxxxxxxxxxxxx\",\"x\":xxxx,\"xx\":\"xxxxxxx\\/x.x (xxxxxxxxxx; xxxxxxxxx\\/x.x; +xxxx:\\/\\/xxx.xxxxxx.xxx\\/xxx.xxxx)\",\"x\":xxxxxxxxxx,\"x\":\"xxxx\",\"x\":\"xxxxxxx:xxx:xxxx-xxx-xxxxxxx\"}\x\n"
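Justin's question touches a general benchmarking pitfall: the loops in [1] throw their results away (s.charAt(i);), and an optimizing JIT may be able to skip work whose result is never used. A hypothetical sketch that builds the input at run time and folds every character code into a printed checksum, so the work stays observable. Checksum, buildInput and sumCodes are illustrative names, and whether this changes the Chrome numbers above is untested.

```haxe
using StringTools;

// Hypothetical sketch; names are illustrative.
class Checksum {
    // Builds the benchmark input at run time by repeating one line,
    // instead of embedding one large string literal.
    public static function buildInput(line:String, copies:Int):String {
        var buf = new StringBuf();
        for (i in 0...copies)
            buf.add(line);
        return buf.toString();
    }

    // Folds every char code into a value the caller prints, so the loop's
    // work is observable. Masked to 24 bits so the result does not depend
    // on how a target handles Int overflow.
    public static function sumCodes(s:String):Int {
        var sum = 0;
        for (i in 0...s.length)
            sum = (sum + s.fastCodeAt(i)) & 0xFFFFFF;
        return sum;
    }

    static function main() {
        var s = buildInput("{\"x\":\"xxxx\",\"y\":1234}\n", 7000);
        var start = haxe.Timer.stamp();
        var sum = sumCodes(s);
        trace('checksum $sum, ${haxe.Timer.stamp() - start} s');
    }
}
```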

You can find the simple benchmark implementation here:
https://github.com/MarcWeber/haxe-json/tree/master/benchmark

Marc Weber
