Possible to optimize an array of numbers with some undefined values?

27 views
Skip to first unread message

Jon Harris

unread,
Aug 15, 2016, 12:07:28 PM8/15/16
to v8-users
I'm writing some charting animation code where the values for each series (shape in the chart) are specified as an array of numbers.

When series data changes, say from start -> end, I compute the delta something like so:

var start = [1, 2, 3, 4, 5];
var end = [3, 2, 1, 4, 4];
var deltas = [2, 0, -2, 0, -1];

To animate, I tween a deltaPercent between 0 and 1 and use requestAnimationFrame to compute the current series values like so:

var deltaPercent = 0.5;
var current = start.slice();
var i, length = current.length;
for (i=0; i<length; i++) {
  current[i]+= deltas[i] * deltaPercent; 
}

This is nice and simple, but things get a little more interesting when the requirement that some series values can be missing/unspecified is introduced to the mix.

For the purposes of deltas, I can treat undefined to be like 0 when the opposite value at start/end is not undefined, to get something like this:

var start = [1, undefined, 3, undefined, 5];
var end = [3, 2, 1, undefined, undefined];
var cleanStart = [1, 0, 3, undefined, 5];
var cleanEnd = [3, 2, 1, undefined, 0];
var deltas = [2, 2, -2, 0, -5];

Then my animation code becomes something like this:

var deltaPercent = 0.5;
var current = cleanStart.slice();
var i, length = current.length;
for (i=0; i<length; i++) {
  if (deltas[i] !== 0) {
    current[i]+= deltas[i] * deltaPercent; 
  }
}

My concern is how I can optimize this logic, since my application can have many series with many values (ex: > 50 arrays with length > 100) and needs to run this code at 60 frames per second to achieve a smooth animation effect.

Having read some posts about v8 performance tips (http://www.html5rocks.com/en/tutorials/speed/v8/) I understand that arrays have a hidden class & type.

My understanding of how v8 optimization works is that the presence of undefined in my arrays will cause v8 to no longer optimize for a number-only array.

Anyone have any thoughts / suggestions for how to improve performance for this scenario?

Jon Harris

unread,
Aug 15, 2016, 2:15:46 PM8/15/16
to v8-users
When attempting to benchmark this (see attached source file), I'm seeing an approximate 10x performance slowdown when checking for undefined values.
array-number-vs-undefined.js

Jakob Kummerow

unread,
Aug 16, 2016, 9:21:12 AM8/16/16
to v8-users
Well, "undefined" is not a number, so arrays with some undefined values are not number-only arrays. Sticking to number-only arrays provides massive speed benefits, so doing that is the best advice to give here. Since you "treat undefined to be like 0" anyway, why don't you put 0 into the array in the first place? If for some reason that doesn't work, maybe you can use NaN (or Infinity) as marker for nonexistent values.

Side note: some of the benchmark results are misleading artifacts stemming from particular implementation details of this microbenchmark. For example, doing a literal "theArray[xi] !== undefined" comparison instead of passing it in as "checkValue" lets V8 use faster code for the comparison and saves about 40% of the overall execution time of the "undefined big number array with undefined check" test. A number of tests are also considerably faster when run standalone as opposed to in series; presumably due to less type feedback pollution.
That said, number-only arrays are and always will be much faster than generic arrays containing mixed kinds of values.

Scary technical details:

"big number array" generates this code in the innermost loop:
                  ;;; <@235,#209> compare-numeric-and-branch ("xi < xl")
0xe2e03778781   769  3bd1           cmpl rdx,rcx
0xe2e03778783   771  0f8d44000000   jge 845  (0xe2e037787cd)
                  ;;; <@242,#218> stack-check
0xe2e03778789   777  493ba5700b0000 REX.W cmpq rsp,[r13+0xb70]
0xe2e03778790   784  0f82e5010000   jc 1275  (0xe2e0377897b)
                  ;;; <@244,#226> bounds-check (for "other[xi]")
0xe2e03778796   790  443bf2         cmpl r14,rdx
0xe2e03778799   793  0f864e020000   jna 1389  (0xe2e037789ed)
                  ;;; <@246,#227> load-keyed ("other[xi]")
0xe2e0377879f   799  c5fb1044d70f   vmovsd xmm0,[rdi+rdx*8+0xf]
                  ;;; <@248,#234> bounds-check (for "theArray[xi]")
0xe2e037787a5   805  3bc2           cmpl rax,rdx
0xe2e037787a7   807  0f8645020000   jna 1394  (0xe2e037789f2)
                  ;;; <@250,#235> load-keyed ("theArray[xi]")
0xe2e037787ad   813  c4c17b104cd70f vmovsd xmm1,[r15+rdx*8+0xf]
                  ;;; <@252,#236> add-d
0xe2e037787b4   820  c5f358c0       vaddsd xmm0,xmm1,xmm0
                  ;;; <@254,#246> bounds-check (for "result[xi]")
0xe2e037787b8   824  443bca         cmpl r9,rdx
0xe2e037787bb   827  0f8636020000   jna 1399  (0xe2e037789f7)
                  ;;; <@256,#247> store-keyed ("result[xi] = ...")
0xe2e037787c1   833  c4c17b1144d00f vmovsd [r8+rdx*8+0xf],xmm0
                  ;;; <@258,#250> add-i  ("xi++")
0xe2e037787c8   840  83c201         addl rdx,0x1
                  ;;; <@261,#253> goto
0xe2e037787cb   843  ebb4           jmp 769  (0xe2e03778781)

That's pretty much as fast as it gets. (With the possible exception of bounds checks that might be eliminatable, but our past experiments exploring that haven't shown much impact.)

In comparison, "undefined big number array with undefined check" needs to box and unbox all the numbers before storing and after loading them; and the stores need to record having written a newly-allocated object (all the allocations will also cause quite a bit of GC activity). From the code size alone, you can tell that this is going to be slower; differences in red:
                  ;;; <@265,#232> compare-numeric-and-branch ("xi < xl")
0xe2e0377cc68  1224  3bd1           cmpl rdx,rcx
0xe2e0377cc6a  1226  0f8d1c010000   jge 1516  (0xe2e0377cd8c)
                  ;;; <@272,#241> stack-check
0xe2e0377cc70  1232  493ba5700b0000 REX.W cmpq rsp,[r13+0xb70]
0xe2e0377cc77  1239  0f82c4020000   jc 1953  (0xe2e0377cf41)
                  ;;; <@274,#247> bounds-check (for "theArray[xi]")
0xe2e0377cc7d  1245  443bf2         cmpl r14,rdx
0xe2e0377cc80  1248  0f86a1030000   jna 2183  (0xe2e0377d027)
                  ;;; <@276,#248> load-keyed ("theArray[xi]")
0xe2e0377cc86  1254  4c8b5cd70f     REX.W movq r11,[rdi+rdx*8+0xf]
                  ;;; <@279,#249> cmp-object-eq-and-branch ("!== undefined")
0xe2e0377cc8b  1259  49ba1143808d0f290000 REX.W movq r10,0x290f8d804311    ;; object: 0x290f8d804311 <undefined>
0xe2e0377cc95  1269  4d3bda         REX.W cmpq r11,r10
0xe2e0377cc98  1272  0f84df000000   jz 1501  (0xe2e0377cd7d)
                  ;;; <@286,#272> bounds-check (for "other[xi]")
0xe2e0377cc9e  1278  3bc2           cmpl rax,rdx
0xe2e0377cca0  1280  0f8686030000   jna 2188  (0xe2e0377d02c)
                  ;;; <@288,#273> load-keyed ("other[xi]")
0xe2e0377cca6  1286  4d8b64d70f     REX.W movq r12,[r15+rdx*8+0xf]
                  ;;; <@290,#449> double-untag (for "theArray[xi]")
0xe2e0377ccab  1291  41f6c301       testb r11,0x1
0xe2e0377ccaf  1295  7416           jz 1319  (0xe2e0377ccc7)
0xe2e0377ccb1  1297  4d8b5550       REX.W movq r10,[r13+0x50]
0xe2e0377ccb5  1301  4d3953ff       REX.W cmpq [r11-0x1],r10
0xe2e0377ccb9  1305  c4c17b104307   vmovsd xmm0,[r11+0x7]
0xe2e0377ccbf  1311  0f856c030000   jnz 2193  (0xe2e0377d031)
0xe2e0377ccc5  1317  eb10           jmp 1335  (0xe2e0377ccd7)
0xe2e0377ccc7  1319  4d8bd3         REX.W movq r10,r11
0xe2e0377ccca  1322  49c1ea20       REX.W shrq r10, 32
0xe2e0377ccce  1326  c5f957c0       vxorpd xmm0,xmm0,xmm0
0xe2e0377ccd2  1330  c4c17b2ac2     vcvtlsi2sd xmm0,xmm0,r10
                  ;;; <@292,#450> double-untag (for "other[xi]")
0xe2e0377ccd7  1335  41f6c401       testb r12,0x1
0xe2e0377ccdb  1339  7418           jz 1365  (0xe2e0377ccf5)
0xe2e0377ccdd  1341  4d8b5550       REX.W movq r10,[r13+0x50]
0xe2e0377cce1  1345  4d395424ff     REX.W cmpq [r12-0x1],r10
0xe2e0377cce6  1350  c4c17b104c2407 vmovsd xmm1,[r12+0x7]
0xe2e0377cced  1357  0f8543030000   jnz 2198  (0xe2e0377d036)
0xe2e0377ccf3  1363  eb10           jmp 1381  (0xe2e0377cd05)
0xe2e0377ccf5  1365  4d89e2         REX.W movq r10,r12
0xe2e0377ccf8  1368  49c1ea20       REX.W shrq r10, 32
0xe2e0377ccfc  1372  c5f157c9       vxorpd xmm1,xmm1,xmm1
0xe2e0377cd00  1376  c4c1732aca     vcvtlsi2sd xmm1,xmm1,r10
                  ;;; <@294,#274> add-d
0xe2e0377cd05  1381  c5f358c0       vaddsd xmm0,xmm1,xmm0
                  ;;; <@296,#285> bounds-check (for "result[xi] = ...")
0xe2e0377cd09  1385  443bca         cmpl r9,rdx
0xe2e0377cd0c  1388  0f8629030000   jna 2203  (0xe2e0377d03b)
                  ;;; <@298,#451> number-tag-d (for "result[xi] = ...")
0xe2e0377cd12  1394  4d8ba558100000 REX.W movq r12,[r13+0x1058]
0xe2e0377cd19  1401  4d89e3         REX.W movq r11,r12
0xe2e0377cd1c  1404  4983c310       REX.W addq r11,0x10
0xe2e0377cd20  1408  4d3b9d60100000 REX.W cmpq r11,[r13+0x1060]
0xe2e0377cd27  1415  0f8759020000   ja 2022  (0xe2e0377cf86)
0xe2e0377cd2d  1421  4d899d58100000 REX.W movq [r13+0x1058],r11
0xe2e0377cd34  1428  49ffc4         REX.W incq r12
0xe2e0377cd37  1431  4d8b5550       REX.W movq r10,[r13+0x50]
0xe2e0377cd3b  1435  4d895424ff     REX.W movq [r12-0x1],r10
0xe2e0377cd40  1440  c4c17b11442407 vmovsd [r12+0x7],xmm0
                  ;;; <@299,#451> gap
0xe2e0377cd47  1447  498bc0         REX.W movq rax,r8
0xe2e0377cd4a  1450  4c8bda         REX.W movq r11,rdx
                  ;;; <@300,#287> store-keyed ("result[xi] = ...")
0xe2e0377cd4d  1453  4e8964d80f     REX.W movq [rax+r11*8+0xf],r12
0xe2e0377cd52  1458  4e8d5cd80f     REX.W leaq r11,[rax+r11*8+0xf]
0xe2e0377cd57  1463  4981e40000f0ff REX.W andq r12,0xfffffffffff00000
0xe2e0377cd5e  1470  41f644240802   testb [r12+0x8],0x2
0xe2e0377cd64  1476  7417           jz 1501  (0xe2e0377cd7d)
0xe2e0377cd66  1478  49c7c40000f0ff REX.W movq r12,0xfff00000
0xe2e0377cd6d  1485  4c23e0         REX.W andq r12,rax
0xe2e0377cd70  1488  41f644240804   testb [r12+0x8],0x4
0xe2e0377cd76  1494  7405           jz 1501  (0xe2e0377cd7d)
0xe2e0377cd78  1496  e8e3f0ffff     call 0xe2e0377be60       ;; code: STUB, RecordWriteStub, minor: 11200
                  ;;; <@314,#296> add-i ("xi++")
0xe2e0377cd7d  1501  83c201         addl rdx,0x1
                  ;;; <@316,#299> gap
0xe2e0377cd80  1504  488b8558ffffff REX.W movq rax,[rbp-0xa8]
                  ;;; <@317,#299> goto
0xe2e0377cd87  1511  e9dcfeffff     jmp 1224  (0xe2e0377cc68)

You can see that the comparison itself is not much of an issue. If you look closely, you can also see that the duplicate "theArray[xi]" load was actually avoided and only performed once. The expensive part is the number boxing/unboxing.


On Mon, Aug 15, 2016 at 8:15 PM, Jon Harris <harr...@gmail.com> wrote:
When attempting to benchmark this (see attached source file), I'm seeing an approximate 10x performance slowdown when checking for undefined values.

--
--
v8-users mailing list
v8-u...@googlegroups.com
http://groups.google.com/group/v8-users
---
You received this message because you are subscribed to the Google Groups "v8-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to v8-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages