Hi Andrei,
V8 has an optimization that turns arrays holding only numbers into unboxed double arrays. If you build a debug build of V8 and run it with --allow-natives-syntax, you can call %DebugPrint(arr1), for example, and you'll see that the elements kind is set to FAST_DOUBLE_ELEMENTS (or FAST_HOLEY_DOUBLE_ELEMENTS).
This means that doubles are stored unboxed in the array, i.e., there's no heap number wrapped around them. When you read such a number in optimized code, it goes straight into a double register. When you write it back to a double array, it gets written directly inline into the array. This is a lot faster than having to allocate heap numbers and wrap/unwrap them for computation.
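As a rough illustration (the variable names are made up, and the elements kinds in the comments are what I'd expect to see rather than output I've captured), these are the kinds of arrays I'm talking about:

```javascript
// Illustrative sketch only; all names here are hypothetical.

// Small integers only: expected to stay a small-integer array,
// not a double array.
var ints = [1, 2, 3];

// Non-integer numbers: expected to be stored as unboxed doubles
// (FAST_DOUBLE_ELEMENTS).
var arr1 = [1.5, 2.5, 3.5];

// A preallocated array starts out with holes, so after storing doubles
// it is expected to report FAST_HOLEY_DOUBLE_ELEMENTS.
var holey = new Array(3);
for (var i = 0; i < holey.length; i++) {
  holey[i] = i + 0.5;
}

// In a debug build started with --allow-natives-syntax you could then
// inspect any of them with: %DebugPrint(arr1);
```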
There's a catch, however: fullcodegen, our slow compiler that is used to gather type feedback, does not support reading raw doubles. It can only work with heap numbers. This means that when you read a double from a raw double array in fullcodegen, for example to copy it, a heap number has to be allocated to wrap the double. When you then write this heap number to another double array, the value gets taken out and written inline into that array. At that point the temporary heap number is garbage.
So if you run with --trace-opt, you'd hopefully see no more GC activity after the point where your copying function gets optimized.
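For reference, here's a minimal sketch of the kind of copying loop I mean. The function and variable names are made up for illustration, not taken from your code, and the sizes and iteration counts are arbitrary. If you saved it as, say, copy.js and ran it with --trace-opt and --trace-gc, I'd expect the GC lines to thin out once copyDoubles shows up as optimized, though I haven't measured this exact snippet:

```javascript
// Hypothetical example: copies every element of one double array
// into another. In unoptimized code each src[i] read is expected to
// allocate a temporary heap number; in optimized code it should not.
function copyDoubles(src, dst) {
  for (var i = 0; i < src.length; i++) {
    dst[i] = src[i];
  }
  return dst;
}

// Build a source array that only ever holds non-integer numbers.
var source = [];
for (var i = 0; i < 1000; i++) {
  source.push(i + 0.5);
}

var target = new Array(source.length);

// Call the function often enough that it becomes hot and gets optimized.
for (var round = 0; round < 10000; round++) {
  copyDoubles(source, target);
}
```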
hth,
Toon