appending to a Vector

Russ P.

unread,

Oct 26, 2012, 2:51:59 PM10/26/12

to scala...@googlegroups.com

In one of Martin's coursera lectures for this week, he discusses appending an element to a Vector. Please correct me if I am wrong, but that does not appear to be an efficient operation that one would want to do many times to build a Vector. Also, the resulting Vector does not appear to have the same form as one would expect. Did I understand correctly that simply appending a single element always adds an array of length 32? I hope not!

A while back I discovered and started using a class called VectorBuilder, which I assume is a more efficient way to build a Vector. Perhaps Martin should have mentioned it in his lecture.

That's just a nitpick on his otherwise excellent lectures, of course.

--Russ P.

Simon Ochsenreither

unread,

Oct 26, 2012, 5:50:49 PM10/26/12

to scala...@googlegroups.com

Vector uses VectorBuilder underneath, as far as I know (look for newBuilder). Appending a single element shouldn't always add an array of length 32 anyway.

Rex Kerr

unread,

Oct 26, 2012, 5:56:54 PM10/26/12

to scala...@googlegroups.com

Vector has a chunk size of 32 elements, but it's clever about how it fills those entries in. It has several different states, so nothing is really true in general, but if you add repeatedly to one end, it's actually quite fast. Repeated add/remove cycles from both ends are the worst: you get a lot of rebuilding of (admittedly very shallow, but 32-branch-width) trees then.

Anyway, Vector cannot actually replicate the speed of a mutable ring buffer in all operations, but for a lot of common use cases it's surprisingly good. Try some microbenchmarks! You may be surprised at how good its performance is when you come in thinking about how much work it naively seems it would need to do to stay immutable.

--Rex

Daniel Sobral

unread,

Oct 26, 2012, 8:58:30 PM10/26/12

to Russ P., scala...@googlegroups.com

On Fri, Oct 26, 2012 at 4:51 PM, Russ P. <russ.p...@gmail.com> wrote:
> In one of Martin's coursera lectures for this week, he discusses appending
> an element to a Vector. Please correct me if I am wrong, but that does not
> appear to be an efficient operation that one would want to do many times to
> build a Vector. Also, the resulting Vector does not appear to have the same
> form as one would expect. Did I understand correctly that simply appending a
> single element always adds an array of length 32? I hope not!
>
> A while back I discovered and started using a class called VectorBuilder,
> which I assume is a more efficient way to build a Vector. Perhaps Martin
> should have mentioned it in his lecture.

It does create at least one array of length 32, and possibly up to 6,
as would any update operation. But, here's the kick: because of the
bad locality of common lists, creating an array of 32 elements and
creating a single cons have about the same cost compared to the cache
miss. The vector then gains performance back because of locality of
reference on reads.

This is the standard data structure used by Clojure, where it has proven itself.

--
Daniel C. Sobral

I travel to the future all the time.

Russ P.

unread,

Oct 27, 2012, 12:13:25 AM10/27/12

to scala...@googlegroups.com, Russ P.

On Friday, October 26, 2012 5:58:55 PM UTC-7, Daniel Sobral wrote:

On Fri, Oct 26, 2012 at 4:51 PM, Russ P. <russ.p...@gmail.com> wrote:
> In one of Martin's coursera lectures for this week, he discusses appending
> an element to a Vector. Please correct me if I am wrong, but that does not
> appear to be an efficient operation that one would want to do many times to
> build a Vector. Also, the resulting Vector does not appear to have the same
> form as one would expect. Did I understand correctly that simply appending a
> single element always adds an array of length 32? I hope not!
>
> A while back I discovered and started using a class called VectorBuilder,
> which I assume is a more efficient way to build a Vector. Perhaps Martin
> should have mentioned it in his lecture.

It does create at least one array of length 32, and possibly up to 6,
as would any update operation. But, here's the kick: because of the
bad locality of common lists, creating an array of 32 elements and
creating a single cons have about the same cost compared to the cache
miss. The vector then gains performance back because of locality of
reference on reads.

OK, let me see if I have this straight. Let's say I create a Vector of one element. Then I append another element to it. Are you saying that another array of length 32 is created to accommodate the new element? I'm just trying to understand. Thanks.

--Russ P.

Simon Ochsenreither

unread,

Oct 27, 2012, 1:19:30 AM10/27/12

to scala...@googlegroups.com, Russ P.

A new array is allocated, the old element is copied, the new one added. If the old tree/array is not referenced anymore, it gets garbage-collected. (As far as I have understood it.)

Rex Kerr

unread,

Oct 27, 2012, 2:08:21 AM10/27/12

to Russ P., scala...@googlegroups.com

No, actually, that's not what happens. The same underlying array is used, and the next vector is told that the array is filled in one more slot than the first vector. (See the initFrom stuff in VectorPointer.) So you do have to create a new Vector each time you add one, but that just says to look at more and more of the array (until you run out of room, or you do something like remove and then add again, where you need a new array).

Check out the source. It's a bit of slow going to get through without comments (and those are sparse), but it's actually quite clever about not copying arrays unless you must (in the spots I've checked; I haven't read the whole file).

--Rex

Simon Ochsenreither

unread,

Oct 27, 2012, 2:13:34 AM10/27/12

to scala...@googlegroups.com, Russ P.

Thanks, must have missed that!

nicola...@gmail.com

unread,

Oct 27, 2012, 10:51:54 AM10/27/12

to Rex Kerr, Russ P., scala...@googlegroups.com

> No, actually, that's not what happens. The same underlying array is used,
> and the next vector is told that the array is filled in one more slot than
> the first vector. (See the initFrom stuff in VectorPointer.) So you do
> have to create a new Vector each time you add one, but that just says to
> look at more and more of the array (until you run out of room, or you do
> something like remove and then add again, where you need a new array).

Does that means that a pointer to the old vector can prevent the new
added element to be garbage collected?
(If the new vector is collectable.)

Rex Kerr

unread,

Oct 27, 2012, 3:42:43 PM10/27/12

to nicola...@gmail.com, Russ P., scala...@googlegroups.com

I haven't actually tested it, but I could imagine that Vector is not the best structure to use when you want to make sure that every individual element is free for garbage collection. I didn't see special logic for zeroing out now-unused entries, but I haven't looked at the code in enough detail to know for sure.

--Rex

Daniel Sobral

unread,

Oct 27, 2012, 8:19:47 PM10/27/12

to Rex Kerr, Russ P., scala...@googlegroups.com

So it marks the original vector as extended, so that a new extension
will cause a new array creation? I wondered if it applied such an
optimization. It's tricky, because such a mutation can cause problem
with multithreaded software.

>
> Check out the source. It's a bit of slow going to get through without
> comments (and those are sparse), but it's actually quite clever about not
> copying arrays unless you must (in the spots I've checked; I haven't read
> the whole file).
>
> --Rex
>

Daniel Sobral

unread,

Oct 27, 2012, 8:31:59 PM10/27/12

to Rex Kerr, Russ P., scala...@googlegroups.com

Mmmmm. I don't see any synchronization around dirty. I'm half of a
mind to break out the tool box and stress test multithreaded append. I
have bad memories of multithread-unsafe "immutable" hash map of Scala
up to 2.7.

Tiark Rompf

unread,

Oct 28, 2012, 7:40:47 AM10/28/12

to nicola...@gmail.com, Daniel Sobral, Rex Kerr, Russ P., scala...@googlegroups.com

On Oct 27, 2012, at 4:51 PM, nicola...@gmail.com wrote:

>> No, actually, that's not what happens. The same underlying array is used,
>> and the next vector is told that the array is filled in one more slot than
>> the first vector. (See the initFrom stuff in VectorPointer.) So you do
>> have to create a new Vector each time you add one, but that just says to
>> look at more and more of the array (until you run out of room, or you do
>> something like remove and then add again, where you need a new array).

Actually, this is not what happens :) for precisely the reason below (avoiding
memory leaks). Updating an element or appending on either end will allocate
a new 32 block. For larger vectors, subsequent updates to the same 32 block will
only copy this block, not all the blocks on the way to the root.

The fasted way to build a vector linearly is by using a VectorBuilder.

- Tiark

Tiark Rompf

unread,

Oct 28, 2012, 7:47:55 AM10/28/12

to nicola...@gmail.com, Daniel Sobral, Rex Kerr, Russ P., scala...@googlegroups.com

On Oct 28, 2012, at 12:40 PM, Tiark Rompf <tiark...@epfl.ch> wrote:

>
> On Oct 27, 2012, at 4:51 PM, nicola...@gmail.com wrote:
>
>>> No, actually, that's not what happens. The same underlying array is used,
>>> and the next vector is told that the array is filled in one more slot than
>>> the first vector. (See the initFrom stuff in VectorPointer.) So you do
>>> have to create a new Vector each time you add one, but that just says to
>>> look at more and more of the array (until you run out of room, or you do
>>> something like remove and then add again, where you need a new array).
>
> Actually, this is not what happens :) for precisely the reason below (avoiding
> memory leaks). Updating an element or appending on either end will allocate
> a new 32 block.

I should clarify that if the old structure has less than 32 elements in the last block,
the new block will not be appended, but replace the current last block. There
will be no 'holes' in the structure.

Simon Ochsenreither

unread,

Oct 28, 2012, 8:00:06 AM10/28/12

to scala...@googlegroups.com, nicola...@gmail.com, Daniel Sobral, Rex Kerr, Russ P.

Is there btw a paper about Vectors in general (I have only seen an older one about VLists by Phil Bagwell and a quite recent one about Concatenation by Phil Bagwell and you)?

Rex Kerr

unread,

Oct 28, 2012, 8:17:41 AM10/28/12

to Tiark Rompf, nicola...@gmail.com, Daniel Sobral, Russ P., scala...@googlegroups.com

Ack, my mistake. I didn't follow the logic all the way through gotoPosWritable. I figured that "goto" did not mean "always copy a bunch of arrays, whether marked dirty or not". And I had misremembered tests I'd done with VectorBuilder as being with :+.

Please ignore everything I've said, except that appending to vectors is a good idea if you care at all about performance. It's fully 8-10x slower than list or ArrayBuffer.

*frantically scrubbing out incorrect assumptions from brain*

--Rex

P.S. (Now the question is why it's not implemented the way I stated for performance.)

nicola...@gmail.com

unread,

Oct 28, 2012, 12:06:59 PM10/28/12

to Rex Kerr, Tiark Rompf, Daniel Sobral, Russ P., scala...@googlegroups.com

>
> P.S. (Now the question is why it's not implemented the way I stated for
> performance.)
>

I think because it is not safe for space.

Rex Kerr

unread,

Oct 28, 2012, 12:27:52 PM10/28/12

to nicola...@gmail.com, Tiark Rompf, Daniel Sobral, Russ P., scala...@googlegroups.com

It is trickier to make it safe for space; you need to do reference counting, basically. I'm now tempted to write Vector as I had imagined it instead of the way it is...ought to be ~6x faster on simple extension and otherwise be pretty much the same except that GC would take two sweeps instead of one to clean out transiently added stuff.

--Rex

nicola...@gmail.com

unread,

Oct 28, 2012, 12:33:38 PM10/28/12

to Rex Kerr, Tiark Rompf, Daniel Sobral, Russ P., scala...@googlegroups.com

On Sun, Oct 28, 2012 at 4:27 PM, Rex Kerr <ich...@gmail.com> wrote:
> It is trickier to make it safe for space; you need to do reference counting,
> basically. I'm now tempted to write Vector as I had imagined it instead of
> the way it is...ought to be ~6x faster on simple extension and otherwise be
> pretty much the same except that GC would take two sweeps instead of one to
> clean out transiently added stuff.
>

You want to add a finaliser to every vector?

Rex Kerr

unread,

Oct 28, 2012, 12:48:25 PM10/28/12

to nicola...@gmail.com, Tiark Rompf, Daniel Sobral, Russ P., scala...@googlegroups.com

I was thinking of a WeakReference scheme. But since I haven't coded it, it could well be that the memory costs are too high.

(Finalizers are a great way to slow things down by orders of magnitude, unfortunately. Conceptually, they're the right solution.)

--Rex

Tiark Rompf

unread,

Oct 28, 2012, 12:49:32 PM10/28/12

to Rex Kerr, nicola...@gmail.com, Daniel Sobral, Russ P., scala...@googlegroups.com

We considered this model for a while -- it's very close to the original VList idea. But we concluded that there's no good way to make it work reliably. There are many hidden costs and of course the knee-jerk response to "ref counting" is "cycles".

- Tiark

Rex Kerr

unread,

Oct 28, 2012, 1:01:18 PM10/28/12

to Tiark Rompf, nicola...@gmail.com, Daniel Sobral, Russ P., scala...@googlegroups.com

You reference "count" the linear extension only. Any branching (i.e. two child vectors with the same parent) causes duplication as now. No cycles.

--Rex

√iktor Ҡlang

unread,

Oct 28, 2012, 1:02:15 PM10/28/12

to Rex Kerr, Tiark Rompf, nicola...@gmail.com, Daniel Sobral, Russ P., scala...@googlegroups.com

Trust me, both finalizers and WeakReferences are things you want to avoid if you want performance.

--
Viktor Klang

Akka Tech Lead

Typesafe - The software stack for applications that scale

Twitter: @viktorklang

Tiark Rompf

unread,

Oct 28, 2012, 1:17:38 PM10/28/12

to √iktor Ҡlang, Rex Kerr, Tiark Rompf, nicola...@gmail.com, Daniel Sobral, Russ P., scala...@googlegroups.com

On Oct 28, 2012, at 6:02 PM, √iktor Ҡlang <viktor...@gmail.com> wrote:

Trust me, both finalizers and WeakReferences are things you want to avoid if you want performance.

These are some of the hidden costs I mentioned.

On Sun, Oct 28, 2012 at 6:01 PM, Rex Kerr <ich...@gmail.com> wrote:

You reference "count" the linear extension only. Any branching (i.e. two child vectors with the same parent) causes duplication as now. No cycles.

Of course you can have cycles:

var myGlobalVector = … // long-lived var

val a = new Array[AnyRef](1000000000)

val v2 = myGlobalVector :+ a

a(0) = v2

Now myGlobalVector points to a, which points to v2, so v2 is not GC'ed and therefore a is a GB-size memory leak.

- Tiark

Rex Kerr

unread,

Oct 28, 2012, 1:31:29 PM10/28/12

to Tiark Rompf, √iktor Ҡlang, scala...@googlegroups.com

Sorry, I mean you won't have any _extra_ worry about cycles due to the reference-counting scheme. The GC (and programmer) of course have to worry about cycles.

Viktor: WeakReference doesn't look that bad in my limited use and limited benchmarking. It's a wrapper, so yes, there's a cost to having an extra wrapper. Otherwise, it doesn't seem to be worse than any other indirection, whereas finalize makes GC take considerably longer (~3-4x in my unrealistic benchmarks).

--Rex

ScottC

unread,

Oct 28, 2012, 3:30:45 PM10/28/12

to scala...@googlegroups.com, Tiark Rompf, √iktor Ҡlang

Yes, avoid finalizers at all costs where performance is a concern.

WeakReference >> finalize()

With weak references, you only need to keep the minimal information necessary for cleanup, and can define your own strategy for when that happens after GC (e.g. multiple threads, asynchronous, synchronous). With finalizers you rely on the JVM's finalizer proces (a single low priority thread in OracleJDK/OpenJDK).

The reason for the performance difference is simple: a finalizer makes the GC have to treat the object as live for longer -- when it is first not reachable it MUST be copied and kept around for at least one more GC until after it is finalized. Therefore, it takes at minimum two GC cycles to remove from the heap.

WeakReferences let the object being referenced be cleaned up on the first GC pass, only the WeakReference object is left in memory and in many use cases this is a much smaller object.

Although neither provides predictable cleanup, in practice WeakReference leads to much more consistent and reliable behavior.

I once replaced a system that used finalizers to track objects holding on to C++ resources via JNI to one using WeakReferences. Performance of the application improved under heavy load by about a factor of 3, and memory use decreased by 60%.

√iktor Ҡlang

unread,

Oct 28, 2012, 5:21:18 PM10/28/12

to Rex Kerr, Tiark Rompf, scala...@googlegroups.com

On Sun, Oct 28, 2012 at 6:31 PM, Rex Kerr <ich...@gmail.com> wrote:

Sorry, I mean you won't have any _extra_ worry about cycles due to the reference-counting scheme. The GC (and programmer) of course have to worry about cycles.

Viktor: WeakReference doesn't look that bad in my limited use and limited benchmarking. It's a wrapper, so yes, there's a cost to having an extra wrapper. Otherwise, it doesn't seem to be worse than any other indirection, whereas finalize makes GC take considerably longer (~3-4x in my unrealistic benchmarks).

Perhaps I have different performance needs from most other people. :-)

Cheers,

√

Rex Kerr

unread,

Oct 28, 2012, 5:35:36 PM10/28/12

to √iktor Ҡlang, Tiark Rompf, scala...@googlegroups.com

On Sun, Oct 28, 2012 at 5:21 PM, √iktor Ҡlang <viktor...@gmail.com> wrote:

On Sun, Oct 28, 2012 at 6:31 PM, Rex Kerr <ich...@gmail.com> wrote:
Sorry, I mean you won't have any _extra_ worry about cycles due to the reference-counting scheme. The GC (and programmer) of course have to worry about cycles.

Viktor: WeakReference doesn't look that bad in my limited use and limited benchmarking. It's a wrapper, so yes, there's a cost to having an extra wrapper. Otherwise, it doesn't seem to be worse than any other indirection, whereas finalize makes GC take considerably longer (~3-4x in my unrealistic benchmarks).

Perhaps I have different performance needs from most other people. :-)

Cheers,
√

Possibly! But your typical use is likely closer to a Vector-user's than is mine. My inner loops can't use immutable anything because I can't afford the GC/object creation time. Arithmetic on piles of primitive numbers calls for a pretty narrow set of optimization strategies. I'd trust your instincts more than mine here. (But I might trust a good benchmark even more. Mine's just enough to make me uncertain.)

--Rex

√iktor Ҡlang

unread,

Oct 28, 2012, 5:50:10 PM10/28/12

to Rex Kerr, Tiark Rompf, scala...@googlegroups.com

So you want optimal single-thread performance, I need optimal multithread performance :-)

--Rex

Reply all

Reply to author

Forward