Removing an element from an immutable Vector

2,759 views
Skip to first unread message

Oliver Ruebenacker

unread,
Feb 7, 2013, 1:52:59 PM2/7/13
to scala-user
Hello,

I want to remove an element from an immutable Vector. I know the index.

I know I can do:

scala> val vec = Vector(1, 2, 3, 4, 5)
vec: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4, 5)

scala> val vec2 = vec.slice(0, 2) ++ vec.drop(3)
vec2: scala.collection.immutable.Vector[Int] = Vector(1, 2, 4, 5)

Just feels there should be a simpler way.

Thanks!

Take care
Oliver

--
IT Project Lead at PanGenX (http://www.pangenx.com)
The purpose is always improvement

Jason Zaugg

unread,
Feb 7, 2013, 2:00:06 PM2/7/13
to Oliver Ruebenacker, scala-user
On Thu, Feb 7, 2013 at 7:52 PM, Oliver Ruebenacker <cur...@gmail.com> wrote:
     Hello,

  I want to remove an element from an immutable Vector. I know the index.

  I know I can do:

  scala> val vec = Vector(1, 2, 3, 4, 5)
  vec: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4, 5)

  scala> val vec2 = vec.slice(0, 2) ++ vec.drop(3)
  vec2: scala.collection.immutable.Vector[Int] = Vector(1, 2, 4, 5)

  Just feels there should be a simpler way.

Here's one way:

scala> Vector(1, 2, 3, 4, 5) patch (from = 2, patch = Nil, replaced = 1)
res4: scala.collection.immutable.Vector[Int] = Vector(1, 2, 4, 5)

 -jason

Oliver Ruebenacker

unread,
Feb 7, 2013, 2:17:19 PM2/7/13
to Jason Zaugg, scala-user
Hello,
Thanks! This is simpler and - I suspect - better optimized.

Rex Kerr

unread,
Feb 7, 2013, 2:22:59 PM2/7/13
to Jason Zaugg, Oliver Ruebenacker, scala-user
The fastest but not simplest way I know of is actually to filter.  You can't intrinsically filter by index, but if you

class IsN[A](n: Int) extends (A => Boolean) {
  private[this] var i = -1
  def apply(a: A): Boolean = { i += 1; i==n }
}

then you can

val n = new IsN[Int](2)
val vec2 = vec.filter(n)

which is about twice as fast as what you're doing.

Jason's solution splits the difference in speed, so is probably what you should go for.

  --Rex

On Thu, Feb 7, 2013 at 2:00 PM, Jason Zaugg <jza...@gmail.com> wrote:

--
You received this message because you are subscribed to the Google Groups "scala-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-user+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Rex Kerr

unread,
Feb 7, 2013, 2:23:42 PM2/7/13
to Jason Zaugg, Oliver Ruebenacker, scala-user
That should be filterNot.  (Or use IsntN with i!=n)

Oliver Ruebenacker

unread,
Feb 7, 2013, 2:36:25 PM2/7/13
to Rex Kerr, Jason Zaugg, scala-user
Hello,

Why is this faster? This needs to look at, and make a decision
about, every element, unlike the other solutions.

Take care
Oliver

Erik Osheim

unread,
Feb 7, 2013, 2:46:42 PM2/7/13
to Oliver Ruebenacker, scala-user
On Thu, Feb 07, 2013 at 02:36:25PM -0500, Oliver Ruebenacker wrote:
> Why is this faster? This needs to look at, and make a decision
> about, every element, unlike the other solutions.

So without studying the code in detail here is my guess (someone else
may follow up with a more detailed analysis):

Every time you want to add or remove parts of a vector, a new vector
has to be created. In the case where you're appending (or popping off
the end) this can be pretty fast (involving copying a 32-element
array 1-6 times and changing 6 entries).

However, in cases where you want to remove from the beginning or middle
of a Vector, or when you want to concatenate vectors, you end up having
to "shift" significant portions of the inner array(s) which is more
expensive.

So for instance, if you call drop(1) on a vector it will need to run
through the entire vector shifting everything over by 1, through the
nested tree of arrays. This isn't necessarily any faster than a map or
filter call. Similarly, when you concatenate vectors it will have to
copy the lhs vector (relatively fast) then shift the rhs
element-by-element into the copy. It might use an iterator or some
other method but it will be relatively expensive.

By contrast, a filter will just allocate a new empty (and mutable)
vector, and iterate through the existing vector copying (or not) the
values. You could certainly write a delete method that would be faster
than filter, but it seems reasonable to me that concatenating vectors
is unlikely to be faster than a filter or map.

-- Erik

Rex Kerr

unread,
Feb 7, 2013, 2:47:14 PM2/7/13
to Oliver Ruebenacker, Jason Zaugg, scala-user
Testing takes very little time.  Filter builds the vector in one sweep.  Patch does also, but for the tail it uses a view which slows things down on every element access.

  --Rex

Rüdiger Klaehn

unread,
Feb 7, 2013, 2:53:33 PM2/7/13
to Rex Kerr, Oliver Ruebenacker, Jason Zaugg, scala-user
Didn't Phil Bagwell propose an update to the vector data structure that would make things like concatenation or splice able to reuse the tree? 


Is Vector going to be updated to this algorithm at some point in the future? That would make something like splice pretty fast, right?

pagoda_5b

unread,
Feb 7, 2013, 4:26:46 PM2/7/13
to scala...@googlegroups.com
Is 

  Vector(1,2,3,4,5).zipWithIndex.filterNot(_._2 == 2).unzip._1

terribly expensive?

Ivano

Rex Kerr

unread,
Feb 7, 2013, 4:50:03 PM2/7/13
to pagoda_5b, scala...@googlegroups.com
Takes about twice as long as anything else.
  --Rex

--

Juha Heljoranta

unread,
Feb 7, 2013, 4:49:50 PM2/7/13
to Rüdiger Klaehn, scala...@googlegroups.com, Rex Kerr, Oliver Ruebenacker, Jason Zaugg
On Thursday, February 07, 2013 20:53:33 Rüdiger Klaehn wrote:
> That would make something like splice pretty fast, right?

Yep, slice and concatenation operations become pretty fast operations when
they managed to get it into Scala library.
I benchmarked rrb based Vector some time a go with my red-black tree based
Vector implementation:
https://subtype.wordpress.com/2012/10/20/spinning-the-vectors/
Updated results:
http://koti.kapsi.fi/~seqas/barray-2012-10-30/

Cheers,
Juha

Erik Osheim

unread,
Feb 7, 2013, 5:02:59 PM2/7/13
to pagoda_5b, scala...@googlegroups.com
On Thu, Feb 07, 2013 at 01:26:46PM -0800, pagoda_5b wrote:
> Is
>
> Vector(1,2,3,4,5).zipWithIndex.filterNot(_._2 == 2).unzip._1
>
> terribly expensive?

I guess it depends what you mean. Here's what (I think) happens:

(given vector with n Ints)

1. burn through the vector, allocating n tuples, building new vector
with them

2. burn through the new vector filtering out exactly one tuple,
building an n-1 length vector

3. allocate two new vectors (n-1 length), burn through filtered vector copying
into two new vectors

4. take the first new vector

So, we've allocated 4 new vectors of length ~n, as well as allocated n
tuples, and iterated across ~n values 3 times. It certainly doesn't
strike me as the most efficient way to do this.

It's ugly, but my guess is that one of the fastest solutions might be
something like this:

def delete[A](v: Vector[A], n: Int): Vector[A] = {
if (n < 0) sys.error("argh")

val it = v.toIterator

val it2 = new Iterator[A] {
var i = if (n == 0) {
if (it.hasNext) it.next
-1
} else {
n
}

def hasNext: Boolean = it.hasNext

def next: A = {
val a = it.next
if (i < 0) {
} else if (i > 1) {
i -= 1
} else {
i = -1
if (it.hasNext) it.next
}
a
}
}
it2.toVector
}

The iterator filters the index as it goes, so the new vector should be
able to be built in a single pass (thus, one traversal of ~n elements,
one vector allocation, nothing else).

-- Erik

Erik Osheim

unread,
Feb 7, 2013, 5:08:11 PM2/7/13
to pagoda_5b, scala...@googlegroups.com
On Thu, Feb 07, 2013 at 05:02:59PM -0500, Erik Osheim wrote:
> It's ugly, but my guess is that one of the fastest solutions might be
> something like this:

(I mean one of the fastest for the existing Vector, where the delete
operation has to be eager. One can imagine other ways of writing Vector
that would allow you to lazily accumulate deletes, or do other kinds of
smart concatenation.)

-- Erik

Rex Kerr

unread,
Feb 7, 2013, 5:28:33 PM2/7/13
to Erik Osheim, pagoda_5b, scala...@googlegroups.com
It's almost always faster to use the builder, and foreach usually beats iterator for vectors.  The fastest I know of is this:

def delete2[A](v: Vector[A], n: Int): Vector[A] = {
   val b = v.companion.newBuilder[A]
   var i = 0
   v.foreach{ x => if (i != n) { b += x }; i += 1 }
   b.result
 }

Yours takes ~1.5x longer, mine with IsN takes 2x longer, patch takes 3x longer, zipWithIndex takes 5x longer.

  --Rex



-- Erik

Erik Osheim

unread,
Feb 7, 2013, 6:23:17 PM2/7/13
to Rex Kerr, pagoda_5b, scala...@googlegroups.com
On Thu, Feb 07, 2013 at 05:28:33PM -0500, Rex Kerr wrote:
> It's almost always faster to use the builder, and foreach usually beats
> iterator for vectors. The fastest I know of is this:

Ah, that's a better solution, thanks!

Seems like I'm often forgetting about builders.

-- Erik

Tim P

unread,
Apr 1, 2014, 6:44:27 AM4/1/14
to scala...@googlegroups.com, Erik Osheim, pagoda_5b
Sorry to reawaken an ancient thread, but given the plethora of alternate techniques for this operation and quite possible the frequent use case, is there any reason why this is not built into the api using the fastest available method? Surely we shouldn't have to run around benchmarking different operations to find the best way to delete an element. Is there a good reason for the omission? (but PS thanks to Rex et al who contributed to this solution)
Tim

Rex Kerr

unread,
Apr 1, 2014, 4:27:03 PM4/1/14
to Tim P, scala-user, Erik Osheim, pagoda_5b
This could be said for every collections operation: why isn't it already the fastest, most efficient possible?

Quite simply because someone has to go around benchmarking all the alternatives to know which is the fastest.  Otherwise, the default is to use the solution that is clearest or which has the best code reuse without being obviously inefficient.

  --Rex



For more options, visit https://groups.google.com/d/optout.

Oliver Ruebenacker

unread,
Dec 9, 2014, 8:06:53 AM12/9/14
to Rex Kerr, Tim P, scala-user, Erik Osheim, pagoda_5b

     Hello,

  Presumably, the larger the Vector, the more efficient it would be to reuse part of the old Vector. I wonder at what size it may become more efficient than rebuilding? Thanks!

     Best, Oliver
--
Oliver Ruebenacker
Solutions Architect at Altisource Labs
Be always grateful, but never satisfied.
Reply all
Reply to author
Forward
0 new messages