Performance and complexity

Cédric Beust ♔

unread,

Jul 19, 2011, 5:07:47 PM7/19/11

to scala-user

Hi everyone,

I became intrigued with a problem that Mark Nelson posted to the list last week. Here is a link to the discussion.

The problem is simple: eliminate all the occurrences of the sequence (26, 20, 64) from a list. For example, given [1, 2, 26, 20, 64, 3, 26, 20, 64, 4], the result should be [1, 2, 3, 4].

Of all the solutions proposed, only the one proposed by Ruediger Keller worked:

def eliminate[@specialized(Byte) T : ClassManifest](xs: Array[T],
slice: Seq[T]): Array[T] = {
val i = xs indexOfSlice slice
if(i == -1) xs else xs.take(i) ++ eliminate(xs.drop(i + slice.length), slice)

}

Daniel Sobral also offered a correct solution but it only works on Byte, so not really applicable here.

I became curious to see what kind of complexity the functional solution above is and how it compares to the trivial and linear imperative version (no need for Boyer-Moore, really):

private static List<Integer> eliminate(List<Integer> value) {

List<Integer> result = Lists.newArrayList();

for (int i = 0; i < value.size(); i++) {

if (value.get(i) == SLICE.get(0) && value.get(i + 1) == SLICE.get(1)

&& value.get(i + 2) == SLICE.get(2)) {

i += 2;

} else {

result.add(value.get(i));

}

return result;

}

I decided to run a few benchmarks and the results were surprising. I tried with various sizes of elements (10k, 100k, 1M, 10M) and with a fixed number of sequences to remove from them (3).

The imperative version solves the problem for lists up to 10M of elements in a few milliseconds.

However, the Scala version above takes 2 seconds for 1000 elements and doesn't finish after a few minutes for 10k elements.

Can someone suggest additional functional solutions to this problem that can at least approach the linear performance of the imperative version?

Also, what are the complexities of these functional solutions?

--

Cédric

Josh Suereth

unread,

Jul 19, 2011, 5:21:51 PM7/19/11

to Cédric Beust ♔, scala-user

Your Java version doesn't abstract over the size of the slice.

I'm working on a better scala version, but part of the pain is avoiding the scala.Stream class right now. I should have something shortly.

- Josh

2011/7/19 Cédric Beust ♔ <ced...@beust.com>

Rex Kerr

unread,

Jul 19, 2011, 5:37:37 PM7/19/11

to Cédric Beust ♔, scala-user

The Scala version uses concat on arrays, which, since concat is O(n), is a O(n^2) operation. Oh, and it's not tail-recursive, which will clobber your stack. (I mentioned this already in that thread.)

It's not hard to avoid those problems:

def eliminated[@specialized(Byte) T: ClassManifest](
xs: Array[T], slice: Seq[T], found: List[Array[T]] = Nil

): Array[T] = {
val i = xs indexOfSlice slice

if (i < 0) (xs :: found).toArray.reverse.flatten
else eliminated(xs.drop(i+slice.length), slice, xs.take(i) :: found)
}

--Rex

2011/7/19 Cédric Beust ♔ <ced...@beust.com>

Dave

unread,

Jul 19, 2011, 5:41:45 PM7/19/11

to scala-user

Maybe you could optimize the fucntion it for tail call recursion

package eliminate

import scala.annotation.tailrec

object Main extends App {

@tailrec
def eliminate[@specialized(Byte) T : ClassManifest](acc: Array[T], xs:
Array[T], slice: Seq[T]): Array[T] = {

val i = xs indexOfSlice slice

i match {
case -1 => acc ++ xs
case _ => eliminate(acc ++ xs.take(i), xs.drop(i + slice.length),
slice)
}
}

val acc : Array[Byte] = Array.empty
val ar = List[Byte](1, 2, 26, 20, 64, 3, 26, 20, 64, 4).toArray
val sl = Seq[Byte](26, 20, 64)
val el = eliminate[Byte](acc, ar, sl)
println(el.toList)

}

Derek Williams

unread,

Jul 19, 2011, 6:59:06 PM7/19/11

to Cédric Beust ♔, scala-user

This seems plenty fast (written quickly so variable names suffer):

def eliminate[A](seq: Seq[A], slice: Seq[A]): List[A] = {
val sliceList = slice.toList
def iter(xs: List[A], ys: List[A], result: List[A], fallback:
List[A]): List[A] = xs match {
case xs if ys.isEmpty => iter(xs, sliceList, result, result)
case Nil => fallback.reverse
case x :: xs if x == ys.head => iter(xs, ys.tail, result, x :: fallback)
case x :: xs =>
val newResult = x :: result
iter(xs, sliceList, newResult, newResult)
}
iter(seq.toList, sliceList, Nil, Nil)
}

Only tested with an Array[Byte] up to 1M, and it was just the original
10 digits repeated. Didn't time it, only tested within the REPL, but
it returns fast even if I convert the List back to an Array.

--
Derek Williams

Derek Williams

unread,

Jul 19, 2011, 7:20:05 PM7/19/11

to Derek Williams, scala-user, Cédric Beust ♔

Woops, little bug there. newResult should be built on fallback, not result.

Derek Williams

Stefan Wagner

unread,

Jul 19, 2011, 10:41:27 PM7/19/11

to scala...@googlegroups.com

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Am 20.07.2011 01:20, schrieb Derek Williams:
> Woops, little bug there. newResult should be built on fallback, not result.

and a second one:

eliminate (Seq(1, 2, 3, 4, 3, 4, 2, 5), Seq(3, 4, 2))
res168: List[Int] = List(1, 2, 3, 4, 3, 4, 2, 5)

easy to find, since I shortly thought a fast solution ... looking for a
3-element pattern from the back, and jumping length(pattern) forward, if
it doesn't match, but: The end of the pattern might match in the
beginning again.

def eliminate[A](seq: Seq[A], slice: Seq[A]): List[A] = {
val sliceList = slice.toList
def iter(xs: List[A], ys: List[A], result: List[A], fallback:
List[A]): List[A] = xs match {
case xs if ys.isEmpty => iter(xs, sliceList, result, result)
case Nil => fallback.reverse
case x :: xs if x == ys.head => iter(xs, ys.tail, result, x :: fallback)
case x :: xs =>

val newResult = x :: fallback

iter(xs, sliceList, newResult, newResult)
}
iter(seq.toList, sliceList, Nil, Nil)
}

I understood ^.

- --

Tschööö--->...Stefan
- ---------------------------
Don't visit my homepage at:
http://home.arcor-online.net/hirnstrom
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk4mQFcACgkQQeATqGpDnRqnUACfZ4FGGSyHweMO2aFwgdsCN4D0
KoIAnAwVGaUl5w+jCAET9TA7EAXZpbBO
=Cy+t
-----END PGP SIGNATURE-----

Derek Williams

unread,

Jul 19, 2011, 10:59:39 PM7/19/11

to Stefan Wagner, scala...@googlegroups.com

On Tue, Jul 19, 2011 at 8:41 PM, Stefan Wagner <hirn...@arcor.de> wrote:
> easy to find, since I shortly thought a fast solution ... looking for a
> 3-element pattern from the back, and jumping length(pattern) forward, if
> it doesn't match, but: The end of the pattern might match in the
> beginning again.

Good catch. I knew I was missing something. I actually had it checking
for that before and I must have "optimized" it away. This one works:

def eliminate[A](seq: Seq[A], slice: Seq[A]): List[A] = {
val sliceList = slice.toList
def iter(xs: List[A], ys: List[A], result: List[A], fallback:
List[A]): List[A] = xs match {
case xs if ys.isEmpty => iter(xs, sliceList, result, result)
case Nil => fallback.reverse
case x :: xs if x == ys.head => iter(xs, ys.tail, result, x ::
fallback)

case x :: xs if ys eq sliceList =>

val newResult = x :: result
iter(xs, sliceList, newResult, newResult)

case xs =>
iter(xs, sliceList, fallback, fallback)
}
iter(seq.toList, sliceList, Nil, Nil)
}

--
Derek Williams

martin odersky

unread,

Jul 20, 2011, 3:12:55 AM7/20/11

to Derek Williams, Stefan Wagner, scala...@googlegroups.com

I should also say, there's really nothing wrong in just coding the imperative algorithm in Scala. It's not a sacrilege to use imperative code, in particular if it's inside a function. Sometimes it's the clearest way to express things. Many other times it is not.

Generally, it is much more important for a system that the interfaces are functional than that the implementations are. Or, put otherwise, a local mutable variable is no big deal, but think twice before introducing a global one!

For instance, if you look inside the scala.collection.immutable.List class you find that most methods use a local mutable ListBuffer and a while loop to fill it.
It's done this way to make these list operations not blow the stack for long lists.
So we have an imperative implementation that ensures good functional behavior on the outside. Compare with OCaml, where lists are purely functional but blow the stack for lists longer than some 1000's of elements.

Cheers

-- Martin

Alex Cruise

unread,

Jul 20, 2011, 3:55:39 AM7/20/11

to scala-user

On Wed, Jul 20, 2011 at 12:12 AM, martin odersky <martin....@epfl.ch> wrote:

... a local mutable variable is no big deal, but think twice before introducing a global one!

I wish I could get a compiler warning when a local mutable variable escapes into a closure... Oh, effect system, will you ever be mine? ;)

-0xe1a

Sadache Aldrobi

unread,

Jul 20, 2011, 4:06:37 AM7/20/11

to martin odersky, Derek Williams, Stefan Wagner, scala...@googlegroups.com

My 2 cents.

def trimStart[A](slice:Seq[A],original:Seq[A]):Seq[A] = {

@scala.annotation.tailrec

def trimSt(sliceLeft:Seq[A],l:Seq[A]):Seq[A] = (sliceLeft,l) match {

case (Nil,_) => l

case (_,Nil) => original

case (h1::t1,h2::t2) => if(h1 == h2) trimSt(t1,t2) else original

}

trimSt(slice,original)

}

def removeOccurence[A](slice:Seq[A],original:Seq[A])= {

@scala.annotation.tailrec

def remove(leftOriginal:Seq[A],result:Seq[A]):Seq[A]=

trimStart(slice,leftOriginal) match{

case(h::tail) => remove(tail,h +: result)

case(Nil) => result.reverse

}

remove(original,Nil)

}

worth noting that you can use a buffer for the result in the second method instead of a Seq (Or even an immutable structure that allows adding elements to the end) so that you don't require the last result.reverse.

Also I am sure it can be written with fewer lines, I just happen to be late for work :)

Sadek

--
www.sadekdrobi.com
ʎdoɹʇuǝ

Sadache Aldrobi

unread,

Jul 20, 2011, 5:20:59 AM7/20/11

to scala...@googlegroups.com

I forgot a call :p

def trimStart[A](slice:Seq[A],original:Seq[A]):Seq[A] = {

@scala.annotation.tailrec

def trimSt(sliceLeft:Seq[A],l:Seq[A]):Seq[A] = (sliceLeft,l) match {

case (Nil,l) => trimStart(slice,l)

--
www.sadekdrobi.com
ʎdoɹʇuǝ

Sadache Aldrobi

unread,

Jul 20, 2011, 5:43:53 AM7/20/11

to scala...@googlegroups.com

I will stop spamming the list, but here is a version with all recursive calls optimized

def trimStart[A](slice:Seq[A],list:Seq[A]):Seq[A] = {

@scala.annotation.tailrec

def trimSt(sliceLeft:Seq[A],l:Seq[A],original:Seq[A]=list):Seq[A] = (sliceLeft,l) match {

case (Nil,l) => trimSt(slice,l,l)

case (_,Nil) => original

case (h1::t1,h2::t2) => if(h1 == h2) trimSt(t1,t2) else original

}

trimSt(slice,list,list)

--
www.sadekdrobi.com
ʎdoɹʇuǝ

martin odersky

unread,

Jul 20, 2011, 4:35:56 PM7/20/11

to Cédric Beust ♔, scala-user

Like I said, if performance matters and the array is big, I would use the imperative solution. If simplicity and elegance matters, I think the most straightforward solution is to use pattern matching over lists:

def elim(xs: List[Int]): List[Int] = xs match {
    case 26 :: 20 :: 64 :: rest => elim(rest)
    case x :: xs1 => x :: elim(xs1)
    case Nil => Nil
}

Cheers

-- Martin

Cédric Beust ♔

unread,

Jul 20, 2011, 5:14:18 PM7/20/11

to martin odersky, scala-user

Hi Martin,

Thanks, that's the most intuitive solution so far, although I see two slight problems with it: 1) it will only work if the slice is known at compile time and 2) it's not tail recursive, so it will most likely blow up the stack on big lists.

Unless I'm missing something?

--

Cédric

2011/7/20 martin odersky <martin....@epfl.ch>

Germán Ferrari

unread,

Jul 20, 2011, 11:12:27 PM7/20/11

to Cédric Beust ♔, scala-user

Hi,

What about this version?

def elim(xs: List[Int], slice: List[Int]): Vector[Int] = {

@tailrec

def elimAcc(xs: List[Int], acc: Vector[Int]): Vector[Int] = xs match {

case xs if(xs.take(3) == slice) => elimAcc(xs drop 3, acc)

case x :: xs1 => elimAcc(xs1, acc :+ x)

case Nil => acc

}

elimAcc(xs, Vector.empty[Int])

}

Regards,

Germán

2011/7/20 Cédric Beust ♔ <ced...@beust.com>

Tony Morris

unread,

Jul 21, 2011, 12:29:52 AM7/21/11

to scala...@googlegroups.com

I mostly agree. If performance matters, and you are doing prefix
matching on a cons list, you might want to consider a more appropriate
data structure. This inappropriate use of data structures is completely
aside from the functional/imperative false dilemma.

http://paste.pocoo.org/show/443309/

--
Tony Morris
http://tmorris.net/

Derek Williams

unread,

Jul 21, 2011, 1:14:47 AM7/21/11

to tmo...@tmorris.net, scala...@googlegroups.com

Do you suggest working with Arrays? Or is there a better alternative?

At the defense of List, I have found prepending to a List, followed by reverse, to be faster than appending to most others, especially other immutable collections.

Derek Williams

Tony Morris

unread,

Jul 21, 2011, 1:19:31 AM7/21/11

to Derek Williams, scala...@googlegroups.com

On 21/07/11 15:14, Derek Williams wrote:
>
> Do you suggest working with Arrays? Or is there a better alternative?
>

This depends ultimately on all the operations required, including how
the "List" came to be in the first place. Certainly, prefix matching on
a cons list (or even a mutable list) for anything but a trivial data set
is not particularly noble and then measuring the performance is... a bit
ridonkulous.

A prefix tree (trie) may be more appropriate (of which there are many
variations), or perhaps even a Burkhard-Keller Tree (a bit of a stretch
there -- need to know all operations).

> At the defense of List, I have found prepending to a List, followed by
> reverse, to be faster than appending to most others, especially other
> immutable collections.
>

Scalaz 7 will include a list with O(1) cons and snoc (at the expense of
other operations). It took some gymnastics to get there though.

Derek Williams

unread,

Jul 21, 2011, 1:25:03 AM7/21/11

to tmo...@tmorris.net, Derek Williams, scala...@googlegroups.com

On Jul 20, 2011 11:19 PM, "Tony Morris" <tonym...@gmail.com> wrote:
> Scalaz 7 will include a list with O(1) cons and snoc (at the expense of
> other operations). It took some gymnastics to get there though.

I'll definitely have to check that out, thanks Tony.

Tony Morris

unread,

Jul 21, 2011, 1:31:49 AM7/21/11

to Derek Williams, scala...@googlegroups.com

https://github.com/scalaz/scalaz/blob/scalaz7/core/src/main/scala/scalaz/Dequeue.scala

It is similar to DList[1], which would unfortunately blow the stack for any large input, so is instead modelled with a pair of lists[2]. Mark Hibberd wrote most of it.

[1]
case class Endo[A](e: A => A)
case class DList[A](k: Endo[List[A]])
http://hackage.haskell.org/package/dlist

[2]
case class Endo[A](e: A => A)
case class Dequeue[A](pre: List[Endo[List[A]]], post: List[Endo[List[A]]])

martin odersky

unread,

Jul 22, 2011, 6:56:44 AM7/22/11

to Cédric Beust ♔, scala-user

2011/7/20 Cédric Beust ♔ <ced...@beust.com>

Hi Martin,

Thanks, that's the most intuitive solution so far, although I see two slight problems with it: 1) it will only work if the slice is known at compile time and 2) it's not tail recursive, so it will most likely blow up the stack on big lists.

Unless I'm missing something?

Not at all. It's not tail recursive, and I would not use on it on very long lists. The first problem (generalize to arbitrary subsequences) is solved by straightforward generalization. The following works for any slice:

def elim(xs: List[Int], slice: Seq[Int]): List[Int] = xs match {
    case _ if xs startsWith slice => xs drop slice.length
    case x :: xs1 => x :: elim(xs1, slice)
    case Nil => Nil
}

To come back to the original problem (long array, performance matters), I would simply fall back on an imperative solution, which is actually also pretty nice:

def elim(xs: Array[Byte], slice: Seq[Byte]): Array[Byte] = {
    val buf = new ArrayBuffer[Byte]
    var i = 0
    while (i < xs.length)
      if (xs startsWith (slice, i)) i += slice.length
      else { buf += xs(i); i += 1 }
    buf.toArray
}

Cheers

-- Martin

Reply all

Reply to author

Forward