Performance and complexity
 22 messages

From: Cédric Beust ♔ <ced...@beust.com>
Date: Tue, 19 Jul 2011 14:07:47 -0700
Local: Tues, Jul 19 2011 5:07 pm
Subject: Performance and complexity

Hi everyone,

I became intrigued with a problem that Mark Nelson posted to the list last
week. Here is a link to the
.

The problem is simple: eliminate all the occurrences of the sequence (26,
20, 64) from a list. For example, given [1, 2, 26, 20, 64, 3, 26, 20, 64,
4], the result should be [1, 2, 3, 4].
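
A direct, naive reading of that spec can be sketched as follows (the helper name `spec` is illustrative, not from the thread): find the first occurrence of the slice, keep what precedes it, and continue after it.

```scala
// Naive sketch of the spec: split at the first occurrence of the slice
// and recurse on the remainder. Not optimized, and not tail-recursive.
def spec(xs: List[Int], slice: Seq[Int]): List[Int] = {
  val i = xs indexOfSlice slice
  if (i < 0) xs
  else xs.take(i) ++ spec(xs.drop(i + slice.length), slice)
}
```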

Of all the solutions proposed, only the one proposed by Ruediger Keller
worked:

def eliminate[@specialized(Byte) T : ClassManifest](xs: Array[T], slice: Seq[T]): Array[T] = {
  val i = xs indexOfSlice slice
  if (i == -1) xs
  else xs.take(i) ++ eliminate(xs.drop(i + slice.length), slice)
}

Daniel Sobral also offered a correct solution, but it only works on Byte,
so it's not really applicable here.

I became curious to see what the complexity of the functional solution
above is and how it compares to the trivial, linear imperative version
(no need for Boyer-Moore, really):

private static List<Integer> eliminate(List<Integer> value) {
  List<Integer> result = Lists.newArrayList();
  for (int i = 0; i < value.size(); i++) {
    if (i + 2 < value.size()
        && value.get(i) == SLICE.get(0)
        && value.get(i + 1) == SLICE.get(1)
        && value.get(i + 2) == SLICE.get(2)) {
      i += 2;
    } else {
      result.add(value.get(i));
    }
  }
  return result;
}

I decided to run a few benchmarks and the results were surprising. I tried
with various sizes of elements (10k, 100k, 1M, 10M) and with a fixed number
of sequences to remove from them (3).

The imperative version solves the problem for lists up to 10M of elements in
a few milliseconds.

However, the Scala version above takes 2 seconds for 1000 elements and
doesn't finish after a few minutes for 10k elements.
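
Numbers like these can be reproduced with a rough timing helper along the following lines (an illustrative sketch, not the harness actually used; it does no JIT warmup or repetition, so its figures are only indicative):

```scala
// Rough timing helper: runs the body once and prints elapsed wall time.
def time[A](label: String)(body: => A): A = {
  val start = System.nanoTime()
  val result = body
  println(label + ": " + ((System.nanoTime() - start) / 1000000) + " ms")
  result
}
```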

Can someone suggest additional functional solutions to this problem that can
at least approach the linear performance of the imperative version?

Also, what are the complexities of these functional solutions?

--
Cédric

From: Josh Suereth <joshua.suer...@gmail.com>
Date: Tue, 19 Jul 2011 17:21:51 -0400
Local: Tues, Jul 19 2011 5:21 pm
Subject: Re: [scala-user] Performance and complexity

Your Java version doesn't abstract over the size of the slice.

I'm working on a better scala version, but part of the pain is avoiding the
scala.Stream class right now.   I should have something shortly.

- Josh

2011/7/19 Cédric Beust ♔ <ced...@beust.com>

From: Rex Kerr <icho...@gmail.com>
Date: Tue, 19 Jul 2011 17:37:37 -0400
Local: Tues, Jul 19 2011 5:37 pm
Subject: Re: [scala-user] Performance and complexity

The Scala version uses concat on arrays, which, since concat is O(n), makes
it an O(n^2) operation.  Oh, and it's not tail-recursive, which will clobber
your stack on long inputs.
It's not hard to avoid those problems:

def eliminated[@specialized(Byte) T: ClassManifest](
    xs: Array[T], slice: Seq[T], found: List[Array[T]] = Nil
): Array[T] = {
  val i = xs indexOfSlice slice
  if (i < 0) (xs :: found).toArray.reverse.flatten
  else eliminated(xs.drop(i + slice.length), slice, xs.take(i) :: found)
}

--Rex

2011/7/19 Cédric Beust ♔ <ced...@beust.com>

From: Dave <dave.mahabiers...@hotmail.com>
Date: Tue, 19 Jul 2011 14:41:45 -0700 (PDT)
Local: Tues, Jul 19 2011 5:41 pm
Subject: Re: Performance and complexity
Maybe you could optimize the function for tail recursion:

package eliminate

import scala.annotation.tailrec

object Main extends App {

  @tailrec
  def eliminate[@specialized(Byte) T : ClassManifest](acc: Array[T], xs: Array[T], slice: Seq[T]): Array[T] = {
    val i = xs indexOfSlice slice
    i match {
      case -1 => acc ++ xs
      case _  => eliminate(acc ++ xs.take(i), xs.drop(i + slice.length), slice)
    }
  }

  val acc: Array[Byte] = Array.empty
  val ar = List[Byte](1, 2, 26, 20, 64, 3, 26, 20, 64, 4).toArray
  val sl = Seq[Byte](26, 20, 64)
  val el = eliminate[Byte](acc, ar, sl)
  println(el.toList)
}

From: Derek Williams <de...@nebvin.ca>
Date: Tue, 19 Jul 2011 16:59:06 -0600
Local: Tues, Jul 19 2011 6:59 pm
Subject: Re: [scala-user] Performance and complexity
This seems plenty fast (written quickly so variable names suffer):

def eliminate[A](seq: Seq[A], slice: Seq[A]): List[A] = {
  val sliceList = slice.toList
  def iter(xs: List[A], ys: List[A], result: List[A], fallback: List[A]): List[A] = xs match {
    case xs if ys.isEmpty => iter(xs, sliceList, result, result)
    case Nil => fallback.reverse
    case x :: xs if x == ys.head => iter(xs, ys.tail, result, x :: fallback)
    case x :: xs =>
      val newResult = x :: result
      iter(xs, sliceList, newResult, newResult)
  }
  iter(seq.toList, sliceList, Nil, Nil)
}

Only tested with an Array[Byte] up to 1M, and it was just the original
10 digits repeated. Didn't time it, only tested within the REPL, but
it returns fast even if I convert the List back to an Array.

--
Derek Williams

From: Derek Williams <de...@nebvin.ca>
Date: Tue, 19 Jul 2011 17:20:05 -0600
Local: Tues, Jul 19 2011 7:20 pm
Subject: Re: [scala-user] Performance and complexity

Woops, little bug there. newResult should be built on fallback, not result.

Derek Williams
On Jul 19, 2011 4:59 PM, "Derek Williams" <de...@nebvin.ca> wrote:

From: Stefan Wagner <hirnst...@arcor.de>
Date: Wed, 20 Jul 2011 04:41:27 +0200
Local: Tues, Jul 19 2011 10:41 pm
Subject: Re: [scala-user] Performance and complexity

Am 20.07.2011 01:20, schrieb Derek Williams:

> Woops, little bug there. newResult should be built on fallback, not result.

and a second one:

eliminate (Seq(1, 2, 3, 4, 3, 4, 2, 5), Seq(3, 4, 2))
res168: List[Int] = List(1, 2, 3, 4, 3, 4, 2, 5)

Easy to find, since I briefly considered a fast solution myself: look for
the 3-element pattern from the back and jump length(pattern) forward when it
doesn't match. But the end of the pattern might match again at the
beginning.

def eliminate[A](seq: Seq[A], slice: Seq[A]): List[A] = {
  val sliceList = slice.toList
  def iter(xs: List[A], ys: List[A], result: List[A], fallback: List[A]): List[A] = xs match {
    case xs if ys.isEmpty => iter(xs, sliceList, result, result)
    case Nil => fallback.reverse
    case x :: xs if x == ys.head => iter(xs, ys.tail, result, x :: fallback)
    case x :: xs =>
      val newResult = x :: fallback
      iter(xs, sliceList, newResult, newResult)
  }
  iter(seq.toList, sliceList, Nil, Nil)
}

I understood ^.

--
Tschööö--->...Stefan
---------------------------
Don't visit my homepage at:
http://home.arcor-online.net/hirnstrom

From: Derek Williams <de...@nebvin.ca>
Date: Tue, 19 Jul 2011 20:59:39 -0600
Local: Tues, Jul 19 2011 10:59 pm
Subject: Re: [scala-user] Performance and complexity

On Tue, Jul 19, 2011 at 8:41 PM, Stefan Wagner <hirnst...@arcor.de> wrote:
> easy to find, since I shortly thought a fast solution ... looking for a
> 3-element pattern from the back, and jumping length(pattern) forward, if
> it doesn't match, but: The end of the pattern might match in the
> beginning again.

Good catch. I knew I was missing something. I actually had it checking
for that before and I must have "optimized" it away. This one works:

def eliminate[A](seq: Seq[A], slice: Seq[A]): List[A] = {
  val sliceList = slice.toList
  def iter(xs: List[A], ys: List[A], result: List[A], fallback: List[A]): List[A] = xs match {
    case xs if ys.isEmpty => iter(xs, sliceList, result, result)
    case Nil => fallback.reverse
    case x :: xs if x == ys.head => iter(xs, ys.tail, result, x :: fallback)
    case x :: xs if ys eq sliceList =>
      val newResult = x :: result
      iter(xs, sliceList, newResult, newResult)
    case xs =>
      iter(xs, sliceList, fallback, fallback)
  }
  iter(seq.toList, sliceList, Nil, Nil)
}

--
Derek Williams

From: martin odersky <martin.oder...@epfl.ch>
Date: Wed, 20 Jul 2011 09:12:55 +0200
Local: Wed, Jul 20 2011 3:12 am
Subject: Re: [scala-user] Performance and complexity

I should also say, there's really nothing wrong in just coding the
imperative algorithm in Scala. It's not a sacrilege to use imperative code,
in particular if it's inside a function. Sometimes it's the clearest way to
express things. Many other times it is not.

Generally, it is much more important for a system that the interfaces are
functional than that the implementations are. Or, put otherwise, a local
mutable variable is no big deal, but think twice before introducing a global
one!

For instance, if you look inside the scala.collection.immutable.List class
you find that most methods use a local mutable ListBuffer and a while loop
to fill it.
It's done this way to make these list operations not blow the stack for long
lists.
So we have an imperative implementation that ensures good functional
behavior on the outside. Compare with OCaml, where lists are purely
functional but blow the stack for lists longer than some 1000's of elements.
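
The pattern Martin describes — imperative on the inside, a purely functional interface on the outside — can be sketched like this (an illustrative example, not the actual List source):

```scala
import scala.collection.mutable.ListBuffer

// A while loop and a local ListBuffer on the inside; callers only ever
// see a plain immutable List, so no mutation leaks out.
def doubledAll(xs: List[Int]): List[Int] = {
  val buf = new ListBuffer[Int]
  var rest = xs
  while (rest.nonEmpty) {       // constant stack usage, even for long lists
    buf += rest.head * 2
    rest = rest.tail
  }
  buf.toList
}
```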

Cheers

-- Martin

From: Alex Cruise <a...@cluonflux.com>
Date: Wed, 20 Jul 2011 00:55:39 -0700
Local: Wed, Jul 20 2011 3:55 am
Subject: Re: [scala-user] Performance and complexity

On Wed, Jul 20, 2011 at 12:12 AM, martin odersky <martin.oder...@epfl.ch>wrote:

> ... a local mutable variable is no big deal, but think twice before
> introducing a global one!

I wish I could get a compiler warning when a local mutable variable escapes
into a closure... Oh, effect system, will you ever be mine? ;)
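
The kind of escape in question can be sketched in a couple of lines (illustrative only):

```scala
// A local mutable variable captured by a returned closure -- the "local"
// state is no longer local once the closure escapes the enclosing scope.
def counter(): () => Int = {
  var n = 0             // local mutable variable...
  () => { n += 1; n }   // ...escaping into the closure
}
```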

-0xe1a

Date: Wed, 20 Jul 2011 10:06:37 +0200
Local: Wed, Jul 20 2011 4:06 am
Subject: Re: [scala-user] Performance and complexity

My 2 cents.

def trimStart[A](slice: Seq[A], original: Seq[A]): Seq[A] = {
  @scala.annotation.tailrec
  def trimSt(sliceLeft: Seq[A], l: Seq[A]): Seq[A] = (sliceLeft, l) match {
    case (Nil, _) => l
    case (_, Nil) => original
    case (h1 :: t1, h2 :: t2) => if (h1 == h2) trimSt(t1, t2) else original
  }
  trimSt(slice, original)
}

def removeOccurence[A](slice: Seq[A], original: Seq[A]) = {
  @scala.annotation.tailrec
  def remove(leftOriginal: Seq[A], result: Seq[A]): Seq[A] =
    trimStart(slice, leftOriginal) match {
      case h :: tail => remove(tail, h +: result)
      case Nil => result.reverse
    }
  remove(original, Nil)
}

Worth noting that you can use a buffer for the result in the second method
instead of a Seq (or even an immutable structure that allows appending
elements) so that you don't need the final result.reverse.
Also, I'm sure it can be written in fewer lines; I just happen to be late
for work :)

On Wed, Jul 20, 2011 at 9:12 AM, martin odersky <martin.oder...@epfl.ch>wrote:

--
ʎdoɹʇuǝ

Date: Wed, 20 Jul 2011 11:20:59 +0200
Local: Wed, Jul 20 2011 5:20 am
Subject: Re: [scala-user] Performance and complexity

I forgot a call :p

def trimStart[A](slice: Seq[A], original: Seq[A]): Seq[A] = {
  @scala.annotation.tailrec
  def trimSt(sliceLeft: Seq[A], l: Seq[A]): Seq[A] = (sliceLeft, l) match {
    case (Nil, l) => trimStart(slice, l)  // the call I forgot
    case (_, Nil) => original
    case (h1 :: t1, h2 :: t2) => if (h1 == h2) trimSt(t1, t2) else original
  }
  trimSt(slice, original)
}

def removeOccurence[A](slice: Seq[A], original: Seq[A]) = {
  @scala.annotation.tailrec
  def remove(leftOriginal: Seq[A], result: Seq[A]): Seq[A] =
    trimStart(slice, leftOriginal) match {
      case h :: tail => remove(tail, h +: result)
      case Nil => result.reverse
    }
  remove(original, Nil)
}

--
ʎdoɹʇuǝ

Date: Wed, 20 Jul 2011 11:43:53 +0200
Local: Wed, Jul 20 2011 5:43 am
Subject: Re: [scala-user] Performance and complexity

I will stop spamming the list, but here is a version with all recursive
calls optimized

def trimStart[A](slice: Seq[A], list: Seq[A]): Seq[A] = {
  @scala.annotation.tailrec
  def trimSt(sliceLeft: Seq[A], l: Seq[A], original: Seq[A] = list): Seq[A] =
    (sliceLeft, l) match {
      case (Nil, l) => trimSt(slice, l, l)
      case (_, Nil) => original
      case (h1 :: t1, h2 :: t2) => if (h1 == h2) trimSt(t1, t2) else original
    }
  trimSt(slice, list, list)
}

def removeOccurence[A](slice: Seq[A], original: Seq[A]) = {
  @scala.annotation.tailrec
  def remove(leftOriginal: Seq[A], result: Seq[A]): Seq[A] =
    trimStart(slice, leftOriginal) match {
      case h :: tail => remove(tail, h +: result)
      case Nil => result.reverse
    }
  remove(original, Nil)
}

--
ʎdoɹʇuǝ

From: martin odersky <martin.oder...@epfl.ch>
Date: Wed, 20 Jul 2011 22:35:56 +0200
Local: Wed, Jul 20 2011 4:35 pm
Subject: Re: [scala-user] Performance and complexity

Like I said, if performance matters and the array is big, I would use the
imperative solution. If simplicity and elegance matters, I think the most
straightforward solution is to use pattern matching over lists:

def elim(xs: List[Int]): List[Int] = xs match {
  case 26 :: 20 :: 64 :: rest => elim(rest)
  case x :: xs1 => x :: elim(xs1)
  case Nil => Nil
}

Cheers

-- Martin

From: Cédric Beust ♔ <ced...@beust.com>
Date: Wed, 20 Jul 2011 14:14:18 -0700
Local: Wed, Jul 20 2011 5:14 pm
Subject: Re: [scala-user] Performance and complexity

Hi Martin,

Thanks, that's the most intuitive solution so far, although I see two slight
problems with it: 1) it will only work if the slice is known at compile time
and 2) it's not tail recursive, so it will most likely blow up the stack on
big lists.

Unless I'm missing something?

--
Cédric

2011/7/20 martin odersky <martin.oder...@epfl.ch>

From: Germán Ferrari <german.ferr...@gmail.com>
Date: Thu, 21 Jul 2011 00:12:27 -0300
Local: Wed, Jul 20 2011 11:12 pm
Subject: Re: [scala-user] Performance and complexity

Hi,

def elim(xs: List[Int], slice: List[Int]): Vector[Int] = {
  @tailrec
  def elimAcc(xs: List[Int], acc: Vector[Int]): Vector[Int] = xs match {
    case xs if xs.take(slice.length) == slice => elimAcc(xs drop slice.length, acc)
    case x :: xs1 => elimAcc(xs1, acc :+ x)
    case Nil => acc
  }

  elimAcc(xs, Vector.empty[Int])
}

Regards,
Germán

2011/7/20 Cédric Beust ♔ <ced...@beust.com>

From: Tony Morris <tonymor...@gmail.com>
Date: Thu, 21 Jul 2011 14:29:52 +1000
Local: Thurs, Jul 21 2011 12:29 am
Subject: Re: [scala-user] Performance and complexity
On 21/07/11 06:35, martin odersky wrote:

> Like I said, if performance matters and the array is big, I would use
> the imperative solution. If simplicity and elegance matters, I think
> the most straightforward solution is to use pattern matching over lists:

>  def elim(xs: List[Int]): List[Int] = xs match {
>     case 26 :: 20 :: 64 :: rest => elim(rest)
>     case x :: xs1 => x :: elim(xs1)
>     case Nil => Nil
>   }

> Cheers

>  -- Martin

I mostly agree. If performance matters, and you are doing prefix
matching on a cons list, you might want to consider a more appropriate
data structure. This inappropriate use of data structures is completely
aside from the functional/imperative false dilemma.

--
Tony Morris
http://tmorris.net/

From: Derek Williams <de...@nebvin.ca>
Date: Wed, 20 Jul 2011 23:14:47 -0600
Local: Thurs, Jul 21 2011 1:14 am
Subject: Re: [scala-user] Performance and complexity

Do you suggest working with Arrays? Or is there a better alternative?

In defense of List, I have found prepending to a List, followed by
reverse, to be faster than appending to most other collections, especially
other immutable ones.
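
The prepend-then-reverse idiom Derek describes can be sketched as follows (illustrative):

```scala
// O(1) cons per element plus one O(n) reverse at the end, instead of
// repeated O(n) appends to an immutable list.
def mapViaPrepend[A, B](xs: List[A])(f: A => B): List[B] = {
  var acc: List[B] = Nil
  for (x <- xs) acc = f(x) :: acc  // prepend, cheap
  acc.reverse                      // restore original order once
}
```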

Derek Williams
On Jul 20, 2011 10:30 PM, "Tony Morris" <tonymor...@gmail.com> wrote:

From: Tony Morris <tonymor...@gmail.com>
Date: Thu, 21 Jul 2011 15:19:31 +1000
Local: Thurs, Jul 21 2011 1:19 am
Subject: Re: [scala-user] Performance and complexity
On 21/07/11 15:14, Derek Williams wrote:

> Do you suggest working with Arrays? Or is there a better alternative?

This depends ultimately on all the operations required, including how
the "List" came to be in the first place. Certainly, prefix matching on
a cons list (or even a mutable list) for anything but a trivial data set
is not particularly noble and then measuring the performance is... a bit
ridonkulous.

A prefix tree (trie) may be more appropriate (of which there are many
variations), or perhaps even a Burkhard-Keller Tree (a bit of a stretch
there -- need to know all operations).

> At the defense of List, I have found prepending to a List, followed by
> reverse, to be faster than appending to most others, especially other
> immutable collections.

Scalaz 7 will include a list with O(1) cons and snoc (at the expense of
other operations). It took some gymnastics to get there though.

--
Tony Morris
http://tmorris.net/

From: Derek Williams <de...@nebvin.ca>
Date: Wed, 20 Jul 2011 23:25:03 -0600
Local: Thurs, Jul 21 2011 1:25 am
Subject: Re: [scala-user] Performance and complexity

On Jul 20, 2011 11:19 PM, "Tony Morris" <tonymor...@gmail.com> wrote:

> Scalaz 7 will include a list with O(1) cons and snoc (at the expense of
> other operations). It took some gymnastics to get there though.

I'll definitely have to check that out, thanks Tony.

From: Tony Morris <tonymor...@gmail.com>
Date: Thu, 21 Jul 2011 15:31:49 +1000
Local: Thurs, Jul 21 2011 1:31 am
Subject: Re: [scala-user] Performance and complexity

On 21/07/11 15:25, Derek Williams wrote:

> On Jul 20, 2011 11:19 PM, "Tony Morris" <tonymor...@gmail.com
> <mailto:tonymor...@gmail.com>> wrote:
> > Scalaz 7 will include a list with O(1) cons and snoc (at the expense of
> > other operations). It took some gymnastics to get there though.

> I'll definitely have to check that out, thanks Tony.

> > --
> > Tony Morris
> > http://tmorris.net/

https://github.com/scalaz/scalaz/blob/scalaz7/core/src/main/scala/sca...

It is similar to DList[1], which would unfortunately blow the stack for
any large input, so is instead modelled with a pair of lists[2]. Mark
Hibberd wrote most of it.

[1]
case class Endo[A](e: A => A)
case class DList[A](k: Endo[List[A]])

[2]
case class Endo[A](e: A => A)
case class Dequeue[A](pre: List[Endo[List[A]]], post: List[Endo[List[A]]])
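
The pair-of-lists idea can be illustrated with a much simpler structure (this is only a sketch of how two lists give O(1) cons and snoc with the cost deferred to conversion — it is not the actual Scalaz code):

```scala
// Front list holds the head end, back list holds the tail end reversed.
case class TwoList[A](front: List[A], back: List[A]) {
  def cons(a: A): TwoList[A] = TwoList(a :: front, back)  // O(1) prepend
  def snoc(a: A): TwoList[A] = TwoList(front, a :: back)  // O(1) append
  def toList: List[A] = front ::: back.reverse            // O(n), paid once
}
```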

--
Tony Morris
http://tmorris.net/

From: martin odersky <martin.oder...@epfl.ch>
Date: Fri, 22 Jul 2011 12:56:44 +0200
Local: Fri, Jul 22 2011 6:56 am
Subject: Re: [scala-user] Performance and complexity

2011/7/20 Cédric Beust ♔ <ced...@beust.com>

> Hi Martin,

> Thanks, that's the most intuitive solution so far, although I see two
> slight problems with it: 1) it will only work if the slice is known at
> compile time and 2) it's not tail recursive, so it will most likely blow up
> the stack on big lists.

> Unless I'm missing something?

Not at all. It's not tail recursive, and I would not use it on very long
lists. The first problem (generalize to arbitrary subsequences) is solved by
straightforward generalization. The following works for any slice:

def elim(xs: List[Int], slice: Seq[Int]): List[Int] = xs match {
  case _ if xs startsWith slice => elim(xs drop slice.length, slice)
  case x :: xs1 => x :: elim(xs1, slice)
  case Nil => Nil
}

To come back to the original problem (long array, performance matters), I
would simply fall back on an imperative solution, which is actually also
pretty nice:

def elim(xs: Array[Byte], slice: Seq[Byte]): Array[Byte] = {
  val buf = new ArrayBuffer[Byte]
  var i = 0
  while (i < xs.length)
    if (xs startsWith (slice, i)) i += slice.length
    else { buf += xs(i); i += 1 }
  buf.toArray
}

Cheers

-- Martin