The problem is simple: eliminate all the occurrences of the sequence (26, 20, 64) from a list. For example, given [1, 2, 26, 20, 64, 3, 26, 20, 64, 4], the result should be [1, 2, 3, 4].
Of all the solutions proposed, only the one proposed by Ruediger Keller worked:
def eliminate[@specialized(Byte) T : ClassManifest](xs: Array[T], slice: Seq[T]): Array[T] = { val i = xs indexOfSlice slice if(i == -1) xs else xs.take(i) ++ eliminate(xs.drop(i + slice.length), slice)
}
Daniel Sobral also offered a correct solution but it only works on Byte, so not really applicable here.
I became curious to see what kind of complexity the functional solution above is and how it compares to the trivial and linear imperative version (no need for Boyer-Moore, really):
if (value.get(i) == SLICE.get(0) && value.get(i + 1) == SLICE.get(1)
&& value.get(i + 2) == SLICE.get(2)) {
i += 2;
} else {
result.add(value.get(i));
}
}
return result;
}
I decided to run a few benchmarks and the results were surprising. I tried with various sizes of elements (10k, 100k, 1M, 10M) and with a fixed number of sequences to remove from them (3).
The imperative version solves the problem for lists up to 10M of elements in a few milliseconds.
However, the Scala version above takes 2 seconds for 1000 elements and doesn't finish after a few minutes for 10k elements.
Can someone suggest additional functional solutions to this problem that can at least approach the linear performance of the imperative version?
Also, what are the complexities of these functional solutions?
> The problem is simple: eliminate all the occurrences of the sequence (26, > 20, 64) from a list. For example, given [1, 2, 26, 20, 64, 3, 26, 20, 64, > 4], the result should be [1, 2, 3, 4].
> Of all the solutions proposed, only the one proposed by Ruediger Keller > worked:
> Daniel Sobral also offered a correct solution but it only works on Byte, so > not really applicable here.
> I became curious to see what kind of complexity the functional solution > above is and how it compares to the trivial and linear imperative version > (no need for Boyer-Moore, really):
> I decided to run a few benchmarks and the results were surprising. I tried > with various sizes of elements (10k, 100k, 1M, 10M) and with a fixed number > of sequences to remove from them (3).
> The imperative version solves the problem for lists up to 10M of elements > in a few milliseconds.
> However, the Scala version above takes 2 seconds for 1000 elements and > doesn't finish after a few minutes for 10k elements.
> Can someone suggest additional functional solutions to this problem that > can at least approach the linear performance of the imperative version?
> Also, what are the complexities of these functional solutions?
The Scala version uses concat on arrays, which, since concat is O(n), is a O(n^2) operation. Oh, and it's not tail-recursive, which will clobber your stack. (I mentioned this already in that thread.)
It's not hard to avoid those problems:
def eliminated[@specialized(Byte) T: ClassManifest]( xs: Array[T], slice: Seq[T], found: List[Array[T]] = Nil ): Array[T] = { val i = xs indexOfSlice slice if (i < 0) (xs :: found).toArray.reverse.flatten else eliminated(xs.drop(i+slice.length), slice, xs.take(i) :: found)
> The problem is simple: eliminate all the occurrences of the sequence (26, > 20, 64) from a list. For example, given [1, 2, 26, 20, 64, 3, 26, 20, 64, > 4], the result should be [1, 2, 3, 4].
> Of all the solutions proposed, only the one proposed by Ruediger Keller > worked:
> Daniel Sobral also offered a correct solution but it only works on Byte, so > not really applicable here.
> I became curious to see what kind of complexity the functional solution > above is and how it compares to the trivial and linear imperative version > (no need for Boyer-Moore, really):
> I decided to run a few benchmarks and the results were surprising. I tried > with various sizes of elements (10k, 100k, 1M, 10M) and with a fixed number > of sequences to remove from them (3).
> The imperative version solves the problem for lists up to 10M of elements > in a few milliseconds.
> However, the Scala version above takes 2 seconds for 1000 elements and > doesn't finish after a few minutes for 10k elements.
> Can someone suggest additional functional solutions to this problem that > can at least approach the linear performance of the imperative version?
> Also, what are the complexities of these functional solutions?
This seems plenty fast (written quickly so variable names suffer):
def eliminate[A](seq: Seq[A], slice: Seq[A]): List[A] = { val sliceList = slice.toList def iter(xs: List[A], ys: List[A], result: List[A], fallback: List[A]): List[A] = xs match { case xs if ys.isEmpty => iter(xs, sliceList, result, result) case Nil => fallback.reverse case x :: xs if x == ys.head => iter(xs, ys.tail, result, x :: fallback) case x :: xs => val newResult = x :: result iter(xs, sliceList, newResult, newResult) } iter(seq.toList, sliceList, Nil, Nil)
}
Only tested with an Array[Byte] up to 1M, and it was just the original 10 digits repeated. Didn't time it, only tested within the REPL, but it returns fast even if I convert the List back to an Array.
> This seems plenty fast (written quickly so variable names suffer):
> def eliminate[A](seq: Seq[A], slice: Seq[A]): List[A] = { > val sliceList = slice.toList > def iter(xs: List[A], ys: List[A], result: List[A], fallback: > List[A]): List[A] = xs match { > case xs if ys.isEmpty => iter(xs, sliceList, result, result) > case Nil => fallback.reverse > case x :: xs if x == ys.head => iter(xs, ys.tail, result, x :: fallback) > case x :: xs => > val newResult = x :: result > iter(xs, sliceList, newResult, newResult) > } > iter(seq.toList, sliceList, Nil, Nil) > }
> Only tested with an Array[Byte] up to 1M, and it was just the original > 10 digits repeated. Didn't time it, only tested within the REPL, but > it returns fast even if I convert the List back to an Array.
easy to find, since I shortly thought a fast solution ... looking for a 3-element pattern from the back, and jumping length(pattern) forward, if it doesn't match, but: The end of the pattern might match in the beginning again.
def eliminate[A](seq: Seq[A], slice: Seq[A]): List[A] = { val sliceList = slice.toList def iter(xs: List[A], ys: List[A], result: List[A], fallback: List[A]): List[A] = xs match { case xs if ys.isEmpty => iter(xs, sliceList, result, result) case Nil => fallback.reverse case x :: xs if x == ys.head => iter(xs, ys.tail, result, x :: fallback) case x :: xs => val newResult = x :: fallback iter(xs, sliceList, newResult, newResult) } iter(seq.toList, sliceList, Nil, Nil)
On Tue, Jul 19, 2011 at 8:41 PM, Stefan Wagner <hirnst...@arcor.de> wrote: > easy to find, since I shortly thought a fast solution ... looking for a > 3-element pattern from the back, and jumping length(pattern) forward, if > it doesn't match, but: The end of the pattern might match in the > beginning again.
Good catch. I knew I was missing something. I actually had it checking for that before and I must have "optimized" it away. This one works:
def eliminate[A](seq: Seq[A], slice: Seq[A]): List[A] = { val sliceList = slice.toList def iter(xs: List[A], ys: List[A], result: List[A], fallback: List[A]): List[A] = xs match { case xs if ys.isEmpty => iter(xs, sliceList, result, result) case Nil => fallback.reverse case x :: xs if x == ys.head => iter(xs, ys.tail, result, x :: fallback) case x :: xs if ys eq sliceList => val newResult = x :: result iter(xs, sliceList, newResult, newResult) case xs => iter(xs, sliceList, fallback, fallback) } iter(seq.toList, sliceList, Nil, Nil)
I should also say, there's really nothing wrong in just coding the imperative algorithm in Scala. It's not a sacrilege to use imperative code, in particular if it's inside a function. Sometimes it's the clearest way to express things. Many other times it is not.
Generally, it is much more important for a system that the interfaces are functional than that the implementations are. Or, put otherwise, a local mutable variable is no big deal, but think twice before introducing a global one!
For instance, if you look inside the scala.collection.immutable.List class you find that most methods use a local mutable ListBuffer and a while loop to fill it. It's done this way to make these list operations not blow the stack for long lists. So we have an imperative implementation that ensures good functional behavior on the outside. Compare with OCaml, where lists are purely functional but blow the stack for lists longer than some 1000's of elements.
def trimStart[A](slice:Seq[A],original:Seq[A]):Seq[A] = { @scala.annotation.tailrec def trimSt(sliceLeft:Seq[A],l:Seq[A]):Seq[A] = (sliceLeft,l) match { case (Nil,_) => l case (_,Nil) => original case (h1::t1,h2::t2) => if(h1 == h2) trimSt(t1,t2) else original } trimSt(slice,original)
worth noting that you can use a buffer for the result in the second method instead of a Seq (Or even an immutable structure that allows adding elements to the end) so that you don't require the last result.reverse. Also I am sure it can be written with fewer lines, I just happen to be late for work :)
Sadek
On Wed, Jul 20, 2011 at 9:12 AM, martin odersky <martin.oder...@epfl.ch>wrote:
> I should also say, there's really nothing wrong in just coding the > imperative algorithm in Scala. It's not a sacrilege to use imperative code, > in particular if it's inside a function. Sometimes it's the clearest way to > express things. Many other times it is not.
> Generally, it is much more important for a system that the interfaces are > functional than that the implementations are. Or, put otherwise, a local > mutable variable is no big deal, but think twice before introducing a global > one!
> For instance, if you look inside the scala.collection.immutable.List class > you find that most methods use a local mutable ListBuffer and a while loop > to fill it. > It's done this way to make these list operations not blow the stack for > long lists. > So we have an imperative implementation that ensures good functional > behavior on the outside. Compare with OCaml, where lists are purely > functional but blow the stack for lists longer than some 1000's of elements.
def trimStart[A](slice:Seq[A],original:Seq[A]):Seq[A] = { @scala.annotation.tailrec def trimSt(sliceLeft:Seq[A],l:Seq[A]):Seq[A] = (sliceLeft,l) match { case (Nil,l) => *trimStart(slice,l)* case (_,Nil) => original case (h1::t1,h2::t2) => if(h1 == h2) trimSt(t1,t2) else original } trimSt(slice,original)
> worth noting that you can use a buffer for the result in the second method > instead of a Seq (Or even an immutable structure that allows adding elements > to the end) so that you don't require the last result.reverse. > Also I am sure it can be written with fewer lines, I just happen to be late > for work :)
> Sadek
> On Wed, Jul 20, 2011 at 9:12 AM, martin odersky <martin.oder...@epfl.ch>wrote:
>> I should also say, there's really nothing wrong in just coding the >> imperative algorithm in Scala. It's not a sacrilege to use imperative code, >> in particular if it's inside a function. Sometimes it's the clearest way to >> express things. Many other times it is not.
>> Generally, it is much more important for a system that the interfaces are >> functional than that the implementations are. Or, put otherwise, a local >> mutable variable is no big deal, but think twice before introducing a global >> one!
>> For instance, if you look inside the scala.collection.immutable.List class >> you find that most methods use a local mutable ListBuffer and a while loop >> to fill it. >> It's done this way to make these list operations not blow the stack for >> long lists. >> So we have an imperative implementation that ensures good functional >> behavior on the outside. Compare with OCaml, where lists are purely >> functional but blow the stack for lists longer than some 1000's of elements.
I will stop spamming the list, but here is a version with all recursive calls optimized
def trimStart[A](slice:Seq[A],list:Seq[A]):Seq[A] = { @scala.annotation.tailrec def trimSt(sliceLeft:Seq[A],l:Seq[A],*original:Seq[A]=list*):Seq[A] = (sliceLeft,l) match { case (Nil,l) => trimSt(slice,l,l) case (_,Nil) => original case (h1::t1,h2::t2) => if(h1 == h2) trimSt(t1,t2) else original } trimSt(slice,list,list)
>> worth noting that you can use a buffer for the result in the second method >> instead of a Seq (Or even an immutable structure that allows adding elements >> to the end) so that you don't require the last result.reverse. >> Also I am sure it can be written with fewer lines, I just happen to be >> late for work :)
>> Sadek
>> On Wed, Jul 20, 2011 at 9:12 AM, martin odersky <martin.oder...@epfl.ch>wrote:
>>> I should also say, there's really nothing wrong in just coding the >>> imperative algorithm in Scala. It's not a sacrilege to use imperative code, >>> in particular if it's inside a function. Sometimes it's the clearest way to >>> express things. Many other times it is not.
>>> Generally, it is much more important for a system that the interfaces are >>> functional than that the implementations are. Or, put otherwise, a local >>> mutable variable is no big deal, but think twice before introducing a global >>> one!
>>> For instance, if you look inside the scala.collection.immutable.List >>> class you find that most methods use a local mutable ListBuffer and a while >>> loop to fill it. >>> It's done this way to make these list operations not blow the stack for >>> long lists. >>> So we have an imperative implementation that ensures good functional >>> behavior on the outside. Compare with OCaml, where lists are purely >>> functional but blow the stack for lists longer than some 1000's of elements.
Like I said, if performance matters and the array is big, I would use the imperative solution. If simplicity and elegance matters, I think the most straightforward solution is to use pattern matching over lists:
def elim(xs: List[Int]): List[Int] = xs match { case 26 :: 20 :: 64 :: rest => elim(rest) case x :: xs1 => x :: elim(xs1) case Nil => Nil }
Thanks, that's the most intuitive solution so far, although I see two slight problems with it: 1) it will only work if the slice is known at compile time and 2) it's not tail recursive, so it will most likely blow up the stack on big lists.
> Like I said, if performance matters and the array is big, I would use the > imperative solution. If simplicity and elegance matters, I think the most > straightforward solution is to use pattern matching over lists:
> def elim(xs: List[Int]): List[Int] = xs match { > case 26 :: 20 :: 64 :: rest => elim(rest) > case x :: xs1 => x :: elim(xs1) > case Nil => Nil > }
> Thanks, that's the most intuitive solution so far, although I see two > slight problems with it: 1) it will only work if the slice is known at > compile time and 2) it's not tail recursive, so it will most likely blow up > the stack on big lists.
> Unless I'm missing something?
> -- > Cédric
> 2011/7/20 martin odersky <martin.oder...@epfl.ch>
>> Like I said, if performance matters and the array is big, I would use the >> imperative solution. If simplicity and elegance matters, I think the most >> straightforward solution is to use pattern matching over lists:
>> def elim(xs: List[Int]): List[Int] = xs match { >> case 26 :: 20 :: 64 :: rest => elim(rest) >> case x :: xs1 => x :: elim(xs1) >> case Nil => Nil >> }
> Like I said, if performance matters and the array is big, I would use > the imperative solution. If simplicity and elegance matters, I think > the most straightforward solution is to use pattern matching over lists:
> def elim(xs: List[Int]): List[Int] = xs match { > case 26 :: 20 :: 64 :: rest => elim(rest) > case x :: xs1 => x :: elim(xs1) > case Nil => Nil > }
> Cheers
> -- Martin
I mostly agree. If performance matters, and you are doing prefix matching on a cons list, you might want to consider a more appropriate data structure. This inappropriate use of data structures is completely aside from the functional/imperative false dilemma.
Do you suggest working with Arrays? Or is there a better alternative?
At the defense of List, I have found prepending to a List, followed by reverse, to be faster than appending to most others, especially other immutable collections.
Derek Williams On Jul 20, 2011 10:30 PM, "Tony Morris" <tonymor...@gmail.com> wrote:
> On 21/07/11 06:35, martin odersky wrote: >> Like I said, if performance matters and the array is big, I would use >> the imperative solution. If simplicity and elegance matters, I think >> the most straightforward solution is to use pattern matching over lists:
>> def elim(xs: List[Int]): List[Int] = xs match { >> case 26 :: 20 :: 64 :: rest => elim(rest) >> case x :: xs1 => x :: elim(xs1) >> case Nil => Nil >> }
>> Cheers
>> -- Martin
> I mostly agree. If performance matters, and you are doing prefix > matching on a cons list, you might want to consider a more appropriate > data structure. This inappropriate use of data structures is completely > aside from the functional/imperative false dilemma.
> Do you suggest working with Arrays? Or is there a better alternative?
This depends ultimately on all the operations required, including how the "List" came to be in the first place. Certainly, prefix matching on a cons list (or even a mutable list) for anything but a trivial data set is not particularly noble and then measuring the performance is... a bit ridonkulous.
A prefix tree (trie) may be more appropriate (of which there are many variations), or perhaps even a Burkhard-Keller Tree (a bit of a stretch there -- need to know all operations).
> At the defense of List, I have found prepending to a List, followed by > reverse, to be faster than appending to most others, especially other > immutable collections.
Scalaz 7 will include a list with O(1) cons and snoc (at the expense of other operations). It took some gymnastics to get there though.
> On Jul 20, 2011 11:19 PM, "Tony Morris" <tonymor...@gmail.com > <mailto:tonymor...@gmail.com>> wrote: > > Scalaz 7 will include a list with O(1) cons and snoc (at the expense of > > other operations). It took some gymnastics to get there though.
> I'll definitely have to check that out, thanks Tony.
It is similar to DList[1], which would unfortunately blow the stack for any large input, so is instead modelled with a pair of lists[2]. Mark Hibberd wrote most of it.
> Thanks, that's the most intuitive solution so far, although I see two > slight problems with it: 1) it will only work if the slice is known at > compile time and 2) it's not tail recursive, so it will most likely blow up > the stack on big lists.
> Unless I'm missing something?
> Not at all. It's not tail recursive, and I would not use on it on very long
lists. The first problem (generalize to arbitrary subsequences) is solved by straightforward generalization. The following works for any slice:
def elim(xs: List[Int], slice: Seq[Int]): List[Int] = xs match { case _ if xs startsWith slice => xs drop slice.length case x :: xs1 => x :: elim(xs1, slice) case Nil => Nil }
To come back to the original problem (long array, performance matters), I would simply fall back on an imperative solution, which is actually also pretty nice:
def elim(xs: Array[Byte], slice: Seq[Byte]): Array[Byte] = { val buf = new ArrayBuffer[Byte] var i = 0 while (i < xs.length) if (xs startsWith (slice, i)) i += slice.length else { buf += xs(i); i += 1 } buf.toArray }