Message from discussion
Performance and complexity
Received: by 10.42.161.71 with SMTP id s7mr3653746icx.87.1311111463575;
Tue, 19 Jul 2011 14:37:43 -0700 (PDT)
X-BeenThere: scala-user@googlegroups.com
Received: by 10.231.185.5 with SMTP id cm5ls2786443ibb.3.gmail; Tue, 19 Jul
2011 14:37:38 -0700 (PDT)
Received: by 10.42.153.137 with SMTP id m9mr4021654icw.41.1311111458793;
Tue, 19 Jul 2011 14:37:38 -0700 (PDT)
Received: by 10.42.153.137 with SMTP id m9mr4021653icw.41.1311111458779;
Tue, 19 Jul 2011 14:37:38 -0700 (PDT)
Return-Path: <icho...@gmail.com>
Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com [209.85.210.182])
by gmr-mx.google.com with ESMTPS id w7si2923595icv.6.2011.07.19.14.37.38
(version=TLSv1/SSLv3 cipher=OTHER);
Tue, 19 Jul 2011 14:37:38 -0700 (PDT)
Received-SPF: pass (google.com: domain of icho...@gmail.com designates 209.85.210.182 as permitted sender) client-ip=209.85.210.182;
Authentication-Results: gmr-mx.google.com; spf=pass (google.com: domain of icho...@gmail.com designates 209.85.210.182 as permitted sender) smtp.mail=icho...@gmail.com; dkim=pass (test mode) header...@gmail.com
Received: by iyb11 with SMTP id 11so5466350iyb.13
for <scala-user@googlegroups.com>; Tue, 19 Jul 2011 14:37:38 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=gmail.com; s=gamma;
h=mime-version:in-reply-to:references:date:message-id:subject:from:to
:cc:content-type;
bh=75F9W5ndFu3xQAh1IJCBuSUDpCzll2Mh/WgdRmwJuYI=;
b=pK6bPZpyJQ/33DE2qnAaL/LSd8b2G9BghUtrGiF1DJ5HVTVYlsDyNUimNYzHgCX4sa
Ed1Zg1IZcKDggYRmFdkTM35hKOnAaw70I+CgFErwucJ3mbrkgPQxZi0j3LNPpJ8LyftA
zZrjXLK4k/3G+Btr8l8zYPkoF14xjLtHK4EME=
MIME-Version: 1.0
Received: by 10.43.46.193 with SMTP id up1mr2610394icb.349.1311111457156; Tue,
19 Jul 2011 14:37:37 -0700 (PDT)
Received: by 10.42.213.199 with HTTP; Tue, 19 Jul 2011 14:37:37 -0700 (PDT)
In-Reply-To: <CAOphgJAiUb5Pfx2Rbtc4K7_gk=EvzfnXmJpm89t_sO5XS9u_Mg@mail.gmail.com>
References: <CAOphgJAiUb5Pfx2Rbtc4K7_gk=EvzfnXmJpm89t_sO5XS9u_Mg@mail.gmail.com>
Date: Tue, 19 Jul 2011 17:37:37 -0400
Message-ID: <CAP_xLa1v8ELB7eAXCjubt91Y=Qt7-X=h+K=UB8nwHAgPaPn...@mail.gmail.com>
Subject: Re: [scala-user] Performance and complexity
From: Rex Kerr <icho...@gmail.com>
To: =?UTF-8?B?Q8OpZHJpYyBCZXVzdCDimZQ=?= <ced...@beust.com>
Cc: scala-user <scala-user@googlegroups.com>
Content-Type: multipart/alternative; boundary=bcaec5299593aa2f0d04a872ecab
--bcaec5299593aa2f0d04a872ecab
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
The Scala version uses concat on arrays, which, since concat is O(n), is a
O(n^2) operation. Oh, and it's not tail-recursive, which will clobber your
stack. (I mentioned this already in that thread.)
It's not hard to avoid those problems:
def eliminated[@specialized(Byte) T: ClassManifest](
xs: Array[T], slice: Seq[T], found: List[Array[T]] =3D Nil
): Array[T] =3D {
val i =3D xs indexOfSlice slice
if (i < 0) (xs :: found).toArray.reverse.flatten
else eliminated(xs.drop(i+slice.length), slice, xs.take(i) :: found)
}
--Rex
2011/7/19 C=C3=A9dric Beust =E2=99=94 <ced...@beust.com>
> Hi everyone,
>
> I became intrigued with a problem that Mark Nelson posted to the list las=
t
> week. Here is a link to the discussion<http://groups.google.com/group/sca=
la-user/browse_thread/thread/f26f40d0a19e7c27>
> .
>
> The problem is simple: eliminate all the occurrences of the sequence (26,
> 20, 64) from a list. For example, given [1, 2, 26, 20, 64, 3, 26, 20, 64,
> 4], the result should be [1, 2, 3, 4].
>
> Of all the solutions proposed, only the one proposed by Ruediger Keller
> worked:
>
> def eliminate[@specialized(Byte) T : ClassManifest](xs: Array[T],
> slice: Seq[T]): Array[T] =3D {
> val i =3D xs indexOfSlice slice
> if(i =3D=3D -1) xs else xs.take(i) ++ eliminate(xs.drop(i + slice.lengt=
h),
> slice)
>
> }
>
> Daniel Sobral also offered a correct solution but it only works on Byte, =
so
> not really applicable here.
>
> I became curious to see what kind of complexity the functional solution
> above is and how it compares to the trivial and linear imperative version
> (no need for Boyer-Moore, really):
>
> private static List<Integer> eliminate(List<Integer> value) {
>
> List<Integer> result =3D Lists.newArrayList();
>
> for (int i =3D 0; i < value.size(); i++) {
>
> if (value.get(i) =3D=3D SLICE.get(0) && value.get(i + 1) =3D=3D SLI=
CE.get(1)
>
> && value.get(i + 2) =3D=3D SLICE.get(2)) {
>
> i +=3D 2;
>
> } else {
>
> result.add(value.get(i));
>
> }
>
> }
>
>
> return result;
>
> }
>
> I decided to run a few benchmarks and the results were surprising. I trie=
d
> with various sizes of elements (10k, 100k, 1M, 10M) and with a fixed numb=
er
> of sequences to remove from them (3).
>
> The imperative version solves the problem for lists up to 10M of elements
> in a few milliseconds.
>
> However, the Scala version above takes 2 seconds for 1000 elements and
> doesn't finish after a few minutes for 10k elements.
>
> Can someone suggest additional functional solutions to this problem that
> can at least approach the linear performance of the imperative version?
>
> Also, what are the complexities of these functional solutions?
>
> --
> C=C3=A9dric
>
>
>
--bcaec5299593aa2f0d04a872ecab
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
The Scala version uses concat on arrays, which, since concat is O(n), is a =
O(n^2) operation.=C2=A0 Oh, and it's not tail-recursive, which will clo=
bber your stack.=C2=A0 (I mentioned this already in that thread.)<br><br>It=
's not hard to avoid those problems:<br>
<br>def eliminated[@specialized(Byte) T: ClassManifest](<br>=C2=A0 xs: Arra=
y[T], slice: Seq[T], found: List[Array[T]] =3D Nil<br>): Array[T] =3D {<br>=
=C2=A0 val i =3D xs indexOfSlice slice<br>=C2=A0 if (i < 0) (xs :: found=
).toArray.reverse.flatten<br>
=C2=A0 else eliminated(xs.drop(i+slice.length), slice, xs.take(i) :: found)=
<br>}<br><br>=C2=A0 --Rex<br><br><div class=3D"gmail_quote">2011/7/19 C=C3=
=A9dric Beust =E2=99=94 <span dir=3D"ltr"><<a href=3D"mailto:cedric@beus=
t.com">ced...@beust.com</a>></span><br>
<blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; borde=
r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">Hi everyone,<div>=
<br></div><div>I became intrigued with a problem that Mark Nelson posted to=
the list last week. <a href=3D"http://groups.google.com/group/scala-user/b=
rowse_thread/thread/f26f40d0a19e7c27" target=3D"_blank">Here is a link to t=
he discussion</a>.</div>
<div><br></div><div>The problem is simple: eliminate all the occurrences of=
the sequence (26, 20, 64) from a list. For example, given [1, 2, 26, 20, 6=
4, 3, 26, 20, 64, 4], the result should be [1, 2, 3, 4].</div><div><br>
</div><div>Of all the solutions proposed, only the one proposed by Ruediger=
Keller worked:</div><div><span style=3D"font-family: arial,sans-serif; fon=
t-size: 12px;"><p>def eliminate[@specialized(Byte) T : ClassManifest](xs: A=
rray[T],=C2=A0<br>
slice: Seq[T]): Array[T] =3D {=C2=A0<br>=C2=A0 val i =3D xs indexOfSlice sl=
ice=C2=A0<br>=C2=A0 if(i =3D=3D -1) xs else xs.take(i) ++ eliminate(xs.drop=
(i + slice.length), slice)=C2=A0<br></p><p></p><div style=3D"display: block=
;">
}=C2=A0</div></span></div><div><br></div><div>Daniel Sobral also offered a =
correct solution but it only works on Byte, so not really applicable here.<=
/div><div><br></div><div>I became curious to see what kind of complexity th=
e functional solution above is and how it compares to the trivial and linea=
r imperative version (no need for Boyer-Moore, really):</div>
<div><br></div><div>
<p>=C2=A0=C2=A0<span>private</span> <span>static</span> List<Integer>=
eliminate(List<Integer> value) {</p>
<p>=C2=A0 =C2=A0 List<Integer> result =3D Lists.newArrayList();</p>
<p>=C2=A0 =C2=A0 <span>for</span> (<span>int</span> i =3D 0; i < value.s=
ize(); i++) {</p>
<p>=C2=A0 =C2=A0 =C2=A0 <span>if</span> (value.get(i) =3D=3D <span>SLICE</s=
pan>.get(0) && value.get(i + 1) =3D=3D <span>SLICE</span>.get(1)</p=
><p>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 && value.get(i + 2) =3D=3D <=
span>SLICE</span>.get(2)) {</p>
<p><span style=3D"color: rgb(0, 0, 0);">=C2=A0 =C2=A0 =C2=A0 =C2=A0 i +=3D =
2;</span></p>
<p>=C2=A0 =C2=A0 =C2=A0 } <span>else</span> {</p>
<p>=C2=A0 =C2=A0 =C2=A0 =C2=A0 result.add(value.get(i));</p>
<p>=C2=A0 =C2=A0 =C2=A0 }</p>
<p>=C2=A0 =C2=A0 }</p>
<p><br></p>
<p>=C2=A0 =C2=A0 <span>return</span> result;</p>
<p>=C2=A0 }</p></div><div><br></div><div>I decided to run a few benchmarks =
and the results were surprising. I tried with various sizes of elements (10=
k, 100k, 1M, 10M) and with a fixed number of sequences to remove from them =
(3).</div>
<div><br></div><div>The imperative version solves the problem for lists up =
to 10M of elements in a few milliseconds.</div><div><br></div><div>However,=
the Scala version above takes 2 seconds for 1000 elements and doesn't =
finish after a few minutes for 10k elements.</div>
<div><br></div><div>Can someone suggest additional functional solutions to =
this problem that can at least approach the linear performance of the imper=
ative version?</div><div><br></div><div>Also, what are the complexities of =
these functional solutions?</div>
<div><br></div><div>--=C2=A0<div>C=C3=A9dric<br><br></div><br>
</div>
</blockquote></div><br>
--bcaec5299593aa2f0d04a872ecab--