I've been learning Scala recently (already know Java/C#/Python) and have
been thoroughly enjoying the higher-order collection functions.
filter/map is pretty obvious, and I've more or less figured out
foldLeft/reduceLeft. I've heard somewhere that you probably shouldn't
use foldRight/reduceRight because of O(n^2) complexity, but how about
the non-specific fold/reduce? The scaladoc for reduce says:

    Reduces the elements of this sequence using the specified associative
    binary operator.

    The order in which operations are performed on elements is unspecified
    and may be nondeterministic.

Which sounds very unsettling, particularly the "unspecified and
nondeterministic" part! Is there any case where I should use a plain
"reduce" rather than reduceLeft? I can't think of a case where I'd
prefer an unspecified and non-deterministic execution order, unless it's
something to do with letting parallel collections work properly in
parallel. Or is the unspecified and nondeterministic part a non-issue,
so I can shorten all my ".reduceLeft"s to ".reduce"s?
Thanks!
-Haoyi
_ + _
_ - _
_ * _
....
Cheers,
√
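Viktor's list of operators above hints at the contract: reduce assumes an associative operator, and `_ - _` is the odd one out. A quick runnable sketch of the difference (my own illustration, not from the thread):

```scala
object ReduceAssoc {
  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3, 4)

    // + is associative: any grouping yields the same result, so reduce
    // is free to regroup (e.g. on a parallel collection) and still
    // agrees with reduceLeft.
    assert(xs.reduce(_ + _) == xs.reduceLeft(_ + _)) // both 10

    // - is NOT associative: reduceLeft pins the grouping down as
    // (((1 - 2) - 3) - 4) = -8, but a parallel reduce may regroup,
    // e.g. (1 - 2) - (3 - 4) = 0, and return a different answer.
    println(xs.reduceLeft(_ - _)) // -8
  }
}
```

On a sequential List the two always agree; the nondeterminism only bites when the collection is parallel and the operator is not associative.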
--
Viktor Klang
Akka Tech Lead
Typesafe - The software stack for applications that scale
Twitter: @viktorklang
On Sat, Feb 11, 2012 at 7:32 AM, Rex Kerr <ich...@gmail.com> wrote:
> On Sat, Feb 11, 2012 at 5:59 AM, Alex Repain <alex....@gmail.com> wrote:
>>
>>
>>
>> 2012/2/10 Rex Kerr <ich...@gmail.com>
>>>
>>> On Fri, Feb 10, 2012 at 1:36 PM, Haoyi Li <haoy...@gmail.com> wrote:
>>>>
>>>> Hello Everyone,
>>>>
>>>> I've been learning scala recently (already know java/c#/python) and
>>>> been thoroughly enjoying the higher order collections functions.
>>>> filter/map is pretty obvious, and I've more or less figured out
>>>> foldLeft/reduceLeft. I've heard somewhere that you probably shouldn't
>>>> use foldRight/reduceRight because of O(n^2) complexity
>>>
>>>
>>> That would be a strange implementation. The typical implementation
>>> requires O(n) stack space, which is the real problem--for many collections
>>> of practical size, you throw stack overflow exceptions.
>>
>>
>> Does it have to be on the stack though ? Can't the function be tail
>> recursive on the reversed collection ?
>
>
> Of course. xs.foldRight(z)(f) is the same as xs.reverse.foldLeft(z)((a,b)
> => f(b,a)). It's often not implemented that way in the library, however.
>
> --Rex
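Rex's identity is easy to check on a finite List. A sketch of the rewrite (my own, not library code):

```scala
object FoldRightViaLeft {
  // Rex's rewrite: foldRight expressed as a foldLeft over the
  // reversed collection, with the operator's arguments flipped.
  def foldRightViaLeft[A, B](xs: List[A])(z: B)(f: (A, B) => B): B =
    xs.reverse.foldLeft(z)((b, a) => f(a, b))

  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3)
    // Both rebuild 1 :: 2 :: 3 :: Nil, but the rewritten version runs
    // as a loop (foldLeft is tail-recursive) instead of using O(n)
    // stack frames.
    assert(foldRightViaLeft(xs)(List.empty[Int])(_ :: _) ==
           xs.foldRight(List.empty[Int])(_ :: _))
    println(foldRightViaLeft(xs)(List.empty[Int])(_ :: _)) // List(1, 2, 3)
  }
}
```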
On 02/12/2012 08:24 AM, Rex Kerr wrote:
> For example, if you have something that is lazily computed (e.g. a Stream
> or a view), you have just committed yourself to computing the entire
> collection (potentially). Depending on your fold, you might not need to.

More importantly than algorithmic complexity: for non-strict data
structures, a foldRight written using foldLeft is a different function
altogether, since some computations that would typically terminate no
longer do. That is to say, your program will no longer work.

--
Tony Morris
http://tmorris.net/
2012/2/12 Tony Morris <tonym...@gmail.com>:
> More importantly than algorithmic complexity: for non-strict data
> structures, a foldRight written using foldLeft is a different function
> altogether, since some computations that would typically terminate no
> longer do. That is to say, your program will no longer work.

Interesting, could you provide an example for that case, please?
On 02/12/2012 08:06 PM, Alex Repain wrote:
> Interesting, could you provide an example for that case, please?
Yepper. http://paste.pocoo.org/show/549682/
case class EStream[A](x: Option[(() => A, () => EStream[A])]) {
  import EStream._

  // Lazy foldRight: both arguments of f are by-name, so the recursive
  // call is only forced if f demands its second argument.
  def foldRight[B](b: => B)(f: (=> A, => B) => B): B =
    x match {
      case Some((h, t)) => f(h(), t().foldRight(b)(f))
      case None         => b
    }

  // Strict foldLeft: a tail-recursive loop over the whole stream.
  @annotation.tailrec
  final def foldLeft[B](b: B)(f: (B, A) => B): B =
    x match {
      case Some((h, t)) => t().foldLeft(f(b, h()))(f)
      case None         => b
    }

  def reverse: EStream[A] =
    foldLeft(nil[A])((b, a) => cons(() => a, () => b))

  // "foldRight" via foldLeft: a different function on non-strict data.
  def foldRightBroken[B](b: => B)(f: (=> A, => B) => B): B =
    foldLeft(b)((a, b) => f(b, a))
}

object EStream {
  def rep[A](a: => A): EStream[A] =
    cons(() => a, () => rep(a))

  def cons[A](h: () => A, t: () => EStream[A]): EStream[A] =
    EStream(Some((h, t)))

  def nil[A]: EStream[A] =
    EStream(None)
}

object M {
  import EStream._

  def main(args: Array[String]) {
    val ones = rep(1)
    // The lazy foldRight only forces the head, so this terminates
    // even though `ones` is infinite.
    val twos = ones.foldRight(nil[Int])((a, b) => cons(() => a + 1, () => b))
    println(twos.x match {
      case None         => "hi!"
      case Some((h, _)) => "lo" + h()
    })
    // The foldLeft-based version must traverse the entire infinite
    // stream before producing anything, so it never returns.
    val oopsie = ones.foldRightBroken(nil[Int])((a, b) => cons(() => a + 1, () => b))
    println("you ain't never gonna be seeing this")
  }
}
--
Tony Morris
http://tmorris.net/
On Feb 12, 2012 1:03 AM, "Haoyi Li" <haoy...@gmail.com> wrote:
>
> Why isn't it, though? It seems to me that having O(n) heap space (for
> the reversed collection) would be superior to having O(n) stack space,
> which may blow up your stack. And they're both O(n) runtime, so
> risking a stack overflow seems much more dangerous than running a
> constant factor slower.
It is not true that foldRight can be written with foldLeft for all values of xs. I should imagine this is why it is not implemented that way. Indeed, it is only true for very few heap-based structures like List.
Cheers,
√
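A sketch of that restriction (using modern Scala's LazyList; in the 2.9 era the analogous type was Stream):

```scala
object ReverseTrickLimits {
  def main(args: Array[String]): Unit = {
    // On a finite List, the reverse + foldLeft rewrite is exact:
    // 1 - (2 - (3 - 0)) = 2 either way.
    val xs = List(1, 2, 3)
    assert(xs.foldRight(0)(_ - _) ==
           xs.reverse.foldLeft(0)((b, a) => a - b))

    // But the rewrite requires materializing the whole structure up
    // front. For an infinite LazyList, `reverse` never terminates, so
    // foldRight-via-reverse simply isn't available:
    val nats = LazyList.from(0)
    println(nats.take(3).toList) // List(0, 1, 2) -- lazy, fine
    // nats.reverse              // would never terminate
  }
}
```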
The use case I'm looking at is using the | operator in the parser
combinator library: I want to parametrize a parser by a list of tokens
it can accept. i.e. instead of hard coding it into the parser:
object MyParser extends RegexParsers {
  def operator = "+" | "-" | "*" | "/"
}
I do something like this
class MyParser(opChars: List[String] = List("+", "-", "*", "/"))
    extends RegexParsers {
  def operator = opChars.map(literal).reduce(_ | _)
}
Which would then let the person instantiating the parser override
the default choice of operators if need be. My worry is that if I have
something like
A | B | C
I could end up with either
A | (B | C)
or
(A | B) | C
Which are technically different parsers: one has the subtree on the
left and one has the subtree on the right. However, both of them
should parse the same input in the same way. Is there any reason I
shouldn't use the non-deterministic reduce in this case? I'm
completely new to formal language theory/parser generators/parser
combinators, so it's also possible that what I'm trying to do is
completely off base.
Thanks!
-Haoyi
Hi Haoyi,
The | combinator is technically left-associative, so A | B | C is
interpreted as (A | B) | C. But semantically it is (fully) associative,
i.e. (A | B) | C and A | (B | C) have exactly the same behavior: first A
is tried, then B, then C. That is, these variants will always produce
the same result.
This is how it works:
1) (A | B) | C is an alternation parser as a whole, so first its
left-hand side is tried (that is the 'A | B' parser), then the
right-hand side (the 'C' parser). The 'A | B' parser is again an
alternation parser, so first 'A' is tried, then 'B'. The overall order
of applying parsers to the input is A, B, C.
2) In 'A | (B | C)', first 'A' is tried, then 'B | C'. When 'B | C'
comes into play, then first 'B', then 'C'. The overall order is again
A, B, C.
So, no difference in results regardless of folding direction.
But as for runtime behavior, the right-folded version could be more
efficient, since the call to the first parser goes directly to it from
the outer parser. In the left-folded version it has to go through
(several) intermediate alternation combinators. But if the optimizer did
a good job and these calls got inlined, then there should be no
difference at runtime either.
(What can also make a difference is the overall order of alternatives depending on their
probabilities, but it's not related to the folding question.)
For an example of folding operator-parsers you might take a look at implementation in StdLexical [1]
and usage example in Json-Parser [2].
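As a concrete sketch of folding alternatives with | (this assumes the scala-parser-combinators library; OpParser and its demo are my own illustration, not code from StdLexical):

```scala
import scala.util.parsing.combinator.RegexParsers

class OpParser(opChars: List[String] = List("+", "-", "*", "/"))
    extends RegexParsers {
  // `literal` lifts each String to a Parser[String]; `|` then folds the
  // alternatives together. Because | is semantically associative, using
  // reduce instead of reduceLeft cannot change which inputs are accepted.
  def operator: Parser[String] = opChars.map(literal).reduce(_ | _)
}

object OpParserDemo {
  def main(args: Array[String]): Unit = {
    val p = new OpParser()
    println(p.parseAll(p.operator, "*").successful) // true
    println(p.parseAll(p.operator, "^").successful) // false
  }
}
```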
--
EL
[2] https://github.com/scala/scala/blob/v2.9.1/src/library/scala/util/parsing/json/Parser.scala#L120
The sentence "But semantically it is (fully) associative ..." above is
better read as:
But semantically it is (fully) associative, i.e. (A | B) | C and A | (B | C) always produce the
same result, since the order of applying involved parsers is always the same: first A is tried, then
B, then C.