SIP: Comprehensive Comprehensions

705 views
Skip to first unread message

Chris Hodapp

unread,
Sep 17, 2013, 12:27:56 AM9/17/13
to scala...@googlegroups.com


Hi, I'd like to propose the following as a result of my GSOC 2013 project:

SIP: Comprehensive Comprehensions

Motivation

Scala's for comprehensions can be an extremely useful tool for concisely representing a specific type of query-like monadic computation without creating a difficult-to-read nesting structure. However, in their current form, for- comprehensions have severe limitations when used to express queries, especially when compared with purpose-designed query languages, such as SQL, and other language-integrated query systems, such as Microsoft's LINQ. Specifically, for-comprehensions lack integrated support for two kinds of iteration- transforming operations that both SQL and LINQ support, making it rather painful (and verbose) to write some types of queries in Scala. This proposal seeks to properly define these two pain points and propose solutions for each.

Shape-Preserving Transformations

SQL defines the LIMITORDER BYWHERE, and HAVING clauses. For our purposes, we can say that LINQ supports all of these operations, while Scala's for-comprehensions support WHERE and HAVING through guards. The Scala standard library also provides methods that implement the functionality of LIMIT (in take) and ORDER BY (in sortBy), but these methods do not integrate very well with for-comprehensions. For example, in order to iterate over xs and xs, two Seq[Int]'s and receive (x, y) pairs in increasing-sum order, one would have to write something like:

val result = (
    for {
        x <- xs
        y <- ys
    } yield (x, y)
 ).sortBy { case (x, y) => x + y }

In addition to being ugly, this method of applying the sort requires that the identifier names from the comprehension be manually rebound in order to be used in the sort. It also breaks the for-comprehension, since these very same identifiers would need to be bound a third time to continue it.

SQL's LIMIT is similar to the take operation that is implemented on Scala's collections. However, using take for this purpose has similar drawbacks to using sortBy, as above:

val result = (
    for {
        x <- xs
        y <- ys
    } yield (x, y)
 ).take(100)

Although we don't have to rebind x and y here, we still have to wrap the comprehension in parentheses and cannot continue it after the take, unless we rebind the identifiers we've been using. For demonstration, such a rebinding would look like:

val result = for {
  (x, y) <- (
      for {
          x <- xs
          y <- ys
      } yield (x, y)
   ).take(100)
   z <- xz
} yield (x, y, z)

Aggregating Transformations

In addition to the clauses described above, SQL defines a GROUP BY clause and a set of aggregate functions. This type of clause is also supported in LINQ. A GROUP BY clause accepts a tuple of column names (possibly with the DESC keyword to reverse the sort). All of the columns not listed in the grouping tuple are transformed into 'aggregate' columns in SELECT and WHERE clauses. An aggregate function, such as SUMAVGMAXCOUNT, gets a single value from an aggregate column, and allows the aggregate column to be used as a value.

In Scala, aggregation is especially cumbersome and does not integrate with for- comprehensions very well at all. The Scala standard library features a number of aggregating methods on collections (in contrast to SQL, which only provides GROUP BY), as well as a number of aggregate methods, which can be used with the results.

Scala's aggregating methods can be divided into two categories; There are key-generating aggregating methods and non-key-generating aggregating methods. The most commonly-used key-generating aggregating methods is groupBy. This method takes a function, which maps from from a collection element to a "group key". It returns a Map, viewable as a sequence of Tuple2's, where _1 is a key _2 is a collection of collection elements in the group (elements are grouped on key identity). In code form:

val g = List(1,2,3,4).groupBy(_ % 2)
g == Map(1 -> List(1, 3), 0 -> List(2, 4))

Non-key-gene rationing aggregating methods seem to be a bit less popular, but several are defined in the Scala standard library. These methods turn a collection (of elements of a given type) into a collection of collections (of elements of that same type). An example is inits, which turns a sequence into an iterator of sequences, with the first being the full sequence and each subsequent sequence being shorter in length by 1. In code:

val i = List(1,2,3).inits
i.next == List(1,2,3)
i.next == List(1,2)
i.next == List(1)
i.next == List()

Aggregate methods in Scala's standard library reduce a collection to a single value and include "max", "min", and "sum".

Note: The previous descriptions refer to specific methods, but what makes a method a transformer, an aggretate function, etc. is its return type. Any method that is defined on a collection and returns correct type is a transformation under this definition

Proposal

For-Transformers

I propose that extensions be made to the Scala syntax and alterations made to its parser to allow for integration of transforming operations and for-comprehensions. Specifically, I propose that two (symmetric) syntactic constructs be added to Scala, with one improving the syntax of each of the transformation types described above.

Then-Transformers

Let's first improve the syntax of shape-preserving transformations. To do this, we introduce the concept of then-transformers. A then-transformer is a line within the enumerator list of a for-loop or for-comprehension that (starts with the keyord then) and runs an operation on the current state of the comprehension. To understand what this means, let's look at one:

val result = for {
   x <- xs
   y <- ys
   then take 100
   z <- xz
} yield (x, y, z)

What does this comprehension do? In fact, it is functionally identical to comprehension directly above the 'Aggregating Transformations' header. Conceptually, we are drawingx from xs, drawing y from yscroping to 100 results, and drawing z from zsin that order. Of course, any shape-preserving transformation operation can be used in a then-transformer.

There is a slight wrinkle: sortBy takes a function from the collection element type to a sorting key; We can't just pass it a standard expression, based on the bound-above identifiers, and expect it to work. Instead, we draft the with keyword to request that the compiler generate a function for us:

val result = for {
      x <- xs
      y <- ys
      then sortBy with (x + y)
} yield (x, y)

As you might guess, this is equivalent to the first comprehension above. The with (x + y) causes the compiler to generate a function that maps from a comprehension element to the sum of x and y for that element.

Group-Transformers

Non-key-generating aggregating transformers can be integrated with for-comprehensions with a similar extension, group-transformers:

val v = for {
  x <- Seq(1, 2)
  y <- Seq('a', 'b')
  group inits
} yield x zip y

v.toList == List(
  List((1,'a'), (1,'b'), (2,'a'), (2,'b')),
  List((1,'a'), (1,'b'), (2,'a')),
  List((1,'a'), (1,'b')),
  List((1,'a')),
  List()
)

So what is going on here? In short, when we hit group inits, type of each already-bound identifier changes, becoming an aggregate (collection of collections). In this case, they become the prefixes of the sequences that x and y draw from, in order of decreasing length. Note that conceptually, it is the comprehension being grouped, not the identifiers, so the x-prefix and the y-prefix are the same length on each iteration.

A much more commonly-used aggregating transformer is groupBy. However, because groupBy is a key-generating aggregate transformation, we need a way to the compiler that the key exists and needs to be captured:

val v = for {
  x <- 1 to 5
  y <- 1 to 5
  product <- group groupBy with (x * y)
} yield (product, x zip y toSet)
v == Map(
  1 -> Set((1,1)),
  2 -> Set((1,2), (2,1)),
  3 -> Set((1,3), (3,1)),
  4 -> Set((1,4), (2,2), (4,1)),
  5 -> Set((1,5), (5,1)),
  6 -> Set((2,3), (3,2)),
  8 -> Set((2,4), (4,2)),
  9 -> Set((3,3)),
  10 -> Set((2,5), (5,2)),
  12 -> Set((3,4), (4,3)),
  15 -> Set((3,5), (5,3)),
  16 -> Set((4,4)),
  20 -> Set((4,5), (5,4)),
  25 -> Set((5,5))
)

This will create a map from each potential product to a Set of x-y pairs that create that product.

Here is one more example, which shows how group-transformers can be useful, even when there is only one normal enumerator:

val v = for {
  str <- strs if str.nonEmpty
  firstLetter <- group groupBy with str.head
} yield (firstLetter, str.maxBy(_.length))

This gives the longest string in a sequence starting with each letter (actually, a Map from the starting letter to the word).

A Brief Return to Then-Transformers

Presently, the Scala standard library does not define any key-generating shape-preserving transformations. However, it is entirely possible that it or user-made libraries will in the future. Therefore, in the interest of symmetry between then-transformers and group-transformers, I propose that then-transformers also support the value-drawing syntax:

val numberedLines = for {
  line <- lines
  num <- then numbered
} yield (num, line)

Note that numbered is a fictional method which functions much like zipWithIndex, but returns Pairs with the numbering in _1 and the element-values in _2.

Formalization

New Keywords

My proposal requires that group be made a keyword and that then be made a proper keyword (as opposed to use-deprecated identifier that it currently is).

Grammar Changes

My proposal requires that the Scala grammar be amended such that the following EBNF rules are part of it:

Expr1 ::= ‘for’ (‘(’ Enumerators ‘)’ | ‘{’ Enumerators ‘}’)
{nl} [‘yield’] Expr
Enumerators ::= Generator {semi Enumerator}
Enumerator ::= Generator
             | Guard
             | [‘val’] Pattern1 ‘=’ Expr
             | ForTransformer
Generator ::= Pattern1 ‘<-’ Expr [Guard]
Guard ::= ‘if’ PostfixExpr
ForTransformer ::= [pattern '<='] (ThenExpr | GroupExpr)
TransformerOp ::= InfixTransformerOp [id [nl]]
InfixTransformerOp ::= SimpleTransformerOp
                     | TransformerOp id [nl] InfixExpr
SimpleTransformerOp ::= `then'
                      | `group'
                      | SimpleTransformerOp `.' id
                      | SimpleTransformerOp TypeArgs
                      | SimpleTransformerOp ArgumentExprs
Expr1 ::= ‘if’ ‘(’ Expr ‘)’ {nl} Expr [[semi] else Expr]
        | ‘while’ ‘(’ Expr ‘)’ {nl} Expr
        | ‘try’ ‘{’ Block ‘}’ [‘catch’ ‘{’ CaseClauses ‘}’] [‘finally’ Expr]
        | ‘do’ Expr [semi] ‘while’ ‘(’ Expr ’)’
        | ‘for’ (‘(’ Enumerators ‘)’ | ‘{’ Enumerators ‘}’) {nl} [‘yield’] Expr
        | ‘throw’ Expr
        | ‘return’ [Expr]
        | [SimpleExpr ‘.’] id ‘=’ Expr
        | SimpleExpr1 ArgumentExprs ‘=’ Expr
        | [`with'] PostfixExpr
        | [`with'] PostfixExpr Ascription
        | [`with'] PostfixExpr ‘match’ ‘{’ CaseClauses ‘}’
PostfixExpr ::= InfixWithExpr [id [nl]]
InfixWithExpr ::= InfixExpr
                | InfixExpr id [nl] `with' PrefixExpr

Note: Rules where the LHS is the same as that of an existing rule replace the existing rule.

Desugaring

A for loop or comprehension of the form:

for {
  enums1
  then.op
  enums2
} body

would be desugared as follows:

for {
  BoundNames(enums1) <-
    SubThens(SubstitueWiths(op), for (enums1) yield BoundNames)
  enums2
} body

BoundNames is a tuple (or tuple-pattern) of all the names bound in enums1

SubThens replaces any instances of then in its first argument with its second argument.

SubstitueWiths replaces a with-expression of the form:

with x

with a function of the form:

{ case BoundNames => x }

A for loop or comprehension of the form for { enums1 pattern <- then.op enums2 } body

would be desugared as follows:

for {
  (pattern, FilterRepeated(BoundNames)) <-
    SubThens(
      SubstitueWiths(op),
      for (enums1) yield BoundNames(enums1)
    )
  enums2
} body

FilterRepeated replaces any identifier in the passed-in region that also appears in 'pattern' with _.

When the group keyword is used, a normalization step is required. This takes the form of an extraction function being mapped over the result of the transformer's op.

A for loop or comprehension of the form:

for {
  enums1
  group.op
  enums2
} body

would be desugared as follows:

for {
  BoundNames <-
    SubGroups(
      SubstitueWiths(op),
      for (enums1) yield BoundNames
    ).map { x =>

      val BoundNames(1) = x.map { case BoundNames =>
        BoundNames(1)
      }

      val BoundNames(n) = x.map { case BoundNames =>
        BoundNames(n)
      }

      BoundNames
    }
  enums2
} body

SubGroups replaces any instances of group in its first argument with its second argument.

BoundNames(i) is the ith name bound in enums1.

The variable x is a fresh variable.

A for loop or comprehension of the form

for {
  enums1
  pattern <- group.op
  enums2
} body

would be desugared as follows:

for {
  (pattern, FilterRepeated(BoundNames)) <-
    SubGroups(
      SubstitueWiths(op),
      for (enums1) yield BoundNames(enums1)
    ).map { case (x, y) =>

      val BoundNames(1) = y.map { case BoundNames =>
        BoundNames(1)
      }

      val BoundNames(n) = y.map { case BoundNames =>
        BoundNames(n)
      }

      (x, BoundNames)
    }
  enums2
} body

FAQ

Q: How do comprehensive comprehensions compare with LINQ?

A: Although both are DSLs for writing typesafe queries in a host language, LINQ works by hard-coding support for specific collection operations. In contrast, comprehensive comprehensions support any collection operation that meets certain type signature requirements, making them much more versatile.

Q: What about database queries?

A: Comprehensive comprehensions will integrate beautifully with the SLICK query library, once SLICK adds support for nested queries. Until then, group-transformers won't work properly with SLICK, but then-transformers will work fine.

Q: How complicated is the implementation?

A: The complete patch (including tests) is shorter than this specification.

Q: When can I test this feature out?

A: Now. There's a build that is intended to fully meet this specification available on scala-comprehensions.org. The build currently available is binary-compatible with Scala 2.10.2. If there is sufficient interest, I can probably set up a Maven repository to make testing smoother.

Chris Hodapp

unread,
Sep 17, 2013, 1:40:55 AM9/17/13
to scala...@googlegroups.com
Here are a few more desugarings, in case anyone wants further examples:


Example 1:
for {
  x <- 5 to 1 by -1
  y <- 'a' to 'z'
  then sortBy with x
  z <- 10 to 20
} yield x * 2

Desugaring:
for {
  (x, y) <- {
    (
      for {
        x <- 5 to 1 by -1
        y <- 'a' to 'z'
      } yield (x, y)
    ).sortBy { case (x, y) => x }
  }
  z <- 10 to 20
} yield x * 2


Example 2:
for {
  x <- 5 to 1 by -1
  y <- 'a' to 'z'
  key <- then sortBy2 with x
  z <- 10 to 20
} yield x * 2

(Where sortBy2 returns a pair of the value returned by the sorting function
and the actual sorted element)

Desugaring:
for {
  (key, (x, y)) <- {
    (
      for {
        x <- 5 to 1 by -1
        y <- 'a' to 'z'
      } yield (x, y)
    ).sortBy2 { case (x, y) =>
      x
    }
  }
  z <- 10 to 20
} yield x * 2

Example 3:
for {
  x <- 5 to 1 by -1
  y <- 'a' to 'z'
  group inits
  z <- 10 to 20
} yield ((x :+ 0).max, y)

Desugaring:
for {
  (x, y) <- {
    (
      for {
        x <- 5 to 1 by -1
        y <- 'a' to 'z'
      } yield (x, y)
    ).inits.map { p =>
      val x = p.map { case (x, y) => x }
      val y = p.map { case (x, y) => y }
      (x, y)
    }
  }
  z <- 10 to 20
} yield ((x :+ 0).max, y)


Example 4:
for {
  x <- 5 to 1 by -1
  y <- 'a' to 'z'
  g <- group groupBy with x
  z <- 10 to 20
} yield (x.max, y)

Desugaring:

for {
  (g, (x, y)) <- {
    (
      for {
        x <- 5 to 1 by -1
        y <- 'a' to 'z'
      } yield (x, y)
    ).groupBy { case (x, y) =>
      x
    }.map { case (p1, p2) =>
      val x = p2.map { case (x, y) => x }
      val y = p2.map { case (x, y) => y }
      (p1, (x, y))
    }
  }
  z <- 10 to 20
} yield (x.max, y)

Chris Hodapp

unread,
Sep 17, 2013, 3:15:55 AM9/17/13
to scala...@googlegroups.com
For those who want to comment line-wise, I put the proposal in Google Drive.

Justin du coeur

unread,
Sep 17, 2013, 8:51:12 AM9/17/13
to scala...@googlegroups.com
A few comments from the non-expert peanut gallery:

The proposal's very intriguing.  I like the general shape of it, providing LINQish capability without feeling quite so arbitrary and SQL-centric.  That said, two details strike me funny.

The minor one is the use of "with" -- I just don't find that specific keyword intuitive here, and would suggest at least thinking about, eg, "on" instead.  (That said, I can understand the desire to conserve keywords, and I would probably get used to it, so this is a quibble.)

More significantly, the type transformation of the existing bound names in the group-transforms is a bit confusing.  I get intellectually what's going on, but the strs example gave me a really strong, "wait -- what?" reaction.  I'm concerned about that, especially for more naive users: the notion that x has type A on the line where it is defined, but type List[A] two lines down, is pretty unobvious.  Mind, I don't have a better suggestion offhand, and I don't think it's the end of the world (modern IDE support would at least help), but consider this a wish for a more obvious distinction between "x" and "the collection of x's".

Overall, though, looks both useful and powerful...


--
You received this message because you are subscribed to the Google Groups "scala-sips" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-sips+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Chris Hodapp

unread,
Sep 17, 2013, 3:00:39 PM9/17/13
to scala...@googlegroups.com
On Tuesday, September 17, 2013 7:51:12 AM UTC-5, Justin du coeur wrote:
A few comments from the non-expert peanut gallery:

The proposal's very intriguing.  I like the general shape of it, providing LINQish capability without feeling quite so arbitrary and SQL-centric.  That said, two details strike me funny.

The minor one is the use of "with" -- I just don't find that specific keyword intuitive here, and would suggest at least thinking about, eg, "on" instead.  (That said, I can understand the desire to conserve keywords, and I would probably get used to it, so this is a quibble.)

I actually disagree here; I think that the 'with' keyword is almost perfect and conveys
actual semantic information: It makes clear that you are doing the operation with the
automatically generated function. In other words, it makes clear that an object (the
generated function literal) is being pasted into your code at the current location.
Of course, I'm open to be convinced that there's a better keyword! Maybe I think
in a different way from most programmers.
 
More significantly, the type transformation of the existing bound names in the group-transforms is a bit confusing.  I get intellectually what's going on, but the strs example gave me a really strong, "wait -- what?" reaction.  I'm concerned about that, especially for more naive users: the notion that x has type A on the line where it is defined, but type List[A] two lines down, is pretty unobvious.  Mind, I don't have a better suggestion offhand, and I don't think it's the end of the world (modern IDE support would at least help), but consider this a wish for a more obvious distinction between "x" and "the collection of x's".

I agree that this is not ideal. At one point in the development of this feature, I joked that
if this were Ruby on Rails, I would include a 'pluralizer' that could pluralize any string, so
that the identifiers could be pluralized below a 'group'. However, I explored a number of
other solutions and all of the ones I tried/thought about would actually require the user
to deal with an increased cognitive/conceptual burden over just remembering that a
'group' changes the type of the identifiers above it (not to mention complicating the spec
horribly). Of course, I'm open to suggestions from others; If they're things I've already
tried/thought about, I'll explain why I didn't go with them and if they are new, I'll consider
whether they're improvements over what I have now.

√iktor Ҡlang

unread,
Sep 17, 2013, 3:51:01 PM9/17/13
to <scala-sips@googlegroups.com>
One quick "bikesheddy" thought is that I don't think that the keywords harmonize with the "<-" symbol.
--
Viktor Klang
Director of Engineering

Twitter: @viktorklang

Chris Hodapp

unread,
Sep 17, 2013, 4:38:18 PM9/17/13
to scala...@googlegroups.com
On Tuesday, September 17, 2013 2:51:01 PM UTC-5, Viktor Klang wrote:
One quick "bikesheddy" thought is that I don't think that the keywords harmonize with the "<-" symbol.

Things that have been considered in the past:
  • Putting the 'group' or the 'then' at the beginning of the line (group g <- groupBy with x instead of g <- group groupBy with x). This was rejected because it made the parsing significantly more complex.
  • Using a symbol instead of 'with' (group g <- groupBy => x instead of g <- group groupBy with x). This was rejected because two arrows in the middle of a line that seem to pull the text in two directions seemed like a recipe for confusion

Simon Ochsenreither

unread,
Sep 17, 2013, 4:43:01 PM9/17/13
to scala...@googlegroups.com

One quick "bikesheddy" thought is that I don't think that the keywords harmonize with the "<-" symbol.

That's my POV, too.

Rodrigo Cano

unread,
Sep 17, 2013, 9:18:51 PM9/17/13
to scala...@googlegroups.com
I see no gain in adding keywords for something that today it scales well. If you understand pattern matching and for comprehension, you can understand any of the examples given with nested fors, while I still don't understand many of the examples with the proposed syntax. Also, I believe these changes are an attempt to stretch the tool (for comprehension) to contemplate something that was not in its design (working with monads) for the sake of sql.
On the other hand, I also hate adding keywords like "then" and "group", really, those are way too common identifiers, keep adding keywords like those (when, then, group, on, aggregate, filter), and soon Scala will suck for any DSLs.

I believe that with the tools that we have today with macros, you could write almost SQL like in scala while working with collections and datasets or databases (maybe even using string interpolators) and have them checked and interact with your ADTs with no need to introduce yet MORE syntax to scala.

I liked the idea of somehow extend for-comprehension to make it more general, but provided that there is a solid theory behind it (like monads) and that it doesn't use up more keywords.

So to sum up, my perspective is that this is a really bad change that we should not pursue, at least not like this.

Cheers.


On Tue, Sep 17, 2013 at 5:43 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

One quick "bikesheddy" thought is that I don't think that the keywords harmonize with the "<-" symbol.

That's my POV, too.

--

David Barri

unread,
Sep 18, 2013, 12:27:59 AM9/18/13
to scala...@googlegroups.com
> I liked the idea of somehow extend for-comprehension to make it more general, but provided that there is a solid theory behind it (like monads) and that it doesn't use up more keywords.

I agree. Or maybe one more keyword would be fine if the comprehension were to understand a new type of abstraction. There should be a single abstraction that covers both here.

Can I ask, what is the point of these? Either I'm having a stupid moment or they seem to be the same as map(identity) = NOP.

    .map { p =>
       val x = p.map { case (x, y) => x }
       val y = p.map { case (x, y) => y }
       (x, y) }
and

Chris Hodapp

unread,
Sep 18, 2013, 12:40:51 AM9/18/13
to scala...@googlegroups.com

On Tuesday, September 17, 2013 11:27:59 PM UTC-5, David Barri wrote:
> I liked the idea of somehow extend for-comprehension to make it more general, but provided that there is a solid theory behind it (like monads) and that it doesn't use up more keywords.

I agree. Or maybe one more keyword would be fine if the comprehension were to understand a new type of abstraction. There should be a single abstraction that covers both here.
 
The proposal only requires one new keyword ('group'); Both 'with' and
'then' are already keywords, though 'then' can currently be used as an
identifier, on a deprecated basis.

Can I ask, what is the point of these? Either I'm having a stupid moment or they seem to be the same as map(identity) = NOP.

    .map { p =>
       val x = p.map { case (x, y) => x }
       val y = p.map { case (x, y) => y }
       (x, y) }
and
    .map { case (p1, p2) =>
       val x = p2.map { case (x, y) => x }
       val y = p2.map { case (x, y) => y }
       (p1, (x, y)) }
 
Those convert collections of tuples into tuples of collections.
It is necessary to 'turn them inside-out' to make them integrate
properly with comprehensions; This is the code that actually allows
each identifier from before to be bound to a collection (it splits the
single-collection-containing-all-the-data into one-collection-for-each-identifer).

Paul Phillips

unread,
Sep 18, 2013, 12:56:35 AM9/18/13
to scala...@googlegroups.com
On Tue, Sep 17, 2013 at 9:40 PM, Chris Hodapp <clho...@gmail.com> wrote:
Those convert collections of tuples into tuples of collections.

That's also what unzip does. Not for arbitrary arity unfortunately.

scala> 1 to 3 zip (1 to 3)
res0: scala.collection.immutable.IndexedSeq[(Int, Int)] = Vector((1,1), (2,2), (3,3))

scala> res0.unzip
res1: (scala.collection.immutable.IndexedSeq[Int], scala.collection.immutable.IndexedSeq[Int]) = (Vector(1, 2, 3),Vector(1, 2, 3))
 

Chris Hodapp

unread,
Sep 18, 2013, 1:12:15 AM9/18/13
to scala...@googlegroups.com
You are correct; This is basically pasting in code for 'unzipN', which can work on
anything that has a 'map' defined. I would love to implement it such that it only
needed to make a single traversal (like unzip), but that would open such a can
of worms that I don't even want to think about it.

Rodrigo Cano

unread,
Sep 18, 2013, 1:12:42 AM9/18/13
to scala...@googlegroups.com
The proposal only requires one new keyword ('group'); Both 'with' and
'then' are already keywords, though 'then' can currently be used as an
identifier, on a deprecated basis.
then is still not a keyword, and that sip has been postponed since oct 11, so I'm still (naïvely if you will) hoping that they go back on this...

--

Chris Hodapp

unread,
Sep 18, 2013, 1:38:56 AM9/18/13
to scala...@googlegroups.com
Rodrigo,

I don't expect to change your mind, given the general tone
of your comment, but I would like to make a few counter-points:

  • I note that 'working with monads' is absolutely in the design of for-comprehensions; They explicitly exist as syntatic sugar for working with monads. The desugarings described in this proposal depend on the things you are preforming comprehensions on exhibiting monad and functor behaviours to work
  • The proposal only adds one keyword ('group'). The 'then' keyword is already reserved, but is currently allowed for use as an identifier, on a deprecated basis (you've already read and responded to this argument)
  • You could try to cobble something together with macros to allow you to write some sort of 'checked SQL', either as an internal DSL, or as something that goes in an interpolated String, but this approach would have a number of drawbacks:
    • It would have to be more narrowly-targeted; It would be a 'SQL query library', a 'collections query library', etc.
    • It would basically be another language; If you wanted it to be full-featured, you would basically need to write a compiler from your DSL to normal Scala code.
    • It would likely be brittle, working only with the small set of functions it was specified to work with (basically, it would be LINQ, but able to be implemented as a library, due to Scala's flexibility)
  • This does have a solid foundation; It leverages the fact that things you put into for-comprehensions (are supposed to) be monads/functors to make your life easier. The theory comes from a paper by Phillip Wadler and Simon Peyton Jones.
Basically, the idea is to support a tiny, pluggable system into which any query
framework can fit itself, rather than having each framework define its own
divergent syntax.

Jason Zaugg

unread,
Sep 18, 2013, 2:12:39 AM9/18/13
to scala...@googlegroups.com
The can of worms that you might have to think of is how this will behave with collections that can only be traversed once (e.g Iterator[T]).

-jason

Rex Kerr

unread,
Sep 18, 2013, 2:26:28 AM9/18/13
to scala...@googlegroups.com
Just two brief comments:

1) The "then" syntax looks reasonably clear and useful, though I don't see a pressing need.

2) The "group" syntax looks like an anti-pattern.  Shadowing variables is dangerous enough normally, but here it seems especially difficult to keep straight what is going on when trying to read the code.  I would need much more convincing use-cases to not be opposed to this one.

  --Rex


--

Chris Hodapp

unread,
Sep 18, 2013, 2:34:43 AM9/18/13
to scala...@googlegroups.com
Despite my hyperbole, I have looked in the can and the worms look a lot like Builders.

To be less coy, the only ways I can see to handle things that can only be traversed once are:
  1. Ignore them; Just tell people not to use group-transformers with things that can only be traversed once
  2. Desguar to 'unzipN'
  3. Desugar to builder-based code
The argument against 2 is that it places extra requirements on the object going into the comprehesion (not even the standard library implements these). The argument against 3 is that it seems totally wrong to couple the desugaring with the builder-based system that collections currently use. The argument against 1 is that it... causes too many iterations. This is why I went with 1.

Chris Hodapp

unread,
Sep 18, 2013, 2:43:46 AM9/18/13
to scala...@googlegroups.com

Although nothing in the desugaring for 'then' or 'group' actually causes any shadowing, I would say that it is at least fair to conceptualize what 'group' does as a shadowing. The thing is, though, it's not some non-linear interconnected knot of yarn; Rather, it's a series of operations which can be read from top to bottom. You do the top thing, then the second-to-top thing, and so on. A group-transformer affects only those identifiers that were defined in the comprehension above it (the ones you just read)and no others. It affects them in a semi-limited way (you can figure out their types without even knowing the grouping operation), while allowing for flexibility by changing the grouping operation.

Chris Hodapp

unread,
Sep 18, 2013, 2:46:17 AM9/18/13
to scala...@googlegroups.com
I would love to work with the community to try and find a better syntax.

It could only result in a positive change; Either, I get a better syntax, or I get to say 'I told you so' when none of the improvments anyone comes with with parses.

Chris Hodapp

unread,
Sep 18, 2013, 2:50:52 AM9/18/13
to scala...@googlegroups.com
As far as better use-cases, can you give me an idea of what sort you would be looking for? Would you like to see cases where you save a few lines and a level of nesting, or something else? I've applied comprehensive comprehensions to a few SLICK examples back when I had a more complicated desugaring that had partial group-transformer/SLICK interop. Many the examples won't compile now, but they would be the correct code for a SLICK that supported nested queries.

David Barri

unread,
Sep 18, 2013, 2:59:13 AM9/18/13
to scala...@googlegroups.com
Ah cool, thanks. I was having a stupid moment after all. Didn't realise p and p2 were collections. Cheers.
 
The proposal only requires one new keyword ('group'); Both 'with' and
'then' are already keywords, though 'then' can currently be used as an
identifier, on a deprecated basis.

Allow me to rephrase, I'm not concerned so much with new keyword count, but rather the number of new pieces of magic that the proposal will introduce to for-comprehensions. It's currently 2 (then-comprehension + group-comprehension) and I'm thinking that it would be worthwhile trying to find a single abstraction that can accommodate both cases.

Rex Kerr

unread,
Sep 18, 2013, 3:10:13 AM9/18/13
to scala...@googlegroups.com
Something that clearly explains what the goal is and why it's the kind of thing one often wants to do, followed by the best vanilla Scala solution and the comprehensive comprehension solution.

The examples you've given all leave me scratching my head as to why I'd want to do this so often that I need a new language construct.

  --Rex


On Tue, Sep 17, 2013 at 11:50 PM, Chris Hodapp <clho...@gmail.com> wrote:
As far as better use-cases, can you give me an idea of what sort you would be looking for? Would you like to see cases where you save a few lines and a level of nesting, or something else? I've applied comprehensive comprehensions to a few SLICK examples back when I had a more complicated desugaring that had partial group-transformer/SLICK interop. Many the examples won't compile now, but they would be the correct code for a SLICK that supported nested queries.

--

Christopher Vogt

unread,
Sep 18, 2013, 9:30:57 AM9/18/13
to scala...@googlegroups.com
Martin as well as the SLICK team were interested in exploring this for a while, which now fortunately happened. The explicit goal was to achieve a general solution for use with collections and DSLs. Without comprehensive comprehensions you loose name bindings in and after operations like sortBy and groupBy, which can make larger comprehensions hard to write and read. Rebinding names needs boilerplate and is error-prone. We are hoping for significantly improved writeability and readability of database queries, where you do these things a lot. Previously bound names changing their type after a group is understandably unfamiliar, but becoming collections is the only sensible meaning these names can have after the grouping operation. This is useful for aggregations, suggested by SPJ's paper and actually what SQL does, too. Haskell has successfully explored comprehensive comprehensions and added them to monad and list comprehensions ( http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#generalised-list-comprehensions ).

This would become an experimental feature of the Scala compiler, which has to be enabled using a flag so people can try it out and we can gather more real-world experience with this approach.

denys.s...@typesafe.com

unread,
Sep 18, 2013, 10:11:45 AM9/18/13
to scala...@googlegroups.com
I concur on the inappropriateness of the use of `with` keyword. Not only does it look out of place it also complicates syntax by overloading simple keywords into multiple completely disconnected meanings (see what happend to "_" as a nice anti-example).

Moreover I don't really understand why do you even need keyword there. Compiler should be smart enough to understand:


Example 1:
for {
  x <- 5 to 1 by -1
  y <- 'a' to 'z'
  then sortBy x
} yield x * 2

Without any additional syntax. The argument that you'll have to do it during typer doesn't really make a strong point.

Lastly this special `with x` syntax will be impossible to support from quasiquotes as it will only be enabled inside of for comprehensions but not regular term-level scala syntax, e.g.:

val f = q"with x"
val forwithx = q"for { ... then sortBy $f ... } ..."

Similar problems come from general lack of any proper ASTs of the whole proposed desugaring. It just doesn't integrate at all in it's current form. Current `for` desugaring is already a few steps too far and causes lots of pain to support in the tools like IDE, pretty printers, quasiquotes, linters etc etc.

To make it work there is a need to re-think how, when and in what way are for comprehensions desugared so that tools can use syntactic information and not loose it right after parsing.

Stefan Zeiger

unread,
Sep 18, 2013, 1:41:58 PM9/18/13
to scala...@googlegroups.com
On 2013-09-17 21:27, David Barri wrote:
> I liked the idea of somehow extend for-comprehension to make it more general, but provided that there is a solid theory behind it (like monads) and that it doesn't use up more keywords.

I agree. Or maybe one more keyword would be fine if the comprehension were to understand a new type of abstraction. There should be a single abstraction that covers both here.

There is a single abstraction that covers both: The "group" transformations have the shape M[T] => M'[F[T]] with monads M and M' and a functor F. Make that the identity functor and you get the "then" transformation M[T] => M'[T]. But you cannot implement that transparently and you don't want to force all operations to be of the grouping type (for performance and API reasons).

-sz

Bardur Arantsson

unread,
Sep 18, 2013, 1:57:03 PM9/18/13
to scala...@googlegroups.com
On 2013-09-18 15:30, Christopher Vogt wrote:
> This is useful for aggregations, suggested by SPJ's paper and actually what
> SQL does, too. Haskell has successfully explored comprehensive
> comprehensions and added them to monad and list comprehensions (
> http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#generalised-list-comprehensions
> ).

That's not Haskell. That's GHC, an *implementation* of Haskell with
various extensions, of which "TransformListComp" is one.

Despite having worked quite a lot with Haskell (with the GHC compiler)
over the past few years, I've never actually noticed that extension
being used anywhere. Are there any examples of usage in the wild?

That would be required to say "successfully explored", IMO.

Regards,


EECOLOR

unread,
Sep 22, 2013, 7:01:29 AM9/22/13
to scala...@googlegroups.com
I do not like `group` to be a keyword. I use it quite extensively in my code. Especially in UI related code it's a very handy name.

I am not a native english speaker, but to me group means a group, while you are using it for grouping.


Erik


--

Haoyi Li

unread,
Sep 26, 2013, 12:11:31 AM9/26/13
to scala...@googlegroups.com
Here's my contribution to the bikeshedding:

Generally, I'd be much happier using sigils/symbols rather than long, verbose keywords for this sort of thing. Is that something you've looked at? I mean, symbols are even more in short supply than keywords, but I think that the excessive keywords in "then sortBy with (x + y)" are incredibly verbose and greatly de-emphasis the important bits of this piece of code: you care about "sortBy(x + 1)", and the rest is really just noise! With symbols the noise can be kept down to a minimum. I also think that long keywords don't look nice interspersed with the short  2-character "<-" and "if" tokens.

I find the following code exceedingly confusing

val v = for { x <- Seq(1, 2) y <- Seq('a', 'b') group inits } yield x zip y

You say that it's the comprehension being grouped, but the `group` keyword comes before the comprehension is complete, making it seem pretty obvious that its referring to the state of the comprehension before the things get zipped. Does that also mean that `zip` is evaluated before `inits`? If so, that would blow my mind.

Since we're talking about new syntax, I think a lot of this could be fixed by placing the `yield` *inside* the braces; we would then have the flexibility to e.g. put the `group` clause *after* the `yield`, which I think makes more sense:

val v = for { x <- Seq(1, 2) y <- Seq('a', 'b')
yield x zip y group inits }

Not sure how crazy-talk this is, but it flows much better in general. The other example could be:

val v = for { x <- 1 to 5 y <- 1 to 5 product <- group groupBy with (x * y)
yield (product, x zip y toSet) }

Which I also think looks right: the `groupBy` gets evaluated *before* the `yield`, and so it appears earlier in the comprehension. These two comprehensions, for example:
for { x <- xs /* 100 item list */ y <- ys /* 100 item list */
yield doSomething(x, y)
take(100) }
for { x <- xs y <- ys
take(100)
yield doSomething(x, y)
}

Would then call `doSomething` either 10000 or 100 times.

I'm not sure if these two comprehensions are representable with your syntax, but I suspect a lot of confusion arises from the fact that you are arbitrarily picking one based on some difficult-to-infer context:

- In your `take`, `groupBy` and `numbered` examples, it's clearly the second: the transformation comes before the yield
- in your `inits` example, it's the first: the yield (and zipping) happens first
- In your `sortBy` example, it's ambiguous: both would have the correct semantics, and I'm not sure what's happening here

I haven't fully digested the formal de-sugaring, so I take your word that it is correct and unambiguous. As a human, though, I'm having difficulty picking out which things get placed after the `yield` expression and which ones get placed before the yield, and what the rule is that sets these cases apart.

Seth Tisue

unread,
Sep 26, 2013, 9:47:44 AM9/26/13
to scala...@googlegroups.com
On Thu, Sep 26, 2013 at 12:11 AM, Haoyi Li <haoy...@gmail.com> wrote:
> Not sure how crazy-talk this is, but it flows much better in general.

Shades of Common Lisp's LOOP macro!

Christopher Vogt

unread,
Sep 26, 2013, 1:56:16 PM9/26/13
to scala...@googlegroups.com
 
Without any additional syntax. The argument that you'll have to do it during typer doesn't really make a strong point.

For-comprehensions already are a purely syntactical desugaring aiming to keep later compiler phases simpler. That is a strong argument for it. If that pays off or rather causes problems is a discussion that is not specific to this SIP, which just followed in the foot steps of how it currently already works. We should have this discussion separately from this.

Lastly this special `with x` syntax will be impossible to support from quasiquotes as it will only be enabled inside of for comprehensions but not regular term-level scala syntax, e.g.:

val f = q"with x"
val forwithx = q"for { ... then sortBy $f ... } ..."

Well "with x" is not a valid Scala expression, so why SHOULD it work in a quasi quote? It only has meaning as part of a for-comprehension. If a lack of trees is a problem for quasi-quotes, we could consider having intermediate trees including for-comprehensions, which are erased syntactically before further stages. I believe Paul tried something like this, but given up.

Christopher Vogt

unread,
Sep 26, 2013, 1:57:47 PM9/26/13
to scala...@googlegroups.com

I do not like `group` to be a keyword.

Better suggestions anyone?

Chris Hodapp

unread,
Sep 26, 2013, 3:04:33 PM9/26/13
to scala...@googlegroups.com
Hey,

First, I did consider using "=>" instead of "with", but that was rejected, as it made for a weird "pulling apart" effect on lines that also had "<-". I've never considered using symbols instead of 'then' and 'group'. I am curious as to what symbols you think would be appropriate for that role. I'll note that I'm fairly skeptical of non-combinator symbols in general (and most people are skeptical even of those), so they'd have to be pretty appropriate to convince me that they are better than actual keywords.

Second, I think that the idea of moving the 'yield' into the comprehension body is a cool-seeming one, but it's not really related to this proposal; I think it's something that can (should) be discussed, but not something that should make or break comprehensive comprehensions.

The transformation always (unambiguously) affects the state of the comprehension at the point where it appears. This means that they never (!) run after the 'yield'. That said, you can easily model post-yield operations by 1) Assigning what would be yielded to a name, 2) Applying the transformer, and 3) Yielding the name that you assigned to.


Anyway, looking back, it seems that my explanation of the 'inits' example was actually pretty lacking, so I'll try again:

val v = for { x <- Seq(1, 2) y <- Seq('a', 'b') group inits } yield x zip y


In this comprehension, we first draw 'x' and 'y' from length-2 sequences. We then group the comprehension using 'inits'. This transforms all of the previously bound identifiers into collections. The 'inits' method returns an iterator over the decreasing-length prefixes. Therefore, we are making groups based on the decreasing-length prefixes of the comprehension so far. The comprehension so far has the values [(x=1, y='a'), (x=1, y='b'), (x=2, y='a'), (x=2, y='b')] (imagine literally doing the comprehension; The bound variables x and y would take these values and in this order). Therefore, the longest prefix for x would be [1, 1, 2, 2], while the longest prefix for y would be ['a', 'b', 'a', 'b'] (note that I am literally just pulling the x and the y values from the full list that appears above). Thus, the comprehension lines below the group-transformer would first run with x and y having these values. The one-shorter prefix of the comprehension would be [(x=1, y='a'), (x=1, y='b'), (x=2, y='a')]. Thus, the lines below the group-transformer would next run with x=[1, 1, 2] and y=['a', 'b', 'a']. This pattern would continue until x=[] and y=[], which would be the last run.

In case your'e curious, the comprehension would desugar to:

val v = for {
  (x, y) <- (
    for {
      x <- Seq(1, 2)
      y <- Seq('a', 'b')
    } yield (x, y)
  ).inits.map { $freshVal =>
    val x = $freshVal.map { case (x, y) => x }
    val y = $freshVal.map { case (x, y) => y }
    (x, y)
 }
} yield x zip y


Of course, it seems rather wasteful, since it so happens that the user-written 'yield' clause takes the just-unzipped pair and rezips it.

Rex Kerr

unread,
Sep 26, 2013, 4:04:33 PM9/26/13
to scala...@googlegroups.com
The reason that regular for comprehensions are useful instead of annoying is that they capture some of the most common use-cases for manipulations on collections.  They're useful enough that the annoyance of e.g. losing `yield` and having to learn two or three ways instead of one to do the same thing (i.e. collections, for without yield, and for with yield) is overcome by the utility.

You are proposing to eat two more words that may be in broad usage, which is annoying.  And you're introducing at least two (I'd call it four, with the different positions you can use) ways to do things.  But you're falling way, way short on the utility side.

I grant that you can make the compiler do this.

Why should one want what this produces?

Why should one want it so much that it's worth the added annoyance?

I can confidently say that I never have wanted the transform that you just showed with inits.  I have wanted something like the groupBy example, but rarely.  Rarely enough so I wouldn't want to learn this whole new syntax just to know how to do it, and I like learning new syntax.  I'd instead

  val pairs = for { x <- 1 to 5; y <- 1 to 5 } yield (x,y)
  pairs.groupBy{ case (x,y) => x*y }.mapValues(_.toSet).toMap

(note that this is harder than it needs to be since there is no non-lazy map to values).  You need to make a much stronger case for why this isn't good enough.  Let me compare again to your suggested syntax:

  for {
    x <- 1 to 5
    y <- 1 to 5
    product <- group groupBy with (x * y)
  } yield (product, x zip y toSet)

This, on the surface of it, looks simpler.  But to understand what's going on, look at all the mental transformations one has to make:

  for {
    x <- 1 to 5
    y <- 1 to 5
    product <- group groupBy with (x * y)  // they're Ints

  } yield (product, x zip y toSet)
  for {
    x <- 1 to 5
    y <- 1 to 5
    product <- group groupBy with (x * y)
  } yield (product, x zip y toSet)  // they're Seq[Int]


  for {
    x <- 1 to 5  // Looks like we're going to get a Seq?

    y <- 1 to 5
    product <- group groupBy with (x * y)
  } yield (product, x zip y toSet)
  for {
    x <- 1 to 5
    y <- 1 to 5
    product <- group groupBy with (x * y)   // Maybe not

  } yield (product, x zip y toSet)
  for {
    x <- 1 to 5
    y <- 1 to 5
    product <- group groupBy with (x * y)   // Okay, a map

  } yield (product, x zip y toSet)
  for {
    x <- 1 to 5
    y <- 1 to 5
    product <- group groupBy with (x * y)   // This isn't innermost, is it??

  } yield (product, x zip y toSet)
  for {
    x <- 1 to 5
    y <- 1 to 5
    product <- group groupBy with (x * y)   // product is x*y?  How??

  } yield (product, x zip y toSet)
  for {
    x <- 1 to 5
    y <- 1 to 5
    product <- group groupBy with (x * y)   // No, not traits!

  } yield (product, x zip y toSet)

All that redefinition, nonlocality, and altered behavior just to avoid an intermediate result.  Let's try another example:


  for {
    str <- strs if str.nonEmpty
    firstLetter <- group groupBy with str.head
  } yield (firstLetter, str.maxBy(_.length))


Without comprehensive comprehensions, we'd be reduced to doing:

  strs.filter(_.nonEmpty).
       groupBy(_.head).
       mapValues(_.maxBy(_.length)).toMap

Er, wait.  What's so terrible about that?

So you really, in my opinion, need to sell this more.  Show where it's awesomely better than everything else.  Show transformations that are really commonly needed, and show how much better comprehensive comprehensions model the transformation.  Especially with group.

(As far as then goes, I'm more okay with an intermediate yield (which you called then):

for {
  i <- 1 to 5
  j <- 1 to 5
  yield.take(12)
  k <- 1 to 5
} yield (i,j,k)

which just collapses the longer form

val temp = for {
  i <- 1 to 5
  j <- 1 to 5
} yield (i,j)
for {
  (i,j) <- temp.take(12)
  k <- 1 to 5
} yield (i,j,k)

That temporary variable is, I agree, not doing any work, and omitting it doesn't cause any confusing switches in the nature of variables.

But I worry about using map in a context like that, so I'm still not sure it's wise.)

  --Rex

Rodrigo Cano

unread,
Sep 26, 2013, 4:38:30 PM9/26/13
to scala...@googlegroups.com
Oh great, Rex's post summarizes really well how I feel about the proposition.
Btw, sorry if the tone of my earlier post sounded rude, that wasn't intended, also please note that I do appreciate your experimentation, because I believe there is always something good to learn, regardless that I disagree (so far) with this idea, many discarded ideas are the source for very good ones :)


Bardur Arantsson

unread,
Sep 26, 2013, 4:44:45 PM9/26/13
to scala...@googlegroups.com
On 2013-09-26 22:04, Rex Kerr wrote:
> The reason that regular for comprehensions are useful instead of annoying
> is that they capture some of the most common use-cases for manipulations on
> collections. They're useful enough that the annoyance of e.g. losing
> `yield` and having to learn two or three ways instead of one to do the same
> thing (i.e. collections, for without yield, and for with yield) is overcome
> by the utility.
>
[--snip--]

... lots of interesting stuff, and I still haven't heard any reasonable
response to my post as to why this kind of thing hasn't been taken up by
the Haskell community yet to any non-trivial degree. (Chrostopher Vogt's
assertions to the contrary, notwithstanding.)


Chris Hodapp

unread,
Sep 26, 2013, 5:13:38 PM9/26/13
to scala...@googlegroups.com

Given that I've never written any significant projects in Haskell, I can't really tell you why the Haskell community isn't more vocal about TransformListComp. I think one possibility is that a few weird choices were made in implementing it. For example, in order to perform a grouping by an arbitrary expression with an arbitrary grouping function, you are required to add no fewer than four keywords to your line, and it still does not capture the grouping key (meaning that you would also need to add an extra line which assigns it to a name before performing the grouping if you wanted that). Another possibility is that TransformListComp has a severely limited use-case without MonadComprehensions (for example, you can't use it for database queries without this), an extension which has been in and out of GHC a number of times. My last idea is that no one has written libraries that take advantage of the power that TransformListComp gives you (this is somewhat related to the previous point, since these libraries would likely need to take advantage of MonadComprehensions). These are just speculation, though; Since I'm not an active member of the Haskell community, I really don't know why TransformListComp isn't more widely-used.

Chris Hodapp

unread,
Sep 26, 2013, 5:32:15 PM9/26/13
to scala...@googlegroups.com
Rex,

Thanks for your comments. I think I will try and put together an 'examples' page with more useful examples. It will take some time to do this, but I think it's rather critical that I do.

The problem that I ran into in making examples (and the reason that I went with that lackluster 'inits' example) was that the standard library currently did not did provide methods for each of the four 'ways of doing things' that you refer to above. It contained no examples of correctly-signatured key-generating shape-preserving transformations, which is why I went with an example that used the imaginary 'numbered' method. However, it did contain a couple examples of correctly-signatured non-key-generating grouping transformations, so I forced one of them in as an example, even though it's not really that useful. I'll try and find a solution to this issue (likely writing extensions on collections to give them some useful-to-group-with correctly-signatured methods).

Haoyi Li

unread,
Sep 26, 2013, 5:39:47 PM9/26/13
to scala...@googlegroups.com
Rex basically said everything I was thinking about, but more eloquently. I'm very happy with then `then` as a plain statement, with no outputs. The de-sugaring is straightforward since it doesn't change the shape of the data structure or rebind the variables.

I don't really see the use if the `then` with binding:

val numberedLines = for { line <- lines num <- then numbered } yield (num, line)

I'm probably missing something because `zipWithIndex` basically does this. Perhaps you could provide some other examples which show the use of this?

I still don't truly grok the `group` examples =( after staring at them for quite a while.

Chris Hodapp

unread,
Sep 26, 2013, 5:53:12 PM9/26/13
to scala...@googlegroups.com
Maybe a longer example would make a bit more sense for the binding then:

val numberedDescribedWords = for {
adjective <- adjectives
noun <- nouns
num <- then numbered
} yield (num, adjective + noun)

Of course, you could write:

val describedWords = for {
adjective <- adjectives
noun <- nouns
} yield adjective + noun
val numberedDescribedWords =
describedWords.zipWithIndex.map(_.swap)


On the subject of zipWIthIndex: it would almost work as a shape-preserving transformer. The only problem is that it puts the index in the _2 of the Tuple2, rather than the _1. Basically, numbered is just zipWithIndex.map(_.swap).

Haoyi Li

unread,
Sep 27, 2013, 4:17:23 PM9/27/13
to scala...@googlegroups.com
Yes that makes much more sense; I can see that value in something like that now, although it probably needs a few more practical examples to demonstrate that it's worth the additional syntax.

Ittay Dror

unread,
Sep 29, 2013, 2:46:32 AM9/29/13
to scala...@googlegroups.com
FWIW, I think this idea adds more confusion to the already confusing for comprehensions. Newbies usually find the idea of a 'for' that is not a loop to be confusing. Now with more constructs and variables that change in the middle this becomes a mess. I can't imagine writing such a construct, for the fact that I don't want to be a compiler. I don't want to write something that requires a lot of desugaring to be going in my head to figure out what to do. And again, the most disturbing part is the fact that variable names get rebound to new objects of different types in the middle.

If you're adding keywords, and changing meaning, then why not add additional keywords after  the 'for'? Something like:


for {
  x <- 5 to 1 by -1
  y <- 'a' to 'z'
} group ((x,y) => ....)
  then sortBy(x => x)
 
This allows to create a new lexical scope for every part that changes the meaning of variables.

Ittay


On Tuesday, September 17, 2013 7:27:56 AM UTC+3, Chris Hodapp wrote:
Hi, I'd like to propose the following as a result of my GSOC 2013 project:

SIP: Comprehensive Comprehensions

Motivation

Scala's for comprehensions can be an extremely useful tool for concisely representing a specific type of query-like monadic computation without creating a difficult-to-read nesting structure. However, in their current form, for- comprehensions have severe limitations when used to express queries, especially when compared with purpose-designed query languages, such as SQL, and other language-integrated query systems, such as Microsoft's LINQ. Specifically, for-comprehensions lack integrated support for two kinds of iteration- transforming operations that both SQL and LINQ support, making it rather painful (and verbose) to write some types of queries in Scala. This proposal seeks to properly define these two pain points and propose solutions for each.

Shape-Preserving Transformations

SQL defines the LIMITORDER BYWHERE, and HAVING clauses. For our purposes, we can say that LINQ supports all of these operations, while Scala's for-comprehensions support WHERE and HAVING through guards. The Scala standard library also provides methods that implement the functionality of LIMIT (in take) and ORDER BY (in sortBy), but these methods do not integrate very well with for-comprehensions. For example, in order to iterate over xs and xs, two Seq[Int]'s and receive (x, y) pairs in increasing-sum order, one would have to write something like:

val result = (
    for {
        x <- xs
        y <- ys
    } yield (x, y)
 ).sortBy { case (x, y) => x + y }

In addition to being ugly, this method of applying the sort requires that the identifier names from the comprehension be manually rebound in order to be used in the sort. It also breaks the for-comprehension, since these very same identifiers would need to be bound a third time to continue it.

SQL's LIMIT is similar to the take operation that is implemented on Scala's collections. However, using take for this purpose has similar drawbacks to using sortBy, as above:

val result = (
    for {
        x <- xs
        y <- ys
    } yield (x, y)
 ).take(100)

Although we don't have to rebind x and y here, we still have to wrap the comprehension in parentheses and cannot continue it after the take, unless we rebind the identifiers we've been using. For demonstration, such a rebinding would look like:

val result = for {
  (x, y) <- (
      for {
          x <- xs
          y <- ys
      } yield (x, y)
   ).take(100)
   z <- xz
} yield (x, y, z)
val result = for {
   x <- xs
   y <- ys
   then take 100
   z <- xz
} yield (x, y, z)

What does this comprehension do? In fact, it is functionally identical to comprehension directly above the 'Aggregating Transformations' header. Conceptually, we are drawingx from xs, drawing y from yscroping to 100 results, and drawing z from zsin that order. Of course, any shape-preserving transformation operation can be used in a then-transformer.

There is a slight wrinkle: sortBy takes a function from the collection element type to a sorting key; We can't just pass it a standard expression, based on the bound-above identifiers, and expect it to work. Instead, we draft the with keyword to request that the compiler generate a function for us:

val result = for {
      x <- xs
      y <- ys
      then sortBy with (x + y)
} yield (x, y)

As you might guess, this is equivalent to the first comprehension above. The with (x + y) causes the compiler to generate a function that maps from a comprehension element to the sum of x and y for that element.

Group-Transformers

Non-key-generating aggregating transformers can be integrated with for-comprehensions with a similar extension, group-transformers:

val v = for {
  x <- Seq(1, 2)
  y <- Seq('a', 'b')
  group inits
} yield x zip y

v.toList == List(
  List((1,'a'), (1,'b'), (2,'a'), (2,'b')),
  List((1,'a'), (1,'b'), (2,'a')),
  List((1,'a'), (1,'b')),
  List((1,'a')),
  List()
)

So what is going on here? In short, when we hit group inits, type of each already-bound identifier changes, becoming an aggregate (collection of collections). In this case, they become the prefixes of the sequences that x and y draw from, in order of decreasing length. Note that conceptually, it is the comprehension being grouped, not the identifiers, so the x-prefix and the y-prefix are the same length on each iteration.

A much more commonly-used aggregating transformer is groupBy. However, because groupBy is a key-generating aggregate transformation, we need a way to the compiler that the key exists and needs to be captured:

val v = for {
  x <- 1 to 5
  y <- 1 to 5
  product <- group groupBy with (x * y)
} yield (product, x zip y toSet)
v == Map(
  1 -> Set((1,1)),
  2 -> Set((1,2), (2,1)),
  3 -> Set((1,3), (3,1)),
  4 -> Set((1,4), (2,2), (4,1)),
  5 -> Set((1,5), (5,1)),
  6 -> Set((2,3), (3,2)),
  8 -> Set((2,4), (4,2)),
  9 -> Set((3,3)),
  10 -> Set((2,5), (5,2)),
  12 -> Set((3,4), (4,3)),
  15 -> Set((3,5), (5,3)),
  16 -> Set((4,4)),
  20 -> Set((4,5), (5,4)),
  25 -> Set((5,5))
)

This will create a map from each potential product to a Set of x-y pairs that create that product.

Here is one more example, which shows how group-transformers can be useful, even when there is only one normal enumerator:

val v = for {
  str <- strs if str.nonEmpty
  firstLetter <- group groupBy with str.head
} yield (firstLetter, str.maxBy(_.length))

This gives the longest string in a sequence starting with each letter (actually, a Map from the starting letter to the word).

A Brief Return to Then-Transformers

Presently, the Scala standard library does not define any key-generating shape-preserving transformations. However, it is entirely possible that it or user-made libraries will in the future. Therefore, in the interest of symmetry between then-transformers and group-transformers, I propose that then-transformers also support the value-drawing syntax:

val numberedLines = for {
  line <- lines
  num <- then numbered
} yield (num, line)

Eugene Burmako

unread,
Jan 4, 2014, 3:30:54 PM1/4/14
to scala...@googlegroups.com
I think you guys might be interested in taking a look at [<CustomOperation>] operations in F# computation expressions: http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/spec.html#_Toc335818835. If you aren't interested in specese, feel free to scroll down - there's quite a number of examples of how this stuff looks like and what it desugars into.
Reply all
Reply to author
Forward
0 new messages