SIP: Comprehensive Comprehensions


Chris Hodapp

Sep 17, 2013, 12:27:56 AM
to scala...@googlegroups.com


Hi, I'd like to propose the following as a result of my GSOC 2013 project:

SIP: Comprehensive Comprehensions

Motivation

Scala's for-comprehensions can be an extremely useful tool for concisely representing a specific type of query-like monadic computation without creating a difficult-to-read nesting structure. However, in their current form, for-comprehensions have severe limitations when used to express queries, especially when compared with purpose-designed query languages, such as SQL, and other language-integrated query systems, such as Microsoft's LINQ. Specifically, for-comprehensions lack integrated support for two kinds of iteration-transforming operations that both SQL and LINQ support, making it rather painful (and verbose) to write some types of queries in Scala. This proposal seeks to properly define these two pain points and propose solutions for each.

Shape-Preserving Transformations

SQL defines the LIMIT, ORDER BY, WHERE, and HAVING clauses. For our purposes, we can say that LINQ supports all of these operations, while Scala's for-comprehensions support WHERE and HAVING through guards. The Scala standard library also provides methods that implement the functionality of LIMIT (in take) and ORDER BY (in sortBy), but these methods do not integrate very well with for-comprehensions. For example, in order to iterate over xs and ys, two Seq[Int]'s, and receive (x, y) pairs in increasing-sum order, one would have to write something like:

val result = (
    for {
        x <- xs
        y <- ys
    } yield (x, y)
 ).sortBy { case (x, y) => x + y }

In addition to being ugly, this method of applying the sort requires that the identifier names from the comprehension be manually rebound in order to be used in the sort. It also breaks the for-comprehension, since these very same identifiers would need to be bound a third time to continue it.

SQL's LIMIT is similar to the take operation that is implemented on Scala's collections. However, using take for this purpose has similar drawbacks to using sortBy, as above:

val result = (
    for {
        x <- xs
        y <- ys
    } yield (x, y)
 ).take(100)

Although we don't have to rebind x and y here, we still have to wrap the comprehension in parentheses and cannot continue it after the take, unless we rebind the identifiers we've been using. For demonstration, such a rebinding would look like:

val result = for {
  (x, y) <- (
      for {
          x <- xs
          y <- ys
      } yield (x, y)
   ).take(100)
   z <- zs
} yield (x, y, z)

Aggregating Transformations

In addition to the clauses described above, SQL defines a GROUP BY clause and a set of aggregate functions. This type of clause is also supported in LINQ. A GROUP BY clause accepts a tuple of column names (possibly with the DESC keyword to reverse the sort). All of the columns not listed in the grouping tuple are transformed into 'aggregate' columns in SELECT and WHERE clauses. An aggregate function, such as SUM, AVG, MAX, or COUNT, gets a single value from an aggregate column, and allows the aggregate column to be used as a value.

In Scala, aggregation is especially cumbersome and does not integrate with for-comprehensions very well at all. The Scala standard library features a number of aggregating methods on collections (in contrast to SQL, which only provides GROUP BY), as well as a number of aggregate methods, which can be used with the results.

Scala's aggregating methods can be divided into two categories: there are key-generating aggregating methods and non-key-generating aggregating methods. The most commonly-used key-generating aggregating method is groupBy. This method takes a function, which maps from a collection element to a "group key". It returns a Map, viewable as a sequence of Tuple2's, where _1 is a key and _2 is a collection of collection elements in the group (elements are grouped on key identity). In code form:

val g = List(1,2,3,4).groupBy(_ % 2)
g == Map(1 -> List(1, 3), 0 -> List(2, 4))

Non-key-generating aggregating methods seem to be a bit less popular, but several are defined in the Scala standard library. These methods turn a collection (of elements of a given type) into a collection of collections (of elements of that same type). An example is inits, which turns a sequence into an iterator of sequences, with the first being the full sequence and each subsequent sequence being shorter in length by 1. In code:

val i = List(1,2,3).inits
i.next == List(1,2,3)
i.next == List(1,2)
i.next == List(1)
i.next == List()

Aggregate methods in Scala's standard library reduce a collection to a single value and include "max", "min", and "sum".
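For a quick illustration, here is how these aggregate methods behave on a small sample list (the data is assumed):

```scala
// Aggregate methods: each reduces a collection to a single value.
val nums = List(3, 1, 4, 1, 5)

val largest  = nums.max   // 5
val smallest = nums.min   // 1
val total    = nums.sum   // 14
```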

Note: The previous descriptions refer to specific methods, but what makes a method a transformer, an aggregate function, etc. is its return type. Any method that is defined on a collection and returns the correct type is a transformation under this definition.

Proposal

For-Transformers

I propose that extensions be made to the Scala syntax and alterations made to its parser to allow for integration of transforming operations and for-comprehensions. Specifically, I propose that two (symmetric) syntactic constructs be added to Scala, with one improving the syntax of each of the transformation types described above.

Then-Transformers

Let's first improve the syntax of shape-preserving transformations. To do this, we introduce the concept of then-transformers. A then-transformer is a line within the enumerator list of a for-loop or for-comprehension that starts with the keyword then and runs an operation on the current state of the comprehension. To understand what this means, let's look at one:

val result = for {
   x <- xs
   y <- ys
   then take 100
   z <- zs
} yield (x, y, z)

What does this comprehension do? In fact, it is functionally identical to the comprehension directly above the 'Aggregating Transformations' header. Conceptually, we are drawing x from xs, drawing y from ys, cropping to 100 results, and drawing z from zs, in that order. Of course, any shape-preserving transformation operation can be used in a then-transformer.

There is a slight wrinkle: sortBy takes a function from the collection element type to a sorting key; we can't just pass it a standard expression, based on the bound-above identifiers, and expect it to work. Instead, we draft the with keyword to request that the compiler generate a function for us:

val result = for {
      x <- xs
      y <- ys
      then sortBy with (x + y)
} yield (x, y)

As you might guess, this is equivalent to the first comprehension above. The with (x + y) causes the compiler to generate a function that maps from a comprehension element to the sum of x and y for that element.
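As a sketch, the desugaring of this then-transformer is exactly the wrapped form from the Motivation section; with some assumed sample data it runs in today's Scala:

```scala
// Assumed sample data for illustration.
val xs = Seq(1, 10)
val ys = Seq(2, 20)

// What `then sortBy with (x + y)` desugars to: the compiler-generated
// function { case (x, y) => x + y } rebinds the comprehension's identifiers.
val result = (
  for {
    x <- xs
    y <- ys
  } yield (x, y)
).sortBy { case (x, y) => x + y }
// result: (1,2), (10,2), (1,20), (10,20) in increasing-sum order
```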

Group-Transformers

Non-key-generating aggregating transformers can be integrated with for-comprehensions with a similar extension, group-transformers:

val v = for {
  x <- Seq(1, 2)
  y <- Seq('a', 'b')
  group inits
} yield x zip y

v.toList == List(
  List((1,'a'), (1,'b'), (2,'a'), (2,'b')),
  List((1,'a'), (1,'b'), (2,'a')),
  List((1,'a'), (1,'b')),
  List((1,'a')),
  List()
)

So what is going on here? In short, when we hit group inits, the type of each already-bound identifier changes, becoming an aggregate (collection of collections). In this case, they become the prefixes of the sequences that x and y draw from, in order of decreasing length. Note that conceptually, it is the comprehension being grouped, not the identifiers, so the x-prefix and the y-prefix are the same length on each iteration.
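To make the semantics concrete, here is a vanilla-Scala rendering of the same comprehension; the 'turn inside-out' map is essentially what the proposed desugaring generates (a sketch, not the exact compiler output):

```scala
val v = (
  for {
    x <- Seq(1, 2)
    y <- Seq('a', 'b')
  } yield (x, y)
).inits.map { p =>
  // split the single collection of pairs into one collection per identifier
  val xs = p.map(_._1)
  val ys = p.map(_._2)
  // the original yield: x zip y, with x and y now bound to aggregates
  xs zip ys
}
```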

A much more commonly-used aggregating transformer is groupBy. However, because groupBy is a key-generating aggregate transformation, we need a way to tell the compiler that the key exists and needs to be captured:

val v = for {
  x <- 1 to 5
  y <- 1 to 5
  product <- group groupBy with (x * y)
} yield (product, x zip y toSet)
v == Map(
  1 -> Set((1,1)),
  2 -> Set((1,2), (2,1)),
  3 -> Set((1,3), (3,1)),
  4 -> Set((1,4), (2,2), (4,1)),
  5 -> Set((1,5), (5,1)),
  6 -> Set((2,3), (3,2)),
  8 -> Set((2,4), (4,2)),
  9 -> Set((3,3)),
  10 -> Set((2,5), (5,2)),
  12 -> Set((3,4), (4,3)),
  15 -> Set((3,5), (5,3)),
  16 -> Set((4,4)),
  20 -> Set((4,5), (5,4)),
  25 -> Set((5,5))
)

This will create a map from each potential product to a Set of x-y pairs that create that product.
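In current Scala, the equivalent of this comprehension can be written by hand (a sketch of the desugaring):

```scala
val grouped = (
  for {
    x <- 1 to 5
    y <- 1 to 5
  } yield (x, y)
).groupBy { case (x, y) => x * y }  // the key-generating aggregation

// re-package each group: the yield (product, x zip y toSet) amounts to
// turning each group of (x, y) pairs into a Set
val v = grouped.map { case (product, pairs) => (product, pairs.toSet) }
```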

Here is one more example, which shows how group-transformers can be useful, even when there is only one normal enumerator:

val v = for {
  str <- strs if str.nonEmpty
  firstLetter <- group groupBy with str.head
} yield (firstLetter, str.maxBy(_.length))

This gives the longest string in a sequence starting with each letter (actually, a Map from the starting letter to the word).
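The same query in today's Scala, with some assumed sample strings, shows the boilerplate the group-transformer removes:

```scala
// Assumed sample data.
val strs = Seq("apple", "avocado", "", "banana", "blueberry")

val v = strs
  .filter(_.nonEmpty)  // the guard: if str.nonEmpty
  .groupBy(_.head)     // group groupBy with str.head
  .map { case (firstLetter, group) => (firstLetter, group.maxBy(_.length)) }
// v: Map('a' -> "avocado", 'b' -> "blueberry")
```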

A Brief Return to Then-Transformers

Presently, the Scala standard library does not define any key-generating shape-preserving transformations. However, it is entirely possible that it or user-made libraries will in the future. Therefore, in the interest of symmetry between then-transformers and group-transformers, I propose that then-transformers also support the value-drawing syntax:

val numberedLines = for {
  line <- lines
  num <- then numbered
} yield (num, line)

Note that numbered is a fictional method which functions much like zipWithIndex, but returns Pairs with the numbering in _1 and the element-values in _2.
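Since numbered is fictional, here is one way its behaviour could be sketched in current Scala (the standalone numbered function and the sample lines are assumptions, not part of the proposal):

```scala
// Hypothetical 'numbered': like zipWithIndex, but with the numbering in _1
// and the element in _2.
def numbered[A](xs: Seq[A]): Seq[(Int, A)] =
  xs.zipWithIndex.map { case (elem, idx) => (idx, elem) }

val lines = Seq("first", "second", "third") // assumed sample data

// What the value-drawing then-transformer would mean, written by hand:
val numberedLines = for {
  (num, line) <- numbered(lines)
} yield (num, line)
```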

Formalization

New Keywords

My proposal requires that group be made a keyword and that then be made a proper keyword (as opposed to the deprecated-but-still-usable identifier that it currently is).

Grammar Changes

My proposal requires that the Scala grammar be amended such that the following EBNF rules are part of it:

Expr1 ::= ‘for’ (‘(’ Enumerators ‘)’ | ‘{’ Enumerators ‘}’)
{nl} [‘yield’] Expr
Enumerators ::= Generator {semi Enumerator}
Enumerator ::= Generator
             | Guard
             | [‘val’] Pattern1 ‘=’ Expr
             | ForTransformer
Generator ::= Pattern1 ‘<-’ Expr [Guard]
Guard ::= ‘if’ PostfixExpr
ForTransformer ::= [Pattern1 ‘<-’] TransformerOp
TransformerOp ::= InfixTransformerOp [id [nl]]
InfixTransformerOp ::= SimpleTransformerOp
                     | TransformerOp id [nl] InfixExpr
SimpleTransformerOp ::= `then'
                      | `group'
                      | SimpleTransformerOp `.' id
                      | SimpleTransformerOp TypeArgs
                      | SimpleTransformerOp ArgumentExprs
Expr1 ::= ‘if’ ‘(’ Expr ‘)’ {nl} Expr [[semi] else Expr]
        | ‘while’ ‘(’ Expr ‘)’ {nl} Expr
        | ‘try’ ‘{’ Block ‘}’ [‘catch’ ‘{’ CaseClauses ‘}’] [‘finally’ Expr]
        | ‘do’ Expr [semi] ‘while’ ‘(’ Expr ’)’
        | ‘for’ (‘(’ Enumerators ‘)’ | ‘{’ Enumerators ‘}’) {nl} [‘yield’] Expr
        | ‘throw’ Expr
        | ‘return’ [Expr]
        | [SimpleExpr ‘.’] id ‘=’ Expr
        | SimpleExpr1 ArgumentExprs ‘=’ Expr
        | [`with'] PostfixExpr
        | [`with'] PostfixExpr Ascription
        | [`with'] PostfixExpr ‘match’ ‘{’ CaseClauses ‘}’
PostfixExpr ::= InfixWithExpr [id [nl]]
InfixWithExpr ::= InfixExpr
                | InfixExpr id [nl] `with' PrefixExpr

Note: Rules where the LHS is the same as that of an existing rule replace the existing rule.

Desugaring

A for loop or comprehension of the form:

for {
  enums1
  then.op
  enums2
} body

would be desugared as follows:

for {
  BoundNames(enums1) <-
    SubThens(SubstituteWiths(op), for (enums1) yield BoundNames)
  enums2
} body

BoundNames is a tuple (or tuple-pattern) of all the names bound in enums1

SubThens replaces any instances of then in its first argument with its second argument.

SubstituteWiths replaces a with-expression of the form:

with x

with a function of the form:

{ case BoundNames => x }
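For instance, if enums1 binds x and y (both Ints here, as an assumption), with (x + y) would be replaced by the function literal below, which can then be passed to a method such as sortBy:

```scala
// The generated function: BoundNames is the tuple-pattern (x, y).
val generated: ((Int, Int)) => Int = { case (x, y) => x + y }

// Passing it where `then sortBy with (x + y)` would splice it in:
val sorted = Seq((1, 9), (4, 2), (3, 3)).sortBy(generated)
// sums are 10, 6, 6, so the stable sort yields (4,2), (3,3), (1,9)
```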

A for loop or comprehension of the form:

for {
  enums1
  pattern <- then.op
  enums2
} body

would be desugared as follows:

for {
  (pattern, FilterRepeated(BoundNames)) <-
    SubThens(
      SubstituteWiths(op),
      for (enums1) yield BoundNames(enums1)
    )
  enums2
} body

FilterRepeated replaces any identifier in the passed-in region that also appears in 'pattern' with _.

When the group keyword is used, a normalization step is required. This takes the form of an extraction function being mapped over the result of the transformer's op.

A for loop or comprehension of the form:

for {
  enums1
  group.op
  enums2
} body

would be desugared as follows:

for {
  BoundNames <-
    SubGroups(
      SubstituteWiths(op),
      for (enums1) yield BoundNames
    ).map { x =>

      val BoundNames(1) = x.map { case BoundNames =>
        BoundNames(1)
      }

      ...

      val BoundNames(n) = x.map { case BoundNames =>
        BoundNames(n)
      }

      BoundNames
    }
  enums2
} body

SubGroups replaces any instances of group in its first argument with its second argument.

BoundNames(i) is the ith name bound in enums1.

The variable x is a fresh variable.

A for loop or comprehension of the form

for {
  enums1
  pattern <- group.op
  enums2
} body

would be desugared as follows:

for {
  (pattern, FilterRepeated(BoundNames)) <-
    SubGroups(
      SubstituteWiths(op),
      for (enums1) yield BoundNames(enums1)
    ).map { case (x, y) =>

      val BoundNames(1) = y.map { case BoundNames =>
        BoundNames(1)
      }

      ...

      val BoundNames(n) = y.map { case BoundNames =>
        BoundNames(n)
      }

      (x, BoundNames)
    }
  enums2
} body

FAQ

Q: How do comprehensive comprehensions compare with LINQ?

A: Although both are DSLs for writing typesafe queries in a host language, LINQ works by hard-coding support for specific collection operations. In contrast, comprehensive comprehensions support any collection operation that meets certain type signature requirements, making them much more versatile.

Q: What about database queries?

A: Comprehensive comprehensions will integrate beautifully with the SLICK query library, once SLICK adds support for nested queries. Until then, group-transformers won't work properly with SLICK, but then-transformers will work fine.

Q: How complicated is the implementation?

A: The complete patch (including tests) is shorter than this specification.

Q: When can I test this feature out?

A: Now. There's a build that is intended to fully meet this specification available on scala-comprehensions.org. The build currently available is binary-compatible with Scala 2.10.2. If there is sufficient interest, I can probably set up a Maven repository to make testing smoother.

Chris Hodapp

Sep 17, 2013, 1:40:55 AM
to scala...@googlegroups.com
Here are a few more desugarings, in case anyone wants further examples:


Example 1:
for {
  x <- 5 to 1 by -1
  y <- 'a' to 'z'
  then sortBy with x
  z <- 10 to 20
} yield x * 2

Desugaring:
for {
  (x, y) <- {
    (
      for {
        x <- 5 to 1 by -1
        y <- 'a' to 'z'
      } yield (x, y)
    ).sortBy { case (x, y) => x }
  }
  z <- 10 to 20
} yield x * 2


Example 2:
for {
  x <- 5 to 1 by -1
  y <- 'a' to 'z'
  key <- then sortBy2 with x
  z <- 10 to 20
} yield x * 2

(Where sortBy2 returns a pair of the value returned by the sorting function
and the actual sorted element)

Desugaring:
for {
  (key, (x, y)) <- {
    (
      for {
        x <- 5 to 1 by -1
        y <- 'a' to 'z'
      } yield (x, y)
    ).sortBy2 { case (x, y) =>
      x
    }
  }
  z <- 10 to 20
} yield x * 2

Example 3:
for {
  x <- 5 to 1 by -1
  y <- 'a' to 'z'
  group inits
  z <- 10 to 20
} yield ((x :+ 0).max, y)

Desugaring:
for {
  (x, y) <- {
    (
      for {
        x <- 5 to 1 by -1
        y <- 'a' to 'z'
      } yield (x, y)
    ).inits.map { p =>
      val x = p.map { case (x, y) => x }
      val y = p.map { case (x, y) => y }
      (x, y)
    }
  }
  z <- 10 to 20
} yield ((x :+ 0).max, y)


Example 4:
for {
  x <- 5 to 1 by -1
  y <- 'a' to 'z'
  g <- group groupBy with x
  z <- 10 to 20
} yield (x.max, y)

Desugaring:

for {
  (g, (x, y)) <- {
    (
      for {
        x <- 5 to 1 by -1
        y <- 'a' to 'z'
      } yield (x, y)
    ).groupBy { case (x, y) =>
      x
    }.map { case (p1, p2) =>
      val x = p2.map { case (x, y) => x }
      val y = p2.map { case (x, y) => y }
      (p1, (x, y))
    }
  }
  z <- 10 to 20
} yield (x.max, y)

Chris Hodapp

Sep 17, 2013, 3:15:55 AM
to scala...@googlegroups.com
For those who want to comment line-wise, I put the proposal in Google Drive.

Justin du coeur

Sep 17, 2013, 8:51:12 AM
to scala...@googlegroups.com
A few comments from the non-expert peanut gallery:

The proposal's very intriguing.  I like the general shape of it, providing LINQish capability without feeling quite so arbitrary and SQL-centric.  That said, two details strike me funny.

The minor one is the use of "with" -- I just don't find that specific keyword intuitive here, and would suggest at least thinking about, eg, "on" instead.  (That said, I can understand the desire to conserve keywords, and I would probably get used to it, so this is a quibble.)

More significantly, the type transformation of the existing bound names in the group-transforms is a bit confusing.  I get intellectually what's going on, but the strs example gave me a really strong, "wait -- what?" reaction.  I'm concerned about that, especially for more naive users: the notion that x has type A on the line where it is defined, but type List[A] two lines down, is pretty unobvious.  Mind, I don't have a better suggestion offhand, and I don't think it's the end of the world (modern IDE support would at least help), but consider this a wish for a more obvious distinction between "x" and "the collection of x's".

Overall, though, looks both useful and powerful...


--
You received this message because you are subscribed to the Google Groups "scala-sips" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-sips+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Chris Hodapp

Sep 17, 2013, 3:00:39 PM
to scala...@googlegroups.com
On Tuesday, September 17, 2013 7:51:12 AM UTC-5, Justin du coeur wrote:
A few comments from the non-expert peanut gallery:

The proposal's very intriguing.  I like the general shape of it, providing LINQish capability without feeling quite so arbitrary and SQL-centric.  That said, two details strike me funny.

The minor one is the use of "with" -- I just don't find that specific keyword intuitive here, and would suggest at least thinking about, eg, "on" instead.  (That said, I can understand the desire to conserve keywords, and I would probably get used to it, so this is a quibble.)

I actually disagree here; I think that the 'with' keyword is almost perfect and conveys
actual semantic information: It makes clear that you are doing the operation with the
automatically generated function. In other words, it makes clear that an object (the
generated function literal) is being pasted into your code at the current location.
Of course, I'm open to being convinced that there's a better keyword! Maybe I think
in a different way from most programmers.
 
More significantly, the type transformation of the existing bound names in the group-transforms is a bit confusing.  I get intellectually what's going on, but the strs example gave me a really strong, "wait -- what?" reaction.  I'm concerned about that, especially for more naive users: the notion that x has type A on the line where it is defined, but type List[A] two lines down, is pretty unobvious.  Mind, I don't have a better suggestion offhand, and I don't think it's the end of the world (modern IDE support would at least help), but consider this a wish for a more obvious distinction between "x" and "the collection of x's".

I agree that this is not ideal. At one point in the development of this feature, I joked that
if this were Ruby on Rails, I would include a 'pluralizer' that could pluralize any string, so
that the identifiers could be pluralized below a 'group'. However, I explored a number of
other solutions and all of the ones I tried/thought about would actually require the user
to deal with an increased cognitive/conceptual burden over just remembering that a
'group' changes the type of the identifiers above it (not to mention complicating the spec
horribly). Of course, I'm open to suggestions from others; if they're things I've already
tried/thought about, I'll explain why I didn't go with them and if they are new, I'll consider
whether they're improvements over what I have now.

√iktor Ҡlang

Sep 17, 2013, 3:51:01 PM
to <scala-sips@googlegroups.com>
One quick "bikesheddy" thought is that I don't think that the keywords harmonize with the "<-" symbol.
--
Viktor Klang
Director of Engineering

Twitter: @viktorklang

Chris Hodapp

Sep 17, 2013, 4:38:18 PM
to scala...@googlegroups.com
On Tuesday, September 17, 2013 2:51:01 PM UTC-5, Viktor Klang wrote:
One quick "bikesheddy" thought is that I don't think that the keywords harmonize with the "<-" symbol.

Things that have been considered in the past:
  • Putting the 'group' or the 'then' at the beginning of the line (group g <- groupBy with x instead of g <- group groupBy with x). This was rejected because it made the parsing significantly more complex.
  • Using a symbol instead of 'with' (group g <- groupBy => x instead of g <- group groupBy with x). This was rejected because two arrows in the middle of a line that seem to pull the text in two directions seemed like a recipe for confusion.

Simon Ochsenreither

Sep 17, 2013, 4:43:01 PM
to scala...@googlegroups.com

One quick "bikesheddy" thought is that I don't think that the keywords harmonize with the "<-" symbol.

That's my POV, too.

Rodrigo Cano

Sep 17, 2013, 9:18:51 PM
to scala...@googlegroups.com
I see no gain in adding keywords for something that already scales well today. If you understand pattern matching and for-comprehensions, you can understand any of the examples given with nested fors, while I still don't understand many of the examples with the proposed syntax. Also, I believe these changes are an attempt to stretch the tool (for-comprehensions) to contemplate something that was not in its design (working with monads) for the sake of SQL.
On the other hand, I also hate adding keywords like "then" and "group", really, those are way too common identifiers, keep adding keywords like those (when, then, group, on, aggregate, filter), and soon Scala will suck for any DSLs.

I believe that with the macro tools that we have today, you could write almost SQL-like queries in Scala while working with collections and datasets or databases (maybe even using string interpolators) and have them checked and interacting with your ADTs, with no need to introduce yet MORE syntax to Scala.

I liked the idea of somehow extend for-comprehension to make it more general, but provided that there is a solid theory behind it (like monads) and that it doesn't use up more keywords.

So to sum up, my perspective is that this is a really bad change that we should not pursue, at least not like this.

Cheers.


On Tue, Sep 17, 2013 at 5:43 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

One quick "bikesheddy" thought is that I don't think that the keywords harmonize with the "<-" symbol.

That's my POV, too.


David Barri

Sep 18, 2013, 12:27:59 AM
to scala...@googlegroups.com
> I liked the idea of somehow extend for-comprehension to make it more general, but provided that there is a solid theory behind it (like monads) and that it doesn't use up more keywords.

I agree. Or maybe one more keyword would be fine if the comprehension were to understand a new type of abstraction. There should be a single abstraction that covers both here.

Can I ask, what is the point of these? Either I'm having a stupid moment or they seem to be the same as map(identity) = NOP.

    .map { p =>
       val x = p.map { case (x, y) => x }
       val y = p.map { case (x, y) => y }
       (x, y) }
and

    .map { case (p1, p2) =>
       val x = p2.map { case (x, y) => x }
       val y = p2.map { case (x, y) => y }
       (p1, (x, y)) }

Chris Hodapp

Sep 18, 2013, 12:40:51 AM
to scala...@googlegroups.com

On Tuesday, September 17, 2013 11:27:59 PM UTC-5, David Barri wrote:
> I liked the idea of somehow extend for-comprehension to make it more general, but provided that there is a solid theory behind it (like monads) and that it doesn't use up more keywords.

I agree. Or maybe one more keyword would be fine if the comprehension were to understand a new type of abstraction. There should be a single abstraction that covers both here.
 
The proposal only requires one new keyword ('group'); Both 'with' and
'then' are already keywords, though 'then' can currently be used as an
identifier, on a deprecated basis.

Can I ask, what is the point of these? Either I'm having a stupid moment or they seem to be the same as map(identity) = NOP.

    .map { p =>
       val x = p.map { case (x, y) => x }
       val y = p.map { case (x, y) => y }
       (x, y) }
and
    .map { case (p1, p2) =>
       val x = p2.map { case (x, y) => x }
       val y = p2.map { case (x, y) => y }
       (p1, (x, y)) }
 
Those convert collections of tuples into tuples of collections.
It is necessary to 'turn them inside-out' to make them integrate
properly with comprehensions; this is the code that actually allows
each identifier from before to be bound to a collection (it splits the
single-collection-containing-all-the-data into one-collection-for-each-identifier).

Paul Phillips

Sep 18, 2013, 12:56:35 AM
to scala...@googlegroups.com
On Tue, Sep 17, 2013 at 9:40 PM, Chris Hodapp <clho...@gmail.com> wrote:
Those convert collections of tuples into tuples of collections.

That's also what unzip does. Not for arbitrary arity unfortunately.

scala> 1 to 3 zip (1 to 3)
res0: scala.collection.immutable.IndexedSeq[(Int, Int)] = Vector((1,1), (2,2), (3,3))

scala> res0.unzip
res1: (scala.collection.immutable.IndexedSeq[Int], scala.collection.immutable.IndexedSeq[Int]) = (Vector(1, 2, 3),Vector(1, 2, 3))
 

Chris Hodapp

Sep 18, 2013, 1:12:15 AM
to scala...@googlegroups.com
You are correct; this is basically pasting in code for 'unzipN', which can work on
anything that has a 'map' defined. I would love to implement it such that it only
needed to make a single traversal (like unzip), but that would open such a can
of worms that I don't even want to think about it.

Rodrigo Cano

Sep 18, 2013, 1:12:42 AM
to scala...@googlegroups.com
The proposal only requires one new keyword ('group'); Both 'with' and
'then' are already keywords, though 'then' can currently be used as an
identifier, on a deprecated basis.
then is still not a keyword, and that SIP has been postponed since Oct 11, so I'm still (naïvely if you will) hoping that they go back on this...


Chris Hodapp

Sep 18, 2013, 1:38:56 AM
to scala...@googlegroups.com
Rodrigo,

I don't expect to change your mind, given the general tone
of your comment, but I would like to make a few counter-points:

  • I note that 'working with monads' is absolutely in the design of for-comprehensions; they explicitly exist as syntactic sugar for working with monads. The desugarings described in this proposal depend on the things you are performing comprehensions on exhibiting monad and functor behaviours to work
  • The proposal only adds one keyword ('group'). The 'then' keyword is already reserved, but is currently allowed for use as an identifier, on a deprecated basis (you've already read and responded to this argument)
  • You could try to cobble something together with macros to allow you to write some sort of 'checked SQL', either as an internal DSL, or as something that goes in an interpolated String, but this approach would have a number of drawbacks:
    • It would have to be more narrowly-targeted; it would be a 'SQL query library', a 'collections query library', etc.
    • It would basically be another language; if you wanted it to be full-featured, you would basically need to write a compiler from your DSL to normal Scala code.
    • It would likely be brittle, working only with the small set of functions it was specified to work with (basically, it would be LINQ, but able to be implemented as a library, due to Scala's flexibility)
  • This does have a solid foundation; it leverages the fact that the things you put into for-comprehensions are (supposed to be) monads/functors to make your life easier. The theory comes from a paper by Philip Wadler and Simon Peyton Jones.
Basically, the idea is to support a tiny, pluggable system into which any query
framework can fit itself, rather than having each framework define its own
divergent syntax.

Jason Zaugg

Sep 18, 2013, 2:12:39 AM
to scala...@googlegroups.com
The can of worms that you might have to think of is how this will behave with collections that can only be traversed once (e.g. Iterator[T]).

-jason

Rex Kerr

Sep 18, 2013, 2:26:28 AM
to scala...@googlegroups.com
Just two brief comments:

1) The "then" syntax looks reasonably clear and useful, though I don't see a pressing need.

2) The "group" syntax looks like an anti-pattern.  Shadowing variables is dangerous enough normally, but here it seems especially difficult to keep straight what is going on when trying to read the code.  I would need much more convincing use-cases to not be opposed to this one.

  --Rex



Chris Hodapp

Sep 18, 2013, 2:34:43 AM
to scala...@googlegroups.com
Despite my hyperbole, I have looked in the can and the worms look a lot like Builders.

To be less coy, the only ways I can see to handle things that can only be traversed once are:
  1. Ignore them; just tell people not to use group-transformers with things that can only be traversed once
  2. Desugar to 'unzipN'
  3. Desugar to builder-based code
The argument against 2 is that it places extra requirements on the object going into the comprehension (not even the standard library implements these). The argument against 3 is that it seems totally wrong to couple the desugaring with the builder-based system that collections currently use. The argument against 1 is that it... causes too many iterations. This is why I went with 1.

Chris Hodapp

Sep 18, 2013, 2:43:46 AM
to scala...@googlegroups.com

Although nothing in the desugaring for 'then' or 'group' actually causes any shadowing, I would say that it is at least fair to conceptualize what 'group' does as a shadowing. The thing is, though, it's not some non-linear interconnected knot of yarn; rather, it's a series of operations which can be read from top to bottom. You do the top thing, then the second-to-top thing, and so on. A group-transformer affects only those identifiers that were defined in the comprehension above it (the ones you just read) and no others. It affects them in a semi-limited way (you can figure out their types without even knowing the grouping operation), while allowing for flexibility by changing the grouping operation.

Chris Hodapp

Sep 18, 2013, 2:46:17 AM
to scala...@googlegroups.com
I would love to work with the community to try and find a better syntax.

It could only result in a positive change; either I get a better syntax, or I get to say 'I told you so' when none of the improvements anyone comes up with parses.

Chris Hodapp

Sep 18, 2013, 2:50:52 AM
to scala...@googlegroups.com
As far as better use-cases, can you give me an idea of what sort you would be looking for? Would you like to see cases where you save a few lines and a level of nesting, or something else? I've applied comprehensive comprehensions to a few SLICK examples back when I had a more complicated desugaring that had partial group-transformer/SLICK interop. Many of the examples won't compile now, but they would be the correct code for a SLICK that supported nested queries.

David Barri

unread,
Sep 18, 2013, 2:59:13 AM9/18/13
to scala...@googlegroups.com
Ah cool, thanks. I was having a stupid moment after all. Didn't realise p and p2 were collections. Cheers.
 
The proposal only requires one new keyword ('group'); Both 'with' and
'then' are already keywords, though 'then' can currently be used as an
identifier, on a deprecated basis.

Allow me to rephrase, I'm not concerned so much with new keyword count, but rather the number of new pieces of magic that the proposal will introduce to for-comprehensions. It's currently 2 (then-comprehension + group-comprehension) and I'm thinking that it would be worthwhile trying to find a single abstraction that can accommodate both cases.

Rex Kerr

unread,
Sep 18, 2013, 3:10:13 AM9/18/13
to scala...@googlegroups.com
Something that clearly explains what the goal is and why it's the kind of thing one often wants to do, followed by the best vanilla Scala solution and the comprehensive comprehension solution.

The examples you've given all leave me scratching my head as to why I'd want to do this so often that I need a new language construct.

  --Rex


On Tue, Sep 17, 2013 at 11:50 PM, Chris Hodapp <clho...@gmail.com> wrote:
As far as better use-cases go, can you give me an idea of what sort you would be looking for? Would you like to see cases where you save a few lines and a level of nesting, or something else? I've applied comprehensive comprehensions to a few SLICK examples back when I had a more complicated desugaring that had partial group-transformer/SLICK interop. Many of the examples won't compile now, but they would be the correct code for a SLICK that supported nested queries.


Christopher Vogt

unread,
Sep 18, 2013, 9:30:57 AM9/18/13
to scala...@googlegroups.com
Martin as well as the SLICK team were interested in exploring this for a while, which now fortunately happened. The explicit goal was to achieve a general solution for use with collections and DSLs. Without comprehensive comprehensions you lose name bindings in and after operations like sortBy and groupBy, which can make larger comprehensions hard to write and read. Rebinding names needs boilerplate and is error-prone. We are hoping for significantly improved writability and readability of database queries, where you do these things a lot. Previously bound names changing their type after a group is understandably unfamiliar, but becoming collections is the only sensible meaning these names can have after the grouping operation. This is useful for aggregations, suggested by SPJ's paper and actually what SQL does, too. Haskell has successfully explored comprehensive comprehensions and added them to monad and list comprehensions ( http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#generalised-list-comprehensions ).
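For the record, the rebinding boilerplate in question looks like this in today's vanilla Scala (a minimal sketch; the `xs`/`ys` data is made up):

```scala
// Without comprehensive comprehensions, sorting mid-query forces the
// names x and y to be bound three times: once in the comprehension,
// once in the sortBy lambda, and once more to continue the query.
val xs = Seq(3, 1, 2)
val ys = Seq(10, 20)

val sorted = (
  for {
    x <- xs
    y <- ys
  } yield (x, y)
).sortBy { case (x, y) => x + y }   // second binding of x and y

val result = for {
  (x, y) <- sorted                  // third binding of x and y
  if x > 1
} yield x * y
```

Each rebinding is a chance to swap the tuple components or shadow the wrong name, which is exactly the error-proneness referred to above.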

This would become an experimental feature of the Scala compiler, which has to be enabled using a flag so people can try it out and we can gather more real-world experience with this approach.

denys.s...@typesafe.com

unread,
Sep 18, 2013, 10:11:45 AM9/18/13
to scala...@googlegroups.com
I concur on the inappropriateness of the use of the `with` keyword. Not only does it look out of place, it also complicates the syntax by overloading simple keywords with multiple completely disconnected meanings (see what happened to "_" as a nice anti-example).

Moreover, I don't really understand why you even need a keyword there. The compiler should be smart enough to understand:


Example 1:
for {
  x <- 5 to 1 by -1
  y <- 'a' to 'z'
  then sortBy x
} yield x * 2

Without any additional syntax. The argument that you'll have to do it during typer doesn't really make a strong point.

Lastly, this special `with x` syntax will be impossible to support from quasiquotes, as it will only be enabled inside for-comprehensions but not in regular term-level Scala syntax, e.g.:

val f = q"with x"
val forwithx = q"for { ... then sortBy $f ... } ..."

Similar problems come from the general lack of any proper ASTs for the whole proposed desugaring. It just doesn't integrate at all in its current form. The current `for` desugaring is already a few steps too far and causes lots of pain to support in tools like IDEs, pretty printers, quasiquotes, linters, etc.

To make it work there is a need to re-think how, when, and in what way for-comprehensions are desugared, so that tools can use the syntactic information and not lose it right after parsing.

Stefan Zeiger

unread,
Sep 18, 2013, 1:41:58 PM9/18/13
to scala...@googlegroups.com
On 2013-09-17 21:27, David Barri wrote:
> I liked the idea of somehow extending for-comprehensions to make them more general, but provided that there is a solid theory behind it (like monads) and that it doesn't use up more keywords.

> I agree. Or maybe one more keyword would be fine if the comprehension were to understand a new type of abstraction. There should be a single abstraction that covers both here.

There is a single abstraction that covers both: The "group" transformations have the shape M[T] => M'[F[T]] with monads M and M' and a functor F. Make that the identity functor and you get the "then" transformation M[T] => M'[T]. But you cannot implement that transparently and you don't want to force all operations to be of the grouping type (for performance and API reasons).
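Concretely (my own naming, not from the SIP), the two shapes can be written down as plain Scala signatures, using groupBy and sortWith as stand-in instances:

```scala
// group-style transformer: M[T] => M'[F[T]]  (here M = M' = Seq-ish,
// and F shows up as the inner Seq produced by groupBy)
def groupShape[T, K](m: Seq[T])(key: T => K): Map[K, Seq[T]] =
  m.groupBy(key)

// then-style transformer: M[T] => M'[T]  (the F = identity case;
// here sortWith, which keeps the element type unchanged)
def thenShape[T](m: Seq[T])(lt: (T, T) => Boolean): Seq[T] =
  m.sortWith(lt)

val g = groupShape(Seq(1, 2, 3, 4))(_ % 2)
val t = thenShape(Seq(3, 1, 2))(_ < _)
```

The signatures make the cost argument visible: forcing every operation through the group shape would wrap results in `F` even when `F` is the identity.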

-sz

Bardur Arantsson

unread,
Sep 18, 2013, 1:57:03 PM9/18/13
to scala...@googlegroups.com
On 2013-09-18 15:30, Christopher Vogt wrote:
> This is useful for aggregations, suggested by SPJ's paper and actually what
> SQL does, too. Haskell has successfully explored comprehensive
> comprehensions and added them to monad and list comprehensions (
> http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#generalised-list-comprehensions
> ).

That's not Haskell. That's GHC, an *implementation* of Haskell with
various extensions, of which "TransformListComp" is one.

Despite having worked quite a lot with Haskell (with the GHC compiler)
over the past few years, I've never actually noticed that extension
being used anywhere. Are there any examples of usage in the wild?

That would be required to say "successfully explored", IMO.

Regards,


EECOLOR

unread,
Sep 22, 2013, 7:01:29 AM9/22/13
to scala...@googlegroups.com
I do not like `group` being a keyword. I use it quite extensively in my code. Especially in UI-related code it's a very handy name.

I am not a native English speaker, but to me "group" means a group, while you are using it for grouping.


Erik



Haoyi Li

unread,
Sep 26, 2013, 12:11:31 AM9/26/13
to scala...@googlegroups.com
Here's my contribution to the bikeshedding:

Generally, I'd be much happier using sigils/symbols rather than long, verbose keywords for this sort of thing. Is that something you've looked at? I mean, symbols are even more in short supply than keywords, but I think that the excessive keywords in "then sortBy with (x + y)" are incredibly verbose and greatly de-emphasize the important bits of this piece of code: you care about "sortBy (x + y)", and the rest is really just noise! With symbols the noise can be kept down to a minimum. I also think that long keywords don't look nice interspersed with the short 2-character "<-" and "if" tokens.

I find the following code exceedingly confusing

val v = for {
  x <- Seq(1, 2)
  y <- Seq('a', 'b')
  group inits
} yield x zip y

You say that it's the comprehension being grouped, but the `group` keyword comes before the comprehension is complete, making it seem pretty obvious that it's referring to the state of the comprehension before the things get zipped. Does that also mean that `zip` is evaluated before `inits`? If so, that would blow my mind.

Since we're talking about new syntax, I think a lot of this could be fixed by placing the `yield` *inside* the braces; we would then have the flexibility to e.g. put the `group` clause *after* the `yield`, which I think makes more sense:

val v = for {
  x <- Seq(1, 2)
  y <- Seq('a', 'b')
  yield x zip y
  group inits
}

Not sure how crazy-talk this is, but it flows much better in general. The other example could be:

val v = for {
  x <- 1 to 5
  y <- 1 to 5
  product <- group groupBy with (x * y)
  yield (product, x zip y toSet)
}

Which I also think looks right: the `groupBy` gets evaluated *before* the `yield`, and so it appears earlier in the comprehension. These two comprehensions, for example:
for {
  x <- xs /* 100 item list */
  y <- ys /* 100 item list */
  yield doSomething(x, y)
  take(100)
}

for {
  x <- xs
  y <- ys
  take(100)
  yield doSomething(x, y)
}

Would then call `doSomething` either 10000 or 100 times.
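Rendered in today's Scala (a sketch; `doSomething` and the 10 x 10 sizes are mine, scaled down from the 100-item lists above), the difference in evaluation counts is directly observable:

```scala
// Counts how often doSomething runs under the two orderings being
// discussed: map-then-take evaluates every pair; take-then-map
// evaluates only the pairs that survive the take.
var calls = 0
def doSomething(x: Int, y: Int): Int = { calls += 1; x * y }

val xs = 1 to 10
val ys = 1 to 10

// yield first, then take
val a = (for { x <- xs; y <- ys } yield doSomething(x, y)).take(10)
val callsYieldFirst = calls        // all 100 pairs were mapped

// take first, then map
calls = 0
val b = (for { x <- xs; y <- ys } yield (x, y))
  .take(10)
  .map { case (x, y) => doSomething(x, y) }
val callsTakeFirst = calls         // only the 10 taken pairs were mapped
```

Note this holds for strict Seqs; with a lazy view the counts would change, which is exactly the evaluation-order ambiguity being complained about.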

I'm not sure if these two comprehensions are representable with your syntax, but I suspect a lot of confusion arises from the fact that you are arbitrarily picking one based on some difficult-to-infer context:

- In your `take`, `groupBy` and `numbered` examples, it's clearly the second: the transformation comes before the yield
- in your `inits` example, it's the first: the yield (and zipping) happens first
- In your `sortBy` example, it's ambiguous: both would have the correct semantics, and I'm not sure what's happening here

I haven't fully digested the formal de-sugaring, so I take your word that it is correct and unambiguous. As a human, though, I'm having difficulty picking out which things get placed after the `yield` expression and which ones get placed before the yield, and what the rule is that sets these cases apart.

Seth Tisue

unread,
Sep 26, 2013, 9:47:44 AM9/26/13