Scala's for-comprehensions can be an extremely useful tool for concisely representing a specific type of query-like monadic computation without creating a difficult-to-read nesting structure. However, in their current form, for-comprehensions have severe limitations when used to express queries, especially when compared with purpose-designed query languages, such as SQL, and other language-integrated query systems, such as Microsoft's LINQ. Specifically, for-comprehensions lack integrated support for two kinds of iteration-transforming operations that both SQL and LINQ support, making it rather painful (and verbose) to write some types of queries in Scala. This proposal seeks to properly define these two pain points and propose solutions for each.
SQL defines the LIMIT, ORDER BY, WHERE, and HAVING clauses. For our purposes, we can say that LINQ supports all of these operations, while Scala's for-comprehensions support WHERE and HAVING through guards. The Scala standard library also provides methods that implement the functionality of LIMIT (in take) and ORDER BY (in sortBy), but these methods do not integrate very well with for-comprehensions. For example, in order to iterate over xs and ys, two Seq[Int]'s, and receive (x, y) pairs in increasing-sum order, one would have to write something like:
val result = (
  for {
    x <- xs
    y <- ys
  } yield (x, y)
).sortBy { case (x, y) => x + y }
In addition to being ugly, this method of applying the sort requires that the identifier names from the comprehension be manually rebound in order to be used in the sort. It also breaks the for-comprehension, since these very same identifiers would need to be bound a third time to continue it.
SQL's LIMIT is similar to the take operation that is implemented on Scala's collections. However, using take for this purpose has drawbacks similar to those of sortBy, as above:
val result = (
  for {
    x <- xs
    y <- ys
  } yield (x, y)
).take(100)
Although we don't have to rebind x and y here, we still have to wrap the comprehension in parentheses, and we cannot continue it after the take unless we rebind the identifiers we've been using. For demonstration, such a rebinding would look like:
val result = for {
  (x, y) <- (
    for {
      x <- xs
      y <- ys
    } yield (x, y)
  ).take(100)
  z <- zs
} yield (x, y, z)
In addition to the clauses described above, SQL defines a GROUP BY clause and a set of aggregate functions. This type of clause is also supported in LINQ. A GROUP BY clause accepts a tuple of column names (possibly with the DESC keyword to reverse the sort). All of the columns not listed in the grouping tuple are transformed into 'aggregate' columns in SELECT and WHERE clauses. An aggregate function, such as SUM, AVG, MAX, or COUNT, gets a single value from an aggregate column, allowing the aggregate column to be used as a value.
In Scala, aggregation is especially cumbersome and does not integrate with for-comprehensions very well at all. The Scala standard library features a number of aggregating methods on collections (in contrast to SQL, which only provides GROUP BY), as well as a number of aggregate methods, which can be used with the results.
Scala's aggregating methods can be divided into two categories: key-generating and non-key-generating. The most commonly used key-generating aggregating method is groupBy. This method takes a function that maps from a collection element to a "group key". It returns a Map, viewable as a sequence of Tuple2's, where _1 is a key and _2 is a collection of the collection elements in the group (elements are grouped on key identity). In code form:
val g = List(1,2,3,4).groupBy(_ % 2)
g == Map(1 -> List(1, 3), 0 -> List(2, 4))
Non-key-generating aggregating methods seem to be a bit less popular, but several are defined in the Scala standard library. These methods turn a collection (of elements of a given type) into a collection of collections (of elements of that same type). An example is inits, which turns a sequence into an iterator of sequences, with the first being the full sequence and each subsequent sequence shorter by one element. In code:
val i = List(1,2,3).inits
i.next == List(1,2,3)
i.next == List(1,2)
i.next == List(1)
i.next == List()
Aggregate methods in Scala's standard library reduce a collection to a single value and include "max", "min", and "sum".
Note: The previous descriptions refer to specific methods, but what makes a method a transformer, an aggregate function, etc. is its return type. Any method that is defined on a collection and returns the correct type is a transformation under this definition.
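To make this type-based classification concrete, here is a small sketch; the function values and their names are mine, not part of the proposal:

```scala
// Shape-preserving transformation: collection in, same-shaped collection out (C[A] => C[A]).
val shapePreserving: Seq[Int] => Seq[Int] = _.take(2)

// Aggregating transformation: collection in, collection of collections out (C[A] => C[C[A]]).
val aggregating: Seq[Int] => Iterator[Seq[Int]] = _.sliding(2)

// Aggregate method: collection in, single value out (C[A] => A).
val aggregate: Seq[Int] => Int = _.sum
```

Under this definition, any user-defined method with one of these shapes would qualify, not just the standard-library ones named above.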
I propose that extensions be made to the Scala syntax and alterations made to its parser to allow for integration of transforming operations and for-comprehensions. Specifically, I propose that two (symmetric) syntactic constructs be added to Scala, with one improving the syntax of each of the transformation types described above.
Let's first improve the syntax of shape-preserving transformations. To do this, we introduce the concept of then-transformers. A then-transformer is a line within the enumerator list of a for-loop or for-comprehension that starts with the keyword then and runs an operation on the current state of the comprehension. To understand what this means, let's look at one:
val result = for {
  x <- xs
  y <- ys
  then take 100
  z <- zs
} yield (x, y, z)
What does this comprehension do? In fact, it is functionally identical to the comprehension directly above the 'Aggregating Transformations' header. Conceptually, we are drawing x from xs, drawing y from ys, cropping to 100 results, and drawing z from zs, in that order. Of course, any shape-preserving transformation operation can be used in a then-transformer.
There is a slight wrinkle: sortBy takes a function from the collection element type to a sorting key; we can't just pass it a standard expression, based on the identifiers bound above, and expect it to work. Instead, we draft the with keyword to request that the compiler generate a function for us:
val result = for {
  x <- xs
  y <- ys
  then sortBy with (x + y)
} yield (x, y)
As you might guess, this is equivalent to the first comprehension above. The with (x + y) causes the compiler to generate a function that maps from a comprehension element to the sum of x and y for that element.
Non-key-generating aggregating transformers can be integrated with for-comprehensions through a similar extension, group-transformers:
val v = for {
  x <- Seq(1, 2)
  y <- Seq('a', 'b')
  group inits
} yield x zip y

v.toList == List(
  List((1,'a'), (1,'b'), (2,'a'), (2,'b')),
  List((1,'a'), (1,'b'), (2,'a')),
  List((1,'a'), (1,'b')),
  List((1,'a')),
  List()
)
So what is going on here? In short, when we hit group inits, the type of each already-bound identifier changes: each becomes an aggregate (a collection of the values it previously held). In this case, they become the prefixes of the sequences that x and y draw from, in order of decreasing length. Note that conceptually, it is the comprehension being grouped, not the identifiers, so the x-prefix and the y-prefix are the same length on each iteration.
A much more commonly used aggregating transformer is groupBy. However, because groupBy is a key-generating aggregating transformation, we need a way to tell the compiler that the key exists and needs to be captured:
val v = for {
  x <- 1 to 5
  y <- 1 to 5
  product <- group groupBy with (x * y)
} yield (product, x zip y toSet)

v == Map(
  1 -> Set((1,1)),
  2 -> Set((1,2), (2,1)),
  3 -> Set((1,3), (3,1)),
  4 -> Set((1,4), (2,2), (4,1)),
  5 -> Set((1,5), (5,1)),
  6 -> Set((2,3), (3,2)),
  8 -> Set((2,4), (4,2)),
  9 -> Set((3,3)),
  10 -> Set((2,5), (5,2)),
  12 -> Set((3,4), (4,3)),
  15 -> Set((3,5), (5,3)),
  16 -> Set((4,4)),
  20 -> Set((4,5), (5,4)),
  25 -> Set((5,5))
)
This will create a map from each potential product to a Set of x-y pairs that produce that product.
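A hand-desugared version of this groupBy comprehension in current Scala (the names pairs, grp, xs, and ys are mine):

```scala
val pairs = for {
  x <- 1 to 5
  y <- 1 to 5
} yield (x, y)

val v = pairs
  .groupBy { case (x, y) => x * y }   // the key function from `with (x * y)`
  .map { case (product, grp) =>
    val xs = grp.map(_._1)            // the aggregated x
    val ys = grp.map(_._2)            // the aggregated y
    (product, (xs zip ys).toSet)      // the original yield
  }
```

Note how the proposed syntax hides the column extraction and re-zipping that must be written out by hand today.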
Here is one more example, which shows how group-transformers can be useful, even when there is only one normal enumerator:
val v = for {
  str <- strs if str.nonEmpty
  firstLetter <- group groupBy with str.head
} yield (firstLetter, str.maxBy(_.length))
This gives, for each starting letter, the longest string in the sequence (actually, a Map from the starting letter to the word).
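A runnable sketch of the same computation in current Scala; strs is not defined in the proposal, so the sample data here is my own invention:

```scala
// Assumed sample data (not from the proposal):
val strs = Seq("apple", "avocado", "kiwi", "kumquat", "")

val longest = strs
  .filter(_.nonEmpty)                  // the guard `if str.nonEmpty`
  .groupBy(_.head)                     // the key function from `with str.head`
  .map { case (firstLetter, words) =>
    (firstLetter, words.maxBy(_.length))
  }
```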
Presently, the Scala standard library does not define any key-generating shape-preserving transformations. However, it is entirely possible that it or user-made libraries will define some in the future. Therefore, in the interest of symmetry between then-transformers and group-transformers, I propose that then-transformers also support the value-drawing syntax:
val numberedLines = for {
  line <- lines
  num <- then numbered
} yield (num, line)
Note that numbered is a fictional method which functions much like zipWithIndex, but returns pairs with the numbering in _1 and the element values in _2.
My proposal requires that group be made a keyword and that then be made a proper keyword (as opposed to the identifier, deprecated for use, that it currently is).
My proposal requires that the Scala grammar be amended such that the following EBNF rules are part of it:
Expr1 ::= ‘for’ (‘(’ Enumerators ‘)’ | ‘{’ Enumerators ‘}’)
          {nl} [‘yield’] Expr

Enumerators ::= Generator {semi Enumerator}

Enumerator ::= Generator
             | Guard
             | [‘val’] Pattern1 ‘=’ Expr
             | ForTransformer

Generator ::= Pattern1 ‘<-’ Expr [Guard]

Guard ::= ‘if’ PostfixExpr

ForTransformer ::= [Pattern1 ‘<-’] TransformerOp

TransformerOp ::= InfixTransformerOp [id [nl]]

InfixTransformerOp ::= SimpleTransformerOp
                     | TransformerOp id [nl] InfixExpr

SimpleTransformerOp ::= ‘then’
                      | ‘group’
                      | SimpleTransformerOp ‘.’ id
                      | SimpleTransformerOp TypeArgs
                      | SimpleTransformerOp ArgumentExprs

Expr1 ::= ‘if’ ‘(’ Expr ‘)’ {nl} Expr [[semi] ‘else’ Expr]
        | ‘while’ ‘(’ Expr ‘)’ {nl} Expr
        | ‘try’ ‘{’ Block ‘}’ [‘catch’ ‘{’ CaseClauses ‘}’] [‘finally’ Expr]
        | ‘do’ Expr [semi] ‘while’ ‘(’ Expr ‘)’
        | ‘for’ (‘(’ Enumerators ‘)’ | ‘{’ Enumerators ‘}’) {nl} [‘yield’] Expr
        | ‘throw’ Expr
        | ‘return’ [Expr]
        | [SimpleExpr ‘.’] id ‘=’ Expr
        | SimpleExpr1 ArgumentExprs ‘=’ Expr
        | [‘with’] PostfixExpr
        | [‘with’] PostfixExpr Ascription
        | [‘with’] PostfixExpr ‘match’ ‘{’ CaseClauses ‘}’

PostfixExpr ::= InfixWithExpr [id [nl]]

InfixWithExpr ::= InfixExpr
                | InfixExpr id [nl] ‘with’ PrefixExpr
Note: Rules where the LHS is the same as that of an existing rule replace the existing rule.
A for loop or comprehension of the form:
for {
  enums1
  then.op
  enums2
} body
would be desugared as follows:
for {
  BoundNames(enums1) <-
    SubThens(SubstituteWiths(op), for (enums1) yield BoundNames)
  enums2
} body
BoundNames is a tuple (or tuple-pattern) of all the names bound in enums1.

SubThens replaces any instances of then in its first argument with its second argument.

SubstituteWiths replaces a with-expression of the form:

  with x

with a function of the form:

  { case BoundNames => x }
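As a concrete instance of this rule, the then sortBy with (x + y) comprehension from earlier desugars as follows; the sample data is mine, chosen just to make the result observable:

```scala
val xs = Seq(3, 1, 2)   // assumed sample data
val ys = Seq(10, 0)

// BoundNames is (x, y); `with (x + y)` becomes { case (x, y) => x + y };
// `then` is replaced by the comprehension over enums1:
val result = for {
  (x, y) <- (for { x <- xs; y <- ys } yield (x, y))
              .sortBy { case (x, y) => x + y }
} yield (x, y)
```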
A for loop or comprehension of the form:

for {
  enums1
  pattern <- then.op
  enums2
} body
would be desugared as follows:
for {
  (pattern, FilterRepeated(BoundNames)) <-
    SubThens(
      SubstituteWiths(op),
      for (enums1) yield BoundNames(enums1)
    )
  enums2
} body
FilterRepeated replaces any identifier in the passed-in region that also appears in pattern with _.
When the group keyword is used, a normalization step is required. This takes the form of an extraction function being mapped over the result of the transformer's op.
A for loop or comprehension of the form:
for {
  enums1
  group.op
  enums2
} body
would be desugared as follows:
for {
  BoundNames <-
    SubGroups(
      SubstituteWiths(op),
      for (enums1) yield BoundNames
    ).map { x =>
      val BoundNames(1) = x.map { case BoundNames => BoundNames(1) }
      ...
      val BoundNames(n) = x.map { case BoundNames => BoundNames(n) }
      BoundNames
    }
  enums2
} body
SubGroups replaces any instances of group in its first argument with its second argument.

BoundNames(i) is the ith name bound in enums1.

The variable x is a fresh variable.
A for loop or comprehension of the form:

for {
  enums1
  pattern <- group.op
  enums2
} body
would be desugared as follows:
for {
  (pattern, FilterRepeated(BoundNames)) <-
    SubGroups(
      SubstituteWiths(op),
      for (enums1) yield BoundNames(enums1)
    ).map { case (x, y) =>
      val BoundNames(1) = y.map { case BoundNames => BoundNames(1) }
      ...
      val BoundNames(n) = y.map { case BoundNames => BoundNames(n) }
      (x, BoundNames)
    }
  enums2
} body
Q: How do comprehensive comprehensions compare with LINQ?
A: Although both are DSLs for writing typesafe queries in a host language, LINQ works by hard-coding support for specific collection operations. In contrast, comprehensive comprehensions support any collection operation that meets certain type signature requirements, making them much more versatile.
Q: What about database queries?
A: Comprehensive comprehensions will integrate beautifully with the SLICK query library, once SLICK adds support for nested queries. Until then, group-transformers won't work properly with SLICK, but then-transformers will work fine.
Q: How complicated is the implementation?
A: The complete patch (including tests) is shorter than this specification.
Q: When can I test this feature out?
A: Now. There's a build that is intended to fully meet this specification available on scala-comprehensions.org. The build currently available is binary-compatible with Scala 2.10.2. If there is sufficient interest, I can probably set up a Maven repository to make testing smoother.
--
You received this message because you are subscribed to the Google Groups "scala-sips" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-sips+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
A few comments from the non-expert peanut gallery:

The proposal's very intriguing. I like the general shape of it, providing LINQish capability without feeling quite so arbitrary and SQL-centric. That said, two details strike me funny.

The minor one is the use of "with" -- I just don't find that specific keyword intuitive here, and would suggest at least thinking about, e.g., "on" instead. (That said, I can understand the desire to conserve keywords, and I would probably get used to it, so this is a quibble.)
More significantly, the type transformation of the existing bound names in the group-transforms is a bit confusing. I get intellectually what's going on, but the strs example gave me a really strong, "wait -- what?" reaction. I'm concerned about that, especially for more naive users: the notion that x has type A on the line where it is defined, but type List[A] two lines down, is pretty unobvious. Mind, I don't have a better suggestion offhand, and I don't think it's the end of the world (modern IDE support would at least help), but consider this a wish for a more obvious distinction between "x" and "the collection of x's".
One quick "bikesheddy" thought is that I don't think that the keywords harmonize with the "<-" symbol.
That's my POV, too.
--
> I liked the idea of somehow extend for-comprehension to make it more general, but provided that there is a solid theory behind it (like monads) and that it doesn't use up more keywords.

I agree. Or maybe one more keyword would be fine if the comprehension were to understand a new type of abstraction. There should be a single abstraction that covers both here.
Can I ask, what is the point of these? Either I'm having a stupid moment or they seem to be the same as map(identity) = NOP.

.map { p =>
  val x = p.map { case (x, y) => x }
  val y = p.map { case (x, y) => y }
  (x, y)
}

and

.map { case (p1, p2) =>
  val x = p2.map { case (x, y) => x }
  val y = p2.map { case (x, y) => y }
  (p1, (x, y))
}
Those convert collections of tuples into tuples of collections.
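The point of the reply above can be made concrete with a small sketch (the names p, xs, and ys are mine): the two inner maps pull out the columns of a collection of pairs, which is an unzip, not an identity map.

```scala
val p = List((1, 'a'), (2, 'b'))

val xs = p.map { case (x, y) => x }   // the first column
val ys = p.map { case (x, y) => y }   // the second column

// (xs, ys) is a tuple of collections, not a collection of tuples,
// so the overall transformation is equivalent to p.unzip.
val unzipped = (xs, ys)
```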
The proposal only requires one new keyword ('group'); Both 'with' and
'then' are already keywords, though 'then' can currently be used as an
identifier, on a deprecated basis.
--
As far as better use-cases, can you give me an idea of what sort you would be looking for? Would you like to see cases where you save a few lines and a level of nesting, or something else? I've applied comprehensive comprehensions to a few SLICK examples back when I had a more complicated desugaring that had partial group-transformer/SLICK interop. Many of the examples won't compile now, but they would be the correct code for a SLICK that supported nested queries.
--