[slick 0.11.2] joins result in nested selects

2,588 views
Skip to first unread message

Denis

unread,
Nov 7, 2012, 11:52:39 PM11/7/12
to scala...@googlegroups.com
Why do explicit joins in slick 0.11.2,  result in nested select statements? 

A simple statement below:

val editorialTagsQuery = for { (a, b) <- ATbl leftJoin BTbl on (_.id is _. linkId) } yield a.name

produces something wired like this:

select x2.x3 from (select x4.`id` as x5, x4.`name` as x3 from `tbl_a` x4) x2 left outer join (select x6.`linkId` as x7 from `tble_b` x6) x8 on x2.x5 = x8.x7

virtualeyes

unread,
Nov 8, 2012, 4:27:32 AM11/8/12
to scala...@googlegroups.com
looks ticket worthy...

Stefan Zeiger

unread,
Nov 8, 2012, 6:12:28 AM11/8/12
to scala...@googlegroups.com
On 2012-11-08 5:52, Denis wrote:
val editorialTagsQuery = for { (a, b) <- ATbl leftJoin BTbl on (_.id is _. linkId) } yield a.name

produces something wired like this:

select x2.x3 from (select x4.`id` as x5, x4.`name` as x3 from `tbl_a` x4) x2 left outer join (select x6.`linkId` as x7 from `tble_b` x6) x8 on x2.x5 = x8.x7


Inlining of unnecessary projections is done during comprehension fusion. Since that is not happening for explicit joins, the inner projections are not getting inlined. We could add an extra phase for that but I don't think it would be worth the overhead. I can't imagine that it would result in a different execution plan on any decent database engine.

--
Stefan Zeiger
Typesafe - The software stack for applications that scale
Twitter: @StefanZeiger

Denis

unread,
Nov 8, 2012, 11:54:57 AM11/8/12
to scala...@googlegroups.com
Stefan - I see your point. However, that was the most basic case. Once I start adding more logic to a query it becomes more and more scary... 

For example, something as easy as this statement: 

select sit.name from tbl_a AS p 
    left join tbl_b AS sitl ON sitl.id3 = p.id1 and sitl.id2 = 'property'
    left join tbl_c sit on sit.id4 = sitl.id1
produces the following:

select x2.`name` from `tbl_a` x2, (select x3.x4 as x5 from (select x6.x7 as x8 from DUAL) x9 left outer join (select x10.`id1` as x4, x10.`id2` as x11, x10.`id3` as x12 from `tbl_b` x10) x3 on (x9.x8 = x3.x12) and (x3.x11 = 'property')) x13 left outer join (select x14.`id4` as x15 from `tbl_c` x14) x16 on x13.x5 = x16.x15

So, now it generated nested select statements within the inner select. Queries like this will not fly by any DBA...

Also, I noticed, if I declare tables in first and then join them, it generates a different SQL from the previous example using DUAL tables. 

val q = for {
p <- ATbl
stl <- BTbl
z <- p leftJoin stl on (_.id1 is _.id2)
} yield p.name

Generates the following:

select x2.`name` from `a_tbl` x2, `b_tbl` x3, (select x4.x5 as x6 from DUAL) x7 left outer join (select x8.x9 as x10 from DUAL) x11 on x7.x6 = x11.x10
Message has been deleted

Stefan Zeiger

unread,
Nov 8, 2012, 12:54:17 PM11/8/12
to scala...@googlegroups.com
On 2012-11-08 17:54, Denis wrote:
For example, something as easy as this statement: 

select sit.name from tbl_a AS p 
    left join tbl_b AS sitl ON sitl.id3 = p.id1 and sitl.id2 = 'property'
    left join tbl_c sit on sit.id4 = sitl.id1
produces the following:

select x2.`name` from `tbl_a` x2, (select x3.x4 as x5 from (select x6.x7 as x8 from DUAL) x9 left outer join (select x10.`id1` as x4, x10.`id2` as x11, x10.`id3` as x12 from `tbl_b` x10) x3 on (x9.x8 = x3.x12) and (x3.x11 = 'property')) x13 left outer join (select x14.`id4` as x15 from `tbl_c` x14) x16 on x13.x5 = x16.x15

Since you didn't post the source code, I'm assuming the main issue here is the same one as in the next query.


Also, I noticed, if I declare tables in first and then join them, it generates a different SQL from the previous example using DUAL tables. 

val q = for {
p <- ATbl
stl <- BTbl
z <- p leftJoin stl on (_.id1 is _.id2)
} yield p.name

Generates the following:

select x2.`name` from `a_tbl` x2, `b_tbl` x3, (select x4.x5 as x6 from DUAL) x7 left outer join (select x8.x9 as x10 from DUAL) x11 on x7.x6 = x11.x10

That is exactly what you wrote. You did not declare the tables first, you used them as generators, so you iterate over them. The implicit conversion that allows you to use table objects directly as generators (instead of lifting them to a query explicitly) also kicks in here to enable that leftJoin. It becomes clearer when you do the lifting explicitly on the leftJoin:


val q = for {
p <- ATbl
stl <- BTbl
z <- Query(p) leftJoin Query(stl) on (_.id1 is _.id2)
} yield p.name

So this expression does a cross-join of ATbl and BTbl and then takes each of the resulting pairs, lifts the two values into new queries (a.k.a. collections) and joins these collections (each containing only a single value).

Let's see where this leads. A bit more abstract and in monadic form:

A.flatMap { a =>
  B.flatMap { b =>
    Query(a).flatMap { aa =>
      Query(b).filter(bb => e1(aa, bb)).flatMap { bb =>
        e2(a)
      }
    }
  }
}

Get rid of the filter (with a hypothetical Query.empty which ought to exist since Query is a monad with zero):

A.flatMap { a =>
  B.flatMap { b =>
    Query(a).flatMap { aa =>
      Query(b).flatMap { bb =>
        if(e1(aa, bb)) Query(bb) else Query.empty
      }.flatMap { bb =>
        e2(a)
      }
    }
  }
}

Apply the left identity law for monads:

A.flatMap { a =>
  B.flatMap { b =>
      val bq = if(e1(a, b)) Query(b) else Query.empty
      bq.flatMap { bb =>
        e2(a)
      }
  }
}

And the associativity law:

A.flatMap { a =>
  B.flatMap { b =>
      if(e1(a, b)) Query(b) else Query.empty
  }.flatMap { bb =>
        e2(a)
  }
}

Transform back to a filter:

A.flatMap { a =>
  B.filter(b => e1(a,b)).flatMap { b =>
      e2(a)
  }
}

And we see that it is indeed identical to an inner join, q.e.e., but it is not trivial to get from one the other. An advanced query optimizer could do it but that's far beyond the kind of optimization that we do at the moment.

If you're still unconvinced that the extra generators cannot be trivially removed, try the same with an outer join instead of an inner join. You will find that the result with and without those generators is different (essentially turning the outer join into an inner join).
Message has been deleted
Message has been deleted

Denis

unread,
Nov 8, 2012, 1:34:54 PM11/8/12
to scala...@googlegroups.com
Stefan - thank you for your replies. This makes more sense now. I seems like my messages keep getting deleted once I include the source code for a bigger query. 
Message has been deleted
Message has been deleted

nafg

unread,
Nov 12, 2012, 5:31:45 AM11/12/12
to scala...@googlegroups.com
I'm starting to worry about how hard it is to reason about Slick queries. To the extent that you can figure them out by thinking of a Table as an Iterator, using for comprehensions is a plus. But if I have to apply monad laws to translate them to SQL then one wonders if there isn't a better way?


On Thursday, November 8, 2012 12:54:21 PM UTC-5, Stefan Zeiger wrote:

Stefan Zeiger

unread,
Nov 12, 2012, 7:09:01 AM11/12/12
to scala...@googlegroups.com
On 2012-11-12 11:31, nafg wrote:
I'm starting to worry about how hard it is to reason about Slick queries. To the extent that you can figure them out by thinking of a Table as an Iterator, using for comprehensions is a plus. But if I have to apply monad laws to translate them to SQL then one wonders if there isn't a better way?

You can think of a for-comprehension as being roughly equivalent to a SELECT statement in SQL. I generally use for-comprehensions when introducing Slick because they give you the right intuition. But for-comprehensions are not a first-class concept in Scala. They map to flatMap/map/filter method calls (and that is a good thing -- we want true compositionality instead of being constrained by the everything-is-a-special-case semantics of SQL) which can only work if the iterator, collection or query type that you're working with is a monad.

But this also means that we never see the comprehensions in the AST. In ScalaQuery I made the mistake of using SELECT statements as a basic abstraction. Chained computations on a Query would map to a single SELECT. If you wanted a sub-select you had to make that explicit (by calling .subquery on a Query). These semantics have the advantage that you can write a very simple code generator for them (ScalaQuery went straight from the AST to SQL code in a single pass; Slick has 18 compiler phases between the initial AST and the code generator) but at the expense of not being compositional. Users of the API have to know into what shapes in SQL their Scala code gets converted instead of relying on the Scala types and collection semantics.

Slick cannot solve this problem completely (yet) because there is no valid translation of arbitrary monadic Query expressions to SQL (whose semantics are stuck half-way between applicative functors and monads) but the remaining corner cases can usually be avoided by clever inlining.

As for having to apply monad laws to translate Slick ASTs to SQL, no, you don't. You can think in term of actual collections and laws that are applicable to them. In some cases you even have to because collection and SQL semantics are richer than "just a monad". But the most important laws that you need for translating and optimizing the code are the monad laws (or some collection invariants that are based on them). In the example in this thread, the intuition was clear but an optimizer needs something more precise. The interesting part is that a monad with zero is all you need to perform this optimization. It doesn't matter at all if you're computing with lists, sets or database queries, and that is a very valuable property when you're translating list and set semantics to queries.

Denis

unread,
Nov 12, 2012, 12:00:15 PM11/12/12
to scala...@googlegroups.com
Also, it seems like, slick throws exceptions for left/right joins that result in no match. E.g. I am left joining tbl_a with tbl_b, where tbl_b is empty and then using list to get the list of items. This results in:
scala.slick.SlickException: Read NULL value for column tbl_b.id

Where I would still expect results returned for tbl_a and then a tuple for tbl_b be empty/null/none or something...

-denis.

Denis

unread,
Nov 13, 2012, 12:01:08 AM11/13/12
to scala...@googlegroups.com
Ok, it seems like all non option columns need to be specifically listed out in the yield clause with .orZero/.orElse condition else exception will is thrown. I can see why this was done this way, however it is really inconvenient especially when you are left joining multiple tables and each table has say 20 columns. Listing out columns of every table of the left/right join isn't very convenient to say the least.

-denis

Pradeep Kumar Mishra

unread,
Nov 18, 2012, 4:56:13 AM11/18/12
to scala...@googlegroups.com
Hi Denis,

Could you please let me know the work around for this problem.

Thanks,
Pradeep
Message has been deleted

Yaroslav Shuper

unread,
Nov 18, 2012, 8:15:55 AM11/18/12
to scala...@googlegroups.com
Hi Denis, could you please write scala code how did you declare and condition in join section left join tbl_b AS sitl ON sitl.id3 = p.id1 and sitl.id2 = 'property' ?
Message has been deleted

Denis

unread,
Nov 18, 2012, 3:52:12 PM11/18/12
to scala...@googlegroups.com
Pradeep - sure. There are several ways to do this. So, as I mentioned in my previous post, slick's default condition for non optional columns is to throw an exception regardless of the type of a join. Meaning that, if you are left joining table A and table B, and then you are accessing a non-optional column on table B, exception is thrown if the non nullable filed is null. There are several options to solve this:

1. Each non optional filed can be declared orZeroin which case the filed will be defaulted to it's underlying 'default' type value, int -> 0, string -> "", etc...
for { (a, b) <- ATbl leftJoin BTbl on (_.id is _. linkId) } yield b.value.orZero

2. Each non optional filed can be declared orElsethat takes a by name expression that is run if the filed is not set and is accessed:
for { (a, b) <- ATbl leftJoin BTbl on (_.id is _. linkId) } yield b.value.orElse(Some("this field isn't set to anything"))

3. Each non optional filed can be declared ?, that will make non optional filed optional and if the filed is not set it will be None
for { (a, b) <- ATbl leftJoin BTbl on (_.id is _. linkId) } yield b.value.?

#3 is my preferred approach.

Regards,
Denis.

Denis

unread,
Nov 18, 2012, 4:11:33 PM11/18/12
to scala...@googlegroups.com
Yaroslav,

You can do it in several ways:

1. Have the condition in the join clause:
for {
  (aTbl, bTbl) <- ATbl leftJoin BTbl on ((a,b) => (a.id1 is b.id2) && (b.name === "property"))
} yield aTbl.someValue

2. Have the condition in the if clause:
for {
  (aTbl, bTbl) <- ATbl leftJoin BTbl on (_.id1 is _.id2) if bTbl.name === "property"
} yield aTbl.someValue


I ended up using option #2. It's just more concise and you can use _ in the join clause since you are only accessing function parameters once.

Regards,
Denis.

Pradeep Kumar Mishra

unread,
Nov 19, 2012, 10:21:17 AM11/19/12
to scala...@googlegroups.com
Thanks Denis.

Pradeep Kumar Mishra

unread,
Dec 23, 2012, 12:29:41 AM12/23/12
to scala...@googlegroups.com
Hi Denis,

On the related note, today I encountered a situation where  I need to yield the complete object i.e. yield(a, b) in your example # 3. Do I need to explicitly yield each property of b or there is any shorter workaround?

Regards,
Pradeep
Message has been deleted

Brice

unread,
Jan 20, 2013, 5:34:03 AM1/20/13
to scala...@googlegroups.com
Hello,

I don't understand how to manage optional relationships when mapping tables to entities. For example, consider the following test:

  test("left join"){
    AppDB4Tests.database.withSession{
      case class Supplier(id:Int,name:String)
      case class Coffee(id:Int,supplierId:Int,name:String)

      object Suppliers extends Table[Supplier]("SUPPLIER"){
        def id = column[Int]("id", O.PrimaryKey)
        def name = column[String]("name")

        def * = id ~ name <>(Supplier, Supplier.unapply _)
      }

      object Coffees extends Table[Coffee]("COFFEE"){
        def id = column[Int]("id", O.PrimaryKey)
        def name = column[String]("name")
        def supplierId = column[Int]("sup_id", O.NotNull)

        def * = id ~supplierId~name <>(Coffee,Coffee.unapply _)
        def fkSupplier = foreignKey("fk_supId", supplierId,Suppliers)(_.id)

      }

      (Suppliers.ddl ++ Coffees.ddl).create

      Suppliers.insert(Supplier(1, "my supp"))

      val query = for {
        (s, c) <- Suppliers leftJoin Coffees /*on (_.id === _.supplierId)*/
      } yield (s, c.name.orZero)
      assert(query.list.length === 1)


      val entityQuery = for {
        (s, c) <- Suppliers leftJoin Coffees /*on (_.id === _.supplierId)*/
      } yield (s, c)

      assert(entityQuery.list.length === 1) //Fails!

    }
  }

The first query is ok but the second fails because of the previously mentioned 'Read NULL value for column COFFEE.id'.

If the query becomes :

      val entityQuery = for {
        (s, c) <- Suppliers leftJoin Coffees /*on (_.id === _.supplierId)*/
      } yield (s, c.orZero)

Compilation fails:

error: could not find implicit value for evidence parameter of type scala.slick.lifted.TypeMapper[c.type]
} yield (s, c.orZero)

By the way, in 1.0.0-RC1, orZero is deprecated and getOrElse is recommended, but how to use it?

For example:

      val query = for {
        (s, c) <- Suppliers leftJoin Coffees /*on (_.id === _.supplierId)*/
      } yield (s, c.name.getOrElse(""))

 produces :

error: Cannot prove that Option[String] =:= String.
} yield (s, c.name.getOrElse(""))

and:

      val entityQuery = for {
        (s, c) <- Suppliers leftJoin Coffees /*on (_.id === _.supplierId)*/
      } yield (s, c.getOrElse(Coffee(-1,-1,"")))

produces:

error: value getOrElse is not a member of object Coffees
} yield (s, c.getOrElse(Coffee(-1,-1,"")))

Brice

unread,
Jan 20, 2013, 1:22:07 PM1/20/13
to scala...@googlegroups.com
I found a workaround with:

val l = query.list.map(t =>{
        val (s,i,n,si) = (t._1, t._2, t._3, t._4)
        (s, Coffee(i,si,n))
      } )

If someone knows a better way, answers are welcome!

Naftoli Gugenheim

unread,
Jan 20, 2013, 10:29:16 PM1/20/13
to scala...@googlegroups.com
On Sun, Jan 20, 2013 at 1:22 PM, Brice <blep...@gmail.com> wrote:
I found a workaround with:

val l = query.list.map(t =>{
        val (s,i,n,si) = (t._1, t._2, t._3, t._4)
        (s, Coffee(i,si,n))
      } )

You can probably write that as

val l = query.list map { case (s, i, n, si) => (s, Coffee(i, si, n)) }

 

If someone knows a better way, answers are welcome!

Le jeudi 8 novembre 2012 05:52:39 UTC+1, Denis a écrit :
Why do explicit joins in slick 0.11.2,  result in nested select statements? 

A simple statement below:

val editorialTagsQuery = for { (a, b) <- ATbl leftJoin BTbl on (_.id is _. linkId) } yield a.name

produces something wired like this:

select x2.x3 from (select x4.`id` as x5, x4.`name` as x3 from `tbl_a` x4) x2 left outer join (select x6.`linkId` as x7 from `tble_b` x6) x8 on x2.x5 = x8.x7

--
 
 
 

ijuma

unread,
Jan 22, 2013, 8:19:34 AM1/22/13
to scala...@googlegroups.com
On Thursday, 8 November 2012 11:12:32 UTC, Stefan Zeiger wrote:
Inlining of unnecessary projections is done during comprehension fusion. Since that is not happening for explicit joins, the inner projections are not getting inlined. We could add an extra phase for that but I don't think it would be worth the overhead. I can't imagine that it would result in a different execution plan on any decent database engine.

I ran into an issue recently where using `innerJoin` instead of an implicit join caused some nesting in the generated SQL that confused MySQL's query optimiser. This was with MySQL 5.5 and the query time went from 5 minutes to 11 seconds by this simple change. If there is interest, I can file an issue for more investigation.

Best,
Ismael

Denis

unread,
Jan 22, 2013, 2:23:00 PM1/22/13
to scala...@googlegroups.com
The SQL generated by slick is definitely a big issue. Producing sql with nested selects and then relying on the db query optimizer to optimize it, in my opinion  is a major setback for this project. We are also using MySql and every we compared performance of sql with joins generated by slick vs. a hand written sql with normal (non nested join statements) a normal SQL is always MUCH faster. It's especially apparent with large tables. The difference is in the order of magnitudes. So, we ended up simply hand writing sql every time we need to left/right join and using slick syntax in other cases. Also, drop/take results in nested selects among other things...

virtualeyes

unread,
Jan 22, 2013, 5:45:19 PM1/22/13
to scala...@googlegroups.com
FWIW, MySQL performs horribly with queries that generate a dependent subquery select type.

Not sure how PostGres, SQL Server, etc. perform in this area but I suspect MySQL is not at the head of the pack.

Hopefully the issue here does not generate dependent subqueries in MySQL; if so, good luck.

My workaround for the one query in my app. that does generate a dependent subquery select type is to cache it in the web framework (Play).

Still on ScalaQuery here, waiting for Play 2.1 final (Scala 2.11 can't come fast enough, have heard compile times are 5-10% slower on 2.10, the opposite direction hoped for ;-))
Reply all
Reply to author
Forward
0 new messages