Re: DefaultIfEmpty outer join support for NH3 (moved from [nhibernate-development])

286 views
Skip to first unread message

Fabian Schmied

unread,
Sep 6, 2010, 3:02:47 AM9/6/10
to re-motion Users
[Following up from this thread:
http://groups.google.com/group/nhibernate-development/browse_thread/thread/1e5b65b0395e4004]

Frans,

I think we're hijacking thist list for discussions about re-linq;
therefore, I'm continuing this on the re-motion users list.

> > var q = from c in ctxt.Order
> > join o in ctxt.Order on c.CustomerId equals o.Customer.CustomerId
> > into co // A
> > from x in co.DefaultIfEmpty() //B
> > from y in co //C
> > select c;
>
> > Here, B uses the group join in a left-outer join fashion, but C uses it in
> > an ordinary cross join fashion.
>
> Is that really the case? 'y' is an order, same as 'x', but 'co' is
> the same set as when the DefaultIfEmpty() call is made.

Sure, but while the "from x in co.DefaultIfEmpty()" clause will yield
a null "x" if co is empty, the "from y in co" will not - that makes a
huge difference. If re-linq's frontend were to mark the group join's
inner join as a "left join" once it detects a DefaultIfEmpty(), that
would change the semantics of the query.

> I do understand why you're saying it, from the perspective of
> sequences, it makes sense, however from the perspective of sets and SQL, I
> don't really know how to formulate the query above, because one has to join
> co again with... co ? (which can't be done).
>
> At the moment I follow the same steps, but it's a cumbersome path
> IMHO, where if you look at 'intent', it can be made simpler, IMHO. But I
> understand why adding this to the front end might make situations like the
> one above harder to deal with (although I'm interested in what you'd think
> the query above does, how it looks in SQL :))

It looks like this (from Linq2Sql):

SELECT [t0].[ID], [t0].[Timestamp], [t0].[OrderNo], [t0].
[DeliveryDate], [t0].[CustomerID], [t0].[OfficialID]
FROM [Order] AS [t0]
LEFT OUTER JOIN ([Order] AS [t1] -- B
LEFT OUTER JOIN [Company] AS [t2] ON [t2].[ID] = [t1].
[CustomerID]) ON [t0].[CustomerID] = ([t2].[ID])
CROSS JOIN ([Order] AS [t3] -- C
LEFT OUTER JOIN [Company] AS [t4] ON [t4].[ID] = [t3].
[CustomerID])
WHERE [t0].[CustomerID] = ([t4].[ID])

Our own SQL backend produces something similar (although we have a few
subqueries in that picturet because our subquery optimizer isn't
implemented yet).

Don't forget that a GroupJoin does not generate a SQL join _until its
result are actually used_. And every time its result are used (eg., in
a SelectMany), a new SQL join should be generated. Therefore, it's
easiest (IMO) to deal with the DefaultIfEmpty at the point where the
group join is used, which might be very far from the point where the
group join was written. re-linq creates references to the declaration
from the usage, so handling everything there (with an ordinary "join
handler", as you said) is fully possible. (Ignoring, of course, the
possibility of the group to stem from a subquery, which causes some
kind of totally different difficulties with references from outer
scope. Which are off-topic here.)

> IMHO the sole purpose of 'DefaultIfEmpty' is to use a sequence which
> can contain nulls due to a left join, and in fact, it's the only mechanism
> to force a left join into the final query (navigator traversal might result
> in left joins due to nullable FK fields, but that's not enforceable) using a
> 'join' operator or a navigator traversal which would otherwise result in an
> inner join.
>
> I.o.w.: if you see a DefaultIfEmpty, the sequence it is called on is
> a result of a left join, and you can treat that sequence as such. Dealing
> with it 'on the fly' otherwise is a very cumbersome task as it's 'too late':
> either group joins are already handled in some trees or other clauses refer
> to the defaultifempty sequence and do that by its own alias (and not the
> alias of the side the defaultifempty refers to)

That's the point - if you traverse the re-linq query model left-to-
right (main from clause, body clauses, select clause) and inside-out
(sub-queries first), which re-linq makes very easy for you to do, it's
not too late: as I've illustrated in my blog post, you're in exactly
the right position to choose what to make of the DefaultIfEmpty
operator. Even subquery optimization is easy there - no visitors
involved (see below).

> > In addition, as Stefan said, DefaultIfEmpty can be applied to any
> arbitrary
> > sequence in a query, not only group joins. A LINQ provider striving to
> > support DefaultIfEmpty should try to find a solution that works in all
> > cases, no matter where the DefaultIfEmpty operator is applied.
>
> yes, that's true. The second query I gave is indeed an example of
> that. Though is it really a matter of 'a lot of cases' btw? Aren't there
> just 2: 1) a group join result used with DefaultIfEMpty -> make groupjoin a
> left join and 2) the random sequence on which DefaultIfEmpty() is called,
> which also would result in a left join? (x=> c.Orders.DefaultIfEmpty())

Sure, DefaultIfEmpty always produces "outer" semantics (could become a
left outer join, or an outer apply); however, since the LINQ provider
has to deal with #2 anyways - and re-linq makes it easy to deal with
both cases at once -, there isn't much value in providing an up-front
simplification for only half the cases. (Especially given that the
simplification might be wrong, see above.)

Speaking from a library point of view (see below), of course; the
specific LINQ provider based on re-linq (or not) might make a
different decision.

> The part I'm struggling with is the query you gave above, which to
> me is unspecifyable in SQL, or at least I don't really see the 'intent' of
> what such a construct might be. Could you elaborate on that a bit please?

Well, the intent is of course stupid, but in the context of re-linq's
frontend we can't just rule out any stupid queries - chances are too
high that "sensible" use cases would produce the same structure. Every
re-linq backend can do that, of course, but the frontend can't.

from o in Orders
join oi in OrderItems on o equals oi.Order into orderItems
from oi in orderItems.DefaultIfEmpty()
select new { Order = o, Count = orderItems.Count(), OrderItem = oi }

This produces a very similar structure in the re-linq query model, for
example. (But I don't really want to discuss this query here, it's
just an example of a group join with and without DefaultIfEmpty making
sense.)

> > I've written a blog post about how I'd handle the DefaultIfEmpty query
> > operator in a SQL-translating LINQ provider here:
> >https://www.re-motion.org/blogs/mix/archive/2010/09/04/handling-the-
> > defaultifempty-result-operator-in-a-re-linq-based-provider.aspx
> > . This describes more or less how we deal with DefaultIfEmpty in re-
> linq's
> > SQL backend, and I think you'll agree that with an approach such as the
> one
> > explained in the blog post, DefaultIfEmpty isn't any longer difficult to
> > implement.
>
> .. except that you have a tremendous amount of work ahead of you:
> optimize the query you're producing :). This is rather difficult to do
> though, Linq to SQL (the best linq provider out there still) uses a lot of
> visitors for this, which can take significant amount of time during query
> production.

In general, you're right. In this specific case, however, optimization
is really easy if you choose the right data structure for your SQL
output. In re-linq's SQL backend, for example, the subquery will
produce a "SqlStatement" instance. I can easily and locally inspect
that instance for complexities (no visitors involved) and flatten it
out if there are no such complexities involved.

> I do like the approach though, it makes things easier on the
> expression tree front, at the expense of optimizing it later (something the
> EF linq provider also does) with the assumption that optimizing queries is
> easier. It's at least more straight forward! ;)

And we both know how important every simplification is when we write a
LINQ provider :)

> What I learned from the demoscene so many years ago though, was that
> if you can do things up front, it will save you a lot of time later on.

Yep, definitely true - I'm not denying this, and we use this approach
ourselves in a few places. I'm just saying it's not necessary for
DefaultIfEmpty because re-linq already gives you enough structural
simplifications to deal with it nicely.

> I.o.w.: if you run into a defaultifempty, you can also think: why is it
> there? My point is then: what if the answer to that question is: "to make a
> join a left join"? This info is then usable to rework the tree a bit to
> reflect that info so the expression tree handling is easier and you don't
> need optimization afterwards. The question of course is: is it indeed only
> there for that purpose or not? (in the context of database targeting
> queries, as not all IQueryable operators make sense in every location of
> such a query when you see them in the context of db targeting queries).
>
> I have no clear answer to that. THe query you gave above as an
> example clearly proves me wrong, so we can look into other ways perhaps to
> deal with this, but on the other hand, the query construct also might not
> make much sense in the context of a db query, which may mitigate the
> argument. IF we can find evidence DefaultIfEmpty is there for making joins
> left joins, things will get much easier to implement IMHO (including group
> join).

I think the important point here is the distinction between re-linq's
frontend and any SQL backends. In the frontend, we have a "library"
context and cannot make any assumptions; the semantics of the query
model _must_ be the same as in the original query. After all, it's
also used by projects like LINQ-to-Excel, which might not have any
problems with the DefaultIfEmpty operator at all. (I don't know that,
it's just an example.)

Any backend, however, will have to make its own decisions about
simplifications. And the good thing is: re-linq doesn't keep you from
implementing them; any backend can change the query model and replace
the GroupJoinClause with a LeftJoinClause any time it wants. We just
can't do it as a default.

Regards,
Fabian

> > On Sep 4, 12:02 pm, "Frans Bouma" <fr...@sd.nl> wrote:
> > > > > from timeHeader in metaData.NonPresentTimeHeader where
> > > > > timeHeader.Id == 6 join time in metaData.NonPresentTime on
> > > > > timeHeader.Id equals time.HeaderId into timeJoin from joinedTime
> > > > > in
> > > > > timeJoin.DefaultIfEmpty() select joinedTime
>
> > > > > What might be a solution is to add a 'jointype' to the groupjoin
> > > > > class of re-linq and switch it to 'left join' when you handle the
> > > > > p=>DefaultIfEmpty() construct of a re-linq tree.
>
> > > > Hi Frans,
>
> > > > we could transform simple left-join-via-DefaultIfEmpty to left join
> > > clauses
> > > > in re-linq's QueryModel, but that would be a very simplistic
> > > > solution that could not support any but the most straightforward LINQ
> > clauses. (i.e.
> > > only
> > > > those where someone implements left join as they found out via
> > > > Google, but for instance not any scenario where the query further
> > > > references the intermediate join clause (timeJoin in your sample).
>
> > > A simple, stupid, but illustrative example: ('ctxt' is
> > > session.Linq... ) var q = from c in ctxt.Order
> > > join o in ctxt.Order on c.CustomerId equals
> > > o.Customer.CustomerId Into co // A
> > > from x in co.DefaultIfEmpty() //B
> > > select c;
>
> > > Lines A and B form a SelectMany, due to the from in B. At the left
> > > side, you'll have the group join in A and on the right side you have
> > > the right side of the group join in A.
>
> > > This can be transformed into keeping A, and using B to adjust the
> > > group join. The main advantage is that the group join contains the
> > > selectors and in C#'s case also the projection.
>
> > > So if re-linq could transform this into (pseudo code!) var q = from c
> > > in ctxt.Order
> > > (left) join o in ctxt.Order on c.CustomerId equals
> > > o.Customer.CustomerId Into co select c;
>
> > > it would be a big win. Big problem is the code which references the
> > > DefaultIfEmpty result as that has to be changed to references to the
> > > right-side of the join.
>
> > > It would help a lot, because it would make a join handler in a linq
> > > provider much easier, in fact, an existing join handler (which can
> > > handle the normal,
> > > 'join') would likely already work.
>
> > > > We think that DefaultIfEmpty is best solved in each back-end
> > > > individually, but of course nothing stops anyone from extending the
> > > > front-end to create that kind of QueryModel via an optional
> > > > transformation step. Just be aware that this only takes you so far,
> > > > it would probably result in the LINQ provider rejecting every use of
> > > > DefaultIfEmtpy in joins that cannot be reduced to that pattern.
>
> > > You mean:
> > > var q = from c in ctxt.Order
> > > from o in c.Orders.DefaultIfEmpty()
> > > select c;
>
> > > ?
>
> > > FB- Hide quoted text -
>
> > > - Show quoted text-- Hide quoted text -
>


On Sep 5, 11:49 am, "Frans Bouma" <fr...@sd.nl> wrote:
> > We try to keep the re-linq front-end as general as possible. The fact that
> > DefaultIfEmpty causes a GroupJoin to turn into a left-outer join in your
> > example is true, but it's not as simple generally speaking.
> > Consider the following slightly modified version of your query:
>
> > var q = from c in ctxt.Order
> > join o in ctxt.Order on c.CustomerId equals o.Customer.CustomerId
> > into co // A
> > from x in co.DefaultIfEmpty() //B
> > from y in co //C
> > select c;
>
> > Here, B uses the group join in a left-outer join fashion, but C uses it in
> > an ordinary cross join fashion.
>
> Is that really the case? 'y' is an order, same as 'x', but 'co' is
> the same set as when the DefaultIfEmpty() call is made.
>
> I do understand why you're saying it, from the perspective of
> sequences, it makes sense, however from the perspective of sets and SQL, I
> don't really know how to formulate the query above, because one has to join
> co again with... co ? (which can't be done).
>
> At the moment I follow the same steps, but it's a cumbersome path
> IMHO, where if you look at 'intent', it can be made simpler, IMHO. But I
> understand why adding this to the front end might make situations like the
> one above harder to deal with (although I'm interested in what you'd think
> the query above does, how it looks in SQL :))
>
>
>
> > The simplification you suggested (keeping A
> > and flagging it as a left join) won't work here. And my opinion is that if
> > we can't apply it generally, we shouldn't apply it at all. (As a
> predefined
> > transformation in the re-linq front-end, that is. Any LINQ provider can of
> > course choose to build this simplification if it helps.)
>
> IMHO the sole purpose of 'DefaultIfEmpty' is to use a sequence which
> can contain nulls due to a left join, and in fact, it's the only mechanism
> to force a left join into the final query (navigator traversal might result
> in left joins due to nullable FK fields, but that's not enforceable) using a
> 'join' operator or a navigator traversal which would otherwise result in an
> inner join.
>
> I.o.w.: if you see a DefaultIfEmpty, the sequence it is called on is
> a result of a left join, and you can treat that sequence as such. Dealing
> with it 'on the fly' otherwise is a very cumbersome task as it's 'too late':
> either group joins are already handled in some trees or other clauses refer
> to the defaultifempty sequence and do that by its own alias (and not the
> alias of the side the defaultifempty refers to)
>
>
>
> > In addition, as Stefan said, DefaultIfEmpty can be applied to any
> arbitrary
> > sequence in a query, not only group joins. A LINQ provider striving to
> > support DefaultIfEmpty should try to find a solution that works in all
> > cases, no matter where the DefaultIfEmpty operator is applied.
>
> yes, that's true. The second query I gave is indeed an example of
> that. Though is it really a matter of 'a lot of cases' btw? Aren't there
> just 2: 1) a group join result used with DefaultIfEMpty -> make groupjoin a
> left join and 2) the random sequence on which DefaultIfEmpty() is called,
> which also would result in a left join? (x=> c.Orders.DefaultIfEmpty())
>
> The part I'm struggling with is the query you gave above, which to
> me is unspecifyable in SQL, or at least I don't really see the 'intent' of
> what such a construct might be. Could you elaborate on that a bit please?
>
>
>
> > I've written a blog post about how I'd handle the DefaultIfEmpty query
> > operator in a SQL-translating LINQ provider here:
> >https://www.re-motion.org/blogs/mix/archive/2010/09/04/handling-the-
> > defaultifempty-result-operator-in-a-re-linq-based-provider.aspx
> > . This describes more or less how we deal with DefaultIfEmpty in re-
> linq's
> > SQL backend, and I think you'll agree that with an approach such as the
> one
> > explained in the blog post, DefaultIfEmpty isn't any longer difficult to
> > implement.
>
> .. except that you have a tremendous amount of work ahead of you:
> optimize the query you're producing :). This is rather difficult to do
> though, Linq to SQL (the best linq provider out there still) uses a lot of
> visitors for this, which can take significant amount of time during query
> production.
>
> I do like the approach though, it makes things easier on the
> expression tree front, at the expense of optimizing it later (something the
> EF linq provider also does) with the assumption that optimizing queries is
> easier. It's at least more straight forward! ;)
>
> What I learned from the demoscene so many years ago though, was that
> if you can do things up front, it will save you a lot of time later on.
> I.o.w.: if you run into a defaultifempty, you can also think: why is it
> there? My point is then: what if the answer to that question is: "to make a
> join a left join"? This info is then usable to rework the tree a bit to
> reflect that info so the expression tree handling is easier and you don't
> need optimization afterwards. The question of course is: is it indeed only
> there for that purpose or not? (in the context of database targeting
> queries, as not all IQueryable operators make sense in every location of
> such a query when you see them in the context of db targeting queries).
>
> I have no clear answer to that. THe query you gave above as an
> example clearly proves me wrong, so we can look into other ways perhaps to
> deal with this, but on the other hand, the query construct also might not
> make much sense in the context of a db query, which may mitigate the
> argument. IF we can find evidence DefaultIfEmpty is there for making joins
> left joins, things will get much easier to implement IMHO (including group
> join).
>
> Disclaimer: I haven't looked at these constructs for quite a while,
> (read: more 1.5 years) so I might overlook an important aspect.
>
> FB
>
>
>
>
>
> > Cheers,
> > Fabian
>
> > On Sep 4, 12:02 pm, "Frans Bouma" <fr...@sd.nl> wrote:
> > > > > from timeHeader in metaData.NonPresentTimeHeader where
> > > > > timeHeader.Id == 6 join time in metaData.NonPresentTime on
> > > > > timeHeader.Id equals time.HeaderId into timeJoin from joinedTime
> > > > > in
> > > > > timeJoin.DefaultIfEmpty() select joinedTime
>
> > > > > What might be a solution is to add a 'jointype' to the groupjoin
> > > > > class of re-linq and switch it to 'left join' when you handle the
> > > > > p=>DefaultIfEmpty() construct of a re-linq tree.
>
> > > > Hi Frans,
>
> > > > we could transform simple left-join-via-DefaultIfEmpty to left join
> > > clauses
> > > > in re-linq's QueryModel, but that would be a very simplistic
> > > > solution that could not support any but the most straightforward LINQ
> > clauses. (i.e.
> > > only
> > > > those where someone implements left join as they found out via
> > > > Google, but for instance not any scenario where the query further
> > > > references the intermediate join clause (timeJoin in your sample).
>
> > > A simple, stupid, but illustrative example: ('ctxt' is
> > > session.Linq... ) var q = from c in ctxt.Order
> > > join o in ctxt.Order on c.CustomerId equals
> > > o.Customer.CustomerId Into co // A
> > > from x in co.DefaultIfEmpty() //B
> > > select c;
>
> > > Lines A and B form a SelectMany, due to the from in B. At the left
> > > side, you'll have the group join in A and on the right side you have
> > > the right side of the group join in A.
>
> > > This can be transformed into keeping A, and using B to adjust the
> > > group join. The main advantage is that the group join contains the
> > > selectors and in C#'s case also the projection.
>
> > > So if re-linq could transform this into (pseudo code!) var q = from c
> > > in ctxt.Order
> > > (left) join o in ctxt.Order on c.CustomerId equals
> > > o.Customer.CustomerId Into co select c;
>
> > > it would be a big win. Big problem is the code which references the
> > > DefaultIfEmpty result as that has to be changed to references to the
> > > right-side of the join.
>
> > > It would help a lot, because it would make a join handler in a linq
> > > provider much easier, in fact, an existing join handler (which can
> > > handle the normal,
> > > 'join') would likely already work.
>
> > > > We think that DefaultIfEmpty is best solved in each back-end
> > > > individually, but of course nothing stops anyone from extending the
> > > > front-end to create that kind of QueryModel via an optional
> > > > transformation step. Just be aware that this only takes you so far,
> > > > it would probably result in the LINQ provider rejecting every use of
> > > > DefaultIfEmtpy in joins that cannot be reduced to that pattern.
>
> > > You mean:
> > > var q = from c in ctxt.Order
> > > from o in c.Orders.DefaultIfEmpty()
> > > select c;
>
> > > ?
>
> > > FB- Hide quoted text -
>
> > > - Show quoted text-- Hide quoted text -
>
> - Show quoted text -

Stefan Wenig

unread,
Sep 6, 2010, 3:56:28 AM9/6/10
to re-motion Users
Phew. Excuse my ignorance, I'm only following half of what both of you
say.

But how about implementing optional transformations in the front-end
that any backend can chose to invoke? This could be done on demand.
E.g. when NH choses that it doesn't want to (or even, can't) support
various uses of DefaultIfEmpty, it might as well take a short cut in
the front-end and just accept any limitations arising from that. After
all, LINQ to NH creates HQL, not SQL. Using this path, some of the
tricks you're doing to SQL might not even be possible. (I can imagine
NH users using the re-linq SQL backend with a NH MetadataResolver for
some tricky queries some fine day. After all, Fabio can't seem to get
enough query mechanisms into NH ;-))

Optional transformations could start in back-end projects or re-motion
contrib and end up as a fully supported feature of re-linq once other
back-ends ask for it. We could even create a contrib project
specifically for these things.

Just an idea.

Cheers,
Stefan


On Sep 6, 9:02 am, Fabian Schmied <fabian.schm...@gmail.com> wrote:
> [Following up from this thread:http://groups.google.com/group/nhibernate-development/browse_thread/t...]
> > - Show quoted text -- Hide quoted text -
>
> - Show quoted text -- Hide quoted text -
>
> - Show quoted text -- Hide quoted text -
>
> - Show quoted text -- Hide quoted text -

Wenig, Stefan

unread,
Sep 6, 2010, 4:44:56 AM9/6/10
to re-moti...@googlegroups.com
Now I've got an unfair advantage here: I can just walk downstairs and drag Fabian to a whiteboard.

He did some explaining, and I want to withdraw my suggestion. There probably is a relatively easy way to do it even with NH as a target language, and we should just let Steve give it a try.

We tried to make re-linq as easy as we can by providing convenient code instead of long-winded advice, but we also should know where to stop. This is clearly a time when good advice can work much better than a half-baked solution that seems tempting but is ultimately not correct. Fabian already has a quite straightforward plan for back-end implementations, we should just give that some thought. We can still fall back to alternatives if that doesn't work out with HQL or is just too much work.

Cheers,
Stefan

> --
> You received this message because you are subscribed to the Google
> Groups "re-motion Users" group.
> To post to this group, send email to re-moti...@googlegroups.com.
> To unsubscribe from this group, send email to re-motion-
> users+un...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/re-motion-users?hl=en.

Steve Strong

unread,
Sep 6, 2010, 4:42:02 AM9/6/10
to re-motion Users
I can understand the argument for not putting things in re-linq that
can't support every query form, but there's no doubt that some option
like this (i.e., having DefaultIfEmpty "support" on request) would be
useful. HQL can't possibly map every possible Linq query, so having
restrictions for that particular provider is not an issue :)
> ...
>
> read more »

Steve Strong

unread,
Sep 6, 2010, 4:49:09 AM9/6/10
to re-moti...@googlegroups.com
No probs, but you can't blame me for jumping at the chance to have you guys do some more of my work ;)

To unsubscribe from this group, send email to re-motion-use...@googlegroups.com.

Wenig, Stefan

unread,
Sep 6, 2010, 5:00:16 AM9/6/10
to re-moti...@googlegroups.com

Sure, but that’s not what we’re trying to avoid. We’re just a bit scared of opening doors that lead nowhere, giving wrong impressions and finally being labeled an 80% solution.

 

Why not aim a bit higher? Try to wrap your head around it in Fabians way, he’ll be happy to help out right here on the list, if just to prove his theory. ;-) It might not be that hard after all, he’s pretty much got it figured out.

 

Start thinking group joins through, they are a beast even for plain inner joins. When you get that right, DefaultIfEmpty might not be that much of a biggie after all.

 

Just one thing: You’ll need to help us with HQL. When talking about LINQ translation, please always try to show us the HQL statement you’re hoping to get to. When we’re discussing it in SQL terms, it’s just not going to work out.

 

Thanks,

Stefan

Reply all
Reply to author
Forward
0 new messages