[Following up from this thread:
http://groups.google.com/group/nhibernate-development/browse_thread/thread/1e5b65b0395e4004]
Frans,
I think we're hijacking thist list for discussions about re-linq;
therefore, I'm continuing this on the re-motion users list.
> > var q = from c in ctxt.Order
> > join o in ctxt.Order on c.CustomerId equals o.Customer.CustomerId
> > into co // A
> > from x in co.DefaultIfEmpty() //B
> > from y in co //C
> > select c;
>
> > Here, B uses the group join in a left-outer join fashion, but C uses it in
> > an ordinary cross join fashion.
>
> Is that really the case? 'y' is an order, same as 'x', but 'co' is
> the same set as when the DefaultIfEmpty() call is made.
Sure, but while the "from x in co.DefaultIfEmpty()" clause will yield
a null "x" if co is empty, the "from y in co" will not - that makes a
huge difference. If re-linq's frontend were to mark the group join's
inner join as a "left join" once it detects a DefaultIfEmpty(), that
would change the semantics of the query.
> I do understand why you're saying it, from the perspective of
> sequences, it makes sense, however from the perspective of sets and SQL, I
> don't really know how to formulate the query above, because one has to join
> co again with... co ? (which can't be done).
>
> At the moment I follow the same steps, but it's a cumbersome path
> IMHO, where if you look at 'intent', it can be made simpler, IMHO. But I
> understand why adding this to the front end might make situations like the
> one above harder to deal with (although I'm interested in what you'd think
> the query above does, how it looks in SQL :))
It looks like this (from Linq2Sql):
SELECT [t0].[ID], [t0].[Timestamp], [t0].[OrderNo], [t0].
[DeliveryDate], [t0].[CustomerID], [t0].[OfficialID]
FROM [Order] AS [t0]
LEFT OUTER JOIN ([Order] AS [t1] -- B
LEFT OUTER JOIN [Company] AS [t2] ON [t2].[ID] = [t1].
[CustomerID]) ON [t0].[CustomerID] = ([t2].[ID])
CROSS JOIN ([Order] AS [t3] -- C
LEFT OUTER JOIN [Company] AS [t4] ON [t4].[ID] = [t3].
[CustomerID])
WHERE [t0].[CustomerID] = ([t4].[ID])
Our own SQL backend produces something similar (although we have a few
subqueries in that picturet because our subquery optimizer isn't
implemented yet).
Don't forget that a GroupJoin does not generate a SQL join _until its
result are actually used_. And every time its result are used (eg., in
a SelectMany), a new SQL join should be generated. Therefore, it's
easiest (IMO) to deal with the DefaultIfEmpty at the point where the
group join is used, which might be very far from the point where the
group join was written. re-linq creates references to the declaration
from the usage, so handling everything there (with an ordinary "join
handler", as you said) is fully possible. (Ignoring, of course, the
possibility of the group to stem from a subquery, which causes some
kind of totally different difficulties with references from outer
scope. Which are off-topic here.)
> IMHO the sole purpose of 'DefaultIfEmpty' is to use a sequence which
> can contain nulls due to a left join, and in fact, it's the only mechanism
> to force a left join into the final query (navigator traversal might result
> in left joins due to nullable FK fields, but that's not enforceable) using a
> 'join' operator or a navigator traversal which would otherwise result in an
> inner join.
>
> I.o.w.: if you see a DefaultIfEmpty, the sequence it is called on is
> a result of a left join, and you can treat that sequence as such. Dealing
> with it 'on the fly' otherwise is a very cumbersome task as it's 'too late':
> either group joins are already handled in some trees or other clauses refer
> to the defaultifempty sequence and do that by its own alias (and not the
> alias of the side the defaultifempty refers to)
That's the point - if you traverse the re-linq query model left-to-
right (main from clause, body clauses, select clause) and inside-out
(sub-queries first), which re-linq makes very easy for you to do, it's
not too late: as I've illustrated in my blog post, you're in exactly
the right position to choose what to make of the DefaultIfEmpty
operator. Even subquery optimization is easy there - no visitors
involved (see below).
> > In addition, as Stefan said, DefaultIfEmpty can be applied to any
> arbitrary
> > sequence in a query, not only group joins. A LINQ provider striving to
> > support DefaultIfEmpty should try to find a solution that works in all
> > cases, no matter where the DefaultIfEmpty operator is applied.
>
> yes, that's true. The second query I gave is indeed an example of
> that. Though is it really a matter of 'a lot of cases' btw? Aren't there
> just 2: 1) a group join result used with DefaultIfEMpty -> make groupjoin a
> left join and 2) the random sequence on which DefaultIfEmpty() is called,
> which also would result in a left join? (x=> c.Orders.DefaultIfEmpty())
Sure, DefaultIfEmpty always produces "outer" semantics (could become a
left outer join, or an outer apply); however, since the LINQ provider
has to deal with #2 anyways - and re-linq makes it easy to deal with
both cases at once -, there isn't much value in providing an up-front
simplification for only half the cases. (Especially given that the
simplification might be wrong, see above.)
Speaking from a library point of view (see below), of course; the
specific LINQ provider based on re-linq (or not) might make a
different decision.
> The part I'm struggling with is the query you gave above, which to
> me is unspecifyable in SQL, or at least I don't really see the 'intent' of
> what such a construct might be. Could you elaborate on that a bit please?
Well, the intent is of course stupid, but in the context of re-linq's
frontend we can't just rule out any stupid queries - chances are too
high that "sensible" use cases would produce the same structure. Every
re-linq backend can do that, of course, but the frontend can't.
from o in Orders
join oi in OrderItems on o equals oi.Order into orderItems
from oi in orderItems.DefaultIfEmpty()
select new { Order = o, Count = orderItems.Count(), OrderItem = oi }
This produces a very similar structure in the re-linq query model, for
example. (But I don't really want to discuss this query here, it's
just an example of a group join with and without DefaultIfEmpty making
sense.)
> > I've written a blog post about how I'd handle the DefaultIfEmpty query
> > operator in a SQL-translating LINQ provider here:
> >
https://www.re-motion.org/blogs/mix/archive/2010/09/04/handling-the-
> > defaultifempty-result-operator-in-a-re-linq-based-provider.aspx
> > . This describes more or less how we deal with DefaultIfEmpty in re-
> linq's
> > SQL backend, and I think you'll agree that with an approach such as the
> one
> > explained in the blog post, DefaultIfEmpty isn't any longer difficult to
> > implement.
>
> .. except that you have a tremendous amount of work ahead of you:
> optimize the query you're producing :). This is rather difficult to do
> though, Linq to SQL (the best linq provider out there still) uses a lot of
> visitors for this, which can take significant amount of time during query
> production.
In general, you're right. In this specific case, however, optimization
is really easy if you choose the right data structure for your SQL
output. In re-linq's SQL backend, for example, the subquery will
produce a "SqlStatement" instance. I can easily and locally inspect
that instance for complexities (no visitors involved) and flatten it
out if there are no such complexities involved.
> I do like the approach though, it makes things easier on the
> expression tree front, at the expense of optimizing it later (something the
> EF linq provider also does) with the assumption that optimizing queries is
> easier. It's at least more straight forward! ;)
And we both know how important every simplification is when we write a
LINQ provider :)
> What I learned from the demoscene so many years ago though, was that
> if you can do things up front, it will save you a lot of time later on.
Yep, definitely true - I'm not denying this, and we use this approach
ourselves in a few places. I'm just saying it's not necessary for
DefaultIfEmpty because re-linq already gives you enough structural
simplifications to deal with it nicely.
> I.o.w.: if you run into a defaultifempty, you can also think: why is it
> there? My point is then: what if the answer to that question is: "to make a
> join a left join"? This info is then usable to rework the tree a bit to
> reflect that info so the expression tree handling is easier and you don't
> need optimization afterwards. The question of course is: is it indeed only
> there for that purpose or not? (in the context of database targeting
> queries, as not all IQueryable operators make sense in every location of
> such a query when you see them in the context of db targeting queries).
>
> I have no clear answer to that. THe query you gave above as an
> example clearly proves me wrong, so we can look into other ways perhaps to
> deal with this, but on the other hand, the query construct also might not
> make much sense in the context of a db query, which may mitigate the
> argument. IF we can find evidence DefaultIfEmpty is there for making joins
> left joins, things will get much easier to implement IMHO (including group
> join).
I think the important point here is the distinction between re-linq's
frontend and any SQL backends. In the frontend, we have a "library"
context and cannot make any assumptions; the semantics of the query
model _must_ be the same as in the original query. After all, it's
also used by projects like LINQ-to-Excel, which might not have any
problems with the DefaultIfEmpty operator at all. (I don't know that,
it's just an example.)
Any backend, however, will have to make its own decisions about
simplifications. And the good thing is: re-linq doesn't keep you from
implementing them; any backend can change the query model and replace
the GroupJoinClause with a LeftJoinClause any time it wants. We just
can't do it as a default.
Regards,
Fabian
> > On Sep 4, 12:02 pm, "Frans Bouma" <
fr...@sd.nl> wrote:
> > > > > from timeHeader in metaData.NonPresentTimeHeader where
> > > > > timeHeader.Id == 6 join time in metaData.NonPresentTime on
> > > > > timeHeader.Id equals time.HeaderId into timeJoin from joinedTime
> > > > > in
> > > > > timeJoin.DefaultIfEmpty() select joinedTime
>
> > > > > What might be a solution is to add a 'jointype' to the groupjoin
> > > > > class of re-linq and switch it to 'left join' when you handle the
> > > > > p=>DefaultIfEmpty() construct of a re-linq tree.
>
> > > > Hi Frans,
>
> > > > we could transform simple left-join-via-DefaultIfEmpty to left join
> > > clauses
> > > > in re-linq's QueryModel, but that would be a very simplistic
> > > > solution that could not support any but the most straightforward LINQ
> > clauses. (i.e.
> > > only
> > > > those where someone implements left join as they found out via
> > > > Google, but for instance not any scenario where the query further
> > > > references the intermediate join clause (timeJoin in your sample).
>
> > > A simple, stupid, but illustrative example: ('ctxt' is
> > > session.Linq... ) var q = from c in ctxt.Order
> > > join o in ctxt.Order on c.CustomerId equals
> > > o.Customer.CustomerId Into co // A
> > > from x in co.DefaultIfEmpty() //B
> > > select c;
>
> > > Lines A and B form a SelectMany, due to the from in B. At the left
> > > side, you'll have the group join in A and on the right side you have
> > > the right side of the group join in A.
>
> > > This can be transformed into keeping A, and using B to adjust the
> > > group join. The main advantage is that the group join contains the
> > > selectors and in C#'s case also the projection.
>
> > > So if re-linq could transform this into (pseudo code!) var q = from c
> > > in ctxt.Order
> > > (left) join o in ctxt.Order on c.CustomerId equals
> > > o.Customer.CustomerId Into co select c;
>
> > > it would be a big win. Big problem is the code which references the
> > > DefaultIfEmpty result as that has to be changed to references to the
> > > right-side of the join.
>
> > > It would help a lot, because it would make a join handler in a linq
> > > provider much easier, in fact, an existing join handler (which can
> > > handle the normal,
> > > 'join') would likely already work.
>
> > > > We think that DefaultIfEmpty is best solved in each back-end
> > > > individually, but of course nothing stops anyone from extending the
> > > > front-end to create that kind of QueryModel via an optional
> > > > transformation step. Just be aware that this only takes you so far,
> > > > it would probably result in the LINQ provider rejecting every use of
> > > > DefaultIfEmtpy in joins that cannot be reduced to that pattern.
>
> > > You mean:
> > > var q = from c in ctxt.Order
> > > from o in c.Orders.DefaultIfEmpty()
> > > select c;
>
> > > ?
>
> > > FB- Hide quoted text -
>
> > > - Show quoted text-- Hide quoted text -
>
On Sep 5, 11:49 am, "Frans Bouma" <
fr...@sd.nl> wrote:
> > We try to keep the re-linq front-end as general as possible. The fact that
> > DefaultIfEmpty causes a GroupJoin to turn into a left-outer join in your
> > example is true, but it's not as simple generally speaking.
> > Consider the following slightly modified version of your query:
>
> > var q = from c in ctxt.Order
> > join o in ctxt.Order on c.CustomerId equals o.Customer.CustomerId
> > into co // A
> > from x in co.DefaultIfEmpty() //B
> > from y in co //C
> > select c;
>
> > Here, B uses the group join in a left-outer join fashion, but C uses it in
> > an ordinary cross join fashion.
>
> Is that really the case? 'y' is an order, same as 'x', but 'co' is
> the same set as when the DefaultIfEmpty() call is made.
>
> I do understand why you're saying it, from the perspective of
> sequences, it makes sense, however from the perspective of sets and SQL, I
> don't really know how to formulate the query above, because one has to join
> co again with... co ? (which can't be done).
>
> At the moment I follow the same steps, but it's a cumbersome path
> IMHO, where if you look at 'intent', it can be made simpler, IMHO. But I
> understand why adding this to the front end might make situations like the
> one above harder to deal with (although I'm interested in what you'd think
> the query above does, how it looks in SQL :))
>
>
>
> > The simplification you suggested (keeping A
> > and flagging it as a left join) won't work here. And my opinion is that if
> > we can't apply it generally, we shouldn't apply it at all. (As a
> predefined
> > transformation in the re-linq front-end, that is. Any LINQ provider can of
> > course choose to build this simplification if it helps.)
>
> IMHO the sole purpose of 'DefaultIfEmpty' is to use a sequence which
> can contain nulls due to a left join, and in fact, it's the only mechanism
> to force a left join into the final query (navigator traversal might result
> in left joins due to nullable FK fields, but that's not enforceable) using a
> 'join' operator or a navigator traversal which would otherwise result in an
> inner join.
>
> I.o.w.: if you see a DefaultIfEmpty, the sequence it is called on is
> a result of a left join, and you can treat that sequence as such. Dealing
> with it 'on the fly' otherwise is a very cumbersome task as it's 'too late':
> either group joins are already handled in some trees or other clauses refer
> to the defaultifempty sequence and do that by its own alias (and not the
> alias of the side the defaultifempty refers to)
>
>
>
> > In addition, as Stefan said, DefaultIfEmpty can be applied to any
> arbitrary
> > sequence in a query, not only group joins. A LINQ provider striving to
> > support DefaultIfEmpty should try to find a solution that works in all
> > cases, no matter where the DefaultIfEmpty operator is applied.
>
> yes, that's true. The second query I gave is indeed an example of
> that. Though is it really a matter of 'a lot of cases' btw? Aren't there
> just 2: 1) a group join result used with DefaultIfEMpty -> make groupjoin a
> left join and 2) the random sequence on which DefaultIfEmpty() is called,
> which also would result in a left join? (x=> c.Orders.DefaultIfEmpty())
>
> The part I'm struggling with is the query you gave above, which to
> me is unspecifyable in SQL, or at least I don't really see the 'intent' of
> what such a construct might be. Could you elaborate on that a bit please?
>
>
>
> > I've written a blog post about how I'd handle the DefaultIfEmpty query
> > operator in a SQL-translating LINQ provider here:
> >
https://www.re-motion.org/blogs/mix/archive/2010/09/04/handling-the-
> > defaultifempty-result-operator-in-a-re-linq-based-provider.aspx
> > . This describes more or less how we deal with DefaultIfEmpty in re-
> linq's
> > SQL backend, and I think you'll agree that with an approach such as the
> one
> > explained in the blog post, DefaultIfEmpty isn't any longer difficult to
> > implement.
>
> .. except that you have a tremendous amount of work ahead of you:
> optimize the query you're producing :). This is rather difficult to do
> though, Linq to SQL (the best linq provider out there still) uses a lot of
> visitors for this, which can take significant amount of time during query
> production.
>
> I do like the approach though, it makes things easier on the
> expression tree front, at the expense of optimizing it later (something the
> EF linq provider also does) with the assumption that optimizing queries is
> easier. It's at least more straight forward! ;)
>
> What I learned from the demoscene so many years ago though, was that
> if you can do things up front, it will save you a lot of time later on.
> I.o.w.: if you run into a defaultifempty, you can also think: why is it
> there? My point is then: what if the answer to that question is: "to make a
> join a left join"? This info is then usable to rework the tree a bit to
> reflect that info so the expression tree handling is easier and you don't
> need optimization afterwards. The question of course is: is it indeed only
> there for that purpose or not? (in the context of database targeting
> queries, as not all IQueryable operators make sense in every location of
> such a query when you see them in the context of db targeting queries).
>
> I have no clear answer to that. THe query you gave above as an
> example clearly proves me wrong, so we can look into other ways perhaps to
> deal with this, but on the other hand, the query construct also might not
> make much sense in the context of a db query, which may mitigate the
> argument. IF we can find evidence DefaultIfEmpty is there for making joins
> left joins, things will get much easier to implement IMHO (including group
> join).
>
> Disclaimer: I haven't looked at these constructs for quite a while,
> (read: more 1.5 years) so I might overlook an important aspect.
>
> FB
>
>
>
>
>
> > Cheers,
> > Fabian
>
> > On Sep 4, 12:02 pm, "Frans Bouma" <
fr...@sd.nl> wrote:
> > > > > from timeHeader in metaData.NonPresentTimeHeader where
> > > > > timeHeader.Id == 6 join time in metaData.NonPresentTime on
> > > > > timeHeader.Id equals time.HeaderId into timeJoin from joinedTime
> > > > > in
> > > > > timeJoin.DefaultIfEmpty() select joinedTime
>
> > > > > What might be a solution is to add a 'jointype' to the groupjoin
> > > > > class of re-linq and switch it to 'left join' when you handle the
> > > > > p=>DefaultIfEmpty() construct of a re-linq tree.
>
> > > > Hi Frans,
>
> > > > we could transform simple left-join-via-DefaultIfEmpty to left join
> > > clauses
> > > > in re-linq's QueryModel, but that would be a very simplistic
> > > > solution that could not support any but the most straightforward LINQ
> > clauses. (i.e.
> > > only
> > > > those where someone implements left join as they found out via
> > > > Google, but for instance not any scenario where the query further
> > > > references the intermediate join clause (timeJoin in your sample).
>
> > > A simple, stupid, but illustrative example: ('ctxt' is
> > > session.Linq... ) var q = from c in ctxt.Order
> > > join o in ctxt.Order on c.CustomerId equals
> > > o.Customer.CustomerId Into co // A
> > > from x in co.DefaultIfEmpty() //B
> > > select c;
>
> > > Lines A and B form a SelectMany, due to the from in B. At the left
> > > side, you'll have the group join in A and on the right side you have
> > > the right side of the group join in A.
>
> > > This can be transformed into keeping A, and using B to adjust the
> > > group join. The main advantage is that the group join contains the
> > > selectors and in C#'s case also the projection.
>
> > > So if re-linq could transform this into (pseudo code!) var q = from c
> > > in ctxt.Order
> > > (left) join o in ctxt.Order on c.CustomerId equals
> > > o.Customer.CustomerId Into co select c;
>
> > > it would be a big win. Big problem is the code which references the
> > > DefaultIfEmpty result as that has to be changed to references to the
> > > right-side of the join.
>
> > > It would help a lot, because it would make a join handler in a linq
> > > provider much easier, in fact, an existing join handler (which can
> > > handle the normal,
> > > 'join') would likely already work.
>
> > > > We think that DefaultIfEmpty is best solved in each back-end
> > > > individually, but of course nothing stops anyone from extending the
> > > > front-end to create that kind of QueryModel via an optional
> > > > transformation step. Just be aware that this only takes you so far,
> > > > it would probably result in the LINQ provider rejecting every use of
> > > > DefaultIfEmtpy in joins that cannot be reduced to that pattern.
>
> > > You mean:
> > > var q = from c in ctxt.Order
> > > from o in c.Orders.DefaultIfEmpty()
> > > select c;
>
> > > ?
>
> > > FB- Hide quoted text -
>
> > > - Show quoted text-- Hide quoted text -
>
> - Show quoted text -