Linq OrderBy and left joins

76 views
Skip to first unread message

Steve Strong

unread,
Jul 8, 2010, 5:24:57 AM7/8/10
to nhibernate-...@googlegroups.com
I've been pondering this thread over on nh-users http://groups.google.com/group/nhusers/browse_thread/thread/1ff37a36e9f975ce?hl=en#

To save you reading it all, the guy is try to do this:

Session.Query<Project>().OrderBy(p.Type.Label).ToList() 

but due to the reference to p.Type there is a join being inserted, which means that there is some un-intentional filtering going on (he doesn't receive any results where p.Type is null).

This is analogous to hitting a child reference within a select, for example

Session.Query<Project>().Select(p.Type.Label).ToList() 

In this case, the provider (correctly, imo) generates outer joins to ensure that filtering is not taking place.  In my view, filtering of the results should only happen if there is an explicit join or where clause.

So I'm proposing that I process OrderBy clauses in the same way as I handle Select, and introduce outer joins as appropriate - this is technically a breaking change, hence the post here to see if you all agree.

Cheers,

Steve

Richard Brown (gmail)

unread,
Jul 8, 2010, 5:50:34 AM7/8/10
to nhibernate-...@googlegroups.com
Agreed, the join should be outer (i.e., shouldn't filter out null values).
 
One consideration though; presumably the reason we like it not to filter is because it behaves the same way as the in-memory LINQ provider (Queryable<T>).  However wouldn't each of these examples generate a null reference exception?  (Should the LINQ provider do the same?)

Fabio Maulo

unread,
Jul 8, 2010, 7:58:19 AM7/8/10
to nhibernate-...@googlegroups.com
Super agree (have a look to HQL behavior ;) ) 
--
Fabio Maulo

Steve Strong

unread,
Jul 8, 2010, 8:54:01 AM7/8/10
to nhibernate-...@googlegroups.com
Where I've been unsre as to the correct behaviour, I've been using Linq-to-Sql as my guide - Linq-to-objects, although better defined, doesn't really map so well to the database world, things like the null reference exception that would indeed happen with these queries being a good example.

Richard Brown (gmail)

unread,
Jul 8, 2010, 9:02:17 AM7/8/10
to nhibernate-...@googlegroups.com
Understood.  Probably not helpful to throw the exception anyway.  Cheers.

Fabio Maulo

unread,
Jul 8, 2010, 9:29:57 AM7/8/10
to nhibernate-...@googlegroups.com
On Thu, Jul 8, 2010 at 9:54 AM, Steve Strong <srst...@gmail.com> wrote:
I've been using Linq-to-Sql as my guide 

Heretic in the pillory!!!

--
Fabio Maulo

Steve Strong

unread,
Jul 8, 2010, 9:31:38 AM7/8/10
to nhibernate-...@googlegroups.com
LOL!

Diego Mijelshon

unread,
Jul 8, 2010, 10:33:45 AM7/8/10
to nhibernate-...@googlegroups.com
I agree with the outer join behavior, as it's the safest from a data point of view.

In fact, I recently wrote a patch for DynamicQueryable that provides the same behavior when using Linq-to-objects (with some limitations)

Since it's originally copyrighted by Microsoft (and I couldn't find a license in the zip), I'll take it down if they request it.

   Diego

Frans Bouma

unread,
Jul 18, 2010, 6:34:00 AM7/18/10
to nhibernate-development
> I've been pondering this thread over on nh-usershttp://groups.google.com/group/nhusers/browse_thread/thread/1ff37a36e...
>
> To save you reading it all, the guy is try to do this:
>
> Session.Query<Project>().OrderBy(p.Type.Label).ToList()
>
> but due to the reference to p.Type there is a join being inserted, which
> means that there is some un-intentional filtering going on (he doesn't
> receive any results where p.Type is null).
>
> This is analogous to hitting a child reference within a select, for example
>
> Session.Query<Project>().Select(p.Type.Label).ToList()
>
> In this case, the provider (correctly, imo) generates outer joins to ensure
> that filtering is not taking place.  In my view, filtering of the results
> should only happen if there is an explicit join or where clause.

You should do the following:
- FKside LEFT JOIN PKside, when at least 1 of the FK fields is
nullable
- FKside INNER JOIN PKside when no Fk field is nullable.

One could argue to always do an FKSide LEFT JOIN PKside, even if the
fk side has no nullable fk fields, though it might be an RDBMS can
optimize the INNER JOIN better than the LEFT JOIN. Linq to SQL (and
also my linq provider) uses this left join / inner join system.

What I did was in any expression I detect on my own in the expression
tree, a property collects the relationships navigated. So the lambda
p=>p.Type.Label, results in a Field Expression which contains in its
AdditionalRelationships property the relationship navigated by the
navigator 'Type'. The OrderBy (which returns a query) then appends the
additional relationships gathered to its body's relationships with the
proper left/inner statement. Similar in where and other extension
methods which result in a query.

There are a couple of miserable situations you've to deal with. I'll
list two examples below.
1) duplicate relationships with different join characteristics.
var q = from c in metaData.Customer // A
from o in c.Orders // B
....

here, the multiple from clauses form a cross-join between customer and
order. However, the line B contains the _same_ relationship with
either an INNER join (if o.CustomerId is not nullable) or a LEFT join
towards the FK side (order) if o.CustomerID is nullable). You have to
pick the second one over the first one, so actually ignore the
navigator originating relationship and use its join characteristics to
adjust the cross-join

2) where / other derived table creating method inside navigation
clause
var q = from c in metaData.Customer
from o in c.Orders.Where(o=>o.OrderDetails.Count() > 10)
...

Here, the join is different, as the 'Where' clause creates a derived
table / subquery on Orders, however it's correlated with the
customer.
The trick is to create a visitor which removes these where clauses
(and order by etc.) from join sides. This gives a lot of headaches in
the alias/scope department but it also makes it easier to deal with
complex joins.

> So I'm proposing that I process OrderBy clauses in the same way as I handle
> Select, and introduce outer joins as appropriate - this is technically a
> breaking change, hence the post here to see if you all agree.

There's a problem with relationships navigated in the projection: it's
sometimes cumbersome to determine whether a scalar query is meant or a
member navigation. (like messy queries with select(x=>new
{x.Foo.Bar.First().SomeNavigator.FieldName, ....}). This example for
example requires that from 'x' everything is a scalar, so
Foo.Bar.First() all have to be in the scalar query, which is embedded
in the projection. However you find that out when you run into
'First()', likely completely elsewhere than where you evaluate the
memberexpressions for Foo and Bar. So it's not always that simple ;)

Another thing which will pop up in the above example is that if you
assign an alias to Foo and Bar, and the developer placed a second time
the whole x.Foo.Bar line in the projection (this happens, trust
me ;)), you need to assign different aliases. The problem is however
that you don't always want to do that:
.Select(x=>new { x.Customer.CompanyName, x.Customer.CustomerId, ...}),
here you want to have 1 join with Customer and assign it 1 alias.

FB

Richard Birkby

unread,
Jul 18, 2010, 6:48:08 AM7/18/10
to nhibernate-...@googlegroups.com
On 18 Jul 2010, at 11:34, Frans Bouma <fr...@sd.nl> wrote:
>
> One could argue to always do an FKSide LEFT JOIN PKside, even if the
> fk side has no nullable fk fields, though it might be an RDBMS can
> optimize the INNER JOIN better than the LEFT JOIN.

Yes - SQL compact will not rearrange tables in a queryplan when it has
an outer join. Ironically, i've used this fact to force a more optimal
queryplan by using an outer join in HQL.

Richard

Fabio Maulo

unread,
Jul 18, 2010, 8:09:46 AM7/18/10
to nhibernate-...@googlegroups.com
The main difference between LINQ and HQL is that HQL was designed to
be translated to SQL.
We have to design our systems to use the best query-tool for a specific task.
I'm working with NH3 in production with more than one project using
HQL, QueryOver, LINQ, H-SQL, SQL and even Criteria (variable no
strongly typed queries).
We don't have to limit our possibilities (EQO is with me ;) )

--
Fabio Maulo

Frans Bouma

unread,
Jul 18, 2010, 8:20:59 AM7/18/10
to nhibernate-...@googlegroups.com
> The main difference between LINQ and HQL is that HQL was designed to be
> translated to SQL.
> We have to design our systems to use the best query-tool for a specific
> task.
> I'm working with NH3 in production with more than one project using HQL,
> QueryOver, LINQ, H-SQL, SQL and even Criteria (variable no strongly typed
> queries).
> We don't have to limit our possibilities (EQO is with me ;) )

It's also not wise to limit a query api to just Linq, as you can't
express all queries in linq, for example a simple join with an ON clause
using an expression other than an equality operator is not possible: you've
to revert to cross joins with a where clause, which not always can be
optimized properly by the RDBMS compared to an ansi join. This is IMHO als
one of the more cumbersome aspects of MS' O/R mappers and one which is often
ignored by bookwriters/speakers, I wonder why ;)

FB

Fabio Maulo

unread,
Jul 18, 2010, 8:39:00 AM7/18/10
to nhibernate-...@googlegroups.com
About this matter we can talk a lot...
Do you remember that times when to write hardcoded SQL everywhere was a bad choice ?

Well... now I'm seeing LINQ-to-persistence wrote everywhere in the application and people continue saying me that is not the same.

P.S. I have some white hair, a wife, a daughter and a dog and I know something about the immortality of the medusa.
--
Fabio Maulo

Kakone

unread,
Sep 9, 2010, 5:40:18 AM9/9/10
to nhibernate-development
Hello,

Do you think this behavior will be changed in NH3 RTM ? Actually, the
provider always generates an inner join (I think also it's a good idea
to generate an outer Join).

Cordially.
Reply all
Reply to author
Forward
0 new messages