[re-linq] How to recognize if a method is a query operator?

119 views

Skip to first unread message

Fabian Schmied

unread,

Dec 20, 2011, 4:48:08 AM12/20/11

to re-motion Developers

On the users’ mailing list
(http://groups.google.com/group/re-motion-users/browse_thread/thread/ca18acd6e43945a6),
Alex Norcliffe, Lead Architect of Umbraco 5, describes a problem they
are currently facing with re-linq. As an illustration, consider the
following two queries, especially the sub-queries within the where
clauses:

var query1 = from c in QuerySource
where c.Assistants.Any ()
select c;

var query2 = from c in QuerySource
where c.GetAssistants().Any()
select c;

When re-linq analyzes those sub-queries within the where clauses, the
first query will produce a SubQueryExpression with a QueryModel whose
MainFromClause has the following expression: [c].Assistants. In other
words, the items produced by the sub-query are those identified by the
Assistants property.

The second query, however, will produce an exception:

Remotion.Linq.Parsing.ParserException : Cannot parse expression ‘c’ as
it has an unsupported type. Only query sources (that is, expressions
that implement IEnumerable) and query operators can be parsed.
—-> Remotion.Linq.Utilities.ArgumentTypeException : Expected a type
implementing IEnumerable<T>, but found
‘Remotion.Linq.UnitTests.Linq.Core.TestDomain.Cook’.

Why’s that?

re-linq assumes that all methods occurring in a query operator call
chain should be treated like query operators (Where, Select, etc.).
This means that for the sub-query within query2, re-linq regards c as
the start of the query operator chain. And, since c’s type does not
implement IEnumerable<T>, it throws the exception shown above. Even if
c’s type implemented IEnumerable<T>, an exception would be thrown that
GetAssistants() “is currently not supported”, unless one registers a
custom node parser for that method.

Of course, what Alex actually wanted was re-linq treating both query1
and query2 in an equivalent way. I.e., a SubQueryExpression with a
QueryModel whose MainFromClause has the following expression:
[c].GetAssistants().

There is an easy workaround for now (see the mailing list), but I’m
wondering how we could change re-linq to produce this result out of
the box. I can think of two possibilities, both of which have certain
drawbacks:

1 – Have re-linq treat MethodCallExpressions the same way as
MemberExpressions. I.e, if the method has a registered node parser,
treat it as a query operator. Otherwise, treat it (and all expression
parts of the call chain before it) as the start of the query.

This would work quite well in the scenario shown above, and it would
be nicely symmetric to how MemberExpressions work in re-linq.

However, it would become a very breaking change regarding diagnostics.
Consider this example, in which CustomOperator is actually a custom
query operator:

source.Where(…).CustomOperator().Select(…)

Currently, re-linq will throw an exception that it can’t parse
CustomOperator() if one forgets to register the respective parser, and
the LINQ provider backend won’t even get a QueryModel to process.

If we change this behavior, the frontend will no longer throw an
exception, and the backend will suddenly get a MainFromClause with a
FromExpression of "[source].Where(…).CustomOperator()". I think it
would be difficult to understand for LINQ provider implementers why
exactly this occurs. I can even imagine people believing this must be
"right" (as no exception occurred) and start manually parsing the
Where(…) and CustomOperator() calls, effectively reimplementing logic
from re-linq…

2 – Have re-linq only treat MethodCallExpressions called on
enumerables as query operators. Otherwise, treat them (and all
expression parts of the call chain before the method call) as the
start of the query.

This would also work in the given scenario, and it has the advantage
of still providing good diagnostics when methods taking IEnumerable<T>
have no associated expression node parser. However, it’s still a
heuristic way of parsing, and it is asymmetric (both with
MemberExpressions and in itself). Consider the following three
examples:

instanceImplementingEnumerable.StartQueryHere().Where(…)
instanceNotImplementingEnumerable.StartQueryHere().Where(…)
instanceImplementingEnumerable.StartQueryHere.Where(…)

re-linq would parse the StartQueryHere method in the first example as
a query operator (and throw an exception if there isn’t an expression
node parser registered for it). The StartQueryHere method and property
in the second and third example, on the other hand, would parse just
fine. I believe this is difficult to understand just as well.

What do other people think of these two options? If you want to see
this scenario to be supported out of the box, please give me some
feedback about it.

Fabian

Fabian Schmied

unread,

Dec 20, 2011, 4:49:57 AM12/20/11

to re-motion Developers

Here's a blog post with the same text, but more nicely formatted:
"https://www.re-motion.org/blogs/mix/2011/12/20/re-linq-how-to-recognize-if-a-method-is-a-query-operator/".

Wenig, Stefan

unread,

Dec 20, 2011, 11:07:30 AM12/20/11

to re-mot...@googlegroups.com

Forgive my oversimplified view of things (again), but how about this:
- make this decision via a strategy or some such
- default strategy = strict (as currently implemented)
- optionally, provide your options 1 and 2 as stock strategies that the user can just pick
- modify the exception message to include a hint about likely causes and their respective remedies (missing custom node parser: register one; want to get the literal expression: use another strategy)

No surprises (assuming strategies are not picked randomly), easy to diagnose, easy to fix. The user needs to take some time to understand the concepts behind it, but LINQ providers are not for the faint of heart anyway.

Stefan

> --> Remotion.Linq.Utilities.ArgumentTypeException : Expected a type

> implementing IEnumerable<T>, but found
> 'Remotion.Linq.UnitTests.Linq.Core.TestDomain.Cook'.
>
> Why's that?
>
> re-linq assumes that all methods occurring in a query operator call
> chain should be treated like query operators (Where, Select, etc.).
> This means that for the sub-query within query2, re-linq regards c as
> the start of the query operator chain. And, since c's type does not
> implement IEnumerable<T>, it throws the exception shown above. Even if
> c's type implemented IEnumerable<T>, an exception would be thrown that
> GetAssistants() "is currently not supported", unless one registers a
> custom node parser for that method.
>
> Of course, what Alex actually wanted was re-linq treating both query1
> and query2 in an equivalent way. I.e., a SubQueryExpression with a
> QueryModel whose MainFromClause has the following expression:
> [c].GetAssistants().
>
> There is an easy workaround for now (see the mailing list), but I'm
> wondering how we could change re-linq to produce this result out of the
> box. I can think of two possibilities, both of which have certain
> drawbacks:
>

> 1 - Have re-linq treat MethodCallExpressions the same way as

> MemberExpressions. I.e, if the method has a registered node parser,
> treat it as a query operator. Otherwise, treat it (and all expression
> parts of the call chain before it) as the start of the query.
>
> This would work quite well in the scenario shown above, and it would be
> nicely symmetric to how MemberExpressions work in re-linq.
>
> However, it would become a very breaking change regarding diagnostics.
> Consider this example, in which CustomOperator is actually a custom
> query operator:
>

> source.Where(...).CustomOperator().Select(...)

>
> Currently, re-linq will throw an exception that it can't parse
> CustomOperator() if one forgets to register the respective parser, and
> the LINQ provider backend won't even get a QueryModel to process.
>
> If we change this behavior, the frontend will no longer throw an
> exception, and the backend will suddenly get a MainFromClause with a

> FromExpression of "[source].Where(...).CustomOperator()". I think it

> would be difficult to understand for LINQ provider implementers why
> exactly this occurs. I can even imagine people believing this must be
> "right" (as no exception occurred) and start manually parsing the

> Where(...) and CustomOperator() calls, effectively reimplementing logic
> from re-linq...
>
> 2 - Have re-linq only treat MethodCallExpressions called on enumerables

> as query operators. Otherwise, treat them (and all expression parts of
> the call chain before the method call) as the start of the query.
>
> This would also work in the given scenario, and it has the advantage of
> still providing good diagnostics when methods taking IEnumerable<T>
> have no associated expression node parser. However, it's still a
> heuristic way of parsing, and it is asymmetric (both with
> MemberExpressions and in itself). Consider the following three
> examples:
>

> instanceImplementingEnumerable.StartQueryHere().Where(...)
> instanceNotImplementingEnumerable.StartQueryHere().Where(...)
> instanceImplementingEnumerable.StartQueryHere.Where(...)

>
> re-linq would parse the StartQueryHere method in the first example as a
> query operator (and throw an exception if there isn't an expression
> node parser registered for it). The StartQueryHere method and property
> in the second and third example, on the other hand, would parse just
> fine. I believe this is difficult to understand just as well.
>
> What do other people think of these two options? If you want to see
> this scenario to be supported out of the box, please give me some
> feedback about it.
>
> Fabian
>

> --
> You received this message because you are subscribed to the Google
> Groups "re-motion Developers" group.
> To post to this group, send email to re-mot...@googlegroups.com.
> To unsubscribe from this group, send email to re-motion-
> dev+uns...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/re-motion-dev?hl=en.

Fabian Schmied

unread,

Dec 20, 2011, 1:45:58 PM12/20/11

to re-mot...@googlegroups.com

> Forgive my oversimplified view of things (again), but how about this:
> - make this decision via a strategy or some such
> - default strategy = strict (as currently implemented)
> - optionally, provide your options 1 and 2 as stock strategies that the user can just pick

Well, for me, this is not really about providing options, but about
making the best out-of-the-box experience. (There are options to get
the each of the behaviors today, although they could probably be made
easier by making a few methods virtual...)

The current out-of-the-box experience is confusing, as Alex Norcliffe
has experienced, mostly because MethodCallExpressions are dealt with
differently than MemberExpressions are. Why does his query work with a
property, but not with an extension method? He's not the first person
to have encountered this; I think the NHibernate people had the same
problem at some time. The question is whether one of the other options
(1 or 2) would be "better" defaults, and whether there is a third
alternative that I'm missing. My option 1 would make the
out-of-the-box handling of MethodCallExpressions exactly the same as
that of MemberExpressions, which is a good thing, I think. But is it
worth the loss of diagnostics about unrecognized (potential) query
operators? Is that diagnostics even important? After all, the LINQ
provider's backend would encounter the unknown method anyway, and
could throw an exception at that point.

The problem is that both variants are not optimal, but which one is
the better default one? The consistent variant where the LINQ provider
implementer gets surprised about unhandled query operators in the
MainFromClause, or the strict one where the implementer gets surprised
about properties (and other expressions) being dealt with more
gracefully than method calls?

> - modify the exception message to include a hint about likely causes
> and their respective remedies (missing custom node parser: register
> one; want to get the literal expression: use another strategy)
> No surprises (assuming strategies are not picked randomly), easy to
> diagnose, easy to fix. The user needs to take some time to
> understand the concepts behind it, but LINQ providers are not for the
> faint of heart anyway.

That's of course an interesting point; if the default stays as it is
(i.e, exception for unknown methods), the exception message could give
a hint about how to fix the situation. In fact, it once contained such
a hint ("the method overload is currently supported, but you can
register an expression node parser for it"). The problem is that this
is an exception message thrown at the user of the LINQ provider, not
necessarily at the implementer. When a LINQ provider doesn't handle a
certain query operator and a user writes a query with that operator in
it, the user will get that "is currently not supported" message.
Telling him that he can register an expression node parser (or do
something else that really needs to be done by the LINQ provider
implementer) would be very confusing. That's why I removed that part
of the exception message.

Regards,
Fabian

> To unsubscribe from this group, send email to re-motion-de...@googlegroups.com.

Wenig, Stefan

unread,

Dec 21, 2011, 8:49:25 AM12/21/11

to re-mot...@googlegroups.com

> > Forgive my oversimplified view of things (again), but how about this:
> > - make this decision via a strategy or some such
> > - default strategy = strict (as currently implemented)
> > - optionally, provide your options 1 and 2 as stock strategies that
> > the user can just pick
>
> Well, for me, this is not really about providing options, but about
> making the best out-of-the-box experience. (There are options to get
> the each of the behaviors today, although they could probably be made
> easier by making a few methods virtual...)

That's why I proposed an unambiguous default, good messages plus alternative strategies for provider authors to pick once they understand the situation.

> The question is whether one of the other options
> (1 or 2) would be "better" defaults, and whether there is a third
> alternative that I'm missing.

I got that. My answer is no, as long as we make it easy to diagnose, understand and fix.

> The problem is that this
> is an exception message thrown at the user of the LINQ provider, not
> necessarily at the implementer. When a LINQ provider doesn't handle a
> certain query operator and a user writes a query with that operator in
> it, the user will get that "is currently not supported" message.
> Telling him that he can register an expression node parser (or do
> something else that really needs to be done by the LINQ provider
> implementer) would be very confusing. That's why I removed that part of
> the exception message.

Good point. How about providing simple messages dedicated at provider users and including inner exceptions dedicated at provider authors?

We might consider error codes too, they can easily be used to build a knowledge base (Wiki) and are Google-friendly.