nested from clauses

Gordon Watts

Dec 1, 2010, 1:40:52 AM
to re-motion Users
Hi,
I'm new to re-linq. I've managed to get a very simple query provider
running that can do things like

int result = ntuple.Count();

and even define my own custom operator

var histogram = ntuple.ProjectTo1DHisto(x => x.run, 100, 10.0, 1000.0);

So, this "ntuple", which is my query provider, is similar to a
database, but is really closer to a sequential or streaming database.
And each entry can contain sub-arrays. So, for example, it makes sense
to write:

var histogram = ntuple.SelectMany(e => e.inputJets)
    .ProjectTo1DHisto(x => x.Pt(), 100, 0.0, 100000.0);

But I don't understand how to translate this! As I'm sure anyone who
has dealt with this knows, it is passed into re-linq with
"ntuple.SelectMany()" as the "fromClause" of a query, and the
ProjectTo1DHisto (or Count()) as the result operator. What I can't
figure out is how to implement the parsing of the from operator...

My guess is that I need to put some code in the VisitMainFromClause
method of my query visitor object. But I'm not sure how to get at that
expression. The fromClause.FromExpression is coming in as a
ConstantExpression. I then have to look at the type, see that it is my
queryable with a different template argument, somehow type-cast it
(not sure how to do that when I don't know the type ahead of time),
etc. Judging by Fabian's blog post
(https://www.re-motion.org/blogs/mix/archive/2009/07/10/flattening-sub-queries-in-from-clauses.aspx),
this is considered "easy". :-) The CodeProject example seems
to expect the from clause to be very simple (not "nested"), though
it seems to allow for multiple ones (??).

Many thanks in advance - pointers to sample code are also helpful if
this is fairly complex!

Fabian Schmied

Dec 1, 2010, 2:41:56 AM
to re-moti...@googlegroups.com
Hi Gordon,

> So, this "ntuple", which is my query provider, is similar to a
> database, but is really closer to a sequential or streaming database.
> And each entry can contain sub-arrays. So, for example, it makes sense
> to write:
>
> var histogram = ntuple.SelectMany(e => e.inputJets).ProjectTo1DHisto(x
> => x.Pt(), 100, 0.0, 100000.0);
>
> But, I don't understand how to translate this! As I'm sure anyone
> knows that has dealt with this, this is passed into relinq with
> "ntuple.SelectMany()" as the "fromClause" of a query, and the
> ProjectTo1DHisto (or Count()) as the result operator. What I can't
> figure out how to do is implement the parsing of the from operator...

Your query will have a QueryModel with the following elements:
- A MainFromClause representing the "ntuple" query source. Its
FromExpression is a ConstantExpression that contains the object
identified by "ntuple".
- An AdditionalFromClause (in the BodyClauses collection) representing
the SelectMany. Its FromExpression is a MemberExpression whose Member
is "inputJets" and whose inner Expression is a reference to the
MainFromClause (QuerySourceReferenceExpression). This means that the
second from expression refers to the member "inputJets" of the items
yielded by the MainFromClause.
- A SelectClause whose Selector is a reference to the
AdditionalFromClause (QuerySourceReferenceExpression). This means that
the query selects the items yielded by the AdditionalFromClause.
- A custom result operator in the ResultOperators collection. If your
expression node parser resolves the LambdaExpression (as in the sample
on my blog), the result operator will have a MethodCallExpression
whose Object is a reference to the AdditionalFromClause (again, a
QuerySourceReferenceExpression).

Note how this structure reflects the (mostly) equivalent C# query:

(from e in ntuple
 from x in e.inputJets
 select x)
.ProjectTo1DHisto(x => x.Pt(), 100, 0.0, 100000.0);
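
The mapping from two from clauses to a SelectMany call can be checked without re-linq. The following is a minimal sketch, assuming hypothetical SketchEvent and SketchJet types standing in for BasicNtupleModel and BTagJet, and using the BCL's in-memory AsQueryable() rather than a re-linq provider. It only inspects the expression tree the C# compiler produces:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-ins for BTagJet and BasicNtupleModel.
public class SketchJet { public double Pt; }
public class SketchEvent
{
    public int Run;
    public IEnumerable<SketchJet> InputJets = new List<SketchJet>();
}

public static class NestedFromDemo
{
    // Returns the outermost method call of a query written with two from clauses.
    public static MethodCallExpression OutermostCall()
    {
        IQueryable<SketchEvent> ntuple = new List<SketchEvent>().AsQueryable();

        var jets = from e in ntuple
                   from x in e.InputJets
                   select x;

        return (MethodCallExpression)jets.Expression;
    }

    // Formats the method name and argument count of that call.
    public static string Describe()
    {
        MethodCallExpression call = OutermostCall();
        // Arguments: the source, the collection selector, and the result selector.
        return call.Method.Name + "/" + call.Arguments.Count;
    }
}
```

The compiler emits the SelectMany overload that takes a result selector, which is why the parser sees three arguments rather than two.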

> My guess is I need to put some code in my VisitMainFromClause method
> in my query visitor object. But getting at that expression... I'm not
> sure how. The fromClause.FromExpression is coming in as a
> ConstantExpression. I then have to look at the type, see that it is my
> queriable with a different template argument, somehow type-cast it
> (not sure how to do that when I don't know the type ahead of time),
> etc.

The MainFromClause only refers to the first query source, which, in
your case, is "ntuple". Therefore, you only get a ConstantExpression
(because "ntuple" is a constant value from LINQ's point of view). It's
often enough to know the type of the items represented by the first
query source (via MainFromClause.ItemType), but if it's not (e.g.,
because "ntuple" has some state that needs to be interpreted), then
you usually define a non-generic interface or base class that your
queryable implements, and which you can cast the ConstantExpression's
Value to.

To analyze the SelectMany, you need to visit the AdditionalFromClause.
Override the VisitAdditionalFromClause method in your visitor, and
you'll see the MemberExpression described above.

I hope this explains what you need, feel free to ask again if it doesn't :)

Regards,
Fabian

Gordon Watts

Dec 1, 2010, 2:59:15 AM
to re-motion Users
Hi,
Interesting. It sounds like I don't have my query provider
configured correctly, as what you are describing is not what happens.

In my query visitor I've basically overridden every single
VisitXxx method and set a breakpoint in each one (they all just
call their base class implementation except for the ones I've
implemented). When parsing the lines:

var ntuple = new LINQToTTree.QueriableTTree<BasicNtupleModel>(fi, "btag");
var h = ntuple.SelectMany(e => e.inputJets)
    .ProjectToHistogram(j => j.Pt(), 100, 0.0, 10000);

I see the following order of calls:
1) QueriableTTree<BasicNtupleModel>::QueriableTTree(FileInfo fileinfo,
string treename)
2) QueriableTTree<BasicNtupleModel>::QueriableTTree(IQueryProvider p,
Expression expression). Here provider is the DefaultQueryProvider and
expression is "expression =
{value(SimpleLINQToTTreeFeasability.LINQToTTree.QueriableTTree`1[SimpleLINQToTTreeFeasability.BasicNtupleModel]).SelectMany(evt => evt.inputJets, (evt, j) => j)}"
3) QueriableTTree<BasicNtupleModel>::QueriableTTree(IQueryProvider p,
Expression expression). Here provider is the DefaultQueryProvider and
expression is "expression =
{value(SimpleLINQToTTreeFeasability.LINQToTTree.QueriableTTree`1[SimpleLINQToTTreeFeasability.BasicNtupleModel]).SelectMany(e => e.inputJets)}"
4) QueryVisitor::VisitMainFromClause(MainFromClause fromClause,
QueryModel qm), with "fromClause = {from BTagJet j in
value(SimpleLINQToTTreeFeasability.LINQToTTree.QueriableTTree`1[SimpleLINQToTTreeFeasability.BTagJet])}"...
so it is not just the ntuple, but the loop...

If in #4 I look at the QueryModel, the MainFromClause is that mess
that is there in #4.

So... this could have to do with how I've declared the objects for
LINQ to parse... So, here is the main object:
class BasicNtupleModel
{
    public int run;
    public int event_;

    public IEnumerable<BTagJet> inputJets;
}

Perhaps I should not be using IEnumerable? The underlying
implementation is just a sequence of BTagJet objects (and I know its
length, etc.), so the IEnumerable choice above is totally
arbitrary...

Thanks! - Gordon.

Fabian Schmied

Dec 1, 2010, 4:25:10 AM
to re-moti...@googlegroups.com
Hi,

You're right, the overload of SelectMany you are using is not in the
default parse list of the SelectManyExpressionNode. I'll check the
easiest way to get what you want and will get back to you. Until then,
you can use the other overload that also takes a result selector
(nTuple.SelectMany (e => e.inputJets, (e, x) => x).ProjectTo1DHisto
(...)). This should parse as I explained.

Regards,
Fabian

Gordon Watts

Dec 1, 2010, 4:49:54 AM
to re-moti...@googlegroups.com
Hmmm.... So two things. First, when I do the following:

var h = (ntuple.SelectMany(e => e.inputJets, (e, x) => x)
    .ProjectToHistogram(j => j.Pt(), 100, 0.0, 10000));

The parse occurs exactly the same way. So the problem isn't solved by this.

Also, I stumbled on this because I tried to write, originally, the following
code:

var jets = from evt in ntuple
           from j in evt.inputJets
           select j;
var h = jets.ProjectToHistogram(j => j.Pt(), 100, 0.0, 200 * 1000);

So I would have thought that this was such a common pattern that re-linq
handled it without modification (indeed, you mention it several times in
your blog postings). I've done nothing to try to implement a SelectMany
clause (nothing overridden, etc.). I was expecting an exception to be
thrown on a method call in my ExpressionVisitor, but it has, of
course, never gotten that far.

I suspect this is my bug, not yours!

Cheers,
Gordon.

Fabian Schmied

Dec 1, 2010, 10:41:53 AM
to re-moti...@googlegroups.com
> Hmmm.... So two things. First, when I do the following:
>
> var h = (ntuple.SelectMany(e => e.inputJets, (e,x) =>
> x).ProjectToHistogram(j => j.Pt(), 100, 0.0, 10000));
>
> The parse occurs exactly the same way. So the problem isn't solved by this.

Okay, this is strange, the expression above should parse without any
problems with re-linq's default setup.

First, make sure that the SelectMany method used here is the method
SelectMany<TSource, TCollection, TResult> (IEnumerable<TSource>,
Func<TSource, IEnumerable<TCollection>>, Func<TSource, TCollection,
TResult>) defined by either the System.Linq.Queryable or the
System.Linq.Enumerable class. (Check this via Visual Studio.)

Second, I remember that you need to register your custom handler with
the MethodCallExpressionNodeTypeRegistry in your LINQ provider. Do
you, by any chance, create that MethodCallExpressionNodeTypeRegistry
yourself? If so, you need to use
MethodCallExpressionNodeTypeRegistry.CreateDefault() (not "new
MethodCallExpressionNodeTypeRegistry"), otherwise, re-linq won't know
how to parse even the simplest queries.

Third, if this is not the case here, can you inspect the following
expression inside a debugger: ((DefaultQueryProvider)
nTuple.Provider).ExpressionTreeParser.NodeTypeRegistry? There should
be about 230 methods registered. One of them (it's at index 193 on my
system) should be the MethodInfo for the SelectMany described above.

> Also, I stumbled on this because I tried to write, originally, the following
> code:
>
> var jets = from evt in ntuple
>          from j in evt.inputJets
>          select j;
> var h = jets.ProjectToHistogram(j => j.Pt(), 100, 0.0, 200 * 1000);
>
>  So I would have thought that this was such a common pattern that re-linq
> handled it with out modification (indeed, you mention it several times in
> your blog postings).

Yes, this should definitely work. Something in your re-linq setup is not right.

If none of the points I've given above leads to the error, I'd ask you
to create a small, compilable sample of the problem and send it to me;
I'll take a look at it.

Regards,
Fabian

Gordon Watts

Dec 1, 2010, 11:49:38 AM
to re-motion Users
Ah ha! Ok - good point. I switched back to using "Count" but changing
nothing else:

var h = (ntuple.SelectMany(e => e.inputJets, (e, x) => x).Count());

and now, when it calls VisitMainFromClause, it passes in the main
from clause, as it sounds like we are expecting. So this boils down to
how I've declared my special operator, it would seem! I've included
the definition below. If nothing jumps out then I'm happy to create a
small project and try to isolate the problem (especially since my
current one is now full of debugging Trace statements, etc.).

public static class ProjectToHistogramOperatorsLINQ
{
    public static NTH1 ProjectToHistogram<T>(
        this IQueryable<T> source,
        Expression<Func<T, double>> histogrammedParameter,
        int nBins, double xmin, double xmax
        )
    {
        return source.Provider.Execute<NTH1>(Expression.Call(
            ((MethodInfo)MethodBase.GetCurrentMethod()).MakeGenericMethod(typeof(T)),
            Expression.Constant(source),
            Expression.Quote(histogrammedParameter),
            Expression.Constant(nBins),
            Expression.Constant(xmin),
            Expression.Constant(xmax)
            ));
    }
}

Cheers,
Gordon.

Gordon Watts

Dec 1, 2010, 12:46:39 PM
to re-motion Users
Oh, and the parser, which I should have included. I did my best to
pattern this off the implementation in your blog and of the Count
operator in the re-linq source. Thanks again for your help!

class ProjectToHistogramExpressionNode : ResultOperatorExpressionNodeBase
{
    public static readonly MethodInfo[] SupportedMethods = new[]
    {
        typeof(LINQToTTree.ResultOperators.ProjectToHistogramOperatorsLINQ).GetMethod("ProjectToHistogram")
    };

    public ProjectToHistogramExpressionNode(
        MethodCallExpressionParseInfo parseInfo,
        LambdaExpression toHistogram, ConstantExpression bins,
        ConstantExpression xmin, ConstantExpression xmax)
        : base(parseInfo, null, null)
    {
        ParameterToHistogram = toHistogram;
        NBins = bins;
        XMin = xmin;
        XMax = xmax;
    }

    protected override ResultOperatorBase CreateResultOperator(
        ClauseGenerationContext clauseGenerationContext)
    {
        var resolvedToHistogram = Source.Resolve(
            ParameterToHistogram.Parameters[0],
            ParameterToHistogram.Body, clauseGenerationContext);
        return new ProjectToHistogramResultOperator(
            resolvedToHistogram, NBins, XMin, XMax);
    }

    /// <summary>
    /// We don't have a stream of data coming out, so we should
    /// never be called - throw hard if we are!
    /// </summary>
    public override Expression Resolve(
        ParameterExpression inputParameter,
        Expression expressionToBeResolved,
        ClauseGenerationContext clauseGenerationContext)
    {
        throw new NotImplementedException();
    }

    public LambdaExpression ParameterToHistogram { get; set; }

    public ConstantExpression NBins { get; set; }
    public ConstantExpression XMin { get; set; }
    public ConstantExpression XMax { get; set; }
}


Fabian Schmied

Dec 2, 2010, 6:07:01 AM
to re-moti...@googlegroups.com
Hi,

The problem is in the extension method declaration:

>    public static class ProjectToHistogramOperatorsLINQ
>    {
>        public static NTH1 ProjectToHistogram<T>(
>            this IQueryable<T> source,
>            Expression<Func<T, double>> histogrammedParameter,
>            int nBins, double xmin, double xmax
>            )
>        {
>            return source.Provider.Execute<NTH1>(Expression.Call(
>
> ((MethodInfo)MethodBase.GetCurrentMethod()).MakeGenericMethod(typeof(T)),
>                Expression.Constant(source),
>                Expression.Quote(histogrammedParameter),
>                Expression.Constant(nBins),
>                Expression.Constant(xmin),
>                Expression.Constant(xmax)
>                ));
>        }
>    }

You specify "Expression.Constant (source)". That way, you turn the
first part of the query into a ConstantExpression (which is what you
see in your "broken" MainFromClause).
Specify "source.Expression" instead to build a tree that includes the
first part's expression tree in its original form.

(And, of course, it's my fault: my blog post does this incorrectly.
Sorry for the inconvenience, I've updated the post.)
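
The difference between the two forms can be seen with the BCL alone. This is a minimal sketch, assuming a hypothetical operator named Tally that is used only as a MethodInfo and never executed; it is not the ProjectToHistogram operator from this thread:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class SourceArgumentDemo
{
    // Hypothetical custom operator; only its MethodInfo is used, it is never called.
    public static int Tally<T>(this IQueryable<T> source)
    {
        throw new NotSupportedException();
    }

    private static MethodInfo TallyMethod<T>()
    {
        return typeof(SourceArgumentDemo).GetMethod("Tally").MakeGenericMethod(typeof(T));
    }

    // Wraps the queryable object itself: earlier operators vanish from the tree.
    public static Expression BuildWithConstant<T>(IQueryable<T> source)
    {
        return Expression.Call(TallyMethod<T>(), Expression.Constant(source));
    }

    // Reuses the queryable's own tree: earlier operators stay visible to a parser.
    public static Expression BuildWithSourceExpression<T>(IQueryable<T> source)
    {
        return Expression.Call(TallyMethod<T>(), source.Expression);
    }

    // Reports what kind of node ended up as the first argument of the call.
    public static ExpressionType FirstArgumentKind(Expression call)
    {
        return ((MethodCallExpression)call).Arguments[0].NodeType;
    }
}
```

For a source such as new[] { 1, 2, 3 }.AsQueryable().Where(x => x > 1), the first variant yields a Constant node as the first argument and the second yields a Call node (the Where), which is the part a query parser needs to see.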

Regards,
Fabian

Gordon Watts

Dec 3, 2010, 1:34:07 AM
to re-motion Users
Fantastic! Thanks! That fixed my problem right away. I had copied that
bit from your blog post and I'd not cross-checked the form of the
Count operator!

BTW, on your blog post, I'd get rid of the inline "bug fix" as it
really gets in the way of reading the code. If you do want to call
attention to the fact that you did the update, why not just stick an
"Update" at the end and say what changed...

Finally, I'm slowly starting to understand some bits.
ConstantExpression is good for anything that is outside the query, such as
a parameter (like "100") that is passed in. Further, it is fine if
that constant is going to be a variable: that will be correctly
resolved and turned into a number. Expression, on the other hand, can
point back to another item in the query, something that needs further
resolution after the LINQ query is in progress. And, I guess, it
always needs to be translated, right?

re-linq, btw, has allowed me to get started on this much faster than I
thought possible. Thanks to everyone who contributed to it. I'll get
back to my LINQ->C++ translation now. :-)

Cheers, Gordon.

Fabian Schmied

Dec 3, 2010, 7:08:45 AM
to re-moti...@googlegroups.com
> Finally, I'm slowly starting to understand some bits.
> ConstantExpression is good for anything that is outside the query, such as
> a parameter (like "100") that is passed in. Further, it is fine if
> that constant is going to be a variable: that will be correctly
> resolved and turned into a number.

Exactly.
In addition, re-linq will also detect precalculable expressions and
stick them into ConstantExpressions. For example, if you write
"int.Parse ("123")" into an expression, this will be replaced with
a ConstantExpression holding the integer 123.
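
The idea of precalculating an expression can be sketched with the BCL alone. This is not re-linq's evaluator, only an illustration under the assumption that the expression passed in references no query parameters; the names ConstantFolding and Fold are hypothetical:

```csharp
using System;
using System.Linq.Expressions;

public static class ConstantFolding
{
    // Evaluates an expression that references no parameters and wraps the result.
    public static ConstantExpression Fold(Expression parameterFree)
    {
        // Box to object so that one delegate type works for any result type.
        var boxed = Expression.Convert(parameterFree, typeof(object));
        object value = Expression.Lambda<Func<object>>(boxed).Compile()();
        return Expression.Constant(value, parameterFree.Type);
    }

    // Folds the body of () => int.Parse("123") into a single constant node.
    public static ConstantExpression FoldParseExample()
    {
        Expression<Func<int>> lambda = () => int.Parse("123");
        return Fold(lambda.Body);
    }
}
```

A real implementation also has to decide which subtrees are safe to evaluate, which this sketch leaves to the caller.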

> Expression, on the other hand, can
> point back to another item in the query, something that needs further
> resolution after the LINQ query is in progress. And, I guess, it
> always needs to be translated, right?

Expression is only a base class, there are a lot of different concrete
expressions. There are MemberExpressions, MethodCallExpressions, and
many many more. re-linq uses QuerySourceReferenceExpressions to "point
back" to an item in the query.

If you want a "complete" LINQ provider, you must translate all
standard expression types to your target query system. This is often
overkill, though. If you can, define what expressions you want to
support, and implement support for just these.

The usual way to translate expressions would be to implement a class
derived from ExpressionTreeVisitor (or ThrowingExpressionTreeVisitor)
and override the respective Visit... methods. Then, you can call
visitor.VisitExpression () to have the visitor analyze the expression
and call the respective Visit method.
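
A minimal sketch of the "support only what you need, throw for the rest" approach. It assumes the BCL's System.Linq.Expressions.ExpressionVisitor as a stand-in for re-linq's ExpressionTreeVisitor (whose method names differ), and the TinyTranslator and PtHolder names are hypothetical:

```csharp
using System;
using System.Linq.Expressions;
using System.Text;

// Hypothetical item type used for the usage example.
public class PtHolder { public double Pt; }

// Translates a small, explicitly supported subset of expressions to text
// and throws for everything else.
public class TinyTranslator : ExpressionVisitor
{
    private readonly StringBuilder _out = new StringBuilder();

    public static string Translate(Expression expression)
    {
        var visitor = new TinyTranslator();
        visitor.Visit(expression);
        return visitor._out.ToString();
    }

    public override Expression Visit(Expression node)
    {
        if (node == null) return null;
        switch (node.NodeType)
        {
            case ExpressionType.Constant:
            case ExpressionType.Parameter:
            case ExpressionType.MemberAccess:
            case ExpressionType.GreaterThan:
            case ExpressionType.Add:
                return base.Visit(node); // dispatches to the overrides below
            default:
                throw new NotSupportedException("Unsupported node: " + node.NodeType);
        }
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        _out.Append("(");
        Visit(node.Left);
        _out.Append(node.NodeType == ExpressionType.Add ? " + " : " > ");
        Visit(node.Right);
        _out.Append(")");
        return node;
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        Visit(node.Expression);
        _out.Append(".").Append(node.Member.Name);
        return node;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        _out.Append(node.Name);
        return node;
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        _out.Append(node.Value);
        return node;
    }
}
```

Calling TinyTranslator.Translate on the body of a lambda such as j => j.Pt > 10.0 (with j a PtHolder) walks the tree through the overrides, while an unsupported node such as a multiplication raises NotSupportedException, which pairs well with a test-per-use-case workflow.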

> re-linq, btw, has allowed me to get started on this much faster than I
> thought possible. Thanks to everyone who contributed to it. I'll get
> back to my LINQ->C++ translation now. :-)

Great, and good luck with your translation.

Fabian

Gordon Watts

Dec 3, 2010, 1:43:32 PM
to re-moti...@googlegroups.com
Hi,
Thanks. I'm already a big fan of ThrowingExpressionTreeVisitor. I have a
list of use cases and I create tests for them and basically just keep going
solving thrown exceptions from my visitor class until all my use cases pass.

I meant to ask - why not take this same approach of doing a
ThrowingQueryVisitor base class? Also, I see the sub expression flattener is
done as a base class - why not a mix-in or similar? Also, I'm starting to
notice that my ExpressionVisitor and QueryVisitor classes are becoming
huge - I was thinking of starting to add some MEF powered look-ups to
implement things like all the ResultOperators.

Cheers,
Gordon.


Fabian Schmied

Dec 6, 2010, 11:15:47 AM
to re-moti...@googlegroups.com
>  I meant to ask - why not take this same approach of doing a
> ThrowingQueryVisitor base class?

You're right, that would definitely be useful; I've added a feature
request (though I can't say when we'll be able to implement it).

> Also, I see the sub expression flattener is
> done as a base class - why not a mix-in or similar?

We're not currently using mixins in re-linq (because re-linq should be
stand-alone, without a reference to the rest of re-motion).
The flattener isn't meant as a base class, though; more as an
additional step: you first have the QueryModel accept the flattener,
then your own visitor.

> Also, I'm starting to
> notice that my ExpressionVisitor and QueryVisitor classes are becoming huge
> - I was thinking of starting to add some MEF powered look-ups to implement
> things like all the ResultOperators.

If you take a look at our SQL Backend, we have a similar concept in
there (<https://svn.re-motion.org/svn/Remotion/trunk/Remotion/Data/Linq.SqlBackend/SqlPreparation/ResultOperatorHandlers/>).
Feel free to do it the same way :)

Fabian
