Transforming a QueryModel

Gordon Watts

Jan 5, 2014, 3:08:22 PM
to re-moti...@googlegroups.com
Hi,
  My queries are now complex enough that I’m running into a bug whose solution I think requires me to transform a QM before I run it through my regular QM transformation.

  I support anonymous types, tuples, and, generally, any object that has member initialization in {}. In short, if you were to write new Tuple<int, int>() {Item1 = 5, Item2=10}.Item1, it would correctly figure out that you meant 5. You can also use these in queries - so you could do something like: (from int j in list select new Tuple<int, int>() {Item1=j, Item2=j+1}).Where(l => l.Item2 > 10).Select(l => l.Item1), and it will correctly transform that into from j in list where j+1 > 10 select j. This is a big help in keeping track of rather complex queries in the code that uses my library.
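
The placeholder elimination described above can be sketched, for the purely local case, as an expression visitor built only on System.Linq.Expressions. This is a minimal illustration and not the code from Gordon's library or anything shipped with re-linq: the Pair type and the PlaceholderAccessEliminator class are hypothetical names, and the sketch assumes the initializer expressions are free of side effects, so dropping the unused ones is safe.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical placeholder type, standing in for the tuple-like holders in the thread.
public class Pair
{
    public int A { get; set; }
    public int B { get; set; }
}

// Hypothetical helper (not a re-linq class): rewrites `new T { M = value }.M`
// and `new { M = value }.M` to just `value`.
public sealed class PlaceholderAccessEliminator : ExpressionVisitor
{
    protected override Expression VisitMember(MemberExpression node)
    {
        // Visit the target first so nested placeholders collapse inside-out.
        Expression target = Visit(node.Expression);

        // Case 1: object initializer, e.g. new Pair { A = x, B = y }.A
        if (target is MemberInitExpression init)
        {
            foreach (MemberAssignment binding in init.Bindings.OfType<MemberAssignment>())
            {
                if (SameMember(binding.Member, node.Member))
                    return binding.Expression;
            }
        }

        // Case 2: anonymous type, e.g. new { pt = x }.pt
        // (Members is null for ordinary constructor calls).
        if (target is NewExpression ctor && ctor.Members != null)
        {
            for (int i = 0; i < ctor.Members.Count; i++)
            {
                if (SameMember(ctor.Members[i], node.Member))
                    return ctor.Arguments[i];
            }
        }

        // Not a placeholder access: keep the node, with its rewritten target.
        return node.Update(target);
    }

    // Compare by declaring type and name rather than by MemberInfo reference.
    private static bool SameMember(MemberInfo a, MemberInfo b)
    {
        return a.Name == b.Name && a.DeclaringType == b.DeclaringType;
    }
}
```

Calling Visit on the body of () => new Pair { A = 5, B = 10 }.A returns the constant 5. The sketch only handles the case where the member access sits directly on the initializer; it does not follow a placeholder through a Select into a later Where, which is the harder cross-clause problem raised in this thread.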

  However, under certain circumstances, when I cut on Item2, but select Item1, after having done something about the ordering of the list dependent on Item1, I run into a scoping problem with my translation - basically, a query reference is no longer valid. This is, in a sense, a limitation of my translation code, I believe.

  To fix it, it occurs to me that these new Tuple<…> are just placeholders. They have no actual meaning in the code that is finally emitted. I should be able to transform a QueryModel totally so that I can eliminate these temporary objects.

  I’ve been using the expression transformer objects to transform things in my expressions, but now I need to do something that travels between expressions and QM. I don’t see direct support for that… do I just write a new QM that transforms each clause as it is processed?

  Many thanks!

    Cheers,
        Gordon.

Gordon Watts

Jan 5, 2014, 8:40:23 PM
to re-moti...@googlegroups.com
Hi,
  Perhaps an example is in order. There is some C# code that runs over a simple set of objects, creates a bunch of “complex” anonymous objects, and then looks at the query model: https://gist.github.com/gordonwatts/8276879

  (sorry, it was very ugly when I pasted it into my email).

  The difficulty arises, I think, when the object has to cross sub-query boundaries. I’d like to see if I can’t take that out. It should be possible, as that is what we are doing in the end when pushing this into code.


    Cheers,
        Gordon.


Fabian Schmied

Jan 8, 2014, 5:22:55 AM
to re-moti...@googlegroups.com
Hi Gordon,

There are essentially two extension points where you can
transform/simplify your queries. One is to manipulate the QueryModels
after they have been created by re-linq; the other is to transform the
expression tree before (or while) it is parsed by re-linq.

Manipulating QueryModels is easy: you just do this in your executor
after you get them from the QueryParser.
Manipulating the expression tree can, e.g., be done by decorating
(wrapping) the QueryParser and running a visitor over the expression
in GetParsedQuery before the inner QueryParser is called.
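
The decorator approach described above might look roughly like the following sketch. It assumes a reference to the re-linq assembly and relies only on the IQueryParser interface and its GetParsedQuery method; the PreprocessingQueryParser class and its constructor parameters are hypothetical names chosen for illustration.

```csharp
using System;
using System.Linq.Expressions;
using Remotion.Linq;
using Remotion.Linq.Parsing.Structure;

// Hypothetical decorator (not shipped with re-linq): rewrites the raw
// expression tree, then hands it to the wrapped parser.
public sealed class PreprocessingQueryParser : IQueryParser
{
    private readonly IQueryParser _inner;
    private readonly Func<Expression, Expression> _preprocess;

    public PreprocessingQueryParser(IQueryParser inner, Func<Expression, Expression> preprocess)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        if (preprocess == null) throw new ArgumentNullException("preprocess");
        _inner = inner;
        _preprocess = preprocess;
    }

    public QueryModel GetParsedQuery(Expression expressionTreeRoot)
    {
        // Run the caller-supplied rewrite (typically a visitor's Visit method)
        // over the compiler-generated tree before re-linq sees it.
        Expression rewritten = _preprocess(expressionTreeRoot);
        return _inner.GetParsedQuery(rewritten);
    }
}
```

Assuming the usual setup, the decorated parser would be passed wherever the result of QueryParser.CreateDefault() is currently handed to the query provider, with the Visit method of an expression visitor as the preprocess delegate.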

Whether you want to manipulate query models or expression trees
depends on how much metadata you need. In the Query Model form,
re-linq has already identified sub-queries, associated query source
references with the actual sources, and structured the query in
from/where/orderby/select and result transformations. In the
expression form, it's still the complex AST given by the compiler. In
general, I'd suggest working on the QueryModel unless important
information has already been removed or the expression transformation
is extremely local (the typical ExpressionTransformer use case).

In our own re-linq backend, the SQL generator, we have a third option.
Since we translate the QueryModels to yet another model, the
SqlStatementModel, we can perform simplifications while we're doing
this translation or on the SqlStatementModel itself. In fact, this
is the option I usually choose in there rather than changing the
QueryModel.

For more specific ideas about your scenario, it would probably be good
to know exactly what simplification you would want to perform. I.e.,
given a query/Query Model, how would you want it to look after your
simplification?

This is probably more or less the query model from your gist (call
ToString on the QM to get a string representation, then add line
breaks and indentation):

(from m in (
    from d in q
    select new
    {
      matches = ( from i in [d].run1 select new { pt = [i] } ) // QM 3
    }
  ) // QM 2
  from j in [m].matches
  select [j].pt
  => Sum
) // QM 1

re-linq might simplify this a little bit more, I'm not sure if QM 2 is
inlined. But let's pretend it isn't for the sake of the argument ;)

Feeding this into your elimination algorithm, what would you like it to become?

Best regards,
Fabian

Gordon Watts

Jan 10, 2014, 5:41:05 PM
to re-moti...@googlegroups.com
Hi Fabian,
  Sorry to have not responded earlier. I had to spend some time pulling apart the QueryModel that is causing me trouble. Here it is, nicely formatted:

from TestTranslatedNestedCompareAndSortHolder mjj in
 {from TestTranslatedNestedCompareAndSortHolder mj in
  {from subNtupleObjects1 j in [evt].jets
   orderby [j].v3 desc
   select new TestTranslatedNestedCompareAndSortHolder() {
    jet = [j],
     track = {[evt].tracks => First()}
   }
  }
  orderby [mj].track.v6 asc
  select [mj]
 }
 where ([mjj].jet.v3 > 60)
 select [mjj]
 => AnyResultOperator()

(if the formatting is awful, just look at the gist, and apologies in advance). There are some obvious optimizations you could do to this query, but please ignore them - this is what is left over from a much more complex query, which I kept cutting down until I had the minimum amount of code to repro the bug.

 I am translating my queries to C++, and the way I have done it so far I’ve been able to avoid having to cache intermediate results. I have basically just added another layer in a loop, and been able to access all the elements on the fly.

 However, here, the fact that I orderby [mj].track.v6 and then later on cut on [mjj].jet.v3 means I really do need that TestTranslatedNestedCompareAndSortHolder object to be saved. I have basically looped over the jets to order the objects already, and then I pay attention to the tracks. By the time I’m ready to pay attention to the jets again, my previous jet results have gone out of scope. However, since it is a query reference, my code tries to use it. What I need to do is create the intermediate results, and use those. I don’t think I can get away with handling it some other way when we have two objects linked like that.

    Cheers,
        Gordon.

Gordon Watts

Jan 11, 2014, 12:19:08 PM
to re-moti...@googlegroups.com
Hi,
  Yesterday evening I finally had about 3 hours of quiet time where I could really look at this carefully. In the end, my current translation method works just fine - it had a bug. As you can see from below, there is an implicit connection between the track and the jet. My translation code was not tracking that linkage, though another part of the code was expecting it to. So, now that I figured that out, it was a fairly “trivial” fix.

  In general, though the LINQ query may contain these temporary objects (like TestTranslatedNestedCompareAndSortHolder below), my translation code skips those objects, so they never appear in the C++ code generated from the LINQ query. In general re-linq does a good job of removing those temporary objects. However, it fails when a reference query is a sequence of objects (as it is below). So my translation code has to track this… and that was where things went south. It would be nice if I could implement the temporary class removal in the LINQ query, because then my optimizer could ignore the classes, but I’ve now convinced myself that it can’t happen until the queries are flattened at the very end (or it is a lot more work than it is probably worth).