Using Concat


Gordon Watts

Feb 19, 2016, 2:05:20 PM
to re-moti...@googlegroups.com

Hi,

  My re-linq backend has been doing a lot for me (thanks, btw). But I am now in a situation where I’d really like to be able to write something like the following:

 

                IQueryable<recoTree> r1, r2;

 

                r1.Concat(r2).Plot(…);

 

  My code is quite happy to do “r1.Plot(…)” and “r2.Plot(…)”. Of course, for many reasons, really doing the Concat there would not work generally.

 

  However, in my case, I know how to add the result of r1.Plot() to r2.Plot(). So, I’d like to make, I suppose, the transform “r1.Concat(r2).Plot(…)” => “r1.Plot(…) + r2.Plot(…)”

 

  I’m just starting to think about how to approach this, and how hard it would be in the re-linq framework. The main issue I’m stumbling on is that I’m turning what looks like one query and processing cycle in re-linq into two.

 

  As a backup I could surface that “addition” in all places, but that would require a major change in my approach to using LINQ, and it would end up looking very non-linq’y. 😊

 

  Advice, or comments, or ideas on how to achieve this pseudo-Concat operator in re-linq?

 

  Many thanks!

 

                Cheers,

                                Gordon.

Michael Ketting

Feb 22, 2016, 2:50:36 AM
to re-motion Users
Hello Gordon!

I've thought this over a bit and have come to the conclusion that I don't understand enough of your scenario to make a good suggestion. To start with a completely new notation: it looks like you're starting with Plot ( Concat (r1, r2) ) and trying to get to Concat ( Plot (r1), Plot (r2) ). So you're basically inverting the nesting. And it is important that you know you are concatenating two plot results instead of just two sequences, right?

Is adding two plot results a standard use case, or one of many add operations? I'm asking because you could do a PlotQueryable.Concat(PlotQueryable) and handle this one specifically if it's a very special situation. Of course, if you have many similar cases, you'll need a more generic approach.

In SQL backends, a change like this, where the query structure is rewritten to invert some nesting, would be a query optimization. It usually has to be implemented with a very specific start and end point in mind, based on detecting a specific pattern in the expression tree.

Does that help?
Best regards, Michael

Gordon Watts

Feb 22, 2016, 3:45:26 AM
to Michael Ketting, re-motion Users

Hi Michael,

  Thanks a lot for thinking about this. What you said might help, but I’m not sure yet. 😊 Let me try a longer explanation of what I’m thinking. And along the way try to answer some of the questions in your email.

 

  My re-linq provider is attached to a file, so all the data from that file is represented by one IQueryable<>. Now, I have multiple files, and I would like to stitch them together into a single sequence and send that sequence to my “Plot” result operator.

 

  I have a functioning library based on re-linq that will allow me to send the sequence into the Plot result operator.

 

  Re-linq can’t concat arbitrary sequences from different IQueryable data sources for, I think, fairly obvious technical reasons. In my case, the LINQ sequence is converted to C++ and then run in custom analysis software against the data in the file. At some level, combining the files makes no sense.

 

  However, combining the results makes a lot of sense. In this case it is a plot, and I can just “Add”. However, in other cases, it might be something else – like an integer, or a special kind of plot. Whatever, I, as the end user, have to provide the addition semantics.

 

  So perhaps the proper way to represent is to change from Plot (Concat(r1, r2)) to Add(Plot(r1), Plot(r2)).

 

  Now, your suggestion of doing PlotQueryable.Concat(PlotQueryable): I’d be happy to handle that, but I’m not sure how to do it in the re-linq infrastructure. Re-linq currently fails with a complaint that it doesn’t know how to handle the method call Concat (which makes sense). But if I extend re-linq, I’m not really sure how I would handle that. This is because Plot(Concat(r1, r2)) looks to re-linq like a single query, and I need to transform it into two queries, Plot(r1) and Plot(r2), and then add the results.

 

  Just to make this more interesting… Actually running over two files is currently supported by my provider. I just hand the executor a list of files. What I want to do here is take the sequence r1, add some meta data to it, do the same for r2, and then run Plot on it – and the result of Plot would depend on that. Specifically, the data in the two files have different “weights”, so each bit of data in r1 is worth twice the bit of data in r2. Some pseudo code:

 

   IQueryable<recoTree> r1, r2 = …;

   var r1w = r1.Select(r => new { Data = r.value, Weight = 1.0 });

   var r2w = r2.Select(r => new { Data = r.value, Weight = 0.05 });

   r1w.Concat(r2w).Plot(…, s => s.Data, s => s.Weight).SaveToFile(myfile);

 

  What I can currently write, and which builds and runs, is this:

 

   IQueryable<recoTree> r1, r2 = …;

   var r1w = r1.Select(r => new { Data = r.value, Weight = 1.0 });

   var r2w = r2.Select(r => new { Data = r.value, Weight = 0.05 });

   var p1 = r1w.Plot(…, s => s.Data, s => s.Weight);

   var p2 = r2w.Plot(…, s => s.Data, s => s.Weight);

   p1.Add(p2).SaveToFile(myfile);

 

  Being able to pass the Concat’ed sequence into my more complex methods and routines, without having to surface that addition everywhere, would make the code significantly cleaner. So I want to be able to write the first block of code instead of the second block.
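
The reason the two blocks are interchangeable is that a binned, weighted histogram is additive: filling it from the concatenated sequence gives the same bins as filling one histogram per source and adding them. A minimal in-memory sketch of that property follows. It is not the re-linq provider from this thread; `Plot`, `Add`, the binning, and the sample values are all hypothetical stand-ins chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HistogramAdditivity
{
    // Hypothetical stand-in for the Plot result operator: fills a weighted
    // histogram with `bins` equal-width bins over [lo, hi). Entries outside
    // the range are dropped.
    public static double[] Plot(
        IEnumerable<(double Data, double Weight)> entries, int bins, double lo, double hi)
    {
        var histogram = new double[bins];
        foreach (var (data, weight) in entries)
        {
            if (data < lo || data >= hi) continue;
            int bin = (int)((data - lo) / (hi - lo) * bins);
            histogram[Math.Min(bin, bins - 1)] += weight; // guard against rounding at the upper edge
        }
        return histogram;
    }

    // Hypothetical stand-in for the user-supplied addition: bin-by-bin sum.
    public static double[] Add(double[] a, double[] b) =>
        a.Zip(b, (x, y) => x + y).ToArray();

    public static void Main()
    {
        // Made-up sample data; the weights mirror the pseudo code in the post.
        var r1w = new[] { 0.5, 1.5, 1.7 }.Select(v => (Data: v, Weight: 1.0));
        var r2w = new[] { 0.2, 1.9 }.Select(v => (Data: v, Weight: 0.05));

        var fromConcat = Plot(r1w.Concat(r2w), 2, 0.0, 2.0);
        var fromAdd = Add(Plot(r1w, 2, 0.0, 2.0), Plot(r2w, 2, 0.0, 2.0));

        Console.WriteLine(string.Join(", ", fromConcat));
        Console.WriteLine(string.Join(", ", fromAdd));
    }
}
```

The two printed lines should agree up to floating-point rounding, since only the order of the additions differs. Any result type with an associative "add" (a count, a sum, a histogram) can be treated the same way; a result such as a median cannot.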

 

  You mention code that detects a fairly specific start and end point. I have no problem authoring that code in my re-linq backend. I already do some fairly complex transformations (I support tuples, anonymous classes, real classes, none of which actually make it into the C++ code, along with bridging the .NET and C++ worlds). But I just can’t conceptually figure out where to intercept the query in the re-linq processing pipeline to scan for Concats, split the query into multiple queries, and then take the results from those and add them.

 

  The best I could come up with was I needed to write a new re-linq provider. I’d write a new Concat function (“Stitch” 😊) that would then perform two different queries and add them together. That is a fair amount of work, so I thought I’d see if there was another part of the pipeline I should be thinking about.

 

  I hope this makes what I want to do more clear.

 

Cheers,
Gordon

--
You received this message because you are subscribed to the Google Groups "re-motion Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to re-motion-use...@googlegroups.com.
To post to this group, send email to re-moti...@googlegroups.com.
Visit this group at https://groups.google.com/group/re-motion-users.
For more options, visit https://groups.google.com/d/optout.

fabian....@gmail.com

Feb 22, 2016, 10:31:25 AM
to re-moti...@googlegroups.com
Hi Gordon,

If you treat an IQueryable as an IEnumerable (e.g., by casting it or by calling ".AsEnumerable()"), you're effectively "ending" the current query and continuing in memory. E.g., if you call "r1.AsEnumerable().Concat(r2)", the Concat will sit outside your two queries and be executed in memory. Similarly, your Add method could take parameters of type IEnumerable and thus be run in memory on the query results.
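
The difference can be seen from the static types alone. The sketch below uses LINQ to Objects (`AsQueryable()` over arrays) as a stand-in for a real provider, so everything here actually runs in memory; it only illustrates which `Concat` overload the compiler binds. `StaysInProvider` is a hypothetical helper introduced for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AsEnumerableDemo
{
    // Hypothetical helper: true if the sequence is still an IQueryable, i.e.
    // further operators would be added to an expression tree for the provider.
    public static bool StaysInProvider<T>(IEnumerable<T> sequence) =>
        sequence is IQueryable<T>;

    public static void Main()
    {
        IQueryable<int> r1 = new[] { 1, 2, 3 }.AsQueryable();
        IQueryable<int> r2 = new[] { 4, 5 }.AsQueryable();

        // Binds to Queryable.Concat: the call becomes part of the expression
        // tree, so the provider has to know how to translate it.
        IQueryable<int> inProvider = r1.Concat(r2);

        // Binds to Enumerable.Concat: r1 and r2 are enumerated as two separate
        // queries and the concatenation happens in memory.
        IEnumerable<int> inMemory = r1.AsEnumerable().Concat(r2);

        Console.WriteLine(StaysInProvider(inProvider));
        Console.WriteLine(StaysInProvider(inMemory));
    }
}
```

With a remote provider, the second form pulls every element of both sources to the client before anything downstream runs.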

Does this help you?

Best regards,
Fabian


Gordon Watts

Feb 22, 2016, 11:19:25 AM
to fabian....@gmail.com, re-moti...@googlegroups.com

Hmmm… I don’t think so, because I want to be able to do things like this:

 

   var r1w = r1.Select(r => new { Data = r.value, Weight = 1.0 });

   var r2w = r2.Select(r => new { Data = r.value, Weight = 0.05 });

   var r = r1w.Concat(r2w);

   r.Where(m => m.Data > 2.0).Plot(…, s => s.Data, s => s.Weight).SaveToFile(myfile);

 

In short – I want to apply a Where or other transformations like Select, etc., to the stream.

 

Also, doing something like “r1.AsEnumerable()” will generate a sequence in memory, as you say, but that would be a bit of a disaster for two reasons. First, some of these data streams are many GBs right now (and will grow). Second, the back end is distributed and remote, so pulling the data back over the network may not be very fast. The “.Plot” operator, however, generates a very small bit of data (a binned histogram of about 20-40 KB) from 100’s of GB of data.

 

Am I making sense?

 

Cheers,
Gordon

Michael Ketting

Feb 23, 2016, 3:06:56 AM
to re-motion Users, fabian....@gmail.com
Hi Gordon!

Since we can rule out Fabian's suggestion from the look of it, back to splitting queries (and thanks for the details, they helped with getting an idea how to start :) ):

Would the queries always be semantically equivalent if you interpreted Concat as a split-in-two-and-add-at-the-end operation? What I'm getting at is: if you (actually, the LINQ provider code) find a Concat, run the query once with only the left input and once with only the right input (dropping the other half each time), and combine the two results only at the end of the query, this would give you the same results, correct?

You can override the semantics of Concat, or introduce a new extension method, by taking the registry from MethodInfoBasedNodeTypeRegistry.CreateFromRelinqAssembly() and either registering a new method or overriding an existing registration. For instance:
Register an AddSequencesExpressionNodeType. For ease, use the same arguments as ConcatExpressionNodeType and provide a matching extension method similar to Concat (or just replace the registration for Concat). The AddSequencesExpressionNodeType will then create an AddSequencesResultOperator.

Then when the QueryModel is created from the expression tree via QueryParser, you get a QueryModel with your AddSequencesResultOperator in the ResultOperators collection. Pretty standard till now. The interesting part is, that you get a SubQueryExpression as AddSequencesResultOperator.Source2 which will contain the second query source.

Now you can start copying/manipulating the QueryModel: strip out the AddSequencesResultOperator and, when you create the copy, replace the original query source (QueryModel.MainFromClause) with the one from AddSequencesResultOperator.Source2. This should leave you with two QueryModels that represent the queries as if you had written them separately. Finally, you can put them back into a new QueryModel, using the first QueryModel as the source of the MainFromClause and the second QueryModel as ConcatResultOperator.Source2 (yes, I switched here from the AddSequencesResultOperator back to the ConcatResultOperator).

Disclaimer: I haven't written the code for this, so I might have missed some translation bit or other, but re-linq should complain when you do something incompatible. Oh, and you need to watch out for QuerySourceReferenceExpressions, since they hold a reference to the original QueryModel (see ReferenceReplacingExpressionVisitor and QuerySourceMapping).
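
The shape of the splitting step can be sketched independently of re-linq. The toy model below is an assumption-laden illustration, not re-linq code: `Query`, `SourceNode`, `WhereNode`, `ConcatNode`, and `ConcatSplitter` are hypothetical types invented here, and the predicate is just a string. A real implementation would operate on QueryModel, MainFromClause, and the result operators, and would also have to fix up QuerySourceReferenceExpressions as noted above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mini query model (NOT re-linq types).
public abstract record Query;
public sealed record SourceNode(string Name) : Query;
public sealed record WhereNode(Query Input, string Predicate) : Query;
public sealed record ConcatNode(Query Left, Query Right) : Query;

public static class ConcatSplitter
{
    // Returns concat-free queries. Running each one and adding the results is
    // meant to be equivalent to running the original query.
    public static List<Query> Split(Query query)
    {
        switch (query)
        {
            case SourceNode source:
                return new List<Query> { source };
            case ConcatNode concat:
                // Drop one half, then the other: each side is split on its own.
                return Split(concat.Left).Concat(Split(concat.Right)).ToList();
            case WhereNode where:
                // Operators applied after the Concat are copied onto every branch.
                return Split(where.Input)
                    .Select(input => (Query)new WhereNode(input, where.Predicate))
                    .ToList();
            default:
                throw new NotSupportedException(query.GetType().Name);
        }
    }

    public static void Main()
    {
        var query = new WhereNode(
            new ConcatNode(new SourceNode("r1"), new SourceNode("r2")),
            "Data > 2.0");

        foreach (var part in Split(query))
            Console.WriteLine(part);
    }
}
```

Other per-element operators (Select and similar) would follow the same pattern as `WhereNode`. Operators that look at the whole sequence, such as Take or OrderBy, do not distribute over Concat this way and would need separate handling.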

Best regards, Michael

Gordon Watts

Feb 23, 2016, 3:21:19 AM
to Michael Ketting, re-motion Users, fabian....@gmail.com

Hi,

  Clever!

 

  Yes, you can always look at them as split queries – so you can do exactly as you suggest – at each concat, drop one half of the query, then the other, and then add the two together.

 

  Creating a new method with the same semantics as Concat was something I’d thought of as well, and it is easy to capture that during re-linq processing (I do it with lots of other things in my provider, actually). But I’d not thought to make it a result operator, especially because I’d wanted re-linq to continue processing the query. That sounds like it would do what I want.

 

  But… I’m having trouble with the implied code behind this statement:

 

> This should leave you with two QueryModels that represent the queries as if you had written them separately

 

Once I have the two QueryModels, in the middle of my processing, how do I get all the re-linq power to actually process them? Is it as “simple” as calling my query executor with the new QM, or is there some context I need to set up?

 

Many thanks for your help!!

 

Cheers,
Gordon

 

P.S. You note that you switched back to the concat semantics in your last paragraph. Was there a reason for that? If there was, I missed the significance.


Michael Ketting

Feb 23, 2016, 3:46:46 AM
to re-motion Users, michael...@rubicon.eu, fabian....@gmail.com
Hi Gordon!

Glad you like :)

Okay, maybe it's easier from where I'm coming from (and without the nasty bits of reality stuck on it): you rebuild the entire QueryModel with the "lifted-out" Concat and then have a model that looks as if you had originally written it like this (all the processing stuff first, then the Concat as a last step). Of course, in my simple world, you then go and do stuff with the model like you'd always do. That's also why I switched back to Concat: to make it obvious that we're now talking about a regular Concat, nothing special to see. And this Concat is fed from your two query sources that ran on the big files somewhere in the wild.

I guess the question is: do you interpret the QueryModel at execution time, or do you transform the entire thing into something C++ based and then let this new program run? From what I remember, you go the second route, correct? There, I figured that the processing would already be handled. Am I missing something here?

And you generate code like this from the QueryModel (written in C# for simplicity's sake):

var result1 = await ProcessQueryModel1();
var result2 = await ProcessQueryModel2();
var result = result1 + result2;
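
The three lines above generalize to any number of split queries and to any result type with a caller-supplied addition. The following is only a sketch under assumptions: `RunAndAdd`, `runQuery`, and `add` are hypothetical names, with `runQuery` standing in for whatever executes one concat-free query and `add` for the user-defined addition semantics.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class SplitExecution
{
    // Runs every split query and folds the results together with `add`.
    // Throws InvalidOperationException if `splitQueries` is empty, because
    // there is then no result to start the fold from.
    public static async Task<TResult> RunAndAdd<TQuery, TResult>(
        IEnumerable<TQuery> splitQueries,
        Func<TQuery, Task<TResult>> runQuery,
        Func<TResult, TResult, TResult> add)
    {
        // Task.WhenAll enumerates the sequence, so every query is started
        // before any result is awaited and the queries may run concurrently.
        TResult[] results = await Task.WhenAll(splitQueries.Select(runQuery));
        return results.Aggregate(add);
    }

    public static async Task Main()
    {
        // Stand-in "queries": each one just reports the length of its name.
        int total = await RunAndAdd(
            new[] { "r1", "r2" },
            name => Task.FromResult(name.Length),
            (a, b) => a + b);

        Console.WriteLine(total);
    }
}
```

Results are combined in the order the queries were listed, so `add` needs to be associative but not necessarily commutative.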

Best regards, Michael



Gordon Watts

Feb 23, 2016, 4:15:29 AM
to Michael Ketting, re-motion Users, michael...@rubicon.eu, fabian....@gmail.com

Hi,

  Ha! Ok – I think I’m making this harder than it should be: I know a few corners of re-linq well, and, obviously not the others.

 

  I have not thought a great deal about how to run a QM through my code to turn it into C++ code. But I use your complete infrastructure of QM visitors and expression visitors, etc., in order to effect that. But I think I copied from some example project how I start from a QM at the top (that re-linq hands to me). I’ve obviously forgotten about some code I wrote at the very start of this project (when, frankly, I probably didn’t know enough to understand the role of the code I was copying).

 

  Ok, I’ve got enough to try this out. I will probably tackle this later this week and I will post back with results or questions.

 

  Again, thanks for your help!


Gordon Watts

Feb 23, 2016, 4:26:52 PM
to Michael Ketting, re-motion Users, fabian....@gmail.com

Hi,

  I’ve written a QueryVisitor that walks through the QM and splits it. There are some special cases I haven’t tested yet, and the code needs some cleaning (I re-factored at the wrong point). But these will get fixed. The basic idea is there. And it is much shorter than I’d thought it was going to be.

 

  If you feel up to it, you can take a quick look and see if what I’m doing basically matches what you had in mind:

 

                https://github.com/gordonwatts/LINQtoROOT/commit/add1fb31a8fcd4faeb0974839ec73cb8b68ceff5

 

  Look for the ConcatSplitterQueryVisitor.cs file to see the actual QM visiting and splitting code. Just above is the unit test file that drives it. Comments, obviously, welcome. I’ve updated and modified expressions in QueryModels, but I’ve never actually changed the structure of a QM before. Comments on style as well as correctness are definitely welcome! 😊

 

  My next step is to integrate it into the execution environment. That will be the ultimate test of whether I got a query source reference out of place. But that is for later this week, as I have quite a bit of infrastructure around that and so must proceed a little more carefully (it will also not be interesting to anyone on this list).

 

Cheers,
Gordon

 


Michael Ketting

Feb 24, 2016, 2:13:15 AM
to re-motion Users
> And it is much shorter than I’d thought it was going to be.

:)

Oooh, code! I will take a peek later, or possibly over the weekend, but I've put this on my stack :)

Glad I could help!
Michael

Gordon Watts

Feb 24, 2016, 5:17:20 AM
to Michael Ketting, re-motion Users

Thanks!

 

By then it will have been cleaned and updated as I better understand the use cases I have to handle, so after clicking on that link, go to the actual file, rather than that change set.

 

And no worries. Weekends are meant for enjoyment, not looking at stranger’s code! 😊

 

Cheers,
Gordon

 
