Re: [re-motion-users] Cached/indexed linq provider help


Fabian Schmied

Dec 19, 2012, 11:00:58 AM
to re-moti...@googlegroups.com
Hi,

> Hi there. I am trying to make a Linq provider, similar to LinqToObjects with
> two performance improvement features - caching, and indexed search.
>
> I'm a noob to this, so I am hoping for a couple of pointers, and help with
> my current roadblock if possible. I’ve got this stuff in progress, but would
> appreciate being told if I am walking down the wrong path.

One caveat upfront: re-linq is very well-suited for systems that need
to translate a LINQ query into a different query
language/specification (e.g., SQL, HQL, or ROOT code). It is less
well-suited for executing queries in memory, as LinqToObjects
does. The reason is that re-linq takes the expression tree
representing the query and transforms it into a different data
structure (QueryModel). While you could quite easily compile the
original expression tree of a LINQ query into something executable,
the QueryModel does not offer any support for being compiled to an
executable in-memory query.
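To make that contrast concrete, here is a minimal C# sketch that uses only the BCL's System.Linq.Expressions, not re-linq: an expression tree (here just a predicate rather than a whole query) is compiled into a delegate and executed in memory. `InMemoryCompileDemo` is a hypothetical name chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper: runs a predicate that arrives as an expression tree
// against an in-memory sequence.
public static class InMemoryCompileDemo
{
    public static List<T> Filter<T>(IEnumerable<T> source, Expression<Func<T, bool>> predicate)
    {
        // Compile() turns the expression tree into an ordinary delegate
        // that LINQ to Objects (Enumerable.Where) can call directly.
        Func<T, bool> compiled = predicate.Compile();
        return source.Where(compiled).ToList();
    }
}
```

For example, `InMemoryCompileDemo.Filter(new[] { 1, 2, 3, 4 }, x => x > 2)` yields 3 and 4. There is no comparable one-call step for turning a QueryModel back into executable code.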

It is possible to use re-linq to implement an in-memory LINQ provider,
but you may well have to do a lot of the work yourself.
Another approach would be to use (and maybe adapt) parts of re-linq,
e.g., the partial evaluator, and ignore the rest, e.g., the
QueryModel.

> Caching: my plan is to create an IQueryable such that the results of
> GetEnumerator are cached in an internal dictionary, based on a cachekey
> built from the expression, and use any referenced INotifyPropertyChanged or
> ObservableCollections to invalidate the hash entry on changes.
>
> My understanding of this is that I need to
>
> a/ visit all ConstantExpressions to see if they are instances of
> INotifying.. or ObservableCollections. I have this working by overriding
> VisitConstantExpression
>
> b/ expand SubQueries so that calling ToString on the Expression results in
> unique cache keys. I was going to do this using the PartialEvaluation from
> http://petemontgomery.wordpress.com/2008/08/07/caching-the-results-of-linq-queries/
> but that doesn’t use Re-Linq, which I think I need for point c/, so I might
> have to re-implement it as a Re-Linq visitor.
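The caching plan quoted above could look roughly like the following C# sketch. `QueryResultCache` and `GetOrAdd` are hypothetical names, not part of re-linq or the BCL, and the cache key plus the list of change-notifying dependencies are assumed to come from steps a/ to c/. Each materialized result is stored under its key and evicted as soon as any dependency raises CollectionChanged or PropertyChanged.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.ComponentModel;

// Hypothetical result cache: entries are evicted when a dependency changes.
// Simplifications: not thread-safe, and event handlers are never detached
// (a real implementation would unsubscribe or use weak events to avoid leaks).
public sealed class QueryResultCache
{
    private readonly Dictionary<string, IList> _entries = new Dictionary<string, IList>();

    public int Count { get { return _entries.Count; } }

    public IList GetOrAdd(string key, IEnumerable<object> dependencies, Func<IList> execute)
    {
        IList cached;
        if (_entries.TryGetValue(key, out cached))
            return cached;

        // Cache miss: run the query once and remember the materialized result.
        IList result = execute();
        _entries[key] = result;

        // Any change notification from a dependency drops the entry.
        foreach (object dependency in dependencies)
        {
            var collection = dependency as INotifyCollectionChanged;
            if (collection != null)
                collection.CollectionChanged += (sender, e) => _entries.Remove(key);

            var item = dependency as INotifyPropertyChanged;
            if (item != null)
                item.PropertyChanged += (sender, e) => _entries.Remove(key);
        }
        return result;
    }
}
```

Materializing the result (rather than caching the lazy enumerable) matters here: otherwise the "cached" value would simply re-run the query on every enumeration.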

re-linq also has a partial evaluator
(PartialEvaluatingExpressionTreeVisitor), but you might need to tweak
it a bit: currently, the evaluator stops when it finds a query
operator (because that should be represented in the QueryModel). You,
OTOH, want to evaluate query operators (within subqueries).
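For step b/, the core idea of partial evaluation for cache keys fits in a few lines. This is a deliberately simplified C# stand-in built on the BCL's ExpressionVisitor, not re-linq's PartialEvaluatingExpressionTreeVisitor: it only replaces member reads on constant objects (which is how the C# compiler represents captured local variables) with their current values. `CapturedValueInliner` and `CacheKey` are hypothetical names.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Simplified partial evaluator: inlines captured variables only. A full
// evaluator would also evaluate other parameter-independent subtrees,
// e.g. method calls whose arguments are all constants.
public sealed class CapturedValueInliner : ExpressionVisitor
{
    public static Expression Inline(Expression expression)
    {
        return new CapturedValueInliner().Visit(expression);
    }

    // Hypothetical cache-key helper: the string form of the inlined tree.
    public static string CacheKey(Expression expression)
    {
        return Inline(expression).ToString();
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        // Visit the target first so chains such as closure.order.Id collapse bottom-up.
        Expression target = Visit(node.Expression);
        var constant = target as ConstantExpression;
        if (constant != null && constant.Value != null)
        {
            var field = node.Member as FieldInfo;
            object value = field != null
                ? field.GetValue(constant.Value)
                : ((PropertyInfo)node.Member).GetValue(constant.Value, null);
            return Expression.Constant(value, node.Type);
        }
        return node.Update(target);
    }
}
```

A query built twice from the same lambda with different captured values stringifies identically, because the closure object prints as a `value(<closure type>)` placeholder; after inlining, the values themselves appear in the key. Note that reference-type constants which don't override ToString(), such as two different ObservableCollection<T> instances, still print identically, so a real key would also need to take object identity into account.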

> c/ visit the subqueries in order to find the INotifyPropertyChanged etc. Am
> stuck here really - what's the best way to do this?

Hm, can you give a concrete query and describe what exactly you mean?
Then we can use the example to discuss how to best solve your issue.
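As an illustration of step c/ with plain System.Linq.Expressions trees (not re-linq's QueryModel): when a query contains a subquery over a captured local such as an ObservableCollection<T>, the compiler emits a field read on a closure object rather than a ConstantExpression of the collection, so overriding VisitConstant alone only sees the closure. The hypothetical `NotifierFinder` below also evaluates such one-level member reads on constants, and relies on ExpressionVisitor descending into nested lambdas by default, so subqueries are covered. It is a sketch under those assumptions: longer member chains would need a real partial evaluator, and reading a property runs its getter.

```csharp
using System.Collections.Generic;
using System.Collections.Specialized;
using System.ComponentModel;
using System.Linq.Expressions;
using System.Reflection;

// Collects change-notifying objects anywhere in an expression tree,
// including inside nested lambdas (i.e. subqueries).
public sealed class NotifierFinder : ExpressionVisitor
{
    private readonly List<object> _found = new List<object>();

    public static List<object> Find(Expression expression)
    {
        var finder = new NotifierFinder();
        finder.Visit(expression);
        return finder._found;
    }

    // Case 1: the object itself is a constant in the tree.
    protected override Expression VisitConstant(ConstantExpression node)
    {
        Record(node.Value);
        return base.VisitConstant(node);
    }

    // Case 2: a captured local, i.e. a field/property read on a constant closure.
    protected override Expression VisitMember(MemberExpression node)
    {
        var target = node.Expression as ConstantExpression;
        if (target != null && target.Value != null)
        {
            var field = node.Member as FieldInfo;
            Record(field != null
                ? field.GetValue(target.Value)
                : ((PropertyInfo)node.Member).GetValue(target.Value, null));
        }
        return base.VisitMember(node);
    }

    private void Record(object value)
    {
        bool notifies = value is INotifyCollectionChanged || value is INotifyPropertyChanged;
        if (notifies && !_found.Contains(value))
            _found.Add(value);
    }
}
```

For example, for `x => watched.Any(w => w == x)` with `watched` a captured ObservableCollection<int>, Find returns `watched` itself, which is exactly what the invalidation logic needs to subscribe to.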

> Indexing: For indexed queries, I am planning to use a Linq to Lucene
> implementation with an in-memory Lucene store, and index the collection on a
> call to .ToIndexed(). Then I would pass any expressions into the internal
> Linq to Lucene provider. Sounds relatively simple compared to the caching!

I'm not sure I understand how the actual indexing should work. Maybe
you could again give an example to illustrate.

> I’m a bit worried I am biting off more than I can chew, as I am pulling bits
> from blogs here and there, so I would love to know if these approaches seem
> sound(ish)! thanks

The hardest part of building a LINQ provider is usually translating
the query intent into another query language. Building an in-memory
LINQ provider may not be as hard as that, but it's still not an easy
thing to do.

Best regards,
Fabian

On Wed, Dec 19, 2012 at 11:33 AM, Harry McIntyre <mcint...@gmail.com> wrote:
> Hi there. I am trying to make a Linq provider, similar to LinqToObjects with
> two performance improvement features - caching, and indexed search.
>
> I'm a noob to this, so I am hoping for a couple of pointers, and help with
> my current roadblock if possible. I’ve got this stuff in progress, but would
> appreciate being told if I am walking down the wrong path.
>
> Caching: my plan is to create an IQueryable such that the results of
> GetEnumerator are cached in an internal dictionary, based on a cachekey
> built from the expression, and use any referenced INotifyPropertyChanged or
> ObservableCollections to invalidate the hash entry on changes.
>
> My understanding of this is that I need to
>
> a/ visit all ConstantExpressions to see if they are instances of
> INotifying.. or ObservableCollections. I have this working by overriding
> VisitConstantExpression
>
> b/ expand SubQueries so that calling ToString on the Expression results in
> unique cache keys. I was going to do this using the PartialEvaluation from
> http://petemontgomery.wordpress.com/2008/08/07/caching-the-results-of-linq-queries/
> but that doesn’t use Re-Linq, which I think I need for point c/, so I might
> have to re-implement it as a Re-Linq visitor.
>
> c/ visit the subqueries in order to find the INotifyPropertyChanged etc. Am
> stuck here really - what's the best way to do this?
>
> Indexing: For indexed queries, I am planning to use a Linq to Lucene
> implementation with an in-memory Lucene store, and index the collection on a
> call to .ToIndexed(). Then I would pass any expressions into the internal
> Linq to Lucene provider. Sounds relatively simple compared to the caching!
>
> I’m a bit worried I am biting off more than I can chew, as I am pulling bits
> from blogs here and there, so I would love to know if these approaches seem
> sound(ish)! thanks

Harry McIntyre

Dec 20, 2012, 5:13:30 AM
to re-moti...@googlegroups.com

> I'm not sure I understand how the actual indexing should work. Maybe 
> you could again give an example to illustrate. 

I have created a Gist with a 'proof of concept' for using Lucene to query a collection. There are several things I still need to do:
  • watching source objects for changes and updating the index
  • making it return the actual indexed object, not a clone (Lucene.Net.Linq re-materializes the objects from the data in the index. I'd prefer to have it return the real items from the collection)
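On the second point (returning the real items instead of clones): if exact-match lookups are enough, a plain dictionary index avoids re-materialization entirely, because it stores references to the original objects. The C# sketch below swaps Lucene out for a `Dictionary<TKey, List<T>>`, so there is no full-text search or scoring, and it also touches the first point by following the source ObservableCollection<T>'s change notifications. `ExactMatchIndex` is a hypothetical name; it assumes non-null keys and does not notice when the indexed property of an item already in the collection changes.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Collections.Specialized;

// Hypothetical exact-match index: key -> references to the original items.
public sealed class ExactMatchIndex<T, TKey>
{
    private readonly Func<T, TKey> _keyOf;
    private readonly Dictionary<TKey, List<T>> _buckets = new Dictionary<TKey, List<T>>();

    public ExactMatchIndex(ObservableCollection<T> source, Func<T, TKey> keyOf)
    {
        _keyOf = keyOf;
        foreach (T item in source) Add(item);
        source.CollectionChanged += OnCollectionChanged;
    }

    // Returns a snapshot list of the original items whose key equals 'key'.
    public IList<T> Lookup(TKey key)
    {
        List<T> bucket;
        return _buckets.TryGetValue(key, out bucket) ? new List<T>(bucket) : new List<T>();
    }

    private void Add(T item)
    {
        TKey key = _keyOf(item);
        List<T> bucket;
        if (!_buckets.TryGetValue(key, out bucket))
        {
            bucket = new List<T>();
            _buckets.Add(key, bucket);
        }
        bucket.Add(item);
    }

    private void Remove(T item)
    {
        TKey key = _keyOf(item);
        List<T> bucket;
        if (_buckets.TryGetValue(key, out bucket) && bucket.Remove(item) && bucket.Count == 0)
            _buckets.Remove(key);
    }

    private void OnCollectionChanged(object sender, NotifyCollectionChangedEventArgs e)
    {
        if (e.Action == NotifyCollectionChangedAction.Reset)
        {
            // Clear() raises Reset without OldItems, so rebuild from the source.
            _buckets.Clear();
            foreach (T item in (IEnumerable<T>)sender) Add(item);
            return;
        }
        // Add, Remove, Replace and Move all report their items here.
        if (e.OldItems != null) foreach (T item in e.OldItems) Remove(item);
        if (e.NewItems != null) foreach (T item in e.NewItems) Add(item);
    }
}
```

Tracking key changes on existing items would additionally require subscribing to each item's PropertyChanged and re-bucketing it.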

> Hm, can you give a concrete query and describe what exactly you mean?
> Then we can use the example to discuss how to best solve your issue.

I'm going to have a play with the PartialEvaluatingExpressionTreeVisitor and get back to you. Thanks for the pointer.

Michael Ketting

Dec 20, 2012, 11:15:38 AM
to re-moti...@googlegroups.com
Hi Harry!

Since you're planning to drive a Lucene query via LINQ: a couple of months ago we had a thread about writing a LINQ provider for Lucene:
https://groups.google.com/d/topic/re-motion-users/a0K9whLUC98/discussion
Criss's work might help with the querying side of things. I'm not completely sure, though, since he was going for a full-fledged LINQ provider, not an in-memory search optimization.

Regards, Michael

Harry McIntyre

Dec 20, 2012, 11:35:32 AM
to re-moti...@googlegroups.com
Thanks Michael, I'll read up on it. The natty thing is that, with Lucene's built-in in-memory data store, I can easily piggyback on the work of others!