You've asked how to deal with closures. After spending some time thinking about this, I don't think it's possible to support closures, assuming we're thinking of the same thing.
I'm assuming you have code vaguely like this:
public IQueryable<Person> GetPeopleWithFirstName(string firstName)
{
    return GetPeople().Where(e => e.FirstName == firstName);
}
That is, some method or property returns an IQueryable<T>, to which subsequent code applies further extension methods.
Furthermore, you then want to cache the underlying queries, without the cache growing too large.
The problem, as you say, is closures: '.Where(e => e.FirstName == firstName)' creates a closure on firstName. The compiler is, quite literally, both permitting closures to exist and screwing things up for you.
To facilitate the above, the compiler creates a new class to hold the captured firstName value, instantiates that class, populates its firstName field, and embeds a reference to that instance in the expression tree passed to the .Where() extension method. That instance is garbage collected like any other object (how could it not be?).
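For illustration, here is roughly what that amounts to if written out by hand, as a C# sketch using only System.Linq.Expressions. FirstNameClosure and ClosureDemo are hypothetical names; the real compiler-generated class has an unspeakable name (something like <>c__DisplayClass0_0), but the shape is the same: the captured value is a field read off a constant object, not a parameter of the lambda.

```csharp
using System;
using System.Linq.Expressions;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Hand-written stand-in for the compiler-generated closure class.
public sealed class FirstNameClosure
{
    public string firstName;
}

public static class ClosureDemo
{
    // Roughly what `e => e.FirstName == firstName` becomes when it is
    // converted to an Expression<Func<Person, bool>> inside a method.
    public static Expression<Func<Person, bool>> BuildPredicate(string firstName)
    {
        // 1. Instantiate the closure class and populate its field.
        var closure = new FirstNameClosure { firstName = firstName };

        // 2. Build the tree; the lambda's only parameter is `e`.
        ParameterExpression e = Expression.Parameter(typeof(Person), "e");
        Expression body = Expression.Equal(
            Expression.Property(e, nameof(Person.FirstName)),
            // The captured value is reached through a field of a constant
            // object embedded in the tree, not through a lambda parameter.
            Expression.Field(Expression.Constant(closure), nameof(FirstNameClosure.firstName)));

        return Expression.Lambda<Func<Person, bool>>(body, e);
    }
}
```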
So, by the time we get a SelectQuery instance to insert into the cache, it must be referring to this captured firstName value. Furthermore, there's no way to modify the firstName value (because we never keep a reference around to permit modifying it).
This is why, with the current QueryCache code, subsequent uses of the same lambda expression return results for the original parameter values: the captured values are inextricably linked to the SelectQuery instance, and because we don't take the values into consideration (to keep the cache smaller), we get major breakage like what I was seeing in NerdDinner.
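A minimal sketch of that failure mode, assuming a cache that keys entries on the expression's ToString(). That key is a stand-in for whatever shape comparison QueryCache actually does, and ShapeKeyedCache and StaleCacheDemo are hypothetical names, not DbLinq code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Hypothetical cache that, like the current QueryCache, keys entries on the
// shape of the expression and ignores the captured values.
public static class ShapeKeyedCache
{
    static readonly Dictionary<string, Func<Person, bool>> cache =
        new Dictionary<string, Func<Person, bool>>();

    public static Func<Person, bool> GetOrCompile(Expression<Func<Person, bool>> predicate)
    {
        // ToString() renders the captured variable as
        // "value(<closure type>).firstName", without its current value,
        // so every call through the same lambda produces the same key.
        string key = predicate.ToString();
        if (!cache.TryGetValue(key, out Func<Person, bool> compiled))
        {
            // The compiled delegate holds on to THIS call's closure object.
            compiled = predicate.Compile();
            cache[key] = compiled;
        }
        return compiled;
    }
}

public static class StaleCacheDemo
{
    static readonly Person[] People =
    {
        new Person { FirstName = "Jon", LastName = "Smith" },
        new Person { FirstName = "Bob", LastName = "Jones" },
    };

    public static List<Person> GetPeopleWithFirstName(string firstName)
    {
        Func<Person, bool> predicate =
            ShapeKeyedCache.GetOrCompile(e => e.FirstName == firstName);
        return People.Where(predicate).ToList();
    }
}
```

Calling GetPeopleWithFirstName("Jon") and then GetPeopleWithFirstName("Bob") returns Jon both times: the second call is handed the delegate compiled against the first call's closure object.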
I suspect that there is no solution to this: you can't say that set A of captured values should stay with the resulting SelectQuery while set B of captured values needs to be overridden later. Even if you could, there's no mechanism to get the updated values for use in a new SQL generation step, as the only way to get any value is via closure capturing.
QueryCache is busted. The only way to make it work is to do what was originally done (make the captured values part of what is cached), which results in large memory use, as every query variation is stored separately.
(This is why caches and cache policies are actually quite hard to design and implement: a poorly written cache is indistinguishable from a memory leak, and it looks like our existing cache is exactly that. :-( Furthermore, given our cache's unbounded memory growth and the CPU-bound nature of SQL generation, there is probably a point where always generating the Query+SQL "from scratch" is faster than using the cache, because the cache has sucked up several GB of memory and is slowing down the entire process as virtual memory is swapped in and out...)
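For comparison, a sketch of the value-aware approach, under the same assumptions as before: ValueKeyedCache and CapturedValueCollector are hypothetical, and captured values are found by looking for field reads on constant objects, which is how the compiler represents closures. Results are correct, but the key includes every captured value, so the cache grows with each distinct firstName:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Collects the current values of captured variables, i.e. fields read off a
// constant closure object somewhere in the tree.
public sealed class CapturedValueCollector : ExpressionVisitor
{
    public List<string> Values { get; } = new List<string>();

    protected override Expression VisitMember(MemberExpression node)
    {
        if (node.Expression is ConstantExpression closure && node.Member is FieldInfo field)
            Values.Add(field.GetValue(closure.Value)?.ToString() ?? "<null>");
        return base.VisitMember(node);
    }
}

// Hypothetical cache whose key is the expression shape plus the captured
// values: lookups are correct, but every distinct value adds an entry.
public static class ValueKeyedCache
{
    static readonly Dictionary<string, Func<Person, bool>> cache =
        new Dictionary<string, Func<Person, bool>>();

    public static int Count => cache.Count;

    public static Func<Person, bool> GetOrCompile(Expression<Func<Person, bool>> predicate)
    {
        var collector = new CapturedValueCollector();
        collector.Visit(predicate);
        // "|" is a naive separator; good enough for a sketch.
        string key = predicate.ToString() + "|" + string.Join("|", collector.Values);
        if (!cache.TryGetValue(key, out Func<Person, bool> compiled))
        {
            compiled = predicate.Compile();
            cache[key] = compiled;
        }
        return compiled;
    }
}

public static class ValueCacheDemo
{
    static readonly Person[] People =
    {
        new Person { FirstName = "Jon", LastName = "Smith" },
        new Person { FirstName = "Bob", LastName = "Jones" },
    };

    public static List<Person> GetPeopleWithFirstName(string firstName)
    {
        return People.Where(ValueKeyedCache.GetOrCompile(e => e.FirstName == firstName)).ToList();
    }
}
```

Three lookups ("Jon", "Bob", "Jon") give correct results and leave two entries; a million distinct names would leave a million entries.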
The only solution to this that I see is CompiledQuery, which does provide a mechanism to distinguish between captured variables and explicit parameters, e.g.:
string implicitlyCaptured = "Jon";  // captured by a closure
Func<PeopleDB, string, Person> compiledQuery = CompiledQuery.Compile(
    // explicitParameter is supplied anew on every invocation
    (PeopleDB db, string explicitParameter) =>
        db.People.Where(p => p.FirstName == implicitlyCaptured &&
                             p.LastName == explicitParameter)
            .SingleOrDefault()
);
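The CompiledQuery example needs a real DataContext to run, but the same captured-versus-explicit distinction is visible with plain expression trees. This sketch is not CompiledQuery itself, and MixedParameters is a hypothetical name: only explicitParameter appears in the tree's Parameters, while implicitlyCaptured is hidden inside a closure object.

```csharp
using System;
using System.Linq.Expressions;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class MixedParameters
{
    // Same shape as the CompiledQuery example: one value captured from the
    // enclosing scope, one passed as an explicit lambda parameter.
    public static Expression<Func<Person, string, bool>> Build(string implicitlyCaptured)
    {
        return (p, explicitParameter) =>
            p.FirstName == implicitlyCaptured &&
            p.LastName == explicitParameter;
    }
}
```

Compiling the result once and calling it with different last names works, because that value arrives as an argument; the first name stays whatever was captured.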
To return to your previous question: I don't think you can actually do this unless the different methods accept their closure variables as parameters. If that is the case, you could conceivably write a CompiledQuery that is implemented in terms of these other methods and takes explicit parameters to delegate to them. This is untested, but here is what I think may be possible:
var people = CompiledQuery.Compile(
    (PeopleDB db, string firstName, string lastName) =>
        // see earlier definition
        GetPeopleWithFirstName(firstName)
            .Where(p => p.LastName == lastName)
);
You could then store people as a class field (thus caching the pre-parsed query and intermediate SQL with the containing object, so the user has full control over cache policy), and parameters Just Work, without any risk of a cache lookup returning a query bound to the previous values (because 1. there is no centralized cache, and 2. all needed parameters are explicitly provided).
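A sketch of the same idea over in-memory data rather than DbLinq: PeopleRepository is hypothetical, it runs on LINQ to Objects via AsQueryable(), and its GetPeopleWithFirstName is a variant of the earlier helper that takes the queryable and its former closure variable as explicit parameters. The compiled delegate lives in an instance field, so its lifetime (and thus the cache policy) is that of the owning object.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class PeopleRepository
{
    readonly IQueryable<Person> people;

    // Compiled once per repository and kept exactly as long as this object lives.
    readonly Func<IQueryable<Person>, string, string, List<Person>> peopleByName;

    public PeopleRepository(IEnumerable<Person> source)
    {
        people = source.AsQueryable();

        // Composed from the helper; every value is an explicit parameter.
        Expression<Func<IQueryable<Person>, string, string, List<Person>>> query =
            (db, firstName, lastName) =>
                GetPeopleWithFirstName(db, firstName)
                    .Where(p => p.LastName == lastName)
                    .ToList();

        peopleByName = query.Compile();
    }

    // Hypothetical variant of the earlier helper: the value it used to
    // capture is now passed in explicitly.
    public static IQueryable<Person> GetPeopleWithFirstName(IQueryable<Person> db, string firstName)
    {
        return db.Where(e => e.FirstName == firstName);
    }

    // Each call supplies its own values; nothing from a previous call is reused.
    public List<Person> Find(string firstName, string lastName)
    {
        return peopleByName(people, firstName, lastName);
    }
}
```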
Hope this provides some food for thought.
- Jon