Ad Hoc Queries


Ayende Rahien

Sep 23, 2010, 6:56:41 AM
to ravendb
I figured out a way to execute ad hoc queries against Raven without causing a memory leak, mostly in the interest of being able to run tests or to export data.
The API isn't exposed over HTTP at the moment, but the C# API is:

 var result = db.ExecuteQueryUsingLinearSearch(new LinearQuery
 {
     Query = "from doc in docs select new { doc.Name.Length } "
 });

Please note a few things:
  • As the method name implies, this is an O(n) operation
  • There is support for paging, but there is no support for finding out the total number of matching records.
  • There is no support for parameters, and as long as it is test focused, I don't think that I'll add it.
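To make the O(n) and paging remarks concrete, here is a minimal, self-contained sketch of what a linear-scan query amounts to. It does not use the RavenDB API: `LinearScan.Run`, its parameters, and the in-memory document list are illustrative assumptions only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinearScan
{
    // Hypothetical helper, not part of RavenDB: applies a projection to every
    // document it visits and returns a single page of the projected output.
    public static List<TResult> Run<TDoc, TResult>(
        IEnumerable<TDoc> docs,
        Func<TDoc, TResult> projection,
        int start,
        int pageSize)
    {
        // Skip/Take stop enumerating once the page is full, so the scan never
        // learns how many results exist in total.
        return docs.Select(projection).Skip(start).Take(pageSize).ToList();
    }

    public static void Main()
    {
        var names = new[] { "Ayende", "Rob", "Oren" };
        // Mirrors "from doc in docs select new { doc.Name.Length }".
        var page = Run(names, name => name.Length, 1, 2);
        Console.WriteLine(string.Join(",", page)); // prints "3,4"
    }
}
```

Because the page is cut off as soon as it is full, reporting a total count would require scanning everything, which is presumably why it is not offered.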

Rob Ashton

Sep 23, 2010, 11:31:09 AM
to ravendb
Okay, I've been through and checked it out and before I go ahead and
create a system along the lines we discussed on Twitter I want to
check a few facts about this particular implementation of dynamic
queries.

In order to avoid memory leaks, you're creating a new app domain which
contains a query cache. When a LinearQuery is run against the
QueryRunner, that query is stored in the cache, and when the query
runner contains 1024 queries, the app domain is flushed and the
process starts all over again.

This query runner is passed an IRemoteStorage, from which it retrieves
the documents in batches, and enumerates through them plucking out the
docs that meet the requirements of the query passed in. As far as I
can see, this doesn't use Lucene indexes, but the raw query passed in?
Am I understanding that correctly? Does that mean you potentially will
get different results from traditional indexes?

---

I was planning on doing the dynamic index system as a bundle, and
actually creating/destroying indexes (for all queries, potentially
taking a long time the first time a query is invoked)- is this going
to be a problem? My initial plan was to tack onto whatever you had
already written, but if Lucene indexes aren't being used for ad-hoc
queries then that plan is scuppered (I think)

Ayende Rahien

Sep 23, 2010, 11:41:09 AM
to ravendb
Yes, the idea here is that the output of an ad hoc query is whatever the output of the linq query is.
I am not sure HOW we could use indexes for that, or why we would want that.
With indexes, in much the same way, we take the output from the linq query and put it in Lucene.
Here, we are simply returning the output.
I am not sure that I understand what you mean by raw query passed in.

What would YOU consider for querying on dynamic indexes?

Rob Ashton

Sep 23, 2010, 11:46:32 AM
to ravendb
Okay, I get you - so this is just a nice way of debugging the contents
of the document store via linq queries
I think for *that* purpose, using indexes would be a folly, but it'll
need to be pointed out quite loudly that this is what it does and that
it's very different to what you'd get from an actual index.

Also yes yes, you're quite right - this is what you meant by
parameters not being there, I didn't read your post properly.

Rob Ashton

Sep 23, 2010, 11:47:40 AM
to ravendb
So basically ignore my question entirely because I'm talking nonsense
=)

Rob Ashton

Sep 23, 2010, 11:54:00 AM
to ravendb
Anyway, moving on and ignoring the above drivel - I've remembered how
RavenDB works now...

An implementation of parametrized dynamic queries could:

- Use the above to get the documents, put the results in a temporary
lucene index, wait for the indexing to occur, query that lucene index
and then delete the lucene index
- If the same query is called a number of times in a certain time
period, an actual index could be created and queried instead (given a
unique name based on the fields being queried)
- If the query isn't invoked after a set amount of time, it would be
deleted

This would potentially have less overhead than creating a proper index
each time and waiting for the system to index the data?

My alternative suggestion is to create an index for all documents,
full of dynamic fields and query that too, but that has the
disadvantage that it works completely differently to how the rest of
RavenDB functions

Ayende Rahien

Sep 23, 2010, 12:20:14 PM
to ravendb
Problems:
You can only do that for the simplest scenarios, such as:
Name:Ayende

It also ignores big optimizations, such as the ability to only scan specific entity names.

Rob Ashton

Sep 23, 2010, 12:34:56 PM
to ravendb
I assume by simplest scenarios you mean to do with the queries
themselves rather than the indexes

For example

select new {
length = name.Length
}

vs

select new {
name = name
}

Because you'd probably want ints/etc to be treated differently from
strings. I foresee a convention where different types are stored in
Lucene with their nearest match? I can't see any problem with the
indexes themselves being complex, just the parameters/lucene queries
themselves? We have to make some concessions when it comes to dynamic
queries and what the user can expect to happen under specific
circumstances.

Where is the optimisation made to only scan specific entity names?
(Just a class name will do, I can work out how it works myself.) I
assumed this would be handled up in the web application as part of
creating the index rather than in the storage/indexing engine itself.

Ayende Rahien

Sep 23, 2010, 12:39:51 PM
to ravendb
Let me rephrase that.
Dream up the API from the client side to do this.

Rob Ashton

Sep 23, 2010, 12:56:22 PM
to ravendb
I think I am missing something :)

client.DynamicQuery<BlogEntry>()
.Where(b=> b.Category == "Fish" && b.Title.Length > 10)

I was planning on looking at the expression in the client API, fishing
out the data on each comparison, and sending all of this information
across the wire to a custom Responder
I was then going to work out what fields need indexing based on this
data and construct a map statement from this information

It's a little bit complicated, because it would require
yet_another_linq_provider (or a bolt-on to the current one), but I'd
envisage support starting off simple and growing over time

The data sent across the wire would be something like

doc.Category, ":Fish"
doc.Title.Length, ":[1 TO *]" (or whatever the lucene is for this)

Am I missing a really obvious no-no in any of this? Surely it's all
just a case of analysing the linq expressions?
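As a rough illustration of "construct a map statement from this information", the sketch below turns a list of field paths into map text. `MapBuilder`, the underscore naming convention, and the exact map text are assumptions made for illustration, not what the bundle actually generates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapBuilder
{
    // Hypothetical convention: "Title.Length" becomes the index field "Title_Length".
    public static string FieldName(string path)
    {
        return path.Replace('.', '_');
    }

    // Builds the text of a map statement that projects every requested path.
    public static string Build(IEnumerable<string> paths)
    {
        var fields = paths
            .Distinct()
            .OrderBy(p => p, StringComparer.Ordinal) // stable order for the same set of fields
            .Select(p => FieldName(p) + " = doc." + p);
        return "from doc in docs select new { " + string.Join(", ", fields) + " }";
    }

    public static void Main()
    {
        Console.WriteLine(Build(new[] { "Title.Length", "Category" }));
        // prints "from doc in docs select new { Category = doc.Category, Title_Length = doc.Title.Length }"
    }
}
```

Sorting the paths means two queries over the same fields yield identical map text, which would make it easier to share one index between them.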

Ayende Rahien

Sep 23, 2010, 1:03:18 PM
to ravendb
No, you aren't. I am being obtuse. Yes, sounds perfectly fine to me.
Probably best to also check if existing indexes can match this, which would also remove the need to create one specially.

Rob Ashton

Sep 23, 2010, 1:06:23 PM
to ravendb
My plan is (with the knowledge that the above *is* indeed possible), to
start with the responder and write the functionality from there whilst
bearing in mind the needs of being able to write the Client API - as I
want to map out the various options within the server and assess the
practicality before embarking on linq expression parsing madness.

I'll let you know how I get on

Ayende Rahien

Sep 23, 2010, 1:08:39 PM
to ravendb
Awesome.

The Bitland Prince

Sep 23, 2010, 1:51:32 PM
to ravendb

By the way, for what it's worth, I think ad hoc queries could be very
useful. It's great to have a simple way to scan documents to perform
tests or, as you said, to export data to a different storage. It makes
choosing Raven comfortable enough to be trusted, especially for
people new to document databases like me.

Moreover, it makes migrating documents to other objects within the
same database very easy to perform.

Ayende Rahien

Sep 23, 2010, 1:58:05 PM
to ravendb
Yeah, the idea of doing migrations that way is pretty compelling.

Rob Ashton

Sep 23, 2010, 5:52:28 PM
to ravendb
Right, well the code for it isn't beautiful, but I've got this test
passing:

 [Fact]
 public void CanPerformDynamicQueryAndGetValidResults()
 {
     var blogOne = new Blog
     {
         Title = "one",
         Category = "Ravens"
     };
     var blogTwo = new Blog
     {
         Title = "two",
         Category = "Rhinos"
     };
     var blogThree = new Blog
     {
         Title = "three",
         Category = "Rhinos"
     };

     using (var s = store.OpenSession())
     {
         s.Store(blogOne);
         s.Store(blogTwo);
         s.Store(blogThree);
         s.SaveChanges();
     }

     var results = server.Database.ExecuteDynamicQuery(new Bundles.DynamicQueries.Data.DynamicQuery()
     {
         FieldMap = "Title:title,Category:category,Title.Length:titleLength",
         PageSize = 128,
         Start = 0,
         Query = "titleLength:3 AND category:Rhinos"
     });

     Assert.Equal(1, results.Results.Length);
     Assert.Equal("two", results.Results[0].Value<string>("Title"));
     Assert.Equal("Rhinos", results.Results[0].Value<string>("Category"));
 }

I don't like that I'm having to pass through a map of the fields, but
the alternative is to parse a lucene query and perform a replace on
certain values. I'll leave it like that and I've exposed that
functionality as is in the HTTP API, easy enough to change once I'm
done with doing more important things like the optimisation we were
talking about etc.

Ayende Rahien

Sep 23, 2010, 9:17:07 PM
to ravendb
Is there a reason that the field map is a single string and not an array of them? An array would seem easier to work with.

Rob Ashton

Sep 24, 2010, 3:03:23 AM
to ravendb

Just because they came from the query string that way :), probably
should move the responsibility for parsing that up a level and no
doubt I will.

An important point about this POC is that I am going through the
normal system for creating the index and waiting for it to be indexed,
for performance I am guessing this is suboptimal - and I should take
care of that manually instead of relying on the usual background tasks
- just wanted to make sure I had all the information required for the
task before I went down that route.

I'll bash out a more complete attempt today and push to my fork so
work in progress can be seen



Rob Ashton

Sep 24, 2010, 5:46:07 AM
to ravendb
That's a point actually, let's discuss preferences for the HTTP API
while we're here

Currently it looks like this

/dynamicquery?query=name:ayende&mapping=User.Name:name

It's a bit lame, and I'd prefer not to be specifying what the mapping
is at all because I'd like to generate that myself when I generate the
map statement (means greater potential for the sharing of indexes
between similar queries and less responsibility for the client)

I could do

/dynamicquery?query=User.Name:ayende

But that means parsing the Lucene query to extract the fields and
replace them with whatever I put in the map query. This is (in my
opinion) a challenge to do properly. I could use QueryParser from
Lucene.net, but that means using the right analyser, and I'm not
confident it wouldn't be a brittle solution. Doing it manually using a
rudimentary parser might work, but I can't find a formal grammar for
Lucene queries.

A simpler alternative would be to do something like this

/dynamicquery?query={User.Name}:ayende

And parse for the braces, replacing them with whatever I choose to
name the mapped fields. This is my preference, but there is no
analogue for it in the rest of the API
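A minimal sketch of the brace approach, assuming the server picks the mapped field name by swapping dots for underscores. `BracePlaceholders` and that naming rule are hypothetical, chosen only to show the substitution step.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracePlaceholders
{
    // Replaces each {Path} with whatever name nameFor chooses for the mapped field.
    public static string Rewrite(string query, Func<string, string> nameFor)
    {
        return Regex.Replace(query, @"\{([^{}]+)\}", m => nameFor(m.Groups[1].Value));
    }

    public static void Main()
    {
        var rewritten = Rewrite(
            "{User.Name}:ayende AND {User.Age}:30",
            path => path.Replace('.', '_')); // assumed naming convention
        Console.WriteLine(rewritten); // prints "User_Name:ayende AND User_Age:30"
    }
}
```

One caveat: Lucene's own query syntax uses curly braces for exclusive range queries, so a real implementation would need some way to tell a placeholder from a range.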

Rob Ashton

Sep 24, 2010, 9:43:37 AM
to ravendb
I've managed to fit a bit more work into my lunch hour, and now have a
more instant way of doing dynamic queries (create lucene index
manually, index the documents manually by paging through them, then
query, then delete the lucene index)

Code is here:

http://github.com/robashton/ravendb/blob/f910ff7d5e07b01e4c9b7edb20ccf0dd66da0dd3/Bundles/Raven.Bundles.DynamicQueries/Database/DatabaseExtensions.cs

My only concern (beyond anything I might be doing wrong that I don't
know about yet), is that I'm having to create an entry in the Esent DB
- the only reason I have to do this, is that when you run a query
against an index, statistics are generated against that index and an
error is thrown if the index doesn't exist in Esent.

I'd prefer not to be creating an entry storage just for this purpose,
as I need to persuade the background task not to perform any indexing
on this index if I do. If I could get by without creating the entry in
storage, then I wouldn't need to worry about background tasks - the
temporary index would truly be temporary and there would be no
concerns of clashing.

Nonetheless, this implementation does seem to work for small numbers
of documents (I need to put 30,000+ through and make sure it functions
as expected).

Next up I'll start thinking about creating permanent indexes and
choosing to use those etc.

Ayende Rahien

Sep 24, 2010, 12:06:59 PM
to ravendb
Okay, I did a review, and I have a few comments.
a) I don't like the DynamicQueryMap, there really isn't any need to do this.

 [Fact]
 public void Parsing()
 {
     var query = new QueryParser(Version.LUCENE_29, "", new StandardAnalyzer()).Parse("Title.Length:5 Category:Users");
     var terms = new Hashtable();
     query.ExtractTerms(terms);

     var fields = new HashSet<string>();

     foreach (Term term in terms.Keys)
     {
         fields.Add(term.Field());
     }
 }

Remember, you don't actually care about the query at this stage, you only care about the field names, and this gives it to you, then you can construct the index def.

b) I would actually think that it would be better to create a temporary index, but NOT delete it. Rather, set up a timer to remove it after some amount of time. If there are enough requests for the index in that duration, make it permanent.

c) For that matter, I wouldn't actually wait for the entire index to be built. All we need is enough results to satisfy the PageSize, after all. So I would just query it until either the results are non stale or I have enough results for the page.
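Point (c) could be sketched roughly as below. Nothing here is RavenDB code: `PageResult`, `WaitForPage`, and the `maxAttempts` safety limit are assumptions added so the loop is self-contained and cannot spin forever.

```csharp
using System;
using System.Threading;

// Hypothetical result shape; the real query result type may differ.
public class PageResult
{
    public int Count;
    public bool IsStale;
}

public static class TempIndexQuery
{
    // Re-runs the query until the index has caught up or the page is full,
    // giving up after maxAttempts so a busy server cannot block the caller forever.
    public static PageResult WaitForPage(
        Func<PageResult> runQuery, int pageSize, int maxAttempts, TimeSpan delay)
    {
        var result = runQuery();
        for (var attempt = 1; attempt < maxAttempts; attempt++)
        {
            if (!result.IsStale || result.Count >= pageSize)
                break;
            Thread.Sleep(delay);
            result = runQuery();
        }
        return result;
    }

    public static void Main()
    {
        var indexed = 0;
        // Fake query: each call pretends two more of five documents were indexed.
        Func<PageResult> fake = () =>
        {
            indexed = Math.Min(indexed + 2, 5);
            return new PageResult { Count = indexed, IsStale = indexed < 5 };
        };

        var page = WaitForPage(fake, 3, 10, TimeSpan.FromMilliseconds(1));
        Console.WriteLine(page.Count + " results, stale: " + page.IsStale);
        // prints "4 results, stale: True"
    }
}
```

The example returns while the index is still stale because the requested page of three is already full, which is the trade-off being proposed.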

Rob Ashton

Sep 24, 2010, 12:24:10 PM
to ravendb
Okay, if it's that easy to extract the terms then that's fine by me -
I'll do that - makes for a *much* better API

RE Creating a temporary index and not deleting it, I think you're
right, and that's the logical next step, I'll need to name the indexes
more sensibly but that's how I plan to do the optimisation.

If I'm doing that, then I don't mind it going into Esent, but I'd
still prefer to manually run the index through Lucene rather than
waiting for it to happen (if that makes sense). I need to prevent the
normal indexing process from occurring if I do that though won't I?

I'll carry on chugging through this when I get home tonight and get
the persistence/deletion thing going on - this is the right path I'm
going down though right?

Rob Ashton

Sep 24, 2010, 12:25:51 PM
to ravendb
I'm not sure if you're right about page size, if they're asking for
any kind of ordering in their query then the index does need to be
built?

Rob Ashton

Sep 24, 2010, 12:37:51 PM
to ravendb
Having said that, this is possibly just a question of expectations:

Your expectation is that querying in this way is going to potentially
give you a hyper-stale result?
I was thinking that a call to a dynamic query would give you results
that were up to date at the point of call time.

I should perhaps be passing in a cut-off parameter like the other
querying methods, and letting the consumer of the service decide?

Rob Ashton

Sep 24, 2010, 12:45:35 PM
to ravendb
Sorry for the spam, this is just how I think when I haven't got an IDE
in front of me to write code with :)

One more addition before I go home to start work on making this more
satisfactory, if I am indeed going to create the index and leave it
there, then perhaps I am better off doing what I did originally and
just waiting for indexing to happen - it is likely to be less
performant when the system is busy, but if this is on the
understanding that future calls to this dynamic query will be fast
because the index is already created then this isn't really a problem
is it?

Ayende Rahien

Sep 24, 2010, 2:04:35 PM
to rav...@googlegroups.com
A few other things to consider:
Making it a RAM dir index would be faster.
You can modify the index code to not require the stats, although I
think we want that.
Think about it from an admin POV:
I wanna know what is going on there, even for temp indexes.

Ayende Rahien

Sep 24, 2010, 2:05:36 PM
to rav...@googlegroups.com
Hm
Not sure about that
Lucene ordering requires specifying sort order, how would you do that?


Ayende Rahien

Sep 24, 2010, 2:07:12 PM
to rav...@googlegroups.com
Since we expect to reuse the query, I think we can use the same logic
as elsewhere.
Another point in favor of the index staying in Esent.


Ayende Rahien

Sep 24, 2010, 2:08:38 PM
to rav...@googlegroups.com
No, just wait for the page to be full the first time.
The second time, normal rules apply.


Rob Ashton

Sep 24, 2010, 3:32:02 PM
to ravendb
Right, so:

When first creating index, wait until either all docs have been
indexed or selected page is full (whatever comes first)
Index goes in Esent as per usual
I thought you specified sort order on point of query when performing
the lucene search (Don't need to play with the definition for this to
work do we?)

I'll make it so and then play with it, no point in dallying. Finally
home after car breaking on me, so I'm 2 hours behind where I wanted to
be!

Rob Ashton

Sep 24, 2010, 3:40:12 PM
to ravendb
Super - your lucene query code is brilliant

 [Fact]
 public void CanPerformDynamicQueryAndGetValidResults()
 {
     var blogOne = new Blog
     {
         Title = "one",
         Category = "Ravens"
     };
     var blogTwo = new Blog
     {
         Title = "two",
         Category = "Rhinos"
     };
     var blogThree = new Blog
     {
         Title = "three",
         Category = "Rhinos"
     };

     using (var s = store.OpenSession())
     {
         s.Store(blogOne);
         s.Store(blogTwo);
         s.Store(blogThree);
         s.SaveChanges();
     }

     var results = server.Database.ExecuteDynamicQuery(new Bundles.DynamicQueries.Data.DynamicQuery()
     {
         PageSize = 128,
         Start = 0,
         Query = "Title.Length:3 AND Category:Rhinos"
     });

     Assert.Equal(1, results.Results.Length);
     Assert.Equal("two", results.Results[0].Value<string>("Title"));
     Assert.Equal("Rhinos", results.Results[0].Value<string>("Category"));
 }

<3

Ayende Rahien

Sep 24, 2010, 3:47:48 PM
to ravendb
Great.
Now, give it a shot with a linq query, which should also work.
The next hurdle is complex queries, such as:

from user in docs.Users
from role in user.Roles
select new { Role = role }

I am not sure if we want / need to support such things, though.
Thoughts?

Rob Ashton

Sep 24, 2010, 4:01:29 PM
to ravendb
I think that's a different challenge entirely and perhaps something we
don't want to support unless it actually becomes useful to do so.

A major problem there would be how to even represent that, as we're
currently passing in an actual lucene query and reverse engineering a
linq statement from it

Ayende Rahien

Sep 24, 2010, 4:21:09 PM
to rav...@googlegroups.com
Yeah, that is pretty much what I am thinking.
We can probably do better, though.

Assume that we had the following Lucene query on the User.Roles model.

Roles:Administrator

We could set things up so that trying to index an IEnumerable would result in multiple fields being emitted for Lucene.

The linq query would be:

from user in docs.Users
select new { Roles = user.Roles }

But the query should work.

Rob Ashton

Sep 24, 2010, 4:24:18 PM
to ravendb
Okay, well I'm game to give it a go if I've still got time once I've
refined what I've got and sorted out a client API for it
Not touching work code all weekend and with my car in the garage I'm
probably not going anywhere :(


Ayende Rahien

Sep 24, 2010, 5:03:00 PM
to rav...@googlegroups.com
Cool, let me know when you push next

Rob Ashton

Sep 24, 2010, 6:26:08 PM
to ravendb
Flying through this now, a query for anytime between now and the final
push

Generating the index name, in an ideal world it would tell us what was
in the index, but how sustainable is this?

 String combinedFields = String.Join("",
     map.Items
         .OrderBy(x => x.To)
         .Select(x => x.To)
         .ToArray());

Am I going to have to do a hash of those fields to get the index name?
I assume the maximum index name length is going to be hit sooner or
later by somebody

Ayende Rahien

Sep 25, 2010, 2:46:00 AM
to rav...@googlegroups.com
The max field name is something like 255 characters.
I would say that you want a hash if and only if the name is longer than that; otherwise, use a human-readable string.
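A possible shape for that rule, as a sketch only. `IndexNames`, the 255-character constant (taken from the rough figure above), and the choice of SHA-1 are all assumptions; the thread does not say which hash is used.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class IndexNames
{
    // Assumed limit, based on the "something like 255 characters" estimate.
    public const int MaxLength = 255;

    public static string For(string[] fields)
    {
        // Sort so the same set of fields always produces the same name.
        var readable = string.Join("", fields.OrderBy(f => f, StringComparer.Ordinal).ToArray());
        if (readable.Length <= MaxLength)
            return readable;

        // Too long: fall back to a fixed-length hex digest of the readable name.
        using (var sha1 = SHA1.Create())
        {
            var hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(readable));
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    public static void Main()
    {
        Console.WriteLine(For(new[] { "Title.Length", "Category" })); // prints "CategoryTitle.Length"

        var many = Enumerable.Range(0, 100).Select(i => "Field" + i).ToArray();
        Console.WriteLine(For(many).Length); // prints "40"
    }
}
```

Joining with an empty separator follows the earlier snippet; a separator character would avoid two different field sets collapsing into the same name.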

Rob Ashton

Sep 25, 2010, 5:54:56 AM
to ravendb
Righto.

A further query now I'm getting properly into the client side work - I
started this as a bundle because it seemed to me putting it in the
core build might encourage people to use it more than they perhaps
should - but is this the right decision?

I'm going to have a hard time carrying on doing this as a bundle if
I'm going to add the client API for this. This is because there are no
easy extension points on IDatabaseCommands for adding further
functionality, because of course the underlying implementation may
change depending on whether you're pointing it at HTTP/direct access
(or TCP?)

I'm happy to crack it open and try to add the relevant extension
points, but this would represent another non-trivial task so I'm not
willing to do it unless we decide we *really* want this as a bundle.
For now I'm writing the code directly against the client, in the
knowledge that refactoring it out will be easier once I know exactly
what it needs.

I am beginning to lean towards not having it as a bundle, as the way
we're now implementing it means it is going to actually be a useful
piece of functionality for lowering the entry barrier of just 'picking
up and playing' with RavenDB

Rob Ashton

Sep 25, 2010, 6:37:16 AM
to ravendb
We could even go so far as to carry on using the Indexes endpoint, and
accept "Temp" as an index name, but that might be going too far

Ayende Rahien

Sep 25, 2010, 7:06:06 AM
to rav...@googlegroups.com
I agree that this would be useful as a core piece of functionality.
We can probably do it in the same way replication is in, as a core piece of the client, which is only available if the bundle is installed on the server

Ayende Rahien

Sep 25, 2010, 7:06:30 AM
to rav...@googlegroups.com
I actually like that better.
It makes the HTTP API simpler.
Even if we put it in the core, I like /indexes/dynamic?query=....

Rob Ashton

Sep 25, 2010, 7:17:06 AM
to ravendb
Aha, then I'll get what I'm working on finished and then move it into
the core along with the tests (both server and client)

It really does keep the HTTP API simple - it also means this
automatically gets support for Includes and whatever you choose to
add to index queries in the future. It turns out that, once I'd finished
experimenting with the different ways of supporting this, this was
the natural conclusion of those efforts.

Hopefully that should be the last "dumb question" until I get the full
experience finished

Rob Ashton

Sep 25, 2010, 9:39:20 AM
to ravendb
It's quite embarrassing how long it's taken me to come to such a
simple solution, but oh well - pushed to my fork is functionality
like:

 using (var s = store.OpenSession())
 {
     var results = s.DynamicQuery<Blog>()
         .Customize(x => x.WaitForNonStaleResultsAsOfNow())
         .Where(x => x.Category == "Rhinos" && x.Title.Length == 3)
         .ToArray();

     Assert.Equal(1, results.Length);
     Assert.Equal("two", results[0].Title);
     Assert.Equal("Rhinos", results[0].Category);
 }

and

 using (var s = store.OpenSession())
 {
     var results = s.DynamicLuceneQuery<Blog>()
         .Where("Title.Length:3 AND Category:Rhinos")
         .WaitForNonStaleResultsAsOfNow()
         .ToArray();

     Assert.Equal(1, results.Length);
     Assert.Equal("two", results[0].Title);
     Assert.Equal("Rhinos", results[0].Category);
 }

I wouldn't class it as "done", trivial tidy up is needed (moving the
magic string "dynamic" to a constant somewhere - things like that)

A few things to be aware of, and something I'm now going to devote
my afternoon to:

1) I had to modify the Linq provider to start using the full path to
the member instead of the name of the member, I don't think this will
break non-dynamic queries, but it's necessary for dynamic queries to
work (this is in RavenQueryProviderProcessor.GetMember)

2) I've had to do a hack where I remove [[ and ]] from the query when
running StandardAnalyzer over it to extract the terms as it doesn't
like them. It probably doesn't like other things either, and this was
my worry over parsing the Lucene query - I guess fix the issues as I
come across them and start thinking about using a better analyzer

3) I'm currently setting fields to NotAnalyzed by default which makes
my test queries function correctly, this is obviously not desired, but
reflects the assumption by the Linq provider that if it's anything
other than a string it is analyzed, and if it's a string it's not
analyzed

4) I've not yet added the code to delete obsolete indexes

5) I've not yet added the code to hash the index name if it's greater
than a certain number of characters

I intend on getting #4 and #5 sorted immediately

#2 and #3 I'm going to mull over, and by writing some tests against
dynamic queries for common use cases I aim to find the common pitfalls
and from that establish some more intelligent code from which to
generate more appropriate indexes.

That will lead nicely onto collection types (multiple froms), because
without a more intelligent solution to #3 that's not going to be
possible


Ayende Rahien

Sep 25, 2010, 10:31:59 AM
to rav...@googlegroups.com
1) Hm, what do you mean, what is the difference?
2) Probably need to run it through the same process as standard queries go through, which would give the same basic result.
3) Note that we intend to move in that direction anyway (default for NotAnalyzed)

Rob Ashton

Sep 25, 2010, 10:58:01 AM
to ravendb
1)

If you just use MemberInfo on an expression like

x=> x.Title.Length == 3

The query generated looks like

Length:3

Which is fine for pre-built indexes because that's what the mapped
field will no doubt be called

What you actually want is

Title.Length:3

2) I'll have a look at that and see if I can't do that
3) Okay, that's re-assurance at least
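The difference in point 1 can be reproduced with plain expression trees. This is a sketch under assumptions, not the code in RavenQueryProviderProcessor.GetMember: `MemberPaths.FullPath` is a hypothetical helper and the `Blog` class is an assumed shape for the entity used in the tests.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Assumed shape of the test entity.
public class Blog
{
    public string Title { get; set; }
    public string Category { get; set; }
}

public static class MemberPaths
{
    // Walks a chain such as x.Title.Length from the outermost member inwards
    // and joins the member names with dots.
    public static string FullPath(Expression expression)
    {
        var names = new Stack<string>();
        var member = expression as MemberExpression;
        while (member != null)
        {
            names.Push(member.Member.Name);
            member = member.Expression as MemberExpression;
        }
        return string.Join(".", names.ToArray());
    }

    public static void Main()
    {
        Expression<Func<Blog, bool>> filter = x => x.Title.Length == 3;
        var left = ((BinaryExpression)filter.Body).Left;

        Console.WriteLine(((MemberExpression)left).Member.Name); // prints "Length"
        Console.WriteLine(FullPath(left));                       // prints "Title.Length"
    }
}
```

For a single-level member such as x.Category both approaches give the same name, which is consistent with flat, pre-built indexes being unaffected.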

Ayende Rahien

Sep 25, 2010, 11:31:09 AM
to rav...@googlegroups.com
1 will probably break stuff for us

Rob Ashton

Sep 25, 2010, 11:55:28 AM
to ravendb
I've tried to think of things it *would* break, but to no avail (If it
truly does, then I need to split out some of the functionality from
the linq provider and create two modes)

Because we previously have only allowed queries against indexes, the
structure of indexes has been flat (IE, no complex properties)

 public class MySuperCoolIndex
 {
     public string Title { get; set; }

     public string TitleLength { get; set; }
 }

The behaviour of this won't have changed with this modification, unless
I've missed something since I last wrote code against RavenDB - have
you got an example?

Ayende Rahien

Sep 25, 2010, 12:04:56 PM
to rav...@googlegroups.com
Hm, you are correct, please ignore my previous comment.

Rob Ashton

Sep 25, 2010, 12:13:01 PM
to ravendb
In other news, I think I've over promised - I'm not exactly sure
*where* to add the logic for "if this dynamic index hasn't been
invoked in the past <insert time here> then delete it"

Running it as an occasional task is fine, I've no problem with that -
but as far as I can see we don't log/mark when an index was last
invoked - in order to get accurate behaviour after shutdown/restart
we'd surely need to store information like this in our persistent
storage (Esent?).

Unless it's obvious and I'm just being dumb can I leave this one to
you once I've tidied up the rest of my code?

Ayende Rahien

Sep 25, 2010, 12:49:24 PM
to rav...@googlegroups.com
1) on startup, delete all temp indexes
2) keep the last queried timestamp in memory
3) when the # of queries per timespan increases over X, create real index and delete temp one.
4) every Y time, cleanup old indexes.
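The four steps above could be tracked with something like the following in-memory sketch. `TempIndexTracker`, its thresholds, and the index name used in the example are hypothetical; creating or deleting the actual indexes is left to the caller, and the class is not thread-safe as written.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TempIndexTracker
{
    // Kept only in memory (step 2); a fresh tracker after a restart starts
    // empty, which matches deleting all temp indexes on startup (step 1).
    private readonly Dictionary<string, List<DateTime>> hits =
        new Dictionary<string, List<DateTime>>();
    private readonly int promoteAfter;
    private readonly TimeSpan window;

    public TempIndexTracker(int promoteAfter, TimeSpan window)
    {
        this.promoteAfter = promoteAfter;
        this.window = window;
    }

    // Records a query and returns true once the index has been queried more
    // than promoteAfter times within the window (step 3).
    public bool RecordQuery(string indexName, DateTime now)
    {
        List<DateTime> times;
        if (!hits.TryGetValue(indexName, out times))
            hits[indexName] = times = new List<DateTime>();
        times.Add(now);
        times.RemoveAll(t => now - t > window);
        return times.Count > promoteAfter;
    }

    // Returns the indexes not queried within the window and forgets them (step 4).
    public List<string> Cleanup(DateTime now)
    {
        var stale = hits.Where(p => now - p.Value.Max() > window)
                        .Select(p => p.Key)
                        .ToList();
        foreach (var name in stale)
            hits.Remove(name);
        return stale;
    }

    public static void Main()
    {
        var tracker = new TempIndexTracker(2, TimeSpan.FromMinutes(10));
        var t0 = new DateTime(2010, 9, 25, 12, 0, 0);
        Console.WriteLine(tracker.RecordQuery("Temp_Category", t0));               // prints "False"
        Console.WriteLine(tracker.RecordQuery("Temp_Category", t0.AddMinutes(1))); // prints "False"
        Console.WriteLine(tracker.RecordQuery("Temp_Category", t0.AddMinutes(2))); // prints "True"
        Console.WriteLine(tracker.Cleanup(t0.AddMinutes(30)).Count);               // prints "1"
    }
}
```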

Rob Ashton

Sep 25, 2010, 12:58:47 PM
to ravendb
Well, I've fixed the query analysis to use the same functionality as
the rest of RavenDB so I've been able to remove that hack ( #2 ) and
I've started hashing index names when they're longer than 240 characters
(leaving space for a prefix "Temp_" so we can identify them).

I've also sorted out my negligence and added the ability to query
dynamic indexes to a local document store

I'm going to submit that pull request and leave it there for now,
although once you've done what you need to do I'll no doubt attack it
with a few more examples and start thinking about more complicated
queries.

That just leaves #4 and whatever tidy ups you want to do

Rob Ashton

Sep 25, 2010, 12:59:52 PM
to ravendb
Oh, this will teach me for not hitting refresh - I've got time to do
that before I go

Rob Ashton

Sep 25, 2010, 1:00:31 PM
to ravendb
(That was a bit different from how I was seeing it working, but
definitely up to task)

Rob Ashton

Sep 25, 2010, 2:27:56 PM
to ravendb
Done (I think), and now I'm off to party

Currently set to 100 times in 10 minutes (or an average similar to
that since time of start), maybe needs tweaking but at least the logic
is started

Rob Ashton

Sep 25, 2010, 3:04:27 PM
to ravendb
Just realised I've forgotten to push my final work in, you're missing
periodic cleanup.

Can't touch that till I get home now, doh!

Rob Ashton

Sep 26, 2010, 10:26:47 AM
to ravendb
I did this last night by the way, it's in to a reasonable standard -
spending some time thinking about more advanced queries now,
@ScottGal's expectation of

 from doc in docs
 from tag in doc.Tags
 select new
 {
     Name = tag.Name
 }

being accessible via

 Where(x => x.Tags.Any(y => y.StartsWith("Fi")))

Has got me thinking about how this would work with both real indexes
and dynamic indexes - probably a thought for a bigger discussion than
that really, it's another "how do we map Linq onto our Lucene indexes"
problem

Rob Ashton

Sep 26, 2010, 10:41:24 AM
to ravendb
My current thoughts are

In a dynamic query, that would generate the following path

Tag.Name

In a plain old index query, that would generate the following path

Name

(IE, going to need two separate linq providers, although largely
stemming from the same code)

For dynamic queries on the server, when a 'dotted' expression is found
when doing dynamic query parsing, if the right hand side of the dot is
anything other than 'length' or other special terms, a nested select
will be generated in the dynamic query.
For general index queries on the server, it would assume you had
created an index with a nested select with that property

The alternative is to pass Tag.Name to the server in both instances,
and do some query parsing for both dynamic queries and index queries

Ayende Rahien

Sep 26, 2010, 9:07:43 PM
to rav...@googlegroups.com
Where is the additional select feature implemented?

Ayende Rahien

Sep 26, 2010, 9:54:53 PM
to rav...@googlegroups.com
Pulled, it is in build 165

Rob Ashton

Sep 27, 2010, 4:26:08 AM
to ravendb
You mean the functionality I've added Any for? I was holding back on
that as part of the greater Linq discussion - should be relatively
trivial to add though, I'll give it a go today

Rob Ashton

Sep 27, 2010, 4:27:56 AM
to ravendb
I'll write documentation for what is there first though


Ayende Rahien

Sep 27, 2010, 4:35:28 AM
to rav...@googlegroups.com
Awesome!

Rob Ashton

Sep 27, 2010, 8:18:50 AM