Ad Hoc Queries

Ayende Rahien

Sep 23, 2010, 6:56:41 AM

to ravendb

I figured out a way to execute ad hoc queries against Raven without causing a memory leak, mostly in the interest of being able to run tests or to export data.

The API isn't exposed over HTTP at the moment, but the C# API is:

var result = db.ExecuteQueryUsingLinearSearch(new LinearQuery
{
    Query = "from doc in docs select new { doc.Name.Length }"
});

Please note a few things:

- As the method name implies, this is an O(n) operation.
- There is support for paging, but there is no way to find out the total number of matching records.
- There is no support for parameters, and as long as this is test-focused, I don't think I'll add it.
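The three notes above (an O(n) scan, paging, and no total count) follow from how a linear search works. The sketch below is a Python stand-in, not Raven's C# code. The batching helper, the `linear_query` name, and the convention that a projection returns None to filter a document out are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def fetch_batches(docs: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Hypothetical stand-in for the storage layer: yields documents in batches."""
    for i in range(0, len(docs), batch_size):
        yield docs[i:i + batch_size]


def linear_query(batches: Iterable[List[dict]],
                 project: Callable[[dict], Optional[object]],
                 start: int = 0,
                 page_size: int = 128) -> List[object]:
    """Apply `project` to every document in order and return one page.

    `project` returns None to filter a document out (an assumption of this
    sketch). The scan stops once the page is full, so the total number of
    matches is never known.
    """
    results: List[object] = []
    skipped = 0
    for batch in batches:
        for doc in batch:
            value = project(doc)
            if value is None:
                continue
            if skipped < start:          # paging: skip the first `start` matches
                skipped += 1
                continue
            results.append(value)
            if len(results) == page_size:
                return results
    return results
```

For example, projecting `{"Length": len(d["Name"])}` (the equivalent of `doc.Name.Length`) with `start=1, page_size=2` returns the second and third projections and never looks at the remaining documents.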

Rob Ashton

Sep 23, 2010, 11:31:09 AM

Okay, I've been through and checked it out and before I go ahead and
create a system along the lines we discussed on Twitter I want to
check a few facts about this particular implementation of dynamic
queries.

In order to avoid memory leaks, you're creating a new app domain which
contains a query cache. When a LinearQuery is run against the
QueryRunner, that query is stored in the cache, and when the query
runner contains 1024 queries, the app domain is flushed and the
process starts all over again.
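The recycling policy described above can be modelled as a cache that is discarded wholesale once it fills up. In .NET the whole app domain is unloaded because compiled query code cannot be unloaded any other way. Python has no such restriction, so this sketch only mirrors the bookkeeping. The class name and the `generation` counter are hypothetical.

```python
from typing import Callable, Dict


class RecyclingQueryCache:
    """Caches compiled queries and throws the whole cache away once it holds
    `max_entries` items (analogous to unloading and recreating the app domain)."""

    def __init__(self, compile_query: Callable[[str], Callable], max_entries: int = 1024):
        self._compile = compile_query
        self._max = max_entries
        self._cache: Dict[str, Callable] = {}
        self.generation = 0  # how many times the cache has been recycled

    def get(self, query_text: str) -> Callable:
        compiled = self._cache.get(query_text)
        if compiled is not None:
            return compiled
        if len(self._cache) >= self._max:
            # Flush everything at once rather than evicting entries one by one.
            self._cache.clear()
            self.generation += 1
        compiled = self._compile(query_text)
        self._cache[query_text] = compiled
        return compiled

    def __len__(self) -> int:
        return len(self._cache)
```

Flushing everything at once trades some recompilation after each recycle for a hard bound on how much compiled code can accumulate.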

This query runner is passed an IRemoteStorage, from which it retrieves
the documents in batches, and enumerates through them, plucking out the
docs that meet the requirements of the query passed in. As far as I
can see, this doesn't use Lucene indexes but runs the raw query passed in?
Am I understanding that correctly? Does that mean you will potentially
get different results from traditional indexes?

---

I was planning on doing the dynamic index system as a bundle, and
actually creating/destroying indexes (for all queries, potentially
taking a long time the first time a query is invoked). Is this going
to be a problem? My initial plan was to tack onto whatever you had
already written, but if Lucene indexes aren't being used for ad hoc
queries then that plan is scuppered (I think).

Ayende Rahien

Sep 23, 2010, 11:41:09 AM

Yes, the idea here is that the output of an ad hoc query is whatever the output of the linq query is.

I am not sure HOW we could use indexes for that, or why we would want that.

With indexes, in much the same way, we take the output from the linq query and put it in Lucene.

Here, we are simply returning the output.

I am not sure that I understand what you mean by raw query passed in.

What would YOU consider for querying on dynamic indexes?

Rob Ashton

Sep 23, 2010, 11:46:32 AM

Okay, I get you - so this is just a nice way of debugging the contents
of the document store via linq queries.
I think for *that* purpose, using indexes would be folly, but it'll
need to be pointed out quite loudly that this is what it does and that
it's very different to what you'd get from an actual index.

Also yes yes, you're quite right - this is what you meant by
parameters not being there, I didn't read your post properly.

Rob Ashton

Sep 23, 2010, 11:47:40 AM

So basically ignore my question entirely because I'm talking nonsense
=)

Rob Ashton

Sep 23, 2010, 11:54:00 AM

Anyway, moving on and ignoring the above drivel - I've remembered how
RavenDB works now...

An implementation of parametrized dynamic queries could:

- Use the above to get the documents, put the results in a temporary
Lucene index, wait for the indexing to occur, query that Lucene index
and then delete the Lucene index
- If the same query is called a number of times in a certain time
period, an actual index could be created and queried instead (given a
unique name based on the fields being queried)
- If the query isn't invoked after a set amount of time, it would be
deleted

This would potentially have less overhead than creating a proper index
each time and waiting for the system to index the data?
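The first bullet (build a throwaway index from the scanned documents, query it, then delete it) can be sketched with a dictionary standing in for Lucene. This is a minimal Python illustration, assuming exact-term matching only. It has no analyzers and no ranges, which is roughly the "simplest scenarios" limitation Ayende raises in his reply. All names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

Key = Tuple[str, str]  # (field name, exact term)


def build_temp_index(docs: Dict[str, dict], fields: List[str]) -> DefaultDict[Key, Set[str]]:
    """Inverted index: (field, term) -> ids of the documents containing it."""
    index: DefaultDict[Key, Set[str]] = defaultdict(set)
    for doc_id, doc in docs.items():
        for field in fields:
            if field in doc:
                index[(field, str(doc[field]))].add(doc_id)
    return index


def query_temp_index(index: Dict[Key, Set[str]], clauses: List[Key]) -> Set[str]:
    """AND together exact-match clauses such as ("Category", "Rhinos")."""
    result: Set[str] = set()
    for i, clause in enumerate(clauses):
        ids = index.get(clause, set())
        result = set(ids) if i == 0 else result & ids
    return result


def run_ad_hoc(docs: Dict[str, dict], clauses: List[Key]) -> Set[str]:
    fields = sorted({field for field, _ in clauses})
    temp = build_temp_index(docs, fields)        # create and populate
    try:
        return query_temp_index(temp, clauses)   # query
    finally:
        temp.clear()                             # "delete" the temporary index
```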

My alternative suggestion is to create an index for all documents,
full of dynamic fields, and query that too, but that has the
disadvantage that it works completely differently to how the rest of
RavenDB functions.

Ayende Rahien

Sep 23, 2010, 12:20:14 PM

Problems:

You can only do that for the simplest scenarios, such as:

Name:Ayende

It also ignores big optimizations, such as the ability to scan only specific entity names.
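The "scan only specific entity names" optimisation means a query that targets one document type never touches the others. The Python sketch below assumes each stored document carries an entity name and keeps a secondary list of ids per name. The class and method names are hypothetical and do not reflect how Raven's storage is organised internally.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional


class DocumentStoreStub:
    """Keeps document ids grouped by entity name so that a query against one
    entity type scans only those documents."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.by_entity: DefaultDict[str, List[str]] = defaultdict(list)

    def put(self, doc_id: str, entity_name: str, doc: dict) -> None:
        # Simplification: assumes each id is stored only once.
        self.docs[doc_id] = doc
        self.by_entity[entity_name].append(doc_id)

    def scan(self, entity_name: Optional[str] = None) -> List[dict]:
        if entity_name is None:
            return list(self.docs.values())      # full O(n) scan
        return [self.docs[i] for i in self.by_entity.get(entity_name, [])]
```

A linear query over `scan("Users")` costs O(number of users) rather than O(number of documents). A naive "run the query, then index the results" approach loses this unless it knows which entity type the query targets.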

Rob Ashton

Sep 23, 2010, 12:34:56 PM

I assume by "simplest scenarios" you mean the queries
themselves rather than the indexes.

For example

select new {
length = name.Length
}

vs

select new {
name = name
}

Because you'd probably want ints/etc to be treated differently from
strings. I foresee a convention where different types are stored in
Lucene with their nearest match? I can't see any problem with the
indexes themselves being complex, just the parameters/lucene queries
themselves? We have to make some concessions when it comes to dynamic
queries and what the user can expect to happen under specific
circumstances.

Where is the optimisation made to scan only specific entity names?
(Just a class name will do; I can work out how it works myself.) I
assumed this would be handled up in the web application as part of
creating the index rather than in the storage/indexing engine itself.

Ayende Rahien

Sep 23, 2010, 12:39:51 PM

Let me rephrase that.

Dream up the API from the client side to do this.

Rob Ashton

Sep 23, 2010, 12:56:22 PM

I think I am missing something :)

client.DynamicQuery<BlogEntry>()
    .Where(b => b.Category == "Fish" && b.Title.Length > 10)

I was planning on looking at the expression in the client API, fishing
out the data on each comparison, and sending all of this information
across the wire to a custom Responder.
I was then going to work out what fields need indexing based on this
data and construct a map statement from this information.

It's a little bit complicated, because it would require
yet_another_linq_provider (or a bolt-on to the current one), but I'd
envisage support starting off simple and growing over time.

The data sent across the wire would be something like:

doc.Category, ":Fish"
doc.Title.Length, ":{10 TO *}" (or whatever the Lucene syntax for an open-ended range is)

Am I missing a really obvious no-no in any of this? Surely it's all
just a case of analysing the linq expressions?
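The client-side analysis Rob describes has two steps. First, each comparison in the predicate becomes a Lucene clause. Second, the set of fields that need indexing is collected for the map statement. Python has no LINQ expression trees, so this sketch assumes the predicate has already been flattened into `(path, operator, constant)` tuples joined by AND. The open-ended range syntax using `*` is also an assumption, since support for it varies between Lucene versions and query parsers.

```python
from typing import List, Tuple

Comparison = Tuple[str, str, object]  # (member path, operator, constant)


def to_lucene_clause(path: str, op: str, value: object) -> str:
    if op == "==":
        return f"{path}:{value}"
    if op == ">":
        return f"{path}:{{{value} TO *}}"   # exclusive lower bound
    if op == ">=":
        return f"{path}:[{value} TO *]"     # inclusive lower bound
    if op == "<":
        return f"{path}:{{* TO {value}}}"   # exclusive upper bound
    if op == "<=":
        return f"{path}:[* TO {value}]"     # inclusive upper bound
    raise ValueError(f"unsupported operator {op!r}")


def translate(comparisons: List[Comparison]) -> Tuple[str, List[str]]:
    """Turn AND-ed comparisons into a Lucene query plus the sorted list of
    field paths that an index would need to contain."""
    query = " AND ".join(to_lucene_clause(*c) for c in comparisons)
    fields = sorted({path for path, _, _ in comparisons})
    return query, fields
```

For `b.Category == "Fish" && b.Title.Length > 10`, this yields `Category:Fish AND Title.Length:{10 TO *}` and the fields `Category` and `Title.Length`. Note that numeric ranges only behave correctly if numbers are indexed in a form that sorts numerically, which plain strings do not.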

Ayende Rahien

Sep 23, 2010, 1:03:18 PM

No, you aren't. I am being obtuse. Yes, it sounds perfectly fine to me.

Probably best to also check if existing indexes can match this, which would also remove the need to create one specially.
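One simple way to check whether an existing index can serve a dynamic query is to compare field sets. The Python sketch below picks the smallest existing index whose fields cover everything the query needs. It is an illustrative heuristic, not Raven's matching logic. A real match would also have to agree on which documents the index maps over and on how each field is analyzed.

```python
from typing import Dict, Iterable, Optional, Set


def find_matching_index(required_fields: Iterable[str],
                        indexes: Dict[str, Set[str]]) -> Optional[str]:
    """Return the name of an existing index whose fields cover every field
    the query needs, preferring the smallest such index; None if none fits."""
    needed = set(required_fields)
    candidates = [(len(fields), name) for name, fields in indexes.items()
                  if needed <= fields]
    return min(candidates)[1] if candidates else None
```

If this returns None, a temporary index has to be created instead.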

Rob Ashton

Sep 23, 2010, 1:06:23 PM

My plan (with the knowledge that the above *is* indeed possible) is to
start with the responder and write the functionality from there, whilst
bearing in mind the needs of the Client API, as I
want to map out the various options within the server and assess the
practicality before embarking on linq expression parsing madness.

I'll let you know how I get on.

Ayende Rahien

Sep 23, 2010, 1:08:39 PM

Awesome.

The Bitland Prince

Sep 23, 2010, 1:51:32 PM

By the way, for what it's worth, I think ad hoc queries could be very
useful. It's great to have a simple way to scan documents to perform
tests or, as you said, to export data to a different storage. It makes
Raven feel comfortable enough to be trusted, especially for
people new to document databases like me.

Moreover, it makes migrating documents to other object types within the
same database very easy.

Ayende Rahien

Sep 23, 2010, 1:58:05 PM

Yeah, the idea of doing migrations that way is pretty compelling.

Rob Ashton

Sep 23, 2010, 5:52:28 PM

Right, well the code for it isn't beautiful, but I've got this test
passing:

[Fact]
public void CanPerformDynamicQueryAndGetValidResults()
{
    var blogOne = new Blog
    {
        Title = "one",
        Category = "Ravens"
    };
    var blogTwo = new Blog
    {
        Title = "two",
        Category = "Rhinos"
    };
    var blogThree = new Blog
    {
        Title = "three",
        Category = "Rhinos"
    };

    using (var s = store.OpenSession())
    {
        s.Store(blogOne);
        s.Store(blogTwo);
        s.Store(blogThree);
        s.SaveChanges();
    }

    var results = server.Database.ExecuteDynamicQuery(
        new Bundles.DynamicQueries.Data.DynamicQuery()
        {
            FieldMap = "Title:title,Category:category,Title.Length:titleLength",
            PageSize = 128,
            Start = 0,
            Query = "titleLength:3 AND category:Rhinos"
        });

    Assert.Equal(1, results.Results.Length);
    Assert.Equal("two", results.Results[0].Value<string>("Title"));
    Assert.Equal("Rhinos", results.Results[0].Value<string>("Category"));
}

I don't like that I'm having to pass through a map of the fields, but
the alternative is to parse a Lucene query and perform a replace on
certain values. I'll leave it like that for now. I've exposed that
functionality as-is in the HTTP API; it's easy enough to change once I'm
done with more important things like the optimisation we were
talking about.
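The `FieldMap` string in the test is just a list of `Path:alias` pairs. The Python sketch below parses it, also accepting the list form that Ayende asks about, and turns the pairs into a map statement in the same `from doc in docs select new { ... }` shape as the ad hoc query at the top of the thread. The function names are hypothetical, and the generated text is only an illustration of what such a definition could look like.

```python
from typing import List, Tuple, Union


def parse_field_map(field_map: Union[str, List[str]]) -> List[Tuple[str, str]]:
    """Accept "Title:title,Category:category" or ["Title:title", ...] and
    return (path, alias) pairs."""
    entries = field_map.split(",") if isinstance(field_map, str) else field_map
    pairs = []
    for entry in entries:
        path, sep, alias = entry.strip().partition(":")
        if not sep or not path or not alias:
            raise ValueError(f"expected 'Path:alias', got {entry!r}")
        pairs.append((path, alias))
    return pairs


def build_map_statement(pairs: List[Tuple[str, str]]) -> str:
    """Emit a LINQ-style map statement as text."""
    projections = ", ".join(f"{alias} = doc.{path}" for path, alias in pairs)
    return f"from doc in docs select new {{ {projections} }}"
```

With the test's field map, `titleLength = doc.Title.Length` becomes an indexed field, which is why the query can say `titleLength:3`.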

Ayende Rahien

Sep 23, 2010, 9:17:07 PM

Is there a reason that the field map is a single string and not an array of them? That would seem easier to work with.

Rob Ashton

Sep 24, 2010, 3:03:23 AM

Just because they came from the query string that way :). I probably
should move the responsibility for parsing that up a level, and no
doubt I will.

An important point about this POC is that I am going through the
normal system for creating the index and waiting for it to be indexed.
For performance I am guessing this is suboptimal, and I should take
care of that manually instead of relying on the usual background tasks;
I just wanted to make sure I had all the information required for the
task before I went down that route.

I'll bash out a more complete attempt today and push to my fork so the
work in progress can be seen.


Rob Ashton

Sep 24, 2010, 5:46:07 AM

That's a point actually, let's discuss preferences for the HTTP API
while we're here

Currently it looks like this

/dynamicquery?query=name:ayende&mapping=User.Name:name

It's a bit lame, and I'd prefer not to be specifying what the mapping
is at all, because I'd like to generate that myself when I generate the
map statement (which means greater potential for sharing indexes
between similar queries and less responsibility for the client).

I could do

/dynamicquery?query=User.Name:ayende

But that means parsing the Lucene query to extract the fields and
replace them with whatever I put in the map query. This is (in my
opinion) a challenge to do properly. I could use QueryParser from
Lucene.net, but that means using the right analyser, and I'm not
confident it wouldn't be a brittle solution. Doing it manually using a
rudimentary parser might work, but I can't find a formal grammar for
Lucene queries.

A simpler alternative would be to do something like this

/dynamicquery?query={User.Name}:ayende

And parse for the braces, replacing them with whatever I choose to
name the mapped fields. This is my preference, but there is no
analogue for it in the rest of the API.
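The brace approach needs only a small rewrite step: find each `{Some.Path}`, swap in a generated field name, and remember the mapping for the map statement. The Python sketch below uses a regex. The dots-to-underscores naming scheme is hypothetical. Lucene's exclusive range syntax also uses braces (`{a TO c}`), but those ranges contain spaces, so this pattern does not match them.

```python
import re
from typing import Dict, Tuple

# {Some.Path} placeholders: an identifier, optionally dotted, with no spaces.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")


def expand_placeholders(query: str) -> Tuple[str, Dict[str, str]]:
    """Replace every {Some.Path} with a generated field name and return the
    rewritten query plus the path -> field-name mapping."""
    mapping: Dict[str, str] = {}

    def replace(match):
        path = match.group(1)
        # Hypothetical naming scheme: dots become underscores.
        return mapping.setdefault(path, path.replace(".", "_"))

    return PLACEHOLDER.sub(replace, query), mapping
```

For `/dynamicquery?query={User.Name}:ayende`, the server would query `User_Name:ayende` against an index that maps `User_Name = doc.User.Name`.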

Rob Ashton

Sep 24, 2010, 9:43:37 AM

I've managed to fit a bit more work into my lunch hour, and now have a
more instant way of doing dynamic queries (create lucene index
manually, index the documents manually by paging through them, then
query, then delete the lucene index)

Code is here:

http://github.com/robashton/ravendb/blob/f910ff7d5e07b01e4c9b7edb20ccf0dd66da0dd3/Bundles/Raven.Bundles.DynamicQueries/Database/DatabaseExtensions.cs

My only concern (beyond anything I might be doing wrong that I don't
know about yet), is that I'm having to create an entry in the Esent DB
- the only reason I have to do this, is that when you run a query
against an index, statistics are generated against that index and an
error is thrown if the index doesn't exist in Esent.

I'd prefer not to be creating an entry in storage just for this purpose,
as I would need to persuade the background task not to perform any indexing
on this index if I do. If I could get by without creating the entry in
storage, then I wouldn't need to worry about background tasks - the
temporary index would truly be temporary and there would be no
concerns about clashing.

Nonetheless, this implementation does seem to work for small numbers
of documents (I need to put 30,000+ through and make sure it functions
as expected).

Next up I'll start thinking about creating permanent indexes and
choosing to use those etc.

Ayende Rahien

Sep 24, 2010, 12:06:59 PM

Okay, I did a review, and I have a few comments.

a) I don't like the DynamicQueryMap, there really isn't any need to do this.

[Fact]
public void Parsing()
{
    var query = new QueryParser(Version.LUCENE_29, "", new StandardAnalyzer()).Parse("Title.Length:5 Category:Users");

    // Collect the Terms used by the query; each Term knows its field.
    var terms = new Hashtable();
    query.ExtractTerms(terms);

    var fields = new HashSet<string>();
    foreach (Term term in terms.Keys)
    {
        fields.Add(term.Field());
    }
}

Remember, you don't actually care about the query at this stage; you only care about the field names. This gives them to you, and then you can construct the index definition.

b) I would actually think that it would be better to create a temporary index, but NOT delete it. Rather, set up a timer to remove it after some amount of time. If there are enough requests for the index in that duration, make it permanent.

c) For that matter, I wouldn't actually wait for the entire index to be built. All we need is enough results to satisfy the PageSize, after all. So I would just query it until either the results are non-stale or I have enough results for the page.
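Point (a) is that only the field names matter, not the query's structure. The Python sketch below illustrates this with a crude regex that collects `name:` prefixes from a query string. It stands in for Lucene's `QueryParser` plus `ExtractTerms`, which is what the test above uses and is the right tool in practice. The regex does not understand quoting or escaping, so a quoted value containing a colon would fool it. A text-level scan does, however, sidestep the range-query problem with `ExtractTerms` that Rob reports later in the thread.

```python
import re
from typing import List

# A field name is an identifier (possibly dotted) immediately followed by ':'
# and not preceded by another name character or a backslash escape.
FIELD = re.compile(r"(?<![\w.\\])([A-Za-z_][\w.]*):")


def extract_fields(lucene_query: str) -> List[str]:
    """Return the distinct field names referenced by a Lucene query string."""
    return sorted(set(FIELD.findall(lucene_query)))
```

Boolean operators, required/prohibited markers, and ranges do not change the result, which is exactly why the field set alone is enough to build the index definition.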

Rob Ashton

Sep 24, 2010, 12:24:10 PM

Okay, if it's that easy to extract the terms then that's fine by me -
I'll do that. It makes for a *much* better API.

RE creating a temporary index and not deleting it, I think you're
right, and that's the logical next step. I'll need to name the indexes
more sensibly, but that's how I plan to do the optimisation.

If I'm doing that, then I don't mind it going into Esent, but I'd
still prefer to manually run the index through Lucene rather than
waiting for it to happen (if that makes sense). I'd need to prevent the
normal indexing process from occurring if I do that, though, wouldn't I?

I'll carry on chugging through this when I get home tonight and get
the persistence/deletion thing going on. This is the right path I'm
going down, though, right?


Rob Ashton

Sep 24, 2010, 12:25:51 PM

I'm not sure you're right about page size: if they're asking for
any kind of ordering in their query, doesn't the index need to be
fully built?


Rob Ashton

Sep 24, 2010, 12:37:51 PM

Having said that, this is possibly just a question of expectations:

Is your expectation that querying in this way is going to potentially
give you a hyper-stale result?
I was thinking that a call to a dynamic query would give you results
that were up to date at the time of the call.

Perhaps I should be passing in a cut-off parameter like the other
querying methods, and letting the consumer of the service decide?

Rob Ashton

Sep 24, 2010, 12:45:35 PM

Sorry for the spam, this is just how I think when I haven't got an IDE
in front of me to write code with :)

One more addition before I go home to start work on making this more
satisfactory, if I am indeed going to create the index and leave it
there, then perhaps I am better off doing what I did originally and
just waiting for indexing to happen - it is likely to be less
performant when the system is busy, but if this is on the
understanding that future calls to this dynamic query will be fast
because the index is already created then this isn't really a problem
is it?

Ayende Rahien

Sep 24, 2010, 2:04:35 PM

A few other things to consider:
- Making it a RAM directory (RAMDirectory) index would be faster.
- You can modify the index code to not require the stats, although I
think we want that.
- Think about it from the admin's point of view: I want to know what is going on there, even for temp indexes.

Ayende Rahien

Sep 24, 2010, 2:05:36 PM

Hm, not sure about that.
Lucene ordering requires specifying the sort order; how would you do that?


Ayende Rahien

Sep 24, 2010, 2:07:12 PM

Since we expect to reuse the query, I think we can use the same logic
as elsewhere.
Another point in favor of the index staying in Esent.


Ayende Rahien

Sep 24, 2010, 2:08:38 PM

No, just wait for the page to be full the first time.
The second time, normal rules apply.
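"Wait for the page to be full the first time" amounts to polling a freshly created index until either it is no longer stale or it already has a full page, whichever comes first. The Python sketch below assumes a hypothetical `query_index` callable that returns `(results, is_stale)`. The timeout and poll interval are invented defaults. As Rob points out, returning early is only safe when no sort order depends on the whole index having been built.

```python
import time
from typing import Callable, List, Tuple


def first_page(query_index: Callable[[], Tuple[List[object], bool]],
               page_size: int,
               timeout: float = 5.0,
               poll_interval: float = 0.05) -> List[object]:
    """Poll a still-indexing index until its results are non-stale or it
    already has enough results for the page, whichever comes first."""
    deadline = time.monotonic() + timeout
    while True:
        results, is_stale = query_index()
        if not is_stale or len(results) >= page_size:
            return results[:page_size]
        if time.monotonic() >= deadline:
            raise TimeoutError("index did not fill the first page in time")
        time.sleep(poll_interval)
```

With an index that processes three documents per poll and a page size of five, the call returns after the second poll, without waiting for the rest of the documents to be indexed.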


Rob Ashton

Sep 24, 2010, 3:32:02 PM

Right, so:

- When first creating the index, wait until either all docs have been
indexed or the selected page is full (whichever comes first).
- The index goes in Esent as per usual.
- I thought you specified the sort order at the point of query when
performing the Lucene search (we don't need to play with the definition
for this to work, do we?)

I'll make it so and then play with it, no point in dallying. Finally
home after my car broke down on me, so I'm 2 hours behind where I wanted to
be!

Rob Ashton

Sep 24, 2010, 3:40:12 PM

Super - your lucene query code is brilliant

[Fact]
public void CanPerformDynamicQueryAndGetValidResults()
{
    var blogOne = new Blog
    {
        Title = "one",
        Category = "Ravens"
    };
    var blogTwo = new Blog
    {
        Title = "two",
        Category = "Rhinos"
    };
    var blogThree = new Blog
    {
        Title = "three",
        Category = "Rhinos"
    };

    using (var s = store.OpenSession())
    {
        s.Store(blogOne);
        s.Store(blogTwo);
        s.Store(blogThree);
        s.SaveChanges();
    }

    var results = server.Database.ExecuteDynamicQuery(
        new Bundles.DynamicQueries.Data.DynamicQuery()
        {
            PageSize = 128,
            Start = 0,
            Query = "Title.Length:3 AND Category:Rhinos"
        });

    Assert.Equal(1, results.Results.Length);
    Assert.Equal("two", results.Results[0].Value<string>("Title"));
    Assert.Equal("Rhinos", results.Results[0].Value<string>("Category"));
}

<3

Ayende Rahien

Sep 24, 2010, 3:47:48 PM

Great.

Now, give it a shot with a linq query, which should also work.

The next hurdle is complex queries, such as:

from user in docs.Users
from role in user.Roles
select new { Role = role }

I am not sure if we want / need to support such things, though.

Thoughts?

Rob Ashton

Sep 24, 2010, 4:01:29 PM

I think that's a different challenge entirely and perhaps something we
don't want to support unless it actually becomes useful to do so.

A major problem there would be how to even represent that, as we're
currently passing in an actual Lucene query and reverse-engineering a
linq statement from it.

Ayende Rahien

Sep 24, 2010, 4:21:09 PM

Yeah, that is pretty much what I am thinking.

We can probably do better, though.

Assume that we had the following Lucene query on the User.Roles model:

Roles:Administrator

We could set things up so that trying to index an IEnumerable would result in multiple fields being emitted for Lucene.

The linq query would be:

from user in docs.Users
select new { Roles = user.Roles }

But the query should work.
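Ayende's idea relies on a Lucene document being able to hold several values under the same field name. If indexing an enumerable emits one field per element, a plain term query such as `Roles:Administrator` matches whenever any element matches. The Python sketch below flattens a projection into `(field, value)` pairs in that style. The helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def emit_fields(projection: Dict[str, object]) -> List[Tuple[str, str]]:
    """Flatten a projected object into (field, value) pairs; a list, tuple
    or set emits one pair per element under the same field name."""
    pairs: List[Tuple[str, str]] = []
    for name, value in projection.items():
        if isinstance(value, (list, tuple, set)):
            pairs.extend((name, str(item)) for item in value)
        else:
            pairs.append((name, str(value)))
    return pairs


def matches(pairs: Iterable[Tuple[str, str]], field: str, term: str) -> bool:
    """A term query matches if ANY pair with that field has that value."""
    return (field, term) in set(pairs)
```

This "any element matches" behaviour is also what a client-side `Any(...)` over a collection property would need to translate to.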

Rob Ashton

Sep 24, 2010, 4:24:18 PM

Okay, well I'm game to give it a go if I've still got time once I've
refined what I've got and sorted out a client API for it.
Not touching work code all weekend and with my car in the garage I'm
probably not going anywhere :(


Ayende Rahien

Sep 24, 2010, 5:03:00 PM

Cool, let me know when you push next

Rob Ashton

Sep 24, 2010, 6:26:08 PM

Flying through this now; here's a question for any time between now and
the final push.

Generating the index name: in an ideal world it would tell us what was
in the index, but how sustainable is this?

String combinedFields = String.Join("",
    map.Items
        .OrderBy(x => x.To)
        .Select(x => x.To)
        .ToArray());

Am I going to have to do a hash of those fields to get the index name?
I assume the maximum index name length is going to be hit sooner or
later by somebody.

Ayende Rahien

Sep 25, 2010, 2:46:00 AM

The max field name is something like 255 characters.

I would say that you want a hash only if the name is longer than that; otherwise, use a human-readable string.
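The suggestion above is to use a human-readable name built from the sorted field list, and fall back to a hash only when it gets too long. The Python sketch below uses SHA-1 because its hex digest always has a fixed length of 40 characters. It also uses the 240-character limit with a `Temp_` prefix that Rob settles on later in the thread. The `_` separator is this sketch's choice. Joining with an empty string, as in Rob's snippet, would give `["AB", "C"]` and `["A", "BC"]` the same name.

```python
import hashlib
from typing import Iterable

PREFIX = "Temp_"
MAX_BODY_LENGTH = 240  # leaves room for the prefix under a ~255 limit


def temp_index_name(fields: Iterable[str]) -> str:
    """Readable name from the sorted field list; hash only if it is too long."""
    body = "_".join(sorted(fields))
    if len(body) > MAX_BODY_LENGTH:
        # SHA-1 hex digest: always 40 characters and stable across runs.
        body = hashlib.sha1(body.encode("utf-8")).hexdigest()
    return PREFIX + body
```

Sorting first means the same set of fields always produces the same name, so similar queries share one index.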

Rob Ashton

Sep 25, 2010, 5:54:56 AM

Righto.

A further question now that I'm getting properly into the client-side
work: I started this as a bundle because it seemed to me that putting it
in the core build might encourage people to use it more than they perhaps
should, but is this the right decision?

I'm going to have a hard time carrying on doing this as a bundle if
I'm going to add the client API for it, because there are no
easy extension points on IDatabaseCommands for adding further
functionality, since of course the underlying implementation may
change depending on whether you're pointing it at http or direct access
(or TCP?).

I'm happy to crack it open and try to add the relevant extension
points, but this would represent another non-trivial task, so I'm not
willing to do it unless we decide we *really* want this as a bundle.
For now I'm writing the code directly against the client, in the
knowledge that refactoring it out will be easier once I know exactly
what it needs.

I am beginning to lean towards not having it as a bundle, as the way
we're now implementing it means it is actually going to be a useful
piece of functionality for lowering the entry barrier of just 'picking
up and playing' with RavenDB.

Rob Ashton

Sep 25, 2010, 6:37:16 AM

We could even go so far as to carry on using the Indexes endpoint, and
accept "Temp" as an index name, but that might be going too far.

Ayende Rahien

Sep 25, 2010, 7:06:06 AM

I agree that this would be useful as a core piece of functionality.

We can probably do it the same way replication is done: as a core piece of the client, which is only available if the bundle is installed on the server.

Ayende Rahien

Sep 25, 2010, 7:06:30 AM

I actually like that better.

It makes the HTTP API simpler.

Even if we put it in the core, I like /indexes/dynamic?query=....

Rob Ashton

Sep 25, 2010, 7:17:06 AM

Aha, then I'll get what I'm working on finished and then move it into
the core along with the tests (both server and client).

It really does keep the HTTP API simple. It also means this
automatically gets support for Includes and whatever you choose to
add to index queries in the future. It turns out that once I'd finished
experimenting with the different ways of supporting this, this was
the natural conclusion of those efforts.

Hopefully that should be the last "dumb question" until I get the full
experience finished.

Rob Ashton

Sep 25, 2010, 9:39:20 AM

It's quite embarrassing how long it's taken me to come to such a
simple solution, but oh well - pushed to my fork is functionality
like:

using (var s = store.OpenSession())
{
    var results = s.DynamicQuery<Blog>()
        .Customize(x => x.WaitForNonStaleResultsAsOfNow())
        .Where(x => x.Category == "Rhinos" && x.Title.Length == 3)
        .ToArray();

    Assert.Equal(1, results.Length);
    Assert.Equal("two", results[0].Title);
    Assert.Equal("Rhinos", results[0].Category);
}

and

using (var s = store.OpenSession())
{
    var results = s.DynamicLuceneQuery<Blog>()
        .Where("Title.Length:3 AND Category:Rhinos")
        .WaitForNonStaleResultsAsOfNow()
        .ToArray();

    Assert.Equal(1, results.Length);
    Assert.Equal("two", results[0].Title);
    Assert.Equal("Rhinos", results[0].Category);
}

I wouldn't class it as "done"; some trivial tidy-up is needed (moving the
magic string "dynamic" to a constant somewhere, things like that).

A few things to be aware of, and something I'm now going to devote
my afternoon to:

1) I had to modify the Linq provider to use the full path to the
member instead of just the member name. I don't think this will
break non-dynamic queries, but it's necessary for dynamic queries to
work (this is in RavenQueryProviderProcessor.GetMember).

2) I've had to do a hack where I remove [[ and ]] from the query when
running StandardAnalyzer over it to extract the terms, as it doesn't
like them. It probably doesn't like other things either, and this was
my worry about parsing the Lucene query. I guess I'll fix the issues as
I come across them and start thinking about using a better analyzer.

3) I'm currently setting fields to NotAnalyzed by default, which makes
my test queries function correctly. This is obviously not desired, but
it reflects the assumption by the Linq provider that if it's anything
other than a string it is analyzed, and if it's a string it's not
analyzed.

4) I've not yet added the code to delete obsolete indexes.

5) I've not yet added the code to hash the index name if it's longer
than a certain number of characters.

I intend to get #4 and #5 sorted immediately.

#2 and #3 I'm going to mull over. By writing some tests against
dynamic queries for common use cases, I aim to find the common pitfalls
and from that establish some more intelligent code from which to
generate more appropriate indexes.

That will lead nicely onto collection types (multiple froms), because
without a more intelligent solution to #3 that's not going to be
possible.

Ayende Rahien

Sep 25, 2010, 10:31:59 AM

1) Hm, what do you mean, what is the difference?

2) Probably need to run it through the same process as standard queries go through, which would give the same basic result.

3) Note that we intend to move in that direction anyway (default for NotAnalyzed)
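Point 3 is about the difference between analyzed and NotAnalyzed fields. An analyzed field is broken into normalized tokens, while a NotAnalyzed field is indexed as one exact term. If the two sides disagree, a term that reaches the index verbatim will not match: `Rhinos` does not equal the indexed token `rhinos`. The Python sketch below is only a rough imitation. Lucene's StandardAnalyzer tokenizes differently and also drops stop words such as "and".

```python
import re
from typing import List


def analyzed_terms(value: str) -> List[str]:
    """Rough imitation of an analyzer: lower-case and split on non-word chars."""
    return [t for t in re.split(r"\W+", value.lower()) if t]


def not_analyzed_terms(value: str) -> List[str]:
    """NotAnalyzed: the whole value is one case-sensitive term."""
    return [value]


def term_query_matches(indexed_terms: List[str], term: str) -> bool:
    """An exact term query matches only if that exact term was indexed."""
    return term in indexed_terms
```

This is why a test like `Category:Rhinos` passes with NotAnalyzed fields but can fail on an analyzed field, unless the query term goes through the same analysis.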

Rob Ashton

Sep 25, 2010, 10:58:01 AM

1)

If you just use MemberInfo on an expression like

x=> x.Title.Length == 3

The query generated looks like

Length:3

Which is fine for pre-built indexes because that's what the mapped
field will no doubt be called

What you actually want is

Title.Length:3

2) I'll have a look at that and see if I can't do that
3) Okay, that's reassurance at least.
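The change in point 1 comes down to how a member-access chain is turned into a field name. A dynamic index needs the full path from the lambda parameter (`Title.Length`). A hand-written flat index only needed the last member (`Length`). The Python sketch below uses the standard `ast` module as a stand-in for .NET expression trees. The `full_path` flag is hypothetical and loosely mirrors the "two modes" discussed next.

```python
import ast


def member_path(expression: str, full_path: bool = True) -> str:
    """Field name for the left side of a comparison such as 'x.Title.Length == 3'."""
    compare = ast.parse(expression, mode="eval").body
    if not isinstance(compare, ast.Compare):
        raise ValueError("expected a comparison")
    node = compare.left
    names = []
    while isinstance(node, ast.Attribute):   # walk x.Title.Length from the outside in
        names.append(node.attr)
        node = node.value
    if not isinstance(node, ast.Name) or not names:
        raise ValueError("expected a member access on the lambda parameter")
    names.reverse()                           # ['Title', 'Length']
    return ".".join(names) if full_path else names[-1]
```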

Ayende Rahien

Sep 25, 2010, 11:31:09 AM

1 would probably break stuff for us.

Rob Ashton

Sep 25, 2010, 11:55:28 AM

I've tried to think of things it *would* break, but to no avail. (If it
truly does, then I need to split out some of the functionality from
the linq provider and create two modes.)

Because we have previously only allowed queries against indexes, the
structure of indexes has been flat (i.e. no complex properties):

public class MySuperCoolIndex
{
    public string Title { get; set; }

    public int TitleLength { get; set; }
}

This change doesn't alter how that behaves, unless I've missed
something since I last wrote code against RavenDB. Have you got an
example?

Ayende Rahien

Sep 25, 2010, 12:04:56 PM

Hm, you are correct, please ignore my previous comment.

Rob Ashton

Sep 25, 2010, 12:13:01 PM

In other news, I think I've over promised - I'm not exactly sure
*where* to add the logic for "if this dynamic index hasn't been
invoked in the past <insert time here> then delete it"

Running it as an occasional task is fine, I've no problem with that -
but as far as I can see we don't log/mark when an index was last
invoked - in order to get accurate behaviour after shutdown/restart
we'd surely need to store information like this in our persistent
storage (Esent?).

Unless it's obvious and I'm just being dumb, can I leave this one to
you once I've tidied up the rest of my code?

Ayende Rahien

Sep 25, 2010, 12:49:24 PM

1) on startup, delete all temp indexes

2) keep the last queried timestamp in memory

3) when the # of queries per timespan increases over X, create real index and delete temp one.

4) every Y time, cleanup old indexes.
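Ayende's four steps need only in-memory bookkeeping: query timestamps per temp index, a promotion threshold within a sliding window, and a periodic sweep for idle indexes. The Python sketch below uses an injectable clock so it can be tested. The defaults of 100 queries in 10 minutes come from the setting Rob reports later in the thread. The 20-minute idle limit and all names are hypothetical. Actually creating or deleting indexes (and deleting temp indexes on startup) is left to the caller.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List, Set


class TempIndexTracker:
    """In-memory bookkeeping for temporary indexes. Nothing is persisted,
    which is why all temp indexes are deleted on startup (step 1)."""

    def __init__(self, promote_after: int = 100, window: float = 600.0,
                 max_idle: float = 1200.0, clock: Callable[[], float] = time.monotonic):
        self.promote_after = promote_after   # queries needed within `window` (step 3)
        self.window = window                 # seconds
        self.max_idle = max_idle             # seconds before an unused index is dropped
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = {}   # step 2: timestamps kept in memory
        self.permanent: Set[str] = set()

    def record_query(self, name: str) -> bool:
        """Record a query; return True if the index should be made permanent now."""
        now = self.clock()
        hits = self.hits.setdefault(name, deque())
        hits.append(now)
        while hits and now - hits[0] > self.window:
            hits.popleft()                   # forget queries outside the window
        if len(hits) >= self.promote_after:
            del self.hits[name]              # caller creates the real index, drops the temp one
            self.permanent.add(name)
            return True
        return False

    def cleanup(self) -> List[str]:
        """Step 4, run every so often: return temp indexes idle for too long."""
        now = self.clock()
        stale = [n for n, h in self.hits.items() if now - h[-1] > self.max_idle]
        for name in stale:
            del self.hits[name]
        return stale
```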

Rob Ashton

Sep 25, 2010, 12:58:47 PM

Well, I've fixed the query analysis to use the same functionality as
the rest of RavenDB, so I've been able to remove that hack (#2), and
I've started hashing index names when they're longer than 240 characters
(leaving space for a "Temp_" prefix so we can identify them).

I've also sorted out my negligence and added the ability to query
dynamic indexes to a local document store.

I'm going to submit that pull request and leave it there for now,
although once you've done what you need to do I'll no doubt attack it
with a few more examples and start thinking about more complicated
queries.

That just leaves #4 and whatever tidy-ups you want to do.

Rob Ashton

Sep 25, 2010, 12:59:52 PM

Oh, this will teach me for not hitting refresh - I've got time to do
that before I go.

Rob Ashton

Sep 25, 2010, 1:00:31 PM

(That was a bit different from how I was seeing it working, but
definitely up to the task.)

Rob Ashton

Sep 25, 2010, 2:27:56 PM

Done (I think), and now I'm off to party

Currently set to 100 times in 10 minutes (or an average similar to
that since time of start). It may need tweaking, but at least the logic
is started.

Rob Ashton

Sep 25, 2010, 3:04:27 PM

Just realised I've forgotten to push my final work in, you're missing
periodic cleanup.

Can't touch that till I get home now, doh!

Rob Ashton

Sep 26, 2010, 10:26:47 AM

I did this last night, by the way; it's in to a reasonable standard.
I'm spending some time thinking about more advanced queries now.
@ScottGal's expectation of

from doc in docs
from tag in doc.Tags
select new
{
    Name = tag.Name
}

being accessible via

Where(x => x.Tags.Any(y => y.StartsWith("Fi")))

has got me thinking about how this would work with both real indexes
and dynamic indexes. It's probably food for a bigger discussion than
this, really: it's another "how do we map Linq onto our Lucene indexes"
problem.

Rob Ashton

Sep 26, 2010, 10:41:24 AM

My current thoughts are:

In a dynamic query, that would generate the following path:

Tag.Name

In a plain old index query, that would generate the following path:

Name

(i.e. we're going to need two separate linq providers, although largely
stemming from the same code)

For dynamic queries on the server, when a 'dotted' expression is found
during dynamic query parsing, if the right-hand side of the dot is
anything other than 'Length' or other special terms, a nested select
will be generated in the dynamic query.
For index queries on the server, it would assume you had
created an index with a nested select for that property.

The alternative is to pass Tag.Name to the server in both instances,
and do some query parsing for both dynamic queries and index queries.

Ayende Rahien

Sep 26, 2010, 9:07:43 PM

to rav...@googlegroups.com

Where is the additional select feature implemented?

Ayende Rahien

Sep 26, 2010, 9:54:53 PM

to rav...@googlegroups.com

Pulled, it is in build 165

Rob Ashton

Sep 27, 2010, 4:26:08 AM

to ravendb

You mean the functionality I've added Any for? I was holding back on
that as part of the greater Linq discussion; it should be relatively
trivial to add, though. I'll give it a go today.

Rob Ashton

Sep 27, 2010, 4:27:56 AM

to ravendb

I'll write documentation for what is there first though

Ayende Rahien

Sep 27, 2010, 4:35:28 AM

to rav...@googlegroups.com

Awesome!

Rob Ashton

Sep 27, 2010, 8:18:50 AM

to ravendb

I've written some documentation and realised that we don't yet support
range queries (which makes support for .Length completely redundant!)

From the looks of things, you can't call ExtractTerms on a query that
contains a range query.

I'll pop the documentation up that I have so far and spend a couple of
hours this afternoon adding decent support for more complicated
queries to the dynamic indexes stuff.

Rob Ashton

Sep 27, 2010, 8:25:23 AM

to ravendb

HTTP API Docs http://ravendb.net/documentation/http-indexes-dynamic

Ayende Rahien

Sep 27, 2010, 9:45:29 AM

to rav...@googlegroups.com

I actually ran into that issue too. That would be more complex to resolve, but I think that we can extend Lucene.NET and offer that as a patch.

Rob Ashton

Sep 27, 2010, 10:15:40 AM

to ravendb

From what I can see, this behaviour is carried over from the original
Lucene project: queries like that have to be broken down by a call to
Query.Rewrite before you can extract the terms.

Lucene.Net.Highlighter has functionality to extract the terms, which I
assume uses the same mechanism under the hood, as it has the same
limitations:

http://lucene.apache.org/lucene.net/docs/2.0/Highlighter.Net/Lucene.Net.Highlight.QueryTermExtractor.html

I'm not sure how we could submit a patch to modify that behaviour;
instead we'd have to write something in parallel to it, specifically
just to get the names of the terms out, giving no credence to a lot
of the nuances present.

The problem with doing that is that the Lucene syntax is big, and we
wouldn't want to risk missing functionality and stymieing the
ability to use the full power of Lucene.

I'm thinking about it; this is something that can't go without some
sort of solution.

Ayende Rahien

Sep 27, 2010, 10:18:47 AM

to rav...@googlegroups.com

What if we call Rewrite instead and use that? That seems guaranteed to work.

Ayende Rahien

Sep 27, 2010, 10:19:33 AM

to rav...@googlegroups.com

*snort*

Or we just use this regex:

([^\s][\w._])\:

Rob Ashton

Sep 27, 2010, 10:30:06 AM

to ravendb

I was afraid you'd say that (see, I raised this two pages ago!)

I'll try that out and hopefully there won't be a million issues we
haven't foreseen with it :)

Ayende Rahien

Sep 27, 2010, 10:33:35 AM

to rav...@googlegroups.com

It is actually quite likely to just work. The Lucene query syntax is actually very simple, overall, and we want just something very specific, after all.

Rob Ashton

Sep 27, 2010, 10:35:50 AM

to ravendb

:)

I think you know the Lucene syntax better than I do, I'd feel more
comfortable if I had a full grammar in front of me.

Ayende Rahien

Sep 27, 2010, 10:47:04 AM

to rav...@googlegroups.com

http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/queryParser/precedence/PrecedenceQueryParser.html

Rob Ashton

Sep 27, 2010, 10:47:35 AM

to ravendb

I just realised you've already got something similar in the
QueryBuilder class (doh), and that gets the match expressions and
processes them too.

I might use that, actually: I need to see what is on the right-hand
side of the expressions to decide whether to analyze the fields.

Rob Ashton

Sep 27, 2010, 10:48:18 AM

to ravendb

Okay fine, I'll accept your RegexFu as something that'll work - no
wonder I couldn't find the grammar, it's tiny!

Rob Ashton

Sep 27, 2010, 11:08:37 AM

to ravendb

I've modified the regex a little to cater for the following; look
okay?

[Fact]
public void CanExtractTermsFromRangedQuery()
{
    var mapping = Data.DynamicQueryMapping.Create("Term:[0 TO 10]");
    Assert.Equal("Term", mapping.Items[0].From);
}

[Fact]
public void CanExtractTermsFromEqualityQuery()
{
    var mapping = Data.DynamicQueryMapping.Create("Term:Whatever");
    Assert.Equal("Term", mapping.Items[0].From);
}

[Fact]
public void CanExtractMultipleTermsQuery()
{
    var mapping = Data.DynamicQueryMapping.Create("Term:Whatever OR Term2:[0 TO 10]");

    Assert.Equal(2, mapping.Items.Length);
    Assert.True(mapping.Items.Any(x => x.From == "Term"));
    Assert.True(mapping.Items.Any(x => x.From == "Term2"));
}

[Fact]
public void CanExtractTermsFromComplexQuery()
{
    var mapping = Data.DynamicQueryMapping.Create("+(Term:bar Term2:baz) +Term3:foo -Term4:rob");
    Assert.Equal(4, mapping.Items.Length);
    Assert.True(mapping.Items.Any(x => x.From == "Term"));
    Assert.True(mapping.Items.Any(x => x.From == "Term2"));
    Assert.True(mapping.Items.Any(x => x.From == "Term3"));
    Assert.True(mapping.Items.Any(x => x.From == "Term4"));
}

[Fact]
public void CanExtractMultipleNestedTermsQuery()
{
    var mapping = Data.DynamicQueryMapping.Create("Term:Whatever OR (Term2:Whatever AND Term3:Whatever)");
    Assert.Equal(3, mapping.Items.Length);
    Assert.True(mapping.Items.Any(x => x.From == "Term"));
    Assert.True(mapping.Items.Any(x => x.From == "Term2"));
    Assert.True(mapping.Items.Any(x => x.From == "Term3"));
}
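As printed in Ayende's message, `([^\s][\w._])\:` has no quantifiers, so it would capture only the two characters before each colon, and `[^\s]` would also swallow a leading `(` or `+`. Rob's modified regex isn't shown in the thread, so the following is a hypothetical stand-in (not the committed code) that satisfies the tests above, and also the `User.Name` and `Tags,Name` paths used later in the thread. `FieldExtractorSketch` is an illustrative name.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the modified regex; not the code Rob committed.
public static class FieldExtractorSketch
{
    // A field is a run of word characters, '.' (nested property) and ','
    // (collection marker) immediately followed by ':'. Because '+', '-' and
    // '(' are not in the class, "+(Term:bar" yields "Term".
    static readonly Regex FieldPattern = new Regex(@"([\w.,]+):", RegexOptions.Compiled);

    public static string[] ExtractFields(string query) =>
        FieldPattern.Matches(query)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .Distinct()   // "Term:a OR Term:b" maps the field once
            .ToArray();
}
```

One gap remains: a colon inside a value (a URL or a time of day, for example) would produce a spurious field name, so such values would need escaping or extra handling.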

Ayende Rahien

Sep 27, 2010, 11:21:46 AM

to rav...@googlegroups.com

Sure!

Rob Ashton

Sep 27, 2010, 11:28:21 AM

to ravendb

Okay, I'm feeling good about this now, I'll get range queries working
and then I'll do multiple froms and it'll all be gravy :D

Rob Ashton

Sep 27, 2010, 11:57:27 AM

to ravendb

Pushed this and range support to my fork

Rob Ashton

Sep 27, 2010, 12:43:45 PM

to ravendb

I'll need to do something about this:

class Blog
{
    public User User { get; set; }
    public Tag[] Tags { get; set; }
}

User.Name:rob

generates an index

from doc in docs
select new { UserName = doc.User.Name }

Tags.Name:elephant

should generate

from doc in docs
from tag in doc.Tags
select new { tagName = tag.Name }

I'm thinking

Tags[]Name:elephant

The Linq client will need to generate that from an Any clause (not a
problem)

I don't think it's possible at the server side of things to tell the
difference between

Tags.Name
User.Name

As we don't have the documents to hand, we could do it by convention
(look for plural property names), but that is almost guaranteed to
cause more support issues than it's worth.

I've gone down the [] route for now; it's easily changed once the
debate has been had.

Rob Ashton

Sep 27, 2010, 1:51:06 PM

to ravendb

I'm back from work, so I've started on this; this is where I'm at:

[Fact]
public void NestedCollectionPropertiesCanBeQueried()
{
    var blogOne = new Blog
    {
        Title = "one",
        Category = "Ravens",
        Tags = new Tag[] {
            new Tag() { Name = "birds" }
        },
    };
    var blogTwo = new Blog
    {
        Title = "two",
        Category = "Rhinos",
        Tags = new Tag[] {
            new Tag() { Name = "mammals" }
        },
    };
    var blogThree = new Blog
    {
        Title = "three",
        Category = "Rhinos",
        Tags = new Tag[] {
            new Tag() { Name = "mammals" }
        },
    };

    db.Put("blogOne", null, JObject.FromObject(blogOne), new JObject(), null);
    db.Put("blogTwo", null, JObject.FromObject(blogTwo), new JObject(), null);
    db.Put("blogThree", null, JObject.FromObject(blogThree), new JObject(), null);

    var results = db.ExecuteDynamicQuery(new IndexQuery()
    {
        PageSize = 128,
        Start = 0,
        Cutoff = DateTime.Now,
        Query = "Tags,Name:birds"
    });

    Assert.Equal(1, results.Results.Count);
    Assert.Equal("one", results.Results[0].Value<string>("Title"));
    Assert.Equal("Ravens", results.Results[0].Value<string>("Category"));
}

[Fact]
public void NestedPropertiesCanBeQueried()
{
    var blogOne = new Blog
    {
        Title = "one",
        Category = "Ravens",
        User = new User() { Name = "ayende" }
    };
    var blogTwo = new Blog
    {
        Title = "two",
        Category = "Rhinos",
        User = new User() { Name = "ayende" }
    };
    var blogThree = new Blog
    {
        Title = "three",
        Category = "Rhinos",
        User = new User() { Name = "rob" }
    };

    db.Put("blogOne", null, JObject.FromObject(blogOne), new JObject(), null);
    db.Put("blogTwo", null, JObject.FromObject(blogTwo), new JObject(), null);
    db.Put("blogThree", null, JObject.FromObject(blogThree), new JObject(), null);

    var results = db.ExecuteDynamicQuery(new IndexQuery()
    {
        PageSize = 128,
        Start = 0,
        Cutoff = DateTime.Now,
        Query = "User.Name:rob"
    });

    Assert.Equal(1, results.Results.Count);
    Assert.Equal("three", results.Results[0].Value<string>("Title"));
    Assert.Equal("Rhinos", results.Results[0].Value<string>("Category"));
}

So that's the server side working great with all those froms :)

I'll have a look at the client next. That'll be interesting, as I think
I'm going to have to make a load of the expression parser methods
protected virtual and override a few of them to generate the full
paths to properties rather than just the property names (Any is my
main candidate for this).

Rob Ashton

Sep 27, 2010, 1:55:31 PM

to ravendb

The short form of this is that the query

Tags,Name:Fish

will generate

from doc in docs
from docTagsItem in doc.Tags
select new
{
    docTagsItemName = docTagsItem.Name
}

and

User.Name:Fish

will generate

from doc in docs
select new
{
    UserName = doc.User.Name
}

Theoretically this will also work:

User.Tags,Name:Fish

from doc in docs
from docUserTagsItem in doc.User.Tags
select new
{
    docUserTagsItemName = docUserTagsItem.Name
}
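Rob's examples imply a mechanical translation from field path to index map. A minimal sketch of that translation, assuming ',' means "enumerate the collection on the left" and '.' is plain property access, and following the earlier `UserName = doc.User.Name` example for non-collection paths. `DynamicMapSketch.BuildMap` is a hypothetical helper, not RavenDB code, and handles only one ',' per path.

```csharp
// Hypothetical helper, not RavenDB's implementation: builds the map part of a
// dynamic index from a field path, following the conventions in this thread.
public static class DynamicMapSketch
{
    public static string BuildMap(string path)
    {
        int comma = path.IndexOf(',');
        if (comma < 0)
        {
            // Plain nested property: "User.Name" -> UserName = doc.User.Name
            string fieldName = path.Replace(".", "");
            return $"from doc in docs select new {{ {fieldName} = doc.{path} }}";
        }

        // Collection path: "User.Tags,Name" -> enumerate doc.User.Tags, project Name
        string collectionPath = path.Substring(0, comma);
        string itemProperty = path.Substring(comma + 1);
        string itemVar = "doc" + collectionPath.Replace(".", "") + "Item";
        string itemField = itemVar + itemProperty.Replace(".", "");
        return $"from doc in docs from {itemVar} in doc.{collectionPath} " +
               $"select new {{ {itemField} = {itemVar}.{itemProperty} }}";
    }
}
```

The output is a single line rather than the multi-line layout above; a path with nested collections would need the same step applied once per ','.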

Rob Ashton

Sep 27, 2010, 5:03:49 PM

to ravendb

Okay, apologies in advance for the code drop I'm pushing in

Without making some really big changes or moving a lot of code
around, I found it hard to make the changes I wanted to the Client API
in a 'good' way.

What I've done is very java-esque (inheritance and virtual methods).
As future changes to the providers are made hopefully a way will
become clear of doing this better.

The good news is

var results = s.DynamicQuery<Blog>()
.Where(x => x.Tags.Any(y=>y.Name == "Birds"))
.ToArray();

Just_Works

as does

var results = s.DynamicQuery<Blog>()
.Where(x => x.User.Name == "Rob")
.ToArray();

Woo!
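The provider changes themselves aren't shown in the thread. As a hypothetical illustration of the idea (not the actual RavenDB LINQ provider), this sketch walks a predicate's expression tree with `System.Linq.Expressions` and emits the paths discussed above: an `Any` call becomes `Collection,Property` and a member chain becomes a dotted path. `DynamicPathSketch` is an illustrative name; it supports only `==` against constants, on a minimal `Blog` model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public class Tag { public string Name { get; set; } }
public class User { public string Name { get; set; } }
public class Blog
{
    public string Title { get; set; }
    public User User { get; set; }
    public Tag[] Tags { get; set; }
}

// Hypothetical translator for illustration only; not the RavenDB provider.
public static class DynamicPathSketch
{
    public static string ToQuery(Expression<Func<Blog, bool>> predicate) =>
        Translate(predicate.Body);

    static string Translate(Expression body)
    {
        // x.Tags.Any(y => y.Name == "Birds")  ->  "Tags,Name:Birds"
        if (body is MethodCallExpression call && call.Method.Name == "Any" && call.Arguments.Count == 2)
        {
            var collection = MemberPath(call.Arguments[0]);
            var inner = (LambdaExpression)call.Arguments[1];
            return collection + "," + Translate(inner.Body);
        }

        // x.User.Name == "Rob"  ->  "User.Name:Rob"
        if (body is BinaryExpression eq && eq.NodeType == ExpressionType.Equal
            && eq.Right is ConstantExpression constant)
        {
            return MemberPath(eq.Left) + ":" + constant.Value;
        }

        throw new NotSupportedException("Unsupported expression: " + body);
    }

    // Turns x.User.Name into "User.Name" by walking back to the lambda parameter
    static string MemberPath(Expression expression)
    {
        // Strip an implicit conversion (e.g. Tag[] to IEnumerable<Tag>) if present
        if (expression is UnaryExpression unary && unary.NodeType == ExpressionType.Convert)
            expression = unary.Operand;

        var parts = new List<string>();
        while (expression is MemberExpression member)
        {
            parts.Insert(0, member.Member.Name);
            expression = member.Expression;
        }
        return string.Join(".", parts);
    }
}
```

Captured variables, `&&`/`||` and range comparisons are not handled here; a real provider needs all of those.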

Rob Ashton

Sep 27, 2010, 5:22:53 PM

to ravendb

I think the next step will be to analyze the RHS of the terms, work
out whether they are exact matches or partial matches, and set the
fields to Analyzed or NotAnalyzed accordingly?

I'll leave that for now, I want to write a few blog posts and use the
functionality so far to demonstrate the ease of "starting out with
ravendb"
