Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?
From: Tobias Sebring <tsebr...@gmail.com>
Date: Tue, 14 Aug 2012 03:31:39 -0700 (PDT)
Subject: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

I'm seeing very slow responses when adding OrderBy clauses to my queries
on a fairly large dataset (2.3M documents).

In the following queries, FilesCount_Range:[0x00000001 TO NULL] limits
the dataset to 2,600 of the 2.3M documents in the full dataset.

The query below returns almost immediately:
query=FilesCount_Range:[0x00000001 TO NULL]
start=0
pageSize=25
aggregation=None
fetch=Id
fetch=Files

This query on the other hand takes 50 seconds to return:
query=FilesCount_Range:[0x00000001 TO NULL]
start=0
pageSize=25
aggregation=None
fetch=Id
fetch=Files
sort=Text
sort=Year
sort=ProjectId
sort=CaseId

This is taken from the mvc-profiler:
"49308 ms waiting for server in 1 request(s) for 1 sessions(s)
Session opened for 50025.86 ms for 1 request(s) - Data/Index"

Subsequent calls with the same orderby are cached and return immediately.

The index is set up like so:
using System.Linq;
using Raven.Abstractions.Indexing;  // FieldIndexing, SortOptions
using Raven.Client.Indexes;         // AbstractIndexCreationTask<,>

public class Data_Index : AbstractIndexCreationTask<Data, Data_Index.Result>
{
    // Shape of an index entry, used for strongly typed queries against the index.
    public class Result
    {
        public string Text { get; set; }
        public int Year { get; set; }
        public int? ProjectId { get; set; }
        public int? CaseId { get; set; }
        public int FilesCount { get; set; }
        public string[] Files { get; set; }
    }

    public Data_Index()
    {
        // One index entry per Data document; FilesCount and Files are computed here.
        Map = data => from d in data
                      select new
                      {
                          Text = d.Text,
                          Year = d.Year,
                          ProjectId = d.ProjectId,
                          CaseId = d.CaseId,
                          FilesCount = d.Files.Count,
                          Files = d.Files.Select(x => x.Path).ToArray()
                      };

        // Text is indexed as a single untokenized term, so the whole value is compared when sorting.
        Index(x => x.Text, FieldIndexing.NotAnalyzed);
        Index(x => x.Year, FieldIndexing.Default);
        Index(x => x.ProjectId, FieldIndexing.Default);
        Index(x => x.CaseId, FieldIndexing.Default);
        Index(x => x.FilesCount, FieldIndexing.Default);
        Index(x => x.Files, FieldIndexing.Default);

        // Sort options tell the server how to compare each field's values when ordering.
        Sort(x => x.Text, SortOptions.String);
        Sort(x => x.Year, SortOptions.Int);
        Sort(x => x.ProjectId, SortOptions.Int);
        Sort(x => x.CaseId, SortOptions.Int);
        Sort(x => x.FilesCount, SortOptions.Int);
    }
}

This is on the latest build of RavenDB, running as a console application.
I've tried to reproduce this issue on an embedded server running in memory,
but to no avail.

I'm considering denormalizing my data even further by extracting all
documents with files into another collection, but I'm hesitant because I'd
expect these queries to work without this much overhead.


 
From: Tobias Sebring <tsebr...@gmail.com>
Date: Wed, 15 Aug 2012 02:36:52 -0700 (PDT)
Subject: Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

Any clue as to why this is so slow? I have still not been able to figure
this out.


 
From: "Oren Eini (Ayende Rahien)" <aye...@ayende.com>
Date: Wed, 15 Aug 2012 12:53:53 +0300
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

How many items match the result?


 
From: Tobias Sebring <tsebr...@gmail.com>
Date: Wed, 15 Aug 2012 03:25:22 -0700 (PDT)
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

"TotalResults": 2601,
"SkippedResults": 0


 
From: "Oren Eini (Ayende Rahien)" <aye...@ayende.com>
Date: Wed, 15 Aug 2012 14:19:35 +0300
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

Well, we do have to do extra work in your case, loading a lot more info into
memory to be able to sort on this.
If this happens on a cold start, it may take a while.
Can you try a different query (so it won't be cached) and see what happens
over time?
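To make Oren's point concrete: in general Lucene terms, sorting on a field relies on a per-field cache that holds that field's value for every document in the index, not just for the hits. The filter still decides which documents match, but on a cold start the cache for each sort field (Text, Year, ProjectId and CaseId here) has to be loaded across all ~2.3M entries before the 2,601 hits can be ordered. The class below is a simplified, hypothetical model of that behaviour (its names are invented for illustration; it is not RavenDB or Lucene.Net code): the first sort pays a cost proportional to the index size, later sorts reuse the cache, and evicting the cache brings the cold cost back.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical model of Lucene-style field sorting (not RavenDB or
// Lucene.Net code). Sorting needs the sort field's value for EVERY document in
// the index, so a cold sort first loads all of them into an in-memory array.
public sealed class FieldCacheSortModel
{
    private readonly string[] _indexedValues; // stands in for the on-disk index
    private string[] _fieldCache;             // built on first sort, kept until evicted

    public int CacheBuilds { get; private set; }
    public long ValuesLoaded { get; private set; }

    public FieldCacheSortModel(string[] indexedValues) => _indexedValues = indexedValues;

    // matchingDocIds is the already-filtered hit list (e.g. the 2,601 matches).
    public List<int> SortHits(IEnumerable<int> matchingDocIds)
    {
        if (_fieldCache == null)
        {
            // Cold path: cost proportional to the whole index, not to the hit count.
            _fieldCache = new string[_indexedValues.Length];
            Array.Copy(_indexedValues, _fieldCache, _indexedValues.Length);
            ValuesLoaded += _indexedValues.Length;
            CacheBuilds++;
        }

        // Warm path: only the hits are ordered, using cheap array lookups.
        return matchingDocIds
            .OrderBy(doc => _fieldCache[doc], StringComparer.Ordinal)
            .ToList();
    }

    // Models the server releasing its caches after a period of inactivity.
    public void Evict() => _fieldCache = null;
}
```

In this model the filter is applied before ordering, which matches the thread's title question; the expensive part is building the cache, which touches every document regardless of how few match.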


 
From: Tobias Sebring <tsebr...@gmail.com>
Date: Wed, 15 Aug 2012 04:45:20 -0700 (PDT)
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

Is 50 seconds reasonable for a cold start with a dataset of this size? If I
remove the OrderBy on the Text field, the query sorts on Year, ProjectId and
CaseId and returns within 16 seconds, which also seems slow for loading 2601
documents and doing an in-memory sort. A subsequent, modified query (e.g.
adding another field to the query) completes pretty fast. After a period of
inactivity, the query once again takes 50 seconds to complete.

Considering these factors, I'm guessing the sorting is done on the full
document collection and then cached in memory? If so, is there a way to
increase the lifetime of this cache, and/or to sort only the documents
returned by the query rather than the full collection?
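On the second question: since the filter narrows the result to about 2,600 documents, one workaround (not proposed in the thread) is to request the matches unsorted and order them in the application. The sketch assumes the matches have already been fetched into memory (the fetch is omitted; if the server caps page size, it may take several requests) and uses a hypothetical ResultRow type mirroring the sort fields of Data_Index.Result. Ordinal string comparison is an assumption chosen for determinism; the server's string sort order may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type mirroring the sort fields of Data_Index.Result.
public sealed class ResultRow
{
    public string Text { get; set; }
    public int Year { get; set; }
    public int? ProjectId { get; set; }
    public int? CaseId { get; set; }
}

public static class ClientSidePaging
{
    // Orders an already-filtered result set by Text, Year, ProjectId, CaseId and
    // returns one page; the cost depends only on the number of matches.
    public static List<ResultRow> SortAndPage(IEnumerable<ResultRow> matches, int start, int pageSize)
    {
        return matches
            .OrderBy(r => r.Text, StringComparer.Ordinal) // assumption: ordinal comparison
            .ThenBy(r => r.Year)
            .ThenBy(r => r.ProjectId) // default comparer puts null before any value
            .ThenBy(r => r.CaseId)
            .Skip(start)
            .Take(pageSize)
            .ToList();
    }
}
```

The trade-off is that every page request transfers all matches instead of 25 documents, so this only makes sense while the filtered set stays small.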


 
From: "Oren Eini (Ayende Rahien)" <aye...@ayende.com>
Date: Wed, 15 Aug 2012 14:54:22 +0300
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

Tobias,
What is the size of the text field?

We do a lot of complex caching there, and depending on the actual index
size (can you check?), loading it might take a while.


 
From: Tobias Sebring <tsebr...@gmail.com>
Date: Wed, 15 Aug 2012 04:59:08 -0700 (PDT)
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

25-50 chars on average but in rare cases upwards of 100 chars.
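A back-of-the-envelope estimate helps relate the field size to the cold-start cost. Assuming 40 characters as a midpoint of the 25-50 range, UTF-16 storage (2 bytes per char) and one value per indexed document, the hypothetical helpers below give a rough figure. They ignore .NET per-object overhead, which raises the real number, and the fact that Lucene's string sort cache generally stores each distinct term once plus an integer ordinal per document, which lowers it when values repeat.

```csharp
// Hypothetical back-of-the-envelope helpers; real memory use will differ.
public static class SortMemoryEstimate
{
    // One UTF-16 string value per document, 2 bytes per char, no object overhead.
    public static long StringFieldBytes(long documentCount, int averageChars) =>
        documentCount * averageChars * 2L;

    // One 4-byte int per document for an int sort field.
    public static long IntFieldBytes(long documentCount) => documentCount * 4L;
}
```

For the 2,296,704 indexed documents reported later in the thread, StringFieldBytes(2_296_704, 40) is 183,736,320 bytes (about 175 MiB), and each int sort field adds 9,186,816 bytes by IntFieldBytes.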


 
From: "Oren Eini (Ayende Rahien)" <aye...@ayende.com>
Date: Wed, 15 Aug 2012 14:59:50 +0300
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

Strange.
At any rate, you need to warm up the index and make sure that a lot of it is
in memory before you can make meaningful perf comparisons. That is especially
true with large indexes.
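One way to act on the warm-up advice, sketched under assumptions: a background timer in the web application that periodically runs a representative sorted query, so the server's caches are reloaded before a real user needs them. IndexWarmer and its members are hypothetical (nothing here is a RavenDB API); the query is passed in as a delegate (for example, the sorted Data/Index query from the first post), and the interval should be shorter than whatever idle period causes the server to release resources. If the site runs in IIS, app-pool recycling or idle shutdown would also stop this timer.

```csharp
using System;
using System.Threading;

// Hypothetical keep-warm helper; nothing here is a RavenDB API. It runs a
// caller-supplied query on a timer so that server-side caches stay loaded.
public sealed class IndexWarmer : IDisposable
{
    private readonly Action _runWarmupQuery;
    private readonly Timer _timer;
    private int _inFlight;      // 1 while a warm-up query is running
    private int _completedRuns; // number of warm-up queries that finished without throwing

    public int CompletedRuns => Volatile.Read(ref _completedRuns);

    public IndexWarmer(Action runWarmupQuery, TimeSpan interval)
    {
        _runWarmupQuery = runWarmupQuery ?? throw new ArgumentNullException(nameof(runWarmupQuery));
        // First run immediately, then once per interval.
        _timer = new Timer(_ => Tick(), null, TimeSpan.Zero, interval);
    }

    private void Tick()
    {
        // A cold query can take longer than the interval; skip overlapping runs.
        if (Interlocked.Exchange(ref _inFlight, 1) == 1) return;
        try
        {
            _runWarmupQuery();
            Interlocked.Increment(ref _completedRuns);
        }
        catch (Exception)
        {
            // Swallow the failure so the timer keeps going; a real app should log it.
        }
        finally
        {
            Volatile.Write(ref _inFlight, 0);
        }
    }

    public void Dispose() => _timer.Dispose();
}
```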


 
From: Tobias Sebring <tsebr...@gmail.com>
Date: Wed, 15 Aug 2012 05:05:36 -0700 (PDT)
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

What are my options for doing so, and/or for making sure it stays in memory?
The database is around 4-5 GB and I don't expect it to grow much larger
than that. If I knew how, I could potentially preload the entire database
into memory, but it would still need to be persisted to disk.


 
From: "Oren Eini (Ayende Rahien)" <aye...@ayende.com>
Date: Wed, 15 Aug 2012 15:15:13 +0300
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

Tobias,
First, let us check some things.
Can you check the actual size of the index?


 
From: Tobias Sebring <tsebr...@gmail.com>
Date: Wed, 15 Aug 2012 05:45:34 -0700 (PDT)
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

I will as soon as I can later today. Right now I don't have access to the
database computer and rebuilding the index takes ~4 hours.

We are talking about the physical size of the index on disk right?


 
From: "Oren Eini (Ayende Rahien)" <aye...@ayende.com>
Date: Wed, 15 Aug 2012 15:49:08 +0300
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

Yes.


 
From: Tobias Sebring <tsebr...@gmail.com>
Date: Wed, 15 Aug 2012 06:53:06 -0700 (PDT)
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

The index is 759 MB.
The data file is 3.63 GB.

Indexing Attempts: 2,296,704
Indexing Successes: 2,296,704


 
From: "Oren Eini (Ayende Rahien)" <aye...@ayende.com>
Date: Wed, 15 Aug 2012 16:55:09 +0300
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

Hm,
it's big enough that it will take several queries to load significant
portions of it into memory.
In short, as long as the site is serving queries, you'll see good perf.
If you have long periods of inactivity, we will drop all resources.

Do you expect long periods of inactivity?


 
From: Tobias Sebring <tsebr...@gmail.com>
Date: Wed, 15 Aug 2012 07:01:40 -0700 (PDT)
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

Yes, the site will be inactive for most hours out of the day and will only
see sporadic use that will not be concentrated in any specific period of
time. I expect the resources will have been dropped almost every time a
user accesses the site.


 
From: "Oren Eini (Ayende Rahien)" <aye...@ayende.com>
Date: Thu, 16 Aug 2012 19:02:33 +0300
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

How are you running this?
In a service? In IIS?

I assume that you are running it as a child database; if so, you need to set
Raven/Tenants/MaxIdleTimeForTenantDatabase

to a high value.

If you are running in IIS, you need to set the IIS idle time-out to zero and
disable recycling.
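Why a higher idle limit helps with sporadic traffic can be shown with a small model: a resource is unloaded once it has been idle longer than a maximum, and the next request pays the full reload (the 50-second case in this thread). IdleUnloadModel is hypothetical and not how RavenDB implements tenant unloading; the units and default of Raven/Tenants/MaxIdleTimeForTenantDatabase should be checked against the documentation for the build in use.

```csharp
using System;

// Hypothetical model of idle-based unloading; not RavenDB's tenant-unloading code.
// A resource (think: a tenant database with its warm sort caches) is dropped once it
// has been idle longer than maxIdle, and the next request has to reload it.
public sealed class IdleUnloadModel
{
    private readonly TimeSpan _maxIdle;
    private DateTime? _lastAccess; // null means "never loaded"

    public int Loads { get; private set; }

    public IdleUnloadModel(TimeSpan maxIdle) => _maxIdle = maxIdle;

    // Records a request at time 'now' and returns true if the resource was still loaded.
    public bool Request(DateTime now)
    {
        bool warm = _lastAccess.HasValue && now - _lastAccess.Value <= _maxIdle;
        if (!warm)
        {
            Loads++; // cold path: caches are rebuilt from disk
        }
        _lastAccess = now;
        return warm;
    }
}
```

With requests spaced hours apart, a 15-minute limit makes nearly every request cold, while a limit longer than the typical gap keeps the caches loaded.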


 
From: Tobias Sebring <tsebr...@gmail.com>
Date: Thu, 16 Aug 2012 09:24:38 -0700 (PDT)
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

I'm currently running it as a console application (i.e. I haven't installed
it as a service, but it isn't running embedded either). The site is running
in IIS. I'm going to try that setting to see if it makes any difference.

Another thing I've noticed over the past day: if I leave indexing on during
the batch import (which takes approx. 40 min), around 2,200,000 of the
2,300,000 documents will already have been indexed when the import is done.
If I run the same batch import with indexing turned off, turn it back on,
and wait for indexing to complete, the same index takes 3-4 hours to build,
with a very noticeable decrease in performance as the index grows.

Another issue with indexing is that once you stop actively adding
documents, indexing slows down: the last 100,000 documents may take 15-20
minutes, and more often than not the index remains stale even though all
documents have been processed (to the point that querying it returns no
results). I've noticed that if I shut down the RavenDB server when this
happens and then start it back up, the index is no longer stale.

From: "Oren Eini (Ayende Rahien)" <aye...@ayende.com>
Date: Thu, 16 Aug 2012 19:28:30 +0300
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

Tobias,
Is it a child database?

From: Tobias Sebring <tsebr...@gmail.com>
Date: Thu, 16 Aug 2012 09:32:43 -0700 (PDT)
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

I'm not sure I understand "child database". It's a separate database from
the system database and I've created it in the manager by clicking New
database.

From: "Oren Eini (Ayende Rahien)" <aye...@ayende.com>
Date: Thu, 16 Aug 2012 19:35:31 +0300
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

Okay, that is what I meant.

I just fixed a bug that would likely cause non-system databases to shut down
after inactivity, even while they are still indexing, if you are doing a LOT
of indexing.
That would probably match what you are seeing.
Can you wait for the next build and test that?

From: Tobias Sebring <tsebr...@gmail.com>
Date: Thu, 16 Aug 2012 10:02:22 -0700 (PDT)
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

Absolutely.

While I have you on the hook I have another problem I'm struggling with:

My batch import is processed in multiple phases to build my fully
denormalized dataset. In one of the later phases, I need to query the
database with .LuceneQuery<>() to find the relevant document to update.
This works for the majority of updates in this phase, but a few iterations
fail to find the document to update.
In every update iteration I modify fields that are part of the index I'm
querying, so I call .WaitForNonStaleResults() on the subsequent query after
every update to (hopefully) make sure the index is up to date. The index is
defined as an AbstractIndexCreationTask<Data, Data.Result> (Map/Reduce), but
there is no reduce function; I'm using it to query computed index fields
with lambda syntax and then select the actual document type for the result:
session.Advanced.LuceneQuery<Datas_Index.Result, Datas_Index>()
   .WhereEqual(x => ..., ...)
   .SelectFields<Data>()
   .ToList()

After I rerun the same update code on the data that failed to find the
relevant document to update - like magic - it's now returned and updates
are successful. Any idea what I'm doing wrong here?
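The symptom described here, where a query issued right after an update sometimes misses the updated document, resembles what happens when a query reads an index that has not yet processed the latest write; the thread does not establish the cause, and Oren asks for a failing test. The sketch below is a simplified, hypothetical model (AsyncIndexModel and its members are invented for illustration, not RavenDB's implementation): writes get increasing etags, a background indexer advances in batches, and a query can only rely on seeing a given write once the indexed etag has reached it.

```csharp
using System;

// Hypothetical model of an asynchronously updated index. Each write gets an
// increasing etag; a background indexer processes writes in batches and records
// the last etag it has indexed. Not RavenDB's implementation.
public sealed class AsyncIndexModel
{
    public long LastWrittenEtag { get; private set; }
    public long LastIndexedEtag { get; private set; }

    // A document write: stored immediately, but not yet reflected in the index.
    public long Write() => ++LastWrittenEtag;

    // The indexer catches up by at most 'batchSize' writes per pass.
    public void IndexBatch(int batchSize) =>
        LastIndexedEtag = Math.Min(LastWrittenEtag, LastIndexedEtag + batchSize);

    // A query can rely on seeing the write with etag 'cutoff' only once this is true.
    public bool IsNonStaleAsOf(long cutoff) => LastIndexedEtag >= cutoff;
}
```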

From: "Oren Eini (Ayende Rahien)" <aye...@ayende.com>
Date: Thu, 16 Aug 2012 21:18:57 +0300
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

No idea.
Can you try generating a failing test?
I am not sure that I am following.

From: Tobias Sebring <tsebr...@gmail.com>
Date: Thu, 16 Aug 2012 12:51:05 -0700 (PDT)
Subject: Re: [RavenDB] Re: Is OrderBy applied before filtering (WhereGreaterThanOrEqual) in LuceneQuery?

I'll see if I can get a test to fail to show the issue.
