I'm seeing some strange results when adding OrderBy operands to my queries on a fairly large dataset (2.3M documents).
In the following queries FilesCount_Range:[0x00000001 TO NULL] will limit the dataset to 2600 out of the 2.3M documents in the full dataset.
The below query will return almost immediately: query=FilesCount_Range:[0x00000001 TO NULL] start=0 pageSize=25 aggregation=None fetch=Id fetch=Files
This query on the other hand takes 50 seconds to return: query=FilesCount_Range:[0x00000001 TO NULL] start=0 pageSize=25 aggregation=None fetch=Id fetch=Files sort=Text sort=Year sort=ProjectId sort=CaseId
This is taken from the mvc-profiler: "49308 ms waiting for server in 1 request(s) for 1 sessions(s)Session opened for 50025.86 ms for 1 request(s) - Data/Index"
Subsequent calls with the same orderby are cached and return immediately.
The index is set up like so: public class Data_Index : AbstractIndexCreationTask<Data, Data_ Index.Result> { public class Result { public string Text { get; set; } public int Year { get; set; } public int? ProjectId { get; set; } public int? CaseId { get; set; } public int FilesCount { get; set; } public string[] Files { get; set; }
}
public Data_ Index() { Map = data => from d in data select new { Text = d.Text, Year = d.Year, ProjectId = d.ProjectId, CaseId = d.CaseId, FilesCount = d.Files.Count, Files = d.Files.Select(x => x.Path).ToArray()
This is on the latest build of RavenDb running as a console application. I've tried to reproduce this issue on a embedded server running in memory but to no avail.
I'm considering denormalizing my data even further by extracting all documents with files to another collection but hesitant because I figure these queries should work without this much overhead.
On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias Sebring wrote:
> I'm seeing some strange results when adding OrderBy operands to my queries > on a fairly large dataset (2.3M documents).
> In the following queries FilesCount_Range:[0x00000001 TO NULL] will limit > the dataset to 2600 out of the 2.3M documents in the full dataset.
> The below query will return almost immediately: > query=FilesCount_Range:[0x00000001 TO NULL] > start=0 > pageSize=25 > aggregation=None > fetch=Id > fetch=Files
> This query on the other hand takes 50 seconds to return: > query=FilesCount_Range:[0x00000001 TO NULL] > start=0 > pageSize=25 > aggregation=None > fetch=Id > fetch=Files > sort=Text > sort=Year > sort=ProjectId > sort=CaseId
> This is taken from the mvc-profiler: > "49308 ms waiting for server in 1 request(s) for 1 sessions(s)Session > opened for 50025.86 ms for 1 request(s) - Data/Index"
> Subsequent calls with the same orderby are cached and return immediately.
> The index is set up like so: > public class Data_Index : AbstractIndexCreationTask<Data, Data_ > Index.Result> > { > public class Result > { > public string Text { get; set; } > public int Year { get; set; } > public int? ProjectId { get; set; } > public int? CaseId { get; set; } > public int FilesCount { get; set; } > public string[] Files { get; set; } > }
> public Data_ Index() > { > Map = data => from d in data > select new > { > Text = d.Text, > Year = d.Year, > ProjectId = d.ProjectId, > CaseId = d.CaseId, > FilesCount = d.Files.Count, > Files = d.Files.Select(x => x.Path).ToArray() > };
> This is on the latest build of RavenDb running as a console application. > I've tried to reproduce this issue on a embedded server running in memory > but to no avail.
> I'm considering denormalizing my data even further by extracting all > documents with files to another collection but hesitant because I figure > these queries should work without this much overhead.
On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
> Any clue as to why this is so slow? I have still not been able to figure
> this out.
> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias Sebring wrote:
>> I'm seeing some strange results when adding OrderBy operands to my
>> queries on a fairly large dataset (2.3M documents).
>> In the following queries FilesCount_Range:[**0x00000001 TO NULL] will
>> limit the dataset to 2600 out of the 2.3M documents in the full dataset.
>> The below query will return almost immediately:
>> query=FilesCount_Range:[**0x00000001 TO NULL]
>> start=0
>> pageSize=25
>> aggregation=None
>> fetch=Id
>> fetch=Files
>> This query on the other hand takes 50 seconds to return:
>> query=FilesCount_Range:[**0x00000001 TO NULL]
>> start=0
>> pageSize=25
>> aggregation=None
>> fetch=Id
>> fetch=Files
>> sort=Text
>> sort=Year
>> sort=ProjectId
>> sort=CaseId
>> This is taken from the mvc-profiler:
>> "49308 ms waiting for server in 1 request(s) for 1 sessions(s)Session
>> opened for 50025.86 ms for 1 request(s) - Data/Index"
>> Subsequent calls with the same orderby are cached and return immediately.
>> The index is set up like so:
>> public class Data_Index : AbstractIndexCreationTask<**Data, Data_
>> Index.Result>
>> {
>> public class Result
>> {
>> public string Text { get; set; }
>> public int Year { get; set; }
>> public int? ProjectId { get; set; }
>> public int? CaseId { get; set; }
>> public int FilesCount { get; set; }
>> public string[] Files { get; set; }
>> }
>> public Data_ Index()
>> {
>> Map = data => from d in data
>> select new
>> {
>> Text = d.Text,
>> Year = d.Year,
>> ProjectId = d.ProjectId,
>> CaseId = d.CaseId,
>> FilesCount = d.Files.Count,
>> Files = d.Files.Select(x => x.Path).ToArray()
>> };
>> This is on the latest build of RavenDb running as a console application.
>> I've tried to reproduce this issue on a embedded server running in memory
>> but to no avail.
>> I'm considering denormalizing my data even further by extracting all
>> documents with files to another collection but hesitant because I figure
>> these queries should work without this much overhead.
On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini wrote:
> How many items match the result?
> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <tseb...@gmail.com<javascript:> > > wrote:
>> Any clue as to why this is so slow? I have still not been able to figure >> this out.
>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias Sebring wrote:
>>> I'm seeing some strange results when adding OrderBy operands to my >>> queries on a fairly large dataset (2.3M documents).
>>> In the following queries FilesCount_Range:[**0x00000001 TO NULL] will >>> limit the dataset to 2600 out of the 2.3M documents in the full dataset.
>>> The below query will return almost immediately: >>> query=FilesCount_Range:[**0x00000001 TO NULL] >>> start=0 >>> pageSize=25 >>> aggregation=None >>> fetch=Id >>> fetch=Files
>>> This query on the other hand takes 50 seconds to return: >>> query=FilesCount_Range:[**0x00000001 TO NULL] >>> start=0 >>> pageSize=25 >>> aggregation=None >>> fetch=Id >>> fetch=Files >>> sort=Text >>> sort=Year >>> sort=ProjectId >>> sort=CaseId
>>> This is taken from the mvc-profiler: >>> "49308 ms waiting for server in 1 request(s) for 1 sessions(s)Session >>> opened for 50025.86 ms for 1 request(s) - Data/Index"
>>> Subsequent calls with the same orderby are cached and return immediately.
>>> The index is set up like so: >>> public class Data_Index : AbstractIndexCreationTask<**Data, Data_ >>> Index.Result> >>> { >>> public class Result >>> { >>> public string Text { get; set; } >>> public int Year { get; set; } >>> public int? ProjectId { get; set; } >>> public int? CaseId { get; set; } >>> public int FilesCount { get; set; } >>> public string[] Files { get; set; } >>> }
>>> public Data_ Index() >>> { >>> Map = data => from d in data >>> select new >>> { >>> Text = d.Text, >>> Year = d.Year, >>> ProjectId = d.ProjectId, >>> CaseId = d.CaseId, >>> FilesCount = d.Files.Count, >>> Files = d.Files.Select(x => x.Path).ToArray() >>> };
>>> This is on the latest build of RavenDb running as a console application. >>> I've tried to reproduce this issue on a embedded server running in memory >>> but to no avail.
>>> I'm considering denormalizing my data even further by extracting all >>> documents with files to another collection but hesitant because I figure >>> these queries should work without this much overhead.
Well, we do have to do extra work on your case, loading a lot more info to
memory to be able to sort on this.
If this happens on a cold start, it may take a while.
Can you try a different query (so it won't be cached), and see what happens
over time?
On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
> *"TotalResults": 2601,
> "SkippedResults": 0*
> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini wrote:
>> How many items match the result?
>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>> Any clue as to why this is so slow? I have still not been able to figure
>>> this out.
>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias Sebring wrote:
>>>> I'm seeing some strange results when adding OrderBy operands to my
>>>> queries on a fairly large dataset (2.3M documents).
>>>> In the following queries FilesCount_Range:[**0x00**000001 TO NULL] will
>>>> limit the dataset to 2600 out of the 2.3M documents in the full
>>>> dataset.
>>>> The below query will return almost immediately:
>>>> query=FilesCount_Range:[**0x0000**0001 TO NULL]
>>>> start=0
>>>> pageSize=25
>>>> aggregation=None
>>>> fetch=Id
>>>> fetch=Files
>>>> This query on the other hand takes 50 seconds to return:
>>>> query=FilesCount_Range:[**0x0000**0001 TO NULL]
>>>> start=0
>>>> pageSize=25
>>>> aggregation=None
>>>> fetch=Id
>>>> fetch=Files
>>>> sort=Text
>>>> sort=Year
>>>> sort=ProjectId
>>>> sort=CaseId
>>>> This is taken from the mvc-profiler:
>>>> "49308 ms waiting for server in 1 request(s) for 1 sessions(s)Session
>>>> opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>> Subsequent calls with the same orderby are cached and return
>>>> immediately.
>>>> The index is set up like so:
>>>> public class Data_Index : AbstractIndexCreationTask<**Data**, Data_
>>>> Index.Result>
>>>> {
>>>> public class Result
>>>> {
>>>> public string Text { get; set; }
>>>> public int Year { get; set; }
>>>> public int? ProjectId { get; set; }
>>>> public int? CaseId { get; set; }
>>>> public int FilesCount { get; set; }
>>>> public string[] Files { get; set; }
>>>> }
>>>> public Data_ Index()
>>>> {
>>>> Map = data => from d in data
>>>> select new
>>>> {
>>>> Text = d.Text,
>>>> Year = d.Year,
>>>> ProjectId = d.ProjectId,
>>>> CaseId = d.CaseId,
>>>> FilesCount = d.Files.Count,
>>>> Files = d.Files.Select(x => x.Path).ToArray()
>>>> };
>>>> This is on the latest build of RavenDb running as a console
>>>> application. I've tried to reproduce this issue on a embedded server
>>>> running in memory but to no avail.
>>>> I'm considering denormalizing my data even further by extracting all
>>>> documents with files to another collection but hesitant because I figure
>>>> these queries should work without this much overhead.
Is 50 seconds reasonable for a cold start with the size of the data set I'm dealing with? If I remove the OrderBy on the text field the query will sort on Year, ProjectId and CaseId and return within 16 seconds which also seems slow for loading 2601 documents and doing an in-memory sort. A subsequent and modified query (e.g. adding another field to the query) will complete pretty fast. After a delay of inactivity the query will once again take 50 seconds to complete.
Considering these factors I'm guessing the sorting is done on the full document collection and then cached in memory? If so is there a way to increase the lifetime of this cache and/or possibly not sort on the full collection but on the documents returned from the query?
On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini wrote:
> Well, we do have to do extra work on your case, loading a lot more info to > memory to be able to sort on this. > If this happens on a cold start, it may take a while. > Can you try a different query (so it won't be cached), and see what > happens over time?
> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <tseb...@gmail.com<javascript:> > > wrote:
>> *"TotalResults": 2601, >> "SkippedResults": 0*
>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini wrote:
>>> How many items match the result?
>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>> Any clue as to why this is so slow? I have still not been able to >>>> figure this out.
>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias Sebring wrote:
>>>>> I'm seeing some strange results when adding OrderBy operands to my >>>>> queries on a fairly large dataset (2.3M documents).
>>>>> In the following queries FilesCount_Range:[**0x00**000001 TO NULL] will >>>>> limit the dataset to 2600 out of the 2.3M documents in the full >>>>> dataset.
>>>>> The below query will return almost immediately: >>>>> query=FilesCount_Range:[**0x0000**0001 TO NULL] >>>>> start=0 >>>>> pageSize=25 >>>>> aggregation=None >>>>> fetch=Id >>>>> fetch=Files
>>>>> This query on the other hand takes 50 seconds to return: >>>>> query=FilesCount_Range:[**0x0000**0001 TO NULL] >>>>> start=0 >>>>> pageSize=25 >>>>> aggregation=None >>>>> fetch=Id >>>>> fetch=Files >>>>> sort=Text >>>>> sort=Year >>>>> sort=ProjectId >>>>> sort=CaseId
>>>>> This is taken from the mvc-profiler: >>>>> "49308 ms waiting for server in 1 request(s) for 1 sessions(s)Session >>>>> opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>> Subsequent calls with the same orderby are cached and return >>>>> immediately.
>>>>> The index is set up like so: >>>>> public class Data_Index : AbstractIndexCreationTask<**Data**, Data_ >>>>> Index.Result> >>>>> { >>>>> public class Result >>>>> { >>>>> public string Text { get; set; } >>>>> public int Year { get; set; } >>>>> public int? ProjectId { get; set; } >>>>> public int? CaseId { get; set; } >>>>> public int FilesCount { get; set; } >>>>> public string[] Files { get; set; } >>>>> }
>>>>> public Data_ Index() >>>>> { >>>>> Map = data => from d in data >>>>> select new >>>>> { >>>>> Text = d.Text, >>>>> Year = d.Year, >>>>> ProjectId = d.ProjectId, >>>>> CaseId = d.CaseId, >>>>> FilesCount = d.Files.Count, >>>>> Files = d.Files.Select(x => x.Path).ToArray() >>>>> };
>>>>> This is on the latest build of RavenDb running as a console >>>>> application. I've tried to reproduce this issue on a embedded server >>>>> running in memory but to no avail.
>>>>> I'm considering denormalizing my data even further by extracting all >>>>> documents with files to another collection but hesitant because I figure >>>>> these queries should work without this much overhead.
On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
> Is 50 seconds reasonable for a cold start with the size of the data set
> I'm dealing with? If I remove the OrderBy on the text field the query will
> sort on Year, ProjectId and CaseId and return within 16 seconds which also
> seems slow for loading 2601 documents and doing an in-memory sort. A
> subsequent and modified query (e.g. adding another field to the query) will
> complete pretty fast. After a delay of inactivity the query will once again
> take 50 seconds to complete.
> Considering these factors I'm guessing the sorting is done on the full
> document collection and then cached in memory? If so is there a way
> to increase the lifetime of this cache and/or possibly not sort on the full
> collection but on the documents returned from the query?
> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini wrote:
>> Well, we do have to do extra work on your case, loading a lot more info
>> to memory to be able to sort on this.
>> If this happens on a cold start, it may take a while.
>> Can you try a different query (so it won't be cached), and see what
>> happens over time?
>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini wrote:
>>>> How many items match the result?
>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>> Any clue as to why this is so slow? I have still not been able to
>>>>> figure this out.
>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias Sebring wrote:
>>>>>> I'm seeing some strange results when adding OrderBy operands to my
>>>>>> queries on a fairly large dataset (2.3M documents).
>>>>>> In the following queries FilesCount_Range:[**0x00****000001 TO NULL] will
>>>>>> limit the dataset to 2600 out of the 2.3M documents in the full
>>>>>> dataset.
>>>>>> The below query will return almost immediately:
>>>>>> query=FilesCount_Range:[**0x0000****0001 TO NULL]
>>>>>> start=0
>>>>>> pageSize=25
>>>>>> aggregation=None
>>>>>> fetch=Id
>>>>>> fetch=Files
>>>>>> This query on the other hand takes 50 seconds to return:
>>>>>> query=FilesCount_Range:[**0x0000****0001 TO NULL]
>>>>>> start=0
>>>>>> pageSize=25
>>>>>> aggregation=None
>>>>>> fetch=Id
>>>>>> fetch=Files
>>>>>> sort=Text
>>>>>> sort=Year
>>>>>> sort=ProjectId
>>>>>> sort=CaseId
>>>>>> This is taken from the mvc-profiler:
>>>>>> "49308 ms waiting for server in 1 request(s) for 1 sessions(s)Session
>>>>>> opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>>> Subsequent calls with the same orderby are cached and return
>>>>>> immediately.
>>>>>> The index is set up like so:
>>>>>> public class Data_Index : AbstractIndexCreationTask<**Data****,
>>>>>> Data_ Index.Result>
>>>>>> {
>>>>>> public class Result
>>>>>> {
>>>>>> public string Text { get; set; }
>>>>>> public int Year { get; set; }
>>>>>> public int? ProjectId { get; set; }
>>>>>> public int? CaseId { get; set; }
>>>>>> public int FilesCount { get; set; }
>>>>>> public string[] Files { get; set; }
>>>>>> }
>>>>>> public Data_ Index()
>>>>>> {
>>>>>> Map = data => from d in data
>>>>>> select new
>>>>>> {
>>>>>> Text = d.Text,
>>>>>> Year = d.Year,
>>>>>> ProjectId = d.ProjectId,
>>>>>> CaseId = d.CaseId,
>>>>>> FilesCount = d.Files.Count,
>>>>>> Files = d.Files.Select(x => x.Path).ToArray()
>>>>>> };
>>>>>> This is on the latest build of RavenDb running as a console
>>>>>> application. I've tried to reproduce this issue on a embedded server
>>>>>> running in memory but to no avail.
>>>>>> I'm considering denormalizing my data even further by extracting all
>>>>>> documents with files to another collection but hesitant because I figure
>>>>>> these queries should work without this much overhead.
On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini wrote:
> Tobias, > What is the size of the text field?
> We do a lot of complex caching there, and depending on the actual index > size (can you check), loading through it might take a while.
> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <tseb...@gmail.com<javascript:> > > wrote:
>> Is 50 seconds reasonable for a cold start with the size of the data set >> I'm dealing with? If I remove the OrderBy on the text field the query will >> sort on Year, ProjectId and CaseId and return within 16 seconds which also >> seems slow for loading 2601 documents and doing an in-memory sort. A >> subsequent and modified query (e.g. adding another field to the query) will >> complete pretty fast. After a delay of inactivity the query will once again >> take 50 seconds to complete.
>> Considering these factors I'm guessing the sorting is done on the full >> document collection and then cached in memory? If so is there a way >> to increase the lifetime of this cache and/or possibly not sort on the full >> collection but on the documents returned from the query?
>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini wrote:
>>> Well, we do have to do extra work on your case, loading a lot more info >>> to memory to be able to sort on this. >>> If this happens on a cold start, it may take a while. >>> Can you try a different query (so it won't be cached), and see what >>> happens over time?
>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini wrote:
>>>>> How many items match the result?
>>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>> Any clue as to why this is so slow? I have still not been able to >>>>>> figure this out.
>>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias Sebring wrote:
>>>>>>> I'm seeing some strange results when adding OrderBy operands to my >>>>>>> queries on a fairly large dataset (2.3M documents).
>>>>>>> In the following queries FilesCount_Range:[**0x00****000001 TO NULL] will >>>>>>> limit the dataset to 2600 out of the 2.3M documents in the full >>>>>>> dataset.
>>>>>>> The below query will return almost immediately: >>>>>>> query=FilesCount_Range:[**0x0000****0001 TO NULL] >>>>>>> start=0 >>>>>>> pageSize=25 >>>>>>> aggregation=None >>>>>>> fetch=Id >>>>>>> fetch=Files
>>>>>>> This query on the other hand takes 50 seconds to return: >>>>>>> query=FilesCount_Range:[**0x0000****0001 TO NULL] >>>>>>> start=0 >>>>>>> pageSize=25 >>>>>>> aggregation=None >>>>>>> fetch=Id >>>>>>> fetch=Files >>>>>>> sort=Text >>>>>>> sort=Year >>>>>>> sort=ProjectId >>>>>>> sort=CaseId
>>>>>>> This is taken from the mvc-profiler: >>>>>>> "49308 ms waiting for server in 1 request(s) for 1 >>>>>>> sessions(s)Session opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>>>> Subsequent calls with the same orderby are cached and return >>>>>>> immediately.
>>>>>>> The index is set up like so: >>>>>>> public class Data_Index : AbstractIndexCreationTask<**Data****, >>>>>>> Data_ Index.Result> >>>>>>> { >>>>>>> public class Result >>>>>>> { >>>>>>> public string Text { get; set; } >>>>>>> public int Year { get; set; } >>>>>>> public int? ProjectId { get; set; } >>>>>>> public int? CaseId { get; set; } >>>>>>> public int FilesCount { get; set; } >>>>>>> public string[] Files { get; set; } >>>>>>> }
>>>>>>> public Data_ Index() >>>>>>> { >>>>>>> Map = data => from d in data >>>>>>> select new >>>>>>> { >>>>>>> Text = d.Text, >>>>>>> Year = d.Year, >>>>>>> ProjectId = d.ProjectId, >>>>>>> CaseId = d.CaseId, >>>>>>> FilesCount = d.Files.Count, >>>>>>> Files = d.Files.Select(x => x.Path).ToArray() >>>>>>> };
>>>>>>> This is on the latest build of RavenDb running as a console >>>>>>> application. I've tried to reproduce this issue on a embedded server >>>>>>> running in memory but to no avail.
>>>>>>> I'm considering denormalizing my data even further by extracting all >>>>>>> documents with files to another collection but hesitant because I figure >>>>>>> these queries should work without this much overhead.
Strange.
At any rate, you need to warm up the index, make sure that a lot of it is
in memory before you can really make perf comparisons. That is especially
true with large indexes.
On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
> 25-50 chars on average but in rare cases upwards of 100 chars.
> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini wrote:
>> Tobias,
>> What is the size of the text field?
>> We do a lot of complex caching there, and depending on the actual index
>> size (can you check), loading through it might take a while.
>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>> Is 50 seconds reasonable for a cold start with the size of the data set
>>> I'm dealing with? If I remove the OrderBy on the text field the query will
>>> sort on Year, ProjectId and CaseId and return within 16 seconds which also
>>> seems slow for loading 2601 documents and doing an in-memory sort. A
>>> subsequent and modified query (e.g. adding another field to the query) will
>>> complete pretty fast. After a delay of inactivity the query will once again
>>> take 50 seconds to complete.
>>> Considering these factors I'm guessing the sorting is done on the full
>>> document collection and then cached in memory? If so is there a way
>>> to increase the lifetime of this cache and/or possibly not sort on the full
>>> collection but on the documents returned from the query?
>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini wrote:
>>>> Well, we do have to do extra work on your case, loading a lot more info
>>>> to memory to be able to sort on this.
>>>> If this happens on a cold start, it may take a while.
>>>> Can you try a different query (so it won't be cached), and see what
>>>> happens over time?
>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini wrote:
>>>>>> How many items match the result?
>>>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>> Any clue as to why this is so slow? I have still not been able to
>>>>>>> figure this out.
>>>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias Sebring wrote:
>>>>>>>> I'm seeing some strange results when adding OrderBy operands to my
>>>>>>>> queries on a fairly large dataset (2.3M documents).
>>>>>>>> In the following queries FilesCount_Range:[**0x00******000001 TO
>>>>>>>> NULL] will limit the dataset to 2600 out of the 2.3M documents in
>>>>>>>> the full dataset.
>>>>>>>> The below query will return almost immediately:
>>>>>>>> query=FilesCount_Range:[**0x0000******0001 TO NULL]
>>>>>>>> start=0
>>>>>>>> pageSize=25
>>>>>>>> aggregation=None
>>>>>>>> fetch=Id
>>>>>>>> fetch=Files
>>>>>>>> This query on the other hand takes 50 seconds to return:
>>>>>>>> query=FilesCount_Range:[**0x0000******0001 TO NULL]
>>>>>>>> start=0
>>>>>>>> pageSize=25
>>>>>>>> aggregation=None
>>>>>>>> fetch=Id
>>>>>>>> fetch=Files
>>>>>>>> sort=Text
>>>>>>>> sort=Year
>>>>>>>> sort=ProjectId
>>>>>>>> sort=CaseId
>>>>>>>> This is taken from the mvc-profiler:
>>>>>>>> "49308 ms waiting for server in 1 request(s) for 1
>>>>>>>> sessions(s)Session opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>>>>> Subsequent calls with the same orderby are cached and return
>>>>>>>> immediately.
>>>>>>>> The index is set up like so:
>>>>>>>> public class Data_Index : AbstractIndexCreationTask<**Data******,
>>>>>>>> Data_ Index.Result>
>>>>>>>> {
>>>>>>>> public class Result
>>>>>>>> {
>>>>>>>> public string Text { get; set; }
>>>>>>>> public int Year { get; set; }
>>>>>>>> public int? ProjectId { get; set; }
>>>>>>>> public int? CaseId { get; set; }
>>>>>>>> public int FilesCount { get; set; }
>>>>>>>> public string[] Files { get; set; }
>>>>>>>> }
>>>>>>>> public Data_ Index()
>>>>>>>> {
>>>>>>>> Map = data => from d in data
>>>>>>>> select new
>>>>>>>> {
>>>>>>>> Text = d.Text,
>>>>>>>> Year = d.Year,
>>>>>>>> ProjectId = d.ProjectId,
>>>>>>>> CaseId = d.CaseId,
>>>>>>>> FilesCount = d.Files.Count,
>>>>>>>> Files = d.Files.Select(x => x.Path).ToArray()
>>>>>>>> };
>>>>>>>> This is on the latest build of RavenDb running as a console
>>>>>>>> application. I've tried to reproduce this issue on a embedded server
>>>>>>>> running in memory but to no avail.
>>>>>>>> I'm considering denormalizing my data even further by extracting
>>>>>>>> all documents with files to another collection but hesitant because I
>>>>>>>> figure these queries should work without this much overhead.
What are my options for doing so and/or to make sure it stays in memory? The database is around 4-5GB and I don't expect it to become much larger than that. If I knew of a way to do it I could potentially pre-load the entire database into memory but it would still need to be persisted to disk.
On Wednesday, August 15, 2012 1:59:50 PM UTC+2, Oren Eini wrote:
> Strange. > At any rate, you need to warm up the index, make sure that a lot of it is > in memory before you can really make perf comparisons. That is especially > true with large indexes.
> On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <tseb...@gmail.com<javascript:> > > wrote:
>> 25-50 chars on average but in rare cases upwards of 100 chars.
>> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini wrote:
>>> Tobias, >>> What is the size of the text field?
>>> We do a lot of complex caching there, and depending on the actual index >>> size (can you check), loading through it might take a while.
>>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>> Is 50 seconds reasonable for a cold start with the size of the data set >>>> I'm dealing with? If I remove the OrderBy on the text field the query will >>>> sort on Year, ProjectId and CaseId and return within 16 seconds which also >>>> seems slow for loading 2601 documents and doing an in-memory sort. A >>>> subsequent and modified query (e.g. adding another field to the query) will >>>> complete pretty fast. After a delay of inactivity the query will once again >>>> take 50 seconds to complete.
>>>> Considering these factors I'm guessing the sorting is done on the full >>>> document collection and then cached in memory? If so is there a way >>>> to increase the lifetime of this cache and/or possibly not sort on the full >>>> collection but on the documents returned from the query?
>>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini wrote:
>>>>> Well, we do have to do extra work on your case, loading a lot more >>>>> info to memory to be able to sort on this. >>>>> If this happens on a cold start, it may take a while. >>>>> Can you try a different query (so it won't be cached), and see what >>>>> happens over time?
>>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini wrote:
>>>>>>> How many items match the result?
>>>>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>>> Any clue as to why this is so slow? I have still not been able to >>>>>>>> figure this out.
>>>>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias Sebring >>>>>>>> wrote:
>>>>>>>>> I'm seeing some strange results when adding OrderBy operands to my >>>>>>>>> queries on a fairly large dataset (2.3M documents).
>>>>>>>>> In the following queries FilesCount_Range:[**0x00******000001 TO >>>>>>>>> NULL] will limit the dataset to 2600 out of the 2.3M documents in >>>>>>>>> the full dataset.
>>>>>>>>> The below query will return almost immediately: >>>>>>>>> query=FilesCount_Range:[**0x0000******0001 TO NULL] >>>>>>>>> start=0 >>>>>>>>> pageSize=25 >>>>>>>>> aggregation=None >>>>>>>>> fetch=Id >>>>>>>>> fetch=Files
>>>>>>>>> This query on the other hand takes 50 seconds to return: >>>>>>>>> query=FilesCount_Range:[**0x0000******0001 TO NULL] >>>>>>>>> start=0 >>>>>>>>> pageSize=25 >>>>>>>>> aggregation=None >>>>>>>>> fetch=Id >>>>>>>>> fetch=Files >>>>>>>>> sort=Text >>>>>>>>> sort=Year >>>>>>>>> sort=ProjectId >>>>>>>>> sort=CaseId
>>>>>>>>> This is taken from the mvc-profiler: >>>>>>>>> "49308 ms waiting for server in 1 request(s) for 1 >>>>>>>>> sessions(s)Session opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>>>>>> Subsequent calls with the same orderby are cached and return >>>>>>>>> immediately.
>>>>>>>>> The index is set up like so: >>>>>>>>> public class Data_Index : AbstractIndexCreationTask<**Data******, >>>>>>>>> Data_ Index.Result> >>>>>>>>> { >>>>>>>>> public class Result >>>>>>>>> { >>>>>>>>> public string Text { get; set; } >>>>>>>>> public int Year { get; set; } >>>>>>>>> public int? ProjectId { get; set; } >>>>>>>>> public int? CaseId { get; set; } >>>>>>>>> public int FilesCount { get; set; } >>>>>>>>> public string[] Files { get; set; } >>>>>>>>> }
>>>>>>>>> public Data_ Index() >>>>>>>>> { >>>>>>>>> Map = data => from d in data >>>>>>>>> select new >>>>>>>>> { >>>>>>>>> Text = d.Text, >>>>>>>>> Year = d.Year, >>>>>>>>> ProjectId = d.ProjectId, >>>>>>>>> CaseId = d.CaseId, >>>>>>>>> FilesCount = d.Files.Count, >>>>>>>>> Files = d.Files.Select(x => x.Path).ToArray() >>>>>>>>> };
>>>>>>>>> This is on the latest build of RavenDb running as a console >>>>>>>>> application. I've tried to reproduce this issue on a embedded server >>>>>>>>> running in memory but to no avail.
>>>>>>>>> I'm considering denormalizing my data even further by extracting >>>>>>>>> all documents with files to another collection but hesitant because I >>>>>>>>> figure these queries should work without this much overhead.
On Wed, Aug 15, 2012 at 3:05 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
> What are my options for doing so and/or to make sure it stays in memory?
> The database is around 4-5GB and I don't expect it to become much larger
> than that. If I knew of a way to do it I could potentially pre-load the
> entire database into memory but it would still need to be persisted to disk.
> On Wednesday, August 15, 2012 1:59:50 PM UTC+2, Oren Eini wrote:
>> Strange.
>> At any rate, you need to warm up the index, make sure that a lot of it is
>> in memory before you can really make perf comparisons. That is especially
>> true with large indexes.
>> On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>> 25-50 chars on average but in rare cases upwards of 100 chars.
>>> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini wrote:
>>>> Tobias,
>>>> What is the size of the text field?
>>>> We do a lot of complex caching there, and depending on the actual index
>>>> size (can you check), loading through it might take a while.
>>>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>> Is 50 seconds reasonable for a cold start with the size of the data
>>>>> set I'm dealing with? If I remove the OrderBy on the text field the query
>>>>> will sort on Year, ProjectId and CaseId and return within 16 seconds which
>>>>> also seems slow for loading 2601 documents and doing an in-memory
>>>>> sort. A subsequent and modified query (e.g. adding another field to the
>>>>> query) will complete pretty fast. After a delay of inactivity the query
>>>>> will once again take 50 seconds to complete.
>>>>> Considering these factors I'm guessing the sorting is done on the full
>>>>> document collection and then cached in memory? If so is there a way
>>>>> to increase the lifetime of this cache and/or possibly not sort on the full
>>>>> collection but on the documents returned from the query?
>>>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini wrote:
>>>>>> Well, we do have to do extra work on your case, loading a lot more
>>>>>> info to memory to be able to sort on this.
>>>>>> If this happens on a cold start, it may take a while.
>>>>>> Can you try a different query (so it won't be cached), and see what
>>>>>> happens over time?
>>>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini wrote:
>>>>>>>> How many items match the result?
>>>>>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <tseb...@gmail.com
>>>>>>>> > wrote:
>>>>>>>>> Any clue as to why this is so slow? I have still not been able to
>>>>>>>>> figure this out.
>>>>>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias Sebring
>>>>>>>>> wrote:
>>>>>>>>>> I'm seeing some strange results when adding OrderBy operands to
>>>>>>>>>> my queries on a fairly large dataset (2.3M documents).
>>>>>>>>>> In the following queries FilesCount_Range:[**0x00********000001
>>>>>>>>>> TO NULL] will limit the dataset to 2600 out of the 2.3M
>>>>>>>>>> documents in the full dataset.
>>>>>>>>>> The below query will return almost immediately:
>>>>>>>>>> query=FilesCount_Range:[**0x0000********0001 TO NULL]
>>>>>>>>>> start=0
>>>>>>>>>> pageSize=25
>>>>>>>>>> aggregation=None
>>>>>>>>>> fetch=Id
>>>>>>>>>> fetch=Files
>>>>>>>>>> This query on the other hand takes 50 seconds to return:
>>>>>>>>>> query=FilesCount_Range:[**0x0000********0001 TO NULL]
>>>>>>>>>> start=0
>>>>>>>>>> pageSize=25
>>>>>>>>>> aggregation=None
>>>>>>>>>> fetch=Id
>>>>>>>>>> fetch=Files
>>>>>>>>>> sort=Text
>>>>>>>>>> sort=Year
>>>>>>>>>> sort=ProjectId
>>>>>>>>>> sort=CaseId
>>>>>>>>>> This is taken from the mvc-profiler:
>>>>>>>>>> "49308 ms waiting for server in 1 request(s) for 1
>>>>>>>>>> sessions(s)Session opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>>>>>>> Subsequent calls with the same orderby are cached and return
>>>>>>>>>> immediately.
>>>>>>>>>> The index is set up like so:
>>>>>>>>>> public class Data_Index : AbstractIndexCreationTask<**Data*******
>>>>>>>>>> *, Data_ Index.Result>
>>>>>>>>>> {
>>>>>>>>>> public class Result
>>>>>>>>>> {
>>>>>>>>>> public string Text { get; set; }
>>>>>>>>>> public int Year { get; set; }
>>>>>>>>>> public int? ProjectId { get; set; }
>>>>>>>>>> public int? CaseId { get; set; }
>>>>>>>>>> public int FilesCount { get; set; }
>>>>>>>>>> public string[] Files { get; set; }
>>>>>>>>>> }
>>>>>>>>>> public Data_ Index()
>>>>>>>>>> {
>>>>>>>>>> Map = data => from d in data
>>>>>>>>>> select new
>>>>>>>>>> {
>>>>>>>>>> Text = d.Text,
>>>>>>>>>> Year = d.Year,
>>>>>>>>>> ProjectId = d.ProjectId,
>>>>>>>>>> CaseId = d.CaseId,
>>>>>>>>>> FilesCount = d.Files.Count,
>>>>>>>>>> Files = d.Files.Select(x => x.Path).ToArray()
>>>>>>>>>> };
>>>>>>>>>> This is on the latest build of RavenDb running as a console
>>>>>>>>>> application. I've tried to reproduce this issue on a embedded server
>>>>>>>>>> running in memory but to no avail.
>>>>>>>>>> I'm considering denormalizing my data even further by extracting
>>>>>>>>>> all documents with files to another collection but hesitant because I
>>>>>>>>>> figure these queries should work without this much overhead.
On Wednesday, August 15, 2012 2:15:13 PM UTC+2, Oren Eini wrote:
> Tobias, > First, let us check some things. > Can you check the actual size of the index?
> On Wed, Aug 15, 2012 at 3:05 PM, Tobias Sebring <tseb...@gmail.com<javascript:> > > wrote:
>> What are my options for doing so and/or to make sure it stays in memory? >> The database is around 4-5GB and I don't expect it to become much larger >> than that. If I knew of a way to do it I could potentially pre-load the >> entire database into memory but it would still need to be persisted to disk.
>> On Wednesday, August 15, 2012 1:59:50 PM UTC+2, Oren Eini wrote:
>>> Strange. >>> At any rate, you need to warm up the index, make sure that a lot of it >>> is in memory before you can really make perf comparisons. That is >>> especially true with large indexes.
>>> On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>> 25-50 chars on average but in rare cases upwards of 100 chars.
>>>> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini wrote:
>>>>> Tobias, >>>>> What is the size of the text field?
>>>>> We do a lot of complex caching there, and depending on the actual >>>>> index size (can you check), loading through it might take a while.
>>>>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>> Is 50 seconds reasonable for a cold start with the size of the data >>>>>> set I'm dealing with? If I remove the OrderBy on the text field the query >>>>>> will sort on Year, ProjectId and CaseId and return within 16 seconds which >>>>>> also seems slow for loading 2601 documents and doing an in-memory >>>>>> sort. A subsequent and modified query (e.g. adding another field to the >>>>>> query) will complete pretty fast. After a delay of inactivity the query >>>>>> will once again take 50 seconds to complete.
>>>>>> Considering these factors I'm guessing the sorting is done on the >>>>>> full document collection and then cached in memory? If so is there a way >>>>>> to increase the lifetime of this cache and/or possibly not sort on the full >>>>>> collection but on the documents returned from the query?
>>>>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini wrote:
>>>>>>> Well, we do have to do extra work on your case, loading a lot more >>>>>>> info to memory to be able to sort on this. >>>>>>> If this happens on a cold start, it may take a while. >>>>>>> Can you try a different query (so it won't be cached), and see what >>>>>>> happens over time?
>>>>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini wrote:
>>>>>>>>> How many items match the result?
>>>>>>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring < >>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>> Any clue as to why this is so slow? I have still not been able to >>>>>>>>>> figure this out.
>>>>>>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias Sebring >>>>>>>>>> wrote:
>>>>>>>>>>> I'm seeing some strange results when adding OrderBy operands to >>>>>>>>>>> my queries on a fairly large dataset (2.3M documents).
>>>>>>>>>>> In the following queries FilesCount_Range:[**0x00********000001 >>>>>>>>>>> TO NULL] will limit the dataset to 2600 out of the 2.3M >>>>>>>>>>> documents in the full dataset.
>>>>>>>>>>> The below query will return almost immediately: >>>>>>>>>>> query=FilesCount_Range:[**0x0000********0001 TO NULL] >>>>>>>>>>> start=0 >>>>>>>>>>> pageSize=25 >>>>>>>>>>> aggregation=None >>>>>>>>>>> fetch=Id >>>>>>>>>>> fetch=Files
>>>>>>>>>>> This query on the other hand takes 50 seconds to return: >>>>>>>>>>> query=FilesCount_Range:[**0x0000********0001 TO NULL] >>>>>>>>>>> start=0 >>>>>>>>>>> pageSize=25 >>>>>>>>>>> aggregation=None >>>>>>>>>>> fetch=Id >>>>>>>>>>> fetch=Files >>>>>>>>>>> sort=Text >>>>>>>>>>> sort=Year >>>>>>>>>>> sort=ProjectId >>>>>>>>>>> sort=CaseId
>>>>>>>>>>> This is taken from the mvc-profiler: >>>>>>>>>>> "49308 ms waiting for server in 1 request(s) for 1 >>>>>>>>>>> sessions(s)Session opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>>>>>>>> Subsequent calls with the same orderby are cached and return >>>>>>>>>>> immediately.
>>>>>>>>>>> The index is set up like so: >>>>>>>>>>> public class Data_Index : AbstractIndexCreationTask<**Data****** >>>>>>>>>>> **, Data_ Index.Result> >>>>>>>>>>> { >>>>>>>>>>> public class Result >>>>>>>>>>> { >>>>>>>>>>> public string Text { get; set; } >>>>>>>>>>> public int Year { get; set; } >>>>>>>>>>> public int? ProjectId { get; set; } >>>>>>>>>>> public int? CaseId { get; set; } >>>>>>>>>>> public int FilesCount { get; set; } >>>>>>>>>>> public string[] Files { get; set; } >>>>>>>>>>> }
>>>>>>>>>>> public Data_ Index() >>>>>>>>>>> { >>>>>>>>>>> Map = data => from d in data >>>>>>>>>>> select new >>>>>>>>>>> { >>>>>>>>>>> Text = d.Text, >>>>>>>>>>> Year = d.Year, >>>>>>>>>>> ProjectId = d.ProjectId, >>>>>>>>>>> CaseId = d.CaseId, >>>>>>>>>>> FilesCount = d.Files.Count, >>>>>>>>>>> Files = d.Files.Select(x => x.Path).ToArray() >>>>>>>>>>> };
>>>>>>>>>>> This is on the latest build of RavenDb running as a console >>>>>>>>>>> application. I've tried to reproduce this issue on a embedded server >>>>>>>>>>> running in memory but to no avail.
>>>>>>>>>>> I'm considering denormalizing my data even further by extracting >>>>>>>>>>> all documents with files to another collection but hesitant because I >>>>>>>>>>> figure these queries should work without this much overhead.
On Wed, Aug 15, 2012 at 3:45 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
> I will as soon as I can later today. Right now I don't have access to the
> database computer and rebuilding the index takes ~4 hours.
> We are talking about the physical size of the index on disk right?
> On Wednesday, August 15, 2012 2:15:13 PM UTC+2, Oren Eini wrote:
>> Tobias,
>> First, let us check some things.
>> Can you check the actual size of the index?
>> On Wed, Aug 15, 2012 at 3:05 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>> What are my options for doing so and/or to make sure it stays in memory?
>>> The database is around 4-5GB and I don't expect it to become much larger
>>> than that. If I knew of a way to do it I could potentially pre-load the
>>> entire database into memory but it would still need to be persisted to disk.
>>> On Wednesday, August 15, 2012 1:59:50 PM UTC+2, Oren Eini wrote:
>>>> Strange.
>>>> At any rate, you need to warm up the index, make sure that a lot of it
>>>> is in memory before you can really make perf comparisons. That is
>>>> especially true with large indexes.
>>>> On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>> 25-50 chars on average but in rare cases upwards of 100 chars.
>>>>> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini wrote:
>>>>>> Tobias,
>>>>>> What is the size of the text field?
>>>>>> We do a lot of complex caching there, and depending on the actual
>>>>>> index size (can you check), loading through it might take a while.
>>>>>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>> Is 50 seconds reasonable for a cold start with the size of the data
>>>>>>> set I'm dealing with? If I remove the OrderBy on the text field the query
>>>>>>> will sort on Year, ProjectId and CaseId and return within 16 seconds which
>>>>>>> also seems slow for loading 2601 documents and doing an in-memory
>>>>>>> sort. A subsequent and modified query (e.g. adding another field to the
>>>>>>> query) will complete pretty fast. After a delay of inactivity the query
>>>>>>> will once again take 50 seconds to complete.
>>>>>>> Considering these factors I'm guessing the sorting is done on the
>>>>>>> full document collection and then cached in memory? If so is there a way
>>>>>>> to increase the lifetime of this cache and/or possibly not sort on the full
>>>>>>> collection but on the documents returned from the query?
>>>>>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini wrote:
>>>>>>>> Well, we do have to do extra work on your case, loading a lot more
>>>>>>>> info to memory to be able to sort on this.
>>>>>>>> If this happens on a cold start, it may take a while.
>>>>>>>> Can you try a different query (so it won't be cached), and see what
>>>>>>>> happens over time?
>>>>>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini wrote:
>>>>>>>>>> How many items match the result?
>>>>>>>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <
>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>> Any clue as to why this is so slow? I have still not been able
>>>>>>>>>>> to figure this out.
>>>>>>>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias Sebring
>>>>>>>>>>> wrote:
>>>>>>>>>>>> I'm seeing some strange results when adding OrderBy operands to
>>>>>>>>>>>> my queries on a fairly large dataset (2.3M documents).
>>>>>>>>>>>> In the following queries FilesCount_Range:[**0x00**********000001
>>>>>>>>>>>> TO NULL] will limit the dataset to 2600 out of the 2.3M
>>>>>>>>>>>> documents in the full dataset.
>>>>>>>>>>>> The below query will return almost immediately:
>>>>>>>>>>>> query=FilesCount_Range:[**0x0000**********0001 TO NULL]
>>>>>>>>>>>> start=0
>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>> This query on the other hand takes 50 seconds to return:
>>>>>>>>>>>> query=FilesCount_Range:[**0x0000**********0001 TO NULL]
>>>>>>>>>>>> start=0
>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>> sort=Text
>>>>>>>>>>>> sort=Year
>>>>>>>>>>>> sort=ProjectId
>>>>>>>>>>>> sort=CaseId
>>>>>>>>>>>> This is taken from the mvc-profiler:
>>>>>>>>>>>> "49308 ms waiting for server in 1 request(s) for 1
>>>>>>>>>>>> sessions(s)Session opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>>>>>>>>> Subsequent calls with the same orderby are cached and return
>>>>>>>>>>>> immediately.
>>>>>>>>>>>> The index is set up like so:
>>>>>>>>>>>> public class Data_Index : AbstractIndexCreationTask<**Data*****
>>>>>>>>>>>> *****, Data_ Index.Result>
>>>>>>>>>>>> {
>>>>>>>>>>>> public class Result
>>>>>>>>>>>> {
>>>>>>>>>>>> public string Text { get; set; }
>>>>>>>>>>>> public int Year { get; set; }
>>>>>>>>>>>> public int? ProjectId { get; set; }
>>>>>>>>>>>> public int? CaseId { get; set; }
>>>>>>>>>>>> public int FilesCount { get; set; }
>>>>>>>>>>>> public string[] Files { get; set; }
>>>>>>>>>>>> }
>>>>>>>>>>>> public Data_ Index()
>>>>>>>>>>>> {
>>>>>>>>>>>> Map = data => from d in data
>>>>>>>>>>>> select new
>>>>>>>>>>>> {
>>>>>>>>>>>> Text = d.Text,
>>>>>>>>>>>> Year = d.Year,
>>>>>>>>>>>> ProjectId = d.ProjectId,
>>>>>>>>>>>> CaseId = d.CaseId,
>>>>>>>>>>>> FilesCount = d.Files.Count,
>>>>>>>>>>>> Files = d.Files.Select(x => x.Path).ToArray()
>>>>>>>>>>>> };
>>>>>>>>>>>> This is on the latest build of RavenDb running as a console
>>>>>>>>>>>> application. I've tried to reproduce this issue on a embedded server
>>>>>>>>>>>> running in memory but to no avail.
>>>>>>>>>>>> I'm considering denormalizing my data even further by
>>>>>>>>>>>> extracting all documents with files to another collection but hesitant
>>>>>>>>>>>> because I figure these queries should work without this much overhead.
On Wednesday, August 15, 2012 2:49:08 PM UTC+2, Oren Eini wrote:
> YEד
> On Wed, Aug 15, 2012 at 3:45 PM, Tobias Sebring <tseb...@gmail.com<javascript:>
> > wrote:
>> I will as soon as I can later today. Right now I don't have access to the >> database computer and rebuilding the index takes ~4 hours.
>> We are talking about the physical size of the index on disk right?
>> On Wednesday, August 15, 2012 2:15:13 PM UTC+2, Oren Eini wrote:
>>> Tobias,
>>> First, let us check some things.
>>> Can you check the actual size of the index?
>>> On Wed, Aug 15, 2012 at 3:05 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>> What are my options for doing so and/or to make sure it stays in >>>> memory? The database is around 4-5GB and I don't expect it to become much >>>> larger than that. If I knew of a way to do it I could potentially pre-load >>>> the entire database into memory but it would still need to be persisted to >>>> disk.
>>>> On Wednesday, August 15, 2012 1:59:50 PM UTC+2, Oren Eini wrote:
>>>>> Strange.
>>>>> At any rate, you need to warm up the index, make sure that a lot of it >>>>> is in memory before you can really make perf comparisons. That is >>>>> especially true with large indexes.
>>>>> On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>> 25-50 chars on average but in rare cases upwards of 100 chars.
>>>>>> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini wrote:
>>>>>>> Tobias,
>>>>>>> What is the size of the text field?
>>>>>>> We do a lot of complex caching there, and depending on the actual >>>>>>> index size (can you check), loading through it might take a while.
>>>>>>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>>> Is 50 seconds reasonable for a cold start with the size of the data >>>>>>>> set I'm dealing with? If I remove the OrderBy on the text field the query >>>>>>>> will sort on Year, ProjectId and CaseId and return within 16 seconds which >>>>>>>> also seems slow for loading 2601 documents and doing an in-memory >>>>>>>> sort. A subsequent and modified query (e.g. adding another field to the >>>>>>>> query) will complete pretty fast. After a delay of inactivity the query >>>>>>>> will once again take 50 seconds to complete.
>>>>>>>> Considering these factors I'm guessing the sorting is done on the >>>>>>>> full document collection and then cached in memory? If so is there a way >>>>>>>> to increase the lifetime of this cache and/or possibly not sort on the full >>>>>>>> collection but on the documents returned from the query?
>>>>>>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini wrote:
>>>>>>>>> Well, we do have to do extra work on your case, loading a lot more >>>>>>>>> info to memory to be able to sort on this.
>>>>>>>>> If this happens on a cold start, it may take a while.
>>>>>>>>> Can you try a different query (so it won't be cached), and see >>>>>>>>> what happens over time?
>>>>>>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <tseb...@gmail.com
>>>>>>>>> > wrote:
>>>>>>>>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini wrote:
>>>>>>>>>>> How many items match the result?
>>>>>>>>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <
>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>> Any clue as to why this is so slow? I have still not been able >>>>>>>>>>>> to figure this out.
>>>>>>>>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias Sebring >>>>>>>>>>>> wrote:
>>>>>>>>>>>>> I'm seeing some strange results when adding OrderBy operands >>>>>>>>>>>>> to my queries on a fairly large dataset (2.3M documents).
>>>>>>>>>>>>> In the following queries FilesCount_Range:[**0x00**********000001 >>>>>>>>>>>>> TO NULL] will limit the dataset to 2600 out of the 2.3M >>>>>>>>>>>>> documents in the full dataset.
>>>>>>>>>>>>> The below query will return almost immediately:
>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000**********0001 TO NULL]
>>>>>>>>>>>>> start=0
>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>> This query on the other hand takes 50 seconds to return:
>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000**********0001 TO NULL]
>>>>>>>>>>>>> start=0
>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>> sort=Text
>>>>>>>>>>>>> sort=Year
>>>>>>>>>>>>> sort=ProjectId
>>>>>>>>>>>>> sort=CaseId
>>>>>>>>>>>>> This is taken from the mvc-profiler:
>>>>>>>>>>>>> "49308 ms waiting for server in 1 request(s) for 1 >>>>>>>>>>>>> sessions(s)Session opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>>>>>>>>>> Subsequent calls with the same orderby are cached and return >>>>>>>>>>>>> immediately.
>>>>>>>>>>>>> The index is set up like so:
>>>>>>>>>>>>> public class Data_Index : AbstractIndexCreationTask<**Data****
>>>>>>>>>>>>> ******, Data_ Index.Result>
>>>>>>>>>>>>> {
>>>>>>>>>>>>> public class Result
>>>>>>>>>>>>> {
>>>>>>>>>>>>> public string Text { get; set; }
>>>>>>>>>>>>> public int Year { get; set; }
>>>>>>>>>>>>> public int? ProjectId { get; set; }
>>>>>>>>>>>>> public int? CaseId { get; set; }
>>>>>>>>>>>>> public int FilesCount { get; set; }
>>>>>>>>>>>>> public string[] Files { get; set; }
>>>>>>>>>>>>> }
>>>>>>>>>>>>> public Data_ Index()
>>>>>>>>>>>>> {
>>>>>>>>>>>>> Map = data => from d in data
>>>>>>>>>>>>> select new
>>>>>>>>>>>>> {
>>>>>>>>>>>>> Text = d.Text,
>>>>>>>>>>>>> Year = d.Year,
>>>>>>>>>>>>> ProjectId = d.ProjectId,
>>>>>>>>>>>>> CaseId = d.CaseId,
>>>>>>>>>>>>> FilesCount = d.Files.Count,
>>>>>>>>>>>>> Files = d.Files.Select(x => x.Path).ToArray()
>>>>>>>>>>>>> };
>>>>>>>>>>>>> This is on the latest build of RavenDb running as a console >>>>>>>>>>>>> application. I've tried to reproduce this issue on a embedded server >>>>>>>>>>>>> running in memory but to no avail.
>>>>>>>>>>>>> I'm considering denormalizing my data even further by >>>>>>>>>>>>> extracting all documents with files to another collection but hesitant >>>>>>>>>>>>> because I figure these queries should work without this much overhead.
Hm,
Big enough that we will take several queries to load significant portions
of it to memory.
In short, as long as the site is serving queries, you'll see good perf.
If you have long period of inactivity, we will drop all resources.
> On Wednesday, August 15, 2012 2:49:08 PM UTC+2, Oren Eini wrote:
>> YEד
>> On Wed, Aug 15, 2012 at 3:45 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>> I will as soon as I can later today. Right now I don't have access to
>>> the database computer and rebuilding the index takes ~4 hours.
>>> We are talking about the physical size of the index on disk right?
>>> On Wednesday, August 15, 2012 2:15:13 PM UTC+2, Oren Eini wrote:
>>>> Tobias,
>>>> First, let us check some things.
>>>> Can you check the actual size of the index?
>>>> On Wed, Aug 15, 2012 at 3:05 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>> What are my options for doing so and/or to make sure it stays in
>>>>> memory? The database is around 4-5GB and I don't expect it to become much
>>>>> larger than that. If I knew of a way to do it I could potentially pre-load
>>>>> the entire database into memory but it would still need to be persisted to
>>>>> disk.
>>>>> On Wednesday, August 15, 2012 1:59:50 PM UTC+2, Oren Eini wrote:
>>>>>> Strange.
>>>>>> At any rate, you need to warm up the index, make sure that a lot of
>>>>>> it is in memory before you can really make perf comparisons. That is
>>>>>> especially true with large indexes.
>>>>>> On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>> 25-50 chars on average but in rare cases upwards of 100 chars.
>>>>>>> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini wrote:
>>>>>>>> Tobias,
>>>>>>>> What is the size of the text field?
>>>>>>>> We do a lot of complex caching there, and depending on the actual
>>>>>>>> index size (can you check), loading through it might take a while.
>>>>>>>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>>>> Is 50 seconds reasonable for a cold start with the size of the
>>>>>>>>> data set I'm dealing with? If I remove the OrderBy on the text field the
>>>>>>>>> query will sort on Year, ProjectId and CaseId and return within 16 seconds
>>>>>>>>> which also seems slow for loading 2601 documents and doing an
>>>>>>>>> in-memory sort. A subsequent and modified query (e.g. adding another field
>>>>>>>>> to the query) will complete pretty fast. After a delay of inactivity the
>>>>>>>>> query will once again take 50 seconds to complete.
>>>>>>>>> Considering these factors I'm guessing the sorting is done on the
>>>>>>>>> full document collection and then cached in memory? If so is there a way
>>>>>>>>> to increase the lifetime of this cache and/or possibly not sort on the full
>>>>>>>>> collection but on the documents returned from the query?
>>>>>>>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini wrote:
>>>>>>>>>> Well, we do have to do extra work on your case, loading a lot
>>>>>>>>>> more info to memory to be able to sort on this.
>>>>>>>>>> If this happens on a cold start, it may take a while.
>>>>>>>>>> Can you try a different query (so it won't be cached), and see
>>>>>>>>>> what happens over time?
>>>>>>>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <
>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini
>>>>>>>>>>> wrote:
>>>>>>>>>>>> How many items match the result?
>>>>>>>>>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <
>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>> Any clue as to why this is so slow? I have still not been able
>>>>>>>>>>>>> to figure this out.
>>>>>>>>>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias Sebring
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> I'm seeing some strange results when adding OrderBy operands
>>>>>>>>>>>>>> to my queries on a fairly large dataset (2.3M documents).
>>>>>>>>>>>>>> In the following queries FilesCount_Range:[**0x00************000001
>>>>>>>>>>>>>> TO NULL] will limit the dataset to 2600 out of the 2.3M
>>>>>>>>>>>>>> documents in the full dataset.
>>>>>>>>>>>>>> The below query will return almost immediately:
>>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000************0001 TO NULL]
>>>>>>>>>>>>>> start=0
>>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>>> This query on the other hand takes 50 seconds to return:
>>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000************0001 TO NULL]
>>>>>>>>>>>>>> start=0
>>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>>> sort=Text
>>>>>>>>>>>>>> sort=Year
>>>>>>>>>>>>>> sort=ProjectId
>>>>>>>>>>>>>> sort=CaseId
>>>>>>>>>>>>>> This is taken from the mvc-profiler:
>>>>>>>>>>>>>> "49308 ms waiting for server in 1 request(s) for 1
>>>>>>>>>>>>>> sessions(s)Session opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>>>>>>>>>>> Subsequent calls with the same orderby are cached and return
>>>>>>>>>>>>>> immediately.
>>>>>>>>>>>>>> The index is set up like so:
>>>>>>>>>>>>>> public class Data_Index : AbstractIndexCreationTask<**Data***
>>>>>>>>>>>>>> *********, Data_ Index.Result>
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> public class Result
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> public string Text { get; set; }
>>>>>>>>>>>>>> public int Year { get; set; }
>>>>>>>>>>>>>> public int? ProjectId { get; set; }
>>>>>>>>>>>>>> public int? CaseId { get; set; }
>>>>>>>>>>>>>> public int FilesCount { get; set; }
>>>>>>>>>>>>>> public string[] Files { get; set; }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> public Data_ Index()
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> Map = data => from d in data
>>>>>>>>>>>>>> select new
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> Text = d.Text,
>>>>>>>>>>>>>> Year = d.Year,
>>>>>>>>>>>>>> ProjectId = d.ProjectId,
>>>>>>>>>>>>>> CaseId = d.CaseId,
>>>>>>>>>>>>>> FilesCount = d.Files.Count,
>>>>>>>>>>>>>> Files = d.Files.Select(x => x.Path).ToArray()
>>>>>>>>>>>>>> };
>>>>>>>>>>>>>> This is on the latest build of RavenDb running as a console
>>>>>>>>>>>>>> application. I've tried to reproduce this issue on a embedded server
>>>>>>>>>>>>>> running in memory but to no avail.
>>>>>>>>>>>>>> I'm considering denormalizing my data even further by
>>>>>>>>>>>>>> extracting all documents with files to another collection but hesitant
>>>>>>>>>>>>>> because I figure these queries should work without this much overhead.
Yes, the site will be inactive for most hours out of the day and will only see sporadic use that will not be concentrated in any specific period of time. I expect the resources will have been dropped almost every time a user accesses the site.
On Wednesday, August 15, 2012 3:55:09 PM UTC+2, Oren Eini wrote:
> Hm,
> Big enough that we will take several queries to load significant portions > of it to memory.
> In short, as long as the site is serving queries, you'll see good perf.
> If you have long period of inactivity, we will drop all resources.
> Do you expect long periods of inactivity?
> On Wed, Aug 15, 2012 at 4:53 PM, Tobias Sebring <tseb...@gmail.com<javascript:>
> > wrote:
>> On Wednesday, August 15, 2012 2:49:08 PM UTC+2, Oren Eini wrote:
>>> YEד
>>> On Wed, Aug 15, 2012 at 3:45 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>> I will as soon as I can later today. Right now I don't have access to >>>> the database computer and rebuilding the index takes ~4 hours.
>>>> We are talking about the physical size of the index on disk right?
>>>> On Wednesday, August 15, 2012 2:15:13 PM UTC+2, Oren Eini wrote:
>>>>> Tobias,
>>>>> First, let us check some things.
>>>>> Can you check the actual size of the index?
>>>>> On Wed, Aug 15, 2012 at 3:05 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>> What are my options for doing so and/or to make sure it stays in >>>>>> memory? The database is around 4-5GB and I don't expect it to become much >>>>>> larger than that. If I knew of a way to do it I could potentially pre-load >>>>>> the entire database into memory but it would still need to be persisted to >>>>>> disk.
>>>>>> On Wednesday, August 15, 2012 1:59:50 PM UTC+2, Oren Eini wrote:
>>>>>>> Strange.
>>>>>>> At any rate, you need to warm up the index, make sure that a lot of >>>>>>> it is in memory before you can really make perf comparisons. That is >>>>>>> especially true with large indexes.
>>>>>>> On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>>> 25-50 chars on average but in rare cases upwards of 100 chars.
>>>>>>>> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini wrote:
>>>>>>>>> Tobias,
>>>>>>>>> What is the size of the text field?
>>>>>>>>> We do a lot of complex caching there, and depending on the actual >>>>>>>>> index size (can you check), loading through it might take a while.
>>>>>>>>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <tseb...@gmail.com
>>>>>>>>> > wrote:
>>>>>>>>>> Is 50 seconds reasonable for a cold start with the size of the >>>>>>>>>> data set I'm dealing with? If I remove the OrderBy on the text field the >>>>>>>>>> query will sort on Year, ProjectId and CaseId and return within 16 seconds >>>>>>>>>> which also seems slow for loading 2601 documents and doing an >>>>>>>>>> in-memory sort. A subsequent and modified query (e.g. adding another field >>>>>>>>>> to the query) will complete pretty fast. After a delay of inactivity the >>>>>>>>>> query will once again take 50 seconds to complete.
>>>>>>>>>> Considering these factors I'm guessing the sorting is done on the >>>>>>>>>> full document collection and then cached in memory? If so is there a way >>>>>>>>>> to increase the lifetime of this cache and/or possibly not sort on the full >>>>>>>>>> collection but on the documents returned from the query?
>>>>>>>>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini wrote:
>>>>>>>>>>> Well, we do have to do extra work on your case, loading a lot >>>>>>>>>>> more info to memory to be able to sort on this.
>>>>>>>>>>> If this happens on a cold start, it may take a while.
>>>>>>>>>>> Can you try a different query (so it won't be cached), and see >>>>>>>>>>> what happens over time?
>>>>>>>>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <
>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini >>>>>>>>>>>> wrote:
>>>>>>>>>>>>> How many items match the result?
>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <
>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>> Any clue as to why this is so slow? I have still not been >>>>>>>>>>>>>> able to figure this out.
>>>>>>>>>>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias >>>>>>>>>>>>>> Sebring wrote:
>>>>>>>>>>>>>>> I'm seeing some strange results when adding OrderBy operands >>>>>>>>>>>>>>> to my queries on a fairly large dataset (2.3M documents).
>>>>>>>>>>>>>>> In the following queries FilesCount_Range:[**0x00***********
>>>>>>>>>>>>>>> *000001 TO NULL] will limit the dataset to 2600 out of the >>>>>>>>>>>>>>> 2.3M documents in the full dataset.
>>>>>>>>>>>>>>> The below query will return almost immediately:
>>>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000************0001 TO NULL]
>>>>>>>>>>>>>>> start=0
>>>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>>>> This query on the other hand takes 50 seconds to return:
>>>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000************0001 TO NULL]
>>>>>>>>>>>>>>> start=0
>>>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>>>> sort=Text
>>>>>>>>>>>>>>> sort=Year
>>>>>>>>>>>>>>> sort=ProjectId
>>>>>>>>>>>>>>> sort=CaseId
>>>>>>>>>>>>>>> This is taken from the mvc-profiler:
>>>>>>>>>>>>>>> "49308 ms waiting for server in 1 request(s) for 1 >>>>>>>>>>>>>>> sessions(s)Session opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>>>>>>>>>>>> Subsequent calls with the same orderby are cached and return >>>>>>>>>>>>>>> immediately.
>>>>>>>>>>>>>>> The index is set up like so:
>>>>>>>>>>>>>>> public class Data_Index : AbstractIndexCreationTask<**Data**
>>>>>>>>>>>>>>> **********, Data_ Index.Result>
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> public class Result
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> public string Text { get; set; }
>>>>>>>>>>>>>>> public int Year { get; set; }
>>>>>>>>>>>>>>> public int? ProjectId { get; set; }
>>>>>>>>>>>>>>> public int? CaseId { get; set; }
>>>>>>>>>>>>>>> public int FilesCount { get; set; }
>>>>>>>>>>>>>>> public string[] Files { get; set; }
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> public Data_ Index()
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> Map = data => from d in data
>>>>>>>>>>>>>>> select new
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> Text = d.Text,
>>>>>>>>>>>>>>> Year = d.Year,
>>>>>>>>>>>>>>> ProjectId = d.ProjectId,
>>>>>>>>>>>>>>> CaseId = d.CaseId,
>>>>>>>>>>>>>>> FilesCount = d.Files.Count,
>>>>>>>>>>>>>>> Files = d.Files.Select(x => x.Path).ToArray()
>>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>> This is on the latest build of RavenDb running as a console >>>>>>>>>>>>>>> application. I've tried to reproduce this issue on a embedded server >>>>>>>>>>>>>>> running in memory but to no avail.
>>>>>>>>>>>>>>> I'm considering denormalizing my data even further by >>>>>>>>>>>>>>> extracting all documents with files to another collection but hesitant >>>>>>>>>>>>>>> because I figure these queries should work without this much overhead.
On Wed, Aug 15, 2012 at 5:01 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
> Yes, the site will be inactive for most hours out of the day and will only
> see sporadic use that will not be concentrated in any specific period of
> time. I expect the resources will have been dropped almost every time a
> user accesses the site.
> On Wednesday, August 15, 2012 3:55:09 PM UTC+2, Oren Eini wrote:
>> Hm,
>> Big enough that we will take several queries to load significant portions
>> of it to memory.
>> In short, as long as the site is serving queries, you'll see good perf.
>> If you have long period of inactivity, we will drop all resources.
>> Do you expect long periods of inactivity?
>> On Wed, Aug 15, 2012 at 4:53 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>> On Wednesday, August 15, 2012 2:49:08 PM UTC+2, Oren Eini wrote:
>>>> YEד
>>>> On Wed, Aug 15, 2012 at 3:45 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>> I will as soon as I can later today. Right now I don't have access to
>>>>> the database computer and rebuilding the index takes ~4 hours.
>>>>> We are talking about the physical size of the index on disk right?
>>>>> On Wednesday, August 15, 2012 2:15:13 PM UTC+2, Oren Eini wrote:
>>>>>> Tobias,
>>>>>> First, let us check some things.
>>>>>> Can you check the actual size of the index?
>>>>>> On Wed, Aug 15, 2012 at 3:05 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>> What are my options for doing so and/or to make sure it stays in
>>>>>>> memory? The database is around 4-5GB and I don't expect it to become much
>>>>>>> larger than that. If I knew of a way to do it I could potentially pre-load
>>>>>>> the entire database into memory but it would still need to be persisted to
>>>>>>> disk.
>>>>>>> On Wednesday, August 15, 2012 1:59:50 PM UTC+2, Oren Eini wrote:
>>>>>>>> Strange.
>>>>>>>> At any rate, you need to warm up the index, make sure that a lot of
>>>>>>>> it is in memory before you can really make perf comparisons. That is
>>>>>>>> especially true with large indexes.
>>>>>>>> On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <tseb...@gmail.com
>>>>>>>> > wrote:
>>>>>>>>> 25-50 chars on average but in rare cases upwards of 100 chars.
>>>>>>>>> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini wrote:
>>>>>>>>>> Tobias,
>>>>>>>>>> What is the size of the text field?
>>>>>>>>>> We do a lot of complex caching there, and depending on the actual
>>>>>>>>>> index size (can you check), loading through it might take a while.
>>>>>>>>>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <
>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>> Is 50 seconds reasonable for a cold start with the size of the
>>>>>>>>>>> data set I'm dealing with? If I remove the OrderBy on the text field the
>>>>>>>>>>> query will sort on Year, ProjectId and CaseId and return within 16 seconds
>>>>>>>>>>> which also seems slow for loading 2601 documents and doing an
>>>>>>>>>>> in-memory sort. A subsequent and modified query (e.g. adding another field
>>>>>>>>>>> to the query) will complete pretty fast. After a delay of inactivity the
>>>>>>>>>>> query will once again take 50 seconds to complete.
>>>>>>>>>>> Considering these factors I'm guessing the sorting is done on
>>>>>>>>>>> the full document collection and then cached in memory? If so is there a
>>>>>>>>>>> way to increase the lifetime of this cache and/or possibly not sort on the
>>>>>>>>>>> full collection but on the documents returned from the query?
>>>>>>>>>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini wrote:
>>>>>>>>>>>> Well, we do have to do extra work on your case, loading a lot
>>>>>>>>>>>> more info to memory to be able to sort on this.
>>>>>>>>>>>> If this happens on a cold start, it may take a while.
>>>>>>>>>>>> Can you try a different query (so it won't be cached), and see
>>>>>>>>>>>> what happens over time?
>>>>>>>>>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <
>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> How many items match the result?
>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <
>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>> Any clue as to why this is so slow? I have still not been
>>>>>>>>>>>>>>> able to figure this out.
>>>>>>>>>>>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias
>>>>>>>>>>>>>>> Sebring wrote:
>>>>>>>>>>>>>>>> I'm seeing some strange results when adding OrderBy
>>>>>>>>>>>>>>>> operands to my queries on a fairly large dataset (2.3M documents).
>>>>>>>>>>>>>>>> In the following queries FilesCount_Range:[**0x00**********
>>>>>>>>>>>>>>>> ****000001 TO NULL] will limit the dataset to 2600 out of
>>>>>>>>>>>>>>>> the 2.3M documents in the full dataset.
>>>>>>>>>>>>>>>> The below query will return almost immediately:
>>>>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000**************0001 TO NULL]
>>>>>>>>>>>>>>>> start=0
>>>>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>>>>> This query on the other hand takes 50 seconds to return:
>>>>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000**************0001 TO NULL]
>>>>>>>>>>>>>>>> start=0
>>>>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>>>>> sort=Text
>>>>>>>>>>>>>>>> sort=Year
>>>>>>>>>>>>>>>> sort=ProjectId
>>>>>>>>>>>>>>>> sort=CaseId
>>>>>>>>>>>>>>>> This is taken from the mvc-profiler:
>>>>>>>>>>>>>>>> "49308 ms waiting for server in 1 request(s) for 1
>>>>>>>>>>>>>>>> sessions(s)Session opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>>>>>>>>>>>>> Subsequent calls with the same orderby are cached and
>>>>>>>>>>>>>>>> return immediately.
>>>>>>>>>>>>>>>> The index is set up like so:
>>>>>>>>>>>>>>>> public class Data_Index : AbstractIndexCreationTask<**Data*
>>>>>>>>>>>>>>>> *************, Data_ Index.Result>
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>> public class Result
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>> public string Text { get; set; }
>>>>>>>>>>>>>>>> public int Year { get; set; }
>>>>>>>>>>>>>>>> public int? ProjectId { get; set; }
>>>>>>>>>>>>>>>> public int? CaseId { get; set; }
>>>>>>>>>>>>>>>> public int FilesCount { get; set; }
>>>>>>>>>>>>>>>> public string[] Files { get; set; }
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> public Data_ Index()
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>> Map = data => from d in data
>>>>>>>>>>>>>>>> select new
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>> Text = d.Text,
>>>>>>>>>>>>>>>> Year = d.Year,
>>>>>>>>>>>>>>>> ProjectId = d.ProjectId,
>>>>>>>>>>>>>>>> CaseId = d.CaseId,
>>>>>>>>>>>>>>>> FilesCount = d.Files.Count,
>>>>>>>>>>>>>>>> Files = d.Files.Select(x => x.Path).ToArray()
>>>>>>>>>>>>>>>> };
>>>>>>>>>>>>>>>> This is on the latest build of RavenDb running as a console
>>>>>>>>>>>>>>>> application. I've tried to reproduce this issue on a embedded server
>>>>>>>>>>>>>>>> running in memory but to no avail.
>>>>>>>>>>>>>>>> I'm considering denormalizing my data even further by
>>>>>>>>>>>>>>>> extracting all documents with files to another collection but hesitant
>>>>>>>>>>>>>>>> because I figure these queries should work without this much overhead.
I'm currently running it as a console application (i.e. haven't installed it as a service but not running embedded). The site is running in IIS. Going to try that setting to see if it makes any difference.
Another thing I've noticed over the past day is that if I leave indexing on while doing the batch import (takes aprox. 40 min) around 2,200,000 / 2,300,000 documents will have already been processed in the index when it's done. If I repeat the same batch import with indexing turned off, turn it back on, and wait for indexing to complete - the same index will take 3-4 hours to build with a very noticeable decrease in performance as the index gets bigger.
Another issue with the indexing seems to be that after you stop actively adding documents to the index the indexing will slow down and those last 100,000 documents may take 15-20 minutes to complete and more often than not the index will remain stale even though all documents have been processed (to the point that it does not return any results when querying). I've noticed that if I shutdown the ravendb server when this happens and then start it back up the index will no longer be stale.
On Thursday, August 16, 2012 6:02:33 PM UTC+2, Oren Eini wrote:
> How are you running this?
> In a service? In IIS?
> I assume that you are running in a child db mode, you need to set:
> Raven/Tenants/MaxIdleTimeForTenantDatabase
> To a high value.
> If you are running in IIS, you need to set IIS idle settings to zero and > disable recycling.
> On Wed, Aug 15, 2012 at 5:01 PM, Tobias Sebring <tseb...@gmail.com<javascript:>
> > wrote:
>> Yes, the site will be inactive for most hours out of the day and will >> only see sporadic use that will not be concentrated in any specific period >> of time. I expect the resources will have been dropped almost every time a >> user accesses the site.
>> On Wednesday, August 15, 2012 3:55:09 PM UTC+2, Oren Eini wrote:
>>> Hm,
>>> Big enough that we will take several queries to load significant >>> portions of it to memory.
>>> In short, as long as the site is serving queries, you'll see good perf.
>>> If you have long period of inactivity, we will drop all resources.
>>> Do you expect long periods of inactivity?
>>> On Wed, Aug 15, 2012 at 4:53 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>> The index is 759mb.
>>>> Data file is 3,63 GB.
>>>> On Wednesday, August 15, 2012 2:49:08 PM UTC+2, Oren Eini wrote:
>>>>> YEד
>>>>> On Wed, Aug 15, 2012 at 3:45 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>> I will as soon as I can later today. Right now I don't have access to >>>>>> the database computer and rebuilding the index takes ~4 hours.
>>>>>> We are talking about the physical size of the index on disk right?
>>>>>> On Wednesday, August 15, 2012 2:15:13 PM UTC+2, Oren Eini wrote:
>>>>>>> Tobias,
>>>>>>> First, let us check some things.
>>>>>>> Can you check the actual size of the index?
>>>>>>> On Wed, Aug 15, 2012 at 3:05 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>>> What are my options for doing so and/or to make sure it stays in >>>>>>>> memory? The database is around 4-5GB and I don't expect it to become much >>>>>>>> larger than that. If I knew of a way to do it I could potentially pre-load >>>>>>>> the entire database into memory but it would still need to be persisted to >>>>>>>> disk.
>>>>>>>> On Wednesday, August 15, 2012 1:59:50 PM UTC+2, Oren Eini wrote:
>>>>>>>>> Strange.
>>>>>>>>> At any rate, you need to warm up the index, make sure that a lot >>>>>>>>> of it is in memory before you can really make perf comparisons. That is >>>>>>>>> especially true with large indexes.
>>>>>>>>> On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <
>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>> 25-50 chars on average but in rare cases upwards of 100 chars.
>>>>>>>>>> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini wrote:
>>>>>>>>>>> Tobias,
>>>>>>>>>>> What is the size of the text field?
>>>>>>>>>>> We do a lot of complex caching there, and depending on the >>>>>>>>>>> actual index size (can you check), loading through it might take a while.
>>>>>>>>>>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <
>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>> Is 50 seconds reasonable for a cold start with the size of the >>>>>>>>>>>> data set I'm dealing with? If I remove the OrderBy on the text field the >>>>>>>>>>>> query will sort on Year, ProjectId and CaseId and return within 16 seconds >>>>>>>>>>>> which also seems slow for loading 2601 documents and doing an >>>>>>>>>>>> in-memory sort. A subsequent and modified query (e.g. adding another field >>>>>>>>>>>> to the query) will complete pretty fast. After a delay of inactivity the >>>>>>>>>>>> query will once again take 50 seconds to complete.
>>>>>>>>>>>> Considering these factors I'm guessing the sorting is done on >>>>>>>>>>>> the full document collection and then cached in memory? If so is there a >>>>>>>>>>>> way to increase the lifetime of this cache and/or possibly not sort on the >>>>>>>>>>>> full collection but on the documents returned from the query?
>>>>>>>>>>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini >>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Well, we do have to do extra work on your case, loading a lot >>>>>>>>>>>>> more info to memory to be able to sort on this.
>>>>>>>>>>>>> If this happens on a cold start, it may take a while.
>>>>>>>>>>>>> Can you try a different query (so it won't be cached), and see >>>>>>>>>>>>> what happens over time?
>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <
>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini >>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> How many items match the result?
>>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <
>>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>>> Any clue as to why this is so slow? I have still not been >>>>>>>>>>>>>>>> able to figure this out.
>>>>>>>>>>>>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias >>>>>>>>>>>>>>>> Sebring wrote:
>>>>>>>>>>>>>>>>> I'm seeing some strange results when adding OrderBy >>>>>>>>>>>>>>>>> operands to my queries on a fairly large dataset (2.3M documents).
>>>>>>>>>>>>>>>>> In the following queries FilesCount_Range:[**0x00*********
>>>>>>>>>>>>>>>>> *****000001 TO NULL] will limit the dataset to 2600 out >>>>>>>>>>>>>>>>> of the 2.3M documents in the full dataset.
>>>>>>>>>>>>>>>>> The below query will return almost immediately:
>>>>>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000**************0001 TO >>>>>>>>>>>>>>>>> NULL]
>>>>>>>>>>>>>>>>> start=0
>>>>>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>>>>>> This query on the other hand takes 50 seconds to return:
>>>>>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000**************0001 TO >>>>>>>>>>>>>>>>> NULL]
>>>>>>>>>>>>>>>>> start=0
>>>>>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>>>>>> sort=Text
>>>>>>>>>>>>>>>>> sort=Year
>>>>>>>>>>>>>>>>> sort=ProjectId
>>>>>>>>>>>>>>>>> sort=CaseId
>>>>>>>>>>>>>>>>> This is taken from the mvc-profiler:
>>>>>>>>>>>>>>>>> "49308 ms waiting for server in 1 request(s) for 1 >>>>>>>>>>>>>>>>> sessions(s)Session opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>>>>>>>>>>>>>> Subsequent calls with the same orderby are cached and >>>>>>>>>>>>>>>>> return immediately.
>>>>>>>>>>>>>>>>> The index is set up like so:
>>>>>>>>>>>>>>>>> public class Data_Index : AbstractIndexCreationTask<**Data
>>>>>>>>>>>>>>>>> **************, Data_ Index.Result>
>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>> public class Result
>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>> public string Text { get; set; }
>>>>>>>>>>>>>>>>> public int Year { get; set; }
>>>>>>>>>>>>>>>>> public int? ProjectId { get; set; }
>>>>>>>>>>>>>>>>> public int? CaseId { get; set; }
>>>>>>>>>>>>>>>>> public int FilesCount { get; set; }
>>>>>>>>>>>>>>>>> public string[] Files { get; set; }
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>> public Data_ Index()
>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>> Map = data => from d in data
>>>>>>>>>>>>>>>>> select new
>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>> Text = d.Text,
>>>>>>>>>>>>>>>>> Year = d.Year,
>>>>>>>>>>>>>>>>> ProjectId = d.ProjectId,
>>>>>>>>>>>>>>>>> CaseId = d.CaseId,
>>>>>>>>>>>>>>>>> FilesCount = d.Files.Count,
>>>>>>>>>>>>>>>>> Files = d.Files.Select(x => x.Path).ToArray()
>>>>>>>>>>>>>>>>> };
On Thu, Aug 16, 2012 at 7:24 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
> I'm currently running it as a console application (i.e. haven't installed
> it as a service but not running embedded). The site is running in IIS.
> Going to try that setting to see if it makes any difference.
> Another thing I've noticed over the past day is that if I leave indexing
> on while doing the batch import (takes aprox. 40 min) around 2,200,000 /
> 2,300,000 documents will have already been processed in the index when it's
> done. If I repeat the same batch import with indexing turned off, turn it
> back on, and wait for indexing to complete - the same index will take 3-4
> hours to build with a very noticeable decrease in performance as the index
> gets bigger.
> Another issue with the indexing seems to be that after you stop actively
> adding documents to the index the indexing will slow down and those last
> 100,000 documents may take 15-20 minutes to complete and more often than
> not the index will remain stale even though all documents have been
> processed (to the point that it does not return any results when querying).
> I've noticed that if I shutdown the ravendb server when this happens and
> then start it back up the index will no longer be stale.
> On Thursday, August 16, 2012 6:02:33 PM UTC+2, Oren Eini wrote:
>> How are you running this?
>> In a service? In IIS?
>> I assume that you are running in a child db mode, you need to set:
>> Raven/Tenants/**MaxIdleTimeForTenantDatabase
>> To a high value.
>> If you are running in IIS, you need to set IIS idle settings to zero and
>> disable recycling.
>> On Wed, Aug 15, 2012 at 5:01 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>> Yes, the site will be inactive for most hours out of the day and will
>>> only see sporadic use that will not be concentrated in any specific period
>>> of time. I expect the resources will have been dropped almost every time a
>>> user accesses the site.
>>> On Wednesday, August 15, 2012 3:55:09 PM UTC+2, Oren Eini wrote:
>>>> Hm,
>>>> Big enough that we will take several queries to load significant
>>>> portions of it to memory.
>>>> In short, as long as the site is serving queries, you'll see good perf.
>>>> If you have long period of inactivity, we will drop all resources.
>>>> Do you expect long periods of inactivity?
>>>> On Wed, Aug 15, 2012 at 4:53 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>> The index is 759mb.
>>>>> Data file is 3,63 GB.
>>>>> On Wednesday, August 15, 2012 2:49:08 PM UTC+2, Oren Eini wrote:
>>>>>> YEד
>>>>>> On Wed, Aug 15, 2012 at 3:45 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>> I will as soon as I can later today. Right now I don't have access
>>>>>>> to the database computer and rebuilding the index takes ~4 hours.
>>>>>>> We are talking about the physical size of the index on disk right?
>>>>>>> On Wednesday, August 15, 2012 2:15:13 PM UTC+2, Oren Eini wrote:
>>>>>>>> Tobias,
>>>>>>>> First, let us check some things.
>>>>>>>> Can you check the actual size of the index?
>>>>>>>> On Wed, Aug 15, 2012 at 3:05 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>>>> What are my options for doing so and/or to make sure it stays in
>>>>>>>>> memory? The database is around 4-5GB and I don't expect it to become much
>>>>>>>>> larger than that. If I knew of a way to do it I could potentially pre-load
>>>>>>>>> the entire database into memory but it would still need to be persisted to
>>>>>>>>> disk.
>>>>>>>>> On Wednesday, August 15, 2012 1:59:50 PM UTC+2, Oren Eini wrote:
>>>>>>>>>> Strange.
>>>>>>>>>> At any rate, you need to warm up the index, make sure that a lot
>>>>>>>>>> of it is in memory before you can really make perf comparisons. That is
>>>>>>>>>> especially true with large indexes.
>>>>>>>>>> On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <
>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>> 25-50 chars on average but in rare cases upwards of 100 chars.
>>>>>>>>>>> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini wrote:
>>>>>>>>>>>> Tobias,
>>>>>>>>>>>> What is the size of the text field?
>>>>>>>>>>>> We do a lot of complex caching there, and depending on the
>>>>>>>>>>>> actual index size (can you check), loading through it might take a while.
>>>>>>>>>>>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <
>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>> Is 50 seconds reasonable for a cold start with the size of the
>>>>>>>>>>>>> data set I'm dealing with? If I remove the OrderBy on the text field the
>>>>>>>>>>>>> query will sort on Year, ProjectId and CaseId and return within 16 seconds
>>>>>>>>>>>>> which also seems slow for loading 2601 documents and doing an
>>>>>>>>>>>>> in-memory sort. A subsequent and modified query (e.g. adding another field
>>>>>>>>>>>>> to the query) will complete pretty fast. After a delay of inactivity the
>>>>>>>>>>>>> query will once again take 50 seconds to complete.
>>>>>>>>>>>>> Considering these factors I'm guessing the sorting is done on
>>>>>>>>>>>>> the full document collection and then cached in memory? If so is there a
>>>>>>>>>>>>> way to increase the lifetime of this cache and/or possibly not sort on the
>>>>>>>>>>>>> full collection but on the documents returned from the query?
>>>>>>>>>>>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Well, we do have to do extra work on your case, loading a lot
>>>>>>>>>>>>>> more info to memory to be able to sort on this.
>>>>>>>>>>>>>> If this happens on a cold start, it may take a while.
>>>>>>>>>>>>>> Can you try a different query (so it won't be cached), and
>>>>>>>>>>>>>> see what happens over time?
>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <
>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> How many items match the result?
>>>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <
>>>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Any clue as to why this is so slow? I have still not been
>>>>>>>>>>>>>>>>> able to figure this out.
>>>>>>>>>>>>>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias
>>>>>>>>>>>>>>>>> Sebring wrote:
>>>>>>>>>>>>>>>>>> I'm seeing some strange results when adding OrderBy
>>>>>>>>>>>>>>>>>> operands to my queries on a fairly large dataset (2.3M documents).
>>>>>>>>>>>>>>>>>> In the following queries FilesCount_Range:[**0x00********
>>>>>>>>>>>>>>>>>> ********000001 TO NULL] will limit the dataset to 2600out of the 2.3M documents in the full dataset.
>>>>>>>>>>>>>>>>>> The below query will return almost immediately:
>>>>>>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000****************0001 TO
>>>>>>>>>>>>>>>>>> NULL]
>>>>>>>>>>>>>>>>>> start=0
>>>>>>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>>>>>>> This query on the other hand takes 50 seconds to return:
>>>>>>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000****************0001 TO
>>>>>>>>>>>>>>>>>> NULL]
>>>>>>>>>>>>>>>>>> start=0
>>>>>>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>>>>>>> sort=Text
>>>>>>>>>>>>>>>>>> sort=Year
>>>>>>>>>>>>>>>>>> sort=ProjectId
>>>>>>>>>>>>>>>>>> sort=CaseId
>>>>>>>>>>>>>>>>>> This is taken from the mvc-profiler:
>>>>>>>>>>>>>>>>>> "49308 ms waiting for server in 1 request(s) for 1
>>>>>>>>>>>>>>>>>> sessions(s)Session opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>>>>>>>>>>>>>>> Subsequent calls with the same orderby are cached and
>>>>>>>>>>>>>>>>>> return immediately.
>>>>>>>>>>>>>>>>>> The index is set up like so:
>>>>>>>>>>>>>>>>>> public class Data_Index : AbstractIndexCreationTask<**
>>>>>>>>>>>>>>>>>> Data****************, Data_ Index.Result>
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>> public class Result
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>> public string Text { get; set; }
>>>>>>>>>>>>>>>>>> public int Year { get; set; }
>>>>>>>>>>>>>>>>>> public int? ProjectId { get; set; }
>>>>>>>>>>>>>>>>>> public int? CaseId { get; set; }
>>>>>>>>>>>>>>>>>> public int FilesCount { get; set; }
>>>>>>>>>>>>>>>>>> public string[] Files { get; set; }
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>> public Data_ Index()
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>> Map = data => from d in data
>>>>>>>>>>>>>>>>>> select new
>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>> Text = d.Text,
>>>>>>>>>>>>>>>>>> Year = d.Year,
>>>>>>>>>>>>>>>>>> ProjectId = d.ProjectId,
>>>>>>>>>>>>>>>>>> CaseId = d.CaseId,
>>>>>>>>>>>>>>>>>> FilesCount = d.Files.Count,
>>>>>>>>>>>>>>>>>> Files = d.Files.Select(x => x.Path).ToArray()
>>>>>>>>>>>>>>>>>> };
I'm not sure I understand "child database". It's a separate database from the system database and I've created it in the manager by clicking New database.
On Thursday, August 16, 2012 6:28:30 PM UTC+2, Oren Eini wrote:
> Tobias,
> Is it a child database?
> On Thu, Aug 16, 2012 at 7:24 PM, Tobias Sebring <tseb...@gmail.com<javascript:>
> > wrote:
>> I'm currently running it as a console application (i.e. haven't installed >> it as a service but not running embedded). The site is running in IIS. >> Going to try that setting to see if it makes any difference.
>> Another thing I've noticed over the past day is that if I leave indexing >> on while doing the batch import (takes aprox. 40 min) around 2,200,000 / >> 2,300,000 documents will have already been processed in the index when it's >> done. If I repeat the same batch import with indexing turned off, turn it >> back on, and wait for indexing to complete - the same index will take 3-4 >> hours to build with a very noticeable decrease in performance as the index >> gets bigger.
>> Another issue with the indexing seems to be that after you stop actively >> adding documents to the index the indexing will slow down and those last >> 100,000 documents may take 15-20 minutes to complete and more often than >> not the index will remain stale even though all documents have been >> processed (to the point that it does not return any results when querying). >> I've noticed that if I shutdown the ravendb server when this happens and >> then start it back up the index will no longer be stale.
>> On Thursday, August 16, 2012 6:02:33 PM UTC+2, Oren Eini wrote:
>>> How are you running this?
>>> In a service? In IIS?
>>> I assume that you are running in a child db mode, you need to set:
>>> Raven/Tenants/**MaxIdleTimeForTenantDatabase
>>> To a high value.
>>> If you are running in IIS, you need to set IIS idle settings to zero and >>> disable recycling.
>>> On Wed, Aug 15, 2012 at 5:01 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>> Yes, the site will be inactive for most hours out of the day and will >>>> only see sporadic use that will not be concentrated in any specific period >>>> of time. I expect the resources will have been dropped almost every time a >>>> user accesses the site.
>>>> On Wednesday, August 15, 2012 3:55:09 PM UTC+2, Oren Eini wrote:
>>>>> Hm,
>>>>> Big enough that we will take several queries to load significant >>>>> portions of it to memory.
>>>>> In short, as long as the site is serving queries, you'll see good perf.
>>>>> If you have long period of inactivity, we will drop all resources.
>>>>> Do you expect long periods of inactivity?
>>>>> On Wed, Aug 15, 2012 at 4:53 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>> The index is 759mb.
>>>>>> Data file is 3,63 GB.
>>>>>> On Wednesday, August 15, 2012 2:49:08 PM UTC+2, Oren Eini wrote:
>>>>>>> YEד
>>>>>>> On Wed, Aug 15, 2012 at 3:45 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>>> I will as soon as I can later today. Right now I don't have access >>>>>>>> to the database computer and rebuilding the index takes ~4 hours.
>>>>>>>> We are talking about the physical size of the index on disk right?
>>>>>>>> On Wednesday, August 15, 2012 2:15:13 PM UTC+2, Oren Eini wrote:
>>>>>>>>> Tobias,
>>>>>>>>> First, let us check some things.
>>>>>>>>> Can you check the actual size of the index?
>>>>>>>>> On Wed, Aug 15, 2012 at 3:05 PM, Tobias Sebring <tseb...@gmail.com
>>>>>>>>> > wrote:
>>>>>>>>>> What are my options for doing so and/or to make sure it stays in >>>>>>>>>> memory? The database is around 4-5GB and I don't expect it to become much >>>>>>>>>> larger than that. If I knew of a way to do it I could potentially pre-load >>>>>>>>>> the entire database into memory but it would still need to be persisted to >>>>>>>>>> disk.
>>>>>>>>>> On Wednesday, August 15, 2012 1:59:50 PM UTC+2, Oren Eini wrote:
>>>>>>>>>>> Strange.
>>>>>>>>>>> At any rate, you need to warm up the index, make sure that a lot >>>>>>>>>>> of it is in memory before you can really make perf comparisons. That is >>>>>>>>>>> especially true with large indexes.
>>>>>>>>>>> On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <
>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>> 25-50 chars on average but in rare cases upwards of 100 chars.
>>>>>>>>>>>> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini >>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Tobias,
>>>>>>>>>>>>> What is the size of the text field?
>>>>>>>>>>>>> We do a lot of complex caching there, and depending on the >>>>>>>>>>>>> actual index size (can you check), loading through it might take a while.
>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <
>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>> Is 50 seconds reasonable for a cold start with the size of >>>>>>>>>>>>>> the data set I'm dealing with? If I remove the OrderBy on the text field >>>>>>>>>>>>>> the query will sort on Year, ProjectId and CaseId and return within 16 >>>>>>>>>>>>>> seconds which also seems slow for loading 2601 documents and >>>>>>>>>>>>>> doing an in-memory sort. A subsequent and modified query (e.g. adding >>>>>>>>>>>>>> another field to the query) will complete pretty fast. After a delay of >>>>>>>>>>>>>> inactivity the query will once again take 50 seconds to complete.
>>>>>>>>>>>>>> Considering these factors I'm guessing the sorting is done on >>>>>>>>>>>>>> the full document collection and then cached in memory? If so is there a >>>>>>>>>>>>>> way to increase the lifetime of this cache and/or possibly not sort on the >>>>>>>>>>>>>> full collection but on the documents returned from the query?
>>>>>>>>>>>>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini >>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Well, we do have to do extra work on your case, loading a >>>>>>>>>>>>>>> lot more info to memory to be able to sort on this.
>>>>>>>>>>>>>>> If this happens on a cold start, it may take a while.
>>>>>>>>>>>>>>> Can you try a different query (so it won't be cached), and >>>>>>>>>>>>>>> see what happens over time?
>>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <
>>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren Eini >>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> How many items match the result?
>>>>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <
>>>>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Any clue as to why this is so slow? I have still not been >>>>>>>>>>>>>>>>>> able to figure this out.
>>>>>>>>>>>>>>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias >>>>>>>>>>>>>>>>>> Sebring wrote:
>>>>>>>>>>>>>>>>>>> I'm seeing some strange results when adding OrderBy >>>>>>>>>>>>>>>>>>> operands to my queries on a fairly large dataset (2.3M documents).
>>>>>>>>>>>>>>>>>>> In the following queries FilesCount_Range:[**0x00*******
>>>>>>>>>>>>>>>>>>> *********000001 TO NULL] will limit the dataset to 2600out of the 2.3M documents in the full dataset.
>>>>>>>>>>>>>>>>>>> The below query will return almost immediately:
>>>>>>>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000****************0001 TO >>>>>>>>>>>>>>>>>>> NULL]
>>>>>>>>>>>>>>>>>>> start=0
>>>>>>>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>>>>>>>> This query on the other hand takes 50 seconds to return:
>>>>>>>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000****************0001 TO >>>>>>>>>>>>>>>>>>> NULL]
>>>>>>>>>>>>>>>>>>> start=0
>>>>>>>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>>>>>>>> sort=Text
>>>>>>>>>>>>>>>>>>> sort=Year
>>>>>>>>>>>>>>>>>>> sort=ProjectId
>>>>>>>>>>>>>>>>>>> sort=CaseId
>>>>>>>>>>>>>>>>>>> This is taken from the mvc-profiler:
>>>>>>>>>>>>>>>>>>> "49308 ms waiting for server in 1 request(s) for 1 >>>>>>>>>>>>>>>>>>> sessions(s)Session opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>>>>>>>>>>>>>>>> Subsequent calls with the same orderby are cached and >>>>>>>>>>>>>>>>>>> return immediately.
>>>>>>>>>>>>>>>>>>> The index is set up like so:
>>>>>>>>>>>>>>>>>>> public class Data_Index : AbstractIndexCreationTask<**
>>>>>>>>>>>>>>>>>>> Data****************, Data_ Index.Result>
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>> public class Result
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>> public string Text { get; set; }
>>>>>>>>>>>>>>>>>>> public int Year { get; set; }
>>>>>>>>>>>>>>>>>>> public int? ProjectId { get; set; }
>>>>>>>>>>>>>>>>>>> public int? CaseId { get; set; }
>>>>>>>>>>>>>>>>>>> public int FilesCount { get; set; }
>>>>>>>>>>>>>>>>>>> public string[] Files { get; set; }
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> public Data_ Index()
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>> Map = data => from d in data
>>>>>>>>>>>>>>>>>>> select
I just fixed a bug that would likely cause non system database to shut down
after inactivity, even if they are doing indexing, if you are doing a LOT
of indexing.
That would probably match what you are seeing.
Can you wait for the next build and test that?
On Thu, Aug 16, 2012 at 7:32 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
> I'm not sure I understand "child database". It's a separate database from
> the system database and I've created it in the manager by clicking New
> database.
> On Thursday, August 16, 2012 6:28:30 PM UTC+2, Oren Eini wrote:
>> Tobias,
>> Is it a child database?
>> On Thu, Aug 16, 2012 at 7:24 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>> I'm currently running it as a console application
>>> (i.e. haven't installed it as a service but not running embedded). The site
>>> is running in IIS. Going to try that setting to see if it makes any
>>> difference.
>>> Another thing I've noticed over the past day is that if I leave indexing
>>> on while doing the batch import (takes aprox. 40 min) around 2,200,000 /
>>> 2,300,000 documents will have already been processed in the index when it's
>>> done. If I repeat the same batch import with indexing turned off, turn it
>>> back on, and wait for indexing to complete - the same index will take 3-4
>>> hours to build with a very noticeable decrease in performance as the index
>>> gets bigger.
>>> Another issue with the indexing seems to be that after you stop actively
>>> adding documents to the index the indexing will slow down and those last
>>> 100,000 documents may take 15-20 minutes to complete and more often than
>>> not the index will remain stale even though all documents have been
>>> processed (to the point that it does not return any results when querying).
>>> I've noticed that if I shutdown the ravendb server when this happens and
>>> then start it back up the index will no longer be stale.
>>> On Thursday, August 16, 2012 6:02:33 PM UTC+2, Oren Eini wrote:
>>>> How are you running this?
>>>> In a service? In IIS?
>>>> I assume that you are running in a child db mode, you need to set:
>>>> Raven/Tenants/**MaxIdleTimeForTe**nantDatabase
>>>> To a high value.
>>>> If you are running in IIS, you need to set IIS idle settings to zero
>>>> and disable recycling.
>>>> On Wed, Aug 15, 2012 at 5:01 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>> Yes, the site will be inactive for most hours out of the day and will
>>>>> only see sporadic use that will not be concentrated in any specific period
>>>>> of time. I expect the resources will have been dropped almost every time a
>>>>> user accesses the site.
>>>>> On Wednesday, August 15, 2012 3:55:09 PM UTC+2, Oren Eini wrote:
>>>>>> Hm,
>>>>>> Big enough that we will take several queries to load significant
>>>>>> portions of it to memory.
>>>>>> In short, as long as the site is serving queries, you'll see good
>>>>>> perf.
>>>>>> If you have long period of inactivity, we will drop all resources.
>>>>>> Do you expect long periods of inactivity?
>>>>>> On Wed, Aug 15, 2012 at 4:53 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>> The index is 759mb.
>>>>>>> Data file is 3,63 GB.
>>>>>>> On Wednesday, August 15, 2012 2:49:08 PM UTC+2, Oren Eini wrote:
>>>>>>>> YEד
>>>>>>>> On Wed, Aug 15, 2012 at 3:45 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>>>> I will as soon as I can later today. Right now I don't have access
>>>>>>>>> to the database computer and rebuilding the index takes ~4 hours.
>>>>>>>>> We are talking about the physical size of the index on disk right?
>>>>>>>>> On Wednesday, August 15, 2012 2:15:13 PM UTC+2, Oren Eini wrote:
>>>>>>>>>> Tobias,
>>>>>>>>>> First, let us check some things.
>>>>>>>>>> Can you check the actual size of the index?
>>>>>>>>>> On Wed, Aug 15, 2012 at 3:05 PM, Tobias Sebring <
>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>> What are my options for doing so and/or to make sure it stays in
>>>>>>>>>>> memory? The database is around 4-5GB and I don't expect it to become much
>>>>>>>>>>> larger than that. If I knew of a way to do it I could potentially pre-load
>>>>>>>>>>> the entire database into memory but it would still need to be persisted to
>>>>>>>>>>> disk.
>>>>>>>>>>> On Wednesday, August 15, 2012 1:59:50 PM UTC+2, Oren Eini wrote:
>>>>>>>>>>>> Strange.
>>>>>>>>>>>> At any rate, you need to warm up the index, make sure that a
>>>>>>>>>>>> lot of it is in memory before you can really make perf comparisons. That is
>>>>>>>>>>>> especially true with large indexes.
>>>>>>>>>>>> On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <
>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>> 25-50 chars on average but in rare cases upwards of 100 chars.
>>>>>>>>>>>>> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Tobias,
>>>>>>>>>>>>>> What is the size of the text field?
>>>>>>>>>>>>>> We do a lot of complex caching there, and depending on the
>>>>>>>>>>>>>> actual index size (can you check), loading through it might take a while.
>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <
>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>> Is 50 seconds reasonable for a cold start with the size of
>>>>>>>>>>>>>>> the data set I'm dealing with? If I remove the OrderBy on the text field
>>>>>>>>>>>>>>> the query will sort on Year, ProjectId and CaseId and return within 16
>>>>>>>>>>>>>>> seconds which also seems slow for loading 2601 documents
>>>>>>>>>>>>>>> and doing an in-memory sort. A subsequent and modified query (e.g. adding
>>>>>>>>>>>>>>> another field to the query) will complete pretty fast. After a delay of
>>>>>>>>>>>>>>> inactivity the query will once again take 50 seconds to complete.
>>>>>>>>>>>>>>> Considering these factors I'm guessing the sorting is done
>>>>>>>>>>>>>>> on the full document collection and then cached in memory? If so is there a
>>>>>>>>>>>>>>> way to increase the lifetime of this cache and/or possibly not sort on the
>>>>>>>>>>>>>>> full collection but on the documents returned from the query?
>>>>>>>>>>>>>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> Well, we do have to do extra work on your case, loading a
>>>>>>>>>>>>>>>> lot more info to memory to be able to sort on this.
>>>>>>>>>>>>>>>> If this happens on a cold start, it may take a while.
>>>>>>>>>>>>>>>> Can you try a different query (so it won't be cached), and
>>>>>>>>>>>>>>>> see what happens over time?
>>>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <
>>>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren
>>>>>>>>>>>>>>>>> Eini wrote:
>>>>>>>>>>>>>>>>>> How many items match the result?
>>>>>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <
>>>>>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Any clue as to why this is so slow? I have still not
>>>>>>>>>>>>>>>>>>> been able to figure this out.
>>>>>>>>>>>>>>>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias
>>>>>>>>>>>>>>>>>>> Sebring wrote:
>>>>>>>>>>>>>>>>>>>> I'm seeing some strange results when adding OrderBy
>>>>>>>>>>>>>>>>>>>> operands to my queries on a fairly large dataset (2.3M documents).
>>>>>>>>>>>>>>>>>>>> In the following queries FilesCount_Range:[**0x00******
>>>>>>>>>>>>>>>>>>>> ************000001 TO NULL] will limit the dataset to
>>>>>>>>>>>>>>>>>>>> 2600 out of the 2.3M documents in the full dataset.
>>>>>>>>>>>>>>>>>>>> The below query will return almost immediately:
>>>>>>>>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000******************0001
>>>>>>>>>>>>>>>>>>>> TO NULL]
>>>>>>>>>>>>>>>>>>>> start=0
>>>>>>>>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>>>>>>>>> This query on the other hand takes 50 seconds to return:
>>>>>>>>>>>>>>>>>>>> query=FilesCount_Range:[**0x0000******************0001
>>>>>>>>>>>>>>>>>>>> TO NULL]
>>>>>>>>>>>>>>>>>>>> start=0
>>>>>>>>>>>>>>>>>>>> pageSize=25
>>>>>>>>>>>>>>>>>>>> aggregation=None
>>>>>>>>>>>>>>>>>>>> fetch=Id
>>>>>>>>>>>>>>>>>>>> fetch=Files
>>>>>>>>>>>>>>>>>>>> sort=Text
>>>>>>>>>>>>>>>>>>>> sort=Year
>>>>>>>>>>>>>>>>>>>> sort=ProjectId
>>>>>>>>>>>>>>>>>>>> sort=CaseId
>>>>>>>>>>>>>>>>>>>> This is taken from the mvc-profiler:
>>>>>>>>>>>>>>>>>>>> "49308 ms waiting for server in 1 request(s) for 1
>>>>>>>>>>>>>>>>>>>> sessions(s)Session opened for 50025.86 ms for 1 request(s) - Data/Index"
>>>>>>>>>>>>>>>>>>>> Subsequent calls with the same orderby are cached and
>>>>>>>>>>>>>>>>>>>> return immediately.
>>>>>>>>>>>>>>>>>>>> The index is set up like so:
>>>>>>>>>>>>>>>>>>>> public class Data_Index : AbstractIndexCreationTask<**
>>>>>>>>>>>>>>>>>>>> Data******************, Data_ Index.Result>
>>>>>>>>>>>>>>>>>>>> {
While I have you on the hook I have another problem I'm struggling with:
My batch import is processed in multiple phases to build my full denormalized dataset. In one of the later phases I need to query data in the database to find the relevant document to update using .LuceneQuery<>(). This works well for a majority of the updates in this phase but a few iterations will fail to find the relevant document to update.
For every update iteration I modify fields that are part of the index I'm querying and as such I call .WaitForNonStaleResults() on the subsequent query after every update to (hopefully) make sure the index is in good shape. The index is defined as an AbstractIndexCreationTask<Data, Data.Result> (Map/Reduce) but there is no defined reduce function but rather I'm using it to query the data using lambda syntax on computed index-fields and then selecting the actual document type for the result: session.Advanced.LuceneQuery<Datas_Index.Result, Datas_Index>()
.WhereEqual(x => ..., ...)
.SelectFields<Data>()
.ToList()
After I rerun the same update code on the data that failed to find the relevant document to update - like magic - it's now returned and updates are successful. Any idea what I'm doing wrong here?
On Thursday, August 16, 2012 6:35:31 PM UTC+2, Oren Eini wrote:
> Okay, that is what I meant.
> I just fixed a bug that would likely cause non system database to shut > down after inactivity, even if they are doing indexing, if you are doing a > LOT of indexing.
> That would probably match what you are seeing.
> Can you wait for the next build and test that?
> On Thu, Aug 16, 2012 at 7:32 PM, Tobias Sebring <tseb...@gmail.com<javascript:>
> > wrote:
>> I'm not sure I understand "child database". It's a separate database from >> the system database and I've created it in the manager by clicking New >> database.
>> On Thursday, August 16, 2012 6:28:30 PM UTC+2, Oren Eini wrote:
>>> Tobias,
>>> Is it a child database?
>>> On Thu, Aug 16, 2012 at 7:24 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>> I'm currently running it as a console application >>>> (i.e. haven't installed it as a service but not running embedded). The site >>>> is running in IIS. Going to try that setting to see if it makes any >>>> difference.
>>>> Another thing I've noticed over the past day is that if I leave >>>> indexing on while doing the batch import (takes aprox. 40 min) around >>>> 2,200,000 / 2,300,000 documents will have already been processed in the >>>> index when it's done. If I repeat the same batch import with indexing >>>> turned off, turn it back on, and wait for indexing to complete - the same >>>> index will take 3-4 hours to build with a very noticeable decrease in >>>> performance as the index gets bigger.
>>>> Another issue with the indexing seems to be that after you stop >>>> actively adding documents to the index the indexing will slow down and >>>> those last 100,000 documents may take 15-20 minutes to complete and more >>>> often than not the index will remain stale even though all documents have >>>> been processed (to the point that it does not return any results when >>>> querying). I've noticed that if I shutdown the ravendb server when this >>>> happens and then start it back up the index will no longer be stale.
>>>> On Thursday, August 16, 2012 6:02:33 PM UTC+2, Oren Eini wrote:
>>>>> How are you running this?
>>>>> In a service? In IIS?
>>>>> I assume that you are running in a child db mode, you need to set:
>>>>> Raven/Tenants/**MaxIdleTimeForTe**nantDatabase
>>>>> To a high value.
>>>>> If you are running in IIS, you need to set IIS idle settings to zero >>>>> and disable recycling.
>>>>> On Wed, Aug 15, 2012 at 5:01 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>> Yes, the site will be inactive for most hours out of the day and will >>>>>> only see sporadic use that will not be concentrated in any specific period >>>>>> of time. I expect the resources will have been dropped almost every time a >>>>>> user accesses the site.
>>>>>> On Wednesday, August 15, 2012 3:55:09 PM UTC+2, Oren Eini wrote:
>>>>>>> Hm,
>>>>>>> Big enough that we will take several queries to load significant >>>>>>> portions of it to memory.
>>>>>>> In short, as long as the site is serving queries, you'll see good >>>>>>> perf.
>>>>>>> If you have long period of inactivity, we will drop all resources.
>>>>>>> Do you expect long periods of inactivity?
>>>>>>> On Wed, Aug 15, 2012 at 4:53 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>>> The index is 759mb.
>>>>>>>> Data file is 3,63 GB.
>>>>>>>> On Wednesday, August 15, 2012 2:49:08 PM UTC+2, Oren Eini wrote:
>>>>>>>>> YEד
>>>>>>>>> On Wed, Aug 15, 2012 at 3:45 PM, Tobias Sebring <tseb...@gmail.com
>>>>>>>>> > wrote:
>>>>>>>>>> I will as soon as I can later today. Right now I don't have >>>>>>>>>> access to the database computer and rebuilding the index takes ~4 hours.
>>>>>>>>>> We are talking about the physical size of the index on disk right?
>>>>>>>>>> On Wednesday, August 15, 2012 2:15:13 PM UTC+2, Oren Eini wrote:
>>>>>>>>>>> Tobias,
>>>>>>>>>>> First, let us check some things.
>>>>>>>>>>> Can you check the actual size of the index?
>>>>>>>>>>> On Wed, Aug 15, 2012 at 3:05 PM, Tobias Sebring <
>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>> What are my options for doing so and/or to make sure it stays >>>>>>>>>>>> in memory? The database is around 4-5GB and I don't expect it to become >>>>>>>>>>>> much larger than that. If I knew of a way to do it I could potentially >>>>>>>>>>>> pre-load the entire database into memory but it would still need to be >>>>>>>>>>>> persisted to disk.
>>>>>>>>>>>> On Wednesday, August 15, 2012 1:59:50 PM UTC+2, Oren Eini >>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Strange.
>>>>>>>>>>>>> At any rate, you need to warm up the index, make sure that a >>>>>>>>>>>>> lot of it is in memory before you can really make perf comparisons. That is >>>>>>>>>>>>> especially true with large indexes.
>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <
>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>> 25-50 chars on average but in rare cases upwards of 100 >>>>>>>>>>>>>> chars.
>>>>>>>>>>>>>> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini >>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Tobias,
>>>>>>>>>>>>>>> What is the size of the text field?
>>>>>>>>>>>>>>> We do a lot of complex caching there, and depending on the >>>>>>>>>>>>>>> actual index size (can you check), loading through it might take a while.
>>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <
>>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>>> Is 50 seconds reasonable for a cold start with the size of >>>>>>>>>>>>>>>> the data set I'm dealing with? If I remove the OrderBy on the text field >>>>>>>>>>>>>>>> the query will sort on Year, ProjectId and CaseId and return within 16 >>>>>>>>>>>>>>>> seconds which also seems slow for loading 2601 documents >>>>>>>>>>>>>>>> and doing an in-memory sort. A subsequent and modified query (e.g. adding >>>>>>>>>>>>>>>> another field to the query) will complete pretty fast. After a delay of >>>>>>>>>>>>>>>> inactivity the query will once again take 50 seconds to complete.
>>>>>>>>>>>>>>>> Considering these factors I'm guessing the sorting is done >>>>>>>>>>>>>>>> on the full document collection and then cached in memory? If so is there a >>>>>>>>>>>>>>>> way to increase the lifetime of this cache and/or possibly not sort on the >>>>>>>>>>>>>>>> full collection but on the documents returned from the query?
>>>>>>>>>>>>>>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini >>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> Well, we do have to do extra work on your case, loading a >>>>>>>>>>>>>>>>> lot more info to memory to be able to sort on this.
>>>>>>>>>>>>>>>>> If this happens on a cold start, it may take a while.
>>>>>>>>>>>>>>>>> Can you try a different query (so it won't be cached), and >>>>>>>>>>>>>>>>> see what happens over time?
>>>>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <
>>>>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> On Wednesday, August 15, 2012 11:53:53 AM UTC+2, Oren >>>>>>>>>>>>>>>>>> Eini wrote:
>>>>>>>>>>>>>>>>>>> How many items match the result?
>>>>>>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 12:36 PM, Tobias Sebring <
>>>>>>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> Any clue as to why this is so slow? I have still not >>>>>>>>>>>>>>>>>>>> been able to figure this out.
>>>>>>>>>>>>>>>>>>>> On Tuesday, August 14, 2012 12:31:39 PM UTC+2, Tobias >>>>>>>>>>>>>>>>>>>> Sebring wrote:
>>>>>>>>>>>>>>>>>>>>> I'm seeing some strange results when adding OrderBy >>>>>>>>>>>>>>>>>>>>> operands to my queries on a fairly large dataset (2.3M documents).
>>>>>>>>>>>>>>>>>>>>> In the following queries FilesCount_Range:[**0x00*****
On Thu, Aug 16, 2012 at 8:02 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
> Absolutely.
> While I have you on the hook I have another problem I'm struggling with:
> My batch import is processed in multiple phases to build my full
> denormalized dataset. In one of the later phases I need to query data in
> the database to find the relevant document to update using
> .LuceneQuery<>(). This works well for a majority of the updates in this
> phase but a few iterations will fail to find the relevant document to
> update.
> For every update iteration I modify fields that are part of the index I'm
> querying and as such I call .WaitForNonStaleResults() on the subsequent
> query after every update to (hopefully) make sure the index is in good
> shape. The index is defined as an AbstractIndexCreationTask<Data,
> Data.Result> (Map/Reduce) but there is no defined reduce function but
> rather I'm using it to query the data using lambda syntax on computed
> index-fields and then selecting the actual document type for the result:
> session.Advanced.LuceneQuery<Datas_Index.Result, Datas_Index>()
> .WhereEqual(x => ..., ...)
> .SelectFields<Data>()
> .ToList()
> After I rerun the same update code on the data that failed to find the
> relevant document to update - like magic - it's now returned and updates
> are successful. Any idea what I'm doing wrong here?
> On Thursday, August 16, 2012 6:35:31 PM UTC+2, Oren Eini wrote:
>> Okay, that is what I meant.
>> I just fixed a bug that would likely cause non system database to shut
>> down after inactivity, even if they are doing indexing, if you are doing a
>> LOT of indexing.
>> That would probably match what you are seeing.
>> Can you wait for the next build and test that?
>> On Thu, Aug 16, 2012 at 7:32 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>> I'm not sure I understand "child database". It's a separate database
>>> from the system database and I've created it in the manager by clicking New
>>> database.
>>> On Thursday, August 16, 2012 6:28:30 PM UTC+2, Oren Eini wrote:
>>>> Tobias,
>>>> Is it a child database?
>>>> On Thu, Aug 16, 2012 at 7:24 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>> I'm currently running it as a console application
>>>>> (i.e. haven't installed it as a service but not running embedded). The site
>>>>> is running in IIS. Going to try that setting to see if it makes any
>>>>> difference.
>>>>> Another thing I've noticed over the past day is that if I leave
>>>>> indexing on while doing the batch import (takes aprox. 40 min) around
>>>>> 2,200,000 / 2,300,000 documents will have already been processed in the
>>>>> index when it's done. If I repeat the same batch import with indexing
>>>>> turned off, turn it back on, and wait for indexing to complete - the same
>>>>> index will take 3-4 hours to build with a very noticeable decrease in
>>>>> performance as the index gets bigger.
>>>>> Another issue with the indexing seems to be that after you stop
>>>>> actively adding documents to the index the indexing will slow down and
>>>>> those last 100,000 documents may take 15-20 minutes to complete and more
>>>>> often than not the index will remain stale even though all documents have
>>>>> been processed (to the point that it does not return any results when
>>>>> querying). I've noticed that if I shutdown the ravendb server when this
>>>>> happens and then start it back up the index will no longer be stale.
>>>>> On Thursday, August 16, 2012 6:02:33 PM UTC+2, Oren Eini wrote:
>>>>>> How are you running this?
>>>>>> In a service? In IIS?
>>>>>> I assume that you are running in a child db mode, you need to set:
>>>>>> Raven/Tenants/**MaxIdleTimeForTe****nantDatabase
>>>>>> To a high value.
>>>>>> If you are running in IIS, you need to set IIS idle settings to zero
>>>>>> and disable recycling.
>>>>>> On Wed, Aug 15, 2012 at 5:01 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>> Yes, the site will be inactive for most hours out of the day and
>>>>>>> will only see sporadic use that will not be concentrated in any specific
>>>>>>> period of time. I expect the resources will have been dropped almost every
>>>>>>> time a user accesses the site.
>>>>>>> On Wednesday, August 15, 2012 3:55:09 PM UTC+2, Oren Eini wrote:
>>>>>>>> Hm,
>>>>>>>> Big enough that we will take several queries to load significant
>>>>>>>> portions of it to memory.
>>>>>>>> In short, as long as the site is serving queries, you'll see good
>>>>>>>> perf.
>>>>>>>> If you have long period of inactivity, we will drop all resources.
>>>>>>>> Do you expect long periods of inactivity?
>>>>>>>> On Wed, Aug 15, 2012 at 4:53 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>>>> The index is 759mb.
>>>>>>>>> Data file is 3,63 GB.
>>>>>>>>> On Wednesday, August 15, 2012 2:49:08 PM UTC+2, Oren Eini wrote:
>>>>>>>>>> YEד
>>>>>>>>>> On Wed, Aug 15, 2012 at 3:45 PM, Tobias Sebring <
>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>> I will as soon as I can later today. Right now I don't have
>>>>>>>>>>> access to the database computer and rebuilding the index takes ~4 hours.
>>>>>>>>>>> We are talking about the physical size of the index on disk
>>>>>>>>>>> right?
>>>>>>>>>>> On Wednesday, August 15, 2012 2:15:13 PM UTC+2, Oren Eini wrote:
>>>>>>>>>>>> Tobias,
>>>>>>>>>>>> First, let us check some things.
>>>>>>>>>>>> Can you check the actual size of the index?
>>>>>>>>>>>> On Wed, Aug 15, 2012 at 3:05 PM, Tobias Sebring <
>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>> What are my options for doing so and/or to make sure it stays
>>>>>>>>>>>>> in memory? The database is around 4-5GB and I don't expect it to become
>>>>>>>>>>>>> much larger than that. If I knew of a way to do it I could potentially
>>>>>>>>>>>>> pre-load the entire database into memory but it would still need to be
>>>>>>>>>>>>> persisted to disk.
>>>>>>>>>>>>> On Wednesday, August 15, 2012 1:59:50 PM UTC+2, Oren Eini
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Strange.
>>>>>>>>>>>>>> At any rate, you need to warm up the index, make sure that a
>>>>>>>>>>>>>> lot of it is in memory before you can really make perf comparisons. That is
>>>>>>>>>>>>>> especially true with large indexes.
>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <
>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>> 25-50 chars on average but in rare cases upwards of 100
>>>>>>>>>>>>>>> chars.
>>>>>>>>>>>>>>> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> Tobias,
>>>>>>>>>>>>>>>> What is the size of the text field?
>>>>>>>>>>>>>>>> We do a lot of complex caching there, and depending on the
>>>>>>>>>>>>>>>> actual index size (can you check), loading through it might take a while.
>>>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <
>>>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Is 50 seconds reasonable for a cold start with the size of
>>>>>>>>>>>>>>>>> the data set I'm dealing with? If I remove the OrderBy on the text field
>>>>>>>>>>>>>>>>> the query will sort on Year, ProjectId and CaseId and return within 16
>>>>>>>>>>>>>>>>> seconds which also seems slow for loading 2601 documents
>>>>>>>>>>>>>>>>> and doing an in-memory sort. A subsequent and modified query (e.g. adding
>>>>>>>>>>>>>>>>> another field to the query) will complete pretty fast. After a delay of
>>>>>>>>>>>>>>>>> inactivity the query will once again take 50 seconds to complete.
>>>>>>>>>>>>>>>>> Considering these factors I'm guessing the sorting is done
>>>>>>>>>>>>>>>>> on the full document collection and then cached in memory? If so is there a
>>>>>>>>>>>>>>>>> way to increase the lifetime of this cache and/or possibly not sort on the
>>>>>>>>>>>>>>>>> full collection but on the documents returned from the query?
>>>>>>>>>>>>>>>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren Eini
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> Well, we do have to do extra work on your case, loading a
>>>>>>>>>>>>>>>>>> lot more info to memory to be able to sort on this.
>>>>>>>>>>>>>>>>>> If this happens on a cold start, it may take a while.
>>>>>>>>>>>>>>>>>> Can you try a different query (so it won't be cached),
>>>>>>>>>>>>>>>>>> and see what happens over time?
>>>>>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <
>>>>>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
On Thursday, August 16, 2012 8:18:57 PM UTC+2, Oren Eini wrote:
> No idea.
> Can you try generating a failing test?
> I am not sure that I am following.
> On Thu, Aug 16, 2012 at 8:02 PM, Tobias Sebring <tseb...@gmail.com<javascript:>
> > wrote:
>> Absolutely.
>> While I have you on the hook I have another problem I'm struggling with:
>> My batch import is processed in multiple phases to build my full >> denormalized dataset. In one of the later phases I need to query data in >> the database to find the relevant document to update using >> .LuceneQuery<>(). This works well for a majority of the updates in this >> phase but a few iterations will fail to find the relevant document to >> update.
>> For every update iteration I modify fields that are part of the index I'm >> querying and as such I call .WaitForNonStaleResults() on the subsequent >> query after every update to (hopefully) make sure the index is in good >> shape. The index is defined as an AbstractIndexCreationTask<Data, >> Data.Result> (Map/Reduce) but there is no defined reduce function but >> rather I'm using it to query the data using lambda syntax on computed >> index-fields and then selecting the actual document type for the result: >> session.Advanced.LuceneQuery<Datas_Index.Result, Datas_Index>()
>> .WhereEqual(x => ..., ...)
>> .SelectFields<Data>()
>> .ToList()
>> After I rerun the same update code on the data that failed to find the >> relevant document to update - like magic - it's now returned and updates >> are successful. Any idea what I'm doing wrong here?
>> On Thursday, August 16, 2012 6:35:31 PM UTC+2, Oren Eini wrote:
>>> Okay, that is what I meant.
>>> I just fixed a bug that would likely cause non system database to shut >>> down after inactivity, even if they are doing indexing, if you are doing a >>> LOT of indexing.
>>> That would probably match what you are seeing.
>>> Can you wait for the next build and test that?
>>> On Thu, Aug 16, 2012 at 7:32 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>> I'm not sure I understand "child database". It's a separate database >>>> from the system database and I've created it in the manager by clicking New >>>> database.
>>>> On Thursday, August 16, 2012 6:28:30 PM UTC+2, Oren Eini wrote:
>>>>> Tobias,
>>>>> Is it a child database?
>>>>> On Thu, Aug 16, 2012 at 7:24 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>> I'm currently running it as a console application >>>>>> (i.e. haven't installed it as a service but not running embedded). The site >>>>>> is running in IIS. Going to try that setting to see if it makes any >>>>>> difference.
>>>>>> Another thing I've noticed over the past day is that if I leave >>>>>> indexing on while doing the batch import (takes aprox. 40 min) around >>>>>> 2,200,000 / 2,300,000 documents will have already been processed in the >>>>>> index when it's done. If I repeat the same batch import with indexing >>>>>> turned off, turn it back on, and wait for indexing to complete - the same >>>>>> index will take 3-4 hours to build with a very noticeable decrease in >>>>>> performance as the index gets bigger.
>>>>>> Another issue with the indexing seems to be that after you stop >>>>>> actively adding documents to the index the indexing will slow down and >>>>>> those last 100,000 documents may take 15-20 minutes to complete and more >>>>>> often than not the index will remain stale even though all documents have >>>>>> been processed (to the point that it does not return any results when >>>>>> querying). I've noticed that if I shutdown the ravendb server when this >>>>>> happens and then start it back up the index will no longer be stale.
>>>>>> On Thursday, August 16, 2012 6:02:33 PM UTC+2, Oren Eini wrote:
>>>>>>> How are you running this?
>>>>>>> In a service? In IIS?
>>>>>>> I assume that you are running in a child db mode, you need to set:
>>>>>>> Raven/Tenants/**MaxIdleTimeForTe****nantDatabase
>>>>>>> To a high value.
>>>>>>> If you are running in IIS, you need to set IIS idle settings to zero >>>>>>> and disable recycling.
>>>>>>> On Wed, Aug 15, 2012 at 5:01 PM, Tobias Sebring <tseb...@gmail.com>wrote:
>>>>>>>> Yes, the site will be inactive for most hours out of the day and >>>>>>>> will only see sporadic use that will not be concentrated in any specific >>>>>>>> period of time. I expect the resources will have been dropped almost every >>>>>>>> time a user accesses the site.
>>>>>>>> On Wednesday, August 15, 2012 3:55:09 PM UTC+2, Oren Eini wrote:
>>>>>>>>> Hm,
>>>>>>>>> Big enough that we will take several queries to load significant >>>>>>>>> portions of it to memory.
>>>>>>>>> In short, as long as the site is serving queries, you'll see good >>>>>>>>> perf.
>>>>>>>>> If you have long period of inactivity, we will drop all resources.
>>>>>>>>> Do you expect long periods of inactivity?
>>>>>>>>> On Wed, Aug 15, 2012 at 4:53 PM, Tobias Sebring <tseb...@gmail.com
>>>>>>>>> > wrote:
>>>>>>>>>> The index is 759mb.
>>>>>>>>>> Data file is 3,63 GB.
>>>>>>>>>> On Wednesday, August 15, 2012 2:49:08 PM UTC+2, Oren Eini wrote:
>>>>>>>>>>> YEד
>>>>>>>>>>> On Wed, Aug 15, 2012 at 3:45 PM, Tobias Sebring <
>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>> I will as soon as I can later today. Right now I don't have >>>>>>>>>>>> access to the database computer and rebuilding the index takes ~4 hours.
>>>>>>>>>>>> We are talking about the physical size of the index on disk >>>>>>>>>>>> right?
>>>>>>>>>>>> On Wednesday, August 15, 2012 2:15:13 PM UTC+2, Oren Eini >>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Tobias,
>>>>>>>>>>>>> First, let us check some things.
>>>>>>>>>>>>> Can you check the actual size of the index?
>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 3:05 PM, Tobias Sebring <
>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>> What are my options for doing so and/or to make sure it stays >>>>>>>>>>>>>> in memory? The database is around 4-5GB and I don't expect it to become >>>>>>>>>>>>>> much larger than that. If I knew of a way to do it I could potentially >>>>>>>>>>>>>> pre-load the entire database into memory but it would still need to be >>>>>>>>>>>>>> persisted to disk.
>>>>>>>>>>>>>> On Wednesday, August 15, 2012 1:59:50 PM UTC+2, Oren Eini >>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Strange.
>>>>>>>>>>>>>>> At any rate, you need to warm up the index, make sure that a >>>>>>>>>>>>>>> lot of it is in memory before you can really make perf comparisons. That is >>>>>>>>>>>>>>> especially true with large indexes.
>>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 2:59 PM, Tobias Sebring <
>>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>>> 25-50 chars on average but in rare cases upwards of 100 >>>>>>>>>>>>>>>> chars.
>>>>>>>>>>>>>>>> On Wednesday, August 15, 2012 1:54:22 PM UTC+2, Oren Eini >>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> Tobias,
>>>>>>>>>>>>>>>>> What is the size of the text field?
>>>>>>>>>>>>>>>>> We do a lot of complex caching there, and depending on the >>>>>>>>>>>>>>>>> actual index size (can you check), loading through it might take a while.
>>>>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 2:45 PM, Tobias Sebring <
>>>>>>>>>>>>>>>>> tseb...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Is 50 seconds reasonable for a cold start with the size >>>>>>>>>>>>>>>>>> of the data set I'm dealing with? If I remove the OrderBy on the text field >>>>>>>>>>>>>>>>>> the query will sort on Year, ProjectId and CaseId and return within 16 >>>>>>>>>>>>>>>>>> seconds which also seems slow for loading 2601 documents >>>>>>>>>>>>>>>>>> and doing an in-memory sort. A subsequent and modified query (e.g. adding >>>>>>>>>>>>>>>>>> another field to the query) will complete pretty fast. After a delay of >>>>>>>>>>>>>>>>>> inactivity the query will once again take 50 seconds to complete.
>>>>>>>>>>>>>>>>>> Considering these factors I'm guessing the sorting is >>>>>>>>>>>>>>>>>> done on the full document collection and then cached in memory? If so is >>>>>>>>>>>>>>>>>> there a way to increase the lifetime of this cache and/or possibly not sort >>>>>>>>>>>>>>>>>> on the full collection but on the documents returned from the query?
>>>>>>>>>>>>>>>>>> On Wednesday, August 15, 2012 1:19:35 PM UTC+2, Oren >>>>>>>>>>>>>>>>>> Eini wrote:
>>>>>>>>>>>>>>>>>>> Well, we do have to do extra work on your case, loading >>>>>>>>>>>>>>>>>>> a lot more info to memory to be able to sort on this.
>>>>>>>>>>>>>>>>>>> If this happens on a cold start, it may take a while.
>>>>>>>>>>>>>>>>>>> Can you try a different query (so it won't be cached), >>>>>>>>>>>>>>>>>>> and see what happens over time?
>>>>>>>>>>>>>>>>>>> On Wed, Aug 15, 2012 at 1:25 PM, Tobias Sebring <
>>>>>>>>>>>>>>>>>>> tseb...@gmail.com> wrote: