I'm using RavenDb to do bulk inserts from a large datadump, similar to the process outlined by Ayende here: http://ayende.com/blog/4474/etl-process-using-raven. My problem is that, unlike the Stack Overflow datadump, where Ayende uses the userId for document IDs, the datadump I'm working with uses complex string IDs that I would prefer not to use as my document IDs. With indexing turned off and auto-generated document IDs, I do not know how to load the inserted documents for patching.
Example process:

foreach (var data in datas)
{
    session.Store(new Data { Id = "data/1", DatadumpKey = "/data/2012/07/27/js9am2ms8la91" });
}
session.SaveChanges();

foreach (var part in parts)
{
    // This does not work since there is no index, and with indexes enabled it will always be stale.
    var data = session.Query<Data>().SingleOrDefault(p => p.DatadumpKey == part.DatadumpKey);
    data.Parts.Add(new Part { ... });
}
session.SaveChanges();
1. What's the recommended solution to the issue explained above?
2. Is it possible to disable indexing from the client API rather than through HTTP?
3. What's the best way to export the new database and import it (overwrite) onto a production server in order to keep the downtime as low as possible?
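For question 1, one option (a sketch of a general technique, not necessarily what was recommended in this thread) is to make the document ID computable from the datadump key, so the patching pass can load documents by ID instead of querying. Loading by ID goes straight to the document store and does not depend on indexes. The helper below is hypothetical: `DeterministicIds.ForKey` hashes the key with SHA-1 (used only as a stable fingerprint) and prepends a prefix such as `data/`.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not a RavenDB API: derives a short, stable document ID
// from an arbitrary datadump key, so a later pass can compute the same ID again.
public static class DeterministicIds
{
    public static string ForKey(string prefix, string datadumpKey)
    {
        if (datadumpKey == null) throw new ArgumentNullException(nameof(datadumpKey));
        prefix = prefix ?? string.Empty;

        using (var sha1 = SHA1.Create())
        {
            // SHA-1 is used only as a fingerprint here, not for security.
            byte[] hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(datadumpKey));
            var sb = new StringBuilder(prefix, prefix.Length + hash.Length * 2);
            foreach (byte b in hash)
            {
                sb.Append(b.ToString("x2")); // two lowercase hex digits per byte
            }
            return sb.ToString();
        }
    }
}
```

In the first pass you would store `new Data { Id = DeterministicIds.ForKey("data/", key), DatadumpKey = key }`, and in the second pass call `session.Load<Data>(DeterministicIds.ForKey("data/", part.DatadumpKey))`. If you would rather keep IDs like `data/1`, the derived ID can instead name a small mapping document, similar in spirit to the `IdMapping` documents later in this thread.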
On Fri, Jul 27, 2012 at 7:16 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
> I'm using RavenDb to do bulk inserts from a large datadump similar to the
> process outlined by Ayende here: http://ayende.com/blog/4474/etl-process-using-raven.
> My problem is that unlike the Stackoverflow datadump where Ayende utilizes
> the userId for document IDs, the datadump I'm working with is using complex
> string IDs that I would prefer to not use for my document IDs. With
> indexing turned off and auto generated document IDs - I do not know how to
> load the inserted documents for patching.
> Example process:
> foreach data in datas
> session.Store(new Data { Id = "data/1", DatadumpKey =
> "/data/2012/07/27/js9am2ms8la91" })
> session.SaveChanges();
> foreach part in parts
> var data = session.Query<Data>().SingleOrDefault(p => p.DatadumpKey ==
> part.DatadumpKey); //this does not work since there is no index, and with
> indexes enabled it will always be stale.
> data.Parts.Add(new Part { ... }));
> session.SaveChanges();
> 1. What's the recommended solution to the issue explained above?
> 2. Is it possible to disable indexing from the client API rather than
> through HTTP?
> 3. What's the best way to export the new database and import it
> (overwrite) onto a production server in order to keep the downtime as low
> as possible?
For 1) - I'm having trouble implementing this without a tenfold degradation in performance. Any idea why? I'm using the client API to import batches of 1024 documents with the following code:

using (var session = Store.OpenSession())
{
    foreach (var i in data.Batch)
    {
        session.Store(i);
    }
    session.SaveChanges();
    foreach (var i in data.Batch)
    {
        try
        {
            session.Store(new IdMapping { Id = "IdMappings/" + i.LongId, LongId = i.LongId });
On Fri, Jul 27, 2012 at 11:09 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
> For 1) - I'm having trouble implementing this without a tenfold
> degradation to performance. Any idea why?
> I'm using the client API to import batches of 1024 documents with the
> following code:
> using (var session = Store.OpenSession())
> {
> foreach (var i in data.Batch)
> {
> session.Store(i);
> }
> session.SaveChanges();
The session does not throw an exception. The try-catch is there to catch attempts to insert a duplicate key, because the dataset I'm working with is sometimes inconsistent. Are you saying the try-catch could be the cause of the performance degradation? Only two or so NonUniqueObjectExceptions are thrown in the entire 2M-document dataset.
On Friday, July 27, 2012 10:17:05 PM UTC+2, Oren Eini wrote:
> If the session throws an exception, you may no longer use the session.
I don't _know_ what the issue is, but an exception from the session renders
its state undefined.
Do this without try/catch.
Then see how many queries you make to the service.
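Following the "do this without try/catch" advice, duplicates can be filtered out before `session.Store` is ever called, so the session never throws `NonUniqueObjectException` and stays usable. A minimal sketch, assuming the key to check is available on each item (`keyOf` is a hypothetical selector, e.g. `i => "IdMappings/" + i.LongId`). It compares keys case-insensitively on the assumption that the document IDs are case-insensitive, and it only covers duplicates within one batch, i.e. one session; duplicates across batches need state that outlives the batch.

```csharp
using System;
using System.Collections.Generic;

public static class BatchDedup
{
    // Returns the items to store, keeping only the first item for each key.
    // Repeated keys go into skippedKeys so they can be logged instead of
    // surfacing as an exception from the session.
    public static List<T> FirstPerKey<T>(
        IEnumerable<T> items,
        Func<T, string> keyOf,
        ICollection<string> skippedKeys)
    {
        // Assumption: IDs compare case-insensitively, so "A" and "a" collide.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var toStore = new List<T>();
        foreach (T item in items)
        {
            string key = keyOf(item);
            if (seen.Add(key))
                toStore.Add(item);     // first time this key appears in the batch
            else
                skippedKeys.Add(key);  // duplicate within the batch: skip it
        }
        return toStore;
    }
}
```

The store loop then iterates over the returned list with no try/catch, and the skipped keys can be written to a log.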
On Fri, Jul 27, 2012 at 11:40 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
> The session does not throw an exception. The try-catch is there to catch
> when I try to insert a duplicate key because sometimes the dataset I'm
> working with is not consistent. Are you saying the try-catch could be the
> cause of the performance degradation? Only two or so
> NonUniqueObjectException are thrown in the entire 2M document dataset.
Okay, got it. It's working now, but I have to use a ConcurrentDictionary to make sure no duplicate keys are saved to RavenDb. Sadly, this means keeping 2,000,000 strings in memory throughout the entire import process, which brings the server to its knees. Speed is about 200k documents per minute.
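Oren's suggestion later in the thread is to use a Bloom filter instead of the ConcurrentDictionary. A Bloom filter answers "definitely not seen" or "probably seen" from a fixed-size bit array. With the standard sizing m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions, 2,000,000 keys at a 1% false-positive rate need about 19.2 million bits (roughly 2.4 MB) and 7 hash functions, instead of 2,000,000 strings. The class below is a minimal sketch, not a library API. It uses double hashing over two FNV-1a variants (the second with a non-standard offset basis) and takes a lock, since the import runs in parallel.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;

// Minimal Bloom filter sketch: false positives are possible, false negatives are not.
public sealed class BloomFilter
{
    private readonly BitArray _bits;
    private readonly int _hashCount;
    private readonly object _sync = new object();

    public BloomFilter(int expectedItems, double falsePositiveRate)
    {
        if (expectedItems <= 0) throw new ArgumentOutOfRangeException(nameof(expectedItems));
        if (falsePositiveRate <= 0 || falsePositiveRate >= 1)
            throw new ArgumentOutOfRangeException(nameof(falsePositiveRate));

        // Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        double ln2 = Math.Log(2);
        double m = -expectedItems * Math.Log(falsePositiveRate) / (ln2 * ln2);
        _bits = new BitArray((int)Math.Ceiling(m));
        _hashCount = Math.Max(1, (int)Math.Round(m / expectedItems * ln2));
    }

    public int SizeInBits => _bits.Length;
    public int HashCount => _hashCount;

    // Sets the key's bits; returns true if they were all set already (probably a duplicate).
    public bool Add(string key)
    {
        lock (_sync)
        {
            bool alreadyPresent = true;
            foreach (int index in Indexes(key))
            {
                if (!_bits[index])
                {
                    alreadyPresent = false;
                    _bits[index] = true;
                }
            }
            return alreadyPresent;
        }
    }

    public bool MightContain(string key)
    {
        lock (_sync)
        {
            foreach (int index in Indexes(key))
            {
                if (!_bits[index]) return false; // definitely never added
            }
            return true; // probably added
        }
    }

    // Double hashing: index_i = (h1 + i * h2) mod m.
    private IEnumerable<int> Indexes(string key)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(key);
        uint h1 = Fnv1a(bytes, 2166136261u);       // standard FNV-1a offset basis
        uint h2 = Fnv1a(bytes, 0x9747b28cu) | 1u;  // second variant, forced odd so never zero
        for (int i = 0; i < _hashCount; i++)
        {
            yield return (int)((h1 + (ulong)i * h2) % (ulong)_bits.Length);
        }
    }

    private static uint Fnv1a(byte[] data, uint offsetBasis)
    {
        uint hash = offsetBasis;
        foreach (byte b in data)
        {
            hash ^= b;
            hash = unchecked(hash * 16777619u); // FNV 32-bit prime
        }
        return hash;
    }
}
```

Because false positives are possible, a "probably seen" answer should not silently drop a record: at a 1% rate, up to about 1 in 100 genuinely new keys could hit it. One approach is to treat `Add(key) == true` as "check further" (for example, keep those few keys in an exact set or load the document by ID) and skip the record only when that check confirms a duplicate.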
On Saturday, July 28, 2012 12:02:45 AM UTC+2, Oren Eini wrote:
> I don't _know_ what the issue is, but an exception from the session render
> its state undefined.
> Do this without try/catch.
> Then see how many queries you make to the service.
On Saturday, July 28, 2012, Tobias Sebring wrote:
> Okay. Got it. It's working now but I have to utilize a
> ConcurrentDictionary to make sure no duplicate keys are attempted to be
> saved to RavenDb. This sadly means saving 2,000,000 strings in memory
> throughout the entire import process which brings the server to it's knees.
> Speed is about 200k documents per minute.
> Thank you for your help on this issue!
Got that fixed. Now I'm having trouble limiting the memory footprint of RavenDb. Memory consumption gradually rises to 98% of physical RAM, at which point Windows 7 starts displaying warnings to close the program and other applications crash randomly.

I've tried the following things to limit memory use, in accordance with other threads in this group:

I've turned off indexing:

using (var webClient = new WebClient())
{
    webClient.UseDefaultCredentials = true;
    var result = webClient.UploadString(new Uri(new Uri("http://localhost:8080"), "/admin/stopindexing"), "POST", "");
}

Modified the cache configuration settings (tried different values, same result):

<appSettings>
  <add key="Raven/MemoryCacheLimitPercentage" value="50" />
  <add key="Raven/MemoryCacheLimitCheckInterval" value="00:00:15" />
  <add key="Raven/MemoryCacheExpiration" value="60" />
</appSettings>

And disabled all caching:

using (Store.DatabaseCommands.DisableAllCaching())
{
    ... batch store / savechanges
}

None of these seem to have any effect on the application's memory usage when RavenDb is running in embedded mode. Commenting out the few lines of RavenDb code that handle batch imports results in a maximum of 125 MB memory usage on a system with 16 GB of physical RAM.
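To narrow down where the memory goes, it may help to log both the managed heap and the process working set after each batch. The hypothetical helper below uses `GC.GetTotalMemory` and `Process.WorkingSet64`. If the working set keeps climbing while the managed heap stays flat, the growth is probably outside the .NET heap (for example, in the embedded storage engine); a growing managed heap instead points at objects that your own code or the client is holding on to.

```csharp
using System;
using System.Diagnostics;

// Hypothetical diagnostic helper: one snapshot of managed heap and working set.
public sealed class MemorySnapshot
{
    public long ManagedBytes { get; }
    public long WorkingSetBytes { get; }

    private MemorySnapshot(long managedBytes, long workingSetBytes)
    {
        ManagedBytes = managedBytes;
        WorkingSetBytes = workingSetBytes;
    }

    public static MemorySnapshot Capture()
    {
        // A fresh Process object each time, because Process caches values after the first read.
        using (var process = Process.GetCurrentProcess())
        {
            // false: do not force a collection, just report the current heap size.
            return new MemorySnapshot(GC.GetTotalMemory(false), process.WorkingSet64);
        }
    }

    public override string ToString()
    {
        const long mb = 1024 * 1024;
        return $"managed heap {ManagedBytes / mb} MB, working set {WorkingSetBytes / mb} MB";
    }
}
```

For example: `Console.WriteLine(@"Batch imported {0} in {1} ms ({2})", data.Index, st.ElapsedMilliseconds, MemorySnapshot.Capture());`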
On Saturday, July 28, 2012 8:01:06 AM UTC+2, Oren Eini wrote:
> Use a bloom filter instead
On Sun, Jul 29, 2012 at 4:44 AM, Tobias Sebring <tsebr...@gmail.com> wrote:
> Got that fixed. Now I'm having trouble limiting the memory footprint of
> RavenDb. The memory consumption will gradually rise to 98% of physical ram
> at which point Windows 7 will start display warnings to close the program
> down and other applications will crash randomly.
> I've tried the following things to limit memory utilization in accordance
> with other threads in this group:
> I've turned off indexing:
> using (var webClient = new WebClient())
> {
> webClient.UseDefaultCredentials = true;
> var result = webClient.UploadString(new Uri(new Uri("http://localhost:8080"),
> "/admin/stopindexing"), "POST", "");
> }
> And disabled all caching:
> using (Store.DatabaseCommands.DisableAllCaching())
> {
> ... batch store / savechanges
> }
> Non of these seem to have any effect on memory usage of the application
> with RavenDb is running in embedded mode. Commenting out the few lines of
> RavenDb code that handles batch imports results in a maximum 125mb memory
> usage on system with 16GB physical ram.
Code with commented-out lines:

var bc = new BlockingCollection<IndexedBatch<TData>>();
var importTask = Task.Run(() =>
{
    bc.GetConsumingEnumerable()
        .AsParallel()
        .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
        .WithMergeOptions(ParallelMergeOptions.NotBuffered)
        .ForAll(data =>
        {
            var st = Stopwatch.StartNew();
            //using (var session = Store.OpenSession())
            //{
            foreach (var i in data.Batch)
            {
                //session.Store(i);
            }
            //session.SaveChanges();
            //}
            Console.WriteLine(@"Batch imported {0} in {1} ms", data.Index, st.ElapsedMilliseconds);
        });
});
Build is from NuGet a few days ago:

<package id="RavenDB.Client" version="1.2.2044-Unstable" />
<package id="RavenDB.Database" version="1.2.2044-Unstable" />
<package id="RavenDB.Embedded" version="1.2.2044-Unstable" />

Batch size is 1024, based on a recommendation I picked up here in the group. I'm running multiple import jobs concurrently, but I also tried limiting that with .WithDegreeOfParallelism(1) and got the same result.
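One thing that may be worth ruling out (an assumption, not a diagnosis): `new BlockingCollection<IndexedBatch<TData>>()` with no capacity is unbounded, so if the producer reads the dump faster than RavenDb stores it, unimported batches pile up in memory. With the RavenDb lines commented out the consumer is fast, which would also be consistent with the 125 MB figure. The sketch below bounds the queue so `Add` blocks once `capacity` batches are waiting. It uses a few plain `Task` workers instead of PLINQ, partly because PLINQ may pull items from `GetConsumingEnumerable` in buffered chunks. `IndexedBatch<T>` here is a hypothetical stand-in for the type in the code above, and `importBatch` stands in for the OpenSession/Store/SaveChanges block.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the IndexedBatch<TData> type used above.
public sealed class IndexedBatch<T>
{
    public int Index { get; set; }
    public List<T> Batch { get; set; }
}

public static class BoundedImport
{
    // Returns the number of batches imported.
    public static int Run<T>(
        IEnumerable<IndexedBatch<T>> source,
        int capacity,
        int consumers,
        Action<IndexedBatch<T>> importBatch)
    {
        int imported = 0;
        using (var failed = new CancellationTokenSource())
        using (var queue = new BlockingCollection<IndexedBatch<T>>(capacity))
        {
            Task[] workers = Enumerable.Range(0, consumers)
                .Select(_ => Task.Run(() =>
                {
                    try
                    {
                        foreach (var batch in queue.GetConsumingEnumerable())
                        {
                            importBatch(batch);
                            Interlocked.Increment(ref imported);
                        }
                    }
                    catch
                    {
                        failed.Cancel(); // unblock the producer if a worker dies
                        throw;
                    }
                }))
                .ToArray();

            try
            {
                foreach (var batch in source)
                {
                    queue.Add(batch, failed.Token); // blocks while the queue is full
                }
            }
            catch (OperationCanceledException)
            {
                // A worker failed; Task.WaitAll below rethrows its exception.
            }
            finally
            {
                queue.CompleteAdding(); // lets the workers finish draining and exit
            }

            Task.WaitAll(workers);
        }
        return imported;
    }
}
```

With this shape, at most `capacity` batches (plus the ones currently being imported) are held in memory at any time, regardless of how fast the dump is read.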
On Sunday, July 29, 2012 7:41:11 AM UTC+2, Oren Eini wrote:
> What lines did you comment?
> What build are you using?
> How many items are you using per SaveChanges call?
> On Sun, Jul 29, 2012 at 4:44 AM, Tobias Sebring <tsebr...@gmail.com>wrote:
>> Got that fixed. Now I'm having trouble limiting the memory footprint of >> RavenDb. The memory consumption will gradually rise to 98% of physical ram >> at which point Windows 7 will start display warnings to close the program >> down and other applications will crash randomly.
>> I've tried the following things to limit memory utilization in accordance >> with other threads in this group:
>> I've turned off indexing: >> using (var webClient = new WebClient()) >> { >> webClient.UseDefaultCredentials = true; >> var result = webClient.UploadString(new Uri(new Uri("http://localhost: >> 8080"), "/admin/stopindexing"), "POST", ""); >> }
>> And disabled all caching: >> using (Store.DatabaseCommands.DisableAllCaching()) >> { >> ... batch store / savechanges >> }
>> Non of these seem to have any effect on memory usage of the application >> with RavenDb is running in embedded mode. Commenting out the few lines of >> RavenDb code that handles batch imports results in a maximum 125mb memory >> usage on system with 16GB physical ram.
>> On Saturday, July 28, 2012 8:01:06 AM UTC+2, Oren Eini wrote:
>>> Use a bloom filter instead
>>> On Saturday, July 28, 2012, Tobias Sebring wrote:
>>>> Okay. Got it. It's working now but I have to utilize a >>>> ConcurrentDictionary to make sure no duplicate keys are attempted to be >>>> saved to RavenDb. This sadly means saving 2,000,000 strings in memory >>>> throughout the entire import process which brings the server to it's knees. >>>> Speed is about 200k documents per minute.
>>>> Thank you for your help on this issue!
>>>> On Saturday, July 28, 2012 12:02:45 AM UTC+2, Oren Eini wrote:
>>>> I don't _know_ what the issue is, but an exception from the session >>>> render its state undefined. >>>> Do this without try/catch. >>>> Then see how many queries you make to the service.
>>>> On Fri, Jul 27, 2012 at 11:40 PM, Tobias Sebring <tsebr...@gmail.com>wrote:
>>>> The session does not throw an exception. The try-catch is there to >>>> catch when I try to insert a duplicate key because sometimes the dataset >>>> I'm working with is not consistent. Are you saying the try-catch could be >>>> the cause of the performance degradation? Only two or so >>>> NonUniqueObjectException are thrown in the entire 2M document dataset.
>>>> On Friday, July 27, 2012 10:17:05 PM UTC+2, Oren Eini wrote:
>>>> If the session throws an exception, you may no longer use the session.
>>>> On Fri, Jul 27, 2012 at 11:09 PM, Tobias Sebring <tsebr...@gmail.com>wrote:
>>>> For 1) - I'm having trouble implementing this without a tenfold >>>> degradation to performance. Any idea why? >>>> I'm using the client API to import batches of 1024 documents with the >>>> following code: >>>> using (var session = Store.OpenSession()) >>>> { >>>> foreach (var i in data.Batch) >>>> { >>>> session.Store(i); >>>> } >>>> session.SaveChanges();
> Build is from NuGet a few days ago:
> <package id="RavenDB.Client" version="1.2.2044-Unstable" />
> <package id="RavenDB.Database" version="1.2.2044-Unstable" />
> <package id="RavenDB.Embedded" version="1.2.2044-Unstable" />
> Batch size is 1024 from recommendation I picked up here in the group. I'm
> running multiple import jobs concurrently but also tried limiting that
> with .WithDegreeOfParallelism(1) and got the same result.
> On Sunday, July 29, 2012 7:41:11 AM UTC+2, Oren Eini wrote:
>> What lines did you comment?
>> What build are you using?
>> How many items are you using per SaveChanges call?
>> On Sun, Jul 29, 2012 at 4:44 AM, Tobias Sebring <tsebr...@gmail.com>wrote:
>>> Got that fixed. Now I'm having trouble limiting the memory footprint of
>>> RavenDb. The memory consumption will gradually rise to 98% of physical ram
>>> at which point Windows 7 will start display warnings to close the program
>>> down and other applications will crash randomly.
>>> I've tried the following things to limit memory utilization in
>>> accordance with other threads in this group:
>>> I've turned off indexing:
>>> using (var webClient = new WebClient())
>>> {
>>> webClient.**UseDefaultCredentials = true;
>>> var result = webClient.UploadString(new Uri(new Uri("http://localhost: >>> 8080"), "/admin/stopindexing"), "POST", "");
>>> }
>>> And disabled all caching:
>>> using (Store.DatabaseCommands.**DisableAllCaching())
>>> {
>>> ... batch store / savechanges
>>> }
>>> Non of these seem to have any effect on memory usage of the application
>>> with RavenDb is running in embedded mode. Commenting out the few lines of
>>> RavenDb code that handles batch imports results in a maximum 125mb memory
>>> usage on system with 16GB physical ram.
>>> On Saturday, July 28, 2012 8:01:06 AM UTC+2, Oren Eini wrote:
>>>> Use a bloom filter instead
>>>> On Saturday, July 28, 2012, Tobias Sebring wrote:
>>>>> Okay. Got it. It's working now but I have to utilize a
>>>>> ConcurrentDictionary to make sure no duplicate keys are attempted to be
>>>>> saved to RavenDb. This sadly means saving 2,000,000 strings in memory
>>>>> throughout the entire import process which brings the server to it's knees.
>>>>> Speed is about 200k documents per minute.
>>>>> Thank you for your help on this issue!
>>>>> On Saturday, July 28, 2012 12:02:45 AM UTC+2, Oren Eini wrote:
>>>>> I don't _know_ what the issue is, but an exception from the session
>>>>> render its state undefined.
>>>>> Do this without try/catch.
>>>>> Then see how many queries you make to the service.
>>>>> On Fri, Jul 27, 2012 at 11:40 PM, Tobias Sebring <tsebr...@gmail.com>wrote:
>>>>> The session does not throw an exception. The try-catch is there to
>>>>> catch when I try to insert a duplicate key because sometimes the dataset
>>>>> I'm working with is not consistent. Are you saying the try-catch could be
>>>>> the cause of the performance degradation? Only two or so
>>>>> NonUniqueObjectException are thrown in the entire 2M document dataset.
>>>>> On Friday, July 27, 2012 10:17:05 PM UTC+2, Oren Eini wrote:
>>>>> If the session throws an exception, you may no longer use the session.
>>>>> On Fri, Jul 27, 2012 at 11:09 PM, Tobias Sebring <tsebr...@gmail.com>wrote:
>>>>> For 1) - I'm having trouble implementing this without a tenfold
>>>>> degradation to performance. Any idea why?
>>>>> I'm using the client API to import batches of 1024 documents with the
>>>>> following code:
>>>>> using (var session = Store.OpenSession())
>>>>> {
>>>>> foreach (var i in data.Batch)
>>>>> {
>>>>> session.Store(i);
>>>>> }
>>>>> session.SaveChanges();
>> Build is from NuGet a few days ago: >> <package id="RavenDB.Client" version="1.2.2044-Unstable" /> >> <package id="RavenDB.Database" version="1.2.2044-Unstable" /> >> <package id="RavenDB.Embedded" version="1.2.2044-Unstable" />
>> Batch size is 1024 from recommendation I picked up here in the group. I'm >> running multiple import jobs concurrently but also tried limiting that >> with .WithDegreeOfParallelism(1) and got the same result.
>> On Sunday, July 29, 2012 7:41:11 AM UTC+2, Oren Eini wrote:
>>> Build is from NuGet a few days ago:
>>> <package id="RavenDB.Client" version="1.2.2044-Unstable" />
>>> <package id="RavenDB.Database" version="1.2.2044-Unstable" />
>>> <package id="RavenDB.Embedded" version="1.2.2044-Unstable" />
>>> Batch size is 1024, based on a recommendation I picked up here in the group.
>>> I'm running multiple import jobs concurrently, but I also tried limiting that
>>> with .WithDegreeOfParallelism(1) and got the same result.
>>> On Sunday, July 29, 2012 7:41:11 AM UTC+2, Oren Eini wrote:
>>>> What lines did you comment?
>>>> What build are you using?
>>>> How many items are you using per SaveChanges call?
>>>> On Sun, Jul 29, 2012 at 4:44 AM, Tobias Sebring <tsebr...@gmail.com>wrote:
>>>>> Got that fixed. Now I'm having trouble limiting the memory footprint
>>>>> of RavenDb. Memory consumption gradually rises to 98% of physical
>>>>> RAM, at which point Windows 7 starts displaying warnings to close the
>>>>> program and other applications crash randomly.
>>>>> I've tried the following things to limit memory utilization in
>>>>> accordance with other threads in this group:
>>>>> I've turned off indexing:
>>>>> using (var webClient = new WebClient())
>>>>> {
>>>>>     webClient.UseDefaultCredentials = true;
>>>>>     var result = webClient.UploadString(new Uri(new Uri("http://localhost:8080"), "/admin/stopindexing"), "POST", "");
>>>>> }
>>>>> And disabled all caching:
>>>>> using (Store.DatabaseCommands.DisableAllCaching())
>>>>> {
>>>>>     // ... batch Store / SaveChanges
>>>>> }
>>>>> None of these seem to have any effect on the application's memory
>>>>> usage when RavenDb is running in embedded mode. Commenting out the
>>>>> few lines of RavenDb code that handle the batch imports results in a maximum
>>>>> memory usage of 125 MB on a system with 16 GB of physical RAM.
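To narrow down where memory growth like the one described above comes from, it can help to sample memory after every batch. Watching only the total hides when the growth happens. The C# sketch below is a generic harness and is not part of RavenDB. `MeasurePerBatch` and the leaky stand-in batch are hypothetical. It records the process working set next to the managed heap, on the assumption that memory held by an embedded database's storage engine may not be visible to `GC.GetTotalMemory`.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: runs importBatch once per batch and samples memory after each call,
// so steady growth shows up batch by batch rather than only as a final total.
static List<(int Batch, long ManagedBytes, long WorkingSetBytes)> MeasurePerBatch(int batchCount, Action<int> importBatch)
{
    var results = new List<(int Batch, long ManagedBytes, long WorkingSetBytes)>();
    using var process = Process.GetCurrentProcess();
    for (int batch = 0; batch < batchCount; batch++)
    {
        importBatch(batch);
        process.Refresh(); // Process caches WorkingSet64 until Refresh() is called
        // Forcing a full collection first makes ManagedBytes approximate live managed objects.
        results.Add((batch, GC.GetTotalMemory(forceFullCollection: true), process.WorkingSet64));
    }
    return results;
}

// Stand-in for one batch of RavenDB work that deliberately keeps 1 MB alive per batch,
// so the growth is visible in the samples.
var retained = new List<byte[]>();
var samples = MeasurePerBatch(5, _ => retained.Add(new byte[1_000_000]));

foreach (var s in samples)
    Console.WriteLine($"batch {s.Batch}: managed {s.ManagedBytes / 1024} KB, working set {s.WorkingSetBytes / (1024 * 1024)} MB");
```

The working set might keep climbing while the managed figure stays flat. If so, that suggests native allocations rather than objects your own code keeps alive. Comparing a run with the RavenDB calls against one with them commented out isolates the database side, as in the message above.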
>>>>> On Saturday, July 28, 2012 8:01:06 AM UTC+2, Oren Eini wrote:
>>>>>> Use a bloom filter instead
>>>>>> On Saturday, July 28, 2012, Tobias Sebring wrote:
>>>>>>> Okay. Got it. It's working now, but I have to use a
>>>>>>> ConcurrentDictionary to make sure I never try to save a duplicate key
>>>>>>> to RavenDb. Sadly, this means keeping 2,000,000 strings in memory
>>>>>>> throughout the entire import process, which brings the server to its knees.
>>>>>>> Speed is about 200k documents per minute.
>>>>>>> Thank you for your help on this issue!
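Oren's bloom filter suggestion above trades exactness for memory. Instead of holding all ~2,000,000 key strings, the importer keeps a fixed-size bit array and accepts a small rate of false positives. The following is a minimal C# sketch under assumed parameters: 2,000,000 keys and a 1% false-positive rate. The FNV-1a hash, the double hashing, and the `CheckAndAdd` name are illustrative choices, not a RavenDB or .NET API. With those numbers, the standard sizing formulas give about 19.2 million bits (roughly 2.4 MB) and 7 hash functions.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Assumed sizing for this import: about 2,000,000 keys with a 1% false-positive rate.
const int ExpectedKeys = 2_000_000;
const double FalsePositiveRate = 0.01;

// Standard Bloom filter sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
int bitCount = (int)Math.Ceiling(-ExpectedKeys * Math.Log(FalsePositiveRate) / (Math.Log(2) * Math.Log(2)));
int hashCount = Math.Max(1, (int)Math.Round((double)bitCount / ExpectedKeys * Math.Log(2)));
var bits = new BitArray(bitCount);

// FNV-1a (64-bit) over the key's UTF-16 code units. Unlike string.GetHashCode(),
// which is randomized per process on .NET Core, this gives the same value on every run.
static ulong Fnv1a64(string key)
{
    unchecked
    {
        ulong hash = 14695981039346656037UL;
        foreach (char c in key)
        {
            hash ^= c;
            hash *= 1099511628211UL;
        }
        return hash;
    }
}

// Double hashing: derive k bit positions from one 64-bit hash as (h1 + i * h2) mod m.
static IEnumerable<int> BitPositions(string key, int k, int m)
{
    ulong hash = Fnv1a64(key);
    uint h1 = (uint)hash;
    uint h2 = (uint)(hash >> 32) | 1; // force h2 odd so it is never zero
    for (int i = 0; i < k; i++)
        yield return (int)((h1 + (ulong)i * h2) % (ulong)m);
}

// Returns true if the key was possibly seen before (false positives are possible),
// false if it definitely was not; in both cases the key is recorded afterwards.
bool CheckAndAdd(string key)
{
    bool possiblySeen = true;
    foreach (int position in BitPositions(key, hashCount, bitCount))
    {
        if (!bits[position])
        {
            possiblySeen = false;
            bits[position] = true;
        }
    }
    return possiblySeen;
}

Console.WriteLine($"{bitCount} bits (~{bitCount / 8 / 1024} KB), {hashCount} hash functions");
```

A `false` result is always correct, so those keys can be stored directly. A `true` result only means "possibly seen", so confirm it with an exact check before skipping the key as a duplicate. The sketch is not thread-safe, like the `BitArray` it wraps. That fits the sequential access described later in the thread.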
>>>>>>> On Saturday, July 28, 2012 12:02:45 AM UTC+2, Oren Eini wrote:
>>>>>>> I don't _know_ what the issue is, but an exception from the session
>>>>>>> renders its state undefined.
>>>>>>> Do this without try/catch.
>>>>>>> Then see how many queries you make to the service.
>>>>>>> On Fri, Jul 27, 2012 at 11:40 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
>>>>>>> The session does not throw an exception. The try-catch is there to
>>>>>>> catch attempts to insert a duplicate key, because the dataset
>>>>>>> I'm working with is sometimes inconsistent. Are you saying the try-catch could be
>>>>>>> the cause of the performance degradation? Only two or so
>>>>>>> NonUniqueObjectExceptions are thrown in the entire 2M-document dataset.
>>>>>>> On Friday, July 27, 2012 10:17:05 PM UTC+2, Oren Eini wrote:
>>>>>>> If the session throws an exception, you may no longer use the
>>>>>>> session.
>>>>>>> On Fri, Jul 27, 2012 at 11:09 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
>>>>>>> For 1), I'm having trouble implementing this without a tenfold
>>>>>>> performance degradation. Any idea why?
>>>>>>> I'm using the client API to import batches of 1024 documents with
>>>>>>> the following code:
>>>>>>> using (var session = Store.OpenSession())
>>>>>>> {
>>>>>>>     foreach (var i in data.Batch)
>>>>>>>     {
>>>>>>>         session.Store(i);
>>>>>>>     }
>>>>>>>     session.SaveChanges();
>>>>>>> }
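Oren points out above that a session cannot be used once it has thrown. Given that, one common shape for this loop is a short-lived session per batch. The C# sketch below covers only the batching side and runs without RavenDB. `Batch` is a hypothetical helper, not the `data.Batch` property from the quoted code. `SaveBatch` stands in for the `OpenSession` / `Store` / `SaveChanges` calls shown above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits a sequence into lists of at most `size` items.
static IEnumerable<List<T>> Batch<T>(IEnumerable<T> source, int size)
{
    if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
    var current = new List<T>(size);
    foreach (var item in source)
    {
        current.Add(item);
        if (current.Count == size)
        {
            yield return current;
            current = new List<T>(size);
        }
    }
    if (current.Count > 0)
        yield return current;
}

// Stand-in for the per-batch RavenDB work quoted above, roughly:
//   using (var session = Store.OpenSession()) { foreach (var doc in documents) session.Store(doc); session.SaveChanges(); }
// Each batch gets its own session, so a session that throws is abandoned instead of reused.
void SaveBatch(List<string> documents) => Console.WriteLine($"saving {documents.Count} documents");

// Hypothetical keys shaped like the datadump keys in the original question.
var keys = new List<string>();
for (int i = 0; i < 2500; i++)
    keys.Add($"/data/2012/07/27/key{i}");

var batchSizes = new List<int>();
foreach (var batch in Batch(keys, 1024))
{
    SaveBatch(batch);
    batchSizes.Add(batch.Count);
}
```

With 2,500 keys and a batch size of 1024, this yields batches of 1024, 1024 and 452 documents.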
>>>>>>> On Fri, Jul 27, 2012 at 7:16 PM, Tobias Sebring <tsebr...@gmail.com> wrote:
>>>>>>> I'm using RavenDb to do bulk inserts from a large datadump similar
>>>>>>> to the process outlined by Ayende here: http<http://ayende.com/blog/>
I made the threading in the original repro optional, controlled by a boolean at the top of main(), so it still shows the real code as it was before I made it sequential:
var runInParallel = false;
Note that the ConcurrentDictionary is only ever accessed sequentially. I left it in because it is one of the few things in the non-RavenDB code that allocates a big chunk of memory.
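The `runInParallel` switch described above could look roughly like this in C# with PLINQ. It also includes the `.WithDegreeOfParallelism(1)` experiment mentioned earlier in the thread. This is a hypothetical reconstruction rather than the code from the repro. `RunImport` stands in for one import job, and the eight job IDs are made up.

```csharp
using System;
using System.Linq;
using System.Threading;

bool runInParallel = false; // flip to true to run the import jobs concurrently
int maxDegree = runInParallel ? Environment.ProcessorCount : 1;

int processed = 0;

// Hypothetical stand-in for one import job (for example, one file of the datadump).
void RunImport(int jobId)
{
    // Real code would open a session and store a batch here.
    Interlocked.Increment(ref processed);
}

int[] jobIds = Enumerable.Range(1, 8).ToArray();
if (runInParallel)
    jobIds.AsParallel().WithDegreeOfParallelism(maxDegree).ForAll(RunImport);
else
    foreach (int id in jobIds)
        RunImport(id);

Console.WriteLine($"{processed} jobs processed (parallel: {runInParallel})");
```

With the flag off, every job runs one after another on the calling thread, which keeps a repro free of threading concerns. With it on, `WithDegreeOfParallelism` caps how many jobs PLINQ runs at once.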
I just noticed that Clean Solution doesn't delete the files under
obj/, hence the attached archive was quite large. This is the same
repro as DataImport2.zip, but it's 8 KB instead of 25 MB:
https://dl.dropbox.com/u/6420016/DataImport2-small.zip
On Jul 29, 4:57 pm, Tobias Sebring <tsebr...@gmail.com> wrote:
> On Sunday, July 29, 2012 3:52:27 PM UTC+2, Oren Eini wrote:
> > I can't follow the code, please create a repro without all the threading
> > complexity there.
> > On Sun, Jul 29, 2012 at 4:38 PM, Tobias Sebring <tsebr...@gmail.com>wrote:
> >>>> Build is from NuGet a few days ago:
> >>>> <package id="RavenDB.Client" version="1.2.2044-Unstable" />
> >>>> <package id="RavenDB.Database" version="1.2.2044-Unstable" />
> >>>> <package id="RavenDB.Embedded" version="1.2.2044-Unstable" />
> >>>> Batch size is 1024, based on a recommendation I picked up here in the group.
> >>>> I'm running multiple import jobs concurrently, but I also tried limiting that
> >>>> with .WithDegreeOfParallelism(1) and got the same result.
*snort*
The problem was that you called StopIndexing, which caused us to hold
stuff in memory until indexing resumed.
It is a bug that only shows up in this exact scenario (a large import
with indexing disabled); since that is a rare case, we didn't notice it.
Thanks for this. It's fixed now and will be out in a few minutes.
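Per the explanation above, memory was held until indexing resumed. An import that pauses indexing should therefore make sure it is always resumed, even when the import throws. Here is a hedged C# sketch of that pattern. The `/admin/startindexing` endpoint is an assumption, because the thread only shows `/admin/stopindexing`. `post` is a placeholder for the `WebClient.UploadString(uri, "POST", "")` call quoted earlier. Here it only records URLs, so the sketch runs without a server.

```csharp
using System;
using System.Collections.Generic;

// Builds an admin URL the same way as the WebClient call quoted earlier in the thread.
static Uri AdminUri(string serverUrl, string command) => new Uri(new Uri(serverUrl), "/admin/" + command);

// Pauses indexing, runs the import, and resumes indexing even if the import throws.
static void WithIndexingPaused(string serverUrl, Action<Uri> post, Action runImport)
{
    post(AdminUri(serverUrl, "stopindexing"));
    try
    {
        runImport();
    }
    finally
    {
        post(AdminUri(serverUrl, "startindexing")); // assumed counterpart of /admin/stopindexing
    }
}

// Usage with a fake `post` that records the calls instead of hitting a server.
var calls = new List<string>();
try
{
    WithIndexingPaused("http://localhost:8080", uri => calls.Add(uri.AbsolutePath),
        () => throw new InvalidOperationException("simulated import failure"));
}
catch (InvalidOperationException)
{
    // The import failure still propagates, but only after indexing was resumed.
}
Console.WriteLine(string.Join(", ", calls));
```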
On Sun, Jul 29, 2012 at 8:55 PM, Oren Eini (Ayende Rahien) <aye...@ayende.com> wrote:
> Thanks, reproduced and testing this now.
> On Sun, Jul 29, 2012 at 7:48 PM, Tobias Sebring <tsebr...@gmail.com>wrote:
>> I just noticed that Clean Solution doesn't delete the files under
>> obj/, hence the attached archive was quite large. This is the same
>> repro as DataImport2.zip, but it's 8 KB instead of 25 MB:
>> https://dl.dropbox.com/u/6420016/DataImport2-small.zip
On Sunday, July 29, 2012 7:58:44 PM UTC+2, Oren Eini wrote: