// Build 1,000,000 sample employees, then time a single bulk insert.
// `store` is an already-initialized DocumentStore.
var employees = new List<Employee>();
for (int i = 0; i < 1000 * 1000; i++)
{
    employees.Add(new Employee
    {
        FirstName = "FirstName #" + i,
        LastName = "LastName #" + i
    });
}

var swch = Stopwatch.StartNew();
using (BulkInsertOperation bulkInsert = store.BulkInsert())
{
    foreach (var emp in employees)
        bulkInsert.Store(emp);
}
Debug.WriteLine(swch.Elapsed.ToString(@"m\:ss\.ff"));
--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
// Same data, but batched into groups of 200 and inserted in parallel,
// with one BulkInsertOperation per batch.
var employees = new List<Employee>();
for (int i = 0; i < 1000 * 1000; i++)
{
    employees.Add(new Employee
    {
        FirstName = "FirstName #" + i,
        LastName = "LastName #" + i
    });
}

// Perform batching (Batch is MoreLINQ's extension method)
var batches = employees.Batch(200);
var taskList = new List<Task>();
foreach (var batch in batches)
{
    // Materialize each batch into a local list so the task captures a
    // fresh variable, not the loop variable ("access to modified closure").
    List<Employee> tempList = batch.ToList();
    Task insertBatchTask = Task.Run(() =>
    {
        using (BulkInsertOperation bulkInsert = store.BulkInsert())
        {
            foreach (var emp in tempList)
                bulkInsert.Store(emp);
        }
    });
    taskList.Add(insertBatchTask);
}

// Note: the tasks are already running by the time the stopwatch starts.
var swch = Stopwatch.StartNew();
Task.WaitAll(taskList.ToArray());
Debug.WriteLine(swch.Elapsed.ToString(@"m\:ss\.ff"));
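(For reference: the `Batch` call used above isn't part of the BCL - it comes from MoreLINQ. A minimal stand-in, sketched here for anyone who doesn't want the dependency; on .NET 6+ `Enumerable.Chunk` does the same job:)

```csharp
using System.Collections.Generic;

public static class BatchExtensions
{
    // Yields the source sequence in lists of at most `size` items;
    // the final list may be smaller.
    public static IEnumerable<List<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        var bucket = new List<T>(size);
        foreach (var item in source)
        {
            bucket.Add(item);
            if (bucket.Count == size)
            {
                yield return bucket;
                bucket = new List<T>(size);
            }
        }
        if (bucket.Count > 0)
            yield return bucket;
    }
}
```

Because each yielded list is a fresh object, materializing it with `ToList()` in the loop above is cheap and safe to hand to a task.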
Hibernating Rhinos Ltd
Oren Eini l CEO l Mobile: + 972-52-548-6969
Office: +972-4-622-7811 l Fax: +972-153-4-622-7811
Can you send us a repro of this?
On Sat, Oct 28, 2017 at 1:58 PM, Justin A <jus...@adler.com.au> wrote:
Initially, my code did that and assploded. So I refactored it (to the code above) and I'm still getting an exception (not sure if it's the same one), and ReSharper is not showing the 'access to modified closure' warning (with that code above). When I have it in a LINQ statement, then I get that. :(


The underlying issue is here: https://github.com/PureKrome/RavenDbBulkInsert/blob/master/src/RavenDbBulkInsert/Program.cs#L63
var tasks = batch.Select(employee => operation.StoreAsync(employee)).ToArray();
You aren't waiting for each StoreAsync to complete before starting the next one, which can cause issues.
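A sketch of the fix being described - awaiting each `StoreAsync` before issuing the next one, instead of firing them all off with `Select(...).ToArray()`. This assumes the RavenDB 4.x client API (`BulkInsertOperation.StoreAsync`), with `store` and `batch` as in the code above:

```csharp
// Await each StoreAsync before starting the next, so writes on this
// single BulkInsertOperation never overlap.
using (BulkInsertOperation operation = store.BulkInsert())
{
    foreach (var employee in batch)
        await operation.StoreAsync(employee);
}
```

The bulk-insert operation itself pipelines the writes to the server, so serializing the `StoreAsync` calls on the client does not forfeit throughput.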
Inserted 1,000,000 employees in 36.09 seconds.
The benchmark we usually use is wrk, script found here:
Try using multiple processes, not just threads; there are some per-process limits that might be hitting you.
I'll try to set up a standalone benchmark you can use this week.
so maybe some explanation about what/how TempFiles is and how that played a part in the bulk insert stuff?
Here is a blog post (will be up tomorrow) with the full details.
In short, I spent a few hours and managed to get to 100K writes/sec on AWS using an i3en.xlarge machine. The two issues are:
* Latency of I/O (vs. throughput)
* The client being able to generate the data fast enough.
On Tue, Jun 11, 2019 at 4:25 PM Andrej Krivulčík <kriv...@gmail.com> wrote:
I just tried running 4 processes with 4 threads each, on each of two servers (so 8 processes and 32 threads in total). I get very similar results of roughly 100k writes/s as with a single process with 32 threads.
Having a standalone benchmark would be great, thanks.
However, is there anything I can tweak to get as much performance as possible? Filesystem block size optimization, etc.? There are some system configuration recommendations here (https://ravendb.net/docs/article-page/4.2/Csharp/start/installation/system-configuration-recommendations), but these are for Linux only. Are there any documented best practices for deploying on Windows? https://ravendb.net/docs/article-page/4.2/Csharp/start/installation/deployment-considerations has some recommendations, like not running from HDDs, but a comprehensive list of best practices would be very useful (like the issues that pop up in notifications - running with swap on HDD even though an SSD is available, etc.).
On Tuesday, June 11, 2019 at 2:41:02 PM UTC+2, Oren Eini wrote:
> The benchmark we usually use is wrk, script found here:
> Try using multiple processes, not just threads, there are some per-process limits that might be hitting you.
> I'll try to setup a standalone benchmark you can use this week
Mobile: +972-52-548-6969 Sales: sa...@ravendb.net
Skype: ayenderahien Support: sup...@ravendb.net
DocumentStore.Conventions.BulkInsert.TrySerializeEntityToJsonStream
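For anyone finding this later: the convention named above lets the client bypass per-entity JSON conversion by writing pre-serialized JSON straight into the bulk-insert stream, which helps when the client can't generate data fast enough. A rough usage sketch from memory of the 4.x client - the delegate signature and names here are my assumption, so verify against the RavenDB client docs before relying on it:

```csharp
// Hypothetical sketch - check the docs for the exact delegate signature.
// Returning true signals "I serialized this entity myself"; returning
// false falls back to the default serializer.
store.Conventions.BulkInsert.TrySerializeEntityToJsonStream = (entity, metadata, writer) =>
{
    if (entity is Employee emp)
    {
        // Write hand-built JSON directly, skipping the general-purpose serializer.
        writer.Write("{\"FirstName\":\"" + emp.FirstName + "\",\"LastName\":\"" + emp.LastName + "\"}");
        return true;
    }
    return false;
};
```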
Oren, thanks for this blog post. Very informative and useful!
I went ahead and tried to break this 100k writes/s barrier I've been hitting in my benchmarks.
Good news is that I could break it and hit 200k writes/s. However, the journey was not very straightforward.
The best result so far was ~100k writes/s on Azure servers - standalone dedicated DB server with NVMe storage (200k write iops) and separate compute server which fed data to this DB server. When I added a second server, the write speed stayed the same and the clients were not fully utilized, so the client throughput was not the bottleneck anymore.
Okay, what could be the bottleneck here?
You mentioned in the post that on physical hardware, the limit is higher. Let's try this on bare metal.
My laptop has an NVMe SSD with around 160k IOPS random-write capacity. We connected my coworker's PC (6-core/12-thread CPU) over a gigabit network (100 Mbps was a limiting factor here), and ran the benchmark again. 100k writes/s again.
Okay, what could be the bottleneck here?
Screw persistent storage and let's go to RAM. I set up a ramdisk which I used as database storage on the Azure VM and ran the benchmarks again. (I needed to format a file on the ramdisk as RavenDB refused to create journal files directly on the ramdisk, if anyone would like to try this.) 100k writes/s again.
Okay, what could be the bottleneck here?
Taking a look at the IO Stats chart, we noticed an intriguing pattern: the journal writes were extremely fast and short, with gaps between all of them, both when the storage was NVMe and RAM - it was more pronounced when storing to ramdisk. It seems disk storage is no longer the bottleneck. In this particular case, the database stores the data in a single thread, where the IOPS capacity is much lower: 30k IOPS on my laptop, 25k IOPS on the Azure VM.
I ran the benchmark on two databases in parallel. One server pushed the data to one database, the other server to another database. The performance was a little bit better - 100k writes/s and a change. Not pinned to 100k anymore but not much better. Also, the servers generating the data started to slack off.
Okay, what could be the bottleneck here?
Turns out that the network between the Azure VMs was saturated now, at around 40 MB/s (400 Mbps). Back to bare metal and direct gigabit connection!
I set up two databases on my laptop and the other PC started to shove the data in parallel. Much to our delight, the write rate was around 200k writes/s! The storage capacity was finally being used. When we tried to add a third database, the speed didn't really increase and the laptop started throttling (thermal design is not the best on this one) so we didn't really care to continue.
Anyway, in this particular scenario - bulk inserting to one database - the 100k writes/s limit probably won't get any better. However, it's pretty good speed, especially as I started at around 15k with my attempts :-). In other scenarios - multiple imports, ordinary operation etc. - the write capacity is higher, which is good to know.
Again, thanks for the story.
Hi Oren - loved the blog post!!! There are heaps of gems in there that really are awesome. Also loved the journey through the post -> that's just as valuable as the summary, imo. Maybe even _more_ valuable, actually.

Just a few suggestions for some potential edits to the post:

SIDE NOTE: I've just re-read my post below and there's a lot of stuff I'm saying, so just work with me through my journey :) It's mainly about clarification of disk setups and configurations.

> Here you can see a single write, for 124KB, that took 200ms.

Ok .. so if that's slow, what is an _expected_ target write speed? 10ms? 50ms? 2ms?
> setup the journals' directory for this database to point to the new drive

Is the suggestion here to split this 'temp' data to a separate drive with respect to the 'users-data' data, to avoid disk-io-thrashing/fighting? Similar in concept to how in other old-school DBs like MSSql you can move TempDB to a different filesystem to avoid fighting.
Also, if you're going to move the Journal's data to a different disk, what size would you be thinking? In the examples you added a new 8GB disk for Journals .. why 8GB?
a) no real reason - journals don't use much size and it was $$-cheap but had some ok IOPS
b) you knew the final size of the data and it was a math formula (10% of final db size or the size of writes-per-second .. or something)
Why did you only go for 400 IOPS for the Journal disk? You know (through experience) that faster IOPS for Journal data (e.g. size of journal data + the amount of writes) start to add less value with respect to the $$-cost?
> switch instances again .. to i3en.xlarge instance ... mostly interested in the NVMe drive

Wait what? An NVMe drive? But .. the previous pictures sorta suggested this was already happening?? (NOTE: I come from a Windows + Azure background, so with the heaps of AWS info in this blog post, I've quickly made fast assumptions about what things are/mean with respect to AWS terminology. I do know what an NVMe HD is, though...)
Here is a screenie from what I was seeing ...
So I kept thinking ... these disks were all just folders on an NVMe disk. Let's go back and see what I misunderstood ...

> The machine has an 8GB drive that I'm using to host RavenDB

Ok - my guess is this will host the OS and RavenDb and doesn't get used in the reads/writes. (EDIT: more on this assumption, later)

> separate volume for the data itself. I create a 512GB gp2 volume (with 1536 IOPS) to start with

Ohhh... I read that as a disk that was created with the VM .. not some attached network disk. Also .. now I looked up what a gp2 is ... which == a 'typical' SSD.

So this means ...

OS + RavenDb + Journal -> 8GB .. SSD?
Data -> 512GB "normal" SSD

and later

OS + RavenDb -> 8GB .. SSD?
Data -> 512GB "normal" SSD
Journal -> 8GB "fast" SSD

Phew. Ok.
continuing ..

> This time to i3en.xlarge instance (4 cores, 30GB, 2 TB NVMe drive)

So now it's like this...

OS + RavenDb + Data + Journal -> 2TB NVMe
and then ..

> On the same i3en.xlarge system, I attached the two volumes (512GB gp2 and 8GB io2) with the same setup (journals on the io2 volume)

Which is then...

OS + RavenDb -> 2TB NVMe
Data -> 512GB gp2 "normal" SSD
Journal -> 8GB io2 "fast" SSD

which has _really_ similar results to the last machine (99k and 93k)??? So how can the results be so much higher than the previous machine when
- data and journals are on the same 'attached' network disks [gp2 and io2]
- 4 cores, both machines? [i3en.xlarge and t3a.xlarge]
How is the NVMe being used in the last image when data + journal's are on different disks? Is RavenDb actually using the NVMe for something else so the idea of the Journal being tempData is not 100% accurate? There's other tempdata that writes to somewhere, also?
SIDE NOTE: I just jumped into the "Playground" and noticed this...
so maybe some explanation about what/how TempFiles is and how that played a part in the bulk insert stuff?
---

So yeah ... a few things I sorta got confused with, so hopefully some clarifications could be added to the post?
I'm pretty damn certain I'll be hitting that blog post _heaps_ of times in the future to help test some hardware setups we plan to do with future projects. Sure, _we_ don't need 100K writes/sec, but it's nice to see what we can do/get with smaller setups and then have _some expectations_.

Sorry for sorta rambling .. but I thought it was important and I found the post very exciting and helpful.

regards,
- little ol' me -