RavenDB 4 benchmarks?

Rodrigo Zechin

May 15, 2018, 1:23:52 PM
to RavenDB - 2nd generation document database
Hi guys, do you have any official RavenDB 4 benchmarks? Or have plans to publish any?
Thanks

Oren Eini (Ayende Rahien)

May 15, 2018, 3:54:06 PM
to ravendb

Hibernating Rhinos Ltd

Oren Eini | CEO | Mobile: +972-52-548-6969

Office: +972-4-622-7811 | Fax: +972-153-4-622-7811

On Tue, May 15, 2018 at 8:23 PM, Rodrigo Zechin <rodri...@gmail.com> wrote:
Hi guys, do you have any official RavenDB 4 benchmarks? Or have plans to publish any?
Thanks

Federico Lois

May 15, 2018, 9:16:56 PM
to rav...@googlegroups.com
Hi Rodrigo,

The big question here is: what kind of benchmarks are you looking for? We can create as many synthetic benchmarks as there are ideas out there, but the most informative benchmark you can have is your own environment, or something as close to your dataset as you can get if you don't have a running system yet. Most benchmarks out there measure things that are never found in practice. We do use certain scenarios to guide our decisions, and therefore have such benchmarks internally, but they are of little use for understanding how a live environment would behave for your use case.

Just to showcase a few:

One of our guys tried a Raspberry Pi with its own internal storage and could get 1K write transactions per minute. For reference, 600 write transactions per second is about 50M write transactions per day; these numbers add up fast.
If we attach a second Raspberry Pi acting as a controller for an external disk, one Pi for RavenDB and the other for the disk, we can get 6K per minute…
For context, if you take the long tail of the internet, probably 70% of websites could actually be served by a 100 USD setup composed of Raspberry Pis. I am not saying it is a good idea, I am saying that you could.
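
If you want to see the back-of-the-envelope arithmetic behind that "adds up fast" remark, it is nothing more than the rate times the number of seconds or minutes in a day. A tiny C# sketch, purely illustrative and using only the rates quoted above:

using System;

// Purely illustrative arithmetic; the rates are the ones quoted above.
const int SecondsPerDay = 24 * 60 * 60;       // 86,400
const int MinutesPerDay = 24 * 60;            // 1,440

double piWritesPerMinute  = 1_000;            // single Pi, internal storage
double piWritesPerDay     = piWritesPerMinute * MinutesPerDay;    // ~1.44M writes/day

double refWritesPerSecond = 600;              // the reference figure above
double refWritesPerDay    = refWritesPerSecond * SecondsPerDay;   // ~51.8M writes/day

Console.WriteLine($"Pi: {piWritesPerDay:N0}/day, reference: {refWritesPerDay:N0}/day");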

Do you know the Stack Overflow dataset? Every once in a while Stack Overflow dumps all of its content and puts it up for download via torrent. We use it for benchmarking internally because it is as real-world as it gets. The dump we use is 59 GB of data. We are able to push all of that, with full-text search, indexing and also some map-reduce jobs to count stuff, in under 35 minutes on an x.large AWS instance (8 cores, 16 HT). That's 59 GB of data in a single import. But here is why I tell you that benchmarks paint a picture that sometimes is not indicative of actual performance in your use case: we are not pushing those 59 GB of data through the network. We tried that once, we saturated the whole network link, and we couldn't get past 20% to 25% CPU usage… so for the actual test I have to set aside a core to do the actual pushing of the data from localhost.
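
If you want to reproduce that kind of import against your own data, the harness is conceptually very simple. Here is a minimal sketch using the RavenDB 4 C# client's bulk-insert API; the Question class, database name and URL are placeholders for illustration, not our internal tool:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using Raven.Client.Documents;

// Placeholder shape for a Stack Overflow-style document.
public class Question
{
    public string Title { get; set; }
    public string Body { get; set; }
    public List<string> Tags { get; set; }
}

public static class ImportBenchmark
{
    public static void Run(IEnumerable<Question> questions)
    {
        using (var store = new DocumentStore
        {
            Urls = new[] { "http://localhost:8080" },   // push from localhost, as described above
            Database = "so-benchmark"
        }.Initialize())
        {
            var sw = Stopwatch.StartNew();

            // Bulk insert streams batches of documents to the server in a single
            // long-running operation; indexes catch up as the data comes in.
            using (var bulkInsert = store.BulkInsert())
            {
                foreach (var q in questions)
                    bulkInsert.Store(q);
            }

            Console.WriteLine($"Import finished in {sw.Elapsed}");
        }
    }
}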

For certain document sizes (small, but not that small) we could achieve 35K write transactions per second on commodity hardware. On my gaming computer at home, a Core i7 7700 with 32 GB of memory, I can get 35K transactions per second while doing both the pushing from hundreds of client threads and running the server at the same time on the same machine. Remember the reference from before: 600 per second is 50M per day; the math is pretty simple to do. I use this to see how we behave under contention scenarios; we actually uncovered a repro for a CoreCLR runtime error doing so: https://github.com/dotnet/coreclr/issues/13388
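
Conceptually that contention test is just a lot of concurrent clients, each committing tiny single-document write transactions against a server on the same box. A rough sketch; the document shape, counts and database name are made up for illustration and are not our actual harness:

using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Raven.Client.Documents;

// Tiny placeholder document; the real test uses small-but-not-that-small docs.
public class SmallDoc
{
    public int ClientId { get; set; }
    public int Seq { get; set; }
    public string Payload { get; set; }
}

public static class ContentionTest
{
    public static void Run(int clients = 200, int writesPerClient = 1_000)
    {
        using (var store = new DocumentStore
        {
            Urls = new[] { "http://localhost:8080" },   // server runs on the same machine
            Database = "contention-test"
        }.Initialize())
        {
            var sw = Stopwatch.StartNew();

            // Fan the writers out over the thread pool; each iteration opens a
            // session and commits a single small write transaction.
            Parallel.For(0, clients, clientId =>
            {
                for (int i = 0; i < writesPerClient; i++)
                {
                    using (var session = store.OpenSession())
                    {
                        session.Store(new SmallDoc { ClientId = clientId, Seq = i, Payload = "small payload" });
                        session.SaveChanges();          // one write transaction
                    }
                }
            });

            var total = (long)clients * writesPerClient;
            Console.WriteLine($"{total:N0} writes in {sw.Elapsed} " +
                              $"({total / sw.Elapsed.TotalSeconds:N0}/sec)");
        }
    }
}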

You won't get numbers more official than the ones I am telling you about here, because those are the numbers we use to guide our optimization efforts, but they are certainly NOT indicative of your use case, your dataset, your hardware, etc. As they say: "There are lies, damned lies and statistics". Synthetic benchmarks, while good statistics to guide a hypothesis, are damned lies for those seeking to understand runtime behavior. ;)

The best benchmark is your own reality. Some users are comparing RavenDB 4.0 performance against their own system implemented on RavenDB 3.x, as Kamran Ayub tells it on .NET Rocks at http://ow.ly/usgH30jK57T

Hope that helps.

Federico

Rodrigo Zechin

May 22, 2018, 11:10:00 PM
to RavenDB - 2nd generation document database
Thanks, Oren and Federico!

I couldn't agree more with your statements, Federico. I don't personally trust benchmarks that are not based on a real need or use-case.

However, I also know there are plenty of people who will just take whatever official benchmarks are posted on database websites and make their decisions based on those.

My ask was for that group of people ;) so that I would have a link I could quickly send to that target audience.

That is helpful already. Thanks
