RavenDB Performance Comparison


Domo

Mar 28, 2012, 7:08:37 AM
to ravendb
Hi,

I was interested in the basic performance of RavenDB, so I wrote a very
simple (and probably naive) test app to compare it with SQL Server.
Basically, I need a transactional data store to save, look up and
delete simple objects only by their respective primary key. So for
both data stores I create an entry, look it up by primary key and
afterwards delete it. Each function is executed with a separate
SqlConnection/DocumentSession. Here is the code I'm using for the
test:
https://gist.github.com/2224467

Unfortunately, on my dev computer (Windows 7, Core i7) RavenDB is
far behind SQL Server. I'm using RavenDB build 701 in comparison with
SQL Server 2008, both with the default settings. SQL Server runs with
the default install settings, without any performance tweaks
whatsoever. My tests show SQL Server manages about 7-10 times more
operations per second. Interestingly, the most limiting factor
for RavenDB performance is not disk IO but rather the high CPU load.

I did not expect it to be as fast as SQL Server; after all, the MS devs
have been working on performance improvements for many years. But I'm
surprised that the difference is this big.

I'm sure I'm missing something here with RavenDB or I'm doing
something wrong. Is this performance difference expected?

BR,
Domo

Itamar Syn-Hershko

Mar 28, 2012, 1:05:02 PM
to rav...@googlegroups.com
A few things to note:

1. You are comparing apples and oranges - simple INSERT INTO statements vs. the entire serialization and Unit-of-Work support of the transactional DocumentSession class. Instead, you should have compared it with DatabaseCommands.Put(...) and Get(...) - better yet, with plain HTTP REST POST commands. (See the sketch after this list.)

2. The high CPU load you see on the RavenDB instance may be due to background indexing (although I can't tell whether any indexes were created).

3. SQL Server is optimized for writes; RavenDB is optimized for _reads_. That is, most queries will be answered much faster by RavenDB, however complex they may be.

4. Adding to the previous point: as you said, your test is very naive, but you can get lots of extra performance by modeling correctly for a document database. That is, the more complex your model is, the better RavenDB will perform compared to any RDBMS.

5. RavenDB supports the notion of batch operations. When you do several Store operations in one go, things get faster, and that is done for you automatically by the API - call Store several times in one session before SaveChanges.

6. From our perspective, not everything is about performance; for many shops, time-to-deliver is much more important. That being said, as I said: once the model stops being a naive one, RavenDB can be many times faster.

7. If you are not going to perform queries, all you need is a simple key/value store, and you should probably use something like Redis. Use RavenDB if you need support for more advanced queries and good .NET support (UoW etc.).
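
For illustration, here is a minimal sketch of what points 1 and 5 suggest, using the low-level commands instead of the session. This assumes the RavenDB 1.0-era client API; the exact Put/Get/Delete signatures may differ between builds:

using Raven.Client.Document;
using Raven.Json.Linq;

using (var store = new DocumentStore { Url = "http://localhost:8080" })
{
    store.Initialize();

    // Put: no entity serialization, no Unit of Work, just a raw document write.
    store.DatabaseCommands.Put(
        "companies/1",                                  // document key
        null,                                           // etag (null = insert or overwrite)
        RavenJObject.FromObject(new { Name = "Acme" }), // document body
        new RavenJObject());                            // metadata

    // Get: raw document read by key.
    var doc = store.DatabaseCommands.Get("companies/1");

    // Delete: raw delete by key.
    store.DatabaseCommands.Delete("companies/1", null);
}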

nightwatch

Mar 28, 2012, 5:07:52 PM
to rav...@googlegroups.com
Your cognitive dissonance is justified - as in washing-powder commercials, you've been told that NoSQL is 30% faster and 90% more scalable.
Now it's time to ask 'faster than what?'
First you'd have to select a suitably slow SQL database, and small SQL databases aren't suitably slow.
It's not that easy to create an example that will be slow in SQL but fast in NoSQL - probably something that requires lots of processing to come up with the result and can be pre-calculated in a Raven index, or a complex document model that requires multiple tables/records/SQL queries to update a single document. Such examples are common in real applications once they reach a certain level of complexity. Another example is text search - SQL Server will never match Raven's full-text search capabilities.

Domo

Mar 29, 2012, 4:16:00 AM
to ravendb
Hi,

Thanks for your responses.
Regarding the advice to use the database commands exposed by the
RavenDB API: the documentation mentions that these commands are not
executed as part of the transaction, which is unfortunate as I have to
update multiple entities as a single atomic operation.

Maybe a little context here, and why I chose this example. I get a
constant stream of messages coming in, resulting mostly in updates and
inserts. This data is not read very often, so I pretty much expect
at least 50% more writes than reads. From this point of view, the
example matches the real-world application I'm targeting. Think of
something like vessels in the English Channel: all of them
automatically send all kinds of different messages, resulting in
many operations, mostly writes.

The data model is actually very simple, and therefore the queries I
have are not complex. It's pretty much like in the example: there are
lookups on the primary key. I really do like the RavenDB API and
things like automatic index creation or the possibility of having a
Lucene full-text index. Unfortunately, I will most likely not be able
to utilize these features here. I've worked with Lucene.Net before
on other occasions and I know the queries are lightning fast. It is a
real shame I can't take advantage of that for my application.

Regarding the argument that with a bigger database the results will be
much different: yes, that's true, but actually I don't expect the
database to become very big at all. Think of the vessel example I
mentioned. While there are a lot of messages resulting in a lot of
operations, there are not that many ships around. For example, I know
that in the whole North Sea there are "only a few thousand" ships at
sea (as well as in the harbor) sending messages. Only the last message
of a certain kind is of interest to me, so there will not be that
many objects/records around. I expect the total size of the data to be
in the range of megabytes, not GB or more.

I know this is pretty atypical, and I'm evaluating which of several
very different products works best for my scenario, RavenDB being one
of them. I've looked at Redis, mentioned before. However, Redis does
not support master/master replication, which is important for me in
case of errors, when I need to fail over to the backup node. Another
interesting one I looked at is Couchbase, which is incredibly fast and
supports master/master replication (like RavenDB) but does not have
transactions.

If RavenDB is not for me, maybe someone has advice on other
solutions?

BR,
Domo

nightwatch

Mar 29, 2012, 6:47:27 AM
to rav...@googlegroups.com
The list of possible tools for your case really depends on the numbers: how many records per second you want to insert, what the database size will be, what queries must be supported and how many of them per second, etc.
You said that you'll be doing updates and primary-key lookups only - in this case a simple key/value store might be for you (though using a raw key/value store is difficult because of the very basic functionality available out of the box).
Your numbers don't suggest any extreme needs - IMHO a standard SQL database (MySQL | SQLite | SQL Server?) might be enough if you don't mind putting your data in a relational model.
Document databases are really nice to work with if you have complex document structures that are hard to map into an RDBMS, but I don't think your documents are very complex. Generally speaking, you have a very broad choice of tools and almost all of them will work OK, so I'd recommend using something stable, proven, easy to use and known to you.

R

Oren Eini (Ayende Rahien)

Mar 29, 2012, 7:33:07 AM
to rav...@googlegroups.com
Domo,
You can do multiple operations using the low-level commands; see DatabaseCommands.Batch(...).
This runs in a single transaction.
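
A minimal sketch of what such a Batch call might look like, assuming the Raven.Abstractions.Commands types from the 1.0-era client and a `store` initialized as a DocumentStore (exact type names may vary between builds):

using Raven.Abstractions.Commands;
using Raven.Json.Linq;

// Both commands go to the server in one request and are applied in a single transaction.
store.DatabaseCommands.Batch(new ICommandData[]
{
    new PutCommandData
    {
        Key = "companies/1",
        Document = RavenJObject.FromObject(new { Name = "Acme" }),
        Metadata = new RavenJObject()
    },
    new DeleteCommandData { Key = "companies/2" }
});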

Whenever evaluating a technology, the most important aspect to consider is whether it fits your needs.
You have created a test suite for RavenDB that appears to match what you need, and you see different results between RavenDB and SQL.

The question here is whether, based on your needs, the RavenDB numbers are sufficient.
Considering the low number of total ships, I think that this might be more than viable. And then you can take advantage of the other features of RavenDB that you mentioned, like the nice API and easy replication.

Domo

Mar 29, 2012, 7:59:59 AM
to ravendb
Well, unfortunately the range of possible solutions is limited. What I
need is something able to handle about 500 - 1000 writes per second
with small amounts of data. This alone is not the big problem; many
products offer this kind of performance with reasonable hardware. But
additionally I need synchronous replication, to have the data
available on a second node if there is a hardware error. The
switchover needs to happen in a matter of seconds, and data loss is
not an option. That's the reason I need transactions, besides being
able to do multiple updates as part of a single operation.

In a relational model, SQL Server with synchronous mirroring would fit
these requirements pretty well. There are other non-RDBMS products
like key/value stores or distributed caches (e.g. AppFabric,
memcached). Unfortunately, most of them fall short on either the
synchronous master/master replication or the transaction part.

RavenDB has the replication and transaction capabilities and a very
nice API. However, the write performance I'm currently measuring is
not what I expected. This puzzles me a little bit, because usually
the capabilities of the disk IO system are the limiting factor. That
is not the case here: the Windows performance monitor clearly shows me
that the CPU is going crazy while the disk is not even busy. This is
probably the trade-off made by RavenDB: invest more CPU cycles during
put operations to be very fast later on, once clients start looking
for data.

Anyway, thanks very much for your inputs. I'm sure I'll find something
which suits my needs.

BR,
Domo

Oren Eini (Ayende Rahien)

Mar 29, 2012, 8:05:54 AM
to rav...@googlegroups.com
Domo,
We routinely see thousands of writes per second on commodity hardware, so your numbers seem strange.
What storage engine are you using?
Do you have any indexes running during the write operations?

Oren Eini (Ayende Rahien)

Mar 29, 2012, 8:06:18 AM
to rav...@googlegroups.com
Also, note that in RavenDB, the replication is NOT synchronous. 
It happens immediately, but it happens in a background thread.

Matt Warren

Mar 29, 2012, 9:04:16 AM
to rav...@googlegroups.com
In this case (see https://gist.github.com/2224467), the code being tested is:

private static void CreateReadDeleteInRavenDb()
{
    var company = new Company();

    // Write: a dedicated session just for the insert
    using (IDocumentSession session = _documentStore.OpenSession())
    {
        session.Store(company);
        session.SaveChanges();
    }

    // Read: a fresh session to load the document by key
    using (IDocumentSession session = _documentStore.OpenSession())
    {
        var searchDoc = session.Load<Company>(company.Id);
    }

    // Delete: yet another session, going through the low-level commands
    using (IDocumentSession session = _documentStore.OpenSession())
    {
        session.Advanced.DatabaseCommands.Delete(company.Id, null);
    }
}
So it's not doing any batching of the writes, and it's opening/closing the session around every command. So I guess there's a lot of overhead involved.

It's mimicking the SQL code exactly, which is what makes the Raven benchmarks so slow. 
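
For comparison, a batched version of the write side might look like the sketch below - hypothetical (the method name and count parameter are mine), but using only the standard session API. Several Store calls are tracked by one session and flushed with a single SaveChanges, which the client sends as one batch request:

private static void CreateCompaniesBatched(int count)
{
    using (IDocumentSession session = _documentStore.OpenSession())
    {
        // Each Store only registers the entity in the session's Unit of Work...
        for (int i = 0; i < count; i++)
        {
            session.Store(new Company());
        }

        // ...a single SaveChanges sends all of them in one request/transaction.
        session.SaveChanges();
    }
}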

Chris Marisic

Mar 29, 2012, 9:06:13 AM
to rav...@googlegroups.com


On Thursday, March 29, 2012 7:59:59 AM UTC-4, Domo wrote:
What I need is something able to handle about 500 - 1000 writes
per second with small amounts of data.


This is a standard transaction log - one of the most clear-cut examples of SQL Server's sweet spot.

If I were going to create a credit card processing company, I wouldn't use Raven for my inbound transactions - not because I have any qualms with its performance or reliability, but because this is where SQL Server comes close to perfection: a table with a very limited number of columns that is insert-heavy. Such a table could likely even exist without a single index added to it, other than the PK.

If your data model can live in a flat table with a fixed number of columns and doesn't even need any indexes other than the PK, there's very little you could ever get faster than an RDBMS, except something like Redis, which is (primarily) an in-memory key/value store.

Domo

Mar 29, 2012, 11:46:46 AM
to ravendb
Hi,

> We routinely see thousands of writes per second on commodity hardware, so
> your numbers seem strange.
> What storage engine are you using?
> Do you have any indexes running during the write operations?

I'm using RavenDB out of the box without any changes, so I guess this
means Esent. The only code running during the tests is the one from
the link
https://gist.github.com/2224467

As far as I can tell, no index is created during write operations; at
least the web UI does not show anything. The code is running on my
test computer, which has a Core i7 and an SSD drive. Running the test
program with this setup, I get about 50 operations per second. I know
that on "real" server hardware this will of course be better. I'm just
asking why SQL Server is so much faster: in the given example it
manages about 400 ops per second on the same hardware, an operation
being either the CreateReadDeleteInRavenDb or
CreateReadDeleteInSqlServer method from the example. I guess MS has
optimized its writes a lot, so that it works nearly perfectly for
this scenario.

> Also, note that in RavenDB, the replication is NOT synchronous. It happens immediately, but it happens in a background thread.
This is bad news. For me this means that under special circumstances I
may lose data which has not yet been successfully replicated to
another computer, even though I received an HTTP OK on my end. I know
the data is stored on the local disk, but it is not available until
the hardware problem is solved and the node is brought back up again.
That is most likely too late for my scenario. For my target use case I
won't actually need to have the data stored on disk: as long as I can
be sure it is in the memory of at least two different machines, I'm
fine. But then again, from what I'm seeing, the limiting factor for
RavenDB is not the actual IO operation on the disk.

BR,
Domo

Oren Eini (Ayende Rahien)

Mar 29, 2012, 12:10:23 PM
to rav...@googlegroups.com
I just took a look at your test suite, and the major problem is that you are cutting things off way too quickly.

Here is my test scenario: I took your code, commented out the SQL part and then did the following.

* Start a debug RavenDB server with the default configuration (disk based, just started from scratch, no indexes, nothing there).
* Run your code. I got the following result:

Performed 24 operations in 1,003ms

That seems to be pretty bad, right?

Then I ran it again:

Performed 115 operations in 1,001ms

Then I realized that your test scenario ran for only a single second, and I decided to see what happens if we expand the test...
I ran the test for 30 seconds, and the results were more like what I expected.

Performed 6,278 operations in 30,000ms

What happens if I try it for a minute?

Performed 13,883 operations in 60,003ms

But wait, there is something important here that we are missing.
We are running RavenDB in _debug mode_.

[Inline image: RavenDB console window streaming debug log output]


As you can see, this pushes a LOT of data to the console, and that is REALLY expensive. 

I started it as a service instead, with no debug logs, and tried it again, for 60 seconds.

Performed 16,870 operations in 60,003ms

Note that your numbers are actually wrong: you are doing 3 separate operations here for every one that you count, which gives us over 50,000 operations in one minute.

But let us take this further: if you care about perf more than about per-write durability, turn lazy commits on:
<add key="Raven/TransactionMode" value="Lazy"/>

Performed 22,927 operations in 60,002ms

Again, you need to triple that number for the real number, but you get the point.

Finally, you want to do this purely in memory?

<add key="Raven/StorageEngine" value="Munin"/>
<add key="Raven/RunInMemory" value="True"/>

Performed 24,126 operations in 60,001ms

As you can see, it is faster, but not that much faster - the lazy commits are what really make the difference.

Note that by this time, we are processing close to 75,000 operations per minute, or a rate of over 1,200 per second.

This is on commodity hardware, without really trying that hard.

Oren Eini (Ayende Rahien)

Mar 29, 2012, 12:14:06 PM
to rav...@googlegroups.com
Based on your scenario, you want a 2-node Memcached setup, writing to and reading from both of them in parallel.
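
A rough sketch of that idea, assuming the Enyim.Caching memcached client - the library choice, node addresses and wrapper class are my assumption, not something specified above:

using System.Threading.Tasks;
using Enyim.Caching;
using Enyim.Caching.Configuration;
using Enyim.Caching.Memcached;

public class TwoNodeStore
{
    private readonly MemcachedClient _nodeA = CreateClient("10.0.0.1:11211");
    private readonly MemcachedClient _nodeB = CreateClient("10.0.0.2:11211");

    private static MemcachedClient CreateClient(string address)
    {
        var config = new MemcachedClientConfiguration();
        config.AddServer(address);   // one client per node, no shared key hashing
        return new MemcachedClient(config);
    }

    // Write the same value to both nodes in parallel and report success only
    // if both acknowledged - the data is then in the memory of two machines.
    public bool Store(string key, object value)
    {
        var a = Task.Factory.StartNew(() => _nodeA.Store(StoreMode.Set, key, value));
        var b = Task.Factory.StartNew(() => _nodeB.Store(StoreMode.Set, key, value));
        Task.WaitAll(a, b);
        return a.Result && b.Result;
    }
}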


Domo

Mar 30, 2012, 4:37:06 AM
to ravendb
Actually, I already ran RavenDB as a service and not in the console; I
used the install command switch mentioned in the readme. I also
started a test run for 1 minute as you suggested, but I cannot see the
performance increase you are mentioning. The program output shows me:

Performed 4399 operations in 60002ms

Which gives me about 70 ops per second - a little faster than the
1-second run before. However, I am nowhere near the numbers and the
performance increase you gained by running it for 1 minute. I have
taken a screenshot showing the disk and CPU utilization in the
performance monitor. CPU is constantly high, with small drops when
waiting for the disk:
http://i41.tinypic.com/b99fl.png

As with the total numbers, this picture does not change whether I run
it for a few seconds or a whole minute.

> Note that your numbers are actually wrong: you are doing 3 separate
> operations here for every one that you count, which gives us over 50,000
> operations in one minute.

If I understand you correctly, you mean the numbers are wrong when
compared to other tools or examples which count a single document
session with a single Store() as one operation. That is true.
However, this is not about absolute numbers. I'm interested in
comparing it with other products, namely SQL Server in this case, and
in the example SQL Server has to do the very same amount of work to
be counted as one operation. So I consider the numbers still valid: I
want to know the performance in relation to other solutions. Of
course, I agree you can't compare these numbers with anything else
out there than this specific test itself.

From my point of view RavenDB is very fast, especially for reading and
queries. In my opinion this is where RavenDB really shines and shows
its strength, especially together with the LINQ API, scaling and so
forth. I've seen other data stores claiming to be high performance
while actually doing nothing, yet counting it towards their advertised
benchmark.

> Based on your scenario, you want 2 node Memcached with writing & reading to
> both of them in parallel.
Yes, I thought about that. Indeed, Memcached does have atomic
operations on a single document, but it does not support transactions
to get atomic writes across multiple ones.

Anyway, thanks very much for your inputs.

BR,
Domo
