Slow performance when comparing to SQL Server

852 views
Skip to first unread message

Thomas Schmidt

unread,
Mar 27, 2012, 4:29:00 AM3/27/12
to rav...@googlegroups.com
Hi,

i am evaluating performance of ravendb compared to SQL Server (somewhat apples to oranges i know), and i am experiencing much worse performance with raven db than with SQL Server, so i would like if i could get some pointers if i am using ravendb the wrong way or this is just the way it is. Basically i am dumping 10.000 very simple documents into the ravendb database and sql server, and then querying on them afterwards, setup is below. Running ravendb 1.0.701 from nuget and have disabled all logging. I am truncating table in sql server and doing a delete by index in ravendb before running the tests, and only running the tests when there are no stale indexes.

As i can see from the tests sql server takes only 398ms to do 10.000 queries while raven db takes between  15281ms and  8981 ms, that means that sql server is between 20 and 40 times faster than raven db for queries. I have tried with larger objects with around 30+ fields and it slows down ravendb even more.

What am i doing wrong here? Is this to be expected?

store init:
IDocumentStore store = new DocumentStore {Url = "http://localhost:8080"};
store.Initialize();
store.Conventions.IdentityPartsSeparator = "-";
IndexCreation.CreateIndexes(typeof(CategoryIndex).Assembly, store);


"domain" model:
public class Category
{
public string Id {get; set;}
public string Description { get; set; }
}

index:
public class CategoryIndex : AbstractIndexCreationTask<Category>
{
public CategoryIndex()
{
Map = categories => categories.Select(cat => new {cat.Id, cat.Description});
Index(cat => cat.Id, FieldIndexing.NotAnalyzed);
Index(cat => cat.Description, FieldIndexing.Analyzed);
}
}

data load ravendb - takes 46730ms:
Parallel.ForEach(Enumerable.​Range(0, 10000), i =>
{
using (IDocumentSession session = store.OpenSession())
{
session.Store(new Category {Description = "unit-test-description-" + i, Id = "categories-" + i}); //specifies id myself to avoid HILO roundtrip
session.SaveChanges();
}
});

data load sql server - takes 2619ms:
Parallel.ForEach(Enumerable.Range(0, 10000), i =>
{
using (SqlConnection connection = new SqlConnection(@"Persist Security Info=False;User ID=test;pwd=test;database=Test;server=localhost;"))
{
connection.Open();
using (SqlCommand cmd = new SqlCommand("INSERT INTO Category (Id,Description) VALUES (@Id, @Description)", connection))
{
cmd.Parameters.Add(new SqlParameter("@Id", "categories-" + i));
cmd.Parameters.Add(new SqlParameter("@Description", "unit-test-description" + i));
cmd.ExecuteNonQuery();
connection.Close();
}
}
});

test 1 ravendb - takes 15281ms:
Parallel.ForEach(Enumerable.Range(0, 10000), i =>
{
using (IDocumentSession session = store.OpenSession())
{
string id = "categories-" + i;
Category result = session.Query<Category, CategoryIndex>().FirstOrDefault(cat => cat.Id == id);
}
});

test 2 ravendb - takes 8981ms:
Parallel.ForEach(Enumerable.Range(0, 10000), i =>
{
using (IDocumentSession session = store.OpenSession())
{
string id = "categories-" + i;
Category result = session.Load<Category>(id);
}
});

test 3 sql server - takes 398ms
Parallel.ForEach(Enumerable.Range(0, 10000), i =>
{
using (SqlConnection connection = new SqlConnection(@"Persist Security Info=False;User ID=test;pwd=test;database=Test;server=localhost;"))
{
connection.Open();
using (SqlCommand cmd = new SqlCommand("SELECT Id, Description FROM dbo.Category WHERE Id = @Id", connection))
{
cmd.Parameters.Add(new SqlParameter("@Id", "categories-" + i));
SqlDataReader reader = cmd.ExecuteReader();
if (reader.Read())
{
Category cat = new Category
{
Id = reader.GetString(0),
Description = reader.GetString(1)
};
}
connection.Close();
}
}
});

nightwatch

unread,
Mar 27, 2012, 5:21:52 AM3/27/12
to rav...@googlegroups.com
SQL server is hard to beat when the data set is so small and queries qre simple, so these results aren't very surprising. All the data fits in memory, there's connection pooling and quite fast communication protocol - sql will easily do 10-20K operations per second in such conditions.

Ryan Heath

unread,
Mar 27, 2012, 5:53:15 AM3/27/12
to rav...@googlegroups.com
Perhaps you should read this thread:
http://groups.google.com/group/ravendb/browse_thread/thread/2f2734fda5f73d33/8399852631867405

There are quite a few options to speed up Raven.

// Ryan

Oren Eini (Ayende Rahien)

unread,
Mar 27, 2012, 6:05:47 AM3/27/12
to rav...@googlegroups.com
Thomas,
A few comments:


data load ravendb - takes 46730ms:
Parallel.ForEach(Enumerable.?Range(0, 10000), i =>
{
using (IDocumentSession session = store.OpenSession())
{
session.Store(new Category {Description = "unit-test-description-" + i, Id = "categories-" + i}); //specifies id myself to avoid HILO roundtrip
session.SaveChanges();
}
});

Note that this is a bad form of doing this, since it requires us to do a lot of separate calls vs. doing bulk request.

This is also not recommended:

test 1 ravendb - takes 15281ms:
Parallel.ForEach(Enumerable.Range(0, 10000), i =>
{
using (IDocumentSession session = store.OpenSession())
{
string id = "categories-" + i;
Category result = session.Query<Category, CategoryIndex>().FirstOrDefault(cat => cat.Id == id);
}
});


Since this forces us to force do an index query, then a load. If you have the ID, you need to use Load.

All of that said, I think that the major factor here is the notion of the connection pool.

Can you try something for me, do:

store.JsonRequestFactory.CustomizeRequest += req => ((HttpWebRequest)req).UnsafeAuthenticatedConnectionSharing = true;

What happens then?


On Tue, Mar 27, 2012 at 10:29 AM, Thomas Schmidt <thomas....@gmail.com> wrote:

Thomas Schmidt

unread,
Mar 27, 2012, 7:36:34 AM3/27/12
to rav...@googlegroups.com
It seems my previous post was lost in cyberspace so i will post my comments again :)

Oren,

i have tried the changes you suggested and the improvements where in the range of 10-15% when querying/loading, changing to a batched insert with batches of 1024 documents resulted in going from  46730ms to 11272ms although indexes to a while longer to get ready.

I didn't have a CustomizeRequest on the JsonRequestFactory, instead i used ConfigureRequest.

I did look at store.JsonRequestFactory.EnableBasicAuthenticationOverUnsecureHttpEvenThoughPasswordsWouldBeSentOverTheWireInClearTextToBeStolenByHackers and that made me spill coke all over my keyboard :)

I will try bulkloading even more complex documents into ravendb, documents to that require several joins in sql to do something similar and see how that goes.

I have also read the link that Ryan posted but didn't experience any improvements with the suggestions i hadn't already done in that thread.

On Tuesday, March 27, 2012 12:05:47 PM UTC+2, Oren Eini wrote:
Thomas,
A few comments:


data load ravendb - takes 46730ms:
Parallel.ForEach(Enumerable.?Range(0, 10000), i =>
{
using (IDocumentSession session = store.OpenSession())
{
session.Store(new Category {Description = "unit-test-description-" + i, Id = "categories-" + i}); //specifies id myself to avoid HILO roundtrip
session.SaveChanges();
}
});

Note that this is a bad form of doing this, since it requires us to do a lot of separate calls vs. doing bulk request.

This is also not recommended:

test 1 ravendb - takes 15281ms:
Parallel.ForEach(Enumerable.Range(0, 10000), i =>
{
using (IDocumentSession session = store.OpenSession())
{
string id = "categories-" + i;
Category result = session.Query<Category, CategoryIndex>().FirstOrDefault(cat => cat.Id == id);
}
});


Since this forces us to force do an index query, then a load. If you have the ID, you need to use Load.

All of that said, I think that the major factor here is the notion of the connection pool.

Can you try something for me, do:

store.JsonRequestFactory.CustomizeRequest += req => ((HttpWebRequest)req).UnsafeAuthenticatedConnectionSharing = true;

What happens then?


Ryan Heath

unread,
Mar 27, 2012, 7:51:41 AM3/27/12
to rav...@googlegroups.com
I think the deserialization of the objects could also add to the differences.

The SQL does not have to do that:
...


Category cat = new Category
{
Id = reader.GetString(0),
Description = reader.GetString(1)
};

...

Raven does:
...


Category result = session.Load<Category>(id);

...

Maybe a fairer test would be to have the .net deserialization excluded
from the Raven loop or added to the SQL loop.

// Ryan

Thomas Schmidt

unread,
Mar 27, 2012, 8:04:52 AM3/27/12
to rav...@googlegroups.com
I think that it must be used as is, as in the end i end up with Category objects and that is what i have to use, so i think the comparison is fair. It doesn't make sense to me to serialize/deserialize the data i get out of SQL Server or to somehow not deserialize the data i get back from ravendb.

On Tuesday, March 27, 2012 1:51:41 PM UTC+2, Ryan Heath wrote:
I think the deserialization of the objects could also add to the differences.

The SQL does not have to do that:
...
Category cat = new Category
{
  Id = reader.GetString(0),
  Description = reader.GetString(1)
};
...

Raven does:
...
Category result = session.Load<Category>(id);
...

Maybe a fairer test would be to have the .net deserialization excluded
from the Raven loop or added to the SQL loop.

// Ryan

On Tue, Mar 27, 2012 at 1:36 PM, Thomas Schmidt

Ryan Heath

unread,
Mar 27, 2012, 8:16:59 AM3/27/12
to rav...@googlegroups.com
Ofcourse you need Category objects :)
But you need to know where the slowness is, to be able to do something about it.
From what I've read you have tried a lot of options already but not
yet this one.

// Ryan

Damian Hickey

unread,
Mar 27, 2012, 11:29:53 AM3/27/12
to rav...@googlegroups.com
> All of that said, I think that the major factor here is the notion of the connection pool.

Are there plans for such a thing for RavenDB? Could be interesting perf gain for situations where people are using Raven behind app / web servers. I've always suspected that such a feature would be called Raven.Client.Heavyweight.

On Tuesday, 27 March 2012 11:05:47 UTC+1, Oren Eini wrote:
Thomas,
A few comments:


data load ravendb - takes 46730ms:
Parallel.ForEach(Enumerable.?Range(0, 10000), i =>
{
using (IDocumentSession session = store.OpenSession())
{
session.Store(new Category {Description = "unit-test-description-" + i, Id = "categories-" + i}); //specifies id myself to avoid HILO roundtrip
session.SaveChanges();
}
});

Note that this is a bad form of doing this, since it requires us to do a lot of separate calls vs. doing bulk request.

This is also not recommended:

test 1 ravendb - takes 15281ms:
Parallel.ForEach(Enumerable.Range(0, 10000), i =>
{
using (IDocumentSession session = store.OpenSession())
{
string id = "categories-" + i;
Category result = session.Query<Category, CategoryIndex>().FirstOrDefault(cat => cat.Id == id);
}
});


Since this forces us to force do an index query, then a load. If you have the ID, you need to use Load.

All of that said, I think that the major factor here is the notion of the connection pool.

Can you try something for me, do:

store.JsonRequestFactory.CustomizeRequest += req => ((HttpWebRequest)req).UnsafeAuthenticatedConnectionSharing = true;

What happens then?


Oren Eini (Ayende Rahien)

unread,
Mar 27, 2012, 11:53:22 AM3/27/12
to rav...@googlegroups.com
Damian,
It is already there, there is a http connection pool in .net. I am pretty sure it is enabled by default for HTTP1.1
Reply all
Reply to author
Forward
0 new messages