Constantly high CPU usage (70%+) on a simple project

102 views
Skip to first unread message

Marcelo Volmaro

unread,
Nov 24, 2015, 1:18:09 AM11/24/15
to RavenDB - 2nd generation document database
Hi,
I just ported a service app that was using mysql to RavenDB. I'm now in the process of testing it, and found that the application uses constantly about 70% of the CPU time.
The application is a simple logger that I use to log data from different sources. The current rate of messages is about 30/s. Each message is similar to this one:

{
    "Count": 1,
    "FirstOccurredOn": "2015-11-23T18:53:48.598028Z",
    "ApplicationName": "TaskManagerServer",
    "Details": "",
    "Id": "2dde00cd25ed4a35a32b0000000bf697",
    "OccurredOn": "2015-11-23T18:53:48.598028Z",
    "Tags": [],
    "TextMessage": "....(some text here)...",
    "Type": 4,
    "Metadata": {
        "ServerId": "1",
        "ProviderId": "1"
    }
}

I also have 3 simple indexes, as following:

public sealed class DebugMessages_ByDefaultIndexes : AbstractIndexCreationTask<DebugMessage>
{
public DebugMessages_ByDefaultIndexes()
{
Map = messages => messages
.Select(item => new
{
item.OccurredOn,
item.ApplicationName,
item.Type,
item.Tags,
_ = item.Metadata.Select(x => CreateField("Metadata_" + x.Key, x.Value))
});
}
}

public sealed class Tags_ByName : AbstractIndexCreationTask<DebugMessage, Tags_ByName.Tag>
{
public sealed class Tag
{
public string Application { get; set; }
public string Name { get; set; }
}
public Tags_ByName()
{
Map = messages => messages.SelectMany(message => message.Tags, (message, tag) => new Tag
{
Name = tag,
Application = message.ApplicationName
});
Reduce = results => results.GroupBy(tag => new
{
tag.Application,
tag.Name
}).Select(g => new Tag
{
Application = g.Key.Application,
Name = g.Key.Name
});
}
}

public sealed class Applications_ByName : AbstractIndexCreationTask<DebugMessage, Applications_ByName.Application>
{
public sealed class Application
{
public string Name { get; set; }
}
public Applications_ByName()
{
Map = messages => messages.Select(x => new Application
{
Name = x.ApplicationName
});
Reduce = results => results.GroupBy(application => new
{
application.Name
}).Select(g => new Application
{
Name = g.Key.Name
});
}
}

The applications is using RavenDB as embedded DB. Profiling the app shows that most of the time is spent on the indexing (as can be seen on the attached screenshoot).

I don't think the indexes are complex enough, and while the app was working with mysql, I can handle WAY more messages that what the system is currently handling (and I just started using it, not even using it at a 10% of what it should handle).

My questions are:
1) Are my indexes wrong? (I'm new to Raven... so that's the most probable thing).
2) I'm using Esent, as Voron seems to not be working on embedded (it says it doesn't works in 32bits, but the server where this app is running is 64 bits). Switching from embedded to Raven as a service may aleviate the issue?
3) I'm using an async session and adding the items one by one... maybe there is a better way? I receive the messages in batches, but the size of each batch is unknown (could be 1 message or 1000, depending on the load on the other side).
4) Any other ideas to help the system not be so taxed?

Thanks in advance.



ProfilerSession.png

Michael Yarichuk

unread,
Nov 24, 2015, 2:07:54 AM11/24/15
to RavenDB - 2nd generation document database
Hi,
First, what is the build that you are running?
1) The indexes look ok. Just to make sure, do you have suggestions enabled?
2) How is the app that runs Raven instance is compiled? AnyCPU? What happens if you compile it to x64 and run Voron?
3) Can you show code of how you do inserts now?
Also, you might want to take a look at Bulk Insert http://ravendb.net/docs/article-page/3.0/csharp/client-api/bulk-insert/how-to-work-with-bulk-insert-operation
4)
 a. Can you send the profiler session so I can take a look?
 b. Can you post your Raven configuration?
 c. Can you share hardware specs of a machine your app runs on?

--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Best regards,

 

Michael Yarichuk

RavenDB Core Team

Tel: 972-4-6227811

Fax:972-153-4-6227811

Email : michael....@hibernatingrhinos.com

 

RavenDB paving the way to "Data Made Simple" http://ravendb.net/  

Daniel Häfele

unread,
Nov 24, 2015, 2:47:34 AM11/24/15
to RavenDB - 2nd generation document database
In the last 2 indexes you can propably remove the reduce part and just load the data from the index.
This might also reduce the time needed to update the indexes.

Marcelo Volmaro

unread,
Nov 24, 2015, 7:44:39 AM11/24/15
to RavenDB - 2nd generation document database


On Tuesday, November 24, 2015 at 4:07:54 AM UTC-3, Michael Yarichuk wrote:
Hi,
First, what is the build that you are running?
1) The indexes look ok. Just to make sure, do you have suggestions enabled?
No suggestions at all.

2) How is the app that runs Raven instance is compiled? AnyCPU? What happens if you compile it to x64 and run Voron?
Yes, AnyCPU. Local tests seems to indicate that compiling for x64 makes voron works (at least it doesn't throws any exceptions).

3) Can you show code of how you do inserts now?
await _session.StoreAsync(message) in a foreach. Note that the insertion process doesn't takes almost any time (it's in the order of the ms.. 10/30ms depending on the number of messages per batch).
 
 a. Can you send the profiler session so I can take a look?
Sure, no problem. Do you want a specific tool? I'm using Redgate ants for that, but can use any other tool if it has some trial/evaluation version.
 
 b. Can you post your Raven configuration?
<add key="Raven/DataDir" value="~\Databases\System" />
<add key="Raven/MaxNumberOfItemsToIndexInSingleBatch" value="256" />
<add key="Raven/MaxNumberOfItemsToPreFetchForIndexing" value="256" />
<add key="Raven/LimitIndexesCapabilities" value="true" />
<add key="Raven/TransactionMode" value="Lazy" />
 
 c. Can you share hardware specs of a machine your app runs on?
Windows server 2012R2 running on an Intel Xeon 2.6GHz, 2Gb ram, 40Gb hdd (standard disk, not sdd).

Marcelo Volmaro

unread,
Nov 24, 2015, 7:46:56 AM11/24/15
to RavenDB - 2nd generation document database
But the reduce part is there to get a single list of the tags/applications, not to do a projection. How do you suggest I could get a list of all the tags from all the messages, without having to query all of them?

Marcelo Volmaro

unread,
Nov 24, 2015, 7:58:49 AM11/24/15
to RavenDB - 2nd generation document database
Sorry, I forgot:
Build 3.0.3800, and no, I can't use batch inserts, as my messages already have an ID that I must respect.


On Tuesday, November 24, 2015 at 4:07:54 AM UTC-3, Michael Yarichuk wrote:

Oren Eini (Ayende Rahien)

unread,
Nov 24, 2015, 8:01:54 AM11/24/15
to ravendb
If you are writing constantly, we'll also be indexing constantly. 


Hibernating Rhinos Ltd  

Oren Eini l CEO Mobile: + 972-52-548-6969

Office: +972-4-622-7811 l Fax: +972-153-4-622-7811

 


On Mon, Nov 23, 2015 at 9:12 PM, Marcelo Volmaro <mvol...@gmail.com> wrote:

Marcelo Volmaro

unread,
Nov 24, 2015, 8:09:38 AM11/24/15
to RavenDB - 2nd generation document database
Sure, but the same app running on mysql doesn't event touch the 5% of the CPU, and I have the same indexes (with some limitations on the sql, as in order to index the metadata I have several columns where I store each value, and that's one of the reasons I wanted to switch to a nosql db...).

Oren Eini (Ayende Rahien)

unread,
Nov 24, 2015, 8:11:51 AM11/24/15
to ravendb
How many indexes total do you have, how many docs / sec?
Do you have a single instance, replication, sql replication, etc?

Marcelo Volmaro

unread,
Nov 24, 2015, 8:15:52 AM11/24/15
to rav...@googlegroups.com
Not sure what to answer as for how many indexes... Index classes are the 3 ones I shown on the first post. That's it. Fields, again, you can get a grasp from the first post, but right now I have about the equivalent to 16 indexed columns in sql (that includes both the static fields plus tags/metadata). This is embedded, so yes, single instance, no replication, no sql replication.

___________________
Marcelo Volmaro

--
You received this message because you are subscribed to a topic in the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/ravendb/L3H1G8i-yeU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to ravendb+u...@googlegroups.com.

Oren Eini (Ayende Rahien)

unread,
Nov 24, 2015, 8:17:18 AM11/24/15
to ravendb
You write documents using session, with about 30 docs / sec, but how many per session call?

Marcelo Volmaro

unread,
Nov 24, 2015, 8:21:42 AM11/24/15
to rav...@googlegroups.com
It varies, but min 3, max 700 per session.

___________________
Marcelo Volmaro

Oren Eini (Ayende Rahien)

unread,
Nov 24, 2015, 8:24:58 AM11/24/15
to ravendb
Okay, with those rates, can you try using BulkInsert instead?

Marcelo Volmaro

unread,
Nov 24, 2015, 8:28:43 AM11/24/15
to rav...@googlegroups.com
But the documentation states that I can't use my own IDs on bulk insert. That would mean that I will have to change the model on the server (not really an issue), add another index (as I need to load documents by the original ID) and change my client (the one that every app uses to log messages to this app. That could be an issue)... 
Is there a way I can use the IDs I already have, in bulk inserts?

___________________
Marcelo Volmaro

Oren Eini (Ayende Rahien)

unread,
Nov 24, 2015, 8:51:16 AM11/24/15
to ravendb
What do you mean, not use your own id?

Daniel Häfele

unread,
Nov 24, 2015, 8:52:15 AM11/24/15
to RavenDB - 2nd generation document database
I think you're mixing some things here.
If you're using the bulk insert feature, then you need to provide the IDs already in the client.
By default it will use HiLo if you don't specify an ID by yourself.
But in your case, you already have one.

So there should be no issue with bulk insert at all.

Marcelo Volmaro

unread,
Nov 24, 2015, 8:55:43 AM11/24/15
to rav...@googlegroups.com
Raven docs says on BulkInserts "Entity Id must be provided by the client. The client by default will use the HiLo generator in order to generate the Id.". My entities (messages) already have an ID at insertion time, that it is being generated per app basis (each app that logs messages generates its own ids).

So, basically, this is the body of an entity being inserted:
{
            "Count": 1,
            "FirstOccurredOn": "2015-11-24T13:54:40.9691384Z",
            "ApplicationName": "TaskManagerServer",
            "Details": "",
            "Id": "e780334f30ba4b34ab3f0000001ce3ef",
            "OccurredOn": "2015-11-24T13:54:40.9691384Z",
            "Tags": [],
            "TextMessage": "El cliente solicitó el Task",
            "Type": 4,
            "Metadata": {
                "ServerId": "1",
                "TaskId": "c6a82530ea5742348f08d591921d2bd7",
                "PluginId": "7",
                "ClientId": "9",
                "ProviderId": "1"
            }
}

As you can see, I already have an ID setup.

___________________
Marcelo Volmaro

Marcelo Volmaro

unread,
Nov 24, 2015, 8:56:46 AM11/24/15
to rav...@googlegroups.com
Ahhh... That's not what I understood. So then I'm good with batch inserts and will try them right now... thanks!

___________________
Marcelo Volmaro

Marcelo Volmaro

unread,
Nov 24, 2015, 10:28:18 AM11/24/15
to rav...@googlegroups.com
Ok, updated the code to use bulk inserts. That doesn't change the CPU usage. Now, force-compiling into x64 and using voron made a huge change. Now the CPU is at about 35%, so that change alone cut the cpu usage in half.
Why do I need to force compile into x64? Is there something wrong detecting the CPU? I have other software that makes uses of different .dlls (native versions using p-invoke) and they detect and work fine in x64.

I still believe the CPU usage is high, but 35% is much better than 70% for the same amount of inserts...

___________________
Marcelo Volmaro

Grisha Kotler

unread,
Nov 24, 2015, 12:27:42 PM11/24/15
to rav...@googlegroups.com
<add key="Raven/MaxNumberOfItemsToIndexInSingleBatch" value="256" />
<add key="Raven/MaxNumberOfItemsToPreFetchForIndexing" value="256" />

Why are you using those settings?

Hibernating Rhinos Ltd  cid:image001.png@01CF95E2.8ED1B7D0

Grisha Kotler l RavenDB Core Team Developer Mobile: +972-54-586-8647

RavenDB paving the way to "Data Made Simplehttp://ravendb.net/

Marcelo Volmaro

unread,
Nov 24, 2015, 1:22:53 PM11/24/15
to rav...@googlegroups.com
To try to optimize indexing, as it is what it takes most of the time.

___________________
Marcelo Volmaro

Michael Yarichuk

unread,
Nov 24, 2015, 4:53:43 PM11/24/15
to RavenDB - 2nd generation document database
What happens to CPU usage if you remove those? RavenDB tracks lots of heuristics and changes those (among others) on the fly, depending on the system.


Best regards,

 

Michael Yarichuk

RavenDB Core Team

Tel: 972-4-6227811

Fax:972-153-4-6227811

Email : michael....@hibernatingrhinos.com

 

RavenDB paving the way to "Data Made Simple" http://ravendb.net/  

Michael Yarichuk

unread,
Nov 24, 2015, 4:54:25 PM11/24/15
to RavenDB - 2nd generation document database
depending on the system load that is

Marcelo Volmaro

unread,
Nov 24, 2015, 8:02:28 PM11/24/15
to rav...@googlegroups.com
Absolutely nothing. CPU usage doesn't change at all.

___________________
Marcelo Volmaro

Michael Yarichuk

unread,
Nov 25, 2015, 2:54:58 AM11/25/15
to RavenDB - 2nd generation document database
Is it possible for you to make a stand alone app that reproduces what you are experiencing? I would like to take look with a profiler on my end

Marcelo Volmaro

unread,
Nov 25, 2015, 12:01:39 PM11/25/15
to rav...@googlegroups.com
Sure... where do you want me to upload it? Or send to...

___________________
Marcelo Volmaro

Michael Yarichuk

unread,
Nov 25, 2015, 2:56:23 PM11/25/15
to RavenDB - 2nd generation document database
Easiest would be to zip the solution and send it to our support mail (support (at) ravendb.net).
Alternatively, you can send us the link to  code repository like GitHub, file storage like Dropbox/Google Drive

Marcelo Volmaro

unread,
Nov 25, 2015, 3:38:50 PM11/25/15
to rav...@googlegroups.com
Done. Sent to support.

___________________
Marcelo Volmaro

Oren Eini (Ayende Rahien)

unread,
Nov 26, 2015, 2:46:12 AM11/26/15
to ravendb
Thanks, looking at this now.

Oren Eini (Ayende Rahien)

unread,
Nov 26, 2015, 3:19:53 AM11/26/15
to ravendb
Are you running this while debugging?
Running your sample project with Ctrl+F5, I see roughly 20% - 25% CPU usage.

At any rate, the primary reason for the CPU costs are the map/reduce indexes that you have.
What is happening is that your indexes Tags_ByName  and Applications_ByName require us to do tremendous amount of work.

You are using them to try to find unique tags and application names, but because they are mostly the same (in the sample data there are two apps and 20 tags), it means that we are continuously forced to re-reduce them.

Map/reduce need to keep track of all the intermediary stages, which means that there is a lot of work to be done, and this is effectively the worst case scenario for us.

You can get the same results much more effectively by using the terms from the DebugMessages_ByDefaultIndexes index, which will have the same result, but would be drastically cheaper.

 var apps = store.DatabaseCommands
       .GetTerms(new DebugMessages_ByDefaultIndexes().IndexName, "ApplicationName", null, 10)
       .ToList();

var tags = store.DatabaseCommands
                  .GetTerms(new DebugMessages_ByDefaultIndexes().IndexName, "Tags", null, 10)
                  .ToList();



Marcelo Volmaro

unread,
Nov 26, 2015, 6:31:43 AM11/26/15
to rav...@googlegroups.com
Excellent... Is there a way to get the tags by app using the same method? That would be awesome.
I already guessed that the problem could be those indexes, so I removed them (and manually maintain a dictionary<string, string[]> with the apps/tags) and now the cpu usage is about 9-14%, so much better, but if I can use the above queries and can skip the dictionary hack, much better (as I have to persist it from session to session)...

___________________
Marcelo Volmaro

Oren Eini (Ayende Rahien)

unread,
Nov 26, 2015, 6:34:30 AM11/26/15
to ravendb
You can use facets to do that, yes.

Marcelo Volmaro

unread,
Nov 26, 2015, 8:50:59 AM11/26/15
to rav...@googlegroups.com
Sorry to bother you with this, as now it is not part of the original topic, but I can't seem to make facets works (probably I'm not understanding how they works).

as a test, I have this:
session.Query<DebugMessage, DebugMessages_ByDefaultIndexes>().Where(x => x.ApplicationName == applicationName).Select(x => x.Tags).ToFacets(new[] { new Facet { Name = "Tags" }}
But I don't get any results. Taking out the Select doesn't seems to change anything...

___________________
Marcelo Volmaro

Oren Eini (Ayende Rahien)

unread,
Nov 26, 2015, 9:14:11 AM11/26/15
to ravendb
This looks something like:

   var facetQuery = s.Query<DebugMessage, DebugMessages_ByDefaultIndexes>()
       .Where(x=>x.ApplicationName == "Application_0")
       .ToFacets(new Facet[]
       {
           new Facet<DebugMessage>
           {
               Name = x=>x.Tags
           },
       });

Reply all
Reply to author
Forward
0 new messages