Slow query performance when bulk inserting


Floris Robbemont

Mar 22, 2017, 12:17:58 PM
to RavenDB - 2nd generation document database
Hi,

We are experiencing slow queries while running a large bulk insert. The total size is approximately 12 million documents.

Without the bulk insert everything is fast (< 100 ms). During the bulk, however, queries on indexes related to the bulk (covering the same collection the bulk insert writes to) drop to 8 seconds! The index we're querying is pretty basic: map-only, no LoadDocument calls. It does have 9 different sort columns defined.

The load on the server is not that high during the bulk (20-30%). It has 16 cores and 16 GB RAM. The bulk insert is being done from a different machine. All storage is RAID-10 with 6 SSDs.

I'm wondering why this is happening. I don't mind stale data, but 8 seconds per query can't be right. I also expected higher performance from this kind of hardware.

Is there anything we can do about this?

Oren Eini (Ayende Rahien)

Mar 22, 2017, 3:41:48 PM
to ravendb
Are you using WaitForNonStaleResults in the query?
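For readers following along: the customization being asked about forces a query to wait until the index catches up with recent writes. A minimal sketch against the 3.5-era .NET client (the Product type and "Products/Search" index name are placeholders, not from this thread):

```csharp
using System;
using System.Linq;
using Raven.Client;
using Raven.Client.Document;

public class Product { public string Brand { get; set; } }

public static class StaleQueryExample
{
    public static void Run()
    {
        // Sketch only: server URL, type, and index name are placeholders.
        using (IDocumentStore store = new DocumentStore { Url = "http://localhost:8080" }.Initialize())
        using (var session = store.OpenSession())
        {
            // WaitForNonStaleResultsAsOfNow blocks until the index has caught
            // up with writes made before the query started; during a large
            // bulk insert this alone can explain multi-second query times.
            var results = session.Query<Product>("Products/Search")
                .Customize(x => x.WaitForNonStaleResultsAsOfNow(TimeSpan.FromSeconds(15)))
                .ToList();
        }
    }
}
```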

Hibernating Rhinos Ltd

Oren Eini | CEO | Mobile: +972-52-548-6969

Office: +972-4-622-7811 | Fax: +972-153-4-622-7811

--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Floris Robbemont

Mar 22, 2017, 8:03:26 PM
to RavenDB - 2nd generation document database
No. The only things we're doing are NoTracking and NoCaching, plus a SelectFields call.

But it's mostly the facets query that takes more than 8 seconds. The results query takes about a second.

If we use the direct query URL in a browser it's still very slow during the bulk. So that would mean it's not client related.
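For reference, a query shaped like the one described (NoTracking, NoCaching, a SelectFields projection, plus a facet call against the same index) might look roughly like this in the 3.5 .NET client; all type, index, and facet-setup names below are placeholders, not taken from the thread:

```csharp
using System.Linq;
using Raven.Client;
using Raven.Client.Document;

public class Tyre { public string Brand { get; set; } public string Width { get; set; } }
public class TyreView { public string Brand { get; set; } public string Width { get; set; } }

public static class FacetQueryExample
{
    public static void Run(IDocumentStore store)
    {
        using (var session = store.OpenSession())
        {
            // Results query: no session tracking, no HTTP caching, projection only.
            var results = session.Advanced.DocumentQuery<Tyre>("Tyres/Search")
                .NoTracking()
                .NoCaching()
                .SelectFields<TyreView>("Brand", "Width")
                .ToList();

            // Facets query against the same index, driven by a stored facet
            // setup document -- the call reported as slow in this thread.
            var facets = session.Query<Tyre>("Tyres/Search")
                .ToFacets("facets/TyreFacets");
        }
    }
}
```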

Oren Eini (Ayende Rahien)

Mar 22, 2017, 8:18:44 PM
to ravendb
Can you run this under a profiler?


Flo...@lucrasoft.nl

Mar 23, 2017, 4:21:53 AM
to RavenDB - 2nd generation document database
It seems that the initial call (after something has changed in the database) is also very slow. If documents are constantly changing during a bulk insert, that might be the cause.
Does that mean it's somewhere in the facet and/or query logic on the server?


Flo...@lucrasoft.nl

Mar 23, 2017, 4:21:54 AM
to RavenDB - 2nd generation document database
What type of profiler? It'll take a bit of setup, but I might be able to run a remote profiler with dotTrace and send you the details.
Do you want a trace of my application, or of RavenDB?




Flo...@lucrasoft.nl

Mar 23, 2017, 4:21:54 AM
to RavenDB - 2nd generation document database
It is specifically the facets query that's running for over 8 seconds (sometimes even more).

When I run the same query in the studio, it's fast as well. But then I'm not defining any fetches (that only happens when the query is generated by the RavenDB .NET client).

Info: we're running the latest stable 3.5 version


Oren Eini (Ayende Rahien)

Mar 23, 2017, 5:08:38 AM
to ravendb
DotTrace would be great, for RavenDB, yes.

Oren Eini (Ayende Rahien)

Mar 23, 2017, 5:09:12 AM
to ravendb
If you are constantly forcing updates on the index, we have to do quite a bit of work on each facet call and don't benefit from any caching along the way.
That might be it, yes.

Oren Eini (Ayende Rahien)

Mar 23, 2017, 5:09:36 AM
to ravendb
What does the facet doc look like?


Flo...@lucrasoft.nl

Mar 23, 2017, 5:59:55 AM
to RavenDB - 2nd generation document database
I can't upload the profiler results here. How do you want to receive them?

It's running on build 35191.

This is the facet setup document (FacetDoc) for the documents we're querying on:

{
    "Facets": [
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "Brand",
            "DisplayName": "brands",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "BrandType",
            "DisplayName": "brandtype",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "Dot",
            "DisplayName": "dot",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "Demo",
            "DisplayName": "demo",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "Retreaded",
            "DisplayName": "retreaded",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": true
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "LIndex",
            "DisplayName": "loadindex",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "SIndex",
            "DisplayName": "speedindex",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "LoadSpeed",
            "DisplayName": "loadspeed",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "Width",
            "DisplayName": "width",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "Height",
            "DisplayName": "height",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "Inch",
            "DisplayName": "inch",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "PurposeType",
            "DisplayName": "purposetype",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "RunFlat",
            "DisplayName": "runflat",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "Tube",
            "DisplayName": "tube",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "VehicleType",
            "DisplayName": "vehicletype",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "VehicleGroups",
            "DisplayName": "vehiclegroups",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        },
        {
            "Mode": "Default",
            "Aggregation": "None",
            "AggregationField": null,
            "AggregationType": null,
            "Name": "Sizes",
            "DisplayName": "sizes",
            "Ranges": [],
            "MaxResults": null,
            "TermSortMode": "ValueAsc",
            "IncludeRemainingTerms": false
        }
    ]
}




Flo...@lucrasoft.nl

Mar 23, 2017, 6:59:11 AM
to RavenDB - 2nd generation document database
I've uploaded the traces to a download location for you: https://www.lucrasoft.nl/downloads/Raven.Server.Profiler.zip

There are 3 different traces in there, all containing queries (both normal and facets) during a heavy process (in this case multiple document patches, multithreaded across around 8 threads). The RavenDB server at that time had around 70% CPU to *spare*.

Raven.Server.dtp = Sampling trace with CPU instruction timings
Raven.Server.PerformanceCounters.dtp = Sampling trace with performance counter measurements
Raven.Server.TimeLine.dtt = Timeline trace


Oren Eini (Ayende Rahien)

Mar 23, 2017, 7:14:20 AM
to ravendb
This is the relevant part:

[Inline image 1]

The issue is likely that you are running into the worst-case scenario.

Facets are pretty cheap in general, because you can do most of the work once and then just compute the difference.
The problem in your case is that the bulk insert generates a lot of independent large index batches, and we need to compute this for each of them.
The problem is a bit worse because by the next request you make, we have even more index batches (and index readers), so we have to do even more work.

Flo...@lucrasoft.nl

Mar 23, 2017, 7:21:39 AM
to RavenDB - 2nd generation document database
Thanks for the quick reply!

A little bit of background information:

We have two types of syncs running:
- One nightly sync, which does a bulk insert on the entire collection and removes untouched documents. This usually runs for an hour or so, and only outside business hours.
- A quarterly sync, which updates prices and stocks (done by sending multiple batches).

The nightly sync is not really a problem, because it happens when there are no users on the website. The quarterly sync, however, is.

I don't know if you might be able to optimize for this, but the fields we're updating with the quarterly sync are *not* in the facet document. We only update prices and stock during the day; the rest is done at night.
So if you could detect whether the facet values have actually changed, you might be able to determine whether a facet cache update is needed. However, I don't know if that's possible.


Oren Eini (Ayende Rahien)

Mar 23, 2017, 10:36:11 AM
to ravendb
No, we don't have a good way to do that. However, you can write them to a separate document and facet on that; then you only pay this cost overnight.
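The split being suggested here could be sketched as two document types; every class and property name below is illustrative, not taken from the thread:

```csharp
// Hypothetical split: facetable attributes live in a document that only the
// nightly sync writes, while volatile price/stock data lives in a separate
// document the intraday sync updates. The facet index covers TyreSpec only,
// so intraday price updates never invalidate the facet results.
public class TyreSpec            // written by the nightly sync only
{
    public string Id { get; set; }          // e.g. "tyrespecs/1"
    public string Brand { get; set; }
    public string Width { get; set; }
    public string Height { get; set; }
}

public class TyrePricing         // updated by the intraday sync
{
    public string Id { get; set; }          // e.g. "tyrepricings/1"
    public string TyreSpecId { get; set; }  // link back to the spec document
    public decimal Price { get; set; }
    public int Stock { get; set; }
}
```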

Flo...@lucrasoft.nl

Mar 25, 2017, 12:38:12 PM
to RavenDB - 2nd generation document database
OK, we've changed our solution to use two different documents: one with prices and one without.
The documents without price and stock information serve as facets and will not change during the day. The other documents are used for the actual search results.

We're now getting good performance on both (facets and query).

One other thing though:

I was monitoring the server while syncing and doing the batch updates, and on both occasions the CPU was spiking to 70-80 percent while indexing. After looking at the indexing stats, it seems RavenDB was using 32 threads to index. The server only has 16 cores, so the thread count is being doubled somewhere.
I limited this to 8 threads (by setting MaxIndexTasks to 4), and the server seems to be behaving a lot better. I'm not seeing any drop in indexing performance; as a matter of fact, it's running faster than before.

My conclusion (not having all the information about the inner workings of RavenDB) is that threads are fighting over resources somewhere (contention). I don't know how to counteract this with just the settings. Does this mean the storage is not fast enough?

Anyway... it's fixed now with the new solution. Thanks for helping.
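For anyone reproducing this fix: in RavenDB 3.x the indexing-parallelism cap is a server configuration setting. A sketch of Raven.Server.exe.config, assuming the key is Raven/MaxNumberOfParallelProcessingTasks (the poster refers to it as MaxIndexTasks; verify the exact key name against your build's documentation):

```xml
<!-- Sketch only; check the exact key name for your RavenDB version. -->
<configuration>
  <appSettings>
    <!-- Cap parallel index processing at 4 tasks instead of 2x core count. -->
    <add key="Raven/MaxNumberOfParallelProcessingTasks" value="4"/>
  </appSettings>
</configuration>
```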


Oren Eini (Ayende Rahien)

Mar 25, 2017, 5:00:43 PM
to ravendb
We typically use double the number of cores on the machine, because we typically see stalls in such scenarios and want to ensure more work is done by more threads.
But that should have been limited by your license.