Performance of "IN" operator on auto created index

Trevor Hunter

unread,

Feb 24, 2020, 12:53:41 PM2/24/20

to RavenDB - 2nd generation document database

Hi,

We've noticed a large performance gap with the "IN" operator vs standard "AND/OR combinations" (https://ravendb.net/docs/article-page/4.2/csharp/indexes/querying/filtering#where--in)

We're using RavenDB 4.2.8.

Take the following two queries on a relatively simple object model that has a "CustomerId" (guid) and "Status" (Enum) field to query on:

Running the following on our data set (~11m documents) returns in about 2ms:

from Orders where (CustomerId = "0b3b11fe-612e-31b3-9258-0abs0e2bfbba")
and (Status = "Approved" OR Status == "AutoApproved")

Running the following on the same data set takes about 500ms to run and seems to max out all CPU of the DB server if we run with any load:

from Orders where (CustomerId = "0b3b11fe-612e-31b3-9258-0abs0e2bfbba")
and (Status IN ("Approved","AutoApproved"))

The auto index includes the customer id and the status fields. Both queries return the same results, but seem to be very different from a performance point of view.

We can fairly easily refactor the client code to not use the "IN" operator, but it seems strange there would be such a large performance penalty for this.

Regards,

Trev.

Oren Eini (Ayende Rahien)

unread,

Feb 24, 2020, 1:41:16 PM2/24/20

to ravendb

That shouldn't be the case, and off the top of my head, I can't think why it would be so.

Can you reproduce this? Something you can send us to check?

--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ravendb/fa80f26b-42b5-4d88-af58-ef0483cab02c%40googlegroups.com.

--

Oren Eini
CEO / Hibernating Rhinos LTD

Mobile: 972-52-548-6969

Sales: sa...@ravendb.net

Skype: ayenderahine

Support: sup...@ravendb.net

Trevor Hunter

unread,

Feb 24, 2020, 4:02:33 PM2/24/20

to RavenDB - 2nd generation document database

I just sent a C# test harness to Support - ho

Rough outline, with the following object model and a database of 1 million documents:

public enum OrderStatus

{

Pending = 0,

Approved = 1,

AutoApproved = 2

}

public class Order

{

public string Id {get;set;}

public string CustomerId {get;set;}

public OrderStatus Status {get;set;}

}

The Query 1 (IN) below is about 10x slower than Query 2 (AND + OR) (running locally) when "customers/100" DOESN'T EXIST. When it does exist, it's still slower, but marginally. An auto index was created in response to these queries.

Query 1 (slow):

var q = session.Query<Order>()

.Where(o => o.CustomerId == "customers/100", true)

.Where(o=> o.Status.In(new []{OrderStatus.Approved, OrderStatus.AutoApproved}) ).Take(1);

Query 2 (fast):

var q = session.Query<Order>()

.Where(o => o.CustomerId == "customers/100", true)

.Where(o=> o.Status == OrderStatus.Approved || o.Status == OrderStatus.AutoApproved ).Take(1);

In this particular test harness, the performance difference shows itself especially when the customer ID (customers/100) doesn't exist. When the customer ID does exist, the difference is still present, but less. It's as if short-circuit logic works correctly for the logical operators, but not the IN operator.

On Monday, February 24, 2020 at 1:41:16 PM UTC-5, Oren Eini wrote:

That shouldn't be the case, and off the top of my head, I can't think why it would be so.
Can you reproduce this? Something you can send us to check?

On Mon, Feb 24, 2020 at 7:53 PM Trevor Hunter <thu...@kobo.com> wrote:

Hi,

We've noticed a large performance gap with the "IN" operator vs standard "AND/OR combinations" (https://ravendb.net/docs/article-page/4.2/csharp/indexes/querying/filtering#where--in)

We're using RavenDB 4.2.8.

Take the following two queries on a relatively simple object model that has a "CustomerId" (guid) and "Status" (Enum) field to query on:

Running the following on our data set (~11m documents) returns in about 2ms:
from Orders where (CustomerId = "0b3b11fe-612e-31b3-9258-0abs0e2bfbba")
and (Status = "Approved" OR Status == "AutoApproved")
Running the following on the same data set takes about 500ms to run and seems to max out all CPU of the DB server if we run with any load:
from Orders where (CustomerId = "0b3b11fe-612e-31b3-9258-0abs0e2bfbba")
and (Status IN ("Approved","AutoApproved")) 
The auto index includes the customer id and the status fields. Both queries return the same results, but seem to be very different from a performance point of view.
We can fairly easily refactor the client code to not use the "IN" operator, but it seems strange there would be such a large performance penalty for this.

Regards,

Trev.
--
You received this message because you are subscribed to the Google Groups "RavenDB - 2nd generation document database" group.

To unsubscribe from this group and stop receiving emails from it, send an email to rav...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/ravendb/fa80f26b-42b5-4d88-af58-ef0483cab02c%40googlegroups.com.

Oren Eini (Ayende Rahien)

unread,

Feb 26, 2020, 7:26:55 AM2/26/20

to ravendb

Thanks, we are looking into this

To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ravendb/b144ac34-a544-4831-928c-7e38e6216f07%40googlegroups.com.

Reply all

Reply to author

Forward