Incredibly slow query performance with $lookup and “sub” aggregation pipeline


stefan.hol...@gmail.com

Feb 3, 2019, 8:20:09 PM
to mongodb-user

Hi everybody!


I have an issue with $lookup which I would like to describe and ask for help with.


Let's say I have two collections, tasks and customers.


Customers have a 1:n relation with tasks via a "customerId" field on the task documents.


I now have a view where I need to display tasks together with customer names, AND I also need to be able to filter and sort by customer name. This means I can't put the $limit or the customer-name $match stage before the $lookup in the following query.
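
For illustration, documents in the two collections look roughly like this (the values and types are invented for this example; only the fields that appear in the query matter):

// task collection - customerId references customer._id
{
    "_id": ObjectId("5c56f1d2e7179a689ca64c01"),
    "title": "Prepare offer",
    "description": "Prepare the offer for project X",
    "start": ISODate("2019-02-01T08:00:00Z"),
    "end": ISODate("2019-02-01T10:00:00Z"),
    "customerId": ObjectId("5c56f1d2e7179a689ca64b10"),
    "_deleted": false
}

// customer collection
{
    "_id": ObjectId("5c56f1d2e7179a689ca64b10"),
    "name": "ACME Corp",
    "_deleted": false
}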

So here is my example query:


db.task.aggregate([
    {
        "$match": {
            "_deleted": false
        }
    },
    "$lookup": {
        "from": "customer",
        "let": {
            "foreignId": "$customerId"
        },
        "pipeline": [
            {
                "$match": {
                    "$expr": {
                        "$and": [
                            {
                                "$eq": [
                                    "$_id",
                                    "$$foreignId"
                                ]
                            },
                            {
                              "$eq": [
                                "$_deleted",
                                false
                              ]
                            }
                        ]
                    }
                }
            }
        ],
        "as": "customer"
    },
    {
        "$unwind": {
            "path": "$customer",
            "preserveNullAndEmptyArrays": true
        }
    },
    {
        "$match": {
            "customer.name": 'some_search_string'
        }
    },
    {
        "$sort": {
            "customer.name": -1
        }
    },
    {
        "$limit": 35
    },
    {
        "$project": {
            "_id": 1,
            "customer._id": 1,
            "customer.name": 1,
            "description": 1,
            "end": 1,
            "start": 1,
            "title": 1
        }
    }
])


This query gets incredibly slow as the collections grow in size. With 1000 tasks and 20 customers it already takes about 500 ms to deliver a result.


I'm aware that this happens because the $lookup operator has to do a collection scan for each document that enters the aggregation pipeline's $lookup stage.


I have tried creating indexes as described here: "Poor lookup aggregation performance", but that doesn't seem to have any impact.


My next guess was that the "sub" pipeline in the $lookup stage is not able to use indexes, so I replaced it with a simple:

"$lookup": {
    "from": "customer",
    "localField": "customerId",
    "foreignField": "_id",
    "as": "customer"
}

But still the indexes are either not used or don't have any impact on performance. (To be honest I don't know which of the two is the case, since .explain() won't work with aggregation pipelines.)

I have tried the following indexes (the single-field ascending variants are sketched below):

  • Ascending, descending, hashed and text index on customerId
  • Ascending, descending, hashed and text index on customer.name
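
In shell terms, the ascending variants were created roughly like this (a sketch; index names and options are the defaults):

// task collection - the local side of the $lookup
db.task.createIndex({ "customerId": 1 })

// customer collection - the field I filter and sort on
db.customer.createIndex({ "name": 1 })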

I'm grateful for any ideas on what I'm doing wrong or how I could achieve the same thing with a better aggregation pipeline.


Additional info: I'm using a three member replica set. I'm on MongoDB 4.0.


Thanks in advance!

Robert Cochran

Feb 3, 2019, 8:30:28 PM
to mongodb-user
Hi,

Could you please:

  • Provide sample collections, each populated with a few documents. 
  • List the exact version of MongoDB you are using, including the minor version. For instance, I use MongoDB version 4.0.5. What is yours? The MongoDB team applies a lot of fixes between minor versions, so stating your exact version can be very helpful.
  • List the operating system(s) you are using.

Thanks so much

Bob

stefan.hol...@gmail.com

Feb 4, 2019, 9:14:55 AM
to mongodb-user
Hi Bob!

Thanks for your response.

My MongoDB Version is 4.0.1.

Operating System is Ubuntu 16.04.5 LTS.

I attached some sample data as text files.

Best,
Stefan
customer_example.txt
task_example.txt

Asya Kamsky

Feb 4, 2019, 7:37:37 PM
to mongodb-user
Don't you have this backwards? Wouldn't you be looking up tasks
related to customer?

Can a task exist without being attached to a customer? If not, then
I'm not sure why you would use the preserveNullAndEmptyArrays: true
option on $unwind.
I can promise you that an index is being used in the simple case
with localField/foreignField, because you are doing a $lookup by "_id"
and there is always an index on _id.

Regardless of all of this, I'm trying to understand why you only store
the customer ID with the task. If you need to display the name with the
task (and/or filter on name), why not store the customer name on the
task along with customerId?

Asya



--
Asya Kamsky
Principal Product Manager
MongoDB
Download MongoDB - mongodb.org/downloads
We're Hiring! - https://www.mongodb.com/careers

stefan.hol...@gmail.com

Feb 5, 2019, 2:55:52 AM
to mongodb-user
Hi Asya!

No, I have a list of tasks and also want to display the names of the attached customers (and be able to filter/sort by them).

Yes, a task can also have no customer attached.

Of course I could denormalize the customer's name into the task collection, but since I'm building a real-time application and the customer name is also subject to change, I can't update the whole tasks collection on every keystroke. And I don't really want to duplicate all the data.

So you mean the default index on the _id field in the customer collection should be sufficient? If 500 ms for a 1000x500 rows lookup is normal, that's very bad news for MongoDB in our project.

Thank you!

Asya Kamsky

Feb 20, 2019, 7:25:28 PM
to mongodb-user
> If 500 ms for a 1000x500 rows lookup is normal, that's very bad news for MongoDB in our project.

That's because you are choosing to do a join.

Are you really worried about the customer name changing? Is that
something that happens frequently? I think this is a case where you
are trying to optimize for something that *might* happen (the customer's
name changing), and maybe not particularly frequently, rather than
optimizing the performance of the most frequent path.
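
If the rename does happen occasionally, propagating it is a single
statement (a sketch, assuming a denormalized "customerName" field on the
task documents, which is not in your current schema):

db.task.updateMany(
    { "customerId": renamedCustomerId },   // placeholder for the customer's _id
    { "$set": { "customerName": "New Customer Name" } }
)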

Asya

stefan.hol...@gmail.com

Feb 21, 2019, 5:38:55 AM
to mongodb-user
Hi Asya!

Thanks for your response.

Maybe task/customer was a bad example, but there are others, like task/invoice, that I can't work around.

I know that joins do a complete collection scan for every entry (still, I think that 500 ms for a 1000x20 rows request is very slow), but that was not my concern.

The problem is that indexes have no impact at all on my $lookup stage. Whatever index I create, the request takes the same time.

Best,
Stefan

Asya Kamsky

Mar 3, 2019, 5:47:47 PM
to mongodb-user
It is not correct to say joins don't use indexes. $lookup does a
find and it *will* use an index if there is one.
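
One way to sanity-check that is to look at the explain output for the
aggregation (a sketch from the shell; note that the index used inside the
$lookup may not be visible in explain output on all versions, but the plan
for the initial $match will be):

db.task.explain().aggregate([
    { "$match": { "_deleted": false } },
    {
        "$lookup": {
            "from": "customer",
            "localField": "customerId",
            "foreignField": "_id",
            "as": "customer"
        }
    }
])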

What version are you running?

Asya

stefan.hol...@gmail.com

Mar 11, 2019, 11:16:30 AM
to mongodb-user
Hello Asya!

Sorry for the delay.
I found out why my indexes were not working: my collections (and indexes) were set to a different collation than the one my client / queries were using.

I corrected that and now the indexes are used.
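
(In case anyone else runs into this: for string fields, an index is only used when the query runs with the same collation the index was built with. A rough sketch of the two pieces that have to match - the locale is just an example:)

// index created with an explicit collation
db.customer.createIndex({ "name": 1 }, { "collation": { "locale": "de" } })

// the aggregation must request the same collation for that index to be used
db.task.aggregate([ /* ... pipeline ... */ ], { "collation": { "locale": "de" } })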

The performance has now changed for the better, but the queries are still very slow once I exceed a few hundred documents in one of the collections (I'm not talking about thousands - only about 1200 entries in the base collection and 11 in the joined one).

Is there any chance that $lookup performance will improve in a release in the near future? Or do you have any other idea how to handle my query differently (aside from denormalising)?

Thank you very much!

Best,
Stefan