Hi everybody!
I have an issue with $lookups which I would like to describe and ask for help.
Let's say I have two collections, tasks and customers.
Customers have a 1:n relation with tasks via a "customerId" field in customers.
I now have a view where I need to display tasks with customer names. AND I also need to be able to filter and sort for customer names. Which means I can't do the $limit or $match stage before $lookup in the following query.
So here is my example query:
db.task.aggregate([
{
"$match": {
"_deleted": false
}
},
"$lookup": {
"from": "customer",
"let": {
"foreignId": "$customerId"
},
"pipeline": [
{
"$match": {
"$expr": {
"$and": [
{
"$eq": [
"$_id",
"$$foreignId"
]
},
{
"$eq": [
"$_deleted",
false
]
}
]
}
}
}
],
"as": "customer"
},
{
"$unwind": {
"path": "$customer",
"preserveNullAndEmptyArrays": true
}
},
{
"$match": {
"customer.name": 'some_search_string'
}
},
{
"$sort": {
"customer.name": -1
}
},
{
"$limit": 35
},
{
"$project": {
"_id": 1,
"customer._id": 1,
"customer.name": 1,
"description": 1,
"end": 1,
"start": 1,
"title": 1
}
}
])
This query is getting incredibly slow when the collections are growing in size. With 1000 tasks and 20 customers it already takes about 500ms to deliver result.
I'm aware, that this happens because the $lookup operator has to do a tablescan for each row that enters the aggregation pipeline's lookup stage.
I have tried to set indexes like described here: Poor lookup aggregation performance but that doesn't seem to have any impact.
My next guess was that the "sub"-pipeline in the $lookup stage is not capable of using indexes, so I replaced it with a simple
"$lookup": {
"from": "customer",
"localField": "customerId",
"foreignField": "_id",
"as": "customer"
}
But still the indexes are not used or don't have any impact on performance. (To be honest I don't know which of both is the case since .explain() won't work with aggregation pipelines.)
I have tried the following indexes:
I'm grateful for any ideas on what I'm doing wrong or how I could achive the same thing with a better aggregation pipeline.
Additional info: I'm using a three member replica set. I'm on MongoDB 4.0.
Thanks in advance!