$lookup pipeline match short circuit

49 views
Skip to first unread message

Matt C

unread,
Jun 6, 2018, 6:04:26 PM6/6/18
to mongodb-user
Hello all, my use case is this:

I have two collections, `events` and `operations`.  Events have a `type` and only certain event types have operations associated with them.  I would like to create an aggregate pipeline such that the `operations` collection is only queried when the event type is 'MySpecialEventType'.  This is the latest thing I've tried:

collection.aggregate([
    {$match: query},
    {$lookup: {
      from: operationCollection,
      let: {eventHash: '$eventHash', type: '$event.type'},
      pipeline: [
        {$match: {$expr: {$and: [
          {$eq: ['$$type', 'MySpecialEventType']},
          {$eq: ['$meta.eventHash', '$$eventHash']}
        ]}}},
        {$sort: {'meta.eventOrder': 1}},
        {$replaceRoot: {newRoot: "$operation"}}
      ],
      as: 'event.operation'
    }},
  ])

This query produces the proper result.  However, by analyzing the performance of the query with and without the $$type comparison, it appears that a $lookup query is still being performed even when the event type is not 'MySpecialEventType'.

If there is a proper/different/better way to approach this problem, great!

If not, it seems to me that mongo could evaluate the $match expressions involving local variables first and short circuit the whole operation if the conditions are not met.


Marius De Wit

unread,
Jun 7, 2018, 4:32:06 AM6/7/18
to mongod...@googlegroups.com
Hi Matt

So for this query, only documents in the operationCollection that has a type of 'MySpecialEventType' and the specified meta.eventHash should be selected. For this to work, all documents in your operationCollection will have to be checked to see if it meets the criteria. If you have an index defined on these two fields, then the lookup should go a lot quicker as only the index will have to be checked. Maybe use the mongodb profiling functionality to check that your index is indeed used.

Some more notes on performance: Ive found that lookup isnt very fast and this is especially noticeable when joining big tables of several gigabytes together. In certain cases it is better to do two separate queries to the db. First, do a query just to get your eventHash and event type and store these values client side. Then do another query to the operationCollection with a simple match with these values. 

If you need an even faster result, doing 2 separate queries also gives you the option to speed up things even further if you ensure that the second query on the operationCollection is a covered query. This can be done by adding a project stage to only include fields that are present in the operationCollection index. 

Hope this helps a bit
Marius

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+unsubscribe@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/9922fa68-01aa-4735-887b-387fb70655a5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matt C

unread,
Jun 7, 2018, 9:16:39 AM6/7/18
to mongodb-user
Hi Marius, thanks for all the tips, I certainly have other use cases where I can apply that information.  Let me restate what I am trying to do. 

I would like to prevent the query on the `operationCollection` altogether when `event.type` (from the parent, event collection) is *not* MySpecialEventType.  I believe I am storing a local pipeline variable `type` to store `event.type` and comparing that to a string literal in the expression involving $$type.  It is common for programming languages to process AND statements from left to right and simply fail the entire operation if/when it finds a conditional that is not truthy.  Since the first conditional in this case only involves a localVariable, then that condition can be evaluated as false and the rest of the operation can be safely aborted, eliminating the penalty of querying the `operationCollection`.  So I would like to determine: 
  1. am i doing it wrong (syntax error? / wrong API?)
  2. am I correct in my assessment that mongodb does not currently short circuit expressions involving local variables in the way that I have just described
  3. if 2 is true, then do other people agree with me that this would be a good enhancement request?
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.

Marius De Wit

unread,
Jun 7, 2018, 10:05:45 AM6/7/18
to mongod...@googlegroups.com
Ah ok now I understand, apologies for misreading your original question.

You probably dont want do the match before lookup, as that will remove the rows that does not have type as 'MySpecialEventType'. What you want is for the join to only happen if type = MySpecialEventType, but still keep the other rows as well.

I also think you are correct in saying that MongoDB does not take the short circuit possibility into consideration. As I understand it the order inside an $and expression is not taken into consideration. 

The only way that I can think of (to force mongo not to do the lookup for all rows) is splitting your pipeline into two different facets. One Facet will have a match in front that will only allow records through that has type equal to MySpecialEventType. After the match, do the lookup as shown above, except dont match inside the lookup on $$type again. The other facet will have a match in front that only allows records through with type not equal to MySpecialEventType. In your main pipeline, merge the results together that was obtained from the two facets. 

Im not sure if this would be faster than what you already have however!


To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+unsubscribe@googlegroups.com.

To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.

Matt C

unread,
Jun 7, 2018, 10:12:10 AM6/7/18
to mongod...@googlegroups.com
Yes, I did explore the $facet approach you described, but did not pursue it to the point of trying to assemble the facets into a consistent document.  I guess I'll need to explore that further.

You received this message because you are subscribed to a topic in the Google Groups "mongodb-user" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mongodb-user/RI5NrmdDHY8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mongodb-user+unsubscribe@googlegroups.com.

To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.

For more options, visit https://groups.google.com/d/optout.



--
Matt Collier

Reply all
Reply to author
Forward
Message has been deleted
0 new messages