Best way to filter out documents without a field using partial filter expression

770 views
Skip to first unread message

Kristijan Novoselić

unread,
May 23, 2017, 10:47:20 AM5/23/17
to mongodb-user
Hi!

I'm trying to create a unique index over two fields in a document, but only where second field is not null. This is a little tricky because partial_filter_expression has a limited operator support, and doesn't support negations.
I've come up three different ways and I'm not sure which one is the best. Let's say the fields of interest are first_name and last_name. I'd like to skip unique index if last_name is null.

1) Preprocess the document slightly in app code before inserting into MongoDB to make sure that last_name field isn't created if it's null. In this case I can do
partial_filter_expression: {last_name: {'$exists': true}}
2) Filter by field type, as last_name will always be a string. This will skip over null values:
partial_filter_expression: {last_name: {'$type': 'string'}}
3) Use a "trick" to filter out null values by comparing strings:
partial_filter_expression: {last_name: {'$gt': ''}}

Which of the filters above should I prefer? Are there any performance (or other) issues I need to consider? 

Thanks!

Asya Kamsky

unread,
May 23, 2017, 10:54:34 AM5/23/17
to mongodb-user
Interesting question.  I have questions for you which may influence the answer.

Is the purpose of this index only to provide uniqueness enforcement for combination of First name Last name fields or is it also meant to be used for fast search for records?   If the latter, then partial index expression must be present in every query that you intend to be able to use this index.   It can be present *implicitly* and it turns out that '$type':'string' expression is one that will *not* use this index when you query with {first_name:'value', last_name:'value2'} but the $exists and $gt:'' expression indexes will use this index.   This might be a bug, but still, something to consider.

As far as performance, I don't think the difference (if any exists) between those other two expressions would be large enough to be measurable.

Asya


--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+unsubscribe@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/a01f8cdc-10aa-484d-87f2-0fd809cd2fa9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Asya Kamsky
Lead Product Manager
MongoDB
Download MongoDB - mongodb.org/downloads
We're Hiring! - https://www.mongodb.com/careers

Kristijan Novoselić

unread,
Jun 2, 2017, 2:33:10 AM6/2/17
to mongodb-user
Thank you for your response. Somehow I didn't get the notification so I'm a little late. The purpose is only to ensure uniqueness. I'll keep this in mind if need to use it for filtering as well. Thanks again!


On Tuesday, May 23, 2017 at 4:54:34 PM UTC+2, Asya Kamsky wrote:
Interesting question.  I have questions for you which may influence the answer.

Is the purpose of this index only to provide uniqueness enforcement for combination of First name Last name fields or is it also meant to be used for fast search for records?   If the latter, then partial index expression must be present in every query that you intend to be able to use this index.   It can be present *implicitly* and it turns out that '$type':'string' expression is one that will *not* use this index when you query with {first_name:'value', last_name:'value2'} but the $exists and $gt:'' expression indexes will use this index.   This might be a bug, but still, something to consider.

As far as performance, I don't think the difference (if any exists) between those other two expressions would be large enough to be measurable.

Asya

On Tue, May 23, 2017 at 3:55 AM, Kristijan Novoselić <kristijan...@gmail.com> wrote:
Hi!

I'm trying to create a unique index over two fields in a document, but only where second field is not null. This is a little tricky because partial_filter_expression has a limited operator support, and doesn't support negations.
I've come up three different ways and I'm not sure which one is the best. Let's say the fields of interest are first_name and last_name. I'd like to skip unique index if last_name is null.

1) Preprocess the document slightly in app code before inserting into MongoDB to make sure that last_name field isn't created if it's null. In this case I can do
partial_filter_expression: {last_name: {'$exists': true}}
2) Filter by field type, as last_name will always be a string. This will skip over null values:
partial_filter_expression: {last_name: {'$type': 'string'}}
3) Use a "trick" to filter out null values by comparing strings:
partial_filter_expression: {last_name: {'$gt': ''}}

Which of the filters above should I prefer? Are there any performance (or other) issues I need to consider? 

Thanks!

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages