mongodb regex without prefix performance issue

143 views
Skip to first unread message

Shady Shaker

unread,
Dec 22, 2015, 6:27:13 AM12/22/15
to mongodb-user
I have performance issue when I use regex without prefix match to find any part in text or the end of text (find queries with explain plan below)
as per mongodb performance for regex search will be slow unless we use preffix match, but in my case I need to do search without prefix match.

what is the best way to get the best performance for regex search without prefix


db.PERSON.find( {"arNames.arName" : {$regex: "XAVIER"}}).explain("executionStats");
db.PERSON.find( {"arNames.arName" : {$regex: "XAVIER$"}}).explain("executionStats");
db.PERSON.find( {"arNames.arName" : {$regex: "^XAVIER"}}).explain("executionStats");

1) any part of text

db.PERSON.find( {"arNames.arName" : {$regex: "XAVIER"}}).explain("executionStats");

    "executionStats" : {
        "executionSuccess" : true, 
        "nReturned" : NumberInt(1397), 
        "executionTimeMillis" : NumberInt(70877), 
        "totalKeysExamined" : NumberInt(1730874), 
        "totalDocsExamined" : NumberInt(1414867), 
        "executionStages" : {
            "stage" : "KEEP_MUTATIONS", 
            "nReturned" : NumberInt(1397), 
            "executionTimeMillisEstimate" : NumberInt(70600), 
            "works" : NumberInt(1730875), 
            "advanced" : NumberInt(1397), 
            "needTime" : NumberInt(1729477), 
            "needFetch" : NumberInt(0), 
            "saveState" : NumberInt(13526), 
            "restoreState" : NumberInt(13526), 
            "isEOF" : NumberInt(1), 
            "invalidates" : NumberInt(0), 
            "inputStage" : {
                "stage" : "FETCH", 
                "filter" : {
                    "arNames.arName" : /XAVIER/
                }, 
                "nReturned" : NumberInt(1397), 
                "executionTimeMillisEstimate" : NumberInt(70560), 
                "works" : NumberInt(1730875), 
                "advanced" : NumberInt(1397), 
                "needTime" : NumberInt(1729477), 
                "needFetch" : NumberInt(0), 
                "saveState" : NumberInt(13526), 
                "restoreState" : NumberInt(13526), 
                "isEOF" : NumberInt(1), 
                "invalidates" : NumberInt(0), 
                "docsExamined" : NumberInt(1414867), 
                "alreadyHasObj" : NumberInt(0), 
                "inputStage" : {
                    "stage" : "IXSCAN", 
                    "nReturned" : NumberInt(1414867), 
                    "executionTimeMillisEstimate" : NumberInt(66820), 
                    "works" : NumberInt(1730874), 
                    "advanced" : NumberInt(1414867), 
                    "needTime" : NumberInt(316007), 
                    "needFetch" : NumberInt(0), 
                    "saveState" : NumberInt(13526), 
                    "restoreState" : NumberInt(13526), 
                    "isEOF" : NumberInt(1), 
                    "invalidates" : NumberInt(0), 
                    "keyPattern" : {
                        "arNames.arName" : NumberInt(1)
                    }, 
                    "indexName" : "arNames.arName_1", 
                    "isMultiKey" : true, 
                    "direction" : "forward", 
                    "indexBounds" : {
                        "arNames.arName" : [
                            "[\"\", {})", 
                            "[/XAVIER/, /XAVIER/]"
                        ]
                    }, 
                    "keysExamined" : NumberInt(1730874), 
                    "dupsTested" : NumberInt(1730874), 
                    "dupsDropped" : NumberInt(316007), 
                    "seenInvalidated" : NumberInt(0), 
                    "matchTested" : NumberInt(0)
                }
            }
        }
    } 
    
    
2) end of text
db.PERSON.find( {"arNames.arName" : {$regex: "XAVIER$"}}).explain("executionStats");
    "executionStats" : {
        "executionSuccess" : true, 
        "nReturned" : NumberInt(0), 
        "executionTimeMillis" : NumberInt(66463), 
        "totalKeysExamined" : NumberInt(1730874), 
        "totalDocsExamined" : NumberInt(1414867), 
        "executionStages" : {
            "stage" : "KEEP_MUTATIONS", 
            "nReturned" : NumberInt(0), 
            "executionTimeMillisEstimate" : NumberInt(66080), 
            "works" : NumberInt(1730875), 
            "advanced" : NumberInt(0), 
            "needTime" : NumberInt(1730874), 
            "needFetch" : NumberInt(0), 
            "saveState" : NumberInt(13526), 
            "restoreState" : NumberInt(13526), 
            "isEOF" : NumberInt(1), 
            "invalidates" : NumberInt(0), 
            "inputStage" : {
                "stage" : "FETCH", 
                "filter" : {
                    "arNames.arName" : /XAVIER$/
                }, 
                "nReturned" : NumberInt(0), 
                "executionTimeMillisEstimate" : NumberInt(66050), 
                "works" : NumberInt(1730875), 
                "advanced" : NumberInt(0), 
                "needTime" : NumberInt(1730874), 
                "needFetch" : NumberInt(0), 
                "saveState" : NumberInt(13526), 
                "restoreState" : NumberInt(13526), 
                "isEOF" : NumberInt(1), 
                "invalidates" : NumberInt(0), 
                "docsExamined" : NumberInt(1414867), 
                "alreadyHasObj" : NumberInt(0), 
                "inputStage" : {
                    "stage" : "IXSCAN", 
                    "nReturned" : NumberInt(1414867), 
                    "executionTimeMillisEstimate" : NumberInt(62360), 
                    "works" : NumberInt(1730874), 
                    "advanced" : NumberInt(1414867), 
                    "needTime" : NumberInt(316007), 
                    "needFetch" : NumberInt(0), 
                    "saveState" : NumberInt(13526), 
                    "restoreState" : NumberInt(13526), 
                    "isEOF" : NumberInt(1), 
                    "invalidates" : NumberInt(0), 
                    "keyPattern" : {
                        "arNames.arName" : NumberInt(1)
                    }, 
                    "indexName" : "arNames.arName_1", 
                    "isMultiKey" : true, 
                    "direction" : "forward", 
                    "indexBounds" : {
                        "arNames.arName" : [
                            "[\"\", {})", 
                            "[/XAVIER$/, /XAVIER$/]"
                        ]
                    }, 
                    "keysExamined" : NumberInt(1730874), 
                    "dupsTested" : NumberInt(1730874), 
                    "dupsDropped" : NumberInt(316007), 
                    "seenInvalidated" : NumberInt(0), 
                    "matchTested" : NumberInt(0)
                }
            }
        }
    }
           
           
3) with prefix
db.PERSON.find( {"arNames.arName" : {$regex: "^XAVIER"}}).explain("executionStats");
    "executionStats" : {
        "executionSuccess" : true, 
        "nReturned" : NumberInt(1397), 
        "executionTimeMillis" : NumberInt(7), 
        "totalKeysExamined" : NumberInt(1398), 
        "totalDocsExamined" : NumberInt(1397), 
        "executionStages" : {
            "stage" : "FETCH", 
            "nReturned" : NumberInt(1397), 
            "executionTimeMillisEstimate" : NumberInt(10), 
            "works" : NumberInt(1398), 
            "advanced" : NumberInt(1397), 
            "needTime" : NumberInt(0), 
            "needFetch" : NumberInt(0), 
            "saveState" : NumberInt(10), 
            "restoreState" : NumberInt(10), 
            "isEOF" : NumberInt(1), 
            "invalidates" : NumberInt(0), 
            "docsExamined" : NumberInt(1397), 
            "alreadyHasObj" : NumberInt(0), 
            "inputStage" : {
                "stage" : "IXSCAN", 
                "nReturned" : NumberInt(1397), 
                "executionTimeMillisEstimate" : NumberInt(10), 
                "works" : NumberInt(1398), 
                "advanced" : NumberInt(1397), 
                "needTime" : NumberInt(0), 
                "needFetch" : NumberInt(0), 
                "saveState" : NumberInt(10), 
                "restoreState" : NumberInt(10), 
                "isEOF" : NumberInt(1), 
                "invalidates" : NumberInt(0), 
                "keyPattern" : {
                    "arNames.arName" : NumberInt(1)
                }, 
                "indexName" : "arNames.arName_1", 
                "isMultiKey" : true, 
                "direction" : "forward", 
                "indexBounds" : {
                    "arNames.arName" : [
                        "[\"XAVIER\", \"XAVIES\")", 
                        "[/^XAVIER/, /^XAVIER/]"
                    ]
                }, 
                "keysExamined" : NumberInt(1398), 
                "dupsTested" : NumberInt(1397), 
                "dupsDropped" : NumberInt(0), 
                "seenInvalidated" : NumberInt(0), 
                "matchTested" : NumberInt(0)
            }
        }
    }           

Lee Parayno

unread,
Dec 23, 2015, 1:51:26 PM12/23/15
to mongod...@googlegroups.com
MongoDB uses B-Trees for it’s indexes, so any regex for any part of a key other than anchored to the prefix will have to do  a complete scan of all the index entries in order to reduce the documents to forward to the next stage of execution.

Notice in the explain plan for your prefix query and the index stage only has to work on 1398 documents it is able to find through use of the index.
 

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/b7b5058f-dd20-47e6-a91d-a4fb92db0dd2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tim Hawkins

unread,
Dec 23, 2015, 1:57:15 PM12/23/15
to mongodb-user

You should look at "text" indexes the text search functions. They are also case insensative, and use language stemming.

--

Dwight Merriman

unread,
Dec 23, 2015, 4:15:04 PM12/23/15
to mongodb-user
one thing to look at is text indexes.  they will help you do classic text searches, but will not solve for finding the middle of a word.  so if "XAVIER" were surrounded in the value string in question by white space it would work. 

if you need to do { x : $regex : { "abc$"} }, often, it might be worthwhile to store an extra copy of x's value, but reversed, in an x_rev.  then you can query it as  { x : $regex : { "^cba"} }

if you need to find arbitrary substrings i can't think of an easy, efficient solution.  i can think of some ideas that might work for special cases, if you want to talk more about that le tme know more, including about the use case.

this may not be relevant, but if you simply have names consisting of say, M words, and you want to do searches for any word in a person's name wherever it appears, you could use multi keys.  store something like: 

{ name : [ "John","Wesley","Harding","Sr" ] } 

create an index on name; 

then this is efficient: 
db.foo.find( { name : "Wesley" } )
db.foo.find( { name : /^Hardin/ } )

Shady Shaker

unread,
Dec 27, 2015, 2:56:24 AM12/27/15
to mongodb-user
Tank you all for your replies.

text search will not help me because I need to search in any part in name 
db.PERSON.find( {"arNames.arName" : {$regex: "XAVIER.*ASTER.*ZAC"}}).


Dear Dwight Merriman thank you for your reply, if we put the name in array "{ name : [ "John","Wesley","Harding","Sr" ] }" it will resolve issue in case if I have only name that I sould find in the full name, 
db.PERSON.find( {"arNames.arName" : {$regex: "XAVIER"}}), but it will not work in the below situation

for example I have the below names:


{ name : "HARRY MARK ANTONIE FISHER FIRST" }
{ name : "MARK FISHER" }
{ name : "JACKSON MARK FISHER" }
{ name : "ZAC MARK LOUIS FISHER DAN" }
{ name : "MARK ANTONIE" }


db.PERSON.find( {"name" : {$regex: "MARK.*FISHER"}}).
output
{ name : "HARRY MARK ANTONIE FISHER FIRST" }
{ name : "MARK FISHER" }
{ name : "JACKSON MARK FISHER" }
{ name : "ZAC MARK LOUIS FISHER DAN" }


db.PERSON.find( {"name" : {$regex: "MARK FISHER"}}).
output
{ name : "MARK FISHER" }
{ name : "JACKSON MARK FISHER" }


db.PERSON.find( {"name" : {$regex: "HARRY.*ANTONIE.*FIRST"}}).
output
{ name : "HARRY MARK ANTONIE FISHER FIRST" }
Reply all
Reply to author
Forward
0 new messages