$regex prefix match, multiple values slow

1,853 views
Skip to first unread message

Raxit Sheth

unread,
Dec 5, 2012, 3:01:14 PM12/5/12
to mongod...@googlegroups.com, Raxit Sheth
Hi


I want to search for all documents in collection where field is start
with string1 or start with string2 ....or start with string n

Case 1 : For one string, query is fast. (return all field which start
with given string)

Case 2 : For multiple string prefix match, query is slow, Even explain
is telling it is using index! Is it known bug?


Query 1:
> db.startend.find({start: {"$regex": "^te7uexyz2g3"}}).explain()
{
"cursor" : "BtreeCursor start_1 multi",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 10,
"nscannedObjectsAllPlans" : 9,
"nscannedAllPlans" : 10,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"start" : [
[
"te7uexyz2g3",
"te7uexyz2g4"
],
[
/^te7uexyz2g3/,
/^te7uexyz2g3/
]
]
},
"server" : "raxit-Lenovo-G570:27017"
}


Query 2:
> db.startend.find({start: {"$regex": "^te7uexyz2g3|^rax1"}}).explain()
{
"cursor" : "BtreeCursor start_1 multi",
"isMultiKey" : false,
"n" : 9,
"nscannedObjects" : 9,
"nscanned" : 101010,
"nscannedObjectsAllPlans" : 9,
"nscannedAllPlans" : 101010,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 2755,
"indexBounds" : {
"start" : [
[
"",
{

}
],
[
/^te7uexyz2g3|^rax1/,
/^te7uexyz2g3|^rax1/
]
....


-Raxit

Sam Helman

unread,
Dec 5, 2012, 3:22:48 PM12/5/12
to mongod...@googlegroups.com, Raxit Sheth
Does the issue go away if you use the $or operator rather than the built in regex "or"?

For example, 

db.startend.find({start: { $or : [{"$regex": "^te7uexyz2g3"}, { "$regex" : ^rax1 }] } })

Raxit Sheth

unread,
Dec 5, 2012, 3:25:27 PM12/5/12
to mongod...@googlegroups.com, Raxit Sheth
> db.startend.find({start: { $or : [{"$regex": "^te7uexyz2g3"}, { "$regex" : "^rax1" }] } })
error: { "$err" : "invalid operator: $or", "code" : 10068 }

Sam Helman

unread,
Dec 5, 2012, 3:54:55 PM12/5/12
to mongod...@googlegroups.com, Raxit Sheth
Ah, sorry about that.  Try:

db.startend.find({ $or : [{ start : {"$regex": "^te7uexyz2g3"} }, { start : { "$regex" : "^rax1" } }] } }) 

Raxit Sheth

unread,
Dec 5, 2012, 4:07:02 PM12/5/12
to Sam Helman, mongod...@googlegroups.com
Sam

Thanks, Tried this and it is fast!!!
db.startend.find({ $or : [{ start : {"$regex": "^te7uexyz2g3"} }, {
start : { "$regex" : "^rax1" } }] } )

Any clue why regex having or prefix is slow? (we can use this
workaround for time being, but for complex regex, it will be mess!)

Raxit

Sam Millman

unread,
Dec 5, 2012, 4:20:15 PM12/5/12
to mongod...@googlegroups.com
What if you did ^(te7uexyz2g3|rax1) ?


--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb

Raxit Sheth <Mobile 4 Mumbai>

unread,
Dec 5, 2012, 4:41:35 PM12/5/12
to mongod...@googlegroups.com
@Sammaye

Slow, please find explain() below.

> db.startend.find({start: {"$regex": "^(te7uexyz2g3|raxit)"} } ).explain()




    "cursor" : "BtreeCursor start_1 multi",
    "isMultiKey" : false,
    "n" : 9,
    "nscannedObjects" : 9,
    "nscanned" : 101010,
    "nscannedObjectsAllPlans" : 9,
    "nscannedAllPlans" : 101010,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 2865,
    "indexBounds" : {
        "start" : [
            [
                "",
                {
                   
                }
            ],
            [
                /^(te7uexyz2g3|raxit)/,
                /^(te7uexyz2g3|raxit)/
            ]
        ]

Raxit Sheth

unread,
Dec 5, 2012, 5:00:54 PM12/5/12
to mongod...@googlegroups.com
Hi Sam

Again help on this plz,

qry1={$or:[{start:{"$regex":"^te7uexyz2g3"}},{start:{"$regex":"^rax1"}}]}
qry2={$or:[{end:{"$regex":"^te7u6u8nr6z7"}},{end:{"$regex":"^rax2"}}]}
qry3= {$and:[qry1,qry2]}

db.startend.find(qry1).explain() ---> Fast
db.startend.find(qry2).explain() ---> Fast
db.startend.find(qry3).explain() ---> SuperSlow!!!

Raxit

Raxit Sheth

unread,
Dec 5, 2012, 5:06:22 PM12/5/12
to mongod...@googlegroups.com
> db.startend.find(qry3).explain()
{
"cursor" : "BasicCursor", <-------


If i am trying to give hint, on all indexes, still queries are slow
i have 3 indexes
{start:1}
{end:1}
{start:1,end:1}

No help with either of that!!!

Raxit

Sam Helman

unread,
Dec 5, 2012, 5:14:55 PM12/5/12
to mongod...@googlegroups.com
Does the cursor show up as "BasicCursor" even when you use hint()?

Raxit Sheth

unread,
Dec 5, 2012, 5:36:23 PM12/5/12
to mongod...@googlegroups.com
Hi Sam

When not using hint() --> Full Tablescan --> Slow

When using hint() -->using proper index ---> But slow !!!
Slow even hint({start:1}), hint({end:1}) or compound index hint like
hint({start:1,end:1})


I am pasting output, what all indexes collection is having + each explain

--------------------

qry1={$or:[{start:{"$regex":"^te7uexyz2g3"}},{start:{"$regex":"^rax1"}}]}
qry2={$or:[{end:{"$regex":"^te7u6u8nr6z7"}},{end:{"$regex":"^rax2"}}]}
qry3= {$and:[qry1,qry2]}

> db.startend.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "gmap3.startend",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"start" : 1
},
"ns" : "gmap3.startend",
"name" : "start_1"
},
{
"v" : 1,
"key" : {
"start" : 1,
"end" : 1
},
"ns" : "gmap3.startend",
"name" : "start_1_end_1"
},
{
"v" : 1,
"key" : {
"end" : 1
},
"ns" : "gmap3.startend",
"name" : "end_1"
}
]


>
> db.startend.find(qry3).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 101010,
"nscanned" : 101010,
"nscannedObjectsAllPlans" : 101010,
"nscannedAllPlans" : 101010,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 7389,
"indexBounds" : {

},
"server" : "raxit-Lenovo-G570:27017"
}
> db.startend.find(qry3).hint({start:1}).explain()
{
"cursor" : "BtreeCursor start_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 9,
"nscanned" : 101010,
"nscannedObjectsAllPlans" : 9,
"nscannedAllPlans" : 101010,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 5963,
"indexBounds" : {
"start" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "raxit-Lenovo-G570:27017"
}
> db.startend.find(qry3).hint({end:1}).explain()
{
"cursor" : "BtreeCursor end_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 16,
"nscanned" : 101010,
"nscannedObjectsAllPlans" : 16,
"nscannedAllPlans" : 101010,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 7993,
"indexBounds" : {
"end" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "raxit-Lenovo-G570:27017"
}
> db.startend.find(qry3).hint({start:1,end:1}).explain()
{
"cursor" : "BtreeCursor start_1_end_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 101010,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 101010,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 5980,
"indexBounds" : {
"start" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"end" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "raxit-Lenovo-G570:27017"
}

------------


Raxit

Sam Helman

unread,
Dec 5, 2012, 6:05:53 PM12/5/12
to mongod...@googlegroups.com
What happens if you try 

{ $or : [ { start: {"$regex":"^te7uexyz2g3"}, end:{"$regex":"^te7u6u8nr6z7"} },
            { start: {"$regex":"^te7uexyz2g3"}, end:{"$regex":"^rax2"} },
            { start: {"$regex":"^rax1"}, end:{"$regex":"^te7u6u8nr6z7"} },
            { start: {"$regex":"^rax1"}, end:{"$regex":"^rax2"} }
 ] }

which is logically equivalent?   

qry1={$or:[{start:{"$regex":"^te7uexyz2g3"}},{start:{"$regex":"^rax1"}}]} 
qry2={$or:[{end:{"$regex":"^te7u6u8nr6z7"}},{end:{"$regex":"^rax2"}}]} 
qry3=  {$and:[qry1,qry2]}

Raxit Sheth

unread,
Dec 5, 2012, 6:31:59 PM12/5/12
to mongod...@googlegroups.com
Sam

It works fast and using proper index as well! Thanks!!!

problem here is i need to PREFIX match AND( or(8 string for start),
or( 8 string for end))

:-)

Raxit

Sam Millman

unread,
Dec 6, 2012, 4:26:12 AM12/6/12
to mongod...@googlegroups.com
This is sounding more and more like a bug in possibly the way the nested $or within the $and is working


Raxit Sheth <Mobile 4 Mumbai>

unread,
Dec 6, 2012, 11:07:51 AM12/6/12
to mongodb-user
Any Jira is opened or any quick fix (unstable is also fine)?

Raxit

On Dec 6, 2:26 pm, Sam Millman <sam.mill...@gmail.com> wrote:
> This is sounding more and more like a bug in possibly the way the nested
> $or within the $and is working
>
> On 5 December 2012 23:31, Raxit Sheth <raxitsheth2...@gmail.com> wrote:
>
>
>
>
>
>
>
> > Sam
>
> > It works fast and using proper index as well! Thanks!!!
>
> > problem here is  i need to PREFIX match AND( or(8 string for start),
> > or( 8 string for end))
>
> > :-)
>
> > Raxit
>

Sam Helman

unread,
Dec 6, 2012, 2:49:55 PM12/6/12
to mongod...@googlegroups.com
One issue is that the different queries in an $and query cannot use separate indexes - this is not a bug.  Whichever index is used, it will be slow for one of the two $or clauses.

What if you try the query using $in instead of $or, and scrapping the $and?  For instance:

{
  "start" : { $in : [ < 8 regexes > ] },
  "end" : { $in : [ < 8 regexes > ] }

Raxit Sheth

unread,
Dec 6, 2012, 3:07:55 PM12/6/12
to mongod...@googlegroups.com
Sam


> db.startend.find({start: {$regex:'^te7u'} }).count()
56449
> db.startend.find({start: {$in: [{$regex:'^te7u'},{$regex:'^te7u'}] } }).count()
0

Instead of 8 regex, i just tried with two,

First query is to ensure, there are result... but using $in with
$regex is not supported.
(but this feature is nice to have... wishlist!)



Raxit

Raxit Sheth <Mobile 4 Mumbai>

unread,
Dec 8, 2012, 4:27:31 PM12/8/12
to mongodb-user
Is there any workaround?

On Dec 7, 1:07 am, Raxit Sheth <raxitsheth2...@gmail.com> wrote:
> Sam
>

Raxit Sheth <Mobile 4 Mumbai>

unread,
Dec 9, 2012, 1:13:25 AM12/9/12
to mongodb-user
I think this is major bottleneck! And i assume this can be very
common patterns for other folks also!
Any plan to have **smarter** query plan?

if i have 1M records and i want to do $and query on 2 diff field, and
first query return only 100 document, 2nd query will still do full
index scan!!! (i hope NOT full table scan!)

Raxit
> > > > To post to this group, send email to mongod...@googlegroups.com<javascript:>
> > > > To unsubscribe from this group, send email to
> > > > mongodb-user...@googlegroups.com <javascript:>

Dwight Merriman

unread,
Dec 9, 2012, 2:37:21 PM12/9/12
to mongod...@googlegroups.com
it would be nice one day to have an index intersection query plan

what are you trying to do, if it is like text search you may want to follow the full text search Jira ticket

Raxit Sheth <Mobile 4 Mumbai>

unread,
Dec 10, 2012, 1:28:55 AM12/10/12
to mongodb-user
We are not doing full text search.

Sam Helman

unread,
Dec 10, 2012, 1:06:32 PM12/10/12
to mongod...@googlegroups.com
For the moment, it does not seem like there are any direct ways to resolve this - you could, of course, expand the entire query out into $or clauses, or break up the query, or tweak your schema to allow the search to be different (for instance, by doing exact matching).  However, I do think that full text search will be relevant to your problem, since it will allow boolean OR and AND queries.  Here is the server ticket for it : https://jira.mongodb.org/browse/SERVER-380.
Reply all
Reply to author
Forward
0 new messages