full text search performance trick.

824 views
Skip to first unread message

Liang Gong

unread,
Apr 7, 2015, 11:41:23 AM4/7/15
to mongod...@googlegroups.com
I need search for the ip address type of string (like 192.168.0.1) a lot in my text fields. It seems that Mongodb is extremely slow in such operations. If i remove the dots from the original text (192168000001) and search for the new string, I can tell the performance  improvement. My theory is the dots causes the ambiguity to the search engine which leads to such slowness. 
I wonder if there are any solutions other than removing the dot?

Tim Hawkins

unread,
Apr 7, 2015, 11:54:49 AM4/7/15
to mongod...@googlegroups.com

You could convert the ipaddresses into 32bit unsigned integers,  which is what they represent.  Once indexed you can then perform fast queries. 

Once in integer format then performing network searchs for things like "192.168.*" would involve calculating the lowest and highest ipaddress in the range and using it with $gte and $lte queries.

There are a number of libraries in various languages for performing the parsing and conversion to and from integers.

For example the php pear library https://pear.php.net/manual/en/package.networking.net-ipv4.php


On Tue, Apr 7, 2015, 23:41 Liang Gong <gowith...@gmail.com> wrote:
I need search for the ip address type of string (like 192.168.0.1) a lot in my text fields. It seems that Mongodb is extremely slow in such operations. If i remove the dots from the original text (192168000001) and search for the new string, I can tell the performance  improvement. My theory is the dots causes the ambiguity to the search engine which leads to such slowness. 
I wonder if there are any solutions other than removing the dot?

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/051bd8cf-a93f-4f3f-9c93-6b6287f6c63d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Liang Gong

unread,
Apr 7, 2015, 12:17:20 PM4/7/15
to mongod...@googlegroups.com
That was my original solution which was denied by the boss. His decision was based on the facts that the data in MongoDb are collected from different data source and it is a management nightmare to do such code changes in all the places. My bottom line solution will be use a "trigger" type solution in MongoDb to transfer the data after they were inserted. 

I hope if there are so relatively simpler solution on query end such as concatenate filters 192, 168. 

David Alexander

unread,
Apr 8, 2015, 12:50:47 PM4/8/15
to mongod...@googlegroups.com
I suspect the problem may be in your regexp. From the sounds of things you already have a suspicion about the meaning of the dots.
It would be helpful to have the offending line of code... but without that here is my best guess.

You need to escape the dots otherwise they match any single character. Note this is a generic regexp - I haven't checked it for MongoDB syntax compliance,

/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/

You may have to add further validation (start and end sentinels, restricting the range of allowed values - it depends upon your application). 

Liang Gong

unread,
Apr 10, 2015, 9:39:49 AM4/10/15
to mongod...@googlegroups.com
There is no regular expression involvement at all. it is a common full text search such as db.XXXX.find({$text:{$search:"192.168.0.1"}})

David Alexander

unread,
Apr 10, 2015, 1:31:52 PM4/10/15
to mongod...@googlegroups.com
Now that you have asked a specific question, I can give a specific answer...

From 

$search Field

In the $search field, specify a string of words that the text operator parses and uses to query the text index. The text operator treats most punctuation in the string as delimiters, except a hyphen - that negates term or an escaped double quotes \" that specifies a phrase.

Phrases

To match on a phrase, as opposed to individual terms, enclose the phrase in escaped double quotes (\"), as in:

"\"ssl certificate\""

If the $search string includes a phrase and individual terms, text search will only match the documents that include the phrase. More specifically, the search performs a logical AND of the phrase with the individual terms in the search string.

Reply all
Reply to author
Forward
0 new messages