Using Mongodb as Search Engine?


Nuris Kandar Musthafa

unread,
Aug 9, 2010, 9:06:22 AM8/9/10
to mongod...@googlegroups.com
Hi,

If I have 200 million records, is it possible to use MongoDB as a real-time search engine?
I have read about high-performance search engines; most recommendations are for Sphinx, Elasticsearch, Xapian, etc.



I have tried using MongoDB as a search engine with 5 million records, and it runs slowly.
The database processes 200 inserts/s while at the same time searching the data with a Mongo regex.
This is my server specification:

RAID 1
RAM 24 GB
Quad Core @ 2.4 GHz
Operating System: Ubuntu Server 10.04

In my database I created just one db and one collection:

> show collections
system.indexes
system.users
userinfo

> db.userinfo.findOne()
{
"_id" : ObjectId("4c5f7c87992b940812000000"),
"useragent" : "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.9 Safari/533.2",
"ipaddress" : "192.168.2.10",
"requesttime" : 1281326215,
"date" : "2010-08-09"
}

> db.stats()
{
"collections" : 4,
"objects" : 3718804,
"avgObjSize" : 289.263861176873,
"dataSize" : 1075715604,
"storageSize" : 1300811776,
"numExtents" : 27,
"indexes" : 5,
"indexSize" : 1387194240,
"fileSize" : 6373244928,
"ok" : 1
}

> db.serverStatus()
{
"version" : "1.6.0",
"uptime" : 6445,
"uptimeEstimate" : 3404,
"localTime" : "Mon Aug 09 2010 20:03:59 GMT+0700 (WIT)",
"globalLock" : {
"totalTime" : 6444515919,
"lockTime" : 231870232,
"ratio" : 0.03597946454230801,
"currentQueue" : {
"total" : 0,
"readers" : 0,
"writers" : 0
}
},
"connections" : {
"current" : 3,
"available" : 19997
},
"indexCounters" : {
"btree" : {
"accesses" : 101613,
"hits" : 101613,
"misses" : 0,
"resets" : 0,
"missRatio" : 0
}
},
"backgroundFlushing" : {
"flushes" : 107,
"total_ms" : 50274,
"average_ms" : 469.85046728971963,
"last_ms" : 1012,
"last_finished" : "Mon Aug 09 2010 20:03:35 GMT+0700 (WIT)"
},
"opcounters" : {
"insert" : 1783641,
"query" : 1174409,
"update" : 0,
"delete" : 0,
"getmore" : 0,
"command" : 3529436
},
"asserts" : {
"regular" : 0,
"warning" : 0,
"msg" : 0,
"user" : 1,
"rollovers" : 0
},
"note" : "run against admin for more info",
"ok" : 1
}

I have added a full-text search index to useragent.

My server is standalone.

Thanks,
Regards


Nuris.

Andreas Jung

unread,
Aug 9, 2010, 9:38:02 AM8/9/10
to mongod...@googlegroups.com

Nuris Kandar Musthafa wrote:

> database does process 200 insert/s and in the same time it search data
> with mongo regex.
> this is my server specification:


Searching with a regex won't use indexes - perhaps that's the reason.
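To make this concrete, here is a mongo-shell sketch of the distinction (not runnable standalone; it assumes the `userinfo` collection and an index on `useragent` from the original post). Only a case-sensitive regex anchored to the start of the string can use an index as a range scan; an unanchored pattern forces a full scan.

```
// Unanchored regex: MongoDB must examine every document (or every
// index entry), so this gets slower as the collection grows.
db.userinfo.find({ useragent: /Chrome/ })

// Prefix-anchored, case-sensitive regex: can use the useragent index
// as a bounded range scan.
db.userinfo.find({ useragent: /^Mozilla\/5\.0/ })

// Inspect which plan is actually chosen:
db.userinfo.find({ useragent: /^Mozilla\/5\.0/ }).explain()
```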

- -aj


Lee Theobald

unread,
Aug 9, 2010, 9:38:59 AM8/9/10
to mongodb-user
It's certainly possible and people have done it (e.g.
http://hmarr.com/2010/mar/18/full-text-search-with-mongodb/). It's
not going to be as fast as a well-established full-text search engine
though. On top of that, you'll also have to implement things like stop
words and query syntax yourself. I think you would be better off
putting the time into integrating something like Lucene/Solr
(discussed here: http://groups.google.com/group/mongodb-user/browse_thread/thread/b4a2417288dabe97/6268402ec69986fb).
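For reference, the approach in the linked post boils down to storing a tokenized keyword array alongside each document and indexing it, so lookups become exact matches instead of regexes. A hedged mongo-shell sketch (the `_keywords` field name and the tokenization are illustrative assumptions):

```
// On insert, tokenize the searchable text into lowercase words
// and store them in an array field:
db.userinfo.insert({
    useragent: "Mozilla/5.0 (X11; ...) Chrome/5.0.342.9 Safari/533.2",
    _keywords: ["mozilla", "chrome", "safari"]
})

// Index the array field (MongoDB builds a multikey index with one
// entry per array element):
db.userinfo.ensureIndex({ _keywords: 1 })

// Exact-match keyword queries now use the index:
db.userinfo.find({ _keywords: "chrome" })
```

This is what you would have to layer stop-word removal and query parsing on top of, which is Lee's point about the work involved.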

Lee,
> "localTime" : "Mon Aug 09 2010 20:03:59 GMT+0700 (WIT)",
> "last_finished" : "Mon Aug 09 2010 20:03:35 GMT+0700 (WIT)"

Tim Hawkins

unread,
Aug 9, 2010, 9:38:35 AM8/9/10
to mongod...@googlegroups.com
We use Sphinx with MongoDB. It's a little awkward: each time we insert a document into MongoDB we also insert it into Sphinx. I have a wrapper class in PHP
that wraps the collection interface and does the Sphinx update on insert, save, etc.

I am using the latest 1.10, which has string attributes, and I use one to store the Mongo ObjectId in the index record.

It's almost working. The biggest problem is faking a unique integer docid for Sphinx; given that we have a slow insert rate, using time() suffices for now. I understand that they
are working on string docids for Sphinx, which would solve that problem, as the '_id' could then be used directly.
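A sketch of the "fake integer docid" trick Tim describes, in JavaScript: combine a Unix timestamp with a small per-second counter so that two inserts in the same second still get distinct integers. The function name and the bit packing are illustrative assumptions, not Tim's actual PHP code.

```javascript
// Generate unique integer docids from a timestamp plus a counter.
// State is per-process; fine for a single slow-insert-rate writer.
let lastTs = 0;
let counter = 0;

function nextDocId(now = Math.floor(Date.now() / 1000)) {
    if (now === lastTs) {
        counter += 1;   // same second: bump the counter
    } else {
        lastTs = now;   // new second: reset the counter
        counter = 0;
    }
    // Pack seconds into the high bits and the counter into the low
    // 10 bits, allowing up to 1024 inserts per second.
    return lastTs * 1024 + counter;
}
```

For example, two calls within the same second yield consecutive ids, and the result stays well inside JavaScript's safe-integer range.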



Windows - The only OS you can buy from TOYSrUS
http://www.toysrus.com/product/index.jsp?productId=3896283 

Tim Hawkins
tim.h...@me.com


Sam Millman

unread,
Aug 9, 2010, 9:51:52 AM8/9/10
to mongod...@googlegroups.com
I would go with Lee on this and use Solr. It is much more optimised, and getting your own script to work as well would take forever. I would keep just the search indexes in Solr and grab all other information from MongoDB when needed.
--
Bow Chicka Bow Wow