[mongodb-user] How to use full text search in MongoDB?


Frank Wen

Apr 20, 2010, 2:08:12 AM
to mongodb-user
I like MongoDB, it's great, and I'm ready to use it in a new project.
But I have a question: Can I use full text search in MongoDB? Or do I
have to use MongoDB+Lucene? I hope I can implement my search
application without Lucene.

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.

Mitch Pirtle

Apr 20, 2010, 8:34:51 AM
to mongod...@googlegroups.com
On Tue, Apr 20, 2010 at 2:08 AM, Frank Wen <xiaob...@gmail.com> wrote:
> I like MongoDB, it's great, and I'm ready to use it in a new project.
> But I have a question: Can I use full text search in MongoDB? Or I
> have to use MongoDB+Lucene? I hope I can implement my search
> application without lucene.

ElasticSearch might be intriguing for you:

http://www.elasticsearch.com/

Quite a few have integrated with Lucene.

-- Mitch

Suno Ano

Apr 20, 2010, 10:12:28 AM
to mongod...@googlegroups.com
Frank> I like MongoDB, it's great, and I'm ready to use it in a new
Frank> project. But I have a question: Can I use full text search in
Frank> MongoDB? Or I have to use MongoDB+Lucene? I hope I can implement
Frank> my search application without lucene.

Have a look at
http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo
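
For readers who cannot reach that page: the approach it is generally understood to describe is to store each document's words in an array field and index that field, so a single-word lookup becomes an ordinary indexed query. A minimal sketch in plain Python follows, with no database involved; the helper names `add_keywords` and `matches`, and the `_keywords` field name as used here, are assumptions for illustration only.

```python
import re


def add_keywords(doc, field="title"):
    """Return a copy of doc with a lowercased, de-duplicated _keywords list."""
    words = re.findall(r"\w+", doc[field].lower())
    # dict.fromkeys keeps first-seen order while dropping repeated words
    return {**doc, "_keywords": list(dict.fromkeys(words))}


def matches(doc, keyword):
    """Mimic an equality match on an array field: true if any element equals keyword."""
    return keyword.lower() in doc.get("_keywords", [])


docs = [
    add_keywords({"title": "This is fun"}),
    add_keywords({"title": "Indexing is hard"}),
]
hits = [d["title"] for d in docs if matches(d, "fun")]
print(hits)
```

Against a real collection, the equivalent lookup would presumably be a query of the form `{"_keywords": "fun"}` on an indexed array field. This gives exact word matching only: no stemming, ranking, or phrase search.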

Skall

Feb 12, 2012, 5:09:21 AM
to Suno Ano, mongol...@googlegroups.com, mongod...@googlegroups.com
You can use MongoLantern 0.7 for MongoDB full-text search. It has
all the basic features that a full-text search engine might require.
Please let me know whether it serves your purpose.

On Apr 20 2010, 7:12 pm, Suno Ano <suno....@sunoano.org> wrote:
>  Frank> I like MongoDB, it's great, and I'm ready to use it in a new
>  Frank> project. But I have a question: Can I use full text search in
>  Frank> MongoDB? Or I have to use MongoDB+Lucene? I hope I can implement
>  Frank> my search application without lucene.
>

> Have a look at http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo

Andreas Jung

Feb 12, 2012, 5:13:13 AM
to mongod...@googlegroups.com, Suno Ano, mongol...@googlegroups.com
Why would one want to use MongoLantern instead of existing
enterprise-level, open-source, widely used and community-supported
solutions like Solr, Lucene or ElasticSearch?

-aj

--
ZOPYX Limited | zopyx group
Charlottenstr. 37/1 | The full-service network for Zope & Plone
D-72070 Tübingen | Produce & Publish
www.zopyx.com | www.produce-and-publish.com
------------------------------------------------------------------------
E-Publishing, Python, Zope & Plone development, Consulting



Sougata Pal.

Feb 12, 2012, 5:30:35 AM
to mongod...@googlegroups.com, Suno Ano, mongol...@googlegroups.com
The only reason is that MongoLantern depends entirely on MongoDB, so it can use all the best practices from MongoDB. Also, one doesn't have to learn or maintain a separate architecture for search itself. Maintaining only the MongoDB servers or clusters is enough to run MongoLantern.

Please let me know if you have any questions. We would love to get reviews from potential MongoDB users to make it better.

--
Thanks
Sougata Pal.
Chief Architect, Techunits
http://in.linkedin.com/in/skallpaul
Contact: +91 9051042886

Andreas Jung

Feb 12, 2012, 5:37:00 AM
to mongod...@googlegroups.com, Suno Ano, mongol...@googlegroups.com
I don't mean to offend, but having read through the MongoLantern code: your solution
is far from what one would call a reasonable and powerful full-text indexing solution.
The existing solutions mentioned above are definitely more suitable for real-world applications.

-aj

Sougata Pal. wrote:
> The only reason MongoLantern is entire depends upon MongoDB so it can
> use all best practices from MongoDB. As well as one don't have to read
> or maintain differentiated architecture for search itself. Only
> maintaining MongoDB servers or clusters will be enough to run
> MongoLantern.
>
> Please let me know if you have any queries. We will love to avail
> reviews from potential mongodb users to make it better.

Sougata Pal.

Feb 12, 2012, 5:41:40 AM
to mongod...@googlegroups.com, Suno Ano, mongol...@googlegroups.com
Yes, you are right. The initiative is only one month old, so we need to test it with different use cases to achieve a fully featured full-text search engine. Could you please suggest something, if possible?

Andreas Jung

Feb 12, 2012, 5:48:15 AM
to mongod...@googlegroups.com, Suno Ano, mongol...@googlegroups.com
Sougata Pal. wrote:
> Yes. You are right. The initiative is only 1 months old. So we need
> to test it with different use cases to achieve fully featured
> fulltext search engine. Can you please suggest something if
> possible.

What I suggest to most users: use existing, proven solutions and don't
try to reinvent wheels in a bad way over and over again. High-quality
full-text search is *hard*. I have been maintaining a full-text solution
for Zope and Python myself for a decade (rewritten four times). Getting
things right (e.g. full Unicode support, stemming for most languages,
etc.) is hard business... but feel free to go ahead. Define the goals
you want to reach, compare them to your skills and to the time and
effort you can put into the project, and triple-check whether you can
reach those goals. A 100th half-baked full-text search solution is
unlikely to be needed. There are already too many on the market...
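
To make the Unicode and stemming point concrete, here is a minimal sketch using only the Python standard library. The function names `normalize_token` and `naive_stem` are hypothetical, and the stemmer is deliberately crude and English-only; it is meant to show where simple approaches break down, not to be a usable implementation.

```python
import unicodedata


def normalize_token(token):
    """Casefold and strip combining accents so accented and plain spellings compare equal."""
    decomposed = unicodedata.normalize("NFKD", token.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(token):
    """Crude suffix stripping for English; real stemmers are language-specific."""
    for suffix in ("ing", "ed", "es", "s"):
        # keep at least three characters so short words are left alone
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


for word in ("Tübingen", "Straße", "searching", "indexes", "running"):
    print(word, "->", naive_stem(normalize_token(word)))
```

Note how "running" comes out as "runn": handling doubled consonants, irregular forms, and other languages is exactly the part that takes years to get right.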

-aj



Sougata Pal.

Feb 12, 2012, 6:02:29 AM
to mongod...@googlegroups.com, Suno Ano, mongol...@googlegroups.com
Ok.

I just looked at Zope. Your assessment is potentially valuable to me, but I would still love to continue with this experiment and work toward results that meet the usual standards.

Note:
I have already tested MongoLantern with a few applications, and they seem to work well in terms of performance and result quality. I am still looking to test and enhance it with more use cases.


Sam Millman

Feb 12, 2012, 6:23:07 AM
to mongod...@googlegroups.com
MongoLantern does look good and all, but it lacks a lot of things:

- Index barrels
- A scalable querying strategy (it currently uses regexes to form a huge query, by the looks of the source code)
- Lexicons

And much more. I mean, it is a good start, but it needs a lot of work before I, say, would implement it. It is implemented much like Lucene, and tbh Lucene has some serious scalability problems after 60k records (hence why you should never use Lucene).

I did think of making a full-text search in Mongo and nearly did, but then quit because I realised that storage DBs are not meant to be searchable.

I personally use Sphinx; it is awesome. I have used pretty much every other tech out there, and that's the one that works at 15b records.
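
As a rough illustration of the lexicon and index-based querying mentioned in the list above (as opposed to scanning text with regexes), here is a minimal in-memory sketch. The class name `InvertedIndex` and its methods are hypothetical and are not taken from MongoLantern, Lucene, or Sphinx.

```python
import re
from collections import defaultdict


class InvertedIndex:
    def __init__(self):
        self.lexicon = {}                 # word -> integer word ID
        self.postings = defaultdict(set)  # word ID -> IDs of documents containing it

    def add(self, doc_id, text):
        for word in re.findall(r"\w+", text.lower()):
            # assign the next free ID the first time a word is seen
            word_id = self.lexicon.setdefault(word, len(self.lexicon))
            self.postings[word_id].add(doc_id)

    def search(self, query):
        """Return IDs of documents containing every query word (AND semantics)."""
        result = None
        for word in re.findall(r"\w+", query.lower()):
            word_id = self.lexicon.get(word)
            if word_id is None:
                return set()  # unknown word: nothing can match
            docs = self.postings[word_id]
            result = set(docs) if result is None else result & docs
        return result or set()


idx = InvertedIndex()
idx.add(1, "MongoDB full text search")
idx.add(2, "Sphinx full text engine")
idx.add(3, "MongoDB storage")
print(idx.search("full text"))
```

Each query word costs one dictionary lookup plus a set intersection, so the work grows with the size of the matching posting lists rather than with the total amount of stored text.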

Sougata Pal.

Feb 12, 2012, 6:33:04 AM
to mongod...@googlegroups.com, mongol...@googlegroups.com
Hi Sam,

You are right. Currently I am trying to achieve the best search results using regexes and the like, which is not the expected approach at all.

The classes are very similar to Zend Lucene, so that developers do not have to learn a completely new set of resources and can instead use their existing knowledge to build apps faster.

Current releases of MongoLantern work very well for smaller databases having 2 - 2.5 M entries. I hope we can tune it to use more features from enterprise full-text search engines. Currently we are targeting to achieve search results & features very similar to MySQL.

I have also used Sphinx with MySQL; it's very fast, but it didn't work well for the queries I supplied.

Sam Millman

Feb 12, 2012, 7:48:06 AM
to mongod...@googlegroups.com
Sounds good except:


"Currently we are targeting to achieve search results & features very similar to MySQL."

Many SQL experts (if not all, including myself; ironically, I am an RDBA by trade) would not understand that. SQL is just as terrible at full-text search as Mongo. They are just not, by default, designed to search in the same manner as, say, Google (not Google's bigdb, since that's just the index).

Sphinx only uses SQL as the source from which to grab the data required to build the search indexes; the barrel indexes and the lexicons are implemented in C but are accessible via the SQL driver because they matched their API to SQL's. But yes, no one uses the searching abilities of SQL.

Sougata Pal.

Feb 12, 2012, 8:54:38 AM
to mongod...@googlegroups.com

Yes, true. The current version's results are quite good in terms of quality, but it does not have many options yet, as the software is quite young. We are targeting something that could use the MongoDB API for scaling but with full-text search features. Your valuable suggestions are highly appreciated.

Thanks
Sougata Pal

- Sent via Google Android

Andreas Jung

Feb 12, 2012, 9:01:32 AM
to mongod...@googlegroups.com
Sam Millman wrote:
> Mongolatern does look good and all but it does lack a lot of things:
>
> - Index barrels

What are index barrels?

> - A scalable querying strategy (it currently uses regexs to form a
> huge query, by the looks of the surce code)

Dealing with wildcards is tricky, and the solution depends on whether you
are searching for prefix* or *suffix or a query with arbitrary wildcards.
The implementation may range from standard B-tree-based lexicons
to the use of n-grams. There is no general implementation that handles
all possible query options equally efficiently.
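
The two ends of that range can be sketched roughly as follows. A sorted Python list with binary search stands in for a B-tree lexicon here, and a trigram index handles arbitrary substrings; all function names are hypothetical and chosen for illustration.

```python
import bisect
from collections import defaultdict


def prefix_matches(sorted_lexicon, prefix):
    """prefix* query: binary-search to the first candidate, then walk forward."""
    i = bisect.bisect_left(sorted_lexicon, prefix)
    out = []
    while i < len(sorted_lexicon) and sorted_lexicon[i].startswith(prefix):
        out.append(sorted_lexicon[i])
        i += 1
    return out


def build_trigram_index(words):
    """Map every 3-character slice to the set of words containing it."""
    index = defaultdict(set)
    for word in words:
        for i in range(len(word) - 2):
            index[word[i:i + 3]].add(word)
    return index


def substring_matches(index, words, fragment):
    """*fragment* query: intersect trigram postings, then verify each candidate."""
    grams = [fragment[i:i + 3] for i in range(len(fragment) - 2)]
    if not grams:  # fragment shorter than 3 characters: fall back to a full scan
        candidates = set(words)
    else:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    return sorted(w for w in candidates if fragment in w)


lexicon = sorted(["search", "searching", "season", "sphinx", "research", "text"])
trigram_index = build_trigram_index(lexicon)
print(prefix_matches(lexicon, "sea"))
print(substring_matches(trigram_index, lexicon, "earch"))
```

A *suffix query could reuse `prefix_matches` on a second lexicon of reversed words, which shows why each query shape tends to need its own structure.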

-aj

Sam Millman

Feb 12, 2012, 9:09:20 AM
to mongod...@googlegroups.com
@Andreas: read http://infolab.stanford.edu/~backrub/google.html (it's good). You will notice that in the diagram Brin provides, the index is sorted into "barrels". Exactly how these barrels work is quite complex, but it is one of the things that makes Google and other search techs so damn fast.
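
A very simplified sketch of the idea, assuming (as I read the linked paper) that each barrel holds a contiguous range of word IDs; the function names and the fixed-width partitioning are illustrative assumptions, not the paper's actual layout.

```python
def build_barrels(postings, num_barrels, vocab_size):
    """Split a word-ID -> doc-IDs mapping into barrels covering contiguous word-ID ranges."""
    width = -(-vocab_size // num_barrels)  # ceiling division; assumes vocab_size > 0
    barrels = [dict() for _ in range(num_barrels)]
    for word_id, doc_ids in postings.items():
        barrels[word_id // width][word_id] = sorted(doc_ids)
    return barrels, width


def lookup(barrels, width, word_id):
    """Only the one barrel that owns this word ID has to be consulted."""
    return barrels[word_id // width].get(word_id, [])


postings = {0: {1, 2}, 4: {2}, 9: {3, 1}}
barrels, width = build_barrels(postings, num_barrels=4, vocab_size=10)
print(lookup(barrels, width, 9))
```

Because a word ID maps directly to one barrel, each barrel can be stored, sorted, and searched independently of the others.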


Sougata Pal.

Feb 12, 2012, 8:53:50 PM
to mongod...@googlegroups.com
OK. I will certainly take a look at it.