Mongo Search via Phonetic or Fuzzy-Match

LAMongoP

Feb 24, 2010, 5:38:04 PM
to mongodb-user
Is there any way to do fuzzy-match or phonetic searches within Mongo
or are queries limited to regex-based searches?

Is there a way for Mongo to be extended to support an aggregate
function containing the algorithms used in PHP's similar_text(),
Metaphone, Soundex or Levenshtein? MySQL supports Soundex as a
built-in string function (SOUNDEX()).

If that is not an option, what options exist to search a collection
for a non-exact match?

Mathias Stearn

Feb 24, 2010, 6:00:47 PM
to mongod...@googlegroups.com
You can always just store the soundex-encoded string in a separate
field in mongo and search against that. Soundex is a really trivial
algorithm (http://en.wikipedia.org/wiki/Soundex) and should only take
a handful of lines.
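
As a rough illustration of the idea, here is a minimal sketch of American Soundex in Python, plus a helper that adds the encoded value to a document before it is saved. The helper name with_phonetic_key and the "<field>_soundex" naming are made up for this example, and the query shown in the comment assumes a pymongo-style collection object; none of this is built into Mongo.

```python
# Letter-to-digit table for American Soundex.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character Soundex code for word ('' if it has no ASCII letters)."""
    letters = [c for c in word.lower() if c.isalpha() and c.isascii()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts again
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def with_phonetic_key(doc, field):
    """Hypothetical helper: copy doc and add '<field>_soundex' next to the original field."""
    enriched = dict(doc)
    enriched[field + "_soundex"] = soundex(doc[field])
    return enriched


doc = with_phonetic_key({"name": "Robert"}, "name")
# With a pymongo-style collection (an assumption, not shown here) the lookup
# would be an ordinary equality match on the extra field:
#   collection.find({"name_soundex": soundex(user_input)})
```

Since the lookup is a plain equality match, the extra field can be indexed like any other.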

We try to move as much processing as possible to the client in cases
like that because it is usually easier to scale app servers than
databases.

LAMongoP

Feb 25, 2010, 2:50:11 AM
to mongodb-user
Storing the data in Soundex or Metaphone format would solve the issue
in a rudimentary way, but it is not ideal: you need to know in advance
exactly which fields you will search on, use roughly 70% extra
storage on those fields, and potentially maintain additional indexes.
Also, this method does not allow comparing a user string for match
weight using functions like Levenshtein and similar_text(). Maybe
there is a better way to go about this?

Can Sphinx or a similar db-search-engine be utilized against data
stored in Mongo?

Dwight Merriman

Feb 25, 2010, 7:56:04 AM
to mongod...@googlegroups.com
Watch this ticket for future developments:
http://jira.mongodb.org/browse/SERVER-380

In the meantime, Mathias' suggestion may be a good temporary solution.