Add custom words to the ignored (stop) words in full-text search


aliane abdelouahab

Jan 29, 2013, 10:09:57 AM
to mongodb-user
Hi,
Is there a way to give MongoDB a dictionary of words that will be
ignored during indexing? The MongoDB documentation says:
Indexes and queries drop stop words (i.e. “the,” “an,” “a,” “and,”
etc.)
So is there a way to add other words to that list? This would be
especially helpful for those who do not use English: they could empty
the dictionary and add their own words.
NB: the driver is PyMongo.

Sam Millman

Jan 29, 2013, 10:12:27 AM
to mongod...@googlegroups.com
This has been discussed and, as far as I know, it is planned; however, I do not believe the functionality is there yet.
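
Until a configurable stop-word list exists, one possible workaround is to filter the words in application code before the document is stored, and to index the filtered keywords instead of the raw text. The following is only a minimal sketch under that assumption: `CUSTOM_STOP_WORDS`, `extract_keywords`, `build_document`, and the `keywords` field are hypothetical names chosen for illustration, not part of MongoDB or PyMongo.

```python
import re

# Hypothetical custom stop-word list (an assumption for illustration);
# replace it with the words you want ignored in your own language.
CUSTOM_STOP_WORDS = {"the", "an", "a", "and", "le", "la", "les"}


def extract_keywords(text, stop_words=CUSTOM_STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    # \w matches Unicode word characters for str patterns in Python 3,
    # so non-Latin scripts are tokenized as well.
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stop_words]


def build_document(text, stop_words=CUSTOM_STOP_WORDS):
    """Return a dict holding the raw text plus its filtered keywords."""
    return {"text": text, "keywords": extract_keywords(text, stop_words)}
```

A document built this way could then be stored with PyMongo (for example with `collection.insert_one(build_document(...))`) and queried through an ordinary index on the `keywords` array. This bypasses the built-in text index entirely, so no stemming is applied.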



--
--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb




aliane abdelouahab

Jan 29, 2013, 10:17:06 AM
to mongodb-user
I hope this comes soon, because the dictionary would only be a MongoDB
document that is fast to parse, and its words would then be ignored :D


Marian Steinbach

Jan 29, 2013, 11:19:21 AM
to mongod...@googlegroups.com
By the way, you can look at the stopword lists currently used, in all supported languages, here:


At least then you know what MongoDB drops currently.

Marian

aliane abdelouahab

Jan 29, 2013, 12:44:42 PM
to mongodb-user
Thank you for the link,
but I don't see Arabic!

Marian Steinbach

Jan 29, 2013, 1:18:56 PM
to mongod...@googlegroups.com
You're welcome.

Yes, Arabic is not there. As the developers have announced, they are starting with a number of "Latin languages". I guess that simplifies things quite a bit for them, but of course it shuts out a great part of the world for now. Let's hope that there are enough resources to work on this.

aliane abdelouahab

Jan 29, 2013, 4:53:00 PM
to mongodb-user
I've added a simple Arabic file and hope it gets more attention. But
if MongoDB is 100% Unicode, why are non-Latin characters considered
a separate case?

Marian Steinbach

Jan 30, 2013, 4:20:54 AM
to mongod...@googlegroups.com
The character set might not be the issue. Full text search has to deal with language-specific tasks like stemming. If people search for "house", they usually also want to find occurrences of "houses". If they search for "mouse", they also want to find occurrences of "mice".
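
To make the point concrete, here is a deliberately naive sketch of English stemming. It is not the algorithm MongoDB uses; `naive_stem` and the `IRREGULAR` table are hypothetical names for illustration. It shows that a simple suffix rule handles "houses", while "mice" only works because of a hand-written, language-specific exception.

```python
# Hypothetical table of irregular plurals (an assumption for illustration);
# real stemmers rely on much larger, language-specific rules and data.
IRREGULAR = {"mice": "mouse"}


def naive_stem(word):
    """Map a word to a crude stem using one suffix rule plus exceptions."""
    word = word.lower()
    # Irregular forms cannot be derived by a suffix rule, so look them up.
    if word in IRREGULAR:
        return IRREGULAR[word]
    # Strip a plural "s", but leave short words and words ending in "ss".
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def matches(query, candidate):
    """Two words match when they share the same stem."""
    return naive_stem(query) == naive_stem(candidate)
```

With these rules, `matches("house", "houses")` and `matches("mouse", "mice")` both hold, but every rule and exception here is specific to English, which is why each supported language needs its own stemmer.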

Sam Millman

Jan 30, 2013, 4:27:39 AM
to mongod...@googlegroups.com
As Marian said, the search needs more than just stop words to work in Arabic.

As you know, Arabic is a notoriously difficult language to stem, much like Chinese or Japanese. It is not as simple as replacing the end of the word with something else, which is basically what stemming in Latin languages (such as English) does.

Stemming is of course just one of the things it has to do.

But hopefully the search will expand to non-Latin languages as its core matures.




aliane abdelouahab

Jan 30, 2013, 5:13:17 AM
to mongodb-user
No, Arabic shares a lot of rules with Latin languages: it has suffixes,
prefixes, and irregular verbs.
It only seems complicated because its letters are joined together
within words; it is something like Latin script in handwriting ;)
