Issue with Spanish Text Index stemming


Ruben Gonzalez

Mar 15, 2018, 7:12:46 PM
to mongodb-user
Hi group

NOTE: In a text index, "stemming" is the language-specific process of reducing a word to its root, typically by removing some letters at the end.

Why, when using a text index in Spanish, is "filología" stemmed to "filolog" while "filologia" (without the accent mark) is stemmed to "filologi"?

Words ending in -ía are very common in Spanish, and searching without accent marks is very common too.



> db.Series.find({ "$text": { "$search": 'filologia clásica', "$language": "es" } }, {indexlanguage: 1}).explain("executionStats") 
... 
"terms" : [ "filologi", "clasic" ], 
... 
> db.Series.find({ "$text": { "$search": 'filología clásica', "$language": "es" } }, {indexlanguage: 1}).explain("executionStats")
... 
"terms" : [ "filolog", "clasic" ],
... 
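
To make the difference concrete, here is a toy sketch of two suffix rules that would reproduce the terms above. This is not MongoDB's stemmer: the function name `toyStem` and both rules are hypothetical, inferred only from the explain output, and I am assuming that accent marks are stripped after stemming rather than before.

```javascript
// Toy illustration only: NOT MongoDB's actual stemmer.
// Both rules are hypothetical, inferred from the "terms" in the explain output.
function toyStem(word) {
  // Lowercase and use the composed (NFC) form so "í" is a single character.
  let w = word.toLowerCase().normalize("NFC");

  if (w.endsWith("logía")) {
    // Hypothetical rule 1: an accented "-logía" ending is reduced to "-log".
    w = w.slice(0, -2);
  } else if (/[aeo]$/.test(w)) {
    // Hypothetical rule 2: otherwise only one trailing vowel is dropped.
    w = w.slice(0, -1);
  }

  // Assumption: accent marks are removed after stemming, not before.
  return w.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(toyStem("filología")); // matched by rule 1
console.log(toyStem("filologia")); // falls through to rule 2
console.log(toyStem("clásica"));   // rule 2, then accent removal
```

If the real stemmer behaves anything like this, the unaccented "filologia" never matches the accented suffix rule, which would explain why the two spellings end up as different terms.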

Is this a bug in the stemming process, or a problem with my text index configuration?

> db.User.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_fts" : "text",
            "_ftsx" : 1
        },
        "name" : "$**_text",
        "ns" : "test.User",
        "weights" : {
            "$**" : 1
        },
        "default_language" : "english",
        "language_override" : "language",
        "textIndexVersion" : 3
    }
]


Thank you.

Ruben Gonzalez

Jul 4, 2018, 10:29:37 AM
to mongodb-user