Issue with Spanish Text Index stemming


Ruben Gonzalez

Mar 15, 2018, 7:12:46 PM

to mongodb-user

Hi group

NOTE: In a text index, trimming letters from the end of a word is a language-specific feature called "stemming".

Why, when using a text index in Spanish, is "filología" stemmed to "filolog" while "filologia" (without the accent mark) is stemmed to "filologi"?

Words ending in -ía are very common in Spanish, and searching without accent marks is very common too.

> db.Series.find({ "$text": { "$search": 'filologia clásica', "$language": "es" } }, {indexlanguage: 1}).explain("executionStats")

...

"terms" : [ "filologi", "clasic" ],

...

> db.Series.find({ "$text": { "$search": 'filología clásica', "$language": "es" } }, {indexlanguage: 1}).explain("executionStats")

...

"terms" : [ "filolog", "clasic" ],

...
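
For illustration only (this is not from the original post): the two searches differ by a single accent mark, yet produce different stems. One possible explanation, stated here as an assumption and not confirmed in this thread, is that the Spanish stemmer's suffix rules are written against the accented form, so "-logía" and "-logia" take different paths. A minimal sketch of one commonly used mitigation, accent folding, is shown below. `stripDiacritics` is a hypothetical helper, not a MongoDB API, and the sketch assumes a JavaScript runtime that supports `String.prototype.normalize` (for example Node.js).

```javascript
// Hypothetical helper (not a MongoDB API): remove accent marks from a string.
// NFD normalization splits each accented letter into a base letter followed by
// combining marks (U+0300 to U+036F), which the regex then deletes.
// Caveat: this also turns "ñ" into "n", which may not be acceptable for Spanish.
function stripDiacritics(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// The two search strings from the queries above.
const accented = "filología clásica";
const unaccented = "filologia clásica";

// After folding, both spellings are the same string, so a stemmer would
// receive identical input for either one.
console.log(stripDiacritics(accented) === stripDiacritics(unaccented)); // prints true
```

Folding only the search string would presumably not be enough: judging by the output above, the folded query would still stem to "filologi", while text stored as "filología" appears to stem to "filolog". The sketch would only help if the indexed text were folded the same way at write time, for example in a separate field (hypothetical) that the text index covers.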

Is this a bug in the stemming process, or a problem with my text index configuration?

> db.User.getIndexes()

[
    {
        "v" : 1,
        "key" : {
            "_fts" : "text",
            "_ftsx" : 1
        },
        "name" : "$**_text",
        "ns" : "test.User",
        "weights" : {
            "$**" : 1
        },
        "default_language" : "english",
        "language_override" : "language",
        "textIndexVersion" : 3
    }
]

Thank you.

Ruben Gonzalez

Jul 4, 2018, 10:29:37 AM

to mongodb-user

Sorry, this message is a duplicate of https://groups.google.com/forum/#!msg/mongodb-user/yEnD21E_9q0/WoIR7-jtAAAJ
