ElasticSearch Synonym with AtoM

Ricardo Pinho

Apr 1, 2021, 2:04:56 PM
to AtoM Users

Dear AtoM users,

I've been trying to configure Elasticsearch synonyms for use with the AtoM search engine, but without success.
Based on:

I started with a simple example: making "Dr" and "Doctor" synonyms.

1. Edit search.yml:
sudo vi plugins/arElasticSearchPlugin/config/search.yml
adding the synonym analyzer and synonym filter entries:
analyzer:
  synonym:
    tokenizer: whitespace
    filter: [synonym]

  default:
...
  filter:
    synonym:
      type: synonym
      synonyms: [dr, doctor]
      tokenizer: whitespace
    engram:
...
2. Save the file, then run:
sudo php symfony cc
sudo service php7.2-fpm restart
sudo systemctl restart memcached
sudo php symfony search:populate


But when I search in AtoM for "dr", it only shows occurrences of "Dr".
And when I search for "doctor", it only shows occurrences of "Doctor".
If I search for "dr OR doctor", it returns both, which is the result I want to get without having to use "OR".

Has anyone configured Elasticsearch synonyms in AtoM? What am I doing wrong?
Thank you in advance!
Best regards,
Ricardo Pinho

Dan Gillean

Apr 1, 2021, 5:20:27 PM
to ICA-AtoM Users
Hi Ricardo, 

I've never played with additional filters and analyzers in Elasticsearch, and I'm not a developer or system administrator, so my input will be very limited. However, one thing you'd likely need to do after this kind of change would be to restart Elasticsearch: 
  • sudo systemctl restart elasticsearch
Also, are you replacing the existing analyzers and filters in the search.yml file with these, or are you simply trying to add additional analyzers? I'm pretty sure that by default only one analyzer can be used at a time, though an analyzer can have zero or more token filters, which are applied in order. We're getting way out of my depth here, but see for example: 
It also sounds like what you're using would be a custom analyzer, since I think the ES default analyzers do not use the synonym token filter. See: 
As such, depending on HOW you've added that information to plugins/arElasticSearchPlugin/config/search.yml, it's possible that it is conflicting with the existing settings?

I'm guessing a bit here, but hoping these links and suggestions might help point you in the right direction. Please let us know what you learn as you continue testing!

Regards, 

Dan Gillean, MAS, MLIS
AtoM Program Manager
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
he / him



Ricardo Pinho

Apr 1, 2021, 6:45:06 PM
to AtoM Users
Thank you, Dan, for kindly replying and for all those good hunches!
I will take them into consideration, and if I get any results I will post them here.

In the meantime, if anyone else has already solved this problem or has other hunches, please let me know!
Thanks
Ricardo Pinho

David Juhasz

Apr 1, 2021, 8:30:35 PM
to ica-ato...@googlegroups.com
Hi Ricardo,

I think you need to add the "synonym" filter to the default analyzer, or it won't be applied to the AtoM data in the Elasticsearch index, e.g.

analyzer:
  default:
    tokenizer: standard
    filter: [lowercase, preserved_asciifolding, synonym]


The "default" analyzer is used for most text fields in the AtoM Elasticsearch index. If you want to create a separate "synonym" analyzer as you've indicated, you will have to designate which Elasticsearch fields should use it, which is complicated because of the way AtoM maps data to the ES index. If you want to go down the separate "synonym" analyzer road, I'd recommend looking at this commit, which added the "alphasort" normalizer, for an idea of which files you might need to change: https://github.com/artefactual/atom/commit/0814c0f7767cedbd2e644103f7ae189a9f98bb84


Best regards,
David Juhasz (he/him)

Senior Developer
Artefactual Systems


David at Artefactual

Apr 1, 2021, 9:00:40 PM
to AtoM Users
Hi Ricardo,

I just realized that if you want to add the synonym filter for a language other than English, you may need to add it to the appropriate culture in the search.yml file. For example, to handle Portuguese synonyms it would need to be added to:

portuguese:
    tokenizer: standard
    filter: [lowercase, portuguese_stop, preserved_asciifolding, synonym]

See: https://github.com/artefactual/atom/blob/qa/2.x/plugins/arElasticSearchPlugin/config/search.yml#L111

Ricardo Pinho

Apr 5, 2021, 12:14:12 PM
to ica-ato...@googlegroups.com
Thanks, David, for those clues!
You are right, we must add the "synonym" filter to the language analyzer (including for English).
It worked after adding it as shown below:

          english:
            tokenizer: standard
            filter: [lowercase, english_stop, preserved_asciifolding, synonym]

Adding it only to the default analyzer's filter list had no effect.
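
For anyone following along, the pieces discussed in this thread can be put together roughly as in the sketch below. It is only a sketch: the enclosing keys of search.yml are left out, and placing `analyzer` and `filter` side by side follows Elasticsearch's analysis settings rather than a verified copy of AtoM's file. One detail worth checking is the synonym list itself. In YAML, `[dr, doctor]` is a list of two separate one-word entries, while Elasticsearch reads each list element as one complete rule, so the group is written here as a single quoted string.

```yaml
# Sketch only. Enclosing keys of search.yml are omitted, and the existing
# entries (default analyzer, other languages, other filters) stay unchanged.
analyzer:
  english:
    tokenizer: standard
    # "synonym" is appended to the filter list reported as working in this thread.
    filter: [lowercase, english_stop, preserved_asciifolding, synonym]
filter:
  synonym:
    type: synonym
    # One rule per list element; the quotes keep "dr, doctor" a single string.
    synonyms: ["dr, doctor"]
```

As in the first post, the index has to be rebuilt with `sudo php symfony search:populate` before a changed analyzer affects search results.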
Best regards,
Ricardo



--
Ricardo Pinho

David Juhasz

Apr 5, 2021, 7:13:47 PM
to ica-ato...@googlegroups.com
Great! I'm glad that helped you find a solution, Ricardo. I didn't know about the Elasticsearch "synonym" capability, so I learned something new too. :)
--

David Juhasz (he/him)
Senior Developer
Artefactual Systems


Ricardo Pinho

Apr 6, 2021, 4:34:47 AM
to ica-ato...@googlegroups.com
Thanks, David. We never stop learning from each other... ;-)

There is one more minor problem that you may be able to help with.
When I try to use a separate text file for the synonyms, using "synonyms_path" in:
atom/plugins/arElasticSearchPlugin/config/search.yml

      synonyms_path: analysis/synonym.txt
      # synonyms: [dr => doctor]


By default, Elasticsearch resolves this path to:
/etc/elasticsearch/analysis/synonym.txt

Is there any way in the search.yml file to point to a path inside the AtoM install directory,
for example "atom/plugins/arElasticSearchPlugin/config"?

Thank you.
Best regards,
Ricardo Pinho



--
Ricardo Pinho

David at Artefactual

Apr 6, 2021, 2:50:29 PM
to AtoM Users
Hi Ricardo,

I'm not 100% sure it will work, but you can try using the string "%SF_APP_DIR%" (no quotes) for the AtoM root directory, or "%SF_PLUGINS_DIR%" for the AtoM plugins directory.  See https://github.com/artefactual/atom/blob/qa/2.x/apps/qubit/config/autoload.yml for an example.

Cheers,
David

Ricardo Pinho

Apr 9, 2021, 6:42:56 AM
to ica-ato...@googlegroups.com
Thank you, David!
I'm back to share my recent attempts on this subject.

Both of those string variables work inside search.yml:
%SF_APP_DIR% for the AtoM root directory, or
%SF_PLUGINS_DIR% for the AtoM plugins directory

I tried this setting in atom/plugins/arElasticSearchPlugin/config/search.yml:

synonyms_path: %SF_PLUGINS_DIR%/arElasticSearchPlugin/config/analysis/synonym.txt
# synonyms_path: analysis/synonym.txt

# synonyms: [dr => doctor]


And ES did try to read the file from:
"/usr/share/nginx/atom/plugins/arElasticSearchPlugin/config/analysis/synonym.txt"

Unfortunately, when running:
sudo php symfony search:populate
I got the error:
access denied ("java.io.FilePermission" "/usr/share/nginx/atom/plugins/arElasticSearchPlugin/config/analysis/synonym.txt" "read")

After a quick search, I found out that:
Elasticsearch runs under the Java security manager, whose rules only allow it to open files in its own config directory, not in arbitrary directories. The correct way around this is to customize the Java security policy and list the file(s) you want to access in a policy file.

So, for the moment, I'm staying with the default ES path.
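
For reference, a minimal sketch of the setup that stays on the default Elasticsearch path. The file location is the one reported earlier in this thread; the rules shown in the comments are illustrative only, and the rest of search.yml is omitted.

```yaml
filter:
  synonym:
    type: synonym
    # A relative path is resolved against the Elasticsearch config directory,
    # which on the install described here gives
    # /etc/elasticsearch/analysis/synonym.txt
    synonyms_path: analysis/synonym.txt
    # Illustrative contents of synonym.txt (Solr format, one rule per line).
    # Either a group of equivalent terms:
    #   dr, doctor
    # or a one-way mapping:
    #   dr => doctor
```

The file needs to be readable by the user that runs Elasticsearch, and the index still has to be repopulated after the file changes.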

Cheers,
Ricardo Pinho


David Juhasz

Apr 9, 2021, 1:58:12 PM
to ica-ato...@googlegroups.com
Hi Ricardo,

Oh, too bad about the Java security rule. :(  Thanks for letting us all know what worked for you and what didn't. :)

Best regards,
David
--

David Juhasz (he/him)
Senior Developer
Artefactual Systems
