STWFSA backend: problem with the options for braces in labels

Enrico Laloli
Jun 22, 2023, 6:23:01 AM
to Annif Users
I am using STWFSA and MLLM as backends. The first has several options for dealing with explanations in brackets in labels, but I find them confusing, and they do not work as I expect.

I use brackets in labels for ambiguous concepts and for homographs (terms spelled the same, but with a different meaning). Of course, this is not a SKOS feature; SKOS has no way of dealing with this problem.

For example:
shares (capital market)
or
bat (animal)
bat (baseball)

Now, there is an option in STWFSA: extract_upper_case_from_braces
This is explained as: "Removes the explanation in braces from labels. I.e., GDP (Gross Domestic Product) will be transformed to GDP".

In my opinion, the name implies something other than what the description says.

When this option is used, I would expect labels with the same spelling to be treated the same. Of course, this is not what you would want, because it brings back the very problem you wanted to solve with the braces.
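To make the collision concrete, here is a minimal Python sketch, assuming the option behaves as the documentation quoted above describes (dropping the trailing explanation in braces). The helpers `strip_qualifier` and `find_collisions` and the concept IDs `c1`–`c3` are hypothetical; this is not how stwfsa itself implements the option.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Hypothetical: mimics the *documented* behaviour of extract_upper_case_from_braces,
# i.e. remove a trailing "(...)" explanation. Not the stwfsa implementation.
QUALIFIER = re.compile(r"\s*\([^()]*\)\s*$")


def strip_qualifier(label: str) -> str:
    """Return the label without a trailing parenthetical explanation."""
    return QUALIFIER.sub("", label)


def find_collisions(labels: Dict[str, str]) -> Dict[str, Set[str]]:
    """Group concept IDs by stripped label; keep labels shared by 2+ concepts."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for concept_id, label in labels.items():
        groups[strip_qualifier(label)].add(concept_id)
    return {label: ids for label, ids in groups.items() if len(ids) > 1}


# Hypothetical concept IDs with the example labels from this post
labels = {
    "c1": "shares (capital market)",
    "c2": "bat (animal)",
    "c3": "bat (baseball)",
}

print(strip_qualifier("GDP (Gross Domestic Product)"))  # GDP
print({k: sorted(v) for k, v in find_collisions(labels).items()})  # {'bat': ['c2', 'c3']}
```

Any label reported by `find_collisions` becomes indistinguishable once the braces are removed, so a text that just says "bat" gives no basis for choosing one concept over the other.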

The other option is named: extract_any_case_from_braces
With the description: "Can extract content of braces in labels.  In contrast to extract_upper_case_from_braces it will extract the part inside the parenthesis and not the part before."
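Read literally, that description suggests the braced part itself becomes the label. A minimal Python sketch of that reading, using a hypothetical helper `extract_from_braces` (not stwfsa code), shows what it would do to the example labels:

```python
import re
from typing import Optional

# Hypothetical: mimics the *documented* behaviour of extract_any_case_from_braces,
# i.e. keep the text inside the parentheses instead of the text before them.
# Not the stwfsa implementation.
BRACES = re.compile(r"\(([^()]*)\)")


def extract_from_braces(label: str) -> Optional[str]:
    """Return the content of the first (...) group, or None if there is none."""
    match = BRACES.search(label)
    return match.group(1).strip() if match else None


for label in ["GDP (Gross Domestic Product)", "bat (animal)", "bat (baseball)", "shares"]:
    print(f"{label!r} -> {extract_from_braces(label)!r}")
```

If the option really works this way, the homographs would be matched on their qualifiers ("animal", "baseball") rather than on "bat", so it seems suited to labels whose braces hold a synonym or expansion, not to homograph qualifiers.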

Again the name is not really in sync with the explanation.

I used the first option, but it didn't work: text containing just the term, without the explanation in braces, was not matched. I tried both options, but there was no difference. I asked the authors of the algorithm, but have not had a reply.

Does anyone have experience with using these options or with this problem of homographs?

This is my STWFSA configuration:

[stw-nl]
name=STWFSA lexical
language=nl
backend=stwfsa
vocab=tax-nl
extract_upper_case_from_braces=True
#extract_any_case_from_braces=True
limit=6
analyzer=spacy(nl_core_news_sm,lowercase=1)