Alternatives in SKOS


christelann...@gmail.com

Feb 7, 2022, 5:27:31 AM
to Annif Users

Dear all,

I am (still) working on my (multilingual) SKOS vocabulary, in which I am including various languages (English, German, Dutch, Swedish, Danish and French). It's a SKOS vocabulary for early modern normative texts (laws). As I want to use it while training Annif, I have a bit of a linguistic question, and hopefully someone knows the answer.

When dealing with Swiss sources (which I do now), I am talking to the SSRQ (Foundation for Swiss Legal Sources), and they have a lengthy list of keywords and synonyms (see example here: https://www.ssrq-sds-fds.ch/lemma-db-edit/views/view-lemma.xq?id=lem016983 - we are aware that some more politically correct terminology is needed). However, they also list various spellings (https://www.ssrq-sds-fds.ch/lemma-db-edit/views/view-lemma.xq?id=lem003404), as the Swiss had many different ways of spelling things over time and per region. Especially for the early modern period, without any standard spelling, the idea of synonyms is a bit challenging, as it is - to me - rather unclear when something is a synonym and when it is an alternative spelling.

Anyway, my question boils down to the following: does Annif benefit from having synonyms in a SKOS? And does it benefit from having originally spelled words in the SKOS as well? (Would it perform better?)

Best,
Annemieke 



anna.k...@googlemail.com

Feb 8, 2022, 8:40:51 AM
to Annif Users

Dear Annemieke,

wow, good question! Not sure if I can answer it but the answer would interest me as well.

I guess first of all the question is not "does Annif benefit from ..." but rather which of the models provided in Annif do: some might, others don't.
stwfsapy (https://github.com/zbw/stwfsapy ; in Annif: https://github.com/NatLibFi/Annif/wiki/Backend%3A-STWFSA) for example does look into altLabels.

And I have seen alternative spellings modelled in SKOS via hiddenLabel -- so I guess one could modify algorithms that look into altLabels / synonyms to check hiddenLabels as well.
I am not sure how much of the preprocessing already takes care of that but maybe it would indeed help if these alternative spellings are explicitly confirmed via hiddenLabel.

Anyway, now I have mostly added a bunch of questions to your question :-D but maybe it helps ring a bell with other people here on the list.

Cheers
Anna

Osma Suominen

Feb 11, 2022, 6:54:24 AM
to annif...@googlegroups.com
Hi Annemieke and Anna,

Let me add a few details and respond to Anna's questions.

It's the lexical algorithms in Annif that can benefit from alternative
labels. Currently this means MLLM, STWFSA and YAKE. All the other base
algorithms (TFIDF, fastText, Omikuji) don't care at all about the labels
in the vocabulary, whether preferred or alternate. To them the
subjects/concepts are abstract categories which could just as well be
identified by numbers (and in fact are represented as numbers internally).
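
To illustrate why labels matter to a lexical approach, here is a toy
sketch in Python. It is not how MLLM, STWFSA or YAKE are actually
implemented; it only does naive whole-word matching. The vocabulary,
the concept IDs and the spellings are all made up for illustration.

```python
import re

# Hypothetical mini-vocabulary: concept ID -> labels grouped by SKOS label type.
VOCAB = {
    "c1": {"prefLabel": ["mill"], "altLabel": ["grain mill"], "hiddenLabel": ["mylle"]},
    "c2": {"prefLabel": ["tithe"], "altLabel": ["tenth"], "hiddenLabel": ["tythe"]},
}


def build_index(vocab, label_types):
    """Map each lowercased label of the selected types to its concept ID."""
    index = {}
    for concept_id, labels in vocab.items():
        for label_type in label_types:
            for label in labels.get(label_type, []):
                index[label.lower()] = concept_id
    return index


def match_concepts(text, index):
    """Return the sorted concept IDs whose labels occur as whole words in text."""
    lowered = text.lower()
    found = set()
    for label, concept_id in index.items():
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            found.add(concept_id)
    return sorted(found)


text = "The mylle owner shall pay the tenth."
pref_only = build_index(VOCAB, ["prefLabel"])
pref_alt = build_index(VOCAB, ["prefLabel", "altLabel"])
all_labels = build_index(VOCAB, ["prefLabel", "altLabel", "hiddenLabel"])

print(match_concepts(text, pref_only))   # -> []
print(match_concepts(text, pref_alt))    # -> ['c2']
print(match_concepts(text, all_labels))  # -> ['c1', 'c2']
```

The more label types the index covers, the more spellings in the text
can be linked to a concept. An algorithm that never looks at labels
would be unaffected by any of this.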

In SKOS, there are three kinds of labels (terms): prefLabels, altLabels
and hiddenLabels. So alternate labels such as synonyms and different
spellings can be expressed either using altLabels or hiddenLabels. There
is some guidance for this in the SKOS Primer [1]. Another source of
guidance (though sadly not openly available) is the ISO thesaurus
standard ISO 25964-1. Generally hiddenLabels are used for misspellings
or otherwise inappropriate terms that still need to be represented.
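
As a rough sketch of what this could look like in a vocabulary, the
Python snippet below prints one concept in Turtle syntax with all three
label types. The ex: namespace, the concept name and the German labels
(including the "historical" spellings) are invented for illustration,
and the helper does not escape quotes inside labels.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
# Hypothetical namespace for the example concept.
EX_PREFIX = "@prefix ex: <http://example.org/concepts/> ."


def concept_to_turtle(local_name, pref, alt=(), hidden=(), lang="de"):
    """Render one SKOS concept as Turtle (labels must not contain quotes)."""

    def literals(labels):
        return ", ".join(f'"{label}"@{lang}' for label in labels)

    lines = [
        f"ex:{local_name} a skos:Concept",
        f"    skos:prefLabel {literals([pref])}",
    ]
    if alt:
        # Synonyms that may be shown to users.
        lines.append(f"    skos:altLabel {literals(alt)}")
    if hidden:
        # Variant spellings that should match but not be displayed.
        lines.append(f"    skos:hiddenLabel {literals(hidden)}")
    return " ;\n".join(lines) + " ."


turtle = "\n".join([
    SKOS_PREFIX,
    EX_PREFIX,
    "",
    concept_to_turtle("mill", "Mühle", alt=["Mahlwerk"], hidden=["Müli", "Mülin"]),
])
print(turtle)
```

Whether a given historical spelling belongs in altLabel or hiddenLabel
is a modelling decision; the snippet only shows where each would go.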

Here is how the Annif lexical backends support altLabels and hiddenLabels:

* MLLM by default uses prefLabels and altLabels, but not hiddenLabels,
when looking for vocabulary terms in text. With the setting
use_hidden_labels set to true, hiddenLabels will also be used.

* STWFSA, like MLLM, uses prefLabels and altLabels. hiddenLabels are not
supported (verified by grepping the codebase of stwfsapy).

* YAKE by default uses prefLabels and altLabels when matching keyphrases
found in text to vocabulary terms. hiddenLabels can also be enabled
with the label_types setting, like this:
label_types=prefLabel,altLabel,hiddenLabel
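
The two settings above could end up in a projects.cfg along these
lines. This is only a sketch that generates the file content with
Python's configparser: the project IDs, names, vocabulary ID and
analyzer choice are placeholders I made up, and only use_hidden_labels
and label_types are the settings discussed here.

```python
import configparser
import io

config = configparser.ConfigParser()

# Hypothetical MLLM project; only use_hidden_labels is the setting of interest.
config["swiss-mllm-de"] = {
    "name": "Swiss legal sources MLLM (German)",
    "language": "de",
    "backend": "mllm",
    "vocab": "swiss-law",  # placeholder vocabulary ID
    "analyzer": "snowball(german)",
    "use_hidden_labels": "true",
}

# Hypothetical YAKE project; only label_types is the setting of interest.
config["swiss-yake-de"] = {
    "name": "Swiss legal sources YAKE (German)",
    "language": "de",
    "backend": "yake",
    "vocab": "swiss-law",  # placeholder vocabulary ID
    "analyzer": "snowball(german)",
    "label_types": "prefLabel,altLabel,hiddenLabel",
}

# Write in key=value form, without spaces around the delimiter.
buffer = io.StringIO()
config.write(buffer, space_around_delimiters=False)
projects_cfg = buffer.getvalue()
print(projects_cfg)
```

Please check the backend wiki pages for the full list of parameters
before relying on anything other than the two settings named above.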

I would recommend that you represent your synonyms and alternative
spellings either as altLabels or hiddenLabels, depending on
circumstances. It is likely that you will then get better results with
the lexical backends, but you will have to try it out yourself. With
MLLM and YAKE, you will need to change the default settings if you want
to use hiddenLabels as well as altLabels.

-Osma

[1] https://www.w3.org/TR/skos-primer/#seclabel


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.s...@helsinki.fi
http://www.nationallibrary.fi