usage of SKOS concepts in MLLM

Sophie Schneider

Mar 30, 2023, 5:04:03 AM
to Annif Users

Dear all,

I have some questions regarding the MLLM algorithm. More specifically, I am interested in the additional indexes based on SKOS concepts. From reading through the wiki page description on MLLM I understand that these additional indexes (step 2) are transformed into numerical features and thus are solely used for improving results based on the set of candidates from matching document terms to the term index (step 1).

Can semantically related terms be suggested with this backend as well? Here is an example: If a document matches terms A and B and both of them have a SKOS:broader(/SKOS:related/SKOS:narrower) concept C, is it possible for C to be a keyword suggested by MLLM (even if C does not occur in the document itself)?

In addition to this: Are there any differences in the way the algorithm deals with different types of relationships (e.g. SKOS:broader vs. SKOS:narrower)?

Thanks in advance and Best,
Sophie

________________________________________

Sophie Schneider
Wissenschaftliche Mitarbeiterin Mensch.Maschine.Kultur
Staatsbibliothek zu Berlin - Preußischer Kulturbesitz

Osma Suominen

Mar 30, 2023, 10:33:41 AM
to annif...@googlegroups.com
Hello Sophie,

please find my responses inline:

Sophie Schneider wrote on 30.3.2023 at 12:04:
> I have some questions regarding the MLLM algorithm. More specifically, I
> am interested in the additional indexes based on SKOS concepts. From
> reading through the wiki page description on MLLM
> <https://github.com/NatLibFi/Annif/wiki/Backend%3A-MLLM> I understand
> that these additional indexes (step 2) are transformed into numerical
> features and thus are solely used for improving results based on the set
> of candidates from matching document terms to the term index (step 1).
>
> Can semantically related terms be suggested with this backend as well?
> Here is an example: If a document matches terms A and B and both of them
> have a SKOS:broader(/SKOS:related/SKOS:narrower) concept C, is it
> possible for C to be a keyword suggested by MLLM (even if C does not
> occur in the document itself)?

No, it will not suggest C if it doesn't appear in the document at all.
MLLM can only ever suggest concepts whose term (prefLabel, altLabel - or
hiddenLabel if enabled) is mentioned in the document at least once.
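
To illustrate the point above, here is a minimal Python sketch of label-based candidate selection. It is not the MLLM implementation: the `LABELS` vocabulary and the `candidate_concepts` function are made up for this example, and matching is reduced to lowercased single-word tokens, whereas the real backend's term matching is more involved.

```python
import re

# Hypothetical toy vocabulary: concept ID -> labels (prefLabel and altLabels).
LABELS = {
    "A": ["cats", "felines"],
    "B": ["dogs"],
    "C": ["pets"],  # assumed to be the skos:broader concept of both A and B
}


def candidate_concepts(text, labels=LABELS):
    """Return the IDs of concepts with at least one label occurring in the text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return {
        concept_id
        for concept_id, terms in labels.items()
        if any(term in tokens for term in terms)
    }


# C never becomes a candidate because none of its labels occur in the text,
# even though both matched concepts A and B point to it as their broader concept.
print(sorted(candidate_concepts("Cats and dogs are often kept together.")))
```

In this simplified picture, the SKOS relations can only influence the scoring of concepts that are already candidates; they never add new ones.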

> In addition to this: Are there any differences in the way the algorithm
> deals with different types of relationships (e.g. SKOS:broader vs.
> SKOS:narrower)?

Yes, MLLM uses three distinct features based on SKOS broader, narrower
and related relationships. If you happen to know Maui (see Medelyan's
PhD thesis), this is different, as Maui only uses a single common
feature for all these kinds of semantic relationships.

MLLM also has a fourth feature based on (shared) skos:Collection membership.
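
As a rough illustration of what four separate relation features could look like, here is a small Python sketch. It only shows the general idea of one feature per relation type; the toy data, the function name `relation_features` and the choice of simple counts over the other candidates are assumptions made for this example, not taken from the MLLM code linked below.

```python
# Hypothetical toy vocabulary structure.
BROADER = {"A": {"C"}, "B": {"C"}}          # concept -> its skos:broader concepts
RELATED = {"A": {"B"}, "B": {"A"}}          # concept -> its skos:related concepts
COLLECTIONS = {"animals": {"A", "B", "C"}}  # collection -> member concepts


def relation_features(candidates, broader=BROADER, related=RELATED,
                      collections=COLLECTIONS):
    """For each candidate, count the other candidates it is linked to, per relation."""
    # Derive the narrower mapping by inverting the broader mapping.
    narrower = {}
    for child, parents in broader.items():
        for parent in parents:
            narrower.setdefault(parent, set()).add(child)

    features = {}
    for concept_id in candidates:
        others = candidates - {concept_id}
        # All concepts sharing at least one collection with this concept.
        shared = set()
        for members in collections.values():
            if concept_id in members:
                shared |= members
        features[concept_id] = {
            "broader": len(broader.get(concept_id, set()) & others),
            "narrower": len(narrower.get(concept_id, set()) & others),
            "related": len(related.get(concept_id, set()) & others),
            "collection": len(shared & others),
        }
    return features


for concept_id, feats in sorted(relation_features({"A", "B", "C"}).items()):
    print(concept_id, feats)
```

Keeping the counts separate lets a downstream classifier weight each relation type differently, which is the contrast with a single combined feature.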

You can find all the features in the MLLM code:

https://github.com/NatLibFi/Annif/blob/26039b35e7ac644a14371c9ac7a33590e5d5426f/annif/lexical/mllm.py#L102-L116

Best,
Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.s...@helsinki.fi
http://www.nationallibrary.fi

Sophie Schneider

Apr 3, 2023, 4:55:59 AM
to Annif Users
Dear Osma,

perfect, thanks for the quick response and clarification!

Best,
Sophie

Donny Winston

May 21, 2024, 5:45:47 AM
to Annif Users
I have a related question on usage of SKOS concepts in MLLM, and thought I would add to this thread rather than start another thread. I hope this is acceptable.

Has anyone investigated the effect of (a) deriving transitive broader/narrower relations (i.e. using inferred skos:broaderTransitive and skos:narrowerTransitive relations as features instead of using skos:broader and skos:narrower directly), (b) "normalizing" to either broader or narrower (i.e. if normalizing to narrower, replacing each statement A skos:broader B with the statement B skos:narrower A), or both? I am interested in trying this, but I'd rather not attempt to hack it if someone else has already tried it and evaluated its performance relative to the as-is MLLM backend. Perhaps such behavior could be toggled via "backend-specific parameters" (<https://github.com/NatLibFi/Annif/wiki/Backend%3A-MLLM#backend-specific-parameters>)? These modes seem like natural ways to leverage the semantics implied by SKOS.
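
To make option (b) concrete, here is a small Python sketch of normalizing to narrower. The triples are plain tuples with shortened predicate names, and `normalize_to_narrower` is a hypothetical helper written for this example; nothing here is part of Annif.

```python
def normalize_to_narrower(triples):
    """Rewrite each (A, 'broader', B) as (B, 'narrower', A); keep other triples."""
    normalized = set()
    for subj, pred, obj in triples:
        if pred == "broader":
            normalized.add((obj, "narrower", subj))
        else:
            normalized.add((subj, pred, obj))
    return normalized


# Toy statements; "broader" and "narrower" stand in for skos:broader / skos:narrower.
triples = {
    ("A", "broader", "C"),
    ("C", "narrower", "B"),
    ("A", "related", "B"),
}
for triple in sorted(normalize_to_narrower(triples)):
    print(triple)
```

Because the result is a set, a hierarchy link that was stated in both directions collapses into a single narrower statement.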

Best,
Donny

P.S. I hope to meet one or more folks from the team at the upcoming workshop on June 3 at the Open Repositories Conference; I am co-leading a workshop scheduled for the same time (!) but that should end earlier than the Annif workshop. :)

Osma Suominen

Jun 7, 2024, 10:32:32 AM
to annif...@googlegroups.com
Hi Donny!

We met at OR2024 (yay!) and I tried to give you an answer there, but
here's one for others who may be following the discussion as well.

MLLM has basic support for SKOS hierarchies, i.e. direct skos:broader
and skos:narrower relationships (as well as skos:related and shared
membership in a SKOS Collection). These are turned into features for the
small classifier model that is used to combine all the features into a
prediction about how appropriate a particular concept is for the input text.

There is no special support for the broaderTransitive and
narrowerTransitive relationships. However, these could be calculated
either within the MLLM backend or externally (for example using Skosify)
and used as features in the MLLM classifier - this would require some
relatively small changes to the MLLM codebase, basically just copying
what is done for the other SKOS semantic relations. I don't know how
much this would improve the results, but my hunch is that not very much
- maybe at most a one percentage point improvement in measures like
precision and recall. Which of course might be significant! Feel free to
try! But maybe it's best to first benchmark the results without making
any changes to the code and see whether that's good enough for your use case.
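
For anyone who wants to experiment with calculating the transitive relations externally, here is a minimal pure-Python sketch. It assumes the direct skos:broader links have already been loaded into a dict; the function name `transitive_broader` and the toy data are invented for this example, and it does not show how the result would be wired into the MLLM classifier.

```python
def transitive_broader(broader):
    """Map each concept to all of its ancestors (like skos:broaderTransitive)."""
    closure = {}
    for start in broader:
        seen = set()
        stack = list(broader[start])
        # Walk upwards through the hierarchy, visiting each ancestor once.
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(broader.get(node, ()))
        closure[start] = seen
    return closure


# Toy hierarchy: A has broader B, and B has broader C.
direct = {"A": {"B"}, "B": {"C"}}
for concept_id, ancestors in sorted(transitive_broader(direct).items()):
    print(concept_id, sorted(ancestors))
```

The narrowerTransitive mapping can be obtained the same way by passing in the direct narrower links instead. The `seen` set keeps the walk from looping forever if the vocabulary happens to contain a cycle.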

-Osma