How to make an MLLM model for another language

Javier 1882
Feb 4, 2025, 9:51:52 AM
to Annif Users
Hello all,
I want to use Annif for Spanish texts and I've seen that only English, Swedish and Finnish are supported. What steps would be needed to build something like the English YSO MLLM model, but for Spanish?
Kind regards,
Javier

Osma Suominen
Feb 5, 2025, 7:52:11 AM
to annif...@googlegroups.com
Hello Javier,

Annif supports many languages, not just English, Swedish and Finnish.

MLLM is a lexical model, which means that it matches words/terms in the
text to terms in the vocabulary. Thus the vocabulary needs to have
labels in the language of your texts. YSO doesn't have Spanish labels,
so you can't use YSO for this task unless you first translate it (which
is a major piece of work). Instead you should probably look for another
SKOS controlled vocabulary that has Spanish language labels. For example,
the UNESCO Thesaurus, AGROVOC and EuroVoc are such vocabularies.
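
A quick way to check whether a candidate vocabulary actually has Spanish
labels is to count its skos:prefLabel values by language. The sketch below
is a minimal illustration, assuming the vocabulary has been downloaded as
SKOS in RDF/XML; vocabularies distributed as Turtle or another RDF
serialization would need an RDF library such as rdflib instead. The sample
data and example.org URIs are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SKOS = "http://www.w3.org/2004/02/skos/core#"
# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def count_preflabels_by_language(rdf_xml: str) -> Counter:
    """Count skos:prefLabel elements per xml:lang value in an RDF/XML string.

    Simplification: only the label element's own xml:lang attribute is read;
    language tags inherited from parent elements are not resolved.
    """
    root = ET.fromstring(rdf_xml)
    counts = Counter()
    for label in root.iter(f"{{{SKOS}}}prefLabel"):
        counts[label.get(XML_LANG, "")] += 1  # "" means no language tag
    return counts


# Made-up two-concept sample: only the first concept has a Spanish label.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/c1">
    <skos:prefLabel xml:lang="en">fisheries</skos:prefLabel>
    <skos:prefLabel xml:lang="es">pesca</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/c2">
    <skos:prefLabel xml:lang="en">forestry</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""

if __name__ == "__main__":
    print(count_preflabels_by_language(SAMPLE))
```

For a real dump, read the file contents into a string (or switch to
ET.iterparse for very large files) and compare the "es" count with the
total number of concepts to see how complete the Spanish coverage is.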

Once you have a vocabulary with Spanish terms, you can configure an MLLM
project like this:

backend=mllm
language=es
analyzer=snowball(spanish)

For the analyzer there are other options with support for Spanish, for
example:
analyzer=simplemma(es)
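
Annif reads project definitions from an INI-style projects.cfg file, with
one section per project. As a sketch, the Python snippet below renders two
Spanish MLLM projects, one for each analyzer option mentioned above, so
they can be compared side by side. The project IDs, names and the vocab ID
"eurovoc" are hypothetical placeholders: the vocab value must match the
vocabulary ID you load into your own Annif installation, and the exact
vocab syntax may differ between Annif versions.

```python
import configparser
import io

# Hypothetical project IDs, names and vocab ID; adjust to your own setup.
# The vocab ID must match a vocabulary loaded into your Annif instance.
PROJECTS = {
    "eurovoc-mllm-es-snowball": {
        "name": "EuroVoc MLLM Spanish (Snowball)",
        "language": "es",
        "backend": "mllm",
        "analyzer": "snowball(spanish)",
        "vocab": "eurovoc",
    },
    "eurovoc-mllm-es-simplemma": {
        "name": "EuroVoc MLLM Spanish (Simplemma)",
        "language": "es",
        "backend": "mllm",
        "analyzer": "simplemma(es)",
        "vocab": "eurovoc",
    },
}


def build_projects_cfg(projects: dict[str, dict[str, str]]) -> str:
    """Render project definitions as INI text in the projects.cfg style."""
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep keys exactly as written
    for project_id, settings in projects.items():
        config[project_id] = settings  # one [section] per project
    buffer = io.StringIO()
    # key=value without spaces, matching the examples in this thread
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_projects_cfg(PROJECTS))
```

Pasting the generated sections into projects.cfg (or typing them by hand)
gives two projects that differ only in the analyzer, which makes it easy to
compare the two options on the same Spanish documents.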

For more information about analyzers see here:
https://github.com/NatLibFi/Annif/wiki/Analyzers

Best,
Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.s...@helsinki.fi
http://www.nationallibrary.fi

Javier 1882
Feb 20, 2025, 9:30:36 AM
to Annif Users
Hello Osma,
Apologies for the late reply; I only just saw your answer.
Sounds good! I've done a little research on EuroVoc and it looks like a good fit for my use case.
Thank you for the comprehensive answer. I'll follow up here if any other problems arise.
Best,
Javier
