training Omikuji backend


Uldis Bojars

May 5, 2026, 7:31:01 AM
to Annif Users
Hi,

We have trained the Annif Omikuji backend with a short text corpus (MARC field 245 contents + field 505 if present) covering ~67 thousand records.

Is there a way to improve the resulting model, e.g., by adjusting its parameters?

Here are the current parameters:

language=lv
backend=omikuji
analyzer=simplemma(lv)
vocab=nllsh_2026_04
cluster_balanced=False
cluster_k=100
max_depth=3

Best regards,
Uldis

Osma Suominen

May 25, 2026, 6:17:06 AM
to annif...@googlegroups.com
Hi Uldis,

I seem to remember that your vocabulary NLLSH is quite large. With only
67k training records, there probably will not be enough training data to
get good results on this type of model. You would need a lot more than
that; a good rule of thumb is that you need at least 10 times as many
training examples as there are subjects/concepts in your vocabulary. For
example, we train our YSO models on over 1M short text records.
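As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function names and the vocabulary size of 50,000 are hypothetical placeholders (the actual size of NLLSH is not stated in this thread), and the factor of 10 is only the guideline mentioned above, not a hard limit.

```python
def min_training_examples(vocab_size: int, factor: int = 10) -> int:
    """Rule-of-thumb minimum: `factor` training examples per vocabulary concept."""
    return vocab_size * factor


def shortfall(n_records: int, vocab_size: int, factor: int = 10) -> int:
    """How many more records the rule of thumb asks for (0 if there are enough)."""
    return max(0, min_training_examples(vocab_size, factor) - n_records)


def max_supported_vocab(n_records: int, factor: int = 10) -> int:
    """Largest vocabulary size that `n_records` would cover under the rule of thumb."""
    return n_records // factor


n_records = 67_000                 # corpus size from the original question
hypothetical_vocab_size = 50_000   # placeholder, NOT the real NLLSH size

print("needed:", min_training_examples(hypothetical_vocab_size))
print("missing:", shortfall(n_records, hypothetical_vocab_size))
print("vocabulary size covered by current data:", max_supported_vocab(n_records))
```

Substituting the real number of concepts in the vocabulary gives an estimate of how much additional (or synthetic) training data would be needed.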

If you can't find enough training data in your own databases, one option
is to look at synthetic data generation. We did that in the
LLMs4Subjects Shared Task. See this paper for some details:
https://arxiv.org/abs/2508.15877

The code is here:
https://github.com/NatLibFi/Annif-LLMs4Subjects-GermEval2025

Best,
Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.s...@helsinki.fi
http://www.nationallibrary.fi
