impact of the vocabulary format on training (.ttl or .tsv)

Gabriel Souza

May 13, 2025, 9:26:08 AM
to Annif Users
I couldn't find anything in the documentation about this, so I would like to know whether the results (F1, precision, recall, ...) differ if we load the vocabulary using only the .tsv format instead of .ttl.
We are using Omikuji as the backend.


Best regards, Gabriel Souza

Osma Suominen

May 13, 2025, 9:29:18 AM
to annif...@googlegroups.com
Hello Gabriel,

If you use Omikuji, the vocabulary format doesn't make a difference.
Omikuji and other associative backends (tfidf, fasttext, svc) don't care
about the lexical labels and other structure inside the vocabulary.

In contrast, lexical backends (mllm, stwfsa, yake) do care about e.g.
alternative labels, which cannot be included in the TSV format, so you
have to use SKOS (e.g. a .ttl file) for best results.
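
To make the difference concrete, here is a minimal Python sketch. It is
not Annif code: the concepts, URIs and helper names are hypothetical, and
it assumes the two-column TSV layout (URI in angle brackets, a tab, then
the preferred label). It only illustrates which information survives in
TSV and which backend groups, as listed in this thread, would notice the
loss.

```python
# Hypothetical in-memory vocabulary; URIs and labels are invented for illustration.
concepts = [
    {"uri": "http://example.org/c1", "pref": "cats", "alt": ["felines", "house cats"]},
    {"uri": "http://example.org/c2", "pref": "dogs", "alt": ["canines"]},
]

# Backend groups as named in this thread (not an exhaustive list).
ASSOCIATIVE = {"omikuji", "tfidf", "fasttext", "svc"}
LEXICAL = {"mllm", "stwfsa", "yake"}


def to_tsv(concepts):
    """Render concepts as TSV lines: <URI>, a tab, then the preferred label.

    Alternative labels have no column here, so they are simply left out.
    """
    return "\n".join(f"<{c['uri']}>\t{c['pref']}" for c in concepts)


def dropped_alt_labels(concepts):
    """List the alternative labels that the TSV rendering cannot carry."""
    return [alt for c in concepts for alt in c["alt"]]


def tsv_is_enough(backend):
    """Return True if the backend ignores labels beyond the TSV columns."""
    if backend in ASSOCIATIVE:
        return True
    if backend in LEXICAL:
        return False
    raise ValueError(f"backend not covered in this sketch: {backend}")


if __name__ == "__main__":
    print(to_tsv(concepts))
    print("Lost in TSV:", dropped_alt_labels(concepts))
    print("TSV enough for omikuji?", tsv_is_enough("omikuji"))
    print("TSV enough for mllm?", tsv_is_enough("mllm"))
```

For Omikuji the lost alternative labels do not matter, since it learns
from the training documents and not from the labels; for mllm, stwfsa or
yake they are exactly what gets matched against the text.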

-Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.s...@helsinki.fi
http://www.nationallibrary.fi
