Filtering suggestions in annif eval

Sven Sass

May 21, 2025, 4:38:52 AM

to Annif Users

Hello,

I'm evaluating Annif with very good results - thanks for creating and maintaining such a cool project!

For our data, we know that certain subjects must never appear in the suggestions for a given document. Currently they are sometimes suggested, which is entirely reasonable from the algorithm's point of view.

Is there any way to register an output filter or callback that is called right after a backend has calculated its suggestions, so that "annif eval" works with the filtered suggestions and they are also filtered before being used in ensembles?

Thanks a lot and

regards,

Sven

P.S.: I saw a message stating (I'm paraphrasing) "Annif is not supposed to be used as a Python library" - but I thought: asking does not hurt ;)

Osma Suominen

May 21, 2025, 7:22:23 AM

to annif...@googlegroups.com

Hello Sven!

(re-posting to annif-users, as I accidentally replied only to Sven)

Great to hear that you've found Annif useful and achieved good results!

The problem you have is quite common - algorithms sometimes consistently
suggest inappropriate subjects. It was reported a while ago in this
issue: https://github.com/NatLibFi/Annif/issues/735

This has since been addressed in a PR, which added support for a
configuration feature where specific subjects can be excluded:
https://github.com/NatLibFi/Annif/pull/840

This is not yet included in any Annif release, but the code has been
merged to the main branch on GitHub. It will be released as part of 1.4.

That feature requires listing the excluded concepts individually by URI
in the Annif configuration file. That can be cumbersome if there are
many such concepts, so we are currently working on support for exclusion
rules to make it possible to exclude many concepts in one go based on
different criteria: https://github.com/NatLibFi/Annif/issues/844
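
Until such rules are available, a similar effect can be approximated outside Annif by post-filtering a list of suggestions. The following is only a minimal sketch in Python under stated assumptions: the `Suggestion` type, the `make_excluder` helper and the example.org URIs are hypothetical and are not part of Annif or its configuration format. It shows exclusion both by individual URI and by a simple rule (here, a URI prefix).

```python
from typing import Callable, Iterable, NamedTuple


class Suggestion(NamedTuple):
    """Hypothetical stand-in for one suggested subject (not an Annif class)."""
    uri: str
    score: float


def make_excluder(
    excluded_uris: Iterable[str] = (),
    excluded_prefixes: Iterable[str] = (),
) -> Callable[[list[Suggestion]], list[Suggestion]]:
    """Build a function that drops suggestions matching the exclusions.

    excluded_uris: concepts listed one by one.
    excluded_prefixes: a simple "rule" that excludes every concept whose
    URI starts with one of the given prefixes.
    """
    uris = frozenset(excluded_uris)
    prefixes = tuple(excluded_prefixes)

    def exclude(suggestions: list[Suggestion]) -> list[Suggestion]:
        # str.startswith accepts a tuple; an empty tuple never matches.
        return [
            s for s in suggestions
            if s.uri not in uris and not s.uri.startswith(prefixes)
        ]

    return exclude


# Usage with made-up URIs:
exclude = make_excluder(
    excluded_uris=["http://example.org/subjects/c1"],
    excluded_prefixes=["http://example.org/places/"],
)
kept = exclude([
    Suggestion("http://example.org/subjects/c1", 0.9),
    Suggestion("http://example.org/subjects/c2", 0.5),
    Suggestion("http://example.org/places/p1", 0.4),
])
```

Note that a filter applied this way only affects the final output; it does not change what the backends pass to an ensemble.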

There is already a draft PR with the initial changes required for this
(https://github.com/NatLibFi/Annif/pull/846) but the work has stalled
for a while. I can't promise when it will be ready.

Regarding use of Annif as a Python library - you are of course free to
do so if you find it useful, but it's not really a use case that we are
planning for or supporting. For instance Python class and method APIs
within Annif quite often change even in minor releases, which could
cause breakage. There are no callbacks of the kind you mention, but I
think that the exclude functionality described above could be used to
accomplish the same end result.

Best,
Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.s...@helsinki.fi
http://www.nationallibrary.fi

Sven Sass

May 21, 2025, 8:38:38 AM

to Annif Users

Hello Osma,

thanks for the quick response.

The dynamic filtering mentioned in https://github.com/NatLibFi/Annif/issues/735 could help in one of my cases. For any given text we know for sure which topics to exclude (there could be up to 6,000) and could pass them along, if there were a dynamic parameter for this.

I was hoping more for a callback approach, where in the project.cfg you could specify something like "suggestfilter=<pythonfunction>", and the Python function receives predefined parameters like "text" and "suggestions" and returns the (potentially altered) suggestions as its result. Maybe this request is too specific to be dealt with at the Annif project level. Another idea is to alter the suggestions based on the suggestions themselves, so it is not just about filtering out unwanted subjects, but also about tweaking results with "outside knowledge".
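
To illustrate the idea, a callback of this kind might look roughly like the sketch below. Everything in it is hypothetical: Annif has no "suggestfilter" setting, and the function signature, the `Suggestion` type, and the `KNOWN_EXCLUSIONS` and `BOOSTS` tables are assumptions made up for this example.

```python
from typing import NamedTuple


class Suggestion(NamedTuple):
    """Hypothetical stand-in for one suggested subject (not an Annif class)."""
    uri: str
    score: float


# Hypothetical "outside knowledge": if the marker occurs in the text,
# the listed subject URIs must never be suggested for that document.
KNOWN_EXCLUSIONS = {
    "invoice": {"http://example.org/subjects/poetry"},
}

# Hypothetical score multipliers for subjects we trust more than the model does.
BOOSTS = {
    "http://example.org/subjects/accounting": 1.5,
}


def suggest_filter(text: str, suggestions: list[Suggestion]) -> list[Suggestion]:
    """Receive the text and the raw suggestions, return altered suggestions."""
    lowered = text.lower()

    # Collect the URIs that are banned for this particular document.
    banned: set[str] = set()
    for marker, uris in KNOWN_EXCLUSIONS.items():
        if marker in lowered:
            banned |= uris

    # Drop banned subjects and rescale the rest, capping scores at 1.0.
    result = [
        Suggestion(s.uri, min(1.0, s.score * BOOSTS.get(s.uri, 1.0)))
        for s in suggestions
        if s.uri not in banned
    ]

    # Return the suggestions ordered by descending score.
    return sorted(result, key=lambda s: s.score, reverse=True)


raw = [
    Suggestion("http://example.org/subjects/poetry", 0.8),
    Suggestion("http://example.org/subjects/accounting", 0.4),
]
filtered = suggest_filter("Invoice no. 42 for bookkeeping services", raw)
```

Here the decision depends on both the text and the suggestions themselves, which is what a static exclusion list cannot express.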

Thanks again for this great project (I see the llm4subject branch :) ) and your time!