Subject inclusion/exclusion and ensembles

MJ Suhonos

Jan 5, 2026, 10:42:48 AM
to Annif Users
Hi all,

I'm very interested in the subject inclusion/exclusion functionality available in Annif 1.4+.  However, I'd like to ask about its behaviour when using ensembles.  In particular, I know you cannot normally combine different vocabularies within an ensemble.

However, with this new functionality, is it possible to have a large vocabulary, and two separate backends that each specify a *different* subset of included subjects, and then combine their results using an ensemble (since the unfiltered vocabulary is the same)?

An example use case would be something like LCSH/LCNAF, which is often expressed and applied as a single, very large vocabulary. One backend could be configured with the LCSH subset and another with the LCNAF subset, with the results combined into the same label space. This would presumably reduce the resources needed to train each backend, since each one only "sees" its own filtered subset.

My apologies if I've missed or misunderstood the documentation on this; I wanted to ask here before I start experimenting and get confused.

Thanks,
MJ

Osma Suominen

Jan 7, 2026, 4:06:20 AM
to annif...@googlegroups.com
Hi MJ,

This kind of use for inclusion/exclusion in ensembles was indeed one of
the intended use cases. There is also some related discussion in this
issue comment:
https://github.com/NatLibFi/Annif/issues/596#issuecomment-3135932206

If you want to try it out with LCSH+LCNAF, you should define a single
vocabulary (maybe called lcsh_lcnaf or something along those lines) that
both the ensemble and the individual projects use. Then set
exclude/include rules that narrow down the vocabulary for the individual
projects. See the above comment for a similar configuration for the GND
vocabulary.
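
To make the setup described above concrete, a projects.cfg along the following lines might be a starting point. This is only a sketch, not a tested configuration: the project IDs, names, language, analyzer, the choice of tfidf and mllm as backends, and the include URIs are all placeholders assumed for illustration. The vocab=name(exclude=...,include=...) form is the one quoted later in this thread; the issue comment linked above is the place to check which rules are actually supported.

```ini
# Hypothetical sketch: all three projects name the same vocabulary, lcsh_lcnaf.
# The include URIs are placeholders for whatever rule selects each subset.

[lcsh-en]
name=LCSH subset (placeholder)
language=en
backend=tfidf
analyzer=snowball(english)
vocab=lcsh_lcnaf(exclude=*,include=http://example.org/placeholder-for-lcsh-subset)

[lcnaf-en]
name=LCNAF subset (placeholder)
language=en
backend=mllm
analyzer=snowball(english)
vocab=lcsh_lcnaf(exclude=*,include=http://example.org/placeholder-for-lcnaf-subset)

# The ensemble uses the unfiltered vocabulary, so both sources
# share its label space.
[lcsh-lcnaf-ensemble-en]
name=LCSH + LCNAF ensemble (placeholder)
language=en
backend=ensemble
sources=lcsh-en,lcnaf-en
vocab=lcsh_lcnaf
```

The point of the layout is that the base vocabulary ID is identical in every project and only the filter in parentheses differs.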

Please do report back on how this worked for you! This is a new feature
so we don't have much experience with it yet.

Best,
Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.s...@helsinki.fi
http://www.nationallibrary.fi

MJ Suhonos

Jan 29, 2026, 11:45:42 AM
to Annif Users
Hi Osma,

Thanks for the clarification! This functionality works great for the use case described, and as intended. However, I wanted to note an issue I encountered. The YAKE backend raises an error when the project definition contains a vocab setting like:

vocab=my_vocab(exclude=*,include=http://some.uri)

This results in "TypeError: list indices must be integers or slices, not NoneType".  However, if the exclusion part is removed, it works fine:

vocab=my_vocab(include=http://some.uri)

It's not clear to me yet whether these two forms select the same subset of terms (I would _think_ so…). The error only appears with YAKE; other backends work fine.
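
For anyone trying to reproduce this, the two cases could be put side by side in projects.cfg roughly as follows. The project IDs, names, language and analyzer are assumptions made up for illustration; my_vocab and http://some.uri are the same placeholders as above.

```ini
# Hypothetical minimal pair; only the vocab line differs.

# Reported above to fail with the TypeError
[yake-exclude-include]
name=YAKE with exclude and include
language=en
backend=yake
analyzer=snowball(english)
vocab=my_vocab(exclude=*,include=http://some.uri)

# Reported above to work
[yake-include-only]
name=YAKE with include only
language=en
backend=yake
analyzer=snowball(english)
vocab=my_vocab(include=http://some.uri)
```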

Is this worth submitting a GitHub issue?

Cheers,
MJ

Osma Suominen

Jan 30, 2026, 3:47:55 AM
to annif...@googlegroups.com
Hi MJ,

Thanks for reporting back! The YAKE issue you mentioned sounds like a
bug to me. Please do report it as an issue on GitHub!

-Osma