Evaluating ebm backend / developer branch


Sven Sass

Feb 2, 2026, 5:17:20 AM
to Annif Users
Hello all,

I'm trying to evaluate the ebm backend, but I wanted to check beforehand:

1.) Is it a bad idea for a non-(Annif-)developer to try to evaluate that backend, since it is not stable yet?
2.) If it is not too bad an idea: what would the correct approach be?
  a.) check out the branch "deutsche-nationalbibliothek-issue855-add-ebm-backend-gh-hosted-large-runner"
  b.) check out https://github.com/deutsche-nationalbibliothek/ebm4subjects (and use it with the current main?)
  c.) something else?

Any information appreciated.

Kind regards
Sven

Maximilian Kähler

Feb 3, 2026, 6:29:56 AM
to Annif Users
Hi Sven,

as one of the co-developers here, my answer would be the following:

Right now there is an error with the logger that needs fixing, which you should probably wait for (give us a week or two).
Generally, we are happy about any feedback from you as a beta tester. So the answer is a cautious "yes, go ahead, but mind the gap..."

How to proceed:

  * the correct Annif branch to work with is on our DNB fork: deutsche-nationalbibliothek:issue855-add-ebm-backend
  * the ebm4subjects package can be installed from PyPI, unless you want to work with its source code; in that case, take the main branch of https://github.com/deutsche-nationalbibliothek/ebm4subjects
  * in our latest version of ebm4subjects, support for SentenceTransformers is an optional dependency that is installed when you install Annif with the extra "ebm-in-process" (see pyproject.toml)
  * To get started: there is a draft for a wiki page on ebm: https://github.com/NatLibFi/Annif/wiki/DRAFT-%E2%80%90-Backend:-EBM  It contains all the information on how to configure the backend. The actual embedding model from Hugging Face is probably the most important parameter.
  * To manage expectations: ebm is a method developed to improve performance in the long tail of large vocabularies. On its own, you can expect metric values in about the same range as MLLM, but the actual matches should be significantly distinct from MLLM suggestions (as similarities are based on embeddings, not string representations). For best results, you should use ebm along with e.g. Omikuji or another statistical approach.
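To make the configuration side concrete, here is a purely hypothetical sketch of what a projects.cfg entry might look like. Only device, duckdb_threads, max_chunk_length, max_chunk_count and encode_args_documents are parameter names quoted elsewhere in this thread; every other ebm-specific key, including the one for the embedding model, is a placeholder, so check the draft wiki page for the authoritative names:

```ini
# Hypothetical sketch only; ebm parameter names other than those quoted in
# this thread are placeholders. See the draft wiki page for the real keys.
[ebm-en]
name=EBM English
language=en
backend=ebm
vocab=my-vocab
; placeholder key for the Hugging Face embedding model:
embedding_model=BAAI/bge-m3
device=cuda:0
duckdb_threads=16
max_chunk_length=50
max_chunk_count=20
encode_args_documents={"batch_size": 32, "show_progress_bar": True}
limit=100
```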

Please feel free to send us feedback via GitHub, especially if you run into errors.

Best,
Maximilian

Sven Sass

Feb 4, 2026, 12:51:35 AM
to Annif Users
Hi Maximilian,

thank you for your prompt answer and the detailed information on how to proceed.

I'm happy to hear that it is worth a go and will surely provide feedback.

Kind regards,
Sven

Maximilian Kähler

Feb 5, 2026, 5:15:02 AM
to Annif Users
The error with the logger has been fixed, so you can give it a try now.
Best,
Maximilian

Sven Sass

Feb 6, 2026, 12:45:44 AM
to Annif Users
Hello Maximilian,

thanks for the information. Evaluation will probably start next week. Thanks so much for the support!

Best regards,
Sven

Sven Sass

Feb 10, 2026, 4:58:08 AM
to Annif Users
Hello all,

if someone else is thinking about evaluating the ebm backend: following Maximilian's instructions, it is quite easy to set up the project.

Best regards,
Sven

Sven Sass

Feb 19, 2026, 3:24:16 AM
to Annif Users
Hello Maximilian,

I tried to send you a personal message, but I'm not sure if it reached you, so just to be safe I'm posting it here again.

Currently I'm stuck with my evaluation, because I run into an error with basically any configuration (mainly varying the embedding). I also used your example configuration, but to no avail:

Traceback (most recent call last):
  File "/opt/annif/dev3/Annif/venv/bin/annif", line 6, in <module>
    sys.exit(cli())
             ~~~^^
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/click/core.py", line 1442, in __call__
    return self.main(*args, **kwargs)
           ~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/click/core.py", line 1363, in main
    rv = self.invoke(ctx)
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/click/core.py", line 1830, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/click/core.py", line 1226, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/click/core.py", line 794, in invoke
    return callback(*args, **kwargs)
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/click/decorators.py", line 34, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/flask/cli.py", line 400, in decorator
    return ctx.invoke(f, *args, **kwargs)
           ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/click/core.py", line 794, in invoke
    return callback(*args, **kwargs)
  File "/opt/annif/dev3/Annif/annif/cli.py", line 504, in run_eval
    for hit_sets, subject_sets in pool.imap_unordered(
                                  ~~~~~~~~~~~~~~~~~~~^
        psmap.suggest_batch, corpus.doc_batches
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ):
    ^
  File "/home/dev/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/multiprocessing/pool.py", line 873, in next
    raise value
  File "/home/dev/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ~~~~^^^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/annif/parallel.py", line 76, in suggest_batch
    suggestion_batch = project.suggest(batch, self.backend_params)
  File "/opt/annif/dev3/Annif/annif/project.py", line 272, in suggest
    return self._suggest_with_backend(transformed_docs, backend_params)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/annif/project.py", line 151, in _suggest_with_backend
    return self.backend.suggest(docs, beparams)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/annif/backend/backend.py", line 143, in suggest
    return self._suggest_batch(documents, params=beparams)
           ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/annif/backend/ebm.py", line 188, in _suggest_batch
    candidates = self._model.generate_candidates_batch(
        texts=[doc.text for doc in documents],
        doc_ids=[i for i in range(len(documents))],
    )
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/ebm4subjects/ebm_model.py", line 567, in generate_candidates_batch
    chunk_index = pl.concat(chunk_index).with_row_index("query_id")
                  ~~~~~~~~~^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/polars/functions/eager.py", line 234, in concat
    out = wrap_df(plr.concat_df(elems))
                  ~~~~~~~~~~~~~^^^^^^^
polars.exceptions.SchemaError: type Int64 is incompatible with expected type Null

I'm not sure how I can fix this.

Any insights appreciated.

Best regards,
Sven

Maximilian Kähler

Feb 19, 2026, 5:37:24 AM
to Annif Users
Dear Sven,

thank you for reporting this. It is actually quite difficult to figure out the root of this error remotely.
What we need is more information, ideally a minimal reproducible example that allows us to recreate the error in our setting.
Would you mind reporting this error in an issue here:

I would ask you to add the following information:

* your projects.cfg
* some test data (including a test vocab) that produces this error
* the client call that you used ("annif train [your options]")
* package versions in your Python environment

I know this is a lot of work, but it takes even more effort to dig into this without knowing the circumstances.

Thank you!

Best,
Maximilian

Sven Sass

Feb 20, 2026, 1:54:18 AM
to Annif Users
Hello Maximilian,

I posted an issue report here: https://github.com/NatLibFi/Annif/issues/936

Please let me know if I can be of any help while investigating this issue. I'm really happy to help.

Best regards,
Sven

Sven Sass

Feb 24, 2026, 2:21:17 AM
to Annif Users
Hello Maximilian,

I'm still in the process of evaluating EBM with different embeddings/ensembles etc. Once I'm finished, I'll post my observations here in case they might help someone else.

I did notice that if I train a project with a given setting for device or duckdb_threads and then want to change it when evaluating the project, it will still use the configuration it was trained with.

E.g.: if I train on "cuda:0" and then change the project configuration to "cuda:1", it will still evaluate on "cuda:0".

I have not double-checked with other backends whether this is the intended behavior. I think it would be nice to switch the GPU, or to use more or fewer GPUs, when required. For now I was training on one GPU, but I could imagine training on multiple GPUs while evaluating on only one.

Similar to this: if I copy the project folder (projects/[project_name]) to another folder and configure a project for that folder, it throws an error:
"Error: Cannot open file "<..>/data/projects/ebm-jina/ebm-duck.db": No such file or directory"
where "ebm-jina" is the original project's name, not the current project's name ("ebm-jina-50000").

And of course: I don’t mean to nitpick — I just want to help.

And one more question: the "jinaai/jina-embeddings-v5-text-small" embedding expects the parameter "task" to be set. This should be one of: retrieval, text-matching, clustering, classification. Is "classification" the right choice?
encode_args_documents={"device": "cuda:0", "batch_size": 300, "show_progress_bar": True, "task": "classification"}

Thank you so much and

best regards,
Sven


PS: While evaluating, I see a message like this:
"configuration generated by an older version of XGBoost, please export the model by calling
`Booster.save_model` from that version first, then load it back in current version. See:

    https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html

for more details about differences between saving model and serializing."

As far as I understand the linked page (and ChatGPT), a trained model does not store GPU information, so it should be possible to run it on another GPU.

Sven Sass

Mar 10, 2026, 5:29:38 AM
to Annif Users
Hello Maximilian/all,

my evaluation is finished and here are my findings.


Prerequisites:
My dataset is quite large: I have around 700,000 short text documents, of which 600,000 are used for training and 100,000 for evaluation, with around 7,000 labels. I'm using 256 GB RAM (+256 GB swap) and have two ADA 6000 GPUs (48 GB VRAM each).

I had evaluated basically all available backends beforehand to find the best combination of backends for my case, "best" being defined here as the highest "F1 score (doc avg)". The champion is this ensemble (a basic ensemble, not nn):
- Omikuji-Attention*0.4624,
- Xtransformer*0.4206,
- MLLM*0.1170
With a limit of 15 and a threshold of 0.15, the F1 value is at 69.44%.
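As a side note, the way a basic (non-nn) ensemble combines such weighted sources can be sketched roughly as a weighted mean with limit/threshold filtering (a simplified model with invented subject IDs s1-s3 and scores, not Annif's actual implementation):

```python
# Simplified sketch of a weighted-mean ensemble using the champion's weights
# from above. Subject IDs and per-source scores are invented for illustration.
weights = {"omikuji": 0.4624, "xtransformer": 0.4206, "mllm": 0.1170}
scores = {
    "omikuji":      {"s1": 0.80, "s2": 0.10},
    "xtransformer": {"s1": 0.60, "s3": 0.20},
    "mllm":         {"s2": 0.90},
}
total = sum(weights.values())
combined = {}
for src, w in weights.items():
    for subj, score in scores[src].items():
        combined[subj] = combined.get(subj, 0.0) + w * score / total
# Keep at most limit=15 suggestions with score >= threshold=0.15:
suggestions = sorted(
    ((s, v) for s, v in combined.items() if v >= 0.15),
    key=lambda kv: -kv[1],
)[:15]
print([s for s, _ in suggestions])  # ['s1', 's2']; 's3' falls below 0.15
```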

The task was to evaluate whether adding the EBM backend could improve this result.


Installation
As this backend is not yet in the main version of Annif the installation steps can be found here: https://github.com/NatLibFi/Annif/issues/936.

Please note: Maximilian pointed out that using uv is easier:
uv sync --extra ebm-in-process # or --extra ebm-api


Training
The main parameter of the EBM-backend is the sentence-transformer. I did evaluate these:
- BAAI/bge-m3 (default)
- jinaai/jina-embeddings-v3
- google/embeddinggemma-300m
- intfloat/multilingual-e5-small
- nvidia/llama-embed-nemotron-8b
- jinaai/jina-embeddings-v5-text-small
- Qwen/Qwen3-Embedding-8B

With my hardware and the given training set I was not able to train with any of these embeddings. I reduced the training size to 50,000 documents and was then able to train
- BAAI/bge-m3 (default)
- jinaai/jina-embeddings-v3
- intfloat/multilingual-e5-small

For all other embeddings except "jinaai/jina-embeddings-v5-text-small" the training failed. For "jinaai/jina-embeddings-v5-text-small" the training did pass, but I was not able to evaluate 50,000 documents.

So my first observation is that the memory consumption is higher than that of the other tested backends. The training time was roughly the same as for the X-Transformer backend, but with only 1/12 of the training material.
The backend supports OpenAI and Hugging Face, so if you can send your data to the cloud, the hardware requirements may not be an issue for you.


Evaluation
I achieved these F1-scores
- 18.00% bge-m3
- 26.16% jina-v3
- 17.58% e5-small
- 39.82% mllm (this is even higher than training with 600,000 documents)
- 56.52% omikuji (this is ~10% worse than training with 600,000 documents)

As the results are not directly comparable to the current champion (because of the reduced training set), I evaluated whether I could improve an ensemble consisting of MLLM and Omikuji by adding the ebm backend to it.
For the champion, I had observed an improvement of roughly 2.5% when adding X-Transformer.

The baseline with omikuji:0.6583,mllm:0.3417 was an F1 score of 57.93%; when adding ebm with the jina-v3 embedding (omikuji:0.3491,mllm:0.3636,ebm-jinav3:0.2872), the F1 score dropped to 56.73%.

So for my use case I was unable to achieve an improvement.

The goal of this backend (to my understanding) is to give good results without needing many (or any) examples for any given label. In my case I have rather many examples, and this is the reason the Omikuji backend performs so well.

The jina-embeddings-v5-text-small came out during my evaluation. It ranks quite high in the Hugging Face MTEB leaderboard and may be worth a look if you are trying to evaluate the backend yourself.

I hope this might help someone - feel free to ask any questions.


Again: so many thanks for the support during the evaluation and the quick fix of my issue!

best regards,
Sven

Maximilian Kähler

Mar 11, 2026, 10:39:45 AM
to Annif Users
Dear Sven,

thank you for that detailed report. And also for your previous message:

Parameter change between training and evaluation
Your request to switch some configuration parameters (like cuda vs. cpu) between training and evaluation is very reasonable. We have already implemented this and released the ebm4subjects package in a new version. An update to the Annif backend will follow shortly. Thank you for putting that forward. You will then be able to override most parameters with the Annif client arguments. You will also be able to switch the deployment options (from in-process for training to API for production).

Correct usage of jinaai/jina-embeddings-v5-text-small
I haven't worked with the newest Jina AI model. An earlier version supported asymmetric embeddings for retrieval, e.g. task = "retrieval.passage" (for documents) and "retrieval.query" (for the vocab). This is best suited to EBM. I think this is now handled with the argument "prompt_name" and task = "retrieval". I think setting it to task "classification" would not be ideal.
 
Saving resources with EBM:
Indeed, processing time for EBM is quite slow. The bottleneck is primarily the embedding generation. EBM is not a typical supervised learning backend in the sense that quality scales with the amount of training data. What is trained is only the ranking model, which may be saturated by 1,000 documents or even fewer. So feeding it 600k documents for training is far more than needed. Processing time scales linearly with the number of documents for EBM.
To cut cost, consider:
  * reducing the number of training docs
  * restricting the number of chunks per document with `max_chunk_count` (especially for longer documents this can be expensive)
  * allowing for larger chunks: `max_chunk_length` is 50 characters by default, which means that any chunk longer than that will be split after the next sentence. So usually one sentence is one chunk. If you choose a higher number, chunking will be coarser, also resulting in fewer chunks in total
  * some models (like Jina AI's) also support "matryoshka embeddings", which allow you to choose a smaller embedding dimension. I haven't tested it myself, but this might also help to speed things up.
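The chunking rule described above can be illustrated with a toy chunker (an illustrative sketch only; the real ebm4subjects chunker will differ in detail):

```python
import re

# Toy chunker mirroring the rule described above: sentences are accumulated,
# and a chunk is closed as soon as it reaches max_chunk_length characters,
# so a larger max_chunk_length yields coarser and fewer chunks.
def chunk(text, max_chunk_length=50, max_chunk_count=None):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        current = f"{current} {sent}".strip()
        if len(current) >= max_chunk_length:
            chunks.append(current)
            current = ""
    if current:
        chunks.append(current)
    return chunks[:max_chunk_count] if max_chunk_count else chunks

text = "One. Two sentences here. And a third, somewhat longer sentence."
print(len(chunk(text, max_chunk_length=10)))  # 2 chunks: fine-grained
print(len(chunk(text, max_chunk_length=50)))  # 1 chunk: coarser
```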

I was surprised that the training crashed with so many of the embedding models. I would expect 48 GB of VRAM to be well enough for most of these models, as long as you don't set the batch_size to unreasonably high values (start with 32, see if it works, then abort and double it; repeat until you reach your VRAM limit). What was the cause of "not being able to train" with the other models? Something like CUDA out of memory?
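The batch-size search described in the last paragraph can be sketched as a simple doubling loop (the capacity check below is a stand-in for "one encoding batch of this size fits in VRAM"):

```python
# Double the batch size until the next step would fail, as suggested above.
# encode_ok is a placeholder for "one encoding batch of this size succeeds
# without a CUDA out-of-memory error".
def find_batch_size(encode_ok, start=32):
    if not encode_ok(start):
        return None  # even the starting size does not fit
    size = start
    while encode_ok(size * 2):
        size *= 2
    return size

# Hypothetical GPU that copes with batches up to 256:
print(find_batch_size(lambda b: b <= 256))  # 256
```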

Including EBM in an Annif ensemble:
It is unfortunate that you could not improve your ensemble by adding EBM. From what I can tell about your setup, setting the weights of omikuji, mllm and ebm to nearly equal values puts too much emphasis on the weaker components (EBM and MLLM). Did you determine these weights manually or with annif hyperopt? If I had to guess parameters, I'd say omikuji:0.66 and splitting up the rest between EBM and MLLM.
We are not at this point with our own ensembles at DNB yet, so we still need to find the best way to integrate EBM into an ensemble. Maybe you can also achieve better performance with the nn ensemble, but perhaps you should wait until the (re-)development of the nn ensemble is finished.

Thank you again for your feedback; this is very valuable. Especially in the current phase before the first release, it is very helpful to have early testers! Please don't hesitate to report any other issues you might have.

Best,
Maximilian

Sven Sass

Mar 16, 2026, 2:26:01 AM
to Annif Users
Hello Maximilian,

great that you implemented parameter modification between training and evaluation. I think this is a very useful feature for your backend.

I did test jina-v5 with "retrieval", so I guess I was on the right track. Thank you for the clarification.

Training-set
As for the amount of training documents: I had the same question for Osma, and he also told me that 600k is far too much. As for Omikuji, my measurement is: more is better. As mentioned in my previous post, MLLM improved with fewer training documents. I'm still struggling to understand how 1,000 documents can be sufficient if I have around 7,000 labels. This would mean I would not have an example for every label (best case: only one example each for 1,000 labels).

If I find the time, I'll look into max_chunk_count and max_chunk_length. My gut feeling tells me this won't help with 600k documents anyway (disregarding whether this is a reasonable approach in the first place). Batch size was at 32 for most of the tests. I did increase it to 300 in some cases, but all of those worked in the end. I did monitor that GPU load is not at 100%; it is more at 50%-70%. However, it was at a constant 100% with the X-Transformer backend, if I remember correctly, and it surely was the case when I was generating embeddings with LlamaIndex. Maybe there is some room for performance optimization which I did not find in my evaluations. But I surely like how seamlessly it uses both GPUs at the same time; this was not the case with the X-Transformer.

Ensembles
I did use hyperopt to calculate the weights: "annif hyperopt -m "F1 score (doc avg)" ensemble-ebm-50000"
Our expectations of the weights match, though: I expected different weights, but had no indication that hyperopt did not produce valid results. I'll give it a shot with manual weighting to see if it improves the result.

As my previous tests did not show improvements for the nn ensemble, I did not test it. I agree that revisiting this after the new nn ensemble is finished makes sense.


Here is some more detailed information about my issues during training and evaluation.

google/embeddinggemma-300m
While training 600,000 articles it was obvious after 2% that it was using up GPU memory quickly and at a steady pace. Assuming the ratio stays the same, it would have required about 1,400 GB at 100%. In a later retest with 50,000 documents it stopped with "CUDA out of memory".


jinaai/jina-embeddings-v3
"trust_remote_code" is required. It basically takes 5 GB and stays that way. Jina also seems to be quite a bit faster (3-4 times?).

For the configuration, only "cuda" is required to use the GPU. It took me some time to install the required components (flash-attn/cuda-toolkit), but I have to admit I'm not a pro at Python (and I think that is out of scope for this backend or even Annif).


When training ~700k documents, the phase "Backend ebm: creating embeddings for text chunks and query dataframe" took roughly 18 hours. After that it crashed, because it needed >512 GB of CPU memory, which is too much for my system.

The CPU memory (second column) stays low for quite a while and then spikes within only a few minutes. My assumption is that this happens in the phase after the mentioned message. Actually, I thought the "hard part" was finished at that point.

22:14:53,137848,29318,2769
22:15:03,138237,29318,2769
22:15:14,140502,29318,2769
<...>
22:16:44,248053,29318,2769
22:16:54,253141,29318,2769
22:17:04,253960,29318,2769
22:17:14,253857,29318,2769
<...>
22:25:37,255168,29318,2769
22:25:48,255194,29318,2769
-> Crash (syslog out of memory)


intfloat/multilingual-e5-small
No issues with 50,000 documents; training took 18h.

I started the training process with 600k documents to see what happens. I'm not 100% sure whether I tested this before; at least it is not documented, so I'm training it again.


BAAI/bge-m3
No issues with 50,000 documents; training took 21h.

I will start the training process with 600k documents to see what happens. Again: I do think I've tested this, but I have not documented it.

nvidia/llama-embed-nemotron-8b
"CUDA out of memory" more or less at startup

Qwen/Qwen3-Embedding-8B
"CUDA out of memory" more or less at startup

jinaai/jina-embeddings-v5-text-small
Training with 50,000 documents works fine, but evaluating 100k documents crashes after ~18h at 190 GB CPU memory. The last message was
"Backend ebm: running vector search and creating candidates with query_jobs: 16"
There was barely any CPU/GPU load.


My assumption is that any model failing with "CUDA out of memory" right at the beginning will not work with any number of documents on my GPU. I did implement a RAG system in another context and successfully used the Qwen/Qwen3-Embedding-8B embedding for generating the embeddings, so this embedding does fit on my GPU (generally speaking).

Side note: I'm really curious how the new v5 embedding competes with other local embeddings and OpenAI's "text-embedding-3-large" (in the RAG context I'm allowed to use cloud services).


I'll send an update when the mentioned training tests are done - feel free to ask for any information!

Best regards,
Sven

Sven Sass

Mar 18, 2026, 8:25:52 AM
to Annif Users
Hello Maximilian,

tl;dr:
a.) I wanted to double-check that training with 600k documents fails, and I'm afraid this is indeed the case for both the e5 and m3 embeddings.
b.) I was able to improve the F1 score of the ebm ensemble by 0.77% with manual weights

More details:

intfloat/multilingual-e5-small
The initial GPU memory usage was ~5.7 GB, rising to ~8.1 GB during chunking. GPU load was at about 50%-70% on both GPUs. CPU memory was steadily rising; my monitoring only watches physical memory, so it stopped at 256 GB, but apparently there was enough swap space to finish the process. Batches took 3:20h on both GPUs.

Columns: time, CPU Memory, GPU Memory 1, GPU Memory 2
09:53:37,252908,18,1071
...
09:53:47,123941,18,1071
So it uses quite a lot of memory, but this phase is successful.

The following phase,

"Backend ebm: running vector search and creating candidates with query_jobs: 16"
uses only one CPU in the beginning (is there a way to use all of them?). It starts at around 124 GB of CPU RAM and finishes at about 184 GB, then uses all processors at around 55% total usage (maybe the 16 threads are in effect) and takes around 11 hours.
11:49:01,127710,18,1071
11:49:11,128709,18,1071
...
22:31:28,254251,18,1071
22:31:39,127916,18,1071

It stops with this message:
/home/dev/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/multiprocessing/resource_tracker.py:400: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown: {'/loky-1699436-04szyovz', '/loky-1698691-ojkorpx5', '/loky-1698769-4jytqyaa', '/loky-1699501-a_51pr46'}
  warnings.warn(
 

BAAI/bge-m3
The initial GPU memory usage was ~17 GB on both GPUs. Initial CPU memory usage was 64 GB; both GPUs were more or less at 100%.
At 20% / 3.5h it used 156 GB of CPU RAM, and GPU RAM was at about 34 GB.

17:34: RAM at 248 GB and GPUs at 34 and 46 GB
  0   N/A  N/A         2054456      C   ...nnif/dev3/venv/bin/python3.13      34534MiB |
  1   N/A  N/A         2054541      C   ...nnif/dev3/venv/bin/python3.13      43636MiB |

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 10.14 GiB. GPU 1 has a total capacity of 47.37 GiB of which 8.06 GiB is free. Process 2053589 has 2.55 GiB memory in use. Including non-PyTorch memory, this process has 36.58 GiB memory in use. Of the allocated memory 26.03 GiB is allocated by PyTorch, and 10.05 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

So this behaves similarly to the gemma embedding: it consumes more and more memory over time. While with gemma it seemed to rise at a steady rate, with this embedding it "jumps" (this might be normal, but I wanted to share as much information as possible).

Memory on GPU 1 jumps about 7 GB:
11:56:33,136722,27975,33761
11:56:43,136849,27975,33761
11:56:53,136833,27975,33761
11:57:03,136958,34553,33761
11:57:13,136939,34553,33761
11:57:23,136989,34553,33761

Memory on GPU 2 jumps about 12 GB:
14:40:51,196196,34557,33761
14:41:01,196283,34557,33761
14:41:11,196302,34557,33761
14:41:21,196350,34557,46437
14:41:31,196386,34557,46437
14:41:41,196451,34557,46437


Ensemble
I changed the hyperopt values to your suggestion: sources=omikuji-50000:0.66,mllm-50000:0.22,ebm-jina-50000:0.22 (well, yes, bear with me) and re-evaluated it, with the result of 58.70% for the F1 value. This is a 0.77% improvement compared to the ensemble without the ebm backend.

To me this raises the question of what I did wrong when hyperoptimizing it (annif hyperopt -m "F1 score (doc avg)").

Checking the log file of the hyperoptimization, I noticed that the F1 score is at 12% and more or less does not change across iterations.
Trial 0 finished with value: 0.12111927369952892 and parameters: {'omikuji-50000': 0.1702433045478593, 'mllm-50000': 0.5739916202122114, 'ebm-jina-50000': 0.8771946834004369}. Best is trial 0 with value: 0.12111927369952892.
Trial 1 finished with value: 0.12140212308788588 and parameters: {'omikuji-50000': 0.08956166577644809, 'mllm-50000': 0.4316058957749992, 'ebm-jina-50000': 0.21727099007099815}. Best is trial 1 with value: 0.12140212308788588.
Trial 2 finished with value: 0.12162202446879701 and parameters: {'omikuji-50000': 0.5352265942786502, 'mllm-50000': 0.5575117250833155, 'ebm-jina-50000': 0.44036639869444383}. Best is trial 2 with value: 0.12162202446879701.
..
Trial 9 finished with value: 0.12142314843806357 and parameters: {'omikuji-50000': 0.888701481126246, 'mllm-50000': 0.04349120357934089, 'ebm-jina-50000': 0.7406862202497109}. Best is trial 2 with value: 0.12162202446879701.
Got best F1 score (doc avg) score 0.1216 with:
---
sources=omikuji-50000:0.3491,mllm-50000:0.3636,ebm-jina-50000:0.2872
---


It would be nice to have some kind of overall progress information for training and evaluation. I guess this is currently not supported by Annif.

Best regards
Sven

Osma Suominen

Mar 20, 2026, 8:42:22 AM
to annif...@googlegroups.com
Dear Sven,

thank you for your thorough experimentation!

Regarding what went wrong with annif hyperopt when it seemed to arrive at a suboptimal solution:

You said you used the command:

annif hyperopt -m "F1 score (doc avg)"

There are two issues with this:

1. You tried to optimize for F1 score, which is not a very stable
metric. I recommend that you leave the metric at the default, which is NDCG.

2. The default number of hyperopt trials is only 10. With such a low
number of trials, you have to be lucky to find a good set of weights. I
suggest trying a much larger number of trials, at least 100 but 200-300
would be better (e.g. --trials 300). The initial 10 trials is just for
verifying that the process itself is working.

You can also set e.g. --jobs 8 to spread the load into multiple CPU
cores, which should speed up the process.
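Osma's point about trial counts can be illustrated with a toy random search (this is not Annif's actual Optuna-based hyperopt; the objective and its peak at weights (0.66, 0.17, 0.17) are invented for illustration):

```python
import random

# Toy random search over three normalized ensemble weights. The score peaks
# at an invented optimum; with a fixed seed, a longer run is a superset of a
# shorter one, so more trials can only improve the best score found.
def best_of(n_trials):
    random.seed(0)
    best = 0.0
    for _ in range(n_trials):
        w = [random.random() for _ in range(3)]
        total = sum(w)
        w = [x / total for x in w]
        score = 1.0 - abs(w[0] - 0.66) - abs(w[1] - 0.17) - abs(w[2] - 0.17)
        best = max(best, score)
    return best

print(best_of(10) <= best_of(300))  # True: 300 trials search strictly more
```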

Best,
Osma


'Sven Sass' via Annif Users kirjoitti 18.3.2026 klo 14.25:
> Hello Maximilian,
>
> tl;dr:
> a.) I wanted to double check that training with 600k document fails and
> I'm afraid this is indeed the case for both e5 and m3 embeddings.
> b.) I was able to improve F1-score with manual weights on ebm-ensemble
> by 0.77%
>
> More details:
>
> *_intfloat/multilingual-e5-small_*
> *_BAAI/bge-m3_*
> *_Ensemble_*
> *_Training-set_*
> As for the amount of training documents - I had the same question
> for Osma and he also told me that 600k is far too much. As for
> omikuji my measurement is: more is better. As mentioned in my
> previous post the mllm improved with fewer training documents. I'm
> still struggling to understand how 1.000 documents can be sufficient
> if I have around 7.000 labels? This would mean I would not have an
> example for every label (best case: only one example for 1.000 labels).
>
> If I do find the time I'll look into the max_chunk_count and
> max_chunk_length. My gut feeling tells me, this won't help with 600k
> documents anyway (disregarding if this is a reasonable approach in
> the first place). Batch size was at 32 for most of the tests. I did
> increase it to 300 in some cases - but all of those worked in the
> end. I did monitor that GPU-load is not at 100% - it is more at
> 50%-70%. However it was at constant 100% with the x-transformer
> backend if I remember correctly and it surely was the case when I
> was generating embeddings with llamaindex. Maybe there is some room
> for performance optimization which I did not find in my evaluations.
> But I surely like how seemless it uses both GPUs at the same time,
> this was not the case with the x-transformer.
>
> *_Ensembles_*
> I did use hyperopt to calculate weights: "annif hyperopt -m "F1
> score (doc avg)" ensemble-ebm-50000"
> Our expectations of the weights match though. I expected other
> weights, but had no indication that hyperopt did not produce valid
> results. I'll give it a shot with "manual weighting" to see if it
> improves the result.
>
> As my previous tests did not show improvements for nn-ensemble I did
> not test it. I agree, revisiting this after the new nn-ensemble is
> finished does make sense.
>
>
> Here is some more detailed information about my issues during
> training an devaluation.
>
> *google/embeddinggemma-300m*
> While training 600.000 articles it was obvious after 2% that it was
> using up the GPU memory fast and at a steady pace. Assuming the
> ratio remains the same it would have taken about 1.400 GB for 100%
> so. In a later retest with 50.000 documents it stop with "cuda out
> of memory".
>
>
> *_jinaai/jina-embeddings-v3_*
> *_intfloat/multilingual-e5-small_*
> No issues with 50,000, it took 18h for training
>
> Started training process with 600k documents to see what happens.
> I'm not 100% sure if I tested this, at least it is not documented so
> I'm training it again.
>
>
> *_BAAI/bge-m3_*
> No issues with 50,000, it took 21h for training
>
> Will start training process with 600k documents to see what happens.
> Again: I do think I've tested this, but have not documented it.
>
> *_nvidia/llama-embed-nemotron-8b_*
> "Out of cuda memory" more or less at startup
>
> *_Qwen/Qwen3-Embedding-8B_*
> "Out of cuda memory" more or less at startup
>
> *_jinaai/jina-embeddings-v5-text-small_*
> Training with 50,000 documents works fine, but evaluating 100k
> documents crashes after ~18h at 190GB of CPU memory. The last
> message was
> "Backend ebm: running vector search and creating candidates with
> query_jobs: 16"
> There was barely any CPU/GPU load.
>
>
> My assumption is that any model failing with "out of CUDA memory"
> right at the beginning will not work with any amount of documents
> on my GPU. I did implement a RAG system in another context and
> successfully used Qwen/Qwen3-Embedding-8B for generating the
> embeddings - so this model does fit in my GPU (generally speaking).
>
> Side note: I'm really curious how the new V5 embedding compares to
> other local embeddings and OpenAI's "text-embedding-3-large" (in
> the RAG context I'm allowed to use cloud services).
>
>
> I'll send an update when the mentioned training tests are done -
> feel free to ask for any information!
>
> Best regards,
> Sven
>
>
> mfaka...@gmail.com wrote on Wednesday, March 11, 2026 at 3:39:45 PM UTC+1:
>
> Dear Sven,
>
> thank you for that detailed report. And also for your previous
> message:
>
> *Parameter changes between training and evaluation*
> Your request to switch some configuration parameters (like cuda
> vs cpu) between training and evaluation is very reasonable. We
> have already implemented this and released the ebm4subjects
> package in a new version. An update to the Annif backend will
> follow shortly. Thank you for putting that forward. You will
> then be able to overwrite most parameters with the Annif
> command-line arguments. You will also be able to switch the
> deployment options (from in-process for training to API for
> production).
>
> *Correct usage of jinaai/jina-embeddings-v5-text-small*
> I haven't worked with the newest Jina AI model. An earlier
> version supported asymmetric embeddings for retrieval, e.g.
> task = "retrieval.passage" (for documents) and
> "retrieval.query" (for the vocab). This fits EBM best. I
> think this is now handled with the argument "prompt_name" and
> task = "retrieval". I think setting it to task "classification"
> would not be ideal.
> *Saving resources with EBM:*
> Indeed, processing time for EBM is quite slow. The bottleneck
> is primarily the embedding generation. EBM is not a typical
> supervised learning backend in the sense that its quality
> scales with the amount of training data. What is trained is only
> the ranking model, which may be saturated by 1,000 documents or
> even less. So feeding it 600k documents for training is far more
> than needed. Processing time scales linearly with the number of
> documents for EBM.
> To *cut cost*, consider:
>   * reducing the number of training docs
>   * restricting the number of chunks per document with
> `max_chunk_count` (especially for longer documents this can be
> expensive)
>   * allowing for larger chunks: `max_chunk_length` is 50
> characters by default, which means that any chunk larger than
> that will be split after the next sentence. So usually one
> sentence is one chunk. If you choose a higher number, chunking
> will be coarser, also resulting in fewer chunks in total
>   * some models (like Jina AI's) also support "matryoshka
> embeddings", which allow you to choose a smaller embedding
> dimension. I haven't tested it myself, but this might also help
> to speed things up.
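To make the interaction of `max_chunk_length` and `max_chunk_count` concrete, here is a minimal, hypothetical sketch of greedy sentence-based chunking. This is not the actual ebm4subjects implementation (which may differ in details); it only illustrates why a larger `max_chunk_length` yields fewer, coarser chunks:

```python
import re

def chunk_text(text, max_chunk_length=50, max_chunk_count=None):
    """Greedy sentence-based chunking sketch: sentences are appended
    to the current chunk until it reaches max_chunk_length characters,
    then a new chunk is started at the next sentence boundary."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        current = f"{current} {sentence}".strip() if current else sentence
        if len(current) >= max_chunk_length:
            chunks.append(current)
            current = ""
    if current:
        chunks.append(current)
    if max_chunk_count is not None:
        # cap the number of chunks taken from one document
        chunks = chunks[:max_chunk_count]
    return chunks

text = "Short one. Another short sentence. A third sentence follows here."
fine_chunks = chunk_text(text, max_chunk_length=10)    # one sentence per chunk
coarse_chunks = chunk_text(text, max_chunk_length=50)  # sentences merged together
```

With a low threshold every sentence becomes its own chunk; with a higher one, several sentences are merged, so fewer embeddings have to be computed per document.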
>
> I was surprised that the training crashed with so many of the
> embedding models. I would expect 48GB of VRAM to be easily enough
> for most of these models, as long as you don't set the
> batch_size to unreasonably high values (start with 32, see if it
> works, then abort and double it. Repeat until you reach your
> VRAM limit). What was the cause for "not being able to train"
> with the other models? Something like CUDA out of memory?
>
> *Including EBM in an Annif ensemble:*
> It is unfortunate that you could not improve your ensemble by
> adding EBM. From what I can tell from your setup, setting the
> weights of omikuji, mllm and ebm to values so close to equal
> puts too much emphasis on the weaker components (EBM and MLLM).
> Did you determine these weights manually or with annif optimize?
> If I had to guess parameters, I'd say omikuji:0.66 and splitting
> up the rest between EBM and MLLM.
> We are not at this point with our own ensembles at DNB. So we
> still need to find the best way to integrate EBM into an
> ensemble. Maybe you can also achieve better performance with nn-
> ensemble. But maybe you should wait until the (re-)developments
> of the nn-ensemble are finished.
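As a sketch of the weighting logic under discussion: a basic (non-nn) ensemble essentially combines the per-label scores of its sources as a weighted average, so near-equal weights let weaker backends pull strong suggestions down. The following toy code is illustrative only (not Annif's actual ensemble implementation; backend names and scores are made up):

```python
def combine_scores(suggestions, weights):
    """Weighted average of per-label scores from several backends.
    suggestions: dict backend -> dict label -> score in [0, 1].
    weights: dict backend -> weight (normalized below)."""
    total = sum(weights.values())
    combined = {}
    for backend, scores in suggestions.items():
        w = weights[backend] / total
        for label, score in scores.items():
            combined[label] = combined.get(label, 0.0) + w * score
    return combined

suggestions = {
    "omikuji": {"economy": 0.9, "trade": 0.4},
    "mllm":    {"economy": 0.5},
    "ebm":     {"trade": 0.8, "finance": 0.6},
}
# heavy weight on the strongest backend, rest split between EBM and MLLM
weights = {"omikuji": 0.66, "mllm": 0.17, "ebm": 0.17}
combined = combine_scores(suggestions, weights)
```

A label suggested only by a low-weight backend (here "finance") ends up with a small combined score, which is exactly the intended effect of down-weighting the weaker components.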
>
> Thank you again for your feedback. This is very valuable.
> Especially in the current phase before the first release, it is
> very helpful to have early testers! Please don't hesitate to
> report any other issues you might have.
>
> Best,
> Maximilian
>
> j3s...@googlemail.com wrote on Tuesday, March 10, 2026 at
> 10:29:38 AM UTC+1:
>
> Hello Maximilian/all,
>
> my evaluation is finished and here are my findings.
>
>
> *_Prerequisites_*:
> My dataset is quite large: I do have around 700.000 short
> text documents of which 600.000 are used for training and
> 100.000 for evaluation with around 7.000 labels. I'm using
> 256GB RAM (+256 swap) and have two ADA 6000 (48GB VRAM each).
>
> I did evaluate basically all available backends before to find
> out the best combination of backends in my case.
> "Best" is here defined as the best "F1 score (doc avg)". The
> champion is this ensemble (a basic ensemble, not nn):
> - Omikuji-Attention*0.4624,
> - Xtransformer*0.4206,
> - MLLM*0.1170
> With a limit of 15 and a threshold of 0.15 the F1 value is
> at 69.44%.
>
> The task was to evaluate if I can add the EBM-backend to
> improve this result.
>
>
> *_Installation_*
> As this backend is not yet in the main version of Annif, the
> installation steps can be found here:
> https://github.com/NatLibFi/Annif/issues/936
>
> Please note: Maximilian pointed out that using uv is easier:
> uv sync --extra ebm-in-process # or --extra ebm-api
>
>
> *_Training_*
> *_Evaluation_*
> See https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html
> "for more details about differences between saving model
> and serializing."
>
> As to my understanding of the linked page (and ChatGPT),
> a trained model does not store GPU information and it
> should be possible to run it on another GPU.
>
>
> Sven Sass wrote on Friday, February 20, 2026 at
> 07:54:18 UTC+1:
>
> Hello Maximilian,
>
> I posted an issue report here:
> https://github.com/NatLibFi/Annif/issues/936
>


--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.s...@helsinki.fi
http://www.nationallibrary.fi

Maximilian Kähler

unread,
Mar 23, 2026, 4:23:06 AM
to Annif Users
Dear Sven,

thank you for your reports. First, let me try to explain a little bit how EBM is supposed to work:

Phase A: Chunking: A document is chunked into pieces. 
Phase B: Embedding: For each piece embeddings are computed (which is the long GPU process happening). 
Phase C: Vector Search: The embeddings are compared, chunk-wise, to embeddings of the vocab (this is the vector search phase that runs on CPU). For every chunk this will generate a list of candidate labels along with some statistics: e.g. cosine-similarity of matched labels and position of the chunk in the text. 
Phase D: Aggregation: Chunk-wise statistics are aggregated by label and document, producing statistics like frequency of occurrence, sum of cosine similarities, spread over the document, etc. 
Phase E: Training of the ranker model: A small ML model is trained to re-rank suggestions based on the aggregate statistics of Phase D. This model deals only with the very small set of features extracted in Phase D. There is no label-specific training. Phase E is the only "learning" step and should be saturated by ~1,000-10,000 documents. 
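A rough code sketch of the Phase C/D idea may help: given chunk-level candidate matches from the vector search, per-label features are aggregated for the ranker. All names and the exact feature set below are illustrative assumptions, not the actual ebm4subjects code:

```python
from collections import defaultdict

def aggregate_candidates(chunk_matches):
    """chunk_matches: list of (chunk_position, label, cosine_similarity)
    tuples from the vector search (Phase C). Returns per-label feature
    dicts such as occurrence frequency and summed similarity (Phase D),
    which a small ranker model can then score (Phase E)."""
    feats = defaultdict(lambda: {"freq": 0, "sim_sum": 0.0, "positions": []})
    for pos, label, sim in chunk_matches:
        f = feats[label]
        f["freq"] += 1          # how often the label was matched
        f["sim_sum"] += sim     # summed cosine similarity
        f["positions"].append(pos)
    for f in feats.values():
        # spread of the label's matches over the document
        f["spread"] = max(f["positions"]) - min(f["positions"])
    return dict(feats)

matches = [(0, "economy", 0.81), (2, "economy", 0.77), (1, "trade", 0.65)]
features = aggregate_candidates(matches)
```

Because the ranker only ever sees this handful of label-agnostic features, it saturates with few training documents, which is why 600k documents bring no benefit.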

Having said that, I can only repeat that feeding EBM with 600k documents is far too much and there is no benefit in training the ranker on so many documents. You may wonder how the backend can handle 7000 labels: generation of label candidates happens in Phase C and is mainly based on cosine similarity of your pre-trained embeddings. So the "magical" power to recognise a candidate label comes with the embedding similarity. The ranker model only helps in promoting matches that occur more frequently. 

I suspect that the high RAM usage in the first of your experiments (intfloat/multilingual-e5-small) is indeed caused by the large number of documents in your training batch making Phases C and D very time- and resource-consuming. Eventually the process may have run out of memory and crashed, leaving some of the parallel processes unattended. The solution really is not to train on such a large training corpus!

The second experiment with the BAAI/bge-m3 crashes in Phase B with a Cuda-Out-of-Memory error. The process allocates GPU-memory for the model itself as well as for the input chunks to be processed. If the model fits your GPU initially, this means that during the document processing there was a batch of documents with longer chunks, causing more memory consumption and eventually overstretching the limits. So I suspect this is a failure caused by specific input documents. Juho posted an example of a mal-formatted document causing a similar issue here: https://github.com/NatLibFi/Annif/pull/914 (one of the latest posts). 
This can be remedied by a smaller batch size and/or shorter chunks. Maybe we can come up with something to pre-filter such malformed chunks. 
If you have the means to deploy your embedding models via some (self-hosted) API (e.g. Ollama, vLLM, llama.cpp), you can also pass the API endpoint to the EBM backend as explained in the wiki draft. This would prevent CUDA-out-of-memory issues during batch processing, because inference engines like Ollama or vLLM have smarter handling of the workload than EBM does.

The possibility to overwrite training parameters is now implemented in the branch, thanks to my colleague Clemens. So you can now switch deployment configuration between training and inference time. 
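As an illustration (the parameter name `device` is an assumption here; see the wiki draft for the actual parameter names), overriding a backend parameter at evaluation time with Annif's `--backend-param` option could look like this:

```shell
# Hypothetical sketch: train on the GPU, then evaluate with the
# (assumed) device parameter overridden to CPU at evaluation time.
annif train my-ebm-project /path/to/train-docs/
annif eval --backend-param ebm.device=cpu my-ebm-project /path/to/eval-docs/
```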

Thanks again for reporting your problems. Please, feel free to report anything else. 

Best,

Maximilian 

Sven Sass

unread,
Mar 24, 2026, 7:39:09 AM
to Annif Users
Hello Maximilian and Osma,

thank your for your replies and support.

@Osma: I'm leaving out the -m parameter for future trials and will increase the number of trials. Thank you for the hint!

@Maximilian: Thank you for the insight into how your backend works. Is it correct that for the backend to work best, it is better to have an example for every label rather than as many examples as possible, so that in Phase C there is at least one example for each label? For my training set I should reduce it even further, to maybe 10,000 documents, and have all my 7,000 labels in there at least once, right? I assume this approach works best with long labels like "Bundesministerium des Innern und für Heimat" but maybe not so well with "A 10" (depending on how the embedding was trained).

And understood: won't train that many documents anymore ;)

I did check the training data for file size. The largest document was around 120k which I honestly did not expect. Will have a closer look at this if I stumble over memory issues again - for now I did not try to debug/change log level to find out the source of the issue.

Just to sum it up for others who may be reading this:
- I was able to improve an ensemble consisting of omikuji and mllm by adding ebm4subjects to it
- I did not properly hyperoptimize the ensemble (use more trials), so the improvement could be even bigger than the one I measured (I assume I had more trials when comparing the x-transformer backend)
- Memory and time issues can be overcome by choosing the right training set

Or in short: give it a shot with your data if you are trying to figure out the best ensemble for you

Unfortunately my project is finished for the moment and I no longer have access to the training data, but I will hopefully be able to pick this up later and re-measure the backend with an improved training set and a properly optimized (nn-?)ensemble. I guess by that time both ebm4subjects and the nn-ensemble implementation will be in main.


A little off-topic: In the early stage of this project I drafted a RAG-based labelling mechanism outside of Annif. In short: when suggesting labels for a document, the labels of the nearest documents in the vector DB are chosen (some weighting/normalization is done, but it is really quite straightforward). I observed an F1 score of around 43.57% back then - but I no longer know with what training set.
Disclaimer: I know this approach only has a chance if you have a lot of training data.
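The nearest-neighbour label voting described above can be sketched roughly like this (a toy version using plain lists instead of a vector DB; all names are illustrative and the real weighting/normalization may differ):

```python
import math
from collections import defaultdict

def knn_label_scores(query_vec, indexed_docs, k=3):
    """indexed_docs: list of (embedding, labels) pairs. Scores each
    label by the summed cosine similarity of the k nearest documents
    that carry it, then normalizes scores so the best label gets 1.0."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    # in a real system the vector DB performs this nearest-neighbour search
    ranked = sorted(indexed_docs, key=lambda d: cosine(query_vec, d[0]),
                    reverse=True)[:k]
    scores = defaultdict(float)
    for vec, labels in ranked:
        sim = cosine(query_vec, vec)
        for label in labels:
            scores[label] += sim
    top = max(scores.values(), default=1.0)
    return {label: s / top for label, s in scores.items()}

docs = [
    ([1.0, 0.0], ["economy"]),
    ([0.9, 0.1], ["economy", "trade"]),
    ([0.0, 1.0], ["sports"]),
]
scores = knn_label_scores([1.0, 0.05], docs, k=2)
```

Labels of documents outside the k nearest neighbours never appear in the result, which is why this approach needs a large, well-covered training corpus.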

As I wanted to test the Jina V5 embedding anyway, I created a "prototype" Annif "rag-ebm" backend (read: quick'n'dirty hack). To get a jump start I took your ebm4subjects backend as a base, so I did not have to bother with choosing a vector DB and other things you had already added. I really do hope this is OK for you - I'm not planning to use this code - I just wanted to see how well this approach works and will probably delete it, as I did with my first approach.

To figure out which embedding works best I tested with 50,000 documents (F1 scores, doc average):
jina-embeddings-v5-text-small: 47.58%
multilingual-e5-large: 49.02%
Qwen3-Embedding-8B, llama-embed-nemotron-8b and embeddinggemma-300m all failed with CUDA out of memory.
For the same amount of documents, omikuji was at 56.52% and mllm at 39.82%.

With the full training set the F1 scores improved for omikuji to 63.59% and rag-ebm (with e5-large) to 58.14%. MLLM stayed more or less the same (39.97%).

The ensemble of omikuji and mllm (omikuji:0.9002,mllm:0.0998) was at 63.84% - adding rag-ebm (omikuji:0.2352,mllm,rag:0.7154) with e5-large embedding improved it to 64.49% (NDCG = 85.11%). I also tried manual weights of .42,.42 for omikuji/rag and .16 for mllm (just to be sure, because .72 seems a little bit high), but the result is not better.

So overall I had a decent result, but was not able to improve my current champion (omikuji, mllm, x-transformer). 

I guess using the approach of Phase C in ebm4subjects could improve the results further: currently all labels are used for every chunk of the document. Please also note that I experienced memory issues when writing the embeddings to the vector DB. The likely cause was the embeddings.tolist() call, which had to be refactored into a loop that inserts in batches. For testing I switched to the Milvus vector DB, which also led to a decent speed-up (GPU version).
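The fix mentioned above — replacing one huge `embeddings.tolist()` materialization with batched inserts — can be sketched like this (a hypothetical sketch, not the actual code; the `insert` callback stands in for whatever the vector DB client offers):

```python
def insert_in_batches(embeddings, insert, batch_size=1000):
    """Instead of converting the whole embedding matrix to one giant
    Python list, slice it and convert/insert one batch at a time so
    peak memory stays bounded by the batch size."""
    n = len(embeddings)
    inserted = 0
    for start in range(0, n, batch_size):
        batch = embeddings[start:start + batch_size]
        # .tolist() on a small slice only (covers numpy arrays;
        # plain nested lists are passed through unchanged)
        rows = batch.tolist() if hasattr(batch, "tolist") else list(batch)
        insert(rows)
        inserted += len(rows)
    return inserted

# toy usage: collect the batches instead of inserting into a real DB
received = []
count = insert_in_batches(list(range(2500)), received.append, batch_size=1000)
```

With a real client, `insert` would be the DB's bulk-insert call; only one batch worth of Python objects exists at any time.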

Thank you all again for your support!

Best regards,
Sven