Evaluating ebm backend / developer branch

Sven Sass

Feb 2, 2026, 5:17:20 AM
to Annif Users
Hello all,

I'm trying to evaluate the ebm backend, but I wanted to check beforehand:

1.) Is it a bad idea for a non-(Annif-)developer to try to evaluate that backend, given that it is not stable yet?
2.) If it is not too bad an idea: what would the correct approach be?
  a.) check out the branch "deutsche-nationalbibliothek-issue855-add-ebm-backend-gh-hosted-large-runner"
  b.) check out https://github.com/deutsche-nationalbibliothek/ebm4subjects (and use it with the current main?)
  c.) something else?

Any information appreciated.

Kind regards
Sven

Maximilian Kähler

Feb 3, 2026, 6:29:56 AM
to Annif Users
Hi Sven,

as one of the co-developers here, my answer would be the following:

Right now there is an error with the logger that needs fixing; you should probably wait for that (give us a week or two).
Generally, we are happy about any feedback from you as a beta tester. So the answer is a cautious "yes, go ahead, but mind the gap..."

How to proceed:

  * the correct Annif branch to work with is on our DNB fork: deutsche-nationalbibliothek:issue855-add-ebm-backend
  * the ebm4subjects package can be installed from PyPI, unless you want to work with its source code; in that case, use the main branch of https://github.com/deutsche-nationalbibliothek/ebm4subjects
  * in our latest version of ebm4subjects, support for SentenceTransformers is an optional dependency that you get when installing Annif with the backend "ebm-in-process" (see pyproject.toml)
  * to get started: there is a draft wiki page on EBM (https://github.com/NatLibFi/Annif/wiki/DRAFT-%E2%80%90-Backend:-EBM), which contains all the information on how to configure the backend. The actual embedding model from Hugging Face is probably the most important parameter.
  * to manage expectations: EBM is a method developed to improve performance in the long tail of large vocabularies. On its own, you can expect metric values in about the same range as MLLM, but the actual matches should be significantly distinct from MLLM suggestions (as similarities are based on embeddings rather than string representations). For best results, you should use EBM along with e.g. Omikuji or another statistical approach.
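To give you a rough idea, a projects.cfg entry for an EBM project might look like the sketch below. The section name, vocab, and especially the parameter names (embedding_model, the model id) are illustrative guesses based on this thread, not the documented configuration keys; please check the draft wiki page for the real ones.

```ini
# Hypothetical sketch of a projects.cfg entry for the ebm backend.
# Parameter names are guesses from this thread, NOT verified against the draft docs.
[ebm-en]
name=EBM English (sketch)
language=en
backend=ebm
vocab=my-vocab
# reportedly the most important parameter: the Hugging Face embedding model
embedding_model=sentence-transformers/all-MiniLM-L6-v2
```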

Please feel free to send us feedback via GitHub, especially if you run into errors.

Best,
Maximilian

Sven Sass

Feb 4, 2026, 12:51:35 AM
to Annif Users
Hi Maximilian,

thank you for your prompt answer and the detailed information on how to proceed.

I'm happy to hear that it is worth a go and will surely provide feedback.

Kind regards,
Sven

Maximilian Kähler

Feb 5, 2026, 5:15:02 AM
to Annif Users
The error with the logger has been fixed, so you can give it a try now.
Best,
Maximilian

Sven Sass

Feb 6, 2026, 12:45:44 AM
to Annif Users
Hello Maximilian,

thanks for the information. Evaluation will probably start next week. Thanks so much for the support!

Best regards,
Sven

Sven Sass

Feb 10, 2026, 4:58:08 AM
to Annif Users
Hello all,

if someone else is thinking about evaluating the ebm backend: following Maximilian's instructions, it is quite easy to set up the project.

Best regards,
Sven

Sven Sass

Feb 19, 2026, 3:24:16 AM
to Annif Users
Hello Maximilian,

I tried to send you a personal message, but I'm not sure it reached you, so just to be safe I'm posting it here again.

Currently I'm stuck with my evaluation, because I run into an error with basically any configuration (mainly: the embedding). I also used your example configuration, but to no avail.

Traceback (most recent call last):
  File "/opt/annif/dev3/Annif/venv/bin/annif", line 6, in <module>
    sys.exit(cli())
             ~~~^^
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/click/core.py", line 1442, in __call__
    return self.main(*args, **kwargs)
           ~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/click/core.py", line 1363, in main
    rv = self.invoke(ctx)
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/click/core.py", line 1830, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/click/core.py", line 1226, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/click/core.py", line 794, in invoke
    return callback(*args, **kwargs)
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/click/decorators.py", line 34, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/flask/cli.py", line 400, in decorator
    return ctx.invoke(f, *args, **kwargs)
           ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/click/core.py", line 794, in invoke
    return callback(*args, **kwargs)
  File "/opt/annif/dev3/Annif/annif/cli.py", line 504, in run_eval
    for hit_sets, subject_sets in pool.imap_unordered(
                                  ~~~~~~~~~~~~~~~~~~~^
        psmap.suggest_batch, corpus.doc_batches
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ):
    ^
  File "/home/dev/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/multiprocessing/pool.py", line 873, in next
    raise value
  File "/home/dev/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ~~~~^^^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/annif/parallel.py", line 76, in suggest_batch
    suggestion_batch = project.suggest(batch, self.backend_params)
  File "/opt/annif/dev3/Annif/annif/project.py", line 272, in suggest
    return self._suggest_with_backend(transformed_docs, backend_params)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/annif/project.py", line 151, in _suggest_with_backend
    return self.backend.suggest(docs, beparams)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/annif/backend/backend.py", line 143, in suggest
    return self._suggest_batch(documents, params=beparams)
           ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/annif/backend/ebm.py", line 188, in _suggest_batch
    candidates = self._model.generate_candidates_batch(
        texts=[doc.text for doc in documents],
        doc_ids=[i for i in range(len(documents))],
    )
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/ebm4subjects/ebm_model.py", line 567, in generate_candidates_batch
    chunk_index = pl.concat(chunk_index).with_row_index("query_id")
                  ~~~~~~~~~^^^^^^^^^^^^^
  File "/opt/annif/dev3/Annif/venv/lib/python3.13/site-packages/polars/functions/eager.py", line 234, in concat
    out = wrap_df(plr.concat_df(elems))
                  ~~~~~~~~~~~~~^^^^^^^
polars.exceptions.SchemaError: type Int64 is incompatible with expected type Null

I'm not sure how I can fix this.

Any insights appreciated.

Best regards,
Sven

Maximilian Kähler

Feb 19, 2026, 5:37:24 AM
to Annif Users
Dear Sven,

thank you for reporting this. It is actually quite difficult to figure out the root of this error remotely.
What we need is more information, ideally a minimal reproducible example that allows us to recreate the error in our setting.
Would you mind reporting this error in an issue here:

and I would ask you to add the following information:

* your projects.cfg
* some test data (including a test vocab) that produces this error
* the client call that you used ("annif train [your options]")
* package versions in your Python environment
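For the package versions, a small Python snippet like this generic sketch would do (the package list is only my suggestion of what is likely relevant):

```python
# Sketch: print installed versions of the packages likely relevant to this report.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("annif", "ebm4subjects", "polars", "xgboost", "sentence-transformers"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg} is not installed")
```
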

I know this is a lot of work, but it takes even more effort to dig into this without knowing the circumstances.

Thank you!

Best,
Maximilian

Sven Sass

Feb 20, 2026, 1:54:18 AM
to Annif Users
Hello Maximilian,

I posted an issue report here: https://github.com/NatLibFi/Annif/issues/936

Please let me know if I can be of any help while investigating this issue. I'm really happy to help.

Best regards,
Sven

Sven Sass

Feb 24, 2026, 2:21:17 AM
to Annif Users
Hello Maximilian,

I'm still in the process of evaluating EBM with different embeddings, ensembles, etc. Once I'm finished, I'll post my observations here in case it helps someone else.

I did notice that if I train a project with a given setting for device and duckdb_threads and then want to change it while evaluating, the project will still use the configuration it was trained with.

E.g.: I train on "cuda:0" and then change the project configuration to "cuda:1"; it will still evaluate on "cuda:0".

I have not double-checked with other backends whether this is the intended behavior. I think it would be nice to switch the GPU, or to use more or fewer GPUs, when required. For now I was training on one GPU, but I could imagine training on multiple GPUs while evaluating on only one.

Similarly: if I copy the project folder ("projects/[project_name]") to another folder and configure a project for that folder, it throws an error:
"Error: Cannot open file "<..>/data/projects/ebm-jina/ebm-duck.db": No such file or directory"
where "ebm-jina" is the original project's name, not the current project's name ("ebm-jina-50000").

And of course, I don't mean to nitpick; I just want to help.

And one more question: the "jinaai/jina-embeddings-v5-text-small" embedding expects the parameter "task" to be set to one of: retrieval, text-matching, clustering, classification. Is "classification" the right choice?
encode_args_documents={"device": "cuda:0", "batch_size": 300, "show_progress_bar": True, "task": "classification"}

Thank you so much and

best regards,
Sven


PS: while evaluating I see a message like this:
"configuration generated by an older version of XGBoost, please export the model by calling
`Booster.save_model` from that version first, then load it back in current version. See:

    https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html

for more details about differences between saving model and serializing."

As far as I understand the linked page (and ChatGPT), a trained model does not store GPU information, so it should be possible to run it on another GPU.