Dear Sven,
thank you for your thorough experimentation!
Regarding what went wrong with annif hyperopt when it seemed to
arrive at a suboptimal solution:
You said you used the command:
annif hyperopt -m "F1 score (doc avg)"
There are two issues with this:
1. You tried to optimize for F1 score, which is not a very stable
metric. I recommend that you leave the metric at the default, which is NDCG.
2. The default number of hyperopt trials is only 10. With such a low
number of trials, you have to be lucky to find a good set of weights. I
suggest trying a much larger number of trials, at least 100 but 200-300
would be better (e.g. --trials 300). The initial 10 trials are just for
verifying that the process itself works.
You can also set e.g. --jobs 8 to spread the load into multiple CPU
cores, which should speed up the process.
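For reference, a sketch of the full invocation with both suggestions
combined (the project ID and validation corpus path are placeholders):

    annif hyperopt --trials 300 --jobs 8 my-ensemble validation-docs/

This leaves the metric at its default (NDCG) instead of passing -m.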
Best,
Osma
'Sven Sass' via Annif Users wrote on 18.3.2026 at 14.25:
> Hello Maximilian,
>
> tl;dr:
> a.) I wanted to double-check that training with 600k documents fails,
> and I'm afraid this is indeed the case for both the e5 and m3 embeddings.
> b.) I was able to improve the F1 score of the ebm ensemble by 0.77%
> with manual weights
>
> More details:
>
> *_intfloat/multilingual-e5-small_*
> *_BAAI/bge-m3_*
> *_Ensemble_*
> *_Training-set_*
> As for the amount of training documents - I had the same question
> for Osma and he also told me that 600k is far too much. As for
> omikuji my measurement is: more is better. As mentioned in my
> previous post the mllm improved with fewer training documents. I'm
> still struggling to understand how 1.000 documents can be sufficient
> if I have around 7.000 labels. That would mean I would not have an
> example for every label (best case: only 1.000 labels get one example each).
>
> If I find the time I'll look into max_chunk_count and
> max_chunk_length. My gut feeling tells me this won't help with 600k
> documents anyway (regardless of whether this is a reasonable approach
> in the first place). Batch size was at 32 for most of the tests. I did
> increase it to 300 in some cases - but all of those worked in the
> end. I did monitor that GPU load is not at 100% - it is more like
> 50%-70%. However, it was at a constant 100% with the x-transformer
> backend if I remember correctly, and it surely was the case when I
> was generating embeddings with llamaindex. Maybe there is some room
> for performance optimization which I did not find in my evaluations.
> But I really like how seamlessly it uses both GPUs at the same time;
> this was not the case with the x-transformer.
>
> *_Ensembles_*
> I did use hyperopt to calculate weights: "annif hyperopt -m "F1
> score (doc avg)" ensemble-ebm-50000"
> Our expectations of the weights match though. I expected other
> weights, but had no indication that hyperopt did not produce valid
> results. I'll give it a shot with "manual weighting" to see if it
> improves the result.
>
> As my previous tests did not show improvements for nn-ensemble I did
> not test it. I agree, revisiting this after the new nn-ensemble is
> finished does make sense.
>
>
> Here is some more detailed information about my issues during
> training and evaluation.
>
> *google/embeddinggemma-300m*
> While training on 600.000 articles it was obvious after 2% that it
> was using up GPU memory fast and at a steady pace. Assuming the
> ratio remains the same, it would have taken about 1.400 GB at 100%.
> In a later retest with 50.000 documents it stopped with "cuda out
> of memory".
>
>
> *_jinaai/jina-embeddings-v3_*
> *_intfloat/multilingual-e5-small_*
> No issues with 50,000, it took 18h for training
>
> Started training process with 600k documents to see what happens.
> I'm not 100% sure if I tested this, at least it is not documented so
> I'm training it again.
>
>
> *_BAAI/bge-m3_*
> No issues with 50,000, it took 21h for training
>
> Will start training process with 600k documents to see what happens.
> Again: I do think I've tested this, but have not documented it.
>
> *_nvidia/llama-embed-nemotron-8b_*
> "Out of cuda memory" more or less at startup
>
> *_Qwen/Qwen3-Embedding-8B_*
> "Out of cuda memory" more or less at startup
>
> *_jinaai/jina-embeddings-v5-text-small_*
> Training with 50.000 documents works fine, but evaluation with 100k
> documents crashes after ~18h at 190GB of CPU memory. The last message
> was "Backend ebm: running vector search and creating candidates with
> query_jobs: 16"
> There was barely any CPU/GPU load.
>
>
> My assumption is that any model failing with "Out of cuda memory"
> right at the beginning will not work with any number of documents
> on my GPU. I did implement a RAG system in another context and
> successfully used Qwen/Qwen3-Embedding-8B for generating the
> embeddings - so this model does fit in my GPU (generally speaking).
>
> Side note: I'm really curious how the new V5 embedding compares to
> other local embeddings and OpenAI's "text-embedding-3-large" (in the
> RAG context I'm allowed to use cloud services).
>
>
> I'll send an update when the mentioned training tests are done -
> feel free to ask for any information!
>
> Best regards,
> Sven
>
>
>
mfaka...@gmail.com wrote on Wednesday, 11 March 2026 at 15:39:45 UTC+1:
>
Dear Sven,
>
> thank you for that detailed report. And also for your previous
> message:
>
*Parameter change between training and evaluation*
Your request to switch some configuration parameters (like cuda
vs cpu) between training and evaluation is very reasonable. We
have already implemented this and released the ebm4subjects
package in a new version. An update to the annif backend will
follow shortly. Thank you for putting that forward. You will
then be able to override most parameters with the annif client
arguments. You will also be able to switch the deployment
options (from in-process for training to API for production).
>
*Correct usage of jinaai/jina-embeddings-v5-text-small*
I haven't worked with the newest Jina AI model. An earlier
version supported asymmetric embeddings for retrieval, e.g.
task = "retrieval.passage" (for documents) and
"retrieval.query" (for vocab). This fits EBM best. I
think this is now handled with the argument "prompt name" and
task = "retrieval". I think setting it to the task
"classification" would not be ideal.
*Saving resources with EBM:*
Indeed, processing time for EBM is quite slow. The bottleneck
is primarily the embedding generation. EBM is not a typical
supervised learning backend in the sense that quality
scales with the amount of training data. What is trained is only
the ranking model, which may be saturated by 1.000 documents or
even less. So fueling it with 600k documents for training is far
more than needed. Processing time scales linearly with the
number of documents for EBM.
To *cut cost*, consider:
* reducing the number of training docs
* restricting the number of chunks per document with
`max_chunk_count` (especially for longer documents this can be
expensive)
* allowing for larger chunks: `max_chunk_length` is 50
characters by default, which means that any chunk larger than
that will be split after the next sentence. So usually one
sentence is one chunk. If you choose a higher number, chunking
will be coarser, also resulting in fewer chunks in total
* some models (like Jina AI's) also support "matryoshka
embeddings", which allow you to choose a smaller embedding
dimension. I haven't tested it myself, but this might also help
to speed things up.
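For illustration, these knobs would go into the project's section in
projects.cfg. A hypothetical sketch (the section name and values are
made up; only max_chunk_count and max_chunk_length are the parameters
discussed above, the remaining fields are standard Annif project
settings):

    [ebm-example]
    name=EBM example
    backend=ebm
    max_chunk_count=20
    max_chunk_length=200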
>
I was surprised that the training crashed with so many of the
embedding models. I would expect 48GB of VRAM to be plenty
for most of these models, as long as you don't set the
batch_size to unreasonably high values (start with 32, see if it
works, then abort and double it. Repeat until you reach your
VRAM limit). What was the cause of "not being able to train"
with the other models? Something like CUDA out of memory?
>
*Including EBM in an Annif ensemble:*
It is unfortunate that you could not improve your ensemble by
adding EBM. From what I can tell from your setup, setting the
weights of omikuji, mllm and ebm to values so close to equal
puts too much emphasis on the weaker components (EBM and MLLM).
Did you determine these weights manually or with annif optimize?
If I had to guess parameters I'd say omikuji:0.66 and splitting
the rest between EBM and MLLM.
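For reference, ensemble weights like these are expressed in
projects.cfg via the ensemble backend's sources field, which takes
project:weight pairs. A sketch with placeholder project IDs (the
weight split for EBM and MLLM is just an even division of the
remainder):

    [ensemble-example]
    name=Ensemble example
    backend=ensemble
    sources=omikuji-proj:0.66,ebm-proj:0.17,mllm-proj:0.17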
We are not at this point with our own ensembles at DNB, so we
still need to find the best way to integrate EBM into an
ensemble. Maybe you can also achieve better performance with the
nn-ensemble. But maybe you should wait until the
(re-)developments of the nn-ensemble are finished.
>
Thank you again for your feedback. This is very valuable.
Especially in the current phase before the first release, it is
very helpful to have early testers! Please don't hesitate to
report any other issues that you might have.
>
> Best,
> Maximilian
>
>
j3s...@googlemail.com wrote on Tuesday, 10 March 2026 at 10:29:38 UTC+1:
>
> Hello Maximilian/all,
>
> my evaluation is finished and here are my findings.
>
>
> *_Prerequisites_*:
My dataset is quite large: I have around 700.000 short
text documents, of which 600.000 are used for training and
100.000 for evaluation, with around 7.000 labels. I'm using
256GB RAM (+256GB swap) and have two ADA 6000 GPUs (48GB VRAM each).
>
I evaluated basically all available backends beforehand to find
out which combination of backends is best in my case.
"Best" is here defined as the best "F1 score (doc avg)". The
champion is this ensemble (a basic ensemble, not nn):
- Omikuji-Attention*0.4624,
- Xtransformer*0.4206,
- MLLM*0.1170
With a limit of 15 and a threshold of 0.15 the F1 value is
69.44%.
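For reference, a sketch of how the limit and threshold are passed on
the command line during evaluation (the project ID and corpus path are
placeholders):

    annif eval my-ensemble --limit 15 --threshold 0.15 eval-docs/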
>
> The task was to evaluate if I can add the EBM-backend to
> improve this result.
>
>
*_Installation_*
As this backend is not yet in the main version of Annif, the
installation steps can be found here:
https://github.com/NatLibFi/Annif/issues/936
>
> Please note: Maximilian pointed out that using uv is easier:
> uv sync --extra ebm-in-process # or --extra ebm-api
>
>
*_Training_*
*_Evaluation_*
saving_model.html
<https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html>
for more details about differences between saving model
and serializing."

As to my understanding of the linked page (and
ChatGPT's), a trained model does not store GPU information and
it should be possible to run it on another GPU.
>
Sven Sass wrote on Friday, 20 February 2026 at 07:54:18 UTC+1:

Hello Maximilian,

I posted an issue report here:
https://github.com/NatLibFi/Annif/issues/936
File "/opt/annif/dev3/Annif/annif/
ebm4subjects
<https://github.com/deutsche-nationalbibliothek/ebm4subjects>
* in our latest version of ebm4subjects, support for
sentence-transformers is an optional dependency that you would
install when installing annif with the backend
"ebm-in-process" (see pyproject.toml)
* To get started: there is a draft for a wiki page on ebm:
https://github.com/NatLibFi/Annif/wiki/DRAFT-%E2%80%90-Backend:-EBM
(and use it for current main?)
c.) something else?

Any information appreciated.

Kind regards
Sven
> --
> You received this message because you are subscribed to the Google
> Groups "Annif Users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to
annif-users...@googlegroups.com <mailto:
annif-
>
users+un...@googlegroups.com>.
> To view this discussion visit
https://groups.google.com/d/msgid/annif-
> users/e1a3ac4f-ecea-4b70-bc2f-79721ed7c223n%
40googlegroups.com <https://
>
groups.google.com/d/msgid/annif-users/e1a3ac4f-ecea-4b70-
> bc2f-79721ed7c223n%
40googlegroups.com?utm_medium=email&utm_source=footer>.
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.s...@helsinki.fi
http://www.nationallibrary.fi