Hi Sven,
as one of the co-developers here, my answer would be the following:
Right now there is a bug in the logger that needs fixing, so you should probably wait for that (give us a
week or two).
Generally, we are happy to receive any feedback from you as a beta tester. So the answer is a cautious "yes, go ahead, but mind the gap..."
How to proceed:
* The correct Annif branch to work with is on our DNB fork: deutsche-nationalbibliothek:issue855-add-ebm-backend
* The ebm4subjects package can be installed from PyPI, unless you want to work with its source code. In that case, use the main branch at
https://github.com/deutsche-nationalbibliothek/ebm4subjects
* In our latest version of ebm4subjects, support for sentenceTransformer is an optional dependency that you install when installing Annif with the backend "ebm-in-process" (see pyproject.toml)
* To get started: there is a draft wiki page on ebm:
https://github.com/NatLibFi/Annif/wiki/DRAFT-%E2%80%90-Backend:-EBM It contains all the information on how to configure the backend. The embedding model from Hugging Face is probably the most important parameter.
* To manage expectations: ebm is a method developed to improve performance in the long tail of large vocabularies. On its own, you can expect metric values roughly on par with MLLM, but the actual matches should be significantly distinct from MLLM suggestions (since similarities are based on embeddings rather than string representations). For best results, use ebm alongside omikuji or another statistical approach.
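
To illustrate how this could be wired together in Annif's projects.cfg: below is a hedged sketch of an ebm project combined with omikuji via Annif's standard ensemble backend, as suggested above for best results. The backend id `ebm` and the parameter name `transformer_model` are placeholders, not confirmed names; the wiki draft linked above lists the actual configuration keys.

```ini
# projects.cfg -- illustrative sketch, not verbatim configuration
[ebm-en]
name=EBM English
language=en
backend=ebm
vocab=my-vocab
# hypothetical parameter name; the embedding model from Hugging Face
# is said to be the most important setting
transformer_model=sentence-transformers/all-MiniLM-L6-v2

[omikuji-en]
name=Omikuji English
language=en
backend=omikuji
vocab=my-vocab

[ensemble-en]
name=EBM plus Omikuji ensemble
language=en
backend=ensemble
vocab=my-vocab
sources=ebm-en,omikuji-en
```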
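
For reference, the setup steps above might look roughly like this. This is a sketch, not a tested recipe: the repository layout of the Annif fork and the exact name of the optional-dependency group are assumptions on my part (the text above suggests "ebm-in-process"; check pyproject.toml for the actual extra name):

```shell
# Install ebm4subjects from PyPI
# (or clone https://github.com/deutsche-nationalbibliothek/ebm4subjects
# and work from its main branch instead)
pip install ebm4subjects

# Install Annif from the DNB fork branch that adds the EBM backend
git clone https://github.com/deutsche-nationalbibliothek/Annif.git
cd Annif
git checkout issue855-add-ebm-backend

# Install with the optional dependency group for the in-process backend;
# the extra name is my assumption -- see pyproject.toml for the real one
pip install ".[ebm-in-process]"
```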
Please feel free to send us feedback via GitHub, especially if you run into errors.
Best,
Maximilian