Dear all,
As you may remember, we participated in the LLMs4Subjects challenge earlier in 2025. The challenge was held in conjunction with the SemEval-2025 workshop at the ACL conference. The task was to apply large language models (LLMs) to produce high-quality subject indexing for bibliographic records (titles and abstracts) from the bilingual (English and German) TIBKAT database, with subjects drawn from the extensive German GND vocabulary.
We have previously reported on the good placement of our Annif tool in this competition, as well as on the preprints that the participating teams had already published (see previous post). At the end of July, a poster session was also held in connection with SemEval-2025, which Annif developer Osma Suominen attended virtually. The poster summarizing our results is available on GitHub.
Now the entire SemEval-2025 workshop proceedings are available¹, including articles related to the LLMs4Subjects subtask (Task 5). We read them with great interest and got ideas for the further development of Annif!
The LLMs4Subjects competition continued in the form of the GermEval workshop challenge! This time, the goal was to improve on the results of the first round and to focus on the resource efficiency of the models used. We participated in this round as well, finishing in 1st place! You can read the preprint of our contribution at: https://doi.org/10.48550/arXiv.2508.15877
¹Rosenthal, S., Rosá, A., Ghosh, D., & Zampieri, M. (Eds.). (2025). Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025). Vienna, Austria: Association for Computational Linguistics. https://aclanthology.org/volumes/2025.semeval-1/