Hello Osma, hello Annif team and users,
the German National Library (DNB) has evaluated Annif over the last year. We are currently preparing to go live with a first Annif workflow for automatic subject indexing. We use the Integrated Authority File (GND) as the vocabulary for our purpose, which is to enrich the metadata of German online publications.
Our favourite candidate for a productive backend is an ensemble consisting of Omikuji-Bonsai and Maui. We are therefore very interested and pleased that the new Maui-like lexical matching backend (MLLM) is now part of Annif 0.52. This gives us one more tool to improve quality, and we are very happy to work with a sustainable solution that reduces the footprint of the Annif installation.
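For readers who want to try such a combination themselves, a minimal sketch of a projects.cfg entry for an Annif ensemble is shown below; the project IDs, weights and limit are illustrative assumptions, not our production values:

```ini
# Hypothetical projects.cfg excerpt: an ensemble project that combines the
# suggestions of an MLLM project and an Omikuji Bonsai project.
[gnd-ensemble-de]
name=GND ensemble (MLLM + Omikuji Bonsai)
language=de
backend=ensemble
vocab=gnd
sources=gnd-mllm-de:1,gnd-omikuji-de:1
limit=100
```

The weights after the colons in `sources` control how strongly each source project contributes to the combined score.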
Here are some very "fresh" results of our first small evaluation, comparing Maui with MLLM.
Vocabulary:
1.3 million GND descriptors modelled in SKOS (simple version with prefLabels and altLabels, no relations etc.)
Training data:
8559 German-language tables of contents.
Test data:
Test set A = 1261 German-language online publications
Test set B = 937 German-language tables of contents
Results:
| Test set | F1@5 *Maui* | F1@5 *MLLM* |
| --- | --- | --- |
| A (Online Publications) | 0.174 | 0.196 |
| B (Tables of Contents) | 0.178 | 0.205 |
See the full evaluation metrics in the attachment _DNB_Sketch_Comparision_Maui_vs_MLLM_20210421.pdf_.
Conclusion:
Under the same conditions, the MLLM backend produces better out-of-the-box results than Maui!
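For readers unfamiliar with the metric: F1@5 is the harmonic mean of precision and recall over the top 5 suggested subjects, which `annif eval` averages over all test documents. A minimal per-document sketch (the function and the sample subject IDs are our own illustration, not Annif code):

```python
def f1_at_k(suggested, gold, k=5):
    """F1 score of the top-k suggested subjects against the gold-standard labels."""
    top_k = set(suggested[:k])
    gold = set(gold)
    hits = len(top_k & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)

# Example: 2 of the top 5 suggestions match the 3 gold-standard subjects,
# so precision@5 = 0.4, recall@5 = 2/3, and F1@5 = 0.5.
score = f1_at_k(["s1", "s2", "s3", "s4", "s5"], ["s2", "s4", "s6"])
```

The corpus-level F1@5 reported above is the mean of this per-document score over the whole test set.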
This is very encouraging and motivates us to spend more time testing the properties of the new MLLM backend in detail. We will optimize it for our purpose step by step, especially with regard to the backend-specific parameters. Furthermore, we are going to improve our GND SKOS file (adding relations, hidden labels and collections).
We are also going to evaluate the stwfsa backend soon, as one more great backend with an underlying lexical method.
Thanks to Osma and his colleagues for MLLM, well done!
Greetings,
Christoph, Sandro and the Team at DNB
Dear Osma, Dear Annif Team,
I would like to add something about the performance of MLLM. As you wrote in the post above, MLLM needs a relatively long time to train. From DNB's point of view this is not a big problem (faster is of course always welcome ;-), because we have separate, non-time-critical workflows for these kinds of processes. However, and this is why I take up the topic again, MLLM also needs a very, very long time to process documents. In view of productive use of MLLM (standalone or combined in an ensemble), this is indeed a considerable disadvantage.
For comparison, here is the processing time of an electronic document (text length 30000 characters) processed in a Docker container with Annif 0.52:
[INFO ] 2021/05/19 13:32:18: 1162771593
[INFO ] 2021/05/19 13:32:18: Use: gnd-maui-en-0.52-1
[INFO ] 2021/05/19 13:32:18: Use: gnd-mllm-en-0.52-1
[INFO ] 2021/05/19 13:32:24: Use: gnd-omikuji-bonsai-en-0.52-1
[INFO ] 2021/05/19 13:32:25: Use: gnd-ensemble-en-0.52-1 (Maui + omikuji-bonsai)
[INFO ] 2021/05/19 13:32:28: Use: gnd-ensemble-en-0.52-3 (MLLM + omikuji-bonsai)
[INFO ] 2021/05/19 13:32:38: 1167595939
...
Maui takes less than a second for one document, omikuji-bonsai takes about one second, MLLM takes six seconds. An ensemble of Maui + omikuji-bonsai takes three seconds, an ensemble of MLLM + omikuji-bonsai takes ten seconds.
The ensemble of MLLM + omikuji-bonsai produces very good quality results and is therefore a hot candidate for productive use, but it is also a bottleneck. We have an average daily accession of 3000 online publications (monographs or articles) at DNB, so it makes a difference for our use case whether the processing takes 9000 seconds (Maui + omikuji-bonsai) or 30000 seconds (MLLM + omikuji-bonsai). To put this in perspective: 30000 seconds is a little more than 8 hours each day. Given that we have some days with more than 10000 fresh online publications, this becomes critical. So the problem should be tackled at the source. Reducing the processing time of MLLM, or allowing more parallel workflows inside Annif, would help us to process more online publications each day and to avoid complex workflows in automatic indexing systems that use Annif with MLLM inside.
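The arithmetic behind these numbers can be sketched as follows (the document counts and per-document times are taken from the measurements above; sequential processing is assumed):

```python
# Estimate the daily wall-clock cost of each ensemble, assuming documents
# are processed one after another at the measured per-document times.
def daily_hours(docs_per_day: int, seconds_per_doc: float) -> float:
    return docs_per_day * seconds_per_doc / 3600

MAUI_ENSEMBLE = 3    # seconds/doc, Maui + omikuji-bonsai
MLLM_ENSEMBLE = 10   # seconds/doc, MLLM + omikuji-bonsai

print(daily_hours(3000, MAUI_ENSEMBLE))    # 2.5 hours on an average day
print(daily_hours(3000, MLLM_ENSEMBLE))    # ~8.3 hours on an average day
print(daily_hours(10000, MLLM_ENSEMBLE))   # ~27.8 hours on a peak day
```

On a peak day of 10000 documents, the MLLM ensemble would need more than 24 hours, which is why parallelization or faster per-document processing matters so much here.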
It would be very helpful if a reduction of the processing time of MLLM could be realized technically.
Thanks and greetings,
Christoph & Sandro (Team DNB)
Dear Osma, Dear Annif Team,
thanks for the feedback and the willingness to look for a solution!
We will pack the required data and send it to you via e-mail.
Many greetings, Christoph and Sandro
Hi Mona, Hi Juho, Hi Osma,
thank you for the research and improvements on the speed of MLLM! We have set up a small test series; here are the results:
====================*Base*===========================================
----------------------------------------------------------------------------------------------
#Hardware
| CLI tests were done on a VM with 16 CPUs, Ubuntu 18.04.4 LTS, 32 GB memory, 640 GB disk
| Docker tests were done on a VM with 8 CPUs, Ubuntu 20.04.2, 64 GB memory, 126 GB disk
  (each test used all cores; the --jobs option was not used)

#Models
| gnd-mllm-de-0.53: MLLM model based on Annif 0.53_1
| gnd-mllm-de-0.54: MLLM model based on Annif 0.54
| gnd-ensemble-de-0.54-0: MLLM model based on Annif 0.54 + omikuji-bonsai model based on Annif 0.52

#Vocabulary
| German Authority File (GND): 1.3 million GND descriptors modelled in SKOS (simple version with prefLabels and altLabels, no relations)

#Trainset
| 20,998 German-language fulltexts with a text length of 30,000 characters

#Testsets
| Small Testset: 47 German-language fulltexts with a text length of 30,000 characters
| Big Testset: 928 German-language fulltexts with a text length of 30,000 characters

#Single Test Documents
| Single doc 1/2/3: 3 single German-language fulltexts with a text length of 30,000 characters

==============*Test series (results per command)*===============================
 ----------------------------------------------------------------------------------------------
#train
| gnd-mllm-de-0.53 | Time total: 1478m7.066s (24 hours 38 minutes)
| gnd-mllm-de-0.54 | Time total: 108m2.953s (1 hour 48 minutes)

----------------------------------------------------------------------------------------------
#eval
| gnd-mllm-de-0.53 | Small Testset Time total: 10m53.712s
| gnd-mllm-de-0.54 | Small Testset Time total: 7m36.942s

| gnd-mllm-de-0.53 | Small Testset Time average per document: 13.91s
| gnd-mllm-de-0.54 | Small Testset Time average per document: 9.72s

| gnd-mllm-de-0.53 | Big Testset Time total: 111m39.175s
| gnd-mllm-de-0.54 | Big Testset Time total: 42m21.261s

| gnd-mllm-de-0.53 | Big Testset Time average per document: 7.22s
| gnd-mllm-de-0.54 | Big Testset Time average per document: 2.74s

----------------------------------------------------------------------------------------------
#index
| gnd-mllm-de-0.53 | Small Testset Time total: 9m44.783s
| gnd-mllm-de-0.54 | Small Testset Time total: 6m26.651s

| gnd-mllm-de-0.53 | Small Testset Time average per document: 12.44s
| gnd-mllm-de-0.54 | Small Testset Time average per document: 8.23s

| gnd-mllm-de-0.53 | Big Testset Time total: 91m34.165s
| gnd-mllm-de-0.54 | Big Testset Time total: 21m55.500s

| gnd-mllm-de-0.53 | Big Testset Time average per document: 5.92s
| gnd-mllm-de-0.54 | Big Testset Time average per document: 1.42s

----------------------------------------------------------------------------------------------
#suggest
| gnd-mllm-de-0.53 | Single doc 1 Time total: 5m24.054s
| gnd-mllm-de-0.54 | Single doc 1 Time total: 5m31.013s

| gnd-mllm-de-0.53 | Single doc 2 Time total: 5m18.673s
| gnd-mllm-de-0.54 | Single doc 2 Time total: 5m25.215s

| gnd-mllm-de-0.53 | Single doc 3 Time total: 5m15.833s
| gnd-mllm-de-0.54 | Single doc 3 Time total: 5m27.852s

---------------------------------------------------------------------------------------------
# Single doc processed in a Docker container version with Annif 0.54
(For comparison, see also my post of 02.07.2021, 14:47:06 above)
[INFO ] 2021/08/30 16:52:53: 1162771593
[INFO ] 2021/08/30 16:52:53: Use: gnd-maui-de-0.52-1
[INFO ] 2021/08/30 16:52:53: Use: gnd-mllm-de-0.54-0
[INFO ] 2021/08/30 16:52:54: Use: gnd-omikuji-bonsai-de-0.52-1
[INFO ] 2021/08/30 16:52:55: Use: gnd-ensemble-de-0.52-1 (Maui + omikuji-bonsai)
[INFO ] 2021/08/30 16:52:56: Use: gnd-ensemble-de-0.54-0 (MLLM 0.54 + omikuji-bonsai)
[INFO ] 2021/08/30 16:52:59: 1167595939
============*Conclusion performance MLLM Annif 0.53_1 vs. MLLM Annif 0.54*========================
The training time for MLLM has decreased in our case by a factor of 13.7 (using all 16 CPUs). Wow, that's much faster. Great!
The eval command with MLLM 0.54 processes a document of the Big Testset (928 docs) in an average time of 2.74s. That's 4.49 seconds faster than MLLM 0.53_1. This also applies to the Small Testset (47 docs), where the new release is 4.19 seconds faster.
The same goes for the index command: MLLM 0.54 indexes a document of the Big Testset in an average time of 1.42s. That's 4.5 seconds faster than MLLM 0.53_1. This also applies to the Small Testset, where the new release is 4.2 seconds faster.
When processing a single document on the CLI with cat plus the suggest command, MLLM 0.53_1 is surprisingly around 7 to 12 seconds faster than MLLM 0.54 (?). We were therefore all the more interested to see how MLLM 0.54 behaves with suggest under Docker via the REST API.
When using suggest under Docker via the REST API, the tests happily show a picture of performance improvement again. MLLM 0.54 processes a document in an average time of 1s; MLLM under 0.52 took six seconds, so MLLM 0.54 is 5 seconds faster. An ensemble of MLLM 0.52 + omikuji-bonsai took 10 seconds; an ensemble of MLLM 0.54 + omikuji-bonsai needs 3 seconds. That's 7 seconds faster!
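For reference, these measurements go through Annif's REST API. A minimal sketch of how such a suggest request can be built in Python is shown below; the host, port and project ID are examples from a local setup (not fixed Annif defaults), and the request is only constructed here, not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Build a POST to Annif's suggest endpoint; calling urlopen(req) against a
# running Annif instance would return a JSON body with the suggested subjects.
base = "http://localhost:5000/v1"
project_id = "gnd-ensemble-de-0.54-0"
payload = urlencode({"text": "Ein Beispieltext ...", "limit": 5}).encode()
req = Request(f"{base}/projects/{project_id}/suggest", data=payload, method="POST")
req.add_header("Content-Type", "application/x-www-form-urlencoded")
```

Sending the full document text in the `text` form field is what the timings above measure, so the per-request cost is dominated by the backend's processing time rather than by the HTTP layer.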
In summary: MLLM is much faster and thus more suitable for (productive) use in the future, even with a large vocabulary. Thanks for your willingness to invest here and for the implementation!
 Monet terveiset,
Christoph & Sandro