Problem using omikuji backend

Alfonso Ali

Nov 21, 2024, 1:41:07 AM
to annif...@googlegroups.com
Hi,

I'm evaluating Annif, and so far I have been able to use the tfidf and mllm backends without problems. But training with the omikuji backend fails: the process is killed partway through.

> annif train -j0 decs-omikuji train.tsv.gz
Backend omikuji: creating vectorizer
Backend omikuji: creating train file
2024-11-20T22:52:43.155Z INFO  [omikuji::data] Loading data from data/projects/decs-omikuji/omikuji-train.txt
2024-11-20T22:52:44.377Z INFO  [omikuji::data] Parsing data
2024-11-20T22:52:46.174Z INFO  [omikuji::data] Loaded 249474 examples; it took 3.02s
2024-11-20T22:52:46.475Z INFO  [omikuji::model::train] Training model with hyper-parameters HyperParam { n_trees: 3, min_branch_size: 100, max_depth: 20, centroid_threshold: 0.0, collapse_every_n_layers: 0, linear: HyperParam { loss_type: Hinge, eps: 0.1, c: 1.0, weight_threshold: 0.1, max_iter: 20 }, cluster: HyperParam { k: 2, balanced: true, eps: 0.0001, min_size: 2 }, tree_structure_only: false, train_trees_1_by_1: false }
2024-11-20T22:52:46.476Z INFO  [omikuji::model::train] Initializing tree trainer
2024-11-20T22:52:46.512Z INFO  [omikuji::model::train] Computing label centroids
Labels 22581 / 22581 [============================================================================================================================] 100.00 % 7858.36/s
2024-11-20T22:53:24.536Z INFO  [omikuji::model::train] Start training forest
25229 / 68347 [===============================================>----------------------------------------------------------------------------------] 36.91 % 101.00/s 7m 
Killed

Annif version: 1.2.0 (using Docker)

Project cfg:
[decs-omikuji]
name=DeCS Omikuji Parabel
language=es
backend=omikuji
analyzer=snowball(spanish)
vocab=decs

OS: macOS 15.1

The same training data worked flawlessly with tfidf and mllm.

Regards,


juho.i...@helsinki.fi

Nov 21, 2024, 3:14:12 AM
to Annif Users
Hi!

I think you are running out of memory, which would explain the abrupt "Killed" with no further error message. Could you monitor the memory usage while training?

Omikuji and FastText projects can require significant amounts of disk space and memory. For example, the Annif projects running our Finto AI service take about 30 GB of memory and 22 GB of disk space (the FastText project for Finnish takes 6 GB and the Omikuji Bonsai project for Finnish 3.6 GB of disk space). Note that training a project can require more memory than running it for suggestions.

The memory requirement depends on the number of training documents and on the algorithm configuration. The parameters min_df and ngram matter most here; see the details in this wiki section: https://github.com/NatLibFi/Annif/wiki/Backend%3A-Omikuji#backend-specific-parameters (We should probably add some actual numbers for the required memory there.)
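
As a rough sketch of what that wiki section describes, a Bonsai-style variant of the project from the first message might look like the block below. The section ID decs-omikuji-bonsai and the name are made up for illustration, the cluster and depth settings are the Bonsai values as I recall them from the wiki page, and min_df=2 is only an assumed starting point that has not been tried on this data, so please check everything against the wiki before relying on it.

```ini
# Hypothetical project ID and name, for illustration only
[decs-omikuji-bonsai]
name=DeCS Omikuji Bonsai
language=es
backend=omikuji
analyzer=snowball(spanish)
vocab=decs
# Bonsai-style tree settings (verify against the wiki page)
cluster_balanced=False
cluster_k=100
max_depth=3
# Assumed value: ignore terms that occur in fewer than 2 documents
min_df=2
```

Compared with the defaults visible in the training log (k: 2, balanced: true, max_depth: 20), these settings build much shallower and wider trees, and a higher min_df drops rare terms from the vectorizer.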

Docker on macOS may also have a setting that limits how much memory is available to the container; that would be worth checking.

-Juho

Ali

Nov 22, 2024, 2:10:11 AM
to Annif Users
Hi Juho!

Thanks, I tried Omikuji Bonsai as explained in the wiki and it worked flawlessly!

Regards,
  Ali