Good to hear you got the project working :)
Yes, training an NN ensemble can take a long time, for example around 2 hours when training on 1400 full-text documents (see the NN ensemble exercise of the Annif tutorial). You have a great many documents, but they are short, so I can't guess how long it would take. You could first try with a limited number of training documents (lines in your TSV file) by adding the --docs-limit <number> option to the train command. For example, to train with only 1000 documents:
annif train rameau-ensemble-fr /home/aurelie/ABES/Annif-tutorial/data-sets/rameau/rameau-train.tsv --docs-limit 1000
When you know how long this takes, you can estimate how long training on your full training set would take.
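As a rough sketch of that estimate: the snippet below simply scales the trial time by the ratio of document counts. It assumes that training time grows roughly linearly with the number of documents, which may not hold exactly (there is some fixed startup cost, for instance). The function name and the numbers are made up for illustration only.

```python
def estimate_full_training_seconds(trial_seconds, trial_docs, total_docs):
    """Linearly extrapolate training time from a limited trial run.

    Assumption: time per document stays about the same on the full set.
    """
    if trial_docs <= 0:
        raise ValueError("trial_docs must be positive")
    return trial_seconds * total_docs / trial_docs


# Hypothetical numbers: the --docs-limit 1000 run took 90 seconds
# and the full TSV file has 250,000 lines.
estimate = estimate_full_training_seconds(90, 1000, 250_000)
print(f"Estimated full training time: about {estimate / 3600:.1f} hours")
```

If you want a better estimate, you could repeat the trial with, say, 1000 and 5000 documents and check whether the time really scales in proportion.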
Also, we usually train NN ensembles with full-text documents (that is, texts at least several pages long, such as PDF documents), so I'm not sure how much using an NN ensemble instead of a simple ensemble helps in your case. Usually, using an NN ensemble instead of a simple ensemble increases the evaluation metrics by about 1-3 percentage points.
I think using annif suggest on a TSV file does not make sense if the file is in the short text document format, because the file then already contains the subjects (URIs). For annif eval this is exactly what is needed (for comparing the Annif suggestions against the gold-standard subjects).
If you want Annif to give suggestions for each notice in the TSV file, then each line would need to be fed separately (and without the URIs) to annif suggest, but this could be quite slow. An alternative would be to save each notice as its own txt file in a directory, on which you could then run annif index.
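The splitting into separate txt files could be done with a small script along these lines. This is only a sketch: it assumes that each line of your TSV file has the text in the first column and the subject URIs after a tab, and the function name, the file naming scheme and the directory name are all just my own choices.

```python
from pathlib import Path


def split_tsv_to_txt(tsv_path, out_dir):
    """Write the text column of each TSV line to its own .txt file.

    Assumption: column 1 is the notice text; anything after the first
    tab (the subject URIs) is dropped. Returns the number of files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(tsv_path, encoding="utf-8") as tsv_file:
        for line_no, line in enumerate(tsv_file, start=1):
            text = line.rstrip("\n").split("\t", 1)[0].strip()
            if not text:
                continue  # skip empty lines
            # File name keeps the original line number, e.g. notice-0000042.txt
            target = out / f"notice-{line_no:07d}.txt"
            target.write_text(text + "\n", encoding="utf-8")
            count += 1
    return count


# Example call (hypothetical file and directory names):
# split_tsv_to_txt("rameau-test.tsv", "notices")
```

After that, something like "annif index rameau-ensemble-fr notices" should process the whole directory in one go. As far as I recall, it writes the suggestions for each document to a file with the .annif suffix next to the txt file, but please check annif index --help for the details.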
-Juho