Hi!
Yes, the roles and meanings of the various numerical values could be explained more clearly in the Annif wiki.
The "annif suggest" command operates on one document at a time. It returns a list of subject suggestions, each with a numerical value: the suggestion score. The scores come from the backend that the project uses, and the exact way the backend's algorithm calculates them is complicated, so in general it is not possible to trace how an individual score was calculated. However, the scores are between 0 and 1, and the higher the value, the more relevant the suggestion is to the document (or at least this is what the algorithm thinks). The threshold option of the suggest command applies to these suggestion scores.
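To make the role of the threshold concrete, here is a minimal Python sketch. It is not Annif's own code: the function name, the (subject, score) tuples and the example scores are all made up for illustration, and it assumes a suggestion is kept when its score is at least the threshold.

```python
def filter_by_threshold(suggestions, threshold):
    """Keep suggestions whose score is at least `threshold`, best first.

    `suggestions` is a list of (subject, score) pairs with scores in [0, 1].
    Hypothetical helper for illustration only, not part of Annif.
    """
    ranked = sorted(suggestions, key=lambda pair: pair[1], reverse=True)
    return [(subject, score) for subject, score in ranked if score >= threshold]


# Made-up suggestions for a single document
example = [("cats", 0.82), ("dogs", 0.35), ("pets", 0.61), ("zoology", 0.07)]
kept = filter_by_threshold(example, threshold=0.5)
print(kept)
```

With a threshold of 0.5 only the two highest-scoring subjects remain; lowering the threshold lets more, but less certain, suggestions through.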
The "annif eval" command operates on multiple documents that already have human-selected, "gold-standard" subjects attached. It reports numerical values for many metrics, which are calculated by comparing the suggested subjects to the gold-standard subjects across all the given documents. There are many ways to do the comparison and calculate a metric, which is why so many metrics exist. Each metric emphasizes a different aspect of the "correctness" of the suggestions. We usually aim to optimize the F1@5 score when developing models for the Finto AI service.
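As a rough illustration of what a metric like F1@5 measures, the sketch below takes the five best suggestions for each document, compares them to the gold-standard subjects, and averages the per-document F1 scores. This is one common way to define the metric, written from scratch with hypothetical function names and made-up data; Annif's own implementation may differ in details.

```python
def f1_at_k(suggested, gold, k=5):
    """F1 score of the top-k suggested subjects against the gold-standard set.

    `suggested` is a list of subjects ordered from best to worst score.
    """
    top_k = set(suggested[:k])
    gold = set(gold)
    hits = len(top_k & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)  # share of the suggestions that are correct
    recall = hits / len(gold)      # share of the gold subjects that were found
    return 2 * precision * recall / (precision + recall)


def mean_f1_at_k(documents, k=5):
    """Average the per-document F1@k over (suggested, gold) pairs."""
    scores = [f1_at_k(suggested, gold, k) for suggested, gold in documents]
    return sum(scores) / len(scores)


# Two made-up documents: (ranked suggestions, gold-standard subjects)
docs = [
    (["a", "b", "c", "d", "e", "f"], ["a", "c", "x"]),
    (["x", "y"], ["y"]),
]
print(mean_f1_at_k(docs, k=5))
```

Precision rewards suggesting only correct subjects and recall rewards finding all the gold-standard ones; F1 balances the two, which is one example of a metric emphasizing a particular aspect of "correctness".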
Hope this helps,
-Juho