How to upgrade a Vocabulary?

Parthasarathi Mukhopadhyay

Apr 3, 2022, 9:51:30 AM

to annif...@googlegroups.com

Dear all

I have the following situation (Annif v. 0.57, installed via pip):

1. A vocabulary control device (ttl format) is already loaded into Annif;

2. A new version of the same vocabulary was released today (ttl with language level);

3. There are now two options if I wish to load the new version: a) delete the old version from Annif and run loadvoc with the new one, or b) upgrade the old version in Annif with the new one.

My question is:

Which option do you suggest? Will running the loadvoc command with the new version of the vocabulary solve the issue?

I have not yet trained Annif on the loaded vocabulary (I am in the middle of preparing the training dataset).

But if training has already been done and the vocabulary version then changes, what is the solution?

Regards

Parthasarathi Mukhopadhyay

Professor, Department of Library and Information Science,

University of Kalyani, Kalyani - 741 235 (WB), India

juho.i...@helsinki.fi

Apr 4, 2022, 4:05:12 AM

to Annif Users

Hi Parthasarathi,

That's a good question. I'll try to give some information and background on options a) and b).

a) Deleting the old vocabulary and loading the new
Note that this can now (since Annif 0.57) be done in one command by using the "--force" option of the loadvoc command, for example:
annif loadvoc project-id path/to/vocabulary.ttl --force

If a project using the vocabulary in question has already been trained, the project needs to be retrained afterwards. Otherwise the suggestions that the project gives could be wrong (because the project itself does not know that the vocabulary has changed).
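A minimal sketch of option a) as a shell script: replace the vocabulary with "--force", then retrain. The project id and both paths are placeholders, and the ANNIF dry-run variable is only a convenience added for this example (by default it prints the commands instead of running them); it is not part of Annif.

```shell
# Dry run by default: the commands are only printed.
# Set ANNIF=annif in the environment to actually execute them.
ANNIF="${ANNIF:-echo annif}"

PROJECT_ID="my-project"            # placeholder project id
VOCAB="path/to/vocabulary.ttl"     # the new vocabulary file
CORPUS="path/to/training-corpus"   # placeholder path to the training data

replace_and_retrain() {
    # Drop the old vocabulary and load the new one in a single step,
    # then retrain so the project matches the new vocabulary.
    $ANNIF loadvoc "$PROJECT_ID" "$VOCAB" --force &&
    $ANNIF train "$PROJECT_ID" "$CORPUS"
}

replace_and_retrain
```

For a project that has not been trained yet, the train step is simply the first training run.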

b) Update the vocabulary with the new one
This means running loadvoc without "--force". Quoting the Annif wiki (https://github.com/NatLibFi/Annif/wiki/Commands#load-vocabulary):

"If a vocabulary has already been loaded, reinvoking loadvoc with a new subject file will update the Annif's internal vocabulary: label names are updated and any subject not appearing in the new subject file is removed. Note that new subjects will not be suggested before the project is retrained with the updated vocabulary."

So this option too requires retraining of the project, if one wants to have the newly added subjects in the Annif suggestions.
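Option b) can be sketched in the same way. The function name update_vocab, its arguments and the ANNIF dry-run variable are assumptions made for this example only; the retraining step runs only when a training corpus path is given, since it is needed only if the newly added subjects should appear in the suggestions.

```shell
# Dry run by default: the commands are only printed.
# Set ANNIF=annif in the environment to actually execute them.
ANNIF="${ANNIF:-echo annif}"

# update_vocab PROJECT_ID VOCAB_FILE [TRAINING_CORPUS]
update_vocab() {
    # Without --force, loadvoc updates the existing vocabulary in place:
    # labels are refreshed and subjects missing from the new file are removed.
    $ANNIF loadvoc "$1" "$2" || return 1

    # Retrain only if a corpus was given, so that newly added
    # subjects can be suggested.
    if [ -n "${3:-}" ]; then
        $ANNIF train "$1" "$3"
    fi
}

update_vocab my-project path/to/vocabulary.ttl
```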

The bottom line is: option b is usually better, because it does not necessitate retraining of the project (if one is happy with just the updated label names and removed subjects).

However, in your case, as you have not yet trained the Annif project, I would go with option a (it would keep the internal vocabulary slightly simpler/shorter, and would be a "clean start").

By the way: as you are preparing the training dataset, it is best to use the new vocabulary there too, so that the dataset contains some example documents that are assigned the newly added subjects.

-Juho

Parthasarathi Mukhopadhyay

Apr 6, 2022, 7:03:34 AM

to Annif Users

Hello Juho

Thanks for the insight.

I've followed path 'A' as suggested, and we have started the training dataset afresh on the basis of the new version of the vocabulary.

For future requirements, we will follow path 'B' as suggested.

Heartfelt thanks and best regards
