Problems with SVC

Busse, Frank

Oct 19, 2022, 5:25:56 AM
to annif...@googlegroups.com

Dear Annif team,

 

During our last series of tests with the SVC backend, I made a typo in projects.cfg and wanted to ask whether our observation is correct.

If a parameter name contains a typo (in our case min_df), is the line ignored and the default value used instead?

If this is the case, is there an option to stop the process if projects.cfg contains an invalid line? And is there a way to display all the parameters with which a model was actually trained?

 

Due to the typo described above, we discovered another problem. When using the SVC backend in conjunction with the simplemma analyzer, we observed an error message that we can't quite explain.

In our test case we have about 100 classes to train on. As training material we use about 624,000 digital tables of contents and about 298,000 full texts truncated to 30,000 characters.

If we set the parameter min_df to 2, everything is fine and the training completes successfully. But if we use the default value of 1, the training stops after a few hours and we get this output:

 

Backend svc: creating vectorizer

Backend svc: creating classifier

Command terminated by signal 11

 

Do you have any idea what could be the reason for the termination?

 

Best regards from the German National Library

Frank

 

 

 

juho.i...@helsinki.fi

Oct 26, 2022, 3:20:13 AM
to Annif Users

Hi Frank,

Sorry for the delayed answer, please see the inline comments.

Best regards,

Annif-team


On Wednesday, 19 October 2022 at 12:25:56 UTC+3 Frank Busse wrote:

Dear Annif team,

During our last series of tests with the SVC backend, I made a typo in projects.cfg and wanted to ask whether our observation is correct.

If a parameter name contains a typo (in our case min_df), is the line ignored and the default value used instead?

Yes, if the parameter name is not correct, that line is ignored and the default value for the parameter is used.

If this is the case, is there an option to stop the process if projects.cfg contains an invalid line?

Currently there is no option to raise an error on invalid lines. Different backends take different parameters, and many are simply passed from Annif to the backend library, so it is not easy to validate them. However, we have thought about a related issue, namely validating the parameter types and values in Annif before passing them to the backend library:

And is there a way to display all the parameters with which a model was actually trained?

Currently no, but there is an old issue that contains the idea:

Feel free to comment on the issues if you have suggestions for them or find them useful, or create new issues for feature requests.
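
Until something like that exists in Annif, misspelled keys could be caught with a small check run before training. The following is only a minimal sketch, not part of Annif: the function name `unknown_keys`, the allow-list `KNOWN_KEYS`, and the sample project are assumptions made for illustration, and the real set of accepted parameters depends on the backend and the Annif version.

```python
import configparser

# Hypothetical allow-list for illustration only; the parameters actually
# accepted depend on the backend and the Annif version in use.
KNOWN_KEYS = {"name", "language", "backend", "analyzer", "vocab", "limit", "min_df"}


def unknown_keys(cfg_text, known=KNOWN_KEYS):
    """Return {section: [unrecognised keys]} for an INI-style config string."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    problems = {}
    for section in parser.sections():
        # Iterating a section yields its option names (lower-cased by default).
        bad = sorted(key for key in parser[section] if key not in known)
        if bad:
            problems[section] = bad
    return problems


# Made-up project definition containing the kind of typo discussed above.
SAMPLE = """
[svc-test]
name=SVC test project
language=de
backend=svc
analyzer=simplemma(de)
vocab=my-vocab
mindf=2
"""

for section, keys in unknown_keys(SAMPLE).items():
    print(f"[{section}] unrecognised parameter(s): {', '.join(keys)}")
```

A wrapper script could refuse to start training whenever `unknown_keys` returns a non-empty result. The allow-list has to be maintained by hand, which is the same difficulty described above for validating parameters inside Annif.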

Due to the typo described above, we discovered another problem. When using the SVC backend in conjunction with the simplemma analyzer, we observed an error message that we can't quite explain.

In our test case we have about 100 classes to train on. As training material we use about 624,000 digital tables of contents and about 298,000 full texts truncated to 30,000 characters.

If we set the parameter min_df to 2, everything is fine and the training completes successfully. But if we use the default value of 1, the training stops after a few hours and we get this output:

 

Backend svc: creating vectorizer

Backend svc: creating classifier

Command terminated by signal 11

 

Do you have any idea what could be the reason for the termination?

The termination could be caused by running out of memory. With min_df=2, the words/tokens that appear only once in the training set are ignored, but with min_df=1 all words in the training set are used, so the model requires more memory. See the wiki page:
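
As a rough illustration of why min_df matters for memory, here is a toy sketch in plain Python. It is not Annif's or scikit-learn's code; it assumes that min_df acts as a document-frequency threshold (a token is kept only if it occurs in at least min_df documents), and the function name `vocabulary` and the three sample documents are made up.

```python
from collections import Counter


def vocabulary(docs, min_df=1):
    """Return the set of tokens that occur in at least min_df documents."""
    doc_freq = Counter()
    for doc in docs:
        # Count each token once per document, however often it repeats.
        doc_freq.update(set(doc.lower().split()))
    return {token for token, n in doc_freq.items() if n >= min_df}


# Made-up miniature corpus standing in for the training documents.
docs = [
    "table of contents introduction",
    "introduction to library science",
    "contents of the library catalogue",
]

print(len(vocabulary(docs, min_df=1)))  # every distinct token is kept
print(len(vocabulary(docs, min_df=2)))  # tokens seen in only one document are dropped
```

In a large corpus, a sizeable share of the distinct tokens (typos, OCR noise, rare names) tends to occur in a single document only, so raising min_df from 1 to 2 can shrink the feature space considerably.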