Error during training nn_ensemble


Ewelina Bosko

Oct 18, 2021, 6:05:49 AM
to Annif Users
Hi,
I have a problem training an nn_ensemble model based on two models: mllm and omikuji-bonsai.
These two (mllm and omikuji) have trained well, but when I try to experiment with different configurations of nn_ensemble I get a TensorFlow error:

tensorflow.python.framework.errors_impl.InvalidArgumentError:  Incompatible shapes: [32,3625206] vs. [64,1812603]
[[node gradient_tape/model/add/add/BroadcastGradientArgs (defined at Annif/annif/backend/nn_ensemble.py:214) ]] [Op:__inference_train_function_734]

This happened for nodes=1 and nodes=10.
When I used nodes=100, the training process was killed after a while with warnings:

2021-10-18 08:16:58.167615: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1450082400 exceeds 10% of free system memory.
2021-10-18 08:16:58.374218: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1450082400 exceeds 10% of free system memory.
2021-10-18 08:16:58.560988: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1450082400 exceeds 10% of free system memory.

I have 32 GB of RAM on the machine.

Here is the config:
[mllm-pl-lem]
name=MLLM Polish
language=pl
backend=mllm
analyzer=simple
vocab=vocab-pl-lem
limit=30

[omikuji-bonsai-pl-lem]
name=Omikuji Bonsai Polish
language=pl
backend=omikuji
analyzer=simple
vocab=vocab-pl-lem
cluster_balanced=False
cluster_k=100
max_depth=3

[nn-ensemble-pl-lem]
name=NN ensemble Polish
language=pl
backend=nn_ensemble
sources=mllm-pl-lem,omikuji-bonsai-pl-lem
limit=100
vocab=vocab-pl
nodes=1
dropout_rate=0.4
epochs=10

Can anyone help me? :)

juho.i...@helsinki.fi

Oct 18, 2021, 11:57:29 AM
to Annif Users
Hi, and thanks for the very good problem description :)

It seems you have a typo(?) in the vocab setting of the nn-ensemble project: it is "vocab-pl", whereas in the base projects the vocab is "vocab-pl-lem". The vocabularies should be the same in the (nn-)ensemble and its base projects. I get a similar (though not exactly the same) error message about incompatible shapes if the vocabularies are not the same.
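
To spell out the fix: the nn-ensemble section should point at the same vocabulary as its base projects, i.e. something like this (all other settings kept as in your original config):

```ini
[nn-ensemble-pl-lem]
name=NN ensemble Polish
language=pl
backend=nn_ensemble
sources=mllm-pl-lem,omikuji-bonsai-pl-lem
limit=100
vocab=vocab-pl-lem
nodes=1
dropout_rate=0.4
epochs=10
```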

About the nodes=100 case and the memory warnings: I think the reason the process is killed is not necessarily running out of memory despite the warnings (although you do seem to have quite a big vocabulary, judging by the TensorFlow error about incompatible shapes), but again the differing vocabs. However, if the reason is memory and you have no other way around it, then instead of an nn-ensemble you could try a regular ensemble, but with optimized weights for the base projects. The hyperopt command can be used for finding good weights, e.g.:

annif hyperopt nn-ensemble-pl-lem --trials 200 path/to/docs
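
If you do switch to a regular ensemble, the project definition could look roughly like this. The section name and the weights after the colons are just placeholders here; the actual weights would come from the hyperopt run, which would then target this ensemble project:

```ini
[ensemble-pl-lem]
name=Ensemble Polish
language=pl
backend=ensemble
sources=mllm-pl-lem:0.4,omikuji-bonsai-pl-lem:0.6
vocab=vocab-pl-lem
limit=100
```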

For the record, using only one node in an nn-ensemble project is not helpful: as you probably knew, it makes the neural network behave much the same as a regular ensemble.
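
As a rough illustration of why nodes=1 adds little: with a single hidden unit the network can essentially only learn one weighted combination of its inputs, which is what a plain weighted ensemble already computes. A minimal NumPy sketch with toy scores and hypothetical weights (not Annif's actual implementation):

```python
import numpy as np

# Toy suggestion scores from two base projects for 5 subjects
mllm_scores = np.array([0.9, 0.1, 0.4, 0.0, 0.7])
omikuji_scores = np.array([0.8, 0.3, 0.2, 0.1, 0.6])

# A regular weighted ensemble is just a weighted mean of the score vectors;
# the weights here are made up, e.g. found via `annif hyperopt`
w1, w2 = 0.4, 0.6
ensemble = (w1 * mllm_scores + w2 * omikuji_scores) / (w1 + w2)

print(ensemble)  # combined scores, one value per subject
```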

I noticed you have a non-default limit of 30 in the MLLM project but not in Bonsai. You could try a non-default limit in Bonsai too, and a higher value for both projects, even something like 1000: when the suggestions from the base projects are combined by the (nn-)ensemble, it can be advantageous to have many base suggestions available.

-Juho

Ewelina Bosko

Oct 19, 2021, 2:02:59 AM
to Annif Users
Thanks for your answer and the many suggestions :) I really appreciate your help.
I hadn't noticed the typo in the vocabulary; I've fixed it and am now training the model again.
I'll let you know if it works.