Error during training nn_ensemble


Ewelina Bosko

Oct 18, 2021, 6:05:49 AM
to Annif Users
Hi,
I have a problem training an nn_ensemble model based on two models: mllm and omikuji-bonsai.
These two (mllm and omikuji) have trained well, but when I try to experiment with different configurations of nn_ensemble I get a TensorFlow error:

tensorflow.python.framework.errors_impl.InvalidArgumentError:  Incompatible shapes: [32,3625206] vs. [64,1812603]
[[node gradient_tape/model/add/add/BroadcastGradientArgs (defined at Annif/annif/backend/nn_ensemble.py:214) ]] [Op:__inference_train_function_734]

This happened for nodes=1 and nodes=10.
When I used nodes=100, the training process was killed after a while with warnings:

2021-10-18 08:16:58.167615: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1450082400 exceeds 10% of free system memory.
2021-10-18 08:16:58.374218: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1450082400 exceeds 10% of free system memory.
2021-10-18 08:16:58.560988: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1450082400 exceeds 10% of free system memory.

I have 32 GB of RAM on the machine.

Here is the config:
[mllm-pl-lem]
name=MLLM Polish
language=pl
backend=mllm
analyzer=simple
vocab=vocab-pl-lem
limit=30

[omikuji-bonsai-pl-lem]
name=Omikuji Bonsai Polish
language=pl
backend=omikuji
analyzer=simple
vocab=vocab-pl-lem
cluster_balanced=False
cluster_k=100
max_depth=3

[nn-ensemble-pl-lem]
name=NN ensemble Polish
language=pl
backend=nn_ensemble
sources=mllm-pl-lem,omikuji-bonsai-pl-lem
limit=100
vocab=vocab-pl
nodes=1
dropout_rate=0.4
epochs=10

Can anyone help me? :)

juho.i...@helsinki.fi

Oct 18, 2021, 11:57:29 AM
to Annif Users
Hi, and thanks for the very good problem description :)

It seems you have a typo(?) in the vocab setting of the nn-ensemble project: it is "vocab-pl", whereas in the base projects the vocab is "vocab-pl-lem". The vocabularies should be the same in the (nn-)ensemble and its base projects. I get a similar (though not exactly the same) error message about incompatible shapes if the vocabularies are not the same.
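
To spell out the fix: the nn-ensemble section should point at the same vocabulary as its base projects, i.e. something like this (all other settings kept as in your original config):

```ini
[nn-ensemble-pl-lem]
name=NN ensemble Polish
language=pl
backend=nn_ensemble
sources=mllm-pl-lem,omikuji-bonsai-pl-lem
limit=100
vocab=vocab-pl-lem
nodes=1
dropout_rate=0.4
epochs=10
```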

About the nodes=100 case and the memory warnings: I think the reason the process is killed is not necessarily running out of memory despite the warnings (although you do seem to have quite a big vocabulary, judging by the TensorFlow error about incompatible shapes), but again the differing vocabs. However, if the reason is memory and you have no other way around it, then instead of an nn-ensemble you could try a regular ensemble, but with optimized weights for the base projects. The hyperopt command can be used for finding good weights, e.g.:

annif hyperopt nn-ensemble-pl-lem --trials 200 path/to/docs
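
If you do switch to a regular ensemble, the project definition could look roughly like this. The section name and the weights after the colons are just placeholders here; the actual weights would come from the hyperopt run, which would then target this ensemble project:

```ini
[ensemble-pl-lem]
name=Ensemble Polish
language=pl
backend=ensemble
sources=mllm-pl-lem:0.4,omikuji-bonsai-pl-lem:0.6
vocab=vocab-pl-lem
limit=100
```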

For the record, using only one node in an nn-ensemble project is not helpful: as you probably knew, it makes the neural network behave much the same as a regular ensemble.
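
As a rough illustration of why nodes=1 adds little: with a single hidden unit the network can essentially only learn one weighted combination of its inputs, which is what a plain weighted ensemble already computes. A minimal NumPy sketch with toy scores and hypothetical weights (not Annif's actual implementation):

```python
import numpy as np

# Toy suggestion scores from two base projects for 5 subjects
mllm_scores = np.array([0.9, 0.1, 0.4, 0.0, 0.7])
omikuji_scores = np.array([0.8, 0.3, 0.2, 0.1, 0.6])

# A regular weighted ensemble is just a weighted mean of the score vectors;
# the weights here are made up, e.g. found via `annif hyperopt`
w1, w2 = 0.4, 0.6
ensemble = (w1 * mllm_scores + w2 * omikuji_scores) / (w1 + w2)

print(ensemble)  # combined scores, one value per subject
```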

I noticed you have a non-default limit of 30 in the MLLM project but not in Bonsai. You could try a non-default limit in Bonsai too, and a higher value for both projects, even something like 1000: when the suggestions from the base projects are combined by the (nn-)ensemble, it can be advantageous to have many base suggestions available.

-Juho

Ewelina Bosko

Oct 19, 2021, 2:02:59 AM
to Annif Users
Thanks for your answer and the many suggestions :) I really appreciate your help.
I hadn't noticed the typo in the vocabulary; I've fixed it and am now training the model again.
I'll let you know if it works.