Hi,
We are evaluating ANNIF as an LCSH recommendation service at James Madison University. I have been able to train and evaluate all of the pertinent backends on our ETDs, with one exception. It is clear that the ensembles have the best chance of compensating for the subject-matter bias in our training set. PAV and the simple ensemble work as advertised, but when I try to train the NN ensemble on the two sources, MLLM and Omikuji (AttentionXML variant), my dev platform throws errors that I am unable to diagnose.
Hardware:
MacBook Pro, macOS 11.7, 2.4 GHz quad-core Intel Core i5, 16 GB 2133 MHz LPDDR3 RAM.
Software:
ANNIF v0.59
Python 3.9
TensorFlow 2.10
Keras 2.10
The sources for the ANNIF MLLM and Omikuji/AttentionXML projects were trained on 1,500 full-text documents with subject headings in TSV files; the validation and test sets number 300 and 200 documents, respectively.
The vocabulary is LCSH in SKOS, pared down to prefLabels and altLabels, plus about 600 LCNAF headings similarly processed; the resulting Turtle file is only 120 MB.
ANNIF project configs:
[mllm-fulltext]
name=MLLM Fulltext project
language=en
backend=mllm
vocab=lcsubjects-lcnames-skosrdf
analyzer=snowball(english)
limit=1000

[omikuji-attention-fulltext]
name=Omikuji Attention Fulltext project
language=en
backend=omikuji
vocab=lcsubjects-lcnames-skosrdf
analyzer=snowball(english)
cluster_balanced=False
cluster_k=2
collapse_every_n_layers=5
min_df=2
limit=1000

[nn-ensemble-fulltext]
name=NN Ensemble Fulltext project
language=en
backend=nn_ensemble
sources=omikuji-attention-fulltext:1,mllm-fulltext:2
vocab=lcsubjects-lcnames-skosrdf
analyzer=snowball(english)
limit=100
nodes=100
dropout_rate=0.2
epochs=10
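For what it's worth, the two source projects do respond when queried directly (and the PAV and simple ensembles built on the same sources train fine). A quick sanity check like the following, with placeholder sample text, succeeds for each source:

```shell
# Confirm each source project loads and returns suggestions on its own;
# annif suggest reads the document text from stdin.
echo "Sample abstract text from one of our ETDs" | annif suggest mllm-fulltext
echo "Sample abstract text from one of our ETDs" | annif suggest omikuji-attention-fulltext
```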
The full output when I try to train the NN ensemble:
(ANNIF) LIB-20-0157:ANNIF hollowswx$ annif train nn-ensemble-fulltext /Users/hollowswx/GitHub/ANNIF/JMU-ETD/docs/train-fulltext
2022-10-28 08:51:21.182496: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Backend nn_ensemble: creating NN ensemble model
2022-10-28 08:51:26.794016: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Backend nn_ensemble: Initializing source projects: omikuji-attention-fulltext, mllm-fulltext
2022-10-28T12:51:28.577Z INFO [omikuji::model] Loading model from data/projects/omikuji-attention-fulltext/omikuji-model...
2022-10-28T12:51:28.577Z INFO [omikuji::model] Loading model settings from data/projects/omikuji-attention-fulltext/omikuji-model/settings.json...
2022-10-28T12:51:28.578Z INFO [omikuji::model] Loaded model settings Settings { n_features: 305616, classifier_loss_type: Hinge }...
2022-10-28T12:51:28.578Z INFO [omikuji::model] Loading tree from data/projects/omikuji-attention-fulltext/omikuji-model/tree0.cbor...
2022-10-28T12:51:28.600Z INFO [omikuji::model] Loading tree from data/projects/omikuji-attention-fulltext/omikuji-model/tree1.cbor...
2022-10-28T12:51:28.625Z INFO [omikuji::model] Loading tree from data/projects/omikuji-attention-fulltext/omikuji-model/tree2.cbor...
2022-10-28T12:51:28.647Z INFO [omikuji::model] Loaded model with 3 trees; it took 0.07s
Backend nn_ensemble: Processing training documents...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/annif/parallel.py", line 45, in suggest
project = self.registry.get_project(project_id)
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/annif/registry.py", line 68, in get_project
projects = self.get_projects(min_access)
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/annif/registry.py", line 62, in get_projects
for project_id, project in self._projects[self._rid].items()
KeyError: 140361445793648
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/bin/annif", line 8, in <module>
sys.exit(cli())
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/flask/cli.py", line 357, in decorator
return __ctx.invoke(f, *args, **kwargs)
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/annif/cli.py", line 333, in run_train
proj.train(documents, backend_params, jobs)
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/annif/project.py", line 214, in train
self.backend.train(corpus, beparams, jobs)
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/annif/backend/backend.py", line 64, in train
return self._train(corpus, params=beparams, jobs=jobs)
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/annif/backend/nn_ensemble.py", line 175, in _train
self._fit_model(
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/annif/backend/nn_ensemble.py", line 232, in _fit_model
self._corpus_to_vectors(corpus, seq, n_jobs)
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/site-packages/annif/backend/nn_ensemble.py", line 204, in _corpus_to_vectors
for hits, subject_set in pool.imap_unordered(
File "/usr/local/Caskroom/miniconda/base/envs/ANNIF/lib/python3.9/multiprocessing/pool.py", line 870, in next
raise value
KeyError: 140361445793648
[END]
Both TensorFlow and Keras have been tested independently in the same Python conda environment. The TensorFlow messages generated at the beginning of NN ensemble training appear to be only informational notices about available CPU instruction support.
Any help diagnosing this would be greatly appreciated.
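Since the traceback points into annif/parallel.py, one thing I plan to try next is forcing single-process training, which should bypass the multiprocessing pool entirely:

```shell
# Restrict training to a single job (no worker pool)
annif train --jobs 1 nn-ensemble-fulltext /Users/hollowswx/GitHub/ANNIF/JMU-ETD/docs/train-fulltext
```

If that works, it would point to the worker processes not seeing the parent's project registry rather than a problem with the NN ensemble itself.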