Trouble building the Maestro dataset


Cédric Colas

Dec 8, 2021, 1:33:25 PM
to Magenta Discuss
Hi,
I'm trying to build the Maestro dataset in the Tensor2Tensor format, as a preliminary step before building my own dataset. I'm using the t2t-datagen command.

I found fixes for various problems along the way (a rough sketch of these tweaks follows the list):
  • I had to register the problem (score2perf_maestro_language_uncropped_aug) in the t2t_datagen.py file, otherwise it wasn't recognized.
  • I had to augment the script provided here with a region parameter in the pipeline options.
  • I had to add: 'sound': ['libsndfile1-dev'], in the EXTRAS_REQUIRE of the setup (to fix a bug when librosa is imported).
  • I had to specify the same version for tensorflow and tensorflow-estimator in the setup.py file (==2.6.0) to work around a bug that led to this error: "AttributeError: module 'tensorflow.tools.docs.doc_controls' has no attribute 'inheritable_header'", which in turn led to another: "tensorflow.python.framework.errors_impl.AlreadyExistsError: Another metric with the same name already exists."
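
For concreteness, here is a rough sketch of what those tweaks look like on my side (the project id, bucket, and exact setup.py structure are placeholders and may differ in your checkout):

# 1. In t2t_datagen.py, import the module that defines the score2perf problems
#    so the registry can find score2perf_maestro_language_uncropped_aug
#    (module path as I recall it, double-check against your Magenta checkout):
from magenta.models.score2perf import score2perf  # noqa: F401

# 2. Dataflow pipeline options, with the missing region argument added
#    (project and bucket names are placeholders):
from apache_beam.options.pipeline_options import PipelineOptions
pipeline_options = PipelineOptions(
    runner='DataflowRunner',
    project='my-gcp-project',
    temp_location='gs://my-bucket/tmp',
    region='us-central1',
)

# 3. In setup.py: add the sound extra and pin tensorflow / tensorflow-estimator
#    to the same version:
EXTRAS_REQUIRE = {
    'sound': ['libsndfile1-dev'],
}
REQUIRED_PACKAGES = [
    'tensorflow==2.6.0',
    'tensorflow-estimator==2.6.0',
]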
I'm now struggling with another problem.
The code runs fine at first; it starts the workers and goes through the steps until it reaches:

JOB_MESSAGE_BASIC: Executing operation input_transform_train/ReadAllFromTFRecord/ReadAllFiles/Reshard/ReshufflePerKey/GroupByKey/Read+input_transform_train etc

At this point, I run into this error:

INFO:apache_beam.runners.dataflow.dataflow_runner:2021-12-08T18:12:25.346Z: JOB_MESSAGE_ERROR: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1233, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 571, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1369, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/filebasedsource.py", line 386, in process
    for record in source.read(range.new_tracker()):
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/tfrecordio.py", line 184, in read_records
    with self.open_file(file_name) as file_handle:
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/filebasedsource.py", line 173, in open_file
    return FileSystems.open(
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/filesystems.py", line 244, in open
    return filesystem.open(path, mime_type, compression_type)
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/gcsfilesystem.py", line 177, in open
    return self._path_open(path, 'rb', mime_type, compression_type)
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/gcsfilesystem.py", line 138, in _path_open
    raw_file = gcsio.GcsIO().open(path, mode, mime_type=mime_type)
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/gcsio.py", line 223, in open
    downloader = GcsDownloader(
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/gcsio.py", line 585, in __init__
    project_number = self._get_project_number(self._bucket)
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/gcsio.py", line 166, in get_project_number
    self.bucket_to_project_number[bucket] = bucket_metadata.projectNumber
AttributeError: 'NoneType' object has no attribute 'projectNumber'

It seems it cannot reach my bucket metadata somehow. My data_dir and temp_location folders do exist, and the run creates some files in the temp_location folder during the first steps. The bucket is attached to the project I define in the script.

Any idea of what might be going on, or how to debug this? The error is raised inside the apache_beam code, so I can't easily add prints or run a debugger there.
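
Since the failing call is essentially just fetching the bucket metadata, a minimal way to reproduce it outside Beam might be something like this (assuming google-cloud-storage is installed; the project id and bucket name are placeholders):

from google.cloud import storage

client = storage.Client(project='my-gcp-project')
# get_bucket raises NotFound/Forbidden if the bucket metadata can't be read,
# and project_number is exactly the field Beam's GcsDownloader is after:
bucket = client.get_bucket('my-bucket')
print(bucket.project_number)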

Would appreciate any hint!

Best,
Cédric



Ian Simon

Dec 8, 2021, 2:02:55 PM
to Cédric Colas, Magenta Discuss
Hi Cédric, I haven't encountered that problem but it looks related to this bug: https://issues.apache.org/jira/browse/BEAM-12879

-Ian


Cédric Colas

Dec 8, 2021, 5:17:34 PM
to Magenta Discuss, Ian Simon, Magenta Discuss, Cédric Colas
Hi,
Thanks for the pointer! I resolved the issue in the meantime. It seems I didn't have authorization to perform a get on the magentadata bucket, which is odd. It worked fine once I uploaded the tfrecords to my own bucket.
Cédric

Cédric Colas

Dec 9, 2021, 10:28:37 AM
to Magenta Discuss, Cédric Colas, Ian Simon, Magenta Discuss
I'm also running into trouble with the t2t-trainer script.
It runs fine until it reaches a checkpoint, at which point I get:

tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: Key transformer/parallel_0_3/transformer/transformer/body/decoder/layer_0/self_attention/multihead_attention/k/kernel not found in checkpoint
         [[node save/RestoreV2_1 (defined at /home/ccolas/anaconda3/envs/pianocktail/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py:1648) ]]
         [[save/RestoreV2_1/_55]]
  (1) Not found: Key transformer/parallel_0_3/transformer/transformer/body/decoder/layer_0/self_attention/multihead_attention/k/kernel not found in checkpoint
         [[node save/RestoreV2_1 (defined at /home/ccolas/anaconda3/envs/pianocktail/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py:1648) ]]

It seems the training wants to reload its saved checkpoints but doesn't recognize the variable names in them. This seems related to this and that.
I run into the same issue when I run the t2t-decoder on these same checkpoints.
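
To check whether the variables are actually stored under different scope names, one thing I might try is listing the checkpoint contents (the checkpoint path below is a placeholder):

import tensorflow as tf

# Print every variable name stored in the checkpoint so it can be compared
# against the missing 'transformer/parallel_0_3/...' key from the error:
for name, shape in tf.train.list_variables('/path/to/output_dir/model.ckpt-10000'):
    print(name, shape)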

Has anyone run into this problem before?

Thanks,
Cédric

Cédric Colas

Dec 9, 2021, 12:54:27 PM
to Magenta Discuss, Cédric Colas, Ian Simon, Magenta Discuss
I found equivalent issues: here and there.

Cédric Colas

Dec 9, 2021, 1:43:10 PM
to Magenta Discuss, Cédric Colas, Ian Simon, Magenta Discuss
As indicated in this issue, reverting to tensorflow-gpu==1.15.0 somehow solves the problem. Reverting to that point in time is a mess, so here are a few indications in case someone has to go that way:

* Magenta needs to be reverted to an older commit with no tf.compat imports; I used: git checkout -b old-state 375705ef30b74a981be0497d2d21e15c957ffe12
* CUDA and cuDNN versions compatible with TF 1.15: conda install cudatoolkit=10.0; conda install cudnn=7.3.1
* other compatible versions that raise errors if not pinned: tensor2tensor==1.14.0; tensorflow-datasets==3.2.1; tensorflow-probability==0.8.0.