Trouble building the Maestro dataset


Cédric Colas

Dec 8, 2021, 1:33:25 PM
to Magenta Discuss
Hi,
I'm trying to build the Maestro dataset in the Tensor2Tensor format, as a preliminary step before building my own dataset. I'm using the t2t-datagen command.

I found fixes for various problems along the way (a rough sketch of these tweaks follows the list):
  • I had to register the problem (score2perf_maestro_language_uncropped_aug) in the t2t_datagen.py file, otherwise it wasn't recognized.
  • I had to augment the script provided here with a region parameter in the pipeline options.
  • I had to add: 'sound': ['libsndfile1-dev'], in the EXTRAS_REQUIRE of the setup (to fix a bug when librosa is imported).
  • I had to specify the same version for tensorflow and tensorflow-estimator in the setup.py file (==2.6.0) to work around a bug that led to this error: "AttributeError: module 'tensorflow.tools.docs.doc_controls' has no attribute 'inheritable_header'", which in turn led to another: "tensorflow.python.framework.errors_impl.AlreadyExistsError: Another metric with the same name already exists."
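
For concreteness, here is a rough sketch of what those tweaks look like on my side (the project id, bucket, and exact setup.py structure are placeholders and may differ in your checkout):

# 1. In t2t_datagen.py, import the module that defines the score2perf problems
#    so the registry can find score2perf_maestro_language_uncropped_aug
#    (module path as I recall it, double-check against your Magenta checkout):
from magenta.models.score2perf import score2perf  # noqa: F401

# 2. Dataflow pipeline options, with the missing region argument added
#    (project and bucket names are placeholders):
from apache_beam.options.pipeline_options import PipelineOptions
pipeline_options = PipelineOptions(
    runner='DataflowRunner',
    project='my-gcp-project',
    temp_location='gs://my-bucket/tmp',
    region='us-central1',
)

# 3. In setup.py: add the sound extra and pin tensorflow / tensorflow-estimator
#    to the same version:
EXTRAS_REQUIRE = {
    'sound': ['libsndfile1-dev'],
}
REQUIRED_PACKAGES = [
    'tensorflow==2.6.0',
    'tensorflow-estimator==2.6.0',
]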
I'm now struggling with another problem.
The code runs fine at first; it starts the workers and goes through the steps until it reaches:

JOB_MESSAGE_BASIC: Executing operation input_transform_train/ReadAllFromTFRecord/ReadAllFiles/Reshard/ReshufflePerKey/GroupByKey/Read+input_transform_train etc

At this point, I run into this error:

INFO:apache_beam.runners.dataflow.dataflow_runner:2021-12-08T18:12:25.346Z: JOB_MESSAGE_ERROR: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1233, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 571, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1369, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/filebasedsource.py", line 386, in process
    for record in source.read(range.new_tracker()):
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/tfrecordio.py", line 184, in read_records
    with self.open_file(file_name) as file_handle:
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/filebasedsource.py", line 173, in open_file
    return FileSystems.open(
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/filesystems.py", line 244, in open
    return filesystem.open(path, mime_type, compression_type)
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/gcsfilesystem.py", line 177, in open
    return self._path_open(path, 'rb', mime_type, compression_type)
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/gcsfilesystem.py", line 138, in _path_open
    raw_file = gcsio.GcsIO().open(path, mode, mime_type=mime_type)
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/gcsio.py", line 223, in open
    downloader = GcsDownloader(
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/gcsio.py", line 585, in __init__
    project_number = self._get_project_number(self._bucket)
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/gcsio.py", line 166, in get_project_number
    self.bucket_to_project_number[bucket] = bucket_metadata.projectNumber
AttributeError: 'NoneType' object has no attribute 'projectNumber'

It seems it cannot reach my bucket metadata somehow. My data_dir and temp_location folders do exist, and the run creates some files in the temp_location folder during the first steps. The bucket is attached to the project I define in the script.

Any idea of what might be going on, or how to debug this? The error is raised inside the apache_beam code, so I can't easily add prints or run a debugger there.
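
Since the failing call is essentially just fetching the bucket metadata, a minimal way to reproduce it outside Beam might be something like this (assuming google-cloud-storage is installed; the project id and bucket name are placeholders):

from google.cloud import storage

client = storage.Client(project='my-gcp-project')
# get_bucket raises NotFound/Forbidden if the bucket metadata can't be read,
# and project_number is exactly the field Beam's GcsDownloader is after:
bucket = client.get_bucket('my-bucket')
print(bucket.project_number)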

Would appreciate any hint!

Best,
Cédric



Ian Simon

Dec 8, 2021, 2:02:55 PM
to Cédric Colas, Magenta Discuss
Hi Cédric, I haven't encountered that problem but it looks related to this bug: https://issues.apache.org/jira/browse/BEAM-12879

-Ian


Cédric Colas

Dec 8, 2021, 5:17:34 PM
to Magenta Discuss, Ian Simon, Magenta Discuss, Cédric Colas
Hi,
Thanks for the pointer! I resolved the issue in the meantime. It seems I didn't have authorization to perform a get on the magentadata bucket, which is odd. It worked fine once I uploaded the tfrecords to my own bucket.
Cédric

Cédric Colas

Dec 9, 2021, 10:28:37 AM
to Magenta Discuss, Cédric Colas, Ian Simon, Magenta Discuss
I'm also running into trouble with the t2t-trainer script.
It runs fine until it reaches a checkpoint, at which point I get:

tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: Key transformer/parallel_0_3/transformer/transformer/body/decoder/layer_0/self_attention/multihead_attention/k/kernel not found in checkpoint
         [[node save/RestoreV2_1 (defined at /home/ccolas/anaconda3/envs/pianocktail/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py:1648) ]]
         [[save/RestoreV2_1/_55]]
  (1) Not found: Key transformer/parallel_0_3/transformer/transformer/body/decoder/layer_0/self_attention/multihead_attention/k/kernel not found in checkpoint
         [[node save/RestoreV2_1 (defined at /home/ccolas/anaconda3/envs/pianocktail/lib/python3.8/site-packages/tensorflow_estimator/python/estimator/estimator.py:1648) ]]

It seems the training wants to reload its saved checkpoints but doesn't recognize the variable names in them. This seems related to this and that.
I run into the same issue when I run the t2t-decoder on these same checkpoints.
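
To check whether the variables are actually stored under different scope names, one thing I might try is listing the checkpoint contents (the checkpoint path below is a placeholder):

import tensorflow as tf

# Print every variable name stored in the checkpoint so it can be compared
# against the missing 'transformer/parallel_0_3/...' key from the error:
for name, shape in tf.train.list_variables('/path/to/output_dir/model.ckpt-10000'):
    print(name, shape)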

Has anyone run into this problem before?

Thanks,
Cédric

Cédric Colas

Dec 9, 2021, 12:54:27 PM
to Magenta Discuss, Cédric Colas, Ian Simon, Magenta Discuss
I found equivalent issues: here and there.

Cédric Colas

Dec 9, 2021, 1:43:10 PM
to Magenta Discuss, Cédric Colas, Ian Simon, Magenta Discuss
As indicated in this issue, reverting to tensorflow-gpu==1.15.0 somehow solves the problem. Reverting to that point in time is a mess, so here are a few indications in case someone has to go that way:

* Magenta needs to be reverted to an older commit with no tf.compat imports; I used: git checkout -b old-state 375705ef30b74a981be0497d2d21e15c957ffe12
* CUDA and cuDNN versions compatible with TF 1.15: conda install cudatoolkit=10.0; conda install cudnn=7.3.1
* other compatible versions that raise errors if not pinned: tensor2tensor==1.14.0; tensorflow-datasets==3.2.1; tensorflow-probability==0.8.0.