Using tf.data.TFRecordDataset to parse the tfrecords for AudioSet to build an MLP model


jeffre...@flatironsdigital.com

Nov 9, 2017, 6:26:02 PM
to audioset-users
I am trying to use the AudioSet tfrecord data to train an MLP to use on my own data, from which I have used VGGish to extract features (that worked!). I am now trying to use the tf.data API to create a data pipeline from the tfrecords. Below is what I have. Can anyone tell me if:

a) my _parse_function is returning data which can be consumed properly by a model?
b) any other tips? Most of the docs out there are for reading/writing images to tfrecords.

import tensorflow as tf
import glob


def _parse_function(example_proto):
    contexts, features = tf.parse_single_sequence_example(
        example_proto,
        context_features={"video_id": tf.FixedLenFeature([], tf.string),
                          "labels": tf.VarLenFeature(tf.int64)},
        sequence_features={'audio_embedding': tf.FixedLenSequenceFeature([10], dtype=tf.string)})

    decoded_features = tf.reshape(
        tf.cast(tf.decode_raw(features['audio_embedding'], tf.uint8), tf.float32), [-1, 128])

    labels = tf.cast(
        tf.sparse_to_dense(contexts["labels"].values, (527,), 1,
                           validate_indices=False),
        tf.bool)

    return decoded_features, labels  # and the labels?


# Get a list of files
filenames = glob.glob('/Users/jeff/documents/jeffcode/pond5/music_genre/features/audioset_v1_embeddings/bal_train/*.tfrecord')
dataset = tf.data.TFRecordDataset(filenames)
dataset.map(_parse_function)
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(64)
iterator = dataset.make_one_shot_iterator()
#iterator = dataset.make_initializable_iterator()
#next_element = iterator.get_next()
#train_op = model_and_optimizer(images, labels)

sess = tf.Session()
for _ in range(2):
    while True:
        try:
            # How can x fit into a model now? Is it returning the features and labels?
            x = sess.run(iterator.get_next())
            print(x)  # This prints out the byte code at least :)
        except tf.errors.OutOfRangeError:
            break

Eric Robertson

Jul 29, 2019, 2:50:02 PM
to audioset-users
To complete this thread,

def prepare_serialized_examples(serialized_example, max_quantized_value=2, min_quantized_value=-2):
    contexts, features = tf.parse_single_sequence_example(
        serialized_example,
        context_features={"video_id": tf.FixedLenFeature([], tf.string),
                          "labels": tf.VarLenFeature(tf.int64)},
        sequence_features={'audio_embedding': tf.FixedLenSequenceFeature([], dtype=tf.string)})

    decoded_features = tf.reshape(
        tf.cast(tf.decode_raw(features['audio_embedding'], tf.uint8), tf.float32),
        [-1, 128])
    decoded_labels = tf.cast(contexts["labels"].values, tf.int64)

    return decoded_features, decoded_labels


dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(prepare_serialized_examples)
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(1)
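
Note that prepare_serialized_examples accepts max_quantized_value and min_quantized_value but never uses them: the stored embeddings are 8-bit quantized, and decode_raw only recovers the raw 0-255 byte values. To get floats back in the original [-2, 2] range, something along these lines (a sketch, modeled on the Dequantize helper in the YouTube-8M starter code) can be applied to decoded_features before returning them:

def dequantize(feat_vector, max_quantized_value=2, min_quantized_value=-2):
    # Map the uint8 range [0, 255] back onto [min_quantized_value, max_quantized_value].
    quantized_range = max_quantized_value - min_quantized_value
    scalar = quantized_range / 255.0
    bias = (quantized_range / 512.0) + min_quantized_value
    return feat_vector * scalar + bias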

I had success making the iterator initializable: 

iterator = dataset.make_initializable_iterator()
feature_batch, label_batch = iterator.get_next()
with tf.Session() as sess:
    sess.run(iterator.initializer)
    try:
        while True:
            embeddings, labels = sess.run([feature_batch, label_batch])
            ...
    except tf.errors.OutOfRangeError:
        pass
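
One caveat: decoded_labels has a different length for every clip (and the number of embedding frames can vary too), which is why batch(1) works above but a larger batch size will not. A sketch of one way around that, going back to the 527-class multi-hot idea from the first post; it applies to the dataset right after the map(prepare_serialized_examples) call, and the MLP layer sizes here are purely illustrative:

NUM_CLASSES = 527  # size of the AudioSet ontology

def to_fixed(features, labels):
    # Average the per-second embeddings into a single 128-dim clip vector
    # and expand the label indices into a 527-dim multi-hot vector, so
    # every example has a fixed shape and can be batched.
    clip = tf.reduce_mean(features, axis=0)
    multihot = tf.sparse_to_dense(labels, (NUM_CLASSES,), 1.0,
                                  validate_indices=False)
    return clip, multihot

dataset = dataset.map(to_fixed).shuffle(10000).batch(64)
feature_batch, label_batch = dataset.make_one_shot_iterator().get_next()

# Illustrative multi-label MLP head on top of the clip vectors.
hidden = tf.layers.dense(feature_batch, 256, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, NUM_CLASSES)
loss = tf.losses.sigmoid_cross_entropy(multi_class_labels=label_batch, logits=logits)
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)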

Jose Giraldo

Jul 30, 2019, 5:57:04 PM
to Eric Robertson, audioset-users
Hi Eric,

I was trying a very similar approach to yours, based on this TF 2.0 tutorial: https://www.tensorflow.org/tutorials/load_data/tf_records
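
For reference, here is roughly what the same parsing looks like with the tf.io APIs from that tutorial (a TF 2.x sketch, reusing the filenames list from the first post; eager iteration replaces the Session and iterator boilerplate):

import tensorflow as tf

def parse_example(serialized):
    # Same schema as above, via the TF 2.x tf.io namespace.
    contexts, features = tf.io.parse_single_sequence_example(
        serialized,
        context_features={"video_id": tf.io.FixedLenFeature([], tf.string),
                          "labels": tf.io.VarLenFeature(tf.int64)},
        sequence_features={"audio_embedding": tf.io.FixedLenSequenceFeature([], dtype=tf.string)})
    embeddings = tf.reshape(
        tf.cast(tf.io.decode_raw(features["audio_embedding"], tf.uint8), tf.float32),
        [-1, 128])
    return embeddings, contexts["labels"].values

dataset = tf.data.TFRecordDataset(filenames).map(parse_example)
for embeddings, labels in dataset.take(1):
    print(embeddings.shape, labels.numpy())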

best






Sabid Bin Habib

Mar 17, 2022, 2:00:47 PM
to audioset-users
Hi Jeffre, Eric, and Jose,

Thanks for the useful information about extracting audio embeddings from the tfrecords of the Google AudioSet. I have extracted the audio embeddings. Now I want to use these audio embeddings to train my own model (a CNN). I have some confusion about them:
  1. Should I extract STFT and MFCC features from the audio embeddings? If so, how can I do that (is there a way to use librosa)? Or are the audio embeddings already transformed, like MFCCs?
  2. How should I split the AudioSet corpus into train, test, and validation datasets? The files are in TFRecord format, and each tfrecord file contains various audio clip segments with different class labels. Should I select some of the tfrecords (e.g. 70%) and treat them as the training set?
  3. If I want to work with selected class labels only (such as rowing or car sounds), what is the best way to extract just those audio segments? (One possible approach is sketched below.)

Also, please share some helpful resources on working with the Google AudioSet corpus if possible.
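
For question 3, a sketch of one way to do this with dataset.filter, applied to a parsed (unbatched) dataset like the ones above; the class indices below are hypothetical placeholders, to be looked up in class_labels_indices.csv:

# Hypothetical target classes; look up the real indices for e.g. rowing
# or car sounds in class_labels_indices.csv.
WANTED = tf.constant([288, 300], dtype=tf.int64)

def has_wanted_label(embeddings, labels):
    # Keep a clip if any of its label indices is in the wanted set.
    hits = tf.equal(tf.expand_dims(labels, 1), tf.expand_dims(WANTED, 0))
    return tf.reduce_any(hits)

filtered = dataset.filter(has_wanted_label)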

