import os
import glob

from magenta.models.nsynth import utils
from magenta.models.nsynth.wavenet import fastgen


def wavenet_encode(file_path):
  # Load the model weights.
  checkpoint_path = './wavenet-ckpt/model.ckpt-200000'

  # Load and downsample the audio.
  neural_sample_rate = 16000
  audio = utils.load_audio(file_path,
                           sample_length=400000,
                           sr=neural_sample_rate)

  # Pass the audio through the first half of the autoencoder
  # to get a list of latent variables that describe the sound.
  # Note that it would be quicker to pass a batch of audio
  # to fastgen.
  encoding = fastgen.encode(audio, checkpoint_path, len(audio))

  return encoding
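A hypothetical usage sketch (the './samples' directory is made up, but the glob import above suggests batch processing along these lines):

# Encode every .wav file in a directory of samples.
encodings = {path: wavenet_encode(path)
             for path in glob.glob('./samples/*.wav')}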
Hi Jeffrey and/or James,

You can indeed use the NSynth embeddings for similarity measures, but your mileage may vary based on how well the model fits the data (it was trained on single-instrument notes). I've done some qualitative experiments myself and found it picked similar instruments, but I haven't done any quantitative comparisons against a baseline like CQT. Here's a post from someone who tried this with t-SNE on some sounds (a sketch of a simple similarity measure follows below): https://medium.com/@LeonFedden/comparative-audio-analysis-with-wavenet-mfccs-umap-t-sne-and-pca-cb8237bfce2f

Best,
Jesse
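As a minimal sketch of such a similarity measure (not from the thread; embedding_similarity is a hypothetical helper, and it assumes encodings of shape (1, time, 16) as returned by fastgen.encode in the NSynth examples), one could average each encoding over time and compare the resulting 16-dim vectors with cosine similarity:

import numpy as np

def embedding_similarity(enc_a, enc_b):
    """Cosine similarity between two time-averaged NSynth encodings."""
    # Collapse (1, time, 16) -> (16,) by averaging over time. This
    # discards temporal structure, which the DTW suggestion below
    # addresses.
    a = np.squeeze(enc_a).mean(axis=0)
    b = np.squeeze(enc_b).mean(axis=0)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))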
I think typically you would break it into smaller chunks, as no algorithm will be incorporating very long-term structure anyway. One other thing people do, rather than just comparing all the time bins directly, is to compare the best possible alignment of the time bins to each other with something called dynamic time warping (https://librosa.github.io/librosa/generated/librosa.core.dtw.html); see the sketch below.

Best of luck,
Jesse
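A rough sketch of the DTW comparison described above (dtw_distance is a hypothetical helper; it assumes (1, time, 16) encodings from wavenet_encode, and that your librosa version exposes dtw at librosa.sequence.dtw rather than the older librosa.core.dtw linked in the thread):

import numpy as np
import librosa

def dtw_distance(enc_a, enc_b):
    """Alignment cost between two NSynth encodings via dynamic time warping."""
    # librosa's dtw expects (n_features, n_frames), so drop the batch
    # axis and transpose.
    X = np.squeeze(enc_a).T
    Y = np.squeeze(enc_b).T
    # D is the cumulative cost matrix; wp is the optimal warping path.
    D, wp = librosa.sequence.dtw(X=X, Y=Y, metric='euclidean')
    # The bottom-right cell holds the total cost of the best alignment;
    # normalizing by path length keeps sounds of different durations comparable.
    return D[-1, -1] / len(wp)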