How to reverse spectrogram png into audio


John Ho

May 29, 2023, 7:07:15 AM
to librosa
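import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display

def audio_to_spectrogram(audio, img):
  # 2.56 x 2.56 inches at matplotlib's default 100 dpi gives a 256x256 image.
  plt.rcParams["figure.figsize"] = [2.56, 2.56]
  plt.rcParams["figure.autolayout"] = True

  fig, ax = plt.subplots()

  # Load the audio (resampled to librosa's default rate of 22050 Hz).
  y, sr = librosa.load(audio)
  # Mel power spectrogram with the default parameters.
  S = librosa.feature.melspectrogram(y=y, sr=sr)
  print(sr)
  # Convert power to dB relative to the loudest bin.
  S_dB = librosa.power_to_db(S, ref=np.max)
  librosa.display.specshow(S_dB, sr=sr, fmax=8000, ax=ax)

  plt.savefig(img)
  plt.close(fig)  # release the figure so repeated calls do not accumulate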

Using this code, I have converted my wav file into a 256x256 image to be used in a neural network. I was wondering how I could reverse the image back into audio.

Graham Coleman

May 29, 2023, 7:17:58 AM
to John Ho, librosa
Hi John,

As long as your image matches the dimensions of your mel spectrogram (and all of the parameters match), you can use the following function to invert it (using Griffin-Lim) back to audio:

Graham


John Ho

May 29, 2023, 8:42:10 AM
to librosa
Hi,

These are some example sizes of my mel spectrograms:

(128, 59)

(128, 60)

(128, 27)

(128, 438)

(128, 652)

However, when I plot it, the image has shape (256, 256, 3).

How would I convert this (256, 256, 3) array back into the dimensions of the spectrogram?
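A plotted figure applies a colormap and rescales the (128, n_frames) array to fit the canvas, so the (256, 256, 3) image does not map cleanly back to the original values. One possible workaround, sketched here under the assumption that the image is written directly from the array instead of through matplotlib, is to store the dB values as an 8-bit grayscale image with a known range. The helper names db_to_image and image_to_db are hypothetical, and the 80 dB range assumes the default top_db of librosa.power_to_db together with ref=np.max.

```python
import numpy as np

TOP_DB = 80.0  # assumed range: power_to_db default top_db, peak at 0 dB

def db_to_image(S_dB, top_db=TOP_DB):
    """Map a dB spectrogram in [-top_db, 0] to a uint8 grayscale image."""
    scaled = (np.clip(S_dB, -top_db, 0.0) + top_db) / top_db
    img = np.round(scaled * 255.0).astype(np.uint8)
    return img[::-1]  # flip rows so low frequencies sit at the bottom

def image_to_db(img, top_db=TOP_DB):
    """Invert db_to_image; exact up to 8-bit quantization error."""
    # Undo the row flip, then rescale [0, 255] back to [-top_db, 0].
    return img[::-1].astype(np.float32) / 255.0 * top_db - top_db
```

The image keeps the shape of the spectrogram, for example (128, 59), so a fixed network input size would have to come from padding or cropping the time axis, which can be undone, and not from resizing the plot.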


