Mel-Spectrogram to Spectrogram

Meier Benjamin

Mar 15, 2017, 6:08:19 AM

to librosa

Hello

I am trying to use a neural network for some audio processing. Currently I use a spectrogram as input, and the network also produces a spectrogram.

I use the following code to recreate audio from the spectrogram:

https://github.com/vadim-v-lebedev/audio_style_tranfer/blob/master/audio_style_transfer.ipynb

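import numpy as np
import librosa
import soundfile as sf  # librosa.output.write_wav was removed in librosa 0.8

# a_content, t, N_CHANNELS, N_FFT and fs are defined earlier in the linked notebook
a = np.zeros_like(a_content[0])
a[:N_CHANNELS, :] = np.exp(t[0]) - 1  # exp(t) - 1 undoes a log1p magnitude compression

# This code is supposed to do phase reconstruction: start from random phase,
# then alternate inverse STFT / STFT while keeping the target magnitude `a`
p = 2 * np.pi * np.random.random_sample(a.shape) - np.pi
for i in range(500):
    S = a * np.exp(1j * p)
    x = librosa.istft(S)
    p = np.angle(librosa.stft(x, n_fft=N_FFT))

OUTPUT_FILENAME = 'outputs/out.wav'
sf.write(OUTPUT_FILENAME, x, fs)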

It works fine. I found that the neural network works much better if I use the mel spectrogram instead of the spectrogram. Unfortunately, I don't know how to convert the mel spectrogram to audio, or how to convert it back to a spectrogram (so that I can then use the code above).

I checked the librosa code and saw that the mel spectrogram is just computed by a (non-square) matrix multiplication, which probably cannot be inverted.

But is there a trick to reconstruct / approximate the initial spectrogram from the mel spectrogram?

Thank you very much

Dan Ellis

Mar 15, 2017, 7:38:59 AM

to Meier Benjamin, librosa

Try multiplying by the transpose of the fft-to-Mel matrix. There may be some overall scaling trend, but it's about right.

DAn.

--
You received this message because you are subscribed to the Google Groups "librosa" group.
To unsubscribe from this group and stop receiving emails from it, send an email to librosa+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/librosa/2a4572fe-1f7f-4bbc-be9d-dda43c848141%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Meier Benjamin

Mar 15, 2017, 10:13:06 AM

to librosa, benjamin...@gmail.com, dp...@ee.columbia.edu

Thanks, this seems to work very well:)!

Rafael Valle

Apr 30, 2017, 9:48:39 PM

to librosa, benjamin...@gmail.com, dp...@ee.columbia.edu

DAn,

What would you suggest for phase reconstruction, e.g. when the mel spectrograms were generated by a GAN?

Dan Ellis

Apr 30, 2017, 10:06:51 PM

to Rafael Valle, librosa, benjamin...@gmail.com

You can try Griffin-Lim (iterative inverse STFT, STFT, then imposing target magnitude). It converges very slowly.

Probably some more modern scheme for predicting phase (phase-advance, I guess) along with the spectrum is more appropriate.

The fact is that the Mel spectrum is a heavily underspecified representation, particularly at high frequencies. Perhaps you also need something better than a pseudoinverse to predict FFT bins from the Mel magnitudes.

The point about the phase is that it will have an arbitrary rotation if you just look at the current frame's magnitude. But if you also look at the per-bin phases from the previous frames, or equivalently attempt to predict only the phase difference for each bin relative to the preceding frame, it should be much better behaved statistically.

DAn.

Chris Donahue

Mar 28, 2019, 4:09:01 PM

to librosa

Resurrecting this helpful thread for a brief remark...

I have found that taking the pseudoinverse of the mel-scale triangular filter matrix (i.e. with np.linalg.pinv) produces much better results than the transpose of the filter matrix.

Also, the Local Weighted Sums (Le Roux et al. 2010) method produces better qualitative results (for speech) than Griffin-Lim, and is much faster than running Griffin-Lim with many iterations.

Brian McFee

Mar 28, 2019, 4:15:19 PM

to librosa

I'll double up on this here:

What works even better than pinv(mel_basis) is to run a non-negative least squares solver to find the optimal non-negative spectral magnitudes. There's currently an open PR waiting for review that implements this, as well as MFCC inversion, a simple Griffin-Lim implementation, and a numba-accelerated NNLS solver: https://github.com/librosa/librosa/pull/844 (if anyone wants to review this PR, I'll buy them a coffee).

Since LWS is already implemented in a separate package, I didn't think it was appropriate to reimplement it in librosa, but it is handy to have a reference implementation of Griffin-Lim. (Of course, you can always use our mel-to-linear inverter followed by LWS for phase retrieval!)

Chris Donahue

Mar 28, 2019, 4:24:30 PM

to librosa

Thanks Brian! That is an amazing pull request. I can work towards a code review if that's appropriate but I won't be able to start on it until after the ISMIR deadline.

I think it would be nice to have an implementation of LWS in librosa. Plugging the output of librosa STFTs into LWS is not super convenient because it requires some fragile handling of the STFT window functions (the defaults are different between the two packages). I can take a stab at this at some point if you're okay with it conceptually.