Using wavelet theory for audio analysis as a replacement for DFT


blaine

Nov 2, 2011, 6:33:01 PM11/2/11
to PyWavelets
Hi guys, I'm an audio analysis amateur and I was just hoping to get
your opinions. For my thesis project I'm evolving a synthesizer: I use
DFT comparisons between a target sound and a generated sound as the
basis for my fitness function. The DFT makes sense because I can
converge on the sets of harmonics and frequencies present in the
target sound. I use a sliding-window technique to capture the change
in harmonics over time, of course. I already have this in place, and
my fitness function uses the sum of squared errors (SSE) on the DFT
coefficients of the two sounds.

I'd like to consider a simple use of wavelet transforms as an
additional fitness function (or a replacement). PyWavelets looks
really nice: clean, well documented (thanks!), with great demos, and
overall slick. The problem I'm having is that the output of the
wavelet transform isn't as intuitive as the DFT's. What would be an
easy way to tackle audio comparison between two files? Would you
simply take all of the coefficients dumped out by the wavelet
transform and do an SSE computation, like in my DFT example? I imagine
that as long as I use all the coefficients that would normally be used
to reconstruct the signal, I'd be working with data that encapsulates
the sound file's various dimensional properties.
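
To give you an idea of what I mean, my current DFT fitness looks
roughly like this (a simplified sketch, not my exact code; the frame
length and hop size here are just example values):

```python
import numpy as np

def dft_sse_fitness(target, candidate, frame_len=1024, hop=512):
    """SSE between windowed DFT magnitudes of two equal-length signals.

    Illustrative sketch: slide a Hann window over both signals,
    take the real FFT of each frame, and accumulate the squared
    magnitude differences.
    """
    win = np.hanning(frame_len)
    total = 0.0
    for start in range(0, len(target) - frame_len + 1, hop):
        t = np.fft.rfft(target[start:start + frame_len] * win)
        c = np.fft.rfft(candidate[start:start + frame_len] * win)
        total += float(np.sum((np.abs(t) - np.abs(c)) ** 2))
    return total
```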

I'm sorry for being vague, I'm really just looking for ideas. No point
in reinventing the wheel so to speak.

Thanks for creating such a great project.
Sincerely,
Blaine

Nathaniel Smith

Nov 2, 2011, 6:51:44 PM11/2/11
to pywav...@googlegroups.com
Yes, just computing SSE on the wavelet coefficients would be similar
to doing so on the DFT output.

However, I feel like I should point out... if you're using an
orthogonal transform -- which includes the DFT, and many wavelet
transforms -- then a simple SSE between your transformed signals will
be identical to what you would have gotten if you'd skipped
transforming the data entirely, and just computed SSE between your
raw, original signals. Orthogonal transforms preserve dot products,
and SSE can be computed as a dot product.

The advantage of going into DFT or wavelet space would be that you
could, I dunno, discard phase information, or weight some frequency
bands higher than others, or something like that. If you aren't doing
anything like that, then I don't think it's accomplishing anything for
you.
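
To make that concrete, here's a quick NumPy check (the weights at the
end are an arbitrary example, not a recommendation):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(512)
b = rng.standard_normal(512)

# SSE in the time domain...
sse_time = np.sum((a - b) ** 2)

# ...equals SSE on orthonormal DFT coefficients (Parseval's theorem).
Fa = np.fft.fft(a, norm="ortho")
Fb = np.fft.fft(b, norm="ortho")
sse_dft = np.sum(np.abs(Fa - Fb) ** 2)
assert np.allclose(sse_time, sse_dft)

# The transform only buys you something if you treat coefficients
# unequally -- e.g. down-weighting higher bins (arbitrary weights):
w = 1.0 / (1.0 + np.arange(len(Fa)))
weighted_sse = np.sum(w * np.abs(Fa - Fb) ** 2)
```

The same equality holds for an orthogonal wavelet transform, which is
exactly why a plain SSE on the coefficients gains you nothing.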

-- Nathaniel


Blaine

Nov 2, 2011, 9:37:43 PM11/2/11
to pywav...@googlegroups.com
Thanks Nathaniel, that's a great point. I don't think it'll be an issue for me, though. I'm not trying to match exactly, but rather to converge on possible harmonic matches and things like that. Also, by putting the fitness function in the frequency domain (with the DFT, for example), slight modifications to the generated frequency would cause large changes in the waveform but only minimal changes in the DFT coefficients, which should help the organisms adapt more easily.

I'm also considering doing what you're suggesting: weighting certain bands higher than others. But I haven't decided on that yet.

Great point though; it'll give me something to think about for a while. Thanks again!
Blaine

Blaine

Nov 3, 2011, 10:03:30 PM11/3/11
to pywav...@googlegroups.com
Nathan,
  Do you know of an easy way to get frequency information out of the resulting transform? I'm racking my brain, but all I can dig up are academic examples. PyWavelets' scalogram demo is, I think, exactly what I want, but I have a problem:
 - I can't figure out how to label the frequency (y) axis. There are obvious bands, but the labels as they exist are strange: daaa, daad, dadd, etc. Do you (or anyone) have a simple way of converting them to frequencies? The ASCII labels have no real value to me! I've read in some papers that there is no native frequency information, because the "scale" relates to an inverse of frequency, but I can't seem to make the next step. It's either really obvious or really obscure.

Thank you!
Blaine

Here's the scalogram code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import matplotlib.cm as cm
import pywt
import pylab

x = pylab.arange(0, 1, 1./512)
data = pylab.sin(250 * pylab.pi * x ** 2)  # linear chirp, 0-125 Hz over 1 s
wavelet = 'db2'
level = 4
order = "freq"  # other option: "natural"
interpolation='nearest'
cmap = cm.cool

wp = pywt.WaveletPacket(data, wavelet, 'sym', maxlevel=level)
nodes = wp.get_level(level, order=order)
labels = [n.path for n in nodes]
values = pylab.array([n.data for n in nodes], 'd')
values = abs(values)

f = pylab.figure()
f.subplots_adjust(hspace=0.2, bottom=.03, left=.07, right=.97, top=.92)
pylab.subplot(2, 1, 1)
pylab.title("linchirp signal")
pylab.plot(x, data, 'b')
pylab.xlim(0, x[-1])

ax = pylab.subplot(2,1,2)
pylab.title("Wavelet packet coefficients at level %d" % level)
pylab.imshow(values, interpolation=interpolation, cmap = cmap, aspect="auto", origin="lower", extent=[0,1,0,len(values)])
pylab.yticks(pylab.arange(0.5, len(labels)+0.5), labels)
#pylab.setp(ax.get_xticklabels(), visible=False)

#pylab.figure(2)
#pylab.specgram(data, NFFT=64, noverlap=32, cmap=cmap)
#pylab.imshow(values, origin='upper', extent=[-1,1,-1,1], interpolation='nearest')

pylab.show()


Blaine



Blaine

Nov 3, 2011, 11:03:54 PM11/3/11
to pywav...@googlegroups.com
Actually, what I mean is an implementation like this: http://homepages.ius.edu/kforinas/k/pdf/timefreq.pdf. That'd be amazing.

Blaine

Nathaniel Smith

Nov 4, 2011, 1:07:57 AM11/4/11
to pywav...@googlegroups.com

We tend to think of "frequency" as a fundamental quantity, but it
isn't, really -- "frequency" just means "the thing you get out of a
Fourier transform". If you're not doing a Fourier transform, then
technically, you don't have frequencies, you have something else (that
may have similar properties in some ways, e.g., capturing the
fast-changing or slow-changing parts of your signal). This might sound
pedantic, but it really isn't -- there's no natural way to map between
wavelet coefficients and frequencies, because they're simply different
things.

There are some hacky ways to come up with fake frequency measurements
for particular wavelet transforms (based on taking the wavelet
transform of different sine waves and stuff), but I'm not an expert on
them and am not sure how useful they really are in practice. Unless
someone else speaks up you'll have to check the literature, sorry.
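
For what it's worth, here's one of those hacky approximations sketched
out for wavelet *packets*: at level L, with the nodes in "freq" order,
node k nominally covers the band [k, k+1] x fs / 2^(L+1). The sampling
rate and signal below are just example values, and the band edges are
only nominal, since the wavelet filters overlap heavily:

```python
import numpy as np
import pywt

fs = 44100.0
level = 4
t = np.arange(4096) / fs
data = np.sin(2 * np.pi * 440 * t)  # 440 Hz test tone

wp = pywt.WaveletPacket(data, "db2", mode="symmetric", maxlevel=level)
nodes = wp.get_level(level, order="freq")  # 2**level nodes, low to high band
bandwidth = (fs / 2) / len(nodes)          # nominal width of each band

for k, node in enumerate(nodes):
    energy = np.sum(np.asarray(node.data) ** 2)
    print(f"{node.path}: ~{k * bandwidth:.0f}-{(k + 1) * bandwidth:.0f} Hz, "
          f"energy {energy:.1f}")
```

For a 440 Hz tone, most of the energy should land in the lowest node,
whose nominal band contains 440 Hz, but you'll see it smear into the
neighbors, which is exactly the leakage problem I mentioned.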

-- Nathaniel

Blaine

Nov 4, 2011, 10:22:25 AM11/4/11
to pywav...@googlegroups.com
Thanks Nathan, you're exactly right, and you've confirmed what my gut has been telling me. The Fourier transform makes so much intuitive sense that I think I'm getting caught up in the complexity of the wavelet transform (which is also its strength).

I guess the main problem I'm having is that I just don't understand what the wavelet coefficients really mean. After more reading I have a better picture: the coefficients come from the algorithm's tree branchings, where each level splits off a band of detail coefficients, and the deeper the level, the lower (and narrower) the frequency band those coefficients represent.

My understanding comes down to the following question: what if I just want to remove some "feature" from a signal? (At this point I'm not even sure what "features" there are, really.) Say, a bass note, which the literature claims is "easily" possible with the wavelet transform. How do I know where that bass note lives in the coefficients? I understand there is no direct analogy, as you said, Nathan, so I'm not sure I'm even asking the right question. Ultimately, I guess what I'm saying is that I don't intuitively understand what modifying the coefficients does to the original signal.
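
The closest I've gotten is something like the following sketch (my own guess at what the literature means, with made-up example values): in a multilevel DWT, the final approximation coefficients hold the lowest band, roughly 0 to fs/2^(level+1) Hz, so zeroing them should attenuate the bass. Since the bands overlap, I assume the removal is only approximate:

```python
import numpy as np
import pywt

fs = 8000
t = np.arange(2048) / fs
# A 100 Hz "bass note" plus a 1500 Hz tone.
sig = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)

# 5-level DWT: coeffs = [cA5, cD5, cD4, cD3, cD2, cD1].
# cA5 nominally covers 0 to fs / 2**6 = 125 Hz -- where the bass lives.
coeffs = pywt.wavedec(sig, "db4", mode="periodization", level=5)
coeffs[0] = np.zeros_like(coeffs[0])  # zero the lowest band
filtered = pywt.waverec(coeffs, "db4", mode="periodization")
```

Is that the right intuition, or am I off base?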

I really appreciate you taking the time to respond to me, thanks again. I'm sure it can be annoying as I'm really early in my understanding of the wavelet transform, but unlike the DFT, the field itself seems to be dominated mostly by math and there are few practical engineering introductions. This one has been immensely helpful: http://users.rowan.edu/~polikar/WAVELETS/WTpart4.html

Thanks again,
Blaine

