Using FFT, FourierTransform, on a wav file

621 views
Skip to first unread message

Pat

unread,
Nov 19, 2013, 5:40:49 PM11/19/13
to accor...@googlegroups.com
I've looked at the Audio/Fourier sample, but it only shows how to work with a live feed (i.e. a mic input). I'm having a heck of a time translating the code to work with a wave file.

You can download my code (I'm using LINQPad, which is free and amazing) at http://share.linqpad.net/k8ivu2.linq. The output looks like the attached html page.

I'm working with a simple 2kHz, 100ms sine wave encoded in 16-bit PCM (which I generated with Audacity).

Here's a snippet of what code I have so far (.Dump is a LINQPad extension method for producing output):

var reader = new Accord.Audio.Formats.WaveDecoder(@"W:\sine-2khz-100ms-44100-16pcm.wav");

Signal decoded = reader.Decode().Dump("decode");
// ComplexSignal complexDecoded = decoded.ToComplex().Dump("decoded complex");
//InvalidSignalPropertiesException
//Signals length should be a power of 2. 

int nbFft = NextClosestPowerOfTwoValue(decoded.RawData.Length);//.Dump();
var fftBuffer = new byte[nbFft]; // zero-fills
decoded.RawData.CopyTo(fftBuffer, 0);
// Need to use ctor, since extension method complains about size
var decComplex = new Accord.Audio.ComplexSignal(fftBuffer, decoded.Channels, decoded.Length, decoded.SampleRate);
decComplex.Dump("decoded complex from ctor");
decComplex.GetEnergy().Dump("dec complex energy");
var window = new Accord.Audio.Windows.BlackmanWindow(2048); // arg taken from sample app
window.Apply(decComplex, 0); // mutates
decComplex.ForwardFourierTransform(); // ArgExc: Incorrect data length.

accord-fft-troubles-output.html

blai...@gmail.com

unread,
Nov 20, 2013, 12:46:27 AM11/20/13
to accor...@googlegroups.com
I'm having the same problem, but with a much simpler example:

string fileName = "mu1.wav";
WaveDecoder sourceDecoder = new WaveDecoder(fileName);
Signal s = sourceDecoder.Decode();
s.ToComplex(); //This invokes the following exception:
//InvalidSignalPropertiesException
//Signals length should be a power of 2.


I read the source code and this exception should only be thrown if the signal is not formatted as types Format32BitIeeeFloat or Format128BitIeeeFloat. When I do a Console.WriteLine(s.SampleFormat), however, I see that the signal is indeed Format32BitIeeeFloat, so I'm not sure what the problem is.

César

unread,
Nov 20, 2013, 4:48:33 AM11/20/13
to accor...@googlegroups.com
Hello,

The first problem might be due to an issue in the way windows are applied to signals. I sent a correction to Pat just to see if it helps fixing, and if it does, I will make it a permanent change. In the second case, the problem is that signals must have lengths that are a power of two before they are transformed in the Fourier domain. In order to Fourier-transform them, you might need to apply a windowing function to extract smaller segments from the audio signal. If those segments are power of two, then it should work.

Hope it helps!

Regards,
Cesar


Pat

unread,
Nov 20, 2013, 1:03:28 PM11/20/13
to accor...@googlegroups.com
@unknown, I had the same problem with the .ToComplex extension method (you can see that part commented out in my code snippet). That's why I was attempting to get it to work using the constructor for a ComplexSignal.

What César sent to me was the following. I tried it without using the new DLLs (in the link), and it didn't work because Signal has no .Split method (but there is an extension method that works on ComplexSignal, from the looks of it). The error was, "'Accord.Audio.Signal' does not contain a definition for 'Split' and the best extension method overload 'Accord.Audio.Windows.Extensions.Split(Accord.Audio.ComplexSignal, Accord.Audio.Windows.IWindow, int)' has some invalid arguments". 

I'll try using the new DLLs and see if that works.


César's message (thanks, by the way):
You might be able to achieve what you need using the following snippet:

            // Create a decoder for the source stream
            WaveDecoder sourceDecoder = new WaveDecoder(sourceStream);

            // Decode the signal in the source stream
            Signal sourceSignal = sourceDecoder.Decode();

            RaisedCosineWindow window = RaisedCosineWindow.Hamming(1024);

            // Splits the source signal by walking each 512 samples, then creating 
            // a 1024 sample window. Note that this will result in overlapped windows.
            Signal[] windows = sourceSignal.Split(window, 512);

            // You might need to import Accord.Math in order to call this:
            ComplexSignal[] complex = windows.Apply(ComplexSignal.FromSignal);

            // Forward to the Fourier domain
            complex.ForwardFourierTransform();

However, when I was creating that test I noticed there was an issue in the window construction. I am uploading some updated binaries that should hopefully fix the exceptions you were getting:


Hope it helps! If it doesn't, please let me know!

Pat

unread,
Nov 20, 2013, 1:25:01 PM11/20/13
to accor...@googlegroups.com
Well, I'm getting results using the new DLLs, so that's encouraging. But I also discovered that my previous method should have worked, but for some reason I assumed the window.Apply method mutated the argument Signal, so I didn't assign the result of that method. By changing that, I got my code to give results (still not sure if they're meaningful).

Here's my full code. The only part that needed changing was the second-to-last line, decComplex = window.Apply(decComplex, 0);.

var reader = new Accord.Audio.Formats.WaveDecoder(@"W:\sine-2khz-100ms-44100-16pcm.wav");

Signal decoded = reader.Decode().Dump("decode");
// ComplexSignal complexDecoded = decoded.ToComplex().Dump("decoded complex");
//InvalidSignalPropertiesException
//Signals length should be a power of 2. 

int nbFft = NextClosestPowerOfTwoValue(decoded.RawData.Length);//.Dump();
var fftBuffer = new byte[nbFft]; // zero-fills
decoded.RawData.CopyTo(fftBuffer, 0);
// Need to use ctor, since extension method complains about size
var decComplex = new Accord.Audio.ComplexSignal(fftBuffer, decoded.Channels, decoded.Length, decoded.SampleRate);
decComplex.Dump("decoded complex from ctor");
decComplex.GetEnergy().Dump("dec complex energy");
var window = new Accord.Audio.Windows.BlackmanWindow(2048); // arg taken from sample app
decComplex = window.Apply(decComplex, 0);
decComplex.ForwardFourierTransform();

The results are:
## decoded complex from ctor
ComplexSignal
Accord.Audio.ComplexSignal
SampleFormat Format128BitComplex
Duration 100
Length 4410
Samples 4410
SampleRate 44100
Channels 1
RawData byte[]
Data
IntPtr
538620136
Status Normal

## decoded complex after FFT
ComplexSignal
Accord.Audio.ComplexSignal
SampleFormat Format128BitComplex
Duration 46
Length 2048
Samples 2048
SampleRate 44100
Channels 1
RawData byte[]
Data
IntPtr
538759648
Status FourierTransformed


I also tried C'esar's suggestion with the new DLLs using the following code (also available at ):

void Main()
{
var reader = new Accord.Audio.Formats.WaveDecoder(
@"W:\materials\audio\sine-2khz-100ms-44100-16pcm.wav");
Signal decoded = reader.Decode();
// ComplexSignal complexDecoded = decoded.ToComplex().Dump("decoded complex");
//InvalidSignalPropertiesException
//Signals length should be a power of 2. 

// From Cesar
RaisedCosineWindow window = RaisedCosineWindow.Hamming(1024);
// Splits the source signal by walking each 512 samples, then creating 
// a 1024 sample window. Note that this will result in overlapped windows.
Signal[] windows = decoded.Split(window, 512);
windows.Dump("windows after split");
// You need to import Accord.Math in order to call this:
ComplexSignal[] complex = windows.Apply(ComplexSignal.FromSignal);
complex.Dump("decoded complex before FFT");
// Forward to the Fourier domain
complex.ForwardFourierTransform();
complex.Dump("decoded complex after FFT");
complex[0].Dump("complex[0] after FFT");
}

And that gave me results like those attached.

I'll have to spend some time trying to understand these results. I guess the next step is to apply the same code as used in the Audio>FFT sample:

            // Now we can get the power spectrum output and its
            // related frequency vector to plot our spectrometer.

            Complex[] channel = signal.GetChannel(0);
            double[] power = Accord.Audio.Tools.GetPowerSpectrum(channel);
            double[] freqv = Accord.Audio.Tools.GetFrequencyVector(signal.Length, signal.SampleRate);

accord-fft-results-cesar.html

Pat

unread,
Nov 20, 2013, 1:27:34 PM11/20/13
to accor...@googlegroups.com
Whoops, I forgot to include a link to the LINQPad snippet for C'esar's method. Here it is: http://share.linqpad.net/umvcw8.linq. Note, this is with the updated DLLs at https://dl.dropboxusercontent.com/u/32601472/accord/releases/Accord.NET%20Framework-2.11.1-%28libs%20only%29%20r2.rar

blai...@gmail.com

unread,
Nov 20, 2013, 6:35:22 PM11/20/13
to accor...@googlegroups.com
I tried the new libraries and they seem to work perfectly. I get the same results you do. Now I just have to figure out how to interpret the results! Are there any good libraries in Accord.NET for plotting the histogram of the results?

Pat

unread,
Nov 22, 2013, 1:58:41 PM11/22/13
to accor...@googlegroups.com
Ok, I've gotten Accord to give me frequencies using the new libraries (2.11.1 r2) and C'esar's advice. I've attached the code which I'm using now. Instead of just looking at the frequencies in the signal, I'm extracting some features (spectral centroid and bandwidth, for now) to use in a classification system (speech vs music).

Thanks for the help!

Here are the relevant code snippets (also in the Audio.cs file attached):

private static void FreqencyDataFromSignal(ComplexSignal sig,
    out double[] power, out double[] magnitudes, out double[] freqv) {
    Debug.Assert(sig.Channels == 1"I only do mono signals.");
    power = Accord.Audio.Tools.GetPowerSpectrum(
        sig.GetChannel(0));
    magnitudes = Accord.Audio.Tools.GetMagnitudeSpectrum(
        sig.GetChannel(0));
    freqv = Accord.Audio.Tools.GetFrequencyVector(
        sig.Length, sig.SampleRate);
}

which is called by
public static FrequencyFeatures GetFrequencyFeatures(FileInfo file, int windowLen=512) {
    var reader = new Accord.Audio.Formats.WaveDecoder(file.FullName);
    Signal decoded = reader.Decode();
 
    RaisedCosineWindow window = RaisedCosineWindow.Hamming(windowLen); // len(window) -> len(chunk)
 
    // Splits the source signal by walking each windowLen/2 samples, then creating 
    // a windowLen sample window. Note that this will result in overlapped windows.	
    Signal[] windowedSignals = decoded.Split(window, windowLen / 2);
 
    // You need to import Accord.Math in order to call this:
    ComplexSignal[] complexWindowedSigs = windowedSignals.Apply(ComplexSignal.FromSignal);
    
    // Forward to the Fourier domain
    complexWindowedSigs.ForwardFourierTransform();
    
    // Loop through all windowed chunks of signal to get average values
    ComplexSignal complex0Signal = complexWindowedSigs[0];
 
    var len = complex0Signal.Length / 2 + 1// maybe because FFT info is only in n/2 + 1 bins?
    var meanPower = new double[len];
    var meanMags = new double[len];
 
    double[] power = {};
    double[] magnitudes = {};
    double[] freqv = {};
    int numSignals=0;
    foreach (ComplexSignal sig in complexWindowedSigs) {
        FreqencyDataFromSignal(sig, out power, out magnitudes, out freqv);
        Debug.Assert(power.Length == len);
        Debug.Assert(magnitudes.Length == len);
        for (int i = 0; i < len; i++) {
            meanPower[i] += power[i];
            meanMags[i] += magnitudes[i];
        }
        numSignals++;
    }
    for (int i = 0; i < len; i++) {
        meanPower[i] /= numSignals;
        meanMags[i] /= numSignals;
    }
 
    // Sort to find the top freqs
    Array.Sort(meanMags, freqv);
    Array.Reverse(meanMags);
    Array.Reverse(freqv);
 
    return FrequencyFeatures.FromData(new FrequencyData(freqv, meanMags));
}

and uses these classes:
public class FrequencyFeatures {
    public double SpectralCentroid { getprivate set; }
    public double Bandwidth { getprivate set; }
    public FrequencyData Data { getprivate set; }
 
    public FrequencyFeatures(double spectralCentroid, double bandwidth, FrequencyData data) {
        SpectralCentroid = spectralCentroid;
        Bandwidth = bandwidth;
        Data = data;
    }
 
    public static FrequencyFeatures FromData(FrequencyData data) {
        double[] meanMags = data.Magnitudes;
        double[] freqv = data.Frequencies;
        
        // spectral centroid - seems close to orig signal's "energy", at least for sine
        double spectralCentroid = meanMags.Zip(freqv, (m, f) => m*f).Sum()/
                                  meanMags.Sum();
        
        // bandwidth = highest - smallest freq
        double avgMagnitude = meanMags.Average();
        // need a threshold or else we just get half the sample rate
 
        var nonzeroFreqs = Enumerable.Zip(meanMags, freqv, Tuple.Create)
            .Where(t => t.Item1 > avgMagnitude)
            .Select(t => t.Item2).ToList();
        double bandwidth = nonzeroFreqs.Max() - nonzeroFreqs.Min();
        
        return new FrequencyFeatures(spectralCentroid, bandwidth, data);
    }
}
 
public class FrequencyData {
    public double[] Frequencies { getprivate set; }
    public double[] Magnitudes { getprivate set; }
 
    public FrequencyData(double[] frequencies, double[] magnitudes) {
        Frequencies = frequencies;
        Magnitudes = magnitudes;
    }
}

Hope that helps others trying to work with the FFT. I may be making some assumptions (for instance, do we need to account for the overlapping of the windowed chunks?), so anyone please correct my code if you see errors.
Audio.cs

Robbie Cooper

unread,
Jul 26, 2014, 5:27:42 AM7/26/14
to accor...@googlegroups.com
Hey Pat

Did you do any more with this? I'm cribbing from your code to drive visual effects with amplitude. I'm most interested in what you found about how to interpret the data...

Best, R

Pat

unread,
Jul 27, 2014, 5:23:37 PM7/27/14
to accor...@googlegroups.com
I ended up creating a GUI to classify music files as music or speech (from a very limited and often hardcoded set of files) which you can see at https://www.youtube.com/watch?v=U5ROGTfkiYI&feature=youtu.be&hd=1. If I remember correctly, I used Accord for extracting features from the audio file and then called the python script to do the actual classification (using scikit-learn).

If you want to download the final product that I submitted for the class assignment, it'll be at https://dl.dropboxusercontent.com/u/313647/musicClassifierAccordNetAndPython.zip for at least a few days. Feel free to treat it as MIT licensed unless otherwise specified. If there is demand, I can convert the Hg repo to git and post on github.

Hope that helps!
Pat
Reply all
Reply to author
Forward
0 new messages