I'm currently trying to figure out how to get an FFT 'freeze' effect to work in Pure Data - that is, to take and hold a snapshot of a particular moment's harmonic/spectral content. A Max version of what I'm trying to achieve is demonstrated nicely here:
Although I have a reasonable understanding of how FFT works, I don't have any experience whatsoever with the FFT objects in Pure Data, and at the moment I'm a little confused by some aspects of how they work. Consequently, I'm having some trouble getting this working. One useful thing I have found is this post on the (very nice!) rumblesan blog: =223 . It describes, and contains a patch to do, exactly what I have in mind. However, the frozen version of the sound seems quite distorted and not very faithful to the original, particularly when compared to the Max patch shown in the YouTube video above.
The only patch I have so far is the one I downloaded directly from the rumblesan blog entry I linked above, =223. If you scroll to the bottom of that page, there is a link to download the patch in question.
Here's a way of doing spectral freezing using the running phase calculated by some cyclone objects. I remember it sounding a lot nicer than the above-mentioned patch, but don't ask me for the exact mathematical reason.
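For anyone trying to follow the running-phase idea without opening the patch, here's a rough Python sketch of the concept for a single FFT bin. The magnitude and phase values are made-up illustration numbers, not output from the actual cyclone objects:

```python
import cmath
import math

def wrap(p):
    # wrap a phase into [-pi, pi)
    return (p + math.pi) % (2 * math.pi) - math.pi

# Hypothetical freeze data for one FFT bin: its magnitude, plus the phases
# measured in two consecutive analysis frames at the freeze moment.
mag = 0.5
phase_prev, phase_now = 0.3, 1.1
delta = wrap(phase_now - phase_prev)  # per-frame phase advance for this bin

# Resynthesis: hold the magnitude fixed and let the phase keep "running"
# by delta each frame, then convert back to re/im for the inverse FFT.
phase = phase_now
frames = []
for _ in range(4):
    phase = wrap(phase + delta)
    frames.append(cmath.rect(mag, phase))
```

Because each output frame's phase continues to advance at the rate measured at capture time, successive overlapped frames stay coherent instead of restating the same frozen phase, which is presumably why it sounds smoother.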
I've been experimenting a lot with this kind of stuff recently, but I haven't found a way to get rid of the weird phasing/oscillations when time stretching is applied to spectral output. I guess it has to do with phase locking and channel interference, but the maths are way over my head. I wish there were some more spectral objects that handled that kind of stuff better. The phasing sounds cool when you just freeze input, though.
yeah, that patch sounds much better. I can definitely work with this. Will have to have a look into it and see if I can follow exactly how it works, as I could only half-follow maelstorm's description. I think I get why running phase is important, though. Thanks both for your help!
What poltocar/cartopol do is convert the real and imaginary FFT output to amplitude and phase data and back. This is basically just for convenience; they can easily be replaced manually if you just apply the right formulas. I'm sure there's also an easy manual way of calculating the phase difference and running phase and turning them back into re and im for resynthesis. I don't think you can really discard either the re or the im part for resynthesis.
I was able to put this cartesian version of the freeze patch together. Because it works without the polar conversion, it should be more efficient than the other version (not that it matters for a tiny patch like this, but anyway).
I also added Puckette-style phase locking, and a purity parameter that can filter out the weak frequencies.
Enjoy!
Another thing I am interested in is pitch-shifting these drones, to play them at different frequencies. Given that the signals are already in the frequency domain, is this a relatively easy procedure, or is it actually non-trivial?
Thanks!
The values are just there to scale the parameter correctly. The amplitudes before resynthesis have a really weird scaling, and it's hardly possible to get any kind of dB representation for the parameter, so the values are a product of trial and error and tweaking. dbtorms is for log scaling, of course; it spreads out the sweet spot.
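For reference, a tiny sketch of what dbtorms computes, assuming I have Pd's convention right (100 dB maps to an RMS of 1, and anything at or below 0 dB is clamped to silence):

```python
def dbtorms(d):
    # Pd-style dB to linear amplitude: 100 dB -> 1.0,
    # values at or below 0 dB -> 0 (silence)
    return 0.0 if d <= 0 else 10 ** ((d - 100) / 20)
```

So a parameter of 94 dB comes out around 0.5, which is why mapping a slider through dbtorms spreads the useful range out logarithmically.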
Here's one more. I took the approach a little further and modified the patch to continually crossfade between incoming spectral snapshots (or optionally trigger snapshots yourself). This is basically a knockoff of the new GRM Tools plugins. It doesn't sound exactly like them, of course, but I think that is for the most part due to the phase locking. There are more sophisticated approaches for getting the phasing out of time-stretched FFT sounds, but I haven't seen any of them implemented in Pd yet, so I had to go for the simple and classic method.
If you don't mind using quite large FFT sizes (2^16 to 2^17), it may be useful to completely randomize the phase data. Paulstretch works that way, and I've found a lot of useful variations of this basic idea.
Paulstretch sounds amazing! I love the sound of extreme time stretching, but I haven't seen anything like it in Pd so far. I can't really imagine how using one big window and randomizing the phases works. Can you maybe post something that demonstrates the idea, or elaborate on it a little?
My patch is still a little messy, and I think I'm still pretty naive about this frequency domain stuff. I'd like to get it cleaned up more (i.e. less incompetent and embarrassing) before sharing. I'm not actually doing the time stretch/freeze here since I was going for a real time effect (albeit with latency), but I think what I did includes everything from Paulstretch that differs from the previously described phase vocoder stuff.
Of course you can do this in the frequency domain if you just add some offset signal to the phase. The resulting output signal is smeared in time over the duration of the FFT frame, and enveloped by the window function. Conveniently, 50 ms corresponds to a frame size of 2048 at 44.1 kHz. The advantage of the frequency domain approach here is that the phase offset can be arbitrarily varied over time. You can get a time variant phase offset signal with a delay/wrap and some small amount of added noise: not "running phase" as in the phase vocoder but "running phase offset". It's also sensible here to scale the amount of added noise with frequency.
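Here's a small Python sketch of the "running phase offset" idea as I understand it; the bin count, frame count, and noise depth are all made-up illustration values:

```python
import math
import random

def wrap(p):
    # wrap a phase into [-pi, pi)
    return (p + math.pi) % (2 * math.pi) - math.pi

nbins = 8     # tiny spectrum, just for illustration
noise = 0.05  # per-frame noise depth (arbitrary)

# Each frame, keep the previous offset per bin (the delay/wrap part) and
# add a small random step, scaled up with bin index, i.e. with frequency.
offsets = [0.0] * nbins
for frame in range(16):
    offsets = [wrap(o + random.uniform(-1.0, 1.0) * noise * math.pi * k / nbins)
               for k, o in enumerate(offsets)]

# Applying it would mean rotating each bin before resynthesis:
#   bin_out[k] = bin_in[k] * cmath.exp(1j * offsets[k])
```

With a small noise depth the offsets drift slowly, giving a time-variant smearing; turning the noise up moves toward full randomization.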
Say that you add the maximum amount of noise to the running phase offset: now the delay/wrap part is irrelevant and the phase is completely randomized for each frame. This is what Paulstretch does (though it just throws out the original phase data and replaces it with noise). This completely destroys the sub-bin frequency resolution, so small FFT sizes will sound "whispery". You need a quite large FFT of 2^16 or 2^17 for adequate "brute force" frequency resolution.
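In code, the Paulstretch-style step amounts to something like this (the example spectrum is just three made-up bins):

```python
import cmath
import math
import random

def randomize_phases(bins):
    # keep each bin's magnitude, replace its phase with uniform noise;
    # this is the "throw out the original phase data" step
    return [cmath.rect(abs(b), random.uniform(-math.pi, math.pi)) for b in bins]

spectrum = [complex(0.6, 0.8), complex(0.0, 2.0), complex(-1.0, 0.0)]
frozen = randomize_phases(spectrum)  # magnitudes survive, phases are noise
```

Every inverse FFT of a re-randomized spectrum then gives a different-sounding grain with the same magnitude envelope, which is why the frames can be overlapped indefinitely without obvious looping.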
You can add some feedback here for a reverberation effect. You'll want to fully randomize everything here, and apply some filtering to the feedback path. The frequency resolution corresponds to the reverb's modal density, so again it's advantageous to use quite large FFTs. Nonlinearities and pitch shift can be nice here as well, for non-linear decays and other interesting effects, but this is going into a different topic entirely.
With such large FFTs you will notice a quite long Hann window shaped "attack" (again 2^16 or 2^17 represents a "sweet spot" since the time domain smearing is way too long above that). I find the Hann window is best here since it's both constant voltage and constant power for an overlap factor of 4. So the output signal level shouldn't fluctuate, regardless of how much successive frames are correlated or decorrelated (I'm not really 100% confident of my assessment here...). But the long attack isn't exactly natural sounding. I've been looking for an asymmetric window shape that has a shorter attack and more natural sounding "envelope", while maintaining the constant power/voltage constraint (with overlap factors of 8 or more). I've tried various types of flattened windows (these do have a shorter attack), but I'd prefer to use something with at least a loose resemblance to an exponential decay. But I may be going off into the Twilight Zone here...
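The constant voltage / constant power claim for Hann at overlap 4 is easy to check numerically. This sketch assumes a periodic Hann window and a hop of N/4:

```python
import math

N = 2048
hop = N // 4  # overlap factor 4

# periodic Hann window
w = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def overlap_sum(n, power):
    # sum the 4 windows (power=1) or squared windows (power=2)
    # overlapping at sample n, treating the stream as periodic
    return sum(w[(n + k * hop) % N] ** power for k in range(4))

voltage = overlap_sum(1000, 1)  # sum of windows: constant 2.0
power = overlap_sum(1000, 2)    # sum of squared windows: constant 1.5
```

Both sums come out flat at every sample (2.0 and 1.5 respectively), so uncorrelated frames and fully correlated frames both reconstruct at a steady level, which matches the assessment above.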
Anyway I have a theory that much of what people do to make a sound "larger", i.e. an ensemble of instruments in a concert hall, multitracking, chorus, reverb, etc. can be generalized as a time variant decorrelation effect. And if an idealized sort of effect can be made that's based on the way sound is actually perceived, maybe it's possible to make an algorithm that does this (or some variant) optimally.
Acreil - I'll admit that some of that was a little over my head, but some aspects of it sound a little like what Zynaddsubfx (a softsynth) does in its 'padsynth' algorithm? Basically it takes a simple waveform and spreads/"smears" it, in a gaussian distribution, over a range of frequencies, with some slightly complex-looking frequency domain mathematics. As you suggest, it's quite similar to a chorus effect, really...
Uh, and since I'm bumping this thread anyway... Don't suppose anyone has any suggestions for what I was asking about above, namely a way to pitch-shift the FFT data once I've taken a 'freeze' of it? (It should be pretty trivial, right? I mean, isn't that the whole point of the frequency domain? Unfortunately, I never did figure out how to do it.)
I'll have to admit it's partly over my head too. I don't really know that much about frequency domain stuff. I just read some papers, made some connections between them, and dicked around with example I09. But I guess I hit on something a little more unique than I expected. I'll upload the patch if I get it cleaned up a little more. It should be more efficient too...
I think as far as Padsynth goes, you can imagine playing a sound into a reverberator with an infinite decay time, then sampling and looping the output. Only it leaves out the reverberator (and the coloration, etc. that it can add) and produces the result (randomized phase) directly. The output is inherently periodic, so it loops perfectly with no additional effort. I think Image Line's Ogun uses the Padsynth algorithm, and that NOTAM Mammut program can do much the same thing (I think it actually illustrates the effect really nicely). Padsynth does smear out the frequency components a little (I guess windowing sorta does that for STFT...), but the phase randomization is the important part if you're processing arbitrary audio input.