[Translation] Definitions of framedelta~ and frameaccum~ as used in FFT patches (reposted from the old forum)


Chien-Wen Cheng

Jun 19, 2009, 9:24:57 PM
to MAX/MSP/Jitter Interactive Music and Interactive Art Forum
CWCheng
Site Admin


Joined: 2006-06-16
Posts: 253
From: University of North Texas
Posted: 29 June 2006 12:05 am  Subject: [Translation] Definitions of framedelta~ and frameaccum~ as used in FFT patches

--------------------------------------------------------------------------------

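Recently I have been trying once again to learn FFT programming, because the FFT is so powerful: it can analyze a sound into its individual frequency bands, process each band separately, and resynthesize the result, which makes very unusual sound transformations possible.
For someone trained in music, however, the theory is rather abstract and hard to grasp. My former teacher, Dr. Rovan, told us that with this kind of material you simply keep reading it: if once is not enough, read it twice, three times, four times, and eventually you will be able to use it. This is much like how I studied math and physics in high school, coming to "understand" abstract concepts by working through problem after problem. So while learning Max/MSP, I have made a habit of translating, word for word into Chinese, the concepts and object usages that I do not understand or that seem complicated, both to reinforce my memory and to make review easier.

I am sharing these with everyone; corrections to any translation you find inappropriate are welcome. The original text below comes from the MSP reference manual. Let us encourage one another.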

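framedelta~: computes a running phase deviation by subtracting the signal vector of the previous point in time from the current signal vector. In other words, for each signal vector, the first sample of the output is "the first sample of the current signal vector" minus "the first sample of the previous signal vector"; likewise, the second sample of the output is "the second sample of the current signal vector" minus "the second sample of the previous signal vector", and so on. When used inside a pfft~ subpatch, it keeps a running phase deviation of the FFT, because the FFT size is equal to the signal vector size.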

The framedelta~ object computes a running phase deviation by
subtracting values in each position of its previously received signal
vector from the current signal vector. In other words, for each signal
vector, the first sample of its output will be the first sample in the
current signal vector minus the first sample in the previous signal
vector, the second sample of its output will be the second sample in
the current signal vector minus the second sample in the previous
signal vector, and so on. When used inside a pfft~ object, it keeps a
running phase deviation of the FFT because the FFT size is equal to
the signal vector size.
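
As a rough illustration of the behaviour described above, here is a minimal Python/NumPy sketch. It is not the MSP implementation; the class name `FrameDelta` and the assumption that the "previous vector" starts out as all zeros are mine.

```python
import numpy as np


class FrameDelta:
    """Per-position difference between the current and the previous signal vector."""

    def __init__(self, vector_size):
        # Assumption: before any input arrives, the "previous" vector is all zeros.
        self.prev = np.zeros(vector_size)

    def process(self, vector):
        vector = np.asarray(vector, dtype=float)
        out = vector - self.prev   # sample k of output = current[k] - previous[k]
        self.prev = vector.copy()  # remember this vector for the next call
        return out


fd = FrameDelta(4)
first = fd.process([1.0, 2.0, 3.0, 4.0])   # compared against the initial zeros
second = fd.process([1.5, 2.0, 2.0, 6.0])  # compared against the first vector
print(first, second)
```

When the incoming vectors hold the phase of each FFT bin, the output is the phase change of each bin from one frame to the next.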

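frameaccum~: computes a running phase sum, that is, the sum of the values at each position of the incoming signal vectors. In other words, for each signal vector, the first sample of the output is the sum of "the first samples of all signal vectors" received so far, the second sample is the sum of "the second samples of all signal vectors" received so far, and so on. When used inside a pfft~ object, it can keep a running phase of the FFT, because the FFT size is equal to the signal vector size.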

The frameaccum~ object computes a running phase by keeping a sum of
the values in each position of its incoming signal vectors. In other
words, for each signal vector, the first sample of its output will be the sum
of all of the first samples in each signal vector it has received, the
second sample of its output will be the sum of all the second samples
in each signal vector, and so on. When used inside a pfft~ object, it
can keep a running phase of the FFT because the FFT size is equal to
the signal vector size.
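
The running sum described above can be sketched in the same way. Again this is only an illustration in Python/NumPy, with a hypothetical class name `FrameAccum` and a running total that is assumed to start at zero.

```python
import numpy as np


class FrameAccum:
    """Per-position running sum of every signal vector received so far."""

    def __init__(self, vector_size):
        # Assumption: the running total starts at zero.
        self.total = np.zeros(vector_size)

    def process(self, vector):
        self.total = self.total + np.asarray(vector, dtype=float)
        return self.total.copy()


fa = FrameAccum(3)
fa.process([1.0, 0.5, -1.0])
running = fa.process([2.0, 0.5, 1.0])  # position-wise sum of both vectors
print(running)
```

Feeding it per-frame phase differences gives back a running phase for each bin.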




CWCheng
Site Admin


Joined: 2006-06-16
Posts: 253
From: University of North Texas
Posted: 30 June 2006 02:49 am  Subject:

--------------------------------------------------------------------------------

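A follow-up: these two objects appear in the last example of Chapter 26 of the MSP tutorial, where they are used to build a phase vocoder. With the phase vocoder technique, a sound can be lengthened or shortened without changing its pitch.

Unfortunately the explanation in the example only touches on this lightly. It does not make clear why these objects are not used for convolution, whereas building a phase vocoder requires converting the FFT analysis results from Cartesian to polar coordinates.

So far the only reply I have received is the following one from another user, which I am reposting for anyone interested: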

I agree that the phase wrapping is not necessary in ALL cases in a
spectral delay (I haven't looked at the example in question though so
read on to find out when it is). If you miss it out you should also
store cartesian values in your buffer and forget about conversion to
polar co-ordinates at all because the trig is really expensive to do
this. In all my spectral work I try to avoid using polar values for
this reason. So far it has always been possible.

So, i said not in all cases - the reason for this is that for fixed
delay this works correctly (and arguably more accurately), but not for
variable delays (where NOT accumulating will not sound smooth because
the phases will not be read from consecutive frames in the buffer as
the delay changes so the resultant phase differences will not make
sense). In this case it is technically possible to phase accumulate
using cartesian geometry (using complex multiplies and divides) which
is cheaper. This is hard to do in msp code however. I have made an
external that does this (actually for my spectral delay - so i have
tried all these different options in practice) - I may post it to the
share site soon if I can find time to neaten it up and port to UB.

To summarise - if you want a fixed delay lose the trig and forget about
accumulating. If it needs to vary over time then the accumulation will
sound much smoother whilst the delay is changing.
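
The reply above mentions accumulating phase with complex multiplies and divides instead of trigonometry. The following Python/NumPy sketch shows one way that idea could look; it is my own guess at the technique, not the external the writer describes, and the function names are hypothetical.

```python
import numpy as np


def phase_delta_cartesian(curr, prev, eps=1e-12):
    """Unit phasor whose angle is phase(curr) - phase(prev), computed without sin/cos/atan2."""
    d = curr * np.conj(prev)   # the angle of d is the phase difference
    mag = np.abs(d)
    # Normalise to unit length; bins with (near) zero energy get a zero-rotation phasor.
    return np.where(mag > eps, d / np.maximum(mag, eps), 1.0 + 0.0j)


def accumulate_cartesian(acc, delta):
    """Multiplying unit phasors adds their angles, so this accumulates phase."""
    return acc * delta


rng = np.random.default_rng(1)
frames = rng.standard_normal((5, 8)) + 1j * rng.standard_normal((5, 8))

acc = frames[0] / np.abs(frames[0])   # start from the phase of the first frame
for t in range(1, len(frames)):
    acc = accumulate_cartesian(acc, phase_delta_cartesian(frames[t], frames[t - 1]))

expected = frames[-1] / np.abs(frames[-1])
print(np.allclose(acc, expected))
```

In this sketch the accumulated phasor is multiplied by the bin magnitude to get the output value directly, so no conversion to and from polar coordinates is needed.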
_________________
Chien-Wen Cheng's Music:

http://w3.nctu.edu.tw/~u8642524/index.htm



timbre
Senior Member


Joined: 2006-06-16
Posts: 114
From: Taiwan
Posted: 21 December 2006 02:16 am  Subject:

--------------------------------------------------------------------------------

Reposting an important discussion from the cycling74 site:
maybe it helps, if you think about a simple sinusoidal input and
imagine in what way each consecutive input frame differs from the
following (or the past).
you can take a look at the attached file. this is only a quick hack,
and the display is aliasing and might not be very accurate, but
hopefully it gives you the right idea of what is happening.

dividing the sampling rate by the fft size, you get the fft delta-bin
frequency, i.e. the frequency spacing of the fft bins.
if the input signal is exactly a multiple of the delta-bin freq, i.e.
if the input freq is at the center of an fft bin, the input signal is
a perfect multiple of the fft-frame size. that means every fft frame
looks the same, the phase is not moving, so the phase difference is
0.
but as soon as the input signal is not at the center of an fft bin,
the input signal does not fit in the fft frame an integer multiple of
times. so every input frame 'looks different', the phase is moving.
if the input frequency stays constant, the phase difference will also
be constant - but not 0!
in order to resynthesize the correct frequency, you'd have to account
for this moving phase, i.e. you'd have to add a constant phase-offset
for every fft frame -> accumulate the phase deltas...
don't know if that clears it up, but take a look at the attachment.

in fact this is an attempt of a simplified explanation of why you
have to deal with phase differences and not with actual phase values.
in reality it is a little more complex as usual...
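
The bin-center argument above can be checked numerically. The Python/NumPy sketch below is my own addition, not part of the quoted post. It uses the FFT size of 256 from the patch, but the 44100 Hz sampling rate, the choice of bin 11, and the use of non-overlapping frames are assumptions made for illustration.

```python
import numpy as np

SR = 44100.0          # assumed sampling rate
N = 256               # FFT size, as in the patch (fft~ 256 256 0)
DELTA_BIN = SR / N    # frequency spacing of the FFT bins


def frame_phase_deltas(freq_hz, bin_index, n_frames=6):
    """Wrapped phase difference of one FFT bin between consecutive, non-overlapping frames."""
    n = np.arange(N * n_frames)
    x = np.cos(2 * np.pi * freq_hz * n / SR)
    frames = x.reshape(n_frames, N)
    phases = np.angle(np.fft.fft(frames, axis=1)[:, bin_index])
    deltas = np.diff(phases)
    return (deltas + np.pi) % (2 * np.pi) - np.pi   # wrap into [-pi, pi)


centered = frame_phase_deltas(11 * DELTA_BIN, 11)        # exactly at the center of bin 11
off_center = frame_phase_deltas(11.25 * DELTA_BIN, 11)   # a quarter bin above the center

print(DELTA_BIN)
print(centered)
print(off_center)
```

With the input at the bin center every frame is identical, so the deltas are zero. A quarter bin above the center, each frame advances the phase by about 2π × 0.25 = π/2. The value is only approximately constant here because the unwindowed real sinusoid leaks a little energy from its negative-frequency component into the bin.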

#P window setfont "Sans Serif" 9.;
#P window linecount 1;
#P hidden newex 337 69 40 196617 t 5 0 b;
#P hidden newex 337 44 48 196617 loadbang;
#P window linecount 3;
#P comment 9 250 50 196617 calculate phase deltas;
#P window linecount 1;
#P comment 25 534 100 196617 phase delta;
#P newex 56 269 64 196617 phasewrap~;
#P comment 19 388 48 196617 + π --;
#P comment 19 517 48 196617 - π --;
#P flonum 56 347 60 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
#P newex 56 298 39 196617 sah~;
#P newex 56 320 70 196617 snapshot~ 11;
#P user multiSlider 56 390 12 135 -3.2 3.2 1 2681 15 0 0 2 0 0 0;
#M frgb 0 0 0;
#M brgb 255 255 255;
#M rgb2 127 127 127;
#M rgb3 0 0 0;
#M rgb4 37 52 91;
#M rgb5 74 105 182;
#M rgb6 112 158 18;
#M rgb7 149 211 110;
#M rgb8 187 9 201;
#M rgb9 224 62 37;
#M rgb10 7 114 128;
#P comment 18 453 48 196617 -- 0 --;
#P newex 56 250 65 196617 framedelta~;
#P comment 103 387 48 196617 + π --;
#P comment 103 516 48 196617 - π --;
#P flonum 140 347 60 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
#P flonum 224 129 45 9 0 0 0 3 0 0 0 255 227 23 222 222 222 0 0 0;
#N vpatcher 20 74 350 350;
#P window setfont "Sans Serif" 9.;
#P number 130 138 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
#P newex 50 96 27 196617 i;
#P newex 50 74 34 196617 + 0.5;
#P inlet 50 30 15 0;
#P outlet 50 148 16 0;
#P connect 1 0 2 0;
#P connect 2 0 3 0;
#P connect 3 0 0 0;
#P connect 3 0 4 0;
#P pop;
#P newobj 186 226 43 196617 p round;
#P newex 169 249 27 196617 ==~;
#P newex 140 298 39 196617 sah~;
#P newex 140 320 70 196617 snapshot~ 11;
#P user multiSlider 140 389 12 135 -3.2 3.2 1 2681 15 0 0 2 0 0 0;
#M frgb 0 0 0;
#M brgb 255 255 255;
#M rgb2 127 127 127;
#M rgb3 0 0 0;
#M rgb4 37 52 91;
#M rgb5 74 105 182;
#M rgb6 112 158 18;
#M rgb7 149 211 110;
#M rgb8 187 9 201;
#M rgb9 224 62 37;
#M rgb10 7 114 128;
#P newex 255 228 45 196617 poke~ x;
#N vpatcher 20 74 409 376;
#P window setfont "Sans Serif" 9.;
#P newex 87 68 54 196617 dspstate~;
#P comment 162 129 100 196617 fft-size;
#P comment 162 103 100 196617 sampling rate;
#P flonum 101 154 56 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
#P newex 101 126 40 196617 / 256.;
#P flonum 101 101 56 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
#P comment 162 156 100 196617 delta-bin frequency;
#P outlet 101 190 15 0;
#P connect 7 1 2 0;
#P connect 2 0 3 0;
#P connect 3 0 4 0;
#P connect 4 0 0 0;
#P pop;
#P newobj 159 75 80 196617 p delta-bin freq;
#P newex 548 435 70 196617 buffer~ x 5.8;
#P message 548 387 32 196617 set x;
#P user waveform~ 186 387 363 137 3 9;
#W mode select;
#W mouseoutput none;
#W clipdraw 1;
#W unit samples;
#W grid 22.675737;
#W ticks 0;
#W labels 1;
#W vlabels 0;
#W vticks 0;
#W bpm 120. 4.;
#W frgb 33 0 0;
#W brgb 60 178 173;
#W rgb2 0 95 255;
#W rgb3 0 0 0;
#W rgb4 0 0 0;
#W rgb5 146 179 217;
#W rgb6 100 100 100;
#W rgb7 100 100 100;
#P user ezdac~ 609 89 653 122 0;
#P newex 97 214 53 196617 cartopol~;
#P flonum 97 56 45 9 0 0 0 3 0 0 0 40 204 140 222 222 222 0 0 0;
#P newex 97 103 72 196617 * 1.;
#P flonum 97 128 56 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
#P newex 97 155 40 196617 cycle~;
#P newex 97 186 82 196617 fft~ 256 256 0;
#P comment 95 42 100 196617 bin number;
#P comment 156 129 28 196617 Hz;
#P comment 102 452 48 196617 -- 0 --;
#P comment 272 131 100 196617 phase ( 0. - 1. );
#P comment 302 230 137 196617 record time domain input;
#P comment 99 534 100 196617 actual phase value;
#P window linecount 2;
#P comment 47 207 50 196617 calculate phase;
#P window linecount 1;
#P comment 184 300 143 196617 s&h fft bin of interest;
#P comment 301 369 182 196617 time-domain input signal;
#P window linecount 2;
#P comment 239 531 239 196617 if the instantaneous phase of the input
frame is 1. (cosine) the fft will calculate a phase value of 0.;
#P window linecount 1;
#P comment 507 598 177 196617 volker böhm vbo...@gmx.ch;
#P fasten 16 1 32 0 145 240 61 240;
#P connect 32 0 40 0;
#P connect 40 0 36 0;
#P connect 36 0 35 0;
#P connect 35 0 37 0;
#P connect 37 0 34 0;
#P fasten 26 0 36 1 174 291 90 291;
#P hidden connect 44 0 15 0;
#P connect 15 0 14 0;
#P connect 14 0 13 0;
#P connect 13 0 12 0;
#P connect 12 0 11 0;
#P connect 11 0 16 0;
#P fasten 28 0 12 1 229 151 132 151;
#P connect 11 1 16 1;
#P connect 16 1 25 0;
#P connect 25 0 24 0;
#P connect 24 0 29 0;
#P connect 29 0 23 0;
#P connect 21 0 14 1;
#P connect 11 2 26 0;
#P connect 26 0 25 1;
#P fasten 15 0 27 0 102 98 191 98;
#P connect 27 0 26 1;
#P hidden connect 19 0 18 0;
#P hidden connect 44 1 28 0;
#P fasten 12 0 22 0 102 177 260 177;
#P fasten 11 2 22 1 174 207 277 207;
#P hidden connect 43 0 44 0;
#P hidden connect 44 2 19 0;
#P window clipboard copycount 45;



timbre
Senior Member


Joined: 2006-06-16
Posts: 114
From: Taiwan
Posted: 21 December 2006 03:15 am  Subject:

--------------------------------------------------------------------------------

Reposted from the cycling74 forum:

frameaccum~ computes 'running phase' by adding an fft frame to the
previous frame as a vector (i.e. it adds the first sample of the first
frame to the first sample of the last frame, the second sample of the
first to the second sample of the last, etc.)... you can conceptualize
frameaccum~, framedelta~, and vectral~ as signal-rate equivalents of
doing math with vexpr. it puts out a vector with all of these sums in
the same order they originally came in. you need frameaccum~ when
you're building phase vocoders to prevent glitches when you read fft
analysis frames out of order (that's why in the phase vocoder examples
you record the difference between phases rather than the phases
themselves into the buffer~ objects). best. /luke
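
To make the point about reading frames out of order concrete, here is a small Python/NumPy sketch of my own. It uses made-up phase data for a single bin (a steady advance of π/2 per analysis frame, starting at an arbitrary 0.3 rad) and plays the frames back at half speed by reading each one twice. It is a simplification, not a phase vocoder.

```python
import numpy as np


def wrap(phase):
    """Wrap phase values into [-pi, pi)."""
    return (phase + np.pi) % (2 * np.pi) - np.pi


# Hypothetical analysis data: phase of one bin over 9 frames.
phases = wrap(0.3 + np.arange(9) * np.pi / 2)
deltas = wrap(np.diff(phases))             # what would be recorded into the buffer

read_index = np.repeat(np.arange(8), 2)    # half speed: every stored frame is read twice

raw = phases[1:][read_index]                          # replaying stored phases directly
accumulated = wrap(np.cumsum(deltas[read_index]))     # replaying deltas and accumulating

raw_steps = wrap(np.diff(raw))             # alternates between 0 and pi/2: the phase stalls
acc_steps = wrap(np.diff(accumulated))     # always pi/2: the phase keeps advancing

print(raw_steps)
print(acc_steps)
```

In this toy case the accumulated version keeps the per-frame phase advance that the analysis measured, whatever order or speed the frames are read in, while the raw phases stall whenever a frame is repeated.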



lmj0316

Joined: 2006-11-23
Posts: 49
From: Chengdu
Posted: 22 December 2006 11:13 am  Subject:

--------------------------------------------------------------------------------

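Coincidentally, I have also been reviewing Chapter 26 over the past couple of days, and took the opportunity to read up on vocoders.

Some sources say: the vocoder, also called the phase vocoder, is a synthesis unit that grew out of extensive research in sound analysis and is designed specifically to track the changing spectral content of a sound. A vocoder can be viewed as a multi-band bank of band-pass filters, each with its own center frequency and its own amplitude envelope follower. Because the vocoder splits the spectral content of a sound into multiple bands for analysis and tracking, it can preserve the relative relationships among the components of the spectrum while the fundamental frequency changes; that is, the pitch can change while the timbre and texture stay the same. In sound synthesis and production, a vocoder is typically used to track the timbral characteristics of one sound, which then control the sound being synthesized.

In the last example of Chapter 26, as I see it, the FFT approach is used, with loop size and loop offset adjusted to determine the loop segment of the sample on the left (). A phasor is used to adjust linearly the speed at which the sample segment is read. In the subpatch mypvoc, round~ rounds the value to an integer, which is then multiplied by the frame size to ensure that the corresponding number of samples is an integer. Whatever the read speed, the sound quality of the sample does not change. Relating the description of the vocoder above to this example, my understanding is as follows. "A vocoder can be viewed as a multi-band bank of band-pass filters": the multiple bands are in fact the multiple frames of the FFT. "Each band-pass filter has its own center frequency": this refers to the frequency points at which adjacent frames overlap in pfft~. "Preserving the relative relationships among the components of the spectrum": this is what framedelta~ and frameaccum~ ensure.

Also, here is my own understanding of framedelta~ and frameaccum~. The I/O vector size (I/O stands for input/output) controls how many samples go in and out of the audio interface at a time. The signal vector size is the number of samples MSP processes at a time. framedelta~ computes the phase difference between one vector and the next, i.e. the phase deviation between successive FFT frames, and phasewrap~ then wraps the values into the range -pi to +pi. When "decoding", frameaccum~ sums all the delta values up to the current vector to recover the phase. Both are used in this example precisely because of the difference between cartopol~ and poltocar~, that is, the conversion between x+yi and polar coordinates.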
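
The chain described in the paragraph above (Cartesian to polar, frame-to-frame phase difference, wrap, accumulate, polar back to Cartesian) can be sketched offline in Python/NumPy. This is my own illustration using random stand-in data for the FFT frames, not the tutorial patch itself; the NumPy calls only play the roles of the MSP objects named in the comments.

```python
import numpy as np


def wrap(phase):
    """Wrap phase values into the range -pi to +pi (the role of phasewrap~)."""
    return (phase + np.pi) % (2 * np.pi) - np.pi


rng = np.random.default_rng(0)
# Stand-in for 6 consecutive FFT frames of 8 bins each (hypothetical data).
frames = rng.standard_normal((6, 8)) + 1j * rng.standard_normal((6, 8))

magnitude = np.abs(frames)     # role of cartopol~: amplitude
phase = np.angle(frames)       # role of cartopol~: phase

# Role of framedelta~ followed by phasewrap~: wrapped frame-to-frame phase difference.
deltas = wrap(np.diff(phase, axis=0, prepend=np.zeros((1, 8))))

# Role of frameaccum~: the running sum of the deltas restores the phase (modulo 2*pi).
restored_phase = np.cumsum(deltas, axis=0)

# Role of poltocar~: back to x + yi.
rebuilt = magnitude * np.exp(1j * restored_phase)

print(np.allclose(rebuilt, frames))
```

When the frames are read back in their original order, as here, the round trip reproduces the input. Accumulating deltas only makes a difference once frames are read at a different speed or order.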

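Still, this example does not really seem to show off the power of the vocoder. How to use the FFT to create really unusual sound transformations is not yet clear to me. I would like to hear how all of you use the FFT to make unusual sound transformations.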



timbre
Senior Member


Joined: 2006-06-16
Posts: 114
From: Taiwan
Posted: 16 January 2007 12:50 pm  Subject:

--------------------------------------------------------------------------------

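lmj0316 wrote:
Coincidentally, I have also been reviewing Chapter 26 over the past couple of days, and took the opportunity to read up on vocoders.

Still, this example does not really seem to show off the power of the vocoder. How to use the FFT to create really unusual sound transformations is not yet clear to me. I would like to hear how all of you use the FFT to make unusual sound transformations.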


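The FFT is most commonly used for two things: time-stretch (without changing pitch) and transpose (without changing duration). Much like granular techniques, though, the sound has to be recorded into a buffer before it can be transformed, so in real use you still have to devise a way to control the recording during a live performance.

There are of course many other uses, such as spectral delay or various filter effects.

There is also an object called vectral~ that can make a piano sound like a wind instrument. I still do not really understand the principle behind it, though.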
