--
You received this message because you are subscribed to the Google Groups "audioset-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to audioset-users+unsubscribe@googlegroups.com.
To post to this group, send email to audioset-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/audioset-users/6d148131-2e7b-4fe0-928c-febf2dc138b3%40googlegroups.com.
SAMPLE_RATE = 16000
STFT_WINDOW_LENGTH_SECONDS = 0.025
STFT_HOP_LENGTH_SECONDS = 0.010

window_length_samples: 400
hop_length_samples: 160

fft_length = 2 ** int(np.ceil(np.log(window_length_samples) / np.log(2.0)))
fft_length: 512
Is this the desired behavior (fft applied to data array smaller than fft_size)?
Why not make the window size a power of 2?
Is it conceptually okay to execute an fft with input size smaller than fft_size?
What is the behavior of np.fft.rfft() when passed fewer samples than fft_size?
Yu -

The 25 ms window / 10 ms hop is inherited from speech recognition (which means optimized for speech spectra and phoneme durations, which is not particularly relevant to audio events, but has worked out OK).

Using 64 mel bands instead of the more customary 40 was basically to get a power of 2, which is a little "cleaner" for the factor-2 downsampling in the CNN. More spectral resolution seems useful, but with the normal mel spectrum implementation you don't want to go too fine (and risk aliasing against the FFT bins).

We chose a ~1 sec window somewhat arbitrarily: originally we were using ~200 ms input patches, but wanted a more generous time context to be able to make use of wider time structure. But there are diminishing returns for very large windows. We went with 96 frames of 10 ms rather than exactly 100 so we could decimate by 2 five times and still get an integer size.

I would say we've informally investigated each of these choices without finding anything that makes a startling difference, but it's quite possible there's something we're missing. I'd be very glad to see more systematic and quantitative investigation.

DAn.
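The "decimate by 2 five times" arithmetic above is easy to sanity-check with a few lines (the helper below is illustrative, not from the VGGish code):

```python
def halvings(n):
    """How many times n can be divided exactly by 2."""
    count = 0
    while n > 0 and n % 2 == 0:
        n //= 2
        count += 1
    return count

# 96 frames survive five factor-2 decimations with integer sizes:
# 96 -> 48 -> 24 -> 12 -> 6 -> 3
print(halvings(96))   # 5
# Exactly 100 frames would get stuck after two: 100 -> 50 -> 25
print(halvings(100))  # 2
# 64 mel bands are a full power of 2: 64 -> 32 -> 16 -> 8 -> 4 -> 2 -> 1
print(halvings(64))   # 6
```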
On Mon, Sep 18, 2017 at 6:37 PM, <yu-cha...@t-online.de> wrote:
Hi Manoj,

I have a question about setting the hyperparameters of feature extraction. Could you tell me how you selected the values for the following parameters? Just from experience?

NUM_FRAMES = 96  # Frames in input mel-spectrogram patch.
NUM_BANDS = 64  # Frequency bands in input mel-spectrogram patch.
STFT_WINDOW_LENGTH_SECONDS = 0.025
STFT_HOP_LENGTH_SECONDS = 0.010

Best Regards,
Yu Changsong
To view this discussion on the web visit https://groups.google.com/d/msgid/audioset-users/cf32ef5a-65a7-42d6-87de-f4c1770d4f28%40googlegroups.com.
This works out to:

window_length_samples: 400
hop_length_samples: 160

And fft_length is calculated as:

fft_length = 2 ** int(np.ceil(np.log(window_length_samples) / np.log(2.0)))
fft_length: 512

So a 512-length FFT (np.fft.rfft()) is applied to a data array of size 400. My understanding is that fft_length should match the size of the data being passed. It is not clear to me how np.fft.rfft() behaves when passed fewer samples than fft_size.
Just to make my confusion more concrete, here are some questions that may help me understand this:

Is this the desired behavior (fft applied to data array smaller than fft_size)?
Why not make the window size a power of 2?
Is it conceptually okay to execute an fft with input size smaller than fft_size?
What is the behavior of np.fft.rfft() when passed fewer samples than fft_size?
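For what it's worth, NumPy documents that np.fft.rfft(a, n) zero-pads the input when it is shorter than n, so the 400-sample window is implicitly padded out to 512. A quick sketch of that equivalence (variable names are my own):

```python
import numpy as np

# Parameters as in the VGGish feature extraction discussed above.
sample_rate = 16000
window_length_samples = int(round(sample_rate * 0.025))  # 400
fft_length = 2 ** int(np.ceil(np.log(window_length_samples) / np.log(2.0)))  # 512

frame = np.random.RandomState(0).randn(window_length_samples)

# rfft(frame, 512) is equivalent to explicitly zero-padding to 512 samples:
padded = np.concatenate([frame, np.zeros(fft_length - window_length_samples)])
direct = np.fft.rfft(frame, fft_length)
via_pad = np.fft.rfft(padded)

print(np.allclose(direct, via_pad))  # True
print(direct.shape)  # (257,) == fft_length // 2 + 1
```

So passing fewer samples than fft_size is conceptually fine: it just interpolates the spectrum of the windowed frame onto a finer frequency grid.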
python vggish_inference_demo.py --wav_file helicopter.wav --tfrecord_file helicopter.tfrecord --pca_params vggish_pca_params.npz --checkpoint vggish_model.ckpt