CDBN for Audio Data


John Bell

Jan 1, 2016, 6:42:09 AM
to Caffe Users
I have a CDBN......


A Python script to create spectrograms from MP3.....


So it looks like the tools are coming together: pre-processing seems OK for making a spectrogram and applying PCA. I am a little unclear on how to train the network layer by layer...

"First, we extracted the spectrogram from each utterance of the TIMIT training data [13]. The spectrogram had a 20 ms window size with 10 ms overlaps. The spectrogram was further processed using PCA whitening (with 80 components) to reduce the dimensionality. We then trained 300 first-layer bases with a filter length (n_W) of 6 and a max-pooling ratio (local neighborhood size) of 3. We further trained 300 second-layer bases using the max-pooled first-layer activations as input, again with a filter length of 6 and a max-pooling ratio of 3."

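For reference, the pre-processing in that passage might be sketched roughly as below. This is only a NumPy illustration under my own assumptions, not the paper's code: the Hann window, the use of a plain magnitude spectrogram, the `eps` regularizer, the 16 kHz sample rate, and the function names `spectrogram` and `pca_whiten` are all hypothetical choices. Only the 20 ms window, 10 ms overlap, and 80 components come from the quote.

```python
import numpy as np

def spectrogram(signal, sample_rate, window_ms=20.0, overlap_ms=10.0):
    """Magnitude spectrogram of a 1-D signal, shape (n_frames, n_freq_bins)."""
    win = int(round(sample_rate * window_ms / 1000.0))
    hop = win - int(round(sample_rate * overlap_ms / 1000.0))
    n_frames = 1 + (len(signal) - win) // hop
    window = np.hanning(win)  # assumption: the quote does not name a window
    frames = np.stack([signal[i * hop:i * hop + win] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def pca_whiten(frames, n_components=80, eps=1e-5):
    """Project frames onto the top principal components with unit variance."""
    mean = frames.mean(axis=0)
    centered = frames - mean
    cov = centered.T @ centered / (len(frames) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    proj = eigvecs[:, order] / np.sqrt(eigvals[order] + eps)
    return centered @ proj, mean, proj

# Synthetic stand-in for one utterance: 2 seconds of noise at 16 kHz.
rng = np.random.default_rng(0)
sr = 16000
audio = rng.standard_normal(sr * 2)

spec = spectrogram(audio, sr)          # (frames, frequency bins)
white, mean, proj = pca_whiten(spec)   # (frames, 80)
print(spec.shape, white.shape)
```

In practice `mean` and `proj` would presumably be estimated once over the whole training set and then reused for every utterance, rather than fitted per file as in this toy run.
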
So how do we train each layer as per the protocol?
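My current understanding of the greedy layer-by-layer idea, written as a rough NumPy sketch so it is clear what I am asking about. It is heavily simplified and every name in it (`train_crbm`, `train_stack`, and so on) is made up for illustration: it uses a 1-D convolutional RBM over time with real-valued visible units, one step of contrastive divergence, full-batch updates on a single utterance, and ordinary max pooling of the hidden probabilities. It does not implement probabilistic max pooling or any sparsity penalty, and it uses 16 bases instead of 300 to keep the toy run fast.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def windows(v, n_w):
    """Stack n_w time-shifted views of v: (T, C) -> (T - n_w + 1, n_w, C)."""
    t_h = v.shape[0] - n_w + 1
    return np.stack([v[j:j + t_h] for j in range(n_w)], axis=1)

def hidden_probs(v, W, b):
    """Activation probability of each base at each valid time step."""
    return sigmoid(np.einsum('tjc,kjc->tk', windows(v, W.shape[1]), W) + b)

def reconstruct(h, W, a):
    """Mean of the real-valued visible units given hidden activations."""
    contrib = np.einsum('tk,kjc->tjc', h, W)
    t_h, n_w, _ = contrib.shape
    v = np.tile(a, (t_h + n_w - 1, 1))
    for j in range(n_w):
        v[j:j + t_h] += contrib[:, j]
    return v

def train_crbm(v, n_bases, n_w=6, epochs=10, lr=1e-3, seed=0):
    """Train one convolutional RBM layer with CD-1 on a single sequence."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((n_bases, n_w, v.shape[1]))
    b = np.zeros(n_bases)       # hidden biases, one per base
    a = np.zeros(v.shape[1])    # visible biases, one per input channel
    for _ in range(epochs):
        h0 = hidden_probs(v, W, b)                        # positive phase
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = reconstruct(h_sample, W, a)                  # negative phase
        h1 = hidden_probs(v1, W, b)
        t_h = h0.shape[0]
        W += lr * (np.einsum('tk,tjc->kjc', h0, windows(v, n_w))
                   - np.einsum('tk,tjc->kjc', h1, windows(v1, n_w))) / t_h
        b += lr * (h0 - h1).mean(axis=0)
        a += lr * (v - v1).mean(axis=0)
    return W, b

def max_pool(h, ratio=3):
    """Non-overlapping max over time; drops any leftover frames."""
    t_p = h.shape[0] // ratio
    return h[:t_p * ratio].reshape(t_p, ratio, h.shape[1]).max(axis=1)

def train_stack(v, n_bases_per_layer, n_w=6, ratio=3):
    """Greedy stacking: train a layer, freeze it, pool, feed the next layer."""
    layers, x = [], v
    for n_bases in n_bases_per_layer:
        W, b = train_crbm(x, n_bases, n_w)
        layers.append((W, b))
        x = max_pool(hidden_probs(x, W, b), ratio)
    return layers, x

# Toy run on random data standing in for PCA-whitened frames (120 x 80).
whitened = np.random.default_rng(1).standard_normal((120, 80))
layers, top = train_stack(whitened, [16, 16])
print(top.shape)
```

The point of the loop in `train_stack` is that the first layer is trained to completion and then only used in the forward direction; the second layer never sees the spectrogram, only the max-pooled first-layer activations. Whether this matches the protocol in the quote closely enough is exactly what I would like confirmed.
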

Regards,

Daniel