I would like to learn a representation that is invariant to the ordering of frames
Q. If I want to use lasagne's conv1D layer, how should I be presenting each datapoint to the network?
Do I need to transpose each datapoint so that the net sees 60x300?
> I would like to learn a representation that is invariant to the ordering of frames
I assume you meant invariant to the ordering of features? (Otherwise a convolution doesn't make sense.)
> Q. If I want to use lasagne's conv1D layer, how should I be presenting each datapoint to the network?
As per the documentation, it should be of shape (batchsize, channels, frames), where "channels" would be your 60 features.

> Do I need to transpose each datapoint so that the net sees 60x300?
Either this (using a DimshuffleLayer), or you present your data as (batchsize, 1, 300, 60) and use a Conv2DLayer with filter_size=(something, 60) for the first layer, and (something, 1) for subsequent ones. The Conv1DLayer internally uses a 2D convolution anyway, so performance will probably be about the same (depends on what the underlying convolution implementation does).
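In code, a minimal sketch of the two options (filter counts and sizes are arbitrary placeholders):

import theano.tensor as T
from lasagne.layers import (InputLayer, DimshuffleLayer, Conv1DLayer,
                            Conv2DLayer)

# Option 1: Conv1DLayer wants (batchsize, channels, frames), so swap
# the last two axes of a (batchsize, 300, 60) input with a DimshuffleLayer.
x3 = T.tensor3('x')
l_in = InputLayer(shape=(None, 300, 60), input_var=x3)
l_shuf = DimshuffleLayer(l_in, (0, 2, 1))        # -> (None, 60, 300)
l_conv1d = Conv1DLayer(l_shuf, num_filters=64, filter_size=5)

# Option 2: add a channel axis and let the first 2D filter span all
# 60 features; subsequent layers then use filter_size=(something, 1).
x4 = T.tensor4('x')
l_in2 = InputLayer(shape=(None, 1, 300, 60), input_var=x4)
l_conv2d = Conv2DLayer(l_in2, num_filters=64, filter_size=(5, 60))
l_conv2d = Conv2DLayer(l_conv2d, num_filters=64, filter_size=(5, 1))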
> Thanks,
Best, Jan
Jan, I know you do a lot of work with musical audio. I am trying to classify speakers from these 3-second clips. Is that maybe too long a clip to feed to a CNN?
import lasagne
import theano.tensor as T
from lasagne.layers import (InputLayer, ReshapeLayer, DenseLayer, DropoutLayer,
                            NonlinearityLayer, MaxPool2DLayer, batch_norm)
from lasagne.layers import Conv2DLayer as ConvLayer
from lasagne.init import HeNormal

net_input = T.tensor3('inputs')  # (batchsize, 300 frames, 20 features)
network = {}
network['input'] = InputLayer(shape=(None, 300, 20), input_var=net_input)

# add a channel axis; [0] keeps the (symbolic) batch dimension
network['reshape'] = ReshapeLayer(network['input'], ([0], 1, 300, 20))

# convolutional layers
network['conv1'] = batch_norm(ConvLayer(network['reshape'], 256, (3, 20), stride=1,
                                        pad='full', flip_filters=False, W=HeNormal('relu')))
network['pool1'] = MaxPool2DLayer(network['conv1'], (3, 20))

network['conv2'] = batch_norm(ConvLayer(network['pool1'], 256, (3, 20), stride=1,
                                        pad='full', flip_filters=False, W=HeNormal('relu')))
network['pool2'] = MaxPool2DLayer(network['conv2'], (3, 20))

network['conv3'] = batch_norm(ConvLayer(network['pool2'], 256, (3, 20), stride=1,
                                        pad='full', flip_filters=False, W=HeNormal('relu')))
network['pool3'] = MaxPool2DLayer(network['conv3'], (3, 20))

# one fully connected layer with dropout
network['fc1'] = batch_norm(DenseLayer(network['pool3'], 1024,
                                       nonlinearity=lasagne.nonlinearities.rectify,
                                       W=HeNormal('relu')))
network['fc1_drop'] = DropoutLayer(network['fc1'], p=0.5)

# softmax over the 1628 speaker classes
network['fc2'] = DenseLayer(network['fc1_drop'], 1628, nonlinearity=None)
network['prob'] = NonlinearityLayer(network['fc2'], nonlinearity=lasagne.nonlinearities.softmax)
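To double-check what each layer produces, I'm printing the output shapes (a quick sketch using lasagne.layers.get_output_shape on the dict above):

from lasagne.layers import get_output_shape

# print each layer's output shape in build order
# (the symbolic batch dimension shows up as None)
for name in ['input', 'reshape', 'conv1', 'pool1', 'conv2', 'pool2',
             'conv3', 'pool3', 'fc1', 'fc1_drop', 'fc2', 'prob']:
    print(name, get_output_shape(network[name]))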
In general, is my net doing what I want it to do?
Suggestions for network architectures would be most welcome.
Is the pooling happening correctly with (3, 20), or does the size-1 rule also apply here, i.e. should only the first 2D layer span all features, with (something, 1) for the layers after it?
I am going to try to figure out whether I can (analogously) build the same network but with a 1D convolution through frequency.
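Roughly, what I have in mind is something like this (hypothetical filter sizes, not a tested architecture): the first filter would span all 300 frames and slide along the 20 frequency bins.

from lasagne.layers import InputLayer, ReshapeLayer, Conv2DLayer

# hypothetical "convolve through frequency" variant: the first filter
# covers all 300 frames and slides along the 20 feature/frequency bins
l_in = InputLayer(shape=(None, 300, 20))
l_rs = ReshapeLayer(l_in, ([0], 1, 300, 20))
l_cv = Conv2DLayer(l_rs, num_filters=256, filter_size=(300, 3))
# subsequent layers would then use filter_size=(1, something)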