How should a Convolution2D layer be added after an Embedding layer?


Kasper Marstal

Nov 17, 2016, 9:55:29 AM11/17/16
to Keras-users
Hi all,

I am struggling to do 2D convolution on sentences with word embeddings. My sentences are encoded as n-by-1 vectors where each element points to a row in an m-by-d matrix, where m is the number of words in the vocabulary and d is the dimension of the embedding (all fairly standard). The network below is passed a p-by-n matrix, where p is the batch size and n is the number of words per sentence (padded to the maximum sentence length).

Using the code below I get the error "ValueError: Filter must not be larger than the input: Filter: (3, 3) Input: (1, 128)", which seems to suggest that the embedding layer is skipped entirely and that the convolution layer operates directly on the p-by-n matrix (p = batch size; (1, 128) is one sample in a batch). This is obviously not true, as many of you are successfully using the embedding layer. Can anyone see what I am doing wrong? I get the same error with and without the Reshape() layer. Code below.

Thanks in advance!!
Kasper

from keras.models import Sequential
from keras.layers import (Embedding, Reshape, Convolution2D, Activation,
                          MaxPooling2D, Dropout, Flatten, Dense)

VOCAB_DIM = 1024
EMBEDDING_DIM = 128
NUMBER_OF_SAMPLES = 256
MAX_SEQUENCE_LENGTH = 32
BATCH_SIZE = 16
dropout = 0.5

model = Sequential()
model.add(Embedding(VOCAB_DIM, EMBEDDING_DIM, batch_size=BATCH_SIZE, input_length=MAX_SEQUENCE_LENGTH))

# Convolution layers
model.add(Reshape((1, MAX_SEQUENCE_LENGTH, EMBEDDING_DIM)))
model.add(Convolution2D(16, 3, 3, init='uniform', border_mode='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(16))
model.add(Dropout(dropout))

# Output layer
model.add(Flatten())
model.add(Dense(1))
model.add(Activation('sigmoid'))

# Build network
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])

dim...@gmail.com

Nov 18, 2016, 11:22:37 AM11/18/16
to Keras-users
Kasper, 

Why not use 1d convolution (as in the IMDB example that comes with Keras)?
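
Something along these lines, from memory (Keras 1 API of the time; the hyperparameters are illustrative, not taken from the example itself):

from keras.models import Sequential
from keras.layers import Embedding, Convolution1D, GlobalMaxPooling1D, Dense, Activation

model = Sequential()
model.add(Embedding(1024, 128, input_length=32))    # vocab size, embedding dim, padded length
model.add(Convolution1D(16, 3, activation='relu'))  # 16 filters, each spanning 3 words
model.add(GlobalMaxPooling1D())                     # max over time: one value per filter
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])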

Dima

Kasper Marstal

Nov 22, 2016, 7:15:11 AM11/22/16
to Keras-users, dim...@gmail.com
Hi Dima

I am implementing Convolutional Neural Networks for Sentence Classification by Kim (https://arxiv.org/pdf/1408.5882v2.pdf), which uses 2D convolution. I appreciate your suggestion, but my question is not about whether to use 2D convolution, but rather how to do it. Do you know how to set this up? Thank you for your input!

Kasper

Daπid

Nov 22, 2016, 7:29:57 AM11/22/16
to Kasper Marstal, Keras-users, dim...@gmail.com
On 22 November 2016 at 13:15, Kasper Marstal <kasper...@gmail.com> wrote:
> I am implementing Convolutional Neural Networks for Sentence Classification
> by Kim et al (https://arxiv.org/pdf/1408.5882v2.pdf) which uses 2D
> convolution.


No, it uses 1D convolution. Look carefully at figure 1.

Kasper Marstal

Nov 22, 2016, 10:40:34 AM11/22/16
to Keras-users, kasper...@gmail.com, dim...@gmail.com
Hi David

Ahh, is it 1D convolution with filters of width equal to the embedding dim? 

Kasper

Daπid

Nov 22, 2016, 10:43:48 AM11/22/16
to Kasper Marstal, Keras-users, dim...@gmail.com
On 22 November 2016 at 16:40, Kasper Marstal <kasper...@gmail.com> wrote:
>
> Ahh, is it 1D convolution with filters of width equal to the embedding dim?

No, the width along the embedding dimension comes for free in a 1D convolution (just as the number of channels comes for free in a 2D one). The widths are 2 for the red filters and 3 for the yellow ones in the illustration.
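
In Keras that could look something like this (Keras 1 functional API; all sizes are illustrative, not taken from the paper):

from keras.models import Model
from keras.layers import Input, Embedding, Convolution1D, GlobalMaxPooling1D, Dense, merge

words = Input(shape=(32,), dtype='int32')           # padded word indices
emb = Embedding(1024, 128, input_length=32)(words)  # one 32-by-128 matrix per sentence

branches = []
for width in (2, 3):                                # the "red" and "yellow" filter widths
    c = Convolution1D(16, width, activation='relu')(emb)  # each kernel spans width words x 128 dims
    branches.append(GlobalMaxPooling1D()(c))        # max-over-time pooling per branch

merged = merge(branches, mode='concat')             # concatenate the pooled features
out = Dense(1, activation='sigmoid')(merged)
model = Model(input=words, output=out)
model.compile(loss='binary_crossentropy', optimizer='adam')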

Kasper Marstal

Nov 30, 2016, 4:38:21 AM11/30/16
to Keras-users, kasper...@gmail.com, dim...@gmail.com
I am up and running now, David. Thank you very much for your help!

For any googlers out there, my source of confusion stems from the fact that 1D convolution in deep learning jargon uses a 2D kernel in the strict mathematical sense. However, one of the dimensions of the 2D kernel is equal to the number of channels, so the kernel only moves in one direction.
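
A quick numpy sketch of what I mean (all sizes illustrative):

import numpy as np

n, d, k = 32, 128, 3             # words per sentence, embedding dim, filter width
sentence = np.random.rand(n, d)  # embedded sentence, one row per word
kernel = np.random.rand(k, d)    # a 2D kernel in the strict mathematical sense

# The kernel spans the full embedding dimension, so it can only slide
# along the word (time) axis: n - k + 1 positions in total.
feature_map = np.array([np.sum(sentence[i:i + k] * kernel)
                        for i in range(n - k + 1)])
print(feature_map.shape)  # (30,) == (n - k + 1,)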

(Let me know if I am wrong).

yuanlia...@gmail.com

Oct 27, 2017, 5:50:22 AM10/27/17
to Keras-users
I know this is old, but I wonder if anyone has seriously tried 2D convolution on sequences of word embeddings?
I suspect convolving over the components of the word embeddings (not just the time steps) may capture something useful too. After all, a word vector maps semantics to locations (components).
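
Something like this, perhaps (Keras 2 API; an untested sketch with illustrative sizes):

from keras.models import Sequential
from keras.layers import Embedding, Reshape, Conv2D, GlobalMaxPooling2D, Dense

model = Sequential()
model.add(Embedding(1024, 128, input_length=32))  # vocab size, embedding dim, padded length
model.add(Reshape((32, 128, 1)))                  # (words, embedding components, channels)
model.add(Conv2D(16, (3, 3), activation='relu'))  # kernel slides over words AND components
model.add(GlobalMaxPooling2D())
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')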

aakash...@gmail.com

Mar 28, 2019, 10:58:26 AM3/28/19
to Keras-users
Hi everyone,

I am facing the same issue of adding a Conv2D after an Embedding layer. Can anyone please provide code for this?


model_query = Sequential()
model_query.add(Embedding(input_dim=len(query_wordindex),
                          output_dim=EMBEDDING_DIM,
                          weights=[query_embed_matrix],
                          input_length=max_query_words,
                          trainable=False))

model_query.add(Reshape((1, max_query_words, EMBEDDING_DIM)))

model_query.add(Conv2D(4, 3, EMBEDDING_DIM, activation='tanh', data_format='channels_first'))

I am using the code above; please help.
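
For later readers, a likely fix (an untested sketch): in the Keras 2 API, Conv2D takes the kernel size as a single tuple argument, so Conv2D(4, 3, EMBEDDING_DIM, ...) passes EMBEDDING_DIM as the strides argument rather than as the kernel's second dimension. Passing the kernel size as one tuple that spans the full embedding dimension gives the Kim-style convolution discussed earlier in the thread:

from keras.models import Sequential
from keras.layers import Embedding, Reshape, Conv2D

model_query = Sequential()
model_query.add(Embedding(input_dim=len(query_wordindex),    # variables as in the post above
                          output_dim=EMBEDDING_DIM,
                          weights=[query_embed_matrix],
                          input_length=max_query_words,
                          trainable=False))
model_query.add(Reshape((1, max_query_words, EMBEDDING_DIM)))  # one input channel, channels first
model_query.add(Conv2D(4, (3, EMBEDDING_DIM),                  # 4 filters, each 3 words x full dim
                       activation='tanh',
                       data_format='channels_first'))

Since the kernel covers the whole embedding dimension, this moves along the word axis only, i.e. it is equivalent to the 1D convolution described earlier in the thread.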