VGG19 block5_conv4 shape is 15x15x512 instead of 14x14x512?

maz...@gmail.com

Sep 12, 2017, 9:35:36 AM
to Keras-users
Hi everyone,


I am trying to use the VGG19 implementation from keras.applications to extract feature maps from images during preprocessing for the "Show, Attend & Tell" image captioning model (Paper, GitHub).

I've run into an issue with the dimensionality of the VGG output that makes the code crash during training.

According to the image captioning paper, they use "the 14×14×512 feature map of the fourth convolutional layer before max pooling" (p. 6). I find this a little ambiguous when compared with the VGG19 architecture (Paper, p. 3), but the very last convolutional layer, "block5_conv4", gets me close to the expected dimensions. This layer (and the other layers in that block), however, returns a 15×15×512 tensor.
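
For reference, here is a rough sketch of the shape arithmetic behind the expected 14×14 size. It assumes the keras.applications VGG19 layout, in which the 3×3 convolutions use "same" padding (so they keep the spatial size) and four 2×2, stride-2 max-pooling layers sit before block5_conv4. The helper name block5_conv4_side is made up for illustration and is not part of Keras. Under those assumptions, a 224-pixel input side yields 14, while an input side anywhere from 240 to 255 would yield 15, so the input size fed to the network may be worth checking.

```python
def block5_conv4_side(input_side: int) -> int:
    """Spatial side length at block5_conv4 for a square input.

    Assumption: convolutions use 'same' padding and preserve the size;
    only the four max-pooling layers before block5 shrink it.
    """
    side = input_side
    for _ in range(4):      # block1_pool .. block4_pool
        side = side // 2    # 2x2 pooling, stride 2, no padding: floor division
    return side


if __name__ == "__main__":
    for input_side in (224, 240, 255, 256):
        print(input_side, "->", block5_conv4_side(input_side))
```

If the real model is at hand, printing the output shape of the "block5_conv4" layer for a given input size should confirm or refute this arithmetic.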


I feel like I am missing something obvious. Does anyone have an idea where the extra row and column come from? How would I identify which rows/columns to exclude to match the expected dimensionality?


All best,

Matthias
