Thanks Klemen. Also, thanks for reminding me to actually read the code (it's right there clear as day). :)
Please consider the encoding layers below. If I've calculated correctly, I end up with 4x4x8 at the end.
My only issue is the MaxPool in Layer 1, which appears to leave a ragged extra end that does not fill a complete pooling window (are these ends discarded?)
pooling (2,2), stride = 2, padding = 1 on only one side means dimensions increase by 1 for width and height
conv2d padding = 1 on all sides means dimensions increase by 2 for width and height
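To sanity-check the two padding rules above, here is a minimal sketch of the per-axis arithmetic. It assumes the TensorFlow-style convention for 'same' padding (output size is ceil(input / stride), and only as much padding is added as that output size requires); other backends may pad differently even when they arrive at the same output size. The helper name same_padding is mine, not something from Keras.

```python
import math

def same_padding(n, kernel, stride):
    """Return (output_size, total_padding) along one axis,
    assuming TensorFlow-style 'same' padding (an assumption, not
    necessarily what every Keras backend does internally)."""
    out = math.ceil(n / stride)
    # Padding is only added if the windows would otherwise run off the end.
    pad_total = max((out - 1) * stride + kernel - n, 0)
    return out, pad_total

print(same_padding(28, 3, 1))  # 3x3 conv, stride 1 on 28 -> (28, 2)
print(same_padding(28, 2, 2))  # 2x2 pool, stride 2 on 28 -> (14, 0)
print(same_padding(7, 2, 2))   # 2x2 pool, stride 2 on 7  -> (4, 1)
```

Under that assumed convention, an even-sized input to a 2x2 pool with stride 2 needs no padding at all, and only the odd-sized 7x7 input picks up one extra row and column.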
MNIST input: 28,28,1
# Layer 1
x = Convolution2D(16, 3, 3, input_shape=(1, 28, 28), activation='relu', border_mode='same')(input_img)
28x28 -> 30x30 (with padding), convolve with 3x3 gives 28x28
resulting size: 16x28x28
x = MaxPooling2D((2, 2), border_mode='same')(x)
28x28 -> 29x29 (padded), pool with 2x2 stride=2 gives 14x14 with a remainder of 1 (is the ragged end discarded? what happens here?)
resulting size: 16x14x14
# Layer 2
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
14x14 -> 16x16 (padded), conv 3x3 gives 14x14
resulting size: 8x14x14
x = MaxPooling2D((2, 2), border_mode='same')(x)
14x14 -> 15x15 (padded), pool 2x2 stride=2 gives 7x7 (same remainder of 1 as in Layer 1)
resulting size: 8x7x7
# Layer 3
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
7x7 -> 9x9 (padded), conv 3x3 gives 7x7
resulting size: 8x7x7
x = MaxPooling2D((2, 2), border_mode='same')(x)
7x7 -> 8x8 (padded), pool 2x2, stride=2 gives 4x4
resulting size: 8x4x4
# at this point the representation is (8, 4, 4) i.e. 128-dimensional
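The whole walk-through can also be checked with a few lines of plain Python. This is only a sketch: it assumes that every 'same' layer yields ceil(input / stride) outputs per axis, with stride 1 for the 3x3 convolutions and stride 2 for the 2x2 pools, and it does not call Keras at all. The names same_out and trace_encoder are hypothetical helpers of mine.

```python
def same_out(n, stride):
    """Output length along one axis for a 'same' layer: ceil(n / stride)."""
    return -(-n // stride)

def trace_encoder(size=28, filters=(16, 8, 8)):
    """List the (channels, height, width) shape after each conv and pool."""
    shapes = []
    for f in filters:
        size = same_out(size, 1)  # 3x3 conv, stride 1, 'same'
        shapes.append((f, size, size))
        size = same_out(size, 2)  # 2x2 max pool, stride 2, 'same'
        shapes.append((f, size, size))
    return shapes

for shape in trace_encoder():
    print(shape)
# (16, 28, 28)
# (16, 14, 14)
# (8, 14, 14)
# (8, 7, 7)
# (8, 7, 7)
# (8, 4, 4)
```

The last shape, (8, 4, 4), has 8 * 4 * 4 = 128 entries, which matches the 128-dimensional representation.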