Convolutional Auto-Encoder in Caffe, but still without pooling-unpooling layers


Volodymyr Turchenko

Dec 7, 2015, 9:37:59 PM
to Caffe Users

Hi Caffe Users,

 

We have created a working Convolutional Auto-Encoder in Caffe, though still without pooling-unpooling layers. The CAE works using the modified version of Caffe at https://github.com/HyeonwooNoh/caffe.

We have published the details of its creation in a paper on arXiv:

 

http://arxiv.org/ftp/arxiv/papers/1512/1512.01596.pdf

 

The paper is also attached to this post below.


I have attached the following folder/files (in zipped form):

/VisInsideCAE – the folder which contains the files used to produce Fig. 7 from the paper.

All files in the root folder – .prototxt files and Matlab files – were used for training the CAE model (only 3 of the total 9 runs) and for its visualization. I have also included the resulting files and figures for these 3 runs, as well as the MNIST test set in the file <mnist_test.mat>, and I have slightly changed the .m files (the loop is still not optimized, sorry) to work with this <mnist_test.mat> file. For the original visualization we used another, larger MNIST test set file; that is why the attached figures are not exactly the same as in the paper. The .m files which visualize the 10- and 30-dimensional CAEs require t-SNE (https://lvdmaaten.github.io/tsne/) to be installed on your machine/Matlab.

 

I hope this small result will help the Caffe community to create a CAE with pooling-unpooling layers, and will also answer some questions I have seen before in this group: how to visualize the network itself and how to use t-SNE for visualization.

 

I would appreciate any feedback on the paper, especially about how we calculated the number of trainable parameters in the encoder and decoder parts. In some papers where Caffe was used, I have seen another way to calculate the number of trainable parameters, so our approach may be inaccurate.
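For what it is worth, the usual count for Caffe layers is weights plus biases. A minimal sketch (the conv1 sizes below are inferred from the 28x28 -> 20x20 shape change in the attached log, so treat them as an assumption rather than the paper's exact numbers):

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Trainable parameters of a (de)convolution layer: one kernel per
    (input channel, output channel) pair, plus one bias per output channel."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels

def fc_params(in_dim, out_dim):
    """Trainable parameters of a fully connected (InnerProduct) layer:
    a weight matrix plus one bias per output."""
    return out_dim * in_dim + out_dim

# conv1 as suggested by the log: 1 input channel, 8 feature maps, 9x9 kernels
print(conv_params(1, 8, 9))   # 8*1*9*9 + 8 = 656
print(fc_params(100, 10))     # 10*100 + 10 = 1010
```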

 

Cheers,

Vlad


PS. I have deleted <mnist_test.mat> from the zip archive; it is about 4 GB, which does not allow me to publish this post. I believe I took it from GitHub somewhere.

1512.01596.pdf
CAEzip.zip

Chun-Hsien Lin

May 11, 2016, 10:36:20 PM
to Caffe Users
Hi Volodymyr,


How do you apply the caffemodel to only the encoder part?


Volodymyr Turchenko

Jun 3, 2016, 1:44:06 AM
to Caffe Users
Hi guys,

I would like to inform you that I uploaded a corrected version of this paper to arXiv on Apr 22, 2016.
Please do NOT use the version of the paper attached to this post (above).

The corrected version of the paper is available in ArXiv at the same link:
http://arxiv.org/ftp/arxiv/papers/1512/1512.01596.pdf

Cheers,
Vlad

On Monday, December 7, 2015 at 19:37:59 UTC-7, Volodymyr Turchenko wrote:

Volodymyr Turchenko

Jun 3, 2016, 1:47:08 AM
to Caffe Users
Hi Chun-Hsien,

In the published CAEzip.zip there is a file <mnistCAE10sym0202.prototxt>.
This is a prototxt description of the encoder part.
There is also an .m script, <visualCAE02pcsFig1m.m>: it calls Caffe from Matlab, uses this encoder part together with the .caffemodel file (available as well), and produces the 2-dimensional output of this CAE
in the variable <pc2> of the Matlab script. Similar code is in the other .m scripts for the 10- and 30-dimensional CAEs.
I hope that answers your question.

Cheers,
Vlad

On Wednesday, May 11, 2016 at 20:36:20 UTC-6, Chun-Hsien Lin wrote:

Bruno Nascimento

Aug 23, 2016, 12:52:40 PM
to Caffe Users
Hi guys, if I'm not mistaken, these prototxt files have outdated syntax. Any chance we can get updated ones?
Also, another issue: in the prototxt file, the last deconvolution layer has 1 output:
# -- convert back to 784 elements --
layers {
  name: "deconv1neur"
  type: DECONVOLUTION
  bottom: "deconv1"
  top: "deconv1neur"
  blobs_lr: 1
  blobs_lr: 3
  convolution_param {
    num_output: 1
    kernel_size: 1
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}

Isn't num_output supposed to be 784 (28x28) for the MNIST dataset?
Thank you

Volodymyr Turchenko

Aug 23, 2016, 1:30:40 PM
to Caffe Users
Hi Bruno,

1. How to deal with the outdated syntax is explained in the last paragraph before section "4 Conclusions" in our arXiv paper. Anyway, I am attaching the prototxt file with the new syntax.

2. num_output should be 1; it is correct. It restores a 1-channel image with dimensions 28x28. Details are also in the arXiv paper.

Cheers,
Vlad

On Tuesday, August 23, 2016 at 10:52:40 UTC-6, Bruno Nascimento wrote:
train-mnistCAE12sym0302.prototxt

Bruno Nascimento

Aug 24, 2016, 10:19:19 AM
to Caffe Users
Thank you for your reply. I was reading the updated paper you have at https://arxiv.org/ftp/arxiv/papers/1512/1512.01596.pdf
and saw exactly what you meant about the reshape layer. This prototxt helped me a lot; thank you for that.
Another question: I am still applying this to the MNIST dataset, which is 28x28, but eventually I would like to apply it to other datasets, e.g. to encode patterns in images much larger than 28x28. After training, how can I apply the learned model in a fully connected network fashion, so that I could eventually get a feature map of the output?
Thank you very much for your help.

Volodymyr Turchenko

Aug 25, 2016, 1:34:36 AM
to Caffe Users
Hi Bruno,

If I understood your question correctly, you are asking how to obtain the encoded low-dimensional representation?
See my answer above to Chun-Hsien Lin.
If this is not the question you wanted to ask, please re-formulate it; I don't understand it.

Cheers,
Vlad

On Wednesday, August 24, 2016 at 08:19:19 UTC-6, Bruno Nascimento wrote:

Bruno Nascimento

Aug 25, 2016, 5:52:36 AM
to Caffe Users
Hi again Vlad,
No, my question is different; I will try to explain better. I see that this particular network architecture accepts only 28x28 images (in this case, for the MNIST dataset). My real question is: how can I apply a convolutional autoencoder like this one, or a similar one, to images with larger input sizes, e.g. 100x100 or 576x576?

If I had a MNIST model trained with 28x28 inputs in this architecture, but wanted to apply the learned patterns to a 100x100 image like the one above, how could I do it? Any ideas?

Bruno Nascimento

Aug 25, 2016, 10:49:28 AM
to Caffe Users
Another question, Vlad; sorry to bother you.
I tried to use this same model for 32x32 input data instead of the 28x28 used for MNIST. I had to adapt the deconvolution kernel sizes to make it work, like this:

layer {
  name: "deconv2"
  type: "Deconvolution"
  bottom: "ip1decodesh"
  top: "deconv2"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 16 
    kernel_size: 14 #instead of 12   <-------------------------
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}

...


layer {
  name: "deconv1"
  type: "Deconvolution"
  bottom: "deconv2"
  top: "deconv1"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 6
    kernel_size: 19 # instead of 17   <-------------------------
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}

Is this something you recommend? Are there any rules of thumb for adapting this?
Thank you very much for your time. 

Bruno

Volodymyr Turchenko

Aug 25, 2016, 9:31:39 PM
to Caffe Users

Hi Bruno,

 

I think you should read more about convolutional/deconvolution layers, because you need to understand that when we perform a convolution operation we decrease the size of the output feature maps, and when we perform a deconvolution operation we increase the size of the output feature maps. This is how the encoding-decoding paradigm works.

Read the paper by Masci et al. (2011):

J. Masci, U. Meier, D. Ciresan, J. Schmidhuber, Stacked convolutional auto-encoders for hierarchical feature extraction, Lecture Notes in Computer Sci. 6791 (2011) 52-59.

In that paper, on page 3, in the paragraph before formula 3, the authors explain two formulas (you can find them in many other papers) for how the sizes of the output feature maps decrease and increase. According to them, the convolution layer in Caffe implements a 'valid' convolution operation and the deconvolution layer in Caffe implements a 'full' convolution operation.
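For concreteness: with stride 1 and no padding, a 'valid' convolution of an HxH input with a kxk filter gives an output of size H-k+1, and a 'full' convolution (deconvolution) gives H+k-1. A small sketch checking this against the conv1 line of the attached log (28x28 input, 20x20 output, which implies a 9x9 filter; the 9 is my inference, not a quoted number):

```python
def valid_out(in_size, kernel):
    # 'valid' convolution (Caffe Convolution layer, stride 1, no padding)
    return in_size - kernel + 1

def full_out(in_size, kernel):
    # 'full' convolution (Caffe Deconvolution layer, stride 1, no padding)
    return in_size + kernel - 1

print(valid_out(28, 9))  # 20, matching "Top shape: 100 8 20 20" in the log
print(full_out(20, 9))   # 28: a full deconvolution with the same kernel restores the size
```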

 

Once you understand this, I suggest looking at the file 'mnistCAE10sym0202-03.log' in my CAEzip.zip archive. This is a log of how the network runs; there you can see how the sizes decrease in the encoder part and then increase in the decoder part. Start reading from these lines:

 

I1123 14:58:29.456792 31450 net.cpp:105] Top shape: 100 1 28 28 (78400)

I1123 14:58:29.456801 31450 net.cpp:105] Top shape: 100 1 28 28 (78400)

I1123 14:58:29.456809 31450 net.cpp:105] Top shape: 100 1 28 28 (78400)

I1123 14:58:29.456816 31450 net.cpp:115] Memory required for data: 1254400

I1123 14:58:29.456823 31450 layer_factory.hpp:78] Creating layer conv1

I1123 14:58:29.456851 31450 net.cpp:69] Creating Layer conv1

I1123 14:58:29.456861 31450 net.cpp:396] conv1 <- data_data_0_split_0

I1123 14:58:29.456887 31450 net.cpp:358] conv1 -> conv1

I1123 14:58:29.456908 31450 net.cpp:98] Setting up conv1

I1123 14:58:29.457490 31450 net.cpp:105] Top shape: 100 8 20 20 (320000)

I1123 14:58:29.457504 31450 net.cpp:115] Memory required for data: 2534400

I1123 14:58:29.457533 31450 layer_factory.hpp:78] Creating layer sig1en

I1123 14:58:29.457551 31450 net.cpp:69] Creating Layer sig1en

I1123 14:58:29.457561 31450 net.cpp:396] sig1en <- conv1

I1123 14:58:29.457584 31450 net.cpp:347] sig1en -> conv1 (in-place)

I1123 14:58:29.457599 31450 net.cpp:98] Setting up sig1en

 

It corresponds to Table 2 in my paper; read this table carefully and think about it. Then you will understand everything.

 

So, when you have to create a model that works with other image sizes, the goal is to restore the same image size at the output, so you should play with the filter sizes (second column of Table 2) so that they give you the same output image size at the end of the decoder part. You can either calculate everything theoretically (using these two formulas), or run the model with some initial filter sizes: it probably will not work, BUT you can then see in the log file what happens inside the network from layer to layer, and you will see where the error is. This gives you an idea of what filter sizes you need for your particular problem.
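As a sketch of that "play with the filter sizes" step: for stride-1 layers, each 'full' deconvolution with a kxk kernel grows the map by k-1, so the decoder kernels just have to add back what the encoder removed. Assuming the decoder starts from the 1x1 maps produced after the Reshape of the fully connected layer (as the ip1encode "100 250 1 1" shape in the log suggests), the numbers quoted earlier in this thread check out:

```python
def decoder_out(start, kernels):
    """Spatial size after a chain of stride-1 'full' deconvolutions,
    starting from a start x start map: each k x k kernel grows the map by k-1."""
    size = start
    for k in kernels:
        size += k - 1
    return size

# Starting from 1x1 maps, the deconvolution kernels must satisfy
# 1 + sum(k_i - 1) = image size (the final 1x1 deconv adds nothing).
print(decoder_out(1, [12, 17, 1]))  # 28 -> the original MNIST decoder kernels
print(decoder_out(1, [14, 19, 1]))  # 32 -> Bruno's adapted kernels for 32x32 inputs
```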

 

Cheers,

Vlad



On Thursday, August 25, 2016 at 08:49:28 UTC-6, Bruno Nascimento wrote:

Volodymyr Turchenko

Jan 20, 2017, 12:54:25 AM
to Caffe Users
Hi Caffe Users,

There is an update on this work: a Convolutional Auto-Encoder with pooling-unpooling layers.

https://arxiv.org/abs/1701.04949

The CAE models studied in this paper are attached in the .zip file.


Cheers,
Vlad

On Monday, December 7, 2015 at 19:37:59 UTC-7, Volodymyr Turchenko wrote:

deepCAE5models.zip

Ayesha Siddiqua

Mar 24, 2017, 5:46:32 PM
to Caffe Users
Hi,

Would you please provide the Matlab visualization files? I cannot find them.

Many Thanks
Ayesha

Volodymyr Turchenko

Mar 27, 2017, 11:41:39 AM
to Caffe Users
Hi Ayesha,

Did you, by any chance, look inside the <deepCAE5models.zip> archive? All the Matlab visualization files are there.

Cheers,
Vlad

On Friday, March 24, 2017 at 17:46:32 UTC-4, Ayesha Siddiqua wrote:

Nathan Ing

May 5, 2017, 5:30:01 PM
to Caffe Users
Hi Volodymyr,
Thanks for sharing your model defs. I'm glad that this problem is already well studied. Full disclosure: I have just been reading about the various implementations of CAEs and haven't tried anything yet. That said, I have three questions:

1. Since master Caffe has a Deconvolution layer, is it still necessary to use the modified repo you linked?
2. At the bottom there is a set of nice images comparing the input with the learned reconstruction. Is it possible to do the same with Caffe's CAE? I imagine taking the blob "deconv1neur".
3. I see that your data comes from HDF5 files not included in the .zip: did you do any preprocessing of these images? I see no data transformations in the prototxt model defs.

Thanks!

Nathan Ing

May 5, 2017, 7:05:03 PM
to Caffe Users
For anyone trying to use the master branch of Caffe to make this work:
RE #1:
The "log" files included in the zip show that the inner product layers have "dummy" dimensions for H and W:
I1123 14:58:29.458076 31450 net.cpp:98] Setting up ip1encode
I1123 14:58:29.485141 31450 net.cpp:105] Top shape: 100 250 1 1 (25000)

I had to add a "Reshape" layer to add this dummy shape in order for the Deconvolution layers to do their thing.

I hope there's nothing wrong with this approach. The loss seems to decrease over the first couple of thousand iterations.
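Such a Reshape layer might look like this in current Caffe syntax (a sketch only: the bottom/top blob names follow the ip1decodesh naming used elsewhere in this thread, and dim: 250 matches the 250-dimensional layer in the log above; substitute your own layer's output dimension, or use dim: -1 to infer it):

```protobuf
layer {
  name: "ip1decodesh"
  type: "Reshape"
  bottom: "ip1decode"
  top: "ip1decodesh"
  reshape_param {
    # dim: 0 copies the batch size; the trailing 1 x 1 adds the dummy H and W
    shape { dim: 0 dim: 250 dim: 1 dim: 1 }
  }
}
```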

Thanks again for sharing the method!

Nathan Ing

May 6, 2017, 11:41:39 AM
to Caffe Users
By the way, I'm using caffe-segnet, which has "Unpooling" implemented as an "Upsample" layer.

It takes different parameters, which are still a bit mysterious to me, but I see some improved results.

Volodymyr Turchenko

May 6, 2017, 8:11:27 PM
to Caffe Users
Hi Nathan,

Yes, you are welcome! Here I will try to answer all your questions:

1. The repo by HyeonwooNoh contains his implementation (*.cu and *.cpp) of the unpooling layer in Caffe. I have checked it very carefully; it works correctly.
You should either use this repo (it is an old version of Caffe, dated April 2015) or just copy the code into another repo and compile everything together.
I am now using HyeonwooNoh's unpooling layer with the newest version of Caffe (not Caffe2), and everything works perfectly.

2. I had not seen the post you linked before. Now that I have, their results suggest that everything works correctly there.
Yes, it is definitely possible to get the same nice reconstructed images with Caffe's CAE. Have you by chance read my papers about the Caffe CAE implementation?
https://arxiv.org/pdf/1512.01596
https://arxiv.org/pdf/1701.04949

I showed some examples of reconstructed images there.

3. No, I did not do any data transformation. I just had the MNIST dataset as a .mat file and converted it to HDF5 format (from 2D to 4D).
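That 2D-to-4D conversion can be sketched in Python (an assumption on my part: NumPy and h5py installed, and random data standing in for MNIST; Caffe's HDF5Data layer expects N x C x H x W arrays):

```python
import h5py
import numpy as np

# Stand-in for the MNIST matrix: one flattened 784-element row per image.
images_2d = np.random.rand(100, 784).astype(np.float32)

# Reshape 2D -> 4D (N x C x H x W) as Caffe's HDF5Data layer expects: 100 x 1 x 28 x 28.
images_4d = images_2d.reshape(-1, 1, 28, 28)

with h5py.File("mnist_cae.h5", "w") as f:
    f.create_dataset("data", data=images_4d)
    # For an auto-encoder, the target is the input image itself.
    f.create_dataset("label", data=images_4d)
```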

4. Yes, you are right about the "Reshape" layer. I described this little trick in my first paper (1512.01596), on page 7, before the Conclusions.

5. I am not familiar with the caffe-segnet version yet.
But, as I said, HyeonwooNoh's implementation of the unpooling layer works correctly and exactly implements the ideas described in the following papers:
[28] M.D. Zeiler, D. Krishnan, G.W. Taylor, R. Fergus, Deconvolutional networks, in: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, San Francisco, CA, 2010, pp. 2528-2535.
[29] M.D. Zeiler, G.W. Taylor, R. Fergus, Adaptive deconvolutional networks for mid and high level feature learning, in: 2011 IEEE International Conference on Computer Vision (ICCV), IEEE, Barcelona, 2011, pp. 2018-2025.

See more results about Caffe's CAE with pooling-unpooling layers in my paper
https://arxiv.org/pdf/1701.04949

Cheers,
Vlad



On Saturday, May 6, 2017 at 11:41:39 UTC-4, Nathan Ing wrote:

Nathan Ing

May 8, 2017, 11:22:03 AM
to Caffe Users
Hello, thanks for the detailed response.

In the time since posting, I've used that 'caffe-segnet' repo to produce some pretty satisfying results. For my application, the unpooling operation makes a great difference. One problem with what you mentioned in #1 is that the old repo still uses "vision_layers.hpp", while newer versions have unrolled this into many include/layers/*.hpp files. Since I don't want to dig into C++ code beyond an occasional copy-paste, integrating his implementation seems too time-consuming (but I'm a C++ novice, so maybe it's easier than I think!). I should probably use their repo by itself and compare the results.

23577...@qq.com

Jun 14, 2017, 2:03:11 PM
to Caffe Users
Hi Volodymyr,
Thanks for sharing your model. I'm glad that this problem is already well studied. I have now downloaded the five model configuration files and want to train the model. Looking at the configuration files, I have two questions:
(1) In train-mnistCAE14sym02.sh there is a line "TOOLS=/global/software/caffem/bin"; I cannot find this bin directory in my own Caffe. What is in it?
(2) In train-mnistCAE14sym0202.prototxt there is a line source: "/global/home/vtu/caffemVTU/01data/trainnormfile.txt". What is in "trainnormfile.txt"?
At the moment I only have the five model configuration files and my own Caffe, and I want to train the model, but these questions came up; I think I am missing some files which are necessary for training. So, do you have the whole project on GitHub? I think that would be helpful for me.
Thanks!
On Tuesday, December 8, 2015 at 10:37:59 UTC+8, Volodymyr Turchenko wrote: