Top-Down inference with Probabilistic Max Pooling - "Inverse" pooling?


David Anderson

Feb 27, 2017, 3:57:34 PM
to pylearn-dev
Hi there

I'm looking at using pylearn2.expr.probabilistic_max_pooling to handle the pooling layer in a convolutional DBN I am implementing.

I understand using the following:

max_pool(z, pool_shape, top_down=None)

to do the forward pass. This will pool my detection layer (z, which I call h_0 for the hidden units in the first layer of the DBN) down by a factor of pool_shape[0] in the first dimension and pool_shape[1] in the second, resulting in p_0 and p_0_sampled (if using theano_rng).
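
For concreteness, here is a toy numpy sketch of the shape relationship I have in mind (the (batch, rows, cols, channels) layout and the names here are just my assumption, not necessarily what pylearn2 expects):

import numpy as np

batch, rows, cols, channels = 2, 8, 8, 4            # toy h_0 (detection layer) shape
pool_shape = (2, 2)

z = np.random.randn(batch, rows, cols, channels)    # detector activations

# p_0 should be smaller by pool_shape[0] x pool_shape[1] in the spatial dims:
p_shape = (batch, rows // pool_shape[0], cols // pool_shape[1], channels)
print(p_shape)                                      # (2, 4, 4, 4)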

My question, then, is: is there an inverse, top-down way of supplying p_0 (essentially v_1, the visible units in the 2nd layer of the DBN) and un-pooling it to form h_0?

It may be a stupid question, and I may be misunderstanding how the top-down flow works with the pooling layers - I realize there is even a top_down parameter ("...representing input from above", per the docstring) - but after a bit of a look at the slow Python implementation, I'm not so sure it is doing what I'm hoping for.

I wish to determine P(h_i,j = 1 | p_0). This max_pool operation obviously requires "z", the detection layer it will be pooling, but in my mind doing the top-down inference should only require the layer being un-pooled.

In fact, I don't fully understand why the max_pool operation returns h and h_samples (the detection layer expectation and sample) when the detection layer is the thing that was being pooled down!

Any help in understanding top-down would be much appreciated!

Thanks

David Warde-Farley

Feb 27, 2017, 6:02:38 PM
to pylea...@googlegroups.com
Hi,

This code is very much unmaintained (for years now) but I think the problem here is conceptual. You should think of z and top_down as activations rather than as hidden units, and each pool as a single "unit", which decides which detector layer binary unit (if any) to turn on. You'd typically have weights between a hidden layer of independent binary units and this thing on either side (I think; read the paper for details).

It's difficult to shoe-horn something resembling max-pooling into an RBM-like framework and still have sane probabilistic semantics with tractable inference. The way they accomplished it was, for each non-overlapping m x n pool in the 2D input, to introduce an (m*n + 1)-way discrete random variable, parameterized by a softmax. The first m*n states each correspond to one of the PMP "detector" units being on, and the (m*n + 1)th state corresponds to all of them being off. If you think of the "detector layer" as m*n binary values, the "pooling layer" unit is then just a deterministic OR of those values.
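
In toy numpy terms (my own sketch of the structure, not the pylearn2 code), one pool looks like this:

import numpy as np

m, n = 2, 2                       # pool shape
n_states = m * n + 1              # one state per detector unit, plus "all off"

# Pick one of the n_states outcomes for this pool (uniformly here, just to
# illustrate the structure; the real distribution is the softmax below).
rng = np.random.default_rng(0)
state = rng.integers(n_states)

# Detector layer for this pool: at most one unit is on.
h = np.zeros(m * n, dtype=bool)
if state < m * n:
    h[state] = True

# The pooling unit is a deterministic OR of the detector units.
p = h.any()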

The way this is parameterized is a softmax over all the z's in a region. That only gives you m*n parameters, which is fine because a softmax over N states only has N-1 degrees of freedom: the probability of all units being off can be set to 1 / (1 + sum_j exp(z_j)), with unit k's probability of being on equal to exp(z_k) / (1 + sum_j exp(z_j)). Instead of that 1, however, we can modulate the whole thing with top-down input and replace the 1s in those expressions with exp(-top_down). This lets the layers above decide whether it's likely that any of the detector units fired at all. Making top_down large and positive makes exp(-top_down) small and thus shifts probability onto the "on" states of the softmax; making it large and negative increases the relative probability that all of them are off.
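
Roughly, in numpy (again a sketch in my own notation, not the library's internals):

import numpy as np

def pool_probs(z_pool, top_down=0.0):
    # z_pool: the m*n detector activations in one pool
    # top_down: scalar top-down input to that pool
    z = np.asarray(z_pool, dtype=float).ravel()
    # Softmax over the m*n "on" states plus one "off" state whose
    # logit is -top_down, so the off term in the sum is exp(-top_down).
    logits = np.concatenate([z, [-top_down]])
    logits -= logits.max()          # numerical stability
    e = np.exp(logits)
    probs = e / e.sum()
    p_on = probs[:-1]               # P(detector unit k is on), for each k
    p_off = probs[-1]               # P(all detector units are off)
    return p_on, p_off

p_on, p_off = pool_probs([0.5, -1.0, 2.0, 0.0], top_down=3.0)
p_pool = 1.0 - p_off                # P(pooling unit on) = P(OR of detectors)
# Cranking top_down up shrinks exp(-top_down), so p_off drops and the
# detector units' probabilities rise; a large negative top_down does the opposite.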

So you can't really do a proper "top down" pass or "bottom up" pass through these units as such, because the state depends on both the stuff above and the stuff below. This makes the Convolutional DBN as proposed in the paper this is from (see the comments) really more of a Convolutional Deep Boltzmann Machine (but the concept of "bottom up pass" in DBNs was questionable to begin with). The original paper does several mean-field iterations backward and forward through the whole net to overcome this, the same way the DBM papers do.
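
If it helps, here is what that back-and-forth looks like for a plain two-hidden-layer binary DBM with the visible layer clamped (a toy sketch that ignores the pooling structure entirely; the shapes and number of sweeps are arbitrary):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
v = rng.integers(0, 2, size=(1, 20)).astype(float)    # clamped visible layer
W1 = 0.1 * rng.standard_normal((20, 50))              # v  <-> h1 weights
W2 = 0.1 * rng.standard_normal((50, 30))              # h1 <-> h2 weights
b1, b2 = np.zeros(50), np.zeros(30)

mu1 = np.full((1, 50), 0.5)        # mean-field estimates for h1
mu2 = np.full((1, 30), 0.5)        # mean-field estimates for h2

for _ in range(10):                # a few sweeps back and forth
    # h1 sees bottom-up input from v AND top-down input from h2.
    mu1 = sigmoid(v @ W1 + mu2 @ W2.T + b1)
    # h2 (the top layer here) only sees input from below.
    mu2 = sigmoid(mu1 @ W2 + b2)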


David Anderson

Feb 28, 2017, 3:35:20 AM
to pylearn-dev
Thanks a lot David for your detailed reply. Appreciate it!

I think you are right - it is definitely a conceptual issue of me not fully understanding the pooling approach. I have not read a lot about DBMs, and this is obviously where I am falling short - I have only really focused on RBMs, real-valued RBMs, and finally CRBMs, and how they can be stacked to form DBNs.

Your comment about thinking of z and top_down as activations rather than the actual binary values of the units helps quite a lot. I did not realize that adding probabilistic max pooling completely changes how the units are sampled from the activations to get their final binary values. I should have realized this, as H. Lee proposes a new energy function for the pooled RBM in section 3.3. Though in the section where he introduces probabilistic max pooling he does not really present the top-down activations as part of the process of sampling h or p - it looks like only later, in section 3.6, does he describe this approach, with just a little footnote mentioning that they used a mean-field approximation in the experiments. I should have paid closer attention.

So this all seems like a lot of effort to go through when the desired goal is simply shrinking dimensionality. From the paper:
 
In our approach, probabilistic max-pooling helps to address scalability by shrinking the higher layers

I may experiment with alternating CRBM layers with small and large conv kernels, rather than max pooling, in order to get acceptable dimensionality reduction across layers. With large inputs this might not be as effective, though: a pooling ratio of 2 halves each spatial dimension of the input, so a stupidly large conv kernel would be needed to get the same reduction.
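
Quick back-of-the-envelope arithmetic (assuming 'valid' convolutions, which is just my assumption about the setup):

n = 256                        # width of one spatial dimension of the input
pooled = n // 2                # 128 after pooling with ratio 2
kernel = n - pooled + 1        # 129: 'valid' conv kernel size needed for the
                               # same reduction, since out = n - kernel + 1
print(pooled, kernel)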

Please let me know if you think I have missed the point of what you were saying above, but I think you've definitely pointed me in the right direction.

Thanks again
