Which criterion to use with nn.MixtureTable()?


Alexander Weiss

Mar 26, 2015, 8:23:37 PM
to tor...@googlegroups.com
I am interested in using nn.MixtureTable() to implement a "Mixture of Experts" network.

The current documentation for nn.MixtureTable() gives two examples of how to build a Mixture of Experts and perform a forward pass through it. However, it does not show how to train it. In particular, I am confused about which Criterion would be appropriate, and which softmax functions (on both the gate and the experts) would work with a given choice of Criterion. In each of the examples in the documentation, the gate ends in an nn.SoftMax() layer and the experts have no softmax layers. In my case, the experts are classifiers, so I'm inclined to put nn.LogSoftMax() layers at the end of each one.

Here are my questions:

1)  Is there an available criterion in Torch7 that would be appropriate for training a Mixture of Experts architecture in which the experts are classifier-type neural nets? The loss function must include information from both the gate and the experts, so nn.ClassNLLCriterion() on its own certainly can't work. The old paper "Adaptive Mixtures of Local Experts" by Jacobs et al. provides a possible loss function (see equation 1.3). Is such a function currently available in Torch7?

2)  If there is an appropriate criterion available, which softmax function should be used on the gate?  And which softmax function should be used on the experts?

3)  Does anyone have a link to a useful example in which nn.MixtureTable() is used?
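For what it's worth, here is one possible sketch (untested, and an assumption on my part rather than anything from the documentation): if each expert ends in nn.SoftMax() rather than nn.LogSoftMax(), then nn.MixtureTable() forms a valid convex combination of probability vectors, and appending nn.Log() lets the standard nn.ClassNLLCriterion() be applied to the mixture as a whole, i.e. the loss is the negative log of the gate-weighted mixture of expert probabilities. All layer sizes below are made up for illustration.

```lua
require 'nn'

local inputSize, hiddenSize, numClasses, numExperts = 10, 20, 4, 3

-- gate: produces mixture weights with SoftMax, as in the documentation examples
local gate = nn.Sequential()
gate:add(nn.Linear(inputSize, numExperts))
gate:add(nn.SoftMax())

-- experts: classifiers ending in SoftMax (probabilities, not log-probabilities,
-- so that MixtureTable computes a valid mixture distribution)
local experts = nn.ConcatTable()
for i = 1, numExperts do
   local expert = nn.Sequential()
   expert:add(nn.Linear(inputSize, hiddenSize))
   expert:add(nn.Tanh())
   expert:add(nn.Linear(hiddenSize, numClasses))
   expert:add(nn.SoftMax())
   experts:add(expert)
end

-- feed the same input to the gate and to every expert
local trunk = nn.ConcatTable()
trunk:add(gate)
trunk:add(experts)

local moe = nn.Sequential()
moe:add(trunk)
moe:add(nn.MixtureTable())
moe:add(nn.Log())          -- log of the mixed probabilities

local criterion = nn.ClassNLLCriterion()

-- one training step
local x, y = torch.randn(inputSize), torch.random(numClasses)
local out  = moe:forward(x)
local loss = criterion:forward(out, y)
moe:zeroGradParameters()
moe:backward(x, criterion:backward(out, y))
moe:updateParameters(0.01)
```

Note that this mixes probabilities and then takes the log, which is not term-by-term identical to equation 1.3 of Jacobs et al. (their formulation mixes per-expert losses rather than per-expert outputs), so treat it only as one plausible construction.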

Mata Fu

Nov 29, 2016, 9:20:54 AM
to torch7
I have the same problem. Can anyone help?

On Friday, March 27, 2015 at 1:23:37 AM UTC+1, Alexander Weiss wrote: