i.e. we break the single deep network above into the two constituent ones below. The first one tries to minimize a loss defined as the difference between the conv3 and conv3' layers, while the second one minimizes the usual loss between the image labels and the predictions.
The algorithm looks like:
So I first forward prop the first net, then forward and back prop the second one, and then back prop the first one once the data it needs (its conv3' target) has been produced.
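To make the question concrete, here is a minimal PyTorch sketch of the training step I have in mind. The architectures, the layer sizes, and in particular the rule used to form conv3' (one gradient step on the detached activation, using the gradient from the second net's backward pass) are assumptions for illustration only; substitute your own modules and target rule.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Net1(nn.Module):
        """Front half: image -> conv3 activations (architecture is illustrative)."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),  # "conv3"
            )
        def forward(self, x):
            return self.features(x)

    class Net2(nn.Module):
        """Back half: conv3 activations -> class predictions."""
        def __init__(self, num_classes=10):
            super().__init__()
            self.head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
            )
        def forward(self, h):
            return self.head(h)

    net1, net2 = Net1(), Net2()
    opt1 = torch.optim.SGD(net1.parameters(), lr=1e-2)
    opt2 = torch.optim.SGD(net2.parameters(), lr=1e-2)
    target_step = 0.1  # step size used to form conv3' from the activation gradient (assumed)

    def train_step(images, labels):
        # 1) Forward prop the first net to get the conv3 activations.
        conv3 = net1(images)

        # 2) Forward and back prop the second net on the usual label loss.
        h = conv3.detach().requires_grad_(True)  # cut the graph between the two nets
        logits = net2(h)
        loss2 = F.cross_entropy(logits, labels)
        opt2.zero_grad()
        loss2.backward()
        opt2.step()

        # 3) Form conv3' from the gradient w.r.t. the activations (assumed rule),
        #    then back prop the first net on the conv3-vs-conv3' matching loss.
        conv3_target = (h - target_step * h.grad).detach()
        loss1 = F.mse_loss(conv3, conv3_target)
        opt1.zero_grad()
        loss1.backward()
        opt1.step()
        return loss1.item(), loss2.item()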
The thing is, I get very odd convergence behavior for the second net. The first one converges very fast, even exponentially fast, with little noise. The second one, though, converges very slowly and with a lot of noise. The learning curves look like this:
Net12 is a reference: it is the complete, unsplit net equivalent to the dual network. My question, then, is whether there is any obvious reason for such erratic behavior. I would be happy to provide more architectural details if needed.
Thanks!