Multi-task learning on alternating datasets


yazic...@gmail.com

Mar 22, 2016, 2:37:33 AM
to lasagne-users
I'm trying to construct a network for multi-task learning with two different datasets, one for each task. At some point, the network splits into two parts, and only one part is active depending on which dataset is used. I know how to construct such a network with a single dataset that contains labels for both tasks; however, here the datasets are separate. When given a sample from a specific dataset, information should propagate through only the corresponding part of the network, while the other part stays inactive.

goo...@jan-schlueter.de

Mar 22, 2016, 7:50:11 AM
to lasagne-users
Just construct your network as if you had all inputs and targets for all the tasks. Call lasagne.layers.get_output() with all output layers (i.e., for both tasks):
outputA, outputB = lasagne.layers.get_output([outputlayerA, outputlayerB])

Then construct two separate loss expressions, one per task.
lossA = something(outputA, targetA)
lossB = something(outputB, targetB)

Construct two separate update dictionaries, one per task, updating only the parameters involved in that task.
paramsA = lasagne.layers.get_all_params(outputlayerA, trainable=True)
paramsB = lasagne.layers.get_all_params(outputlayerB, trainable=True)
updatesA = lasagne.updates.nesterov_momentum(lossA, paramsA, ...)
updatesB = lasagne.updates.nesterov_momentum(lossB, paramsB, ...)

Compile two separate training functions:
train_fn_A = theano.function([inputA1, inputA2, ..., targetA], lossA, updates=updatesA)
train_fn_B = theano.function([inputB1, inputB2, ..., targetB], lossB, updates=updatesB)

The list of inputs could also contain inputs that are shared between the tasks.

Then, in your training loop, always call the function matching the task you've got a batch from.
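Putting the pieces together, a minimal end-to-end sketch might look like the following (layer sizes, loss choices, learning rate and the batch iterator are my own assumptions, not part of the recipe above):

import theano
import theano.tensor as T
import lasagne

input_var = T.matrix('inputs')    # one batch, coming from either dataset
targetA = T.ivector('targetA')    # class labels for task A
targetB = T.ivector('targetB')    # class labels for task B

# shared trunk
l_in = lasagne.layers.InputLayer((None, 100), input_var=input_var)
l_hid = lasagne.layers.DenseLayer(l_in, num_units=64)

# task-specific heads
outputlayerA = lasagne.layers.DenseLayer(l_hid, num_units=10,
        nonlinearity=lasagne.nonlinearities.softmax)
outputlayerB = lasagne.layers.DenseLayer(l_hid, num_units=5,
        nonlinearity=lasagne.nonlinearities.softmax)

# one output expression per head
outputA, outputB = lasagne.layers.get_output([outputlayerA, outputlayerB])

# one loss per task
lossA = lasagne.objectives.categorical_crossentropy(outputA, targetA).mean()
lossB = lasagne.objectives.categorical_crossentropy(outputB, targetB).mean()

# one update dictionary per task, touching only the parameters that task uses
paramsA = lasagne.layers.get_all_params(outputlayerA, trainable=True)
paramsB = lasagne.layers.get_all_params(outputlayerB, trainable=True)
updatesA = lasagne.updates.nesterov_momentum(lossA, paramsA,
        learning_rate=0.01, momentum=0.9)
updatesB = lasagne.updates.nesterov_momentum(lossB, paramsB,
        learning_rate=0.01, momentum=0.9)

# one training function per task
train_fn_A = theano.function([input_var, targetA], lossA, updates=updatesA)
train_fn_B = theano.function([input_var, targetB], lossB, updates=updatesB)

# training loop: call the function matching the dataset the batch came from
# (iterate_batches is a hypothetical generator yielding (task, X, y) tuples)
# for task, X, y in iterate_batches(datasetA, datasetB):
#     loss = train_fn_A(X, y) if task == 'A' else train_fn_B(X, y)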

Hope this helps!
Jan

goo...@jan-schlueter.de

Mar 22, 2016, 7:52:16 AM
to lasagne-users
Then, in your training loop, always call the function matching the task you've got a batch from.

(It will then only propagate the data through the part of the network that is needed to compute the loss involved in that training function, with the remaining parts of the network unused. That's the power of computational graphs :) )

yazic...@gmail.com

Mar 23, 2016, 3:24:19 AM
to lasagne-users, goo...@jan-schlueter.de

Thanks. I was trying to put both of the datasets into the same batch. I guess that was the reason I couldn't see this method. This solution seems quite close to what I want.

goo...@jan-schlueter.de

Mar 23, 2016, 5:46:54 AM
to lasagne-users
Thanks. I was trying to put both of the datasets into the same batch.

If you want to train on both at once (rather than in turns), you can follow a variation of the steps I outlined before:

Construct two separate loss expressions, one per task.

lossA = something(outputA, targetA)
lossB = something(outputB, targetB)

Construct an update dictionary for both tasks at once:
params = lasagne.layers.get_all_params([outputlayerA, outputlayerB], trainable=True)
updates = lasagne.updates.nesterov_momentum(lossA + lossB, params, ...)

Compile a training function:
train_fn = theano.function([inputA1, inputA2, ..., inputB1, inputB2, ..., targetA, targetB], [lossA, lossB], updates=updates)

This way the training function accepts inputs and targets for both tasks at once and gives the losses for both tasks. If the tasks share the same inputs, but only have different targets, your train_fn will simplify to something like theano.function([inputs, targetsA, targetsB], ...).
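For concreteness, the shared-input variant mentioned in the last sentence might look like this, reusing the (assumed) layer and variable names from the sketch earlier in the thread:

# one forward pass per head on the same input batch
outputA, outputB = lasagne.layers.get_output([outputlayerA, outputlayerB])
lossA = lasagne.objectives.categorical_crossentropy(outputA, targetA).mean()
lossB = lasagne.objectives.categorical_crossentropy(outputB, targetB).mean()

# a single update dictionary driven by the summed loss, covering both heads
# and the shared trunk
params = lasagne.layers.get_all_params([outputlayerA, outputlayerB], trainable=True)
updates = lasagne.updates.nesterov_momentum(lossA + lossB, params,
        learning_rate=0.01, momentum=0.9)

train_fn = theano.function([input_var, targetA, targetB],
        [lossA, lossB], updates=updates)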

yazic...@gmail.com

Mar 24, 2016, 2:26:58 AM
to lasagne-users, goo...@jan-schlueter.de
But in this case, how do the loss functions know which input corresponds to which target? 'inputs' is a batch of samples from two different datasets, while 'targetA' and 'targetB' are the labels of these samples, but the one-to-one mapping between samples and labels is not known to the network above. If I follow the above code, each sample gets two different targets while it should only have one. For example, 'outputA' is computed from all input samples, while 'targetA' corresponds to only part of them. Maybe I could pad 'targetA' and 'targetB' with zero vectors where they don't correspond to input samples; in that case the cross-entropy is zero for the non-corresponding samples, but it wastes some computation.

As a second option, if 'inputs' is split into 'inputs1' and 'inputs2', the correspondence problem is solved, but then both inputs must share parameters. I'm not sure whether that is the case.

goo...@jan-schlueter.de

Mar 24, 2016, 7:09:32 AM
to lasagne-users
As a second option, if 'inputs' is split into 'inputs1' and 'inputs2', the correspondence problem is solved, but then both inputs must share parameters. I'm not sure whether that is the case.

So the two tasks start from the same input representation, but not from the same samples, right? In that case, you should pass two input batches, and also do two separate forward passes through the network:
inputA = T.tensor4()  # or whatever dimensionality you have
inputB = T.tensor4()
outputA = lasagne.layers.get_output(outputlayerA, inputA)
outputB = lasagne.layers.get_output(outputlayerB, inputB)
lossA = ...
lossB = ...
params = ...([outputlayerA, outputlayerB], trainable=True)
updates = ...(lossA + lossB, params, ...)
train_fn = theano.function([inputA, inputB], [lossA, lossB], updates=updates)

Since the two forward passes share the same network parameters, the parameters will be updated using the cumulative gradient information from both tasks.
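Filled in with concrete choices (the cross-entropy losses and the update rule below are my assumptions; outputlayerA and outputlayerB stand for the two task heads of your network), the graph might read:

import theano
import theano.tensor as T
import lasagne

inputA = T.tensor4('inputA')    # batch of samples for task A
inputB = T.tensor4('inputB')    # batch of samples for task B
targetA = T.ivector('targetA')
targetB = T.ivector('targetB')

# two forward passes through the same layer stack, one per input batch
outputA = lasagne.layers.get_output(outputlayerA, inputA)
outputB = lasagne.layers.get_output(outputlayerB, inputB)

lossA = lasagne.objectives.categorical_crossentropy(outputA, targetA).mean()
lossB = lasagne.objectives.categorical_crossentropy(outputB, targetB).mean()

params = lasagne.layers.get_all_params([outputlayerA, outputlayerB], trainable=True)
updates = lasagne.updates.nesterov_momentum(lossA + lossB, params,
        learning_rate=0.01, momentum=0.9)

# the targets are listed as inputs as well, so the losses can be computed
train_fn = theano.function([inputA, inputB, targetA, targetB],
        [lossA, lossB], updates=updates)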

panpan...@gmail.com

Dec 19, 2016, 1:20:47 PM
to lasagne-users, yazic...@gmail.com


On Tuesday, March 22, 2016 at 1:37:33 AM UTC-5, yazic...@gmail.com wrote:
I'm trying to construct a network for multi-task learning with two different datasets, one for each task. At some point, the network splits into two parts, and only one part is active depending on which dataset is used. I know how to construct such a network with a single dataset that contains labels for both tasks; however, here the datasets are separate. When given a sample from a specific dataset, information should propagate through only the corresponding part of the network, while the other part stays inactive.

Could you give me some guidance on how to construct a network for multi-task learning with one dataset for two different tasks using Lasagne? It would be better if you could post some code snippets. I am a newbie at deep learning, and thank you very much.

Jan Schlüter

Jan 6, 2017, 5:30:03 PM
to lasagne-users
Could you give me some guidance on how to construct a network for multi-task learning with one dataset for two different tasks using Lasagne? It would be better if you could post some code snippets. I am a newbie at deep learning, and thank you very much.

If you have two tasks using the same input, you will probably need to use two output layers that share some intermediate layers. It's easier to demonstrate if you give some more details on what the tasks would be -- two different multi-class classification problems? Two binary ones? Two regression tasks? (The latter two can be solved with a single output layer.)
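Purely as an illustration (the layer sizes and the choice of a ten-class task plus a binary task below are made up), two output layers sharing some intermediate layers could be set up like this:

import theano
import theano.tensor as T
import lasagne

input_var = T.matrix('inputs')
target_cls = T.ivector('target_cls')   # integer class labels for task 1
target_bin = T.vector('target_bin')    # 0/1 labels for task 2

l_in = lasagne.layers.InputLayer((None, 100), input_var=input_var)
l_shared = lasagne.layers.DenseLayer(l_in, num_units=64)   # shared intermediate layer

head_cls = lasagne.layers.DenseLayer(l_shared, num_units=10,
        nonlinearity=lasagne.nonlinearities.softmax)        # multi-class head
head_bin = lasagne.layers.DenseLayer(l_shared, num_units=1,
        nonlinearity=lasagne.nonlinearities.sigmoid)        # binary head

out_cls, out_bin = lasagne.layers.get_output([head_cls, head_bin])
loss = (lasagne.objectives.categorical_crossentropy(out_cls, target_cls).mean()
        + lasagne.objectives.binary_crossentropy(out_bin.flatten(), target_bin).mean())

params = lasagne.layers.get_all_params([head_cls, head_bin], trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params,
        learning_rate=0.01, momentum=0.9)
train_fn = theano.function([input_var, target_cls, target_bin], loss, updates=updates)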

zhangm...@gmail.com

Jun 10, 2017, 4:58:08 PM
to lasagne-users
What if the numbers of examples from Dataset A and Dataset B are not identical?
Will you have some empty input for A or B?

Jan Schlüter

Jun 14, 2017, 11:36:19 AM
to lasagne-users, zhangm...@gmail.com
What if the numbers of examples from Dataset A and Dataset B are not identical? Will you have some empty input for A or B?

You could use different batch sizes for dataset A and B so they give the same number of batches per epoch, or not fully cover the larger dataset in each epoch.
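For example (the dataset sizes below are made up), you could derive the batch size for B from the number of batches A yields per epoch:

len_A, len_B = 50000, 20000                      # hypothetical dataset sizes
batch_A = 100
batches_per_epoch = len_A // batch_A             # 500 batches of A per epoch
batch_B = max(1, len_B // batches_per_epoch)     # 40 samples of B per batch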

Best, Jan