TupleType


Ian Goodfellow

Apr 22, 2013, 10:05:49 PM
to thean...@googlegroups.com
I think it could be useful to have a TupleVariable / TupleType in theano. There are a few subtleties involved in this though.

1) Inplace stuff:

Suppose we make a MakeTuple op that joins some variables into a tuple:

tuple_var = make_tuple(singleton_var_1, singleton_var_2)

We'd like to be able to do this:
elem_1 = tuple_var[0]
assert elem_1 is singleton_var_1

But that will require that make_tuple has a view_map. Are we allowed to put views in the precompilation graph? I think so, but don't remember for sure.

2) Gradients:

Suppose we now say:

cost = g(singleton_var_1) + h(singleton_var_2)

I think in order to compute

T.grad(cost, tuple_var)

we need to special-case TupleVariable in T.grad itself.
T.grad should have a preprocessing step that unpacks any TupleVariables in the wrt list and known_grads dictionary (checking for aliasing in the latter case). It can then call itself on the flattened version, and package the return value up into tuples again before returning. If there are nested tuples, this will recurse as many times as the nesting is deep.
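
The flatten / call / repackage step described above can be sketched in plain Python. This is only an illustration under stated assumptions: `flatten`, `repack`, and `grad_nested` are hypothetical helper names, `grad_fn` stands in for a gradient routine such as T.grad called on a flat list, and the aliasing check for known_grads is left out. None of this is existing Theano API.

```python
def flatten(nested):
    """Return the leaves of an arbitrarily nested tuple as a flat list."""
    if isinstance(nested, tuple):
        flat = []
        for item in nested:
            flat.extend(flatten(item))
        return flat
    return [nested]


def repack(template, flat):
    """Rebuild the nesting of `template` using the values in `flat`."""
    it = iter(flat)

    def build(node):
        if isinstance(node, tuple):
            return tuple(build(child) for child in node)
        return next(it)

    return build(template)


def grad_nested(grad_fn, cost, wrt):
    """Flatten `wrt`, call `grad_fn` once, then restore the nesting."""
    flat_wrt = flatten(wrt)
    flat_grads = grad_fn(cost, flat_wrt)
    return repack(wrt, flat_grads)


# Stand-in for a real gradient routine (hypothetical): it labels each
# input so the structure of the result is easy to inspect.
def fake_grad(cost, wrt_list):
    return ["d%s/d%s" % (cost, w) for w in wrt_list]


result = grad_nested(fake_grad, "cost", ("a", ("b", "c")))
```

With this sketch the gradient routine itself never sees a tuple; only the thin wrapper knows about nesting.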


3) C code

Ideally we'd like to have theano functions compile down to pure C code, but now we have a python object as the runtime value of a theano variable. Is this a problem? We can just use the python C api, right?


Does anyone foresee any other subtleties?

Olivier Delalleau

Apr 22, 2013, 10:28:32 PM
to thean...@googlegroups.com
Just a few initial thoughts / answers / questions:

#1: I don't think we should necessarily have "elem_1 is singleton_var_1". After all the index could be a symbolic variable, so I don't see why we should have to deal with constants in a special way.

#3. The Python C API should be able to handle Python tuples within C code.

What main use cases do you have in mind that would benefit from such a TupleVariable?

-=- Olivier



2013/4/22 Ian Goodfellow <goodfel...@gmail.com>


David Warde-Farley

Apr 22, 2013, 10:42:44 PM
to thean...@googlegroups.com

We can, but I think introducing more reliance on the Python C API is probably contrary to your other stated preferences re: reducing dependence of generated C code on the Python runtime. But lots of Python objects are being passed around in C code currently, and I don't think adding handling for tuples would add much to the amount of work eventually needed to remove that dependence.

>
> Does anyone foresee any other subtleties?
>

Ian Goodfellow

Apr 22, 2013, 10:48:02 PM
to thean...@googlegroups.com
Quite a lot of use cases, really.

In pylearn2, it's pretty common to allow variables passed between methods to be either tuples or theano variables.
We use this to represent things like (detector_layer, pooling_layer) of a convolutional network.

We're also trying to move pylearn2 beyond the typical setup of having input features X and optionally output targets y.
We want to move to setups like having more than one kind of input (example: you have video and audio input) or more
than one kind of output (you're trying to predict a class label, and pose information). From the perspective of writing
generic pylearn2 code, it makes the most sense for the training algorithm code to just work on a single generic theano
variable called "data". To handle this efficiently, data will need to be a tuple / nested tuple.

The problem is, all of the code acting on such a variable has to have a branch that takes different actions depending on
whether data is a variable or a tuple, because tuples don't have owner, ancestors, name, etc. It's also not possible to
substitute in a value for a tuple using the givens dictionary.

There's a lot of existing pylearn2 code that could also be greatly simplified by being able to regard multiple theano
variables as one variable. For example, if we could view multiple shared variables as just being one "params"
a lot of the code for stuff like conjugate gradient descent would have way map/reduce operations that need to be
written out explicitly.

Using a TupleVariable to represent (real, imaginary) would also be an easy way to get complex number support on GPU,
provided that you wrote the right helper functions to do complex math on it using real math ops as primitives.
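
As a rough sketch of what such helper functions might look like, the following uses NumPy arrays as stand-ins for GPU tensors. The names `cadd` and `cmul` are hypothetical, and a real version would operate on Theano variables rather than arrays.

```python
import numpy as np


def cadd(a, b):
    """Add two complex values stored as (real, imaginary) tuples."""
    return (a[0] + b[0], a[1] + b[1])


def cmul(a, b):
    """Multiply (ar + i*ai) by (br + i*bi) using only real operations."""
    ar, ai = a
    br, bi = b
    return (ar * br - ai * bi, ar * bi + ai * br)


x = (np.array([1.0, 2.0]), np.array([3.0, -1.0]))
y = (np.array([0.5, 4.0]), np.array([2.0, 0.0]))

prod_re, prod_im = cmul(x, y)

# Cross-check against NumPy's native complex arithmetic.
expected = (x[0] + 1j * x[1]) * (y[0] + 1j * y[1])
```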

Frédéric Bastien

Apr 23, 2013, 9:18:57 AM
to theano-dev
Why tuple and not list?

Implementing a tuple is probably easier, as we always know the size when we build the graph, but I think a list object would be more useful, for example to allow scan to work on a batch of images of different sizes.

More comments below.


On Mon, Apr 22, 2013 at 10:48 PM, Ian Goodfellow <goodfel...@gmail.com> wrote:
> Quite a lot of use cases, really.
>
> In pylearn2, it's pretty common to allow variables passed between methods to be either tuples or theano variables.
> We use this to represent things like (detector_layer, pooling_layer) of a convolutional network.
>
> We're also trying to move pylearn2 beyond the typical setup of having input features X and optionally output targets y.
> We want to move to setups like having more than one kind of input (example: you have video and audio input) or more
> than one kind of output (you're trying to predict a class label, and pose information). From the perspective of writing
> generic pylearn2 code, it makes the most sense for the training algorithm code to just work on a single generic theano
> variable called "data". To handle this efficiently, data will need to be a tuple / nested tuple.

What operations do you want to do on the tuple? Only make them and access them like in Python? If you want more operations, someone will need to reimplement each of them for tuples.
 
> The problem is, all of the code acting on such a variable has to have a branch that takes different actions depending on
> whether data is a variable or a tuple, because tuples don't have owner, ancestors, name, etc. It's also not possible to
> substitute in a value for a tuple using the givens dictionary.

Why not just always put the theano variable in a tuple of length one? We do that in a few places in Theano to simplify the code. You only add two lines, then you know for sure it is a tuple. If this workaround works well, I don't think there is a good enough reason to do the work needed for a new Variable/Type in Theano. Is there a reason why this "fix" wouldn't work for pylearn2?

> There's a lot of existing pylearn2 code that could also be greatly simplified by being able to regard multiple theano
> variables as one variable. For example, if we could view multiple shared variables as just being one "params"
> a lot of the code for stuff like conjugate gradient descent would have way map/reduce operations that need to be
> written out explicitly.

Sorry, I don't understand the last sentence: "would have way map/reduce operations that need to be
written out explicitly."?
 
> Using a TupleVariable to represent (real, imaginary) would also be an easy way to get complex number support on GPU,
> provided that you wrote the right helper functions to do complex math on it using real math ops as primitives.

I don't think this is the right approach. The new gpu ndarray has them partly working. In any case, we shouldn't reimplement that part, but reuse what CUDA or OpenCL provide and, if that isn't enough, reuse NumPy's C implementation. From memory, that implementation doesn't depend on Python.
 
Fred

Ian Goodfellow

Apr 23, 2013, 10:18:29 AM
to thean...@googlegroups.com
On Tue, Apr 23, 2013 at 9:18 AM, Frédéric Bastien <no...@nouiz.org> wrote:
> Why tuple and not list?

I'm thinking of each TupleVariable as being indexable at compile-time using
constant indices to get the individual Variables back out. The container is thus
an important part of the compile-time Variable not just the value it
takes at runtime.
For that to be hashable / immutable, it needs to be a tuple and not a list.

We could also have a variable that represents something that is a list
at runtime,
and can only be interacted with symbolically at runtime, but that has
very different
uses.

>
> Implementing tuple is probably easier, as we always know the size when we
> build the graph, but I think a list object would be more useful. For example
> to allow scan to work on batch of image of different size.

That's fine, it's just not the use case I'm trying to address right now.


>
> More comments below.
>
>
> On Mon, Apr 22, 2013 at 10:48 PM, Ian Goodfellow <goodfel...@gmail.com>
> wrote:
>>
>> Quite a lot of use cases, really.
>>
>> In pylearn2, it's pretty common to allow variables passed between methods
>> to be either tuples or theano variables.
>> We use this to represent things like (detector_layer, pooling_layer) of a
>> convolutional network.
>>
>> We're also trying to move pylearn2 beyond the typical setup of having
>> input features X and optionally output targets y.
>> We want to move to setups like having more than one kind of input
>> (example: you have video and audio input) or more
>> than one kind of output (you're trying to predict a class label, and pose
>> information). From the perspective of writing
>> generic pylearn2 code, it makes the most sense for the training algorithm
>> code to just work on a single generic theano
>> variable called "data". To handle this efficiently, data will need to be a
>> tuple / nested tuple.
>
>
> What operation do you want to do on the tuple? Only to make them and access
> them like in Python? If you want more operation, someone will need to
> reimplement all the operation wanted on them.

The main things I want theano to support are:
tuple key to function's updates dictionary
tuple key to function's givens dictionary
tuple in wrt and known_grads of T.grad
create tuple
constant index access of tuple
less important: symbolic index access of tuple

This *enables* end users to do very useful things with the tuple, even if theano
does nothing more.

>
>>
>> The problem is, all of the code acting on such a variable has to have a
>> branch that takes different actions depending on
>> whether data is a variable or a tuple, because tuples don't have owner,
>> ancestors, name, etc. It's also not possible to
>> substitute in a value for a tuple using the givens dictionary.
>
>
> Why not just always put the theano variable in a tuple of length one?

Mostly because right now our special case code is to just give up on
doing something useful
if the variable is a tuple. Tuples don't have a name field, for example.

Also because it would be hideously inconvenient to write absolutely
everything that way,
especially compared to modifying a handful of theano functions once
and never worrying
about it again.


> We do
> that in Theano at a few places to simplify the code. You only add two lines,

It's not 2 lines, it's NK lines, where N is the number of features we
ever write in pylearn2,
and K is the number of lines it takes to implement whatever feature of
Type / Variable is
missing from the raw tuple that we wanted to use.

> then you know for sure it is a tuple. Doing a new Variable/Type in Theano if
> this work around work well is not a good enough reason for the work needed I
> think. Is there reason why this "fix" wouldn't work for pylearn2?
>
>> There's a lot of existing pylearn2 code that could also be greatly
>> simplified by being able to regard multiple theano
>> variables as one variable. For example, if we could view multiple shared
>> variables as just being one "params"
>> a lot of the code for stuff like conjugate gradient descent would have way
>> map/reduce operations that need to be
>> written out explicitly.
>
>
> Sorry, I don't understand the last sentence: "would have way map/reduce
> operations that need to be
> written out explicitly."?

For example, a dot product over a tuple of tensors is done by mapping elemwise
product and summation followed by a reduction with +:

def dot(tuple_A, tuple_B):
    return sum((A*B).sum() for A, B in safe_zip(tuple_A, tuple_B))

Doing things like conjugate gradient where you take the dot product
over an unknown
amount of variables requires constantly packing things into tuples for
calling this dot
product function, and then unpacking / flattening them to tell theano
how to do the updates
or take gradients.


>
>>
>> Using a TupleVariable to represent (real, imaginary) would also be an easy
>> way to get complex number support on GPU,
>> provided that you wrote the right helper functions to do complex math on
>> it using real math ops as primitives.
>
>
> I don't think this is the right approach. The new gpu ndarray have them
> working partly. In all case, we shouldn't re implement that part, but reuse
> what CUDA or OpenCL provide and if that isn't enough, reuse the NumPy C
> implementation that they have done. From memory this implementation don't
> depend on python.
>
> Fred
>

Frédéric Bastien

Apr 23, 2013, 10:40:11 AM
to theano-dev
On Tue, Apr 23, 2013 at 10:18 AM, Ian Goodfellow <goodfel...@gmail.com> wrote:
On Tue, Apr 23, 2013 at 9:18 AM, Frédéric Bastien <no...@nouiz.org> wrote:
[...]
> On Mon, Apr 22, 2013 at 10:48 PM, Ian Goodfellow <goodfel...@gmail.com>
> wrote:
>>
>> Quite a lot of use cases, really.
>>
>> In pylearn2, it's pretty common to allow variables passed between methods
>> to be either tuples or theano variables.
>> We use this to represent things like (detector_layer, pooling_layer) of a
>> convolutional network.
>>
>> We're also trying to move pylearn2 beyond the typical setup of having
>> input features X and optionally output targets y.
>> We want to move to setups like having more than one kind of input
>> (example: you have video and audio input) or more
>> than one kind of output (you're trying to predict a class label, and pose
>> information). From the perspective of writing
>> generic pylearn2 code, it makes the most sense for the training algorithm
>> code to just work on a single generic theano
>> variable called "data". To handle this efficiently, data will need to be a
>> tuple / nested tuple.
>
>
> What operation do you want to do on the tuple? Only to make them and access
> them like in Python? If you want more operation, someone will need to
> reimplement all the operation wanted on them.

The main things I want theano to support are:
    tuple key to function's updates dictionary
    tuple key to function's givens dictionary

No need for a new Type/Variable in Theano. Just change the code that deals with the updates/givens parameters to accept tuples.
 
    tuple in wrt and known_grads of T.grad

You know this code better than I do, but I suppose it is the same as for updates/givens.
 
    create tuple
    constant index access of tuple

Why do you need that if we modify the code that deals with the updates/givens parameters to accept tuples?
 
    less important: symbolic index access of tuple

This *enables* end users to do very useful things with the tuple, even if theano
does nothing more.

What does it enable beyond my proposal? We shouldn't take making a new Type/Variable lightly, as this adds complexity in many places in Theano.


>> The problem is, all of the code acting on such a variable has to have a
>> branch that takes different actions depending on
>> whether data is a variable or a tuple, because tuples don't have owner,
>> ancestors, name, etc. It's also not possible to
>> substitute in a value for a tuple using the givens dictionary.
>
>
> Why not just always put the theano variable in a tuple of length one?

Mostly because right now our special case code is to just give up on
doing something useful
if the variable is a tuple. Tuples don't have a name field, for example.

Can you give an example? In Theano, we just do the same thing on all elements in the tuple.
 
Also because it would be hideously inconvenient to write absolutely
everything that way,
especially compared to modifying a handful of theano functions once
and never worrying
about it again.

If you need a new Type/Variable (I'm not yet convinced of that), you need more than modifying a handful of Theano functions. My proposal, by contrast, only requires that.
 
> We do
> that in Theano at a few places to simplify the code. You only add two lines,

It's not 2 lines, it's NK lines, where N is the number of features we
ever write in pylearn2,
and K is the number of lines it takes to implement whatever feature of
Type / Variable is
missing from the raw tuple that we wanted to use.

In my case if N*2 where N is small. So no problem.
 
> then you know for sure it is a tuple. Doing a new Variable/Type in Theano if
> this work around work well is not a good enough reason for the work needed I
> think. Is there reason why this "fix" wouldn't work for pylearn2?
>
>> There's a lot of existing pylearn2 code that could also be greatly
>> simplified by being able to regard multiple theano
>> variables as one variable. For example, if we could view multiple shared
>> variables as just being one "params"
>> a lot of the code for stuff like conjugate gradient descent would have way
>> map/reduce operations that need to be
>> written out explicitly.
>
>
> Sorry, I don't understand the last sentence: "would have way map/reduce
> operations that need to be
> written out explicitly."?

For example, a dot product over a tuple of tensors is done by mapping elemwise
product and summation followed by a reduction with +:

def dot(tuple_A, tuple_B):
    return sum((A*B).sum() for A, B in safe_zip(tuple_A, tuple_B))

Doing things like conjugate gradient where you take the dot product
over an unknown
amount of variables requires constantly packing things into tuples for
calling this dot
product function, and then unpacking / flattening them to tell theano
how to do the updates
or take gradients.

Here is a more user friendly dot function.


def dot(orig_A, orig_B):
    # Wrap bare variables so both inputs can be treated as tuples.
    tuple_A = orig_A
    tuple_B = orig_B
    if not isinstance(orig_A, tuple):
        tuple_A = (orig_A,)
    if not isinstance(orig_B, tuple):
        tuple_B = (orig_B,)
    # The sum is a scalar whether or not the inputs were tuples.
    return sum((A*B).sum() for A, B in safe_zip(tuple_A, tuple_B))


This function accepts either a tuple or a single variable for each input. The result is a scalar in both cases, so it is returned directly.

But why do you need that dot function? It will probably be slower than calling tensor.dot(A,B). Or is this just a dummy example?

Fred

Razvan Pascanu

Apr 23, 2013, 10:56:28 AM
to thean...@googlegroups.com
On Tue, Apr 23, 2013 at 10:18 AM, Ian Goodfellow <goodfel...@gmail.com> wrote:
On Tue, Apr 23, 2013 at 9:18 AM, Frédéric Bastien <no...@nouiz.org> wrote:
> Why tuple and not list?

I'm thinking of each TupleVariable as being indexable at compile-time using
constant indices to get the individual Variables back out.

This is strange for Theano. It kind of breaks what a Theano variable means: a Theano variable is a symbolic variable.
For example:
    T = theanoTuple(X, Y)
    assert T[0] == X

I think you want that assertion to hold. This is not possible, as T[0] is a symbolic expression
that tells you how to get the first element of T; it is not X itself. In the same way, if you do

    a, b = TT.scalars('a', 'b')
    v = TT.stack(a, b)
    assert v[0] == a

this assertion will also fail.
 
The container is thus
an important part of the compile-time Variable not just the value it
takes at runtime.
For that to be hashable / immutable, it needs to be a tuple and not a list.

We could also have a variable that represents something that is a list
at runtime,
and can only be interacted with symbolically at runtime, but that has
very different
uses.

That however would be a proper Theano type. We should be careful to maintain semantics when we add things in Theano, otherwise people who are already complaining that they do not understand how Theano is implemented will have a lot more to complain about.

 
I'm not saying a tuple would not be nice to have in Theano, but:
 (a) I think you underestimate how much work will be needed on the Theano side to have this working properly, including documentation, tutorials and so on
 (b) there are other things that are more urgent for Theano, and I think the lack of a tuple type is not blocking development in pylearn2
so IMHO there are better things to focus on for Theano.
This, however, should not stop you from implementing a TupleType. My points (a) and (b) are more that I wouldn't want to see someone like Fred or Pascal working on this rather than on other open issues or going through PRs.

Pascal Lamblin

Apr 23, 2013, 11:04:22 AM
to thean...@googlegroups.com
On Mon, Apr 22, 2013, Ian Goodfellow wrote:
> I think it could be useful to have a TupleVariable / TupleType in theano.
> There are a few subtleties involved in this though.
>
> 3) C code
>
> Ideally we'd like to have theano functions compile down to pure C code, but
> now we have a python object as the runtime value of a theano variable. Is
> this a problem? We can just use the python C api, right?
>
> Does anyone foresee any other subtleties?

The way I see it, Theano defines strongly-typed variables. So I don't
think having a Python-like tuple would really help us much. What we
should have is C++-like tuples, specialized with the number of elements
and Type of each one.

For instance, I think if we want to be able to write things like
(params, params - lr * grad(cost, params)) for update rules, that would
be needed to make sure that the multiplication and subtraction make
sense for the tuples in question, and to make sure that the update
expression has the same Type as the variable.


That being said, as Fred mentioned, using these types directly in C
code would mean having to reimplement lots of Ops (or extend current
implementation) to take this new Type into account.

Another way may be a Python structure representing a typed tuple of
Variables. It would be possible to define operations on that structure,
like addition and so on, that would create new Theano variables and
pack them into other such tuples. theano.function could be extended (or
wrapped) to accept these tuples as inputs, outputs, and in update and
givens rules. That way, the underlying graph does not contain tuples,
the optimizations and Op implementations do not have to be changed, but
we can have a nicer interface to deal with tuples.

I'm not sure it would actually make possible everything you have in
mind, though, and it may still be a lot of work.
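
One possible shape for such a structure is sketched below. It is an assumption-laden illustration, not a proposal for the actual implementation: `VarTuple` is a hypothetical name, plain Python floats stand in for Theano variables, and only a few arithmetic operations are shown. Each operation is applied element by element and the results are packed into a new container, so nothing tuple-like would ever need to enter the underlying graph.

```python
class VarTuple(tuple):
    """Hypothetical fixed-length container of variables with an optional name."""

    def __new__(cls, items, name=None):
        obj = super().__new__(cls, items)
        obj.name = name
        return obj

    def _combine(self, other, op):
        # Broadcast a single value across all elements.
        if not isinstance(other, tuple):
            other = (other,) * len(self)
        if len(other) != len(self):
            raise ValueError("tuple lengths differ")
        return VarTuple(op(a, b) for a, b in zip(self, other))

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __sub__(self, other):
        return self._combine(other, lambda a, b: a - b)

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

    __rmul__ = __mul__


w, b = 1.0, 2.0  # floats standing in for Theano variables
params = VarTuple((w, b), name="params")
grads = VarTuple((0.5, 0.25))
lr = 0.1

# The update expression from the text: params - lr * grad
new_params = params - lr * grads
```

Constant indexing returns the very object that was packed in (`params[0] is w`), because the container is an ordinary Python tuple underneath.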

--
Pascal

Ian Goodfellow

Apr 23, 2013, 11:38:08 AM
to thean...@googlegroups.com
On Tue, Apr 23, 2013 at 11:04 AM, Pascal Lamblin
<lamb...@iro.umontreal.ca> wrote:
> On Mon, Apr 22, 2013, Ian Goodfellow wrote:
>> I think it could be useful to have a TupleVariable / TupleType in theano.
>> There are a few subtleties involved in this though.
>>
>> 3) C code
>>
>> Ideally we'd like to have theano functions compile down to pure C code, but
>> now we have a python object as the runtime value of a theano variable. Is
>> this a problem? We can just use the python C api, right?
>>
>> Does anyone foresee any other subtleties?
>
> The way I see it, Theano defines strongly-typed variables. So I don't
> think having a Python-like tuple would really help us much. What we
> should have is C++-like tuples, specialized with the number of elements
> and Type of each one.

Yes, this is what I want.


>
> For instance, I think if we want to be able to write things like
> (params, params - lr * grad(cost, params)) for update rules, that would
> be needed to make sure that the multiplication and subtraction make
> sense for the tuples in question, and to make sure that the update
> expression has the same Type as the variable.
>
>
> That being said, as Fred mentioned, using these types directly in C
> code would mean having to reimplement lots of Ops (or extend current
> implementation) to take this new Type into account.

With the stuff I've been talking about, the only Op that would see a tuple
at runtime would be the symbolic tuple accessor.

>
> Another way may be a Python structure representing a typed tuple of
> Variables.
> It would be possible to define operations on that structure,
> like addition and so on, that would create new Theano variables and
> pack them into other such tuples. theano.function could be extended (or
> wrapped) to accept these tuples as inputs, outputs, and in update and
> givens rules. That way, the underlying graph does not contain tuples,
> the optimizations and Op implementations do not have to be changed, but
> we can have a nicer interface to deal with tuples.
>
> I'm not sure it would actually make possible everything you have in
> mind, though, and it may still be a lot of work.
>
> --
> Pascal
>

Ian Goodfellow

Apr 23, 2013, 11:51:41 AM
to thean...@googlegroups.com
Your sentence doesn't make any sense, but let me remind you
that N is the number of things that we write in pylearn2, so if you
insist that N be small you are saying to quit developing pylearn2.
The whole point of pylearn2 is that N is meant to be large, so that you can
rapidly prototype new models. N is usually 3 or 4 per day for me.

Also, where do you get K=2 from?

In the following example, where all we care about is the name, I get that K=10.

Suppose we want to do something like
y = f(x)
x_name = x.name
if x_name is None:
    x_name = 'x'
y.name = 'f(' + x_name + ')'

If we had a TupleVariable the above code would just work.

If x is allowed to be a tuple and not a named TupleVariable we have to
do something like

y = f(x)
def make_name(var, i):
    if var.name is None:
        return 'x[%d]' % i
    return var.name

if isinstance(x, tuple):
    whole_name = '{' + ','.join(make_name(component, i) for i, component in enumerate(x)) + '}'
    for i, elem in enumerate(y):
        elem.name = 'f(' + whole_name + ')[%d]' % i
else:
    x_name = x.name
    if x_name is None:
        x_name = 'x'
    y.name = 'f(' + x_name + ')'

This is even the simple case, where I've assumed that if x is a tuple,
then so is y.
K would need to be even higher if f can output a single theano
variable given a tuple.

That doesn't have anything to do with the problem I'm interested in solving.
The problem isn't internal to the dot function. The dot function wants a tuple;
theano wants variables everywhere.


>
> But why do you need that dot function? It will probably be slower than calling
> tensor.dot(A,B). Or is this just a dummy example?

It doesn't do the same thing as tensor.dot(A, B), so they're not comparable.
tensor.dot(A, B) does matrix-matrix multiplication.
The dot example I gave here for conjugate gradient does a vector-vector dot
product, where the vectors have been split into chunks and reshaped because
that's how it's convenient to write the model.
For example, the model might want to have a 4D stack of convolution kernels
and a 3D map of biases. But conjugate gradient wants to have just a vector
representing all the parameters, and be able to take dot products with it.
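
To make the equivalence concrete, here is a small NumPy check (a sketch only: NumPy arrays stand in for Theano tensors, the shapes are made up, and `tuple_dot` and `as_vector` are hypothetical names). The chunked dot product over the tuple gives the same number as flattening everything into one long vector and calling the ordinary vector dot product.

```python
import numpy as np


def tuple_dot(tuple_a, tuple_b):
    """Dot product of two parameter sets stored as tuples of arrays."""
    if len(tuple_a) != len(tuple_b):
        raise ValueError("tuples must have the same length")
    return sum((a * b).sum() for a, b in zip(tuple_a, tuple_b))


def as_vector(chunks):
    """Concatenate all chunks into one flat parameter vector."""
    return np.concatenate([c.ravel() for c in chunks])


rng = np.random.RandomState(0)
# A 4D stack of convolution kernels and a 3D map of biases.
params = (rng.randn(2, 3, 4, 4), rng.randn(2, 5, 5))
direction = (rng.randn(2, 3, 4, 4), rng.randn(2, 5, 5))

chunked = tuple_dot(params, direction)
flat = np.dot(as_vector(params), as_vector(direction))
```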

Read pylearn2.training_algorithms.bgd and look at the wikipedia link it gives
as a reference if you still don't understand this use case.

Ian Goodfellow

Apr 23, 2013, 11:53:42 AM
to thean...@googlegroups.com
Have you figured out how to make the Monitor work for your pylearn2 PR?
I was working on that last night and I don't see a very clean way of
doing it without this.


> so IMHO there are better things to focus on for Theano.
> This however should not stop you from implementing a TupleType.. My points
> (a,b) are more that I wouldn't want to see someone like Fred or Pascal
> working on this rather than other opened issues or going through PRs.
>

Ian Goodfellow

Apr 23, 2013, 11:56:53 AM
to thean...@googlegroups.com
I did some work on prototyping the Variable and Type definitions
themselves in pylearn2.sandbox if that helps make it more clear how I
intend for these to work:

https://github.com/goodfeli/pylearn/commit/ed7cd16875157d2a83a3aad3135980035de6990b

On Tue, Apr 23, 2013 at 11:53 AM, Ian Goodfellow

Razvan Pascanu

Apr 23, 2013, 11:57:31 AM
to thean...@googlegroups.com
I thought Pascal already fixed that. Actually, with Pascal's PR from last night, I'm trying right now to see if I can run things. I'll let you know if it breaks at the Monitor class.

To work with monitors, you always build a composite from the spaces that each new channel asks for. When you want to construct a Theano function, you flatten that space and remove all duplicates.
You then have to switch between these two views (flattened and nested).
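
A minimal sketch of switching between the two views, assuming nested tuples of variable-like objects; `flatten_unique`, `rebuild`, and the `Var` stand-in class are hypothetical names and not pylearn2 code. Duplicates are detected by object identity, and an index structure remembers where each flat entry belongs in the nested view.

```python
class Var:
    """Stand-in for a Theano variable (hypothetical)."""

    def __init__(self, name):
        self.name = name


def flatten_unique(nested):
    """Flatten a nested tuple, keeping one copy of each distinct object.

    Returns the list of unique leaves and an index structure with the same
    nesting as `nested`, whose leaves are positions in that list.
    """
    unique = []
    seen = {}  # id(leaf) -> position in `unique`

    def visit(node):
        if isinstance(node, tuple):
            return tuple(visit(child) for child in node)
        key = id(node)
        if key not in seen:
            seen[key] = len(unique)
            unique.append(node)
        return seen[key]

    index = visit(nested)
    return unique, index


def rebuild(index, flat_values):
    """Map flat values back into the nested view described by `index`."""
    if isinstance(index, tuple):
        return tuple(rebuild(child, flat_values) for child in index)
    return flat_values[index]


x, y = Var("x"), Var("y")
# Two channels that each ask for a pair, sharing both variables.
unique, index = flatten_unique(((x, y), (y, x)))
nested_values = rebuild(index, ["value_x", "value_y"])
```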

Pascal Lamblin

Apr 23, 2013, 12:13:33 PM
to thean...@googlegroups.com
On Tue, Apr 23, 2013, Ian Goodfellow wrote:
> In the following example, where all we care about is the name, I get that K=10.
>
> Suppose we want to do something like
> y = f(x)
> x_name = x.name
> if x_name is None:
> x_name = 'x'
> y.name = 'f(' + x_name + ')'
>
> If we had a TupleVariable the above code would just work.

Instead of creating a TupleVariable, we could also have helper functions
to handle names, for instance:

y = f(x)
x_name = get_name(x, default='x')
set_name(y, 'f(' + x_name + ')')

That way, the code below (or more complex version) only has to be
written once.
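
For illustration, the two helpers might look roughly like this. It is a sketch under assumptions: the naming scheme for tuples is made up, and `Var` is a hypothetical stand-in for a Theano variable that only carries a `name` attribute.

```python
class Var:
    """Minimal stand-in for a Theano variable: just a `name` attribute."""

    def __init__(self, name=None):
        self.name = name


def get_name(x, default="x"):
    """Return the name of a variable, or a composite name for a tuple."""
    if isinstance(x, tuple):
        parts = [get_name(elem, "%s[%d]" % (default, i))
                 for i, elem in enumerate(x)]
        return "{" + ",".join(parts) + "}"
    return default if x.name is None else x.name


def set_name(y, name):
    """Name a variable, or name each element of a tuple with an index suffix."""
    if isinstance(y, tuple):
        for i, elem in enumerate(y):
            set_name(elem, "%s[%d]" % (name, i))
    else:
        y.name = name


x = (Var("a"), Var())
y = (Var(), Var())
x_name = get_name(x, default="x")
set_name(y, "f(" + x_name + ")")
```

The calling code stays the same three lines whether x and y are single variables or tuples.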

The same goes for the nesting/flattening of tuples I'm still working on for
monitoring channels and composite spaces.
--
Pascal

Ian Goodfellow

Apr 23, 2013, 12:37:51 PM
to thean...@googlegroups.com
Part of the issue with the names is that they're unwieldy in that setup.
It would be nice to be able to just name the tuple.

Pascal Lamblin

Apr 23, 2013, 1:51:37 PM
to thean...@googlegroups.com
On Tue, Apr 23, 2013, Ian Goodfellow wrote:
> Part of the issue with the names is that they're unwieldy in that setup.
> It would be nice to be able to just name the tuple.

OK.

After some more thoughts, the main reason why I think it would be better
not to have a TupleType, but a structure analogous to a Theano Variable
holding a tuple of other Variables has to do with shared variables and
inputs to functions.

For instance, let's take the case of a model with two parameters, W and
b, that we want to optimize.

One way of using an actual TupleVariable would be to define params as a
shared tuple variable, holding the value (W_val, b_val). That way, we
can use the existing update mechanism of theano.function, specifying for
instance (params, new_params), or (params, make_tuple(new_W, new_b)).
However, in that case, W and b do not exist as shared tensor variables.
We could define W = params[0] and b = params[1], but then
they would be the outputs of a graph where params is the input. That
would make it impossible to specify update rules like (W, another_new_W)
in a different function, for instance. That would be like specifying
update rules like (W.T, new_W_T) when W is the actual shared variable.

The other way would be to define W and b as the shared variables, and
params as make_tuple(W, b), but then params is not a shared variable
anymore, and the problem happens the other way around.

However, if params is merely a structure holding variables, then you
can use either that structure, or the underlying shared variable, and
theano.function (or a wrapper around it) knows that it has to extract
the underlying shared variables and use them to do the actual update.

That structure could still behave like a theano Variable in lots of ways
(having a name, for instance).
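
A sketch of the extraction step such a wrapper could perform, assuming the structure is (or behaves like) a Python tuple. `expand_updates` is a hypothetical helper, and strings stand in for shared variables and update expressions.

```python
def expand_updates(updates):
    """Expand (key, value) pairs whose key is a tuple into per-element pairs."""
    flat = []
    for key, value in updates:
        if isinstance(key, tuple):
            if not isinstance(value, tuple) or len(value) != len(key):
                raise ValueError(
                    "the update for a tuple must be a tuple of the same length")
            # Recurse so that nested tuples are handled as well.
            flat.extend(expand_updates(list(zip(key, value))))
        else:
            flat.append((key, value))
    return flat


W, b = "W", "b"  # stand-ins for the underlying shared variables
params = (W, b)

# Updating through the structure...
via_params = expand_updates([(params, ("new_W", "new_b"))])
# ...or through one underlying variable directly, in a different function.
via_W = expand_updates([(W, "another_new_W")])
```

Either way the resulting list of pairs only mentions the underlying shared variables.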

Does it make sense?
--
Pascal

Ian Goodfellow

Apr 23, 2013, 5:46:33 PM
to thean...@googlegroups.com
I think

shared( my_tuple )

should return a TupleVariable whose elements are the result of calling
shared on the elements of my_tuple.
It's not actually a shared variable, it's just a regular TupleVariable
whose elements are shared.

The important thing is that it has a name and ancestors and
stuff like that, so you can find out that its ancestors
are the shared variables without having to special case your ancestor
crawling code.

theano.function would need to be modified to recognize that the keys
to the updates dictionary can be TupleVariables,
and preprocess them out (I've said so earlier).
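
The preprocessing step could look roughly like the sketch below. It is plain Python, not Theano code: `expand_updates` is a hypothetical helper, plain tuples stand in for the proposed TupleVariable, strings stand in for shared variables and update expressions, and updates are assumed to be given as a list of (target, new_value) pairs.

```python
def expand_updates(updates):
    """Flatten update pairs whose target is a (possibly nested) tuple.

    Returns a dict mapping each individual target to its new value.
    """
    flat = {}
    for key, value in updates:
        if isinstance(key, tuple):
            if not isinstance(value, tuple) or len(key) != len(value):
                raise ValueError("tuple update needs a matching tuple of new values")
            # Recurse so that nested tuples are unpacked as well.
            pairs = expand_updates(list(zip(key, value))).items()
        else:
            pairs = [(key, value)]
        for k, v in pairs:
            if k in flat:
                # The same variable reached twice, e.g. once directly and
                # once through a tuple: refuse rather than pick one update.
                raise ValueError("variable updated twice: %r" % (k,))
            flat[k] = v
    return flat


params = ("W", "b")
flat = expand_updates([(params, ("new_W", "new_b")), ("step", "step + 1")])
assert flat == {"W": "new_W", "b": "new_b", "step": "step + 1"}
```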

Olivier Delalleau

Apr 23, 2013, 8:24:20 PM4/23/13
to thean...@googlegroups.com
I agree with Pascal. It sounds like what you want is not a new type, but a Variable container (possibly a subclass of tuple) that makes it more convenient to deal with lists of variables when defining a Theano graph.

-=- Olivier



Pascal Lamblin

Apr 23, 2013, 11:00:00 PM4/23/13
to thean...@googlegroups.com
On Tue, Apr 23, 2013, Ian Goodfellow wrote:
> I think
>
> shared( my_tuple )
>
> should return a TupleVariable whose elements are the result of calling
> shared on the elements of my_tuple.
> It's not actually a shared variable, it's just a regular TupleVariable
> whose elements are shared.

OK, thanks for the clarification.

> The important thing is that it has a name and ancestors and
> stuff like that, so you can find out that its ancestors
> are the shared variables without having to special case your ancestor
> crawling code.

> theano.function would need to be modified to recognize that the keys
> to the updates dictionary can be TupleVariables,
> and preprocess them out (I've said so earlier).

That's the part where I'm skeptical. If we do that, then why couldn't
we use MakeVector in the update rules, and use Split() on the update
expression to dispatch the values to the ancestors of MakeVector? And
then, why not accept update rules like (m.T, dC_dmT), and automatically
recognize that we should transpose the update instead?

I think the semantics are simpler if we do not allow (in updates)
Variables that are the output of an Op. Otherwise, the potential for
confusion and errors that are hard to debug is really high. As I said,
I still think that allowing other data structures containing Variables
would essentially accomplish the same, with less confusion.
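
As a rough illustration of that rule, the sketch below rejects any update target that is the output of an Op. It is plain Python with a hypothetical `Var` stand-in and a hypothetical `check_update_targets` helper; the only Theano convention it relies on is that a variable's `owner` is None when no Op produced it.

```python
class Var:
    """Hypothetical stand-in for a Theano Variable."""
    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner  # None for a root variable, else what produced it


def check_update_targets(updates):
    """Raise TypeError if any update target is computed from other variables."""
    for target, _new_value in updates:
        if target.owner is not None:
            raise TypeError(
                "cannot update %s: it is the output of %s"
                % (target.name, target.owner)
            )


W = Var("W")
W_T = Var("W.T", owner="transpose")

check_update_targets([(W, "new_W")])  # accepted: W is a root variable
try:
    check_update_targets([(W_T, "new_W_T")])  # rejected: W.T comes from an Op
except TypeError as err:
    print(err)
```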

--
Pascal

Frédéric Bastien

Apr 24, 2013, 10:55:22 AM4/24/13
to theano-dev
I also think that Pascal's suggestion would cause fewer problems and less confusion in Theano.

Ian, do you think it fixes your problem? If not, why?

Fred




James Bergstra

Apr 25, 2013, 9:10:20 AM4/25/13
to thean...@googlegroups.com
+1 for Pascal's suggestion. I have often wanted to have a function return a dictionary of variables too, and that seems related. I'm imagining that we might want to introduce tuple, list, and dict subclasses as a convenience for e.g.

* bulk operations: "add all these things to all those ones" (where "these" and "those" are corresponding structures of variables)
* specifying updates (and givens, I guess)
* specifying outputs to functions

It wouldn't require changing optimizations or ops, but it would make the user's life a little easier.
Does this address Ian's feature wish though?
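
The bulk-operation case could be sketched as a leaf-wise map over corresponding structures. This is plain Python, not Theano code: `map_structure` is a hypothetical helper, and integers stand in for symbolic variables (any object supporting the operators used in `fn` would work the same way).

```python
def map_structure(fn, *structures):
    """Apply fn leaf-wise to structures of identical shape (tuple/list/dict)."""
    first = structures[0]
    if isinstance(first, dict):
        if any(set(s) != set(first) for s in structures[1:]):
            raise ValueError("dict keys do not match")
        return {k: map_structure(fn, *(s[k] for s in structures)) for k in first}
    if isinstance(first, (tuple, list)):
        if any(len(s) != len(first) for s in structures[1:]):
            raise ValueError("sequence lengths do not match")
        # Rebuild the same kind of sequence that was passed in.
        return type(first)(map_structure(fn, *items) for items in zip(*structures))
    return fn(*structures)


# "Subtract all these things from all those ones".
params = {"W": (10, 20), "b": 5}
grads = {"W": (1, 2), "b": 3}
new_params = map_structure(lambda p, g: p - g, params, grads)
assert new_params == {"W": (9, 18), "b": 2}
```

The same traversal could serve for updates and outputs, since each of those is also a pairing of corresponding leaves.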

Ian Goodfellow

May 9, 2013, 10:15:07 AM5/9/13
to thean...@googlegroups.com
If there's no name and no ancestors it doesn't address my feature wish, no.

Pascal Lamblin

May 9, 2013, 2:29:25 PM5/9/13
to thean...@googlegroups.com
On Thu, May 09, 2013, Ian Goodfellow wrote:
> If there's no name and no ancestors it doesn't address my feature wish, no.

Having a name is not a problem, having ancestors for a container is not
a problem either.

The only thing is that no variable in the graph would have the
container as an ancestor; the original variables would be the ancestors.
Would that be a problem for you?
--
Pascal

Ian Goodfellow

May 10, 2013, 10:18:17 AM5/10/13
to thean...@googlegroups.com
On thinking about it more, this issue is maybe more complicated than I
originally realized.
If we write a generic function f(x,y) and we want to know whether y is
a function of x, then I think it does have to be special cased for
tuples.
If x were allowed to be a tuple, then x might not be an ancestor of y,
but some element of x might be.
I feel like maybe a bigger redesign is needed someday, to capture
relationships like "function of" and aliasing, that ancestry doesn't
really cover.
It's similar to the issue where taking the gradient with respect to a
subtensor doesn't do what you want.
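
A small sketch of the special-casing described here, in plain Python rather than Theano code. `Var`, `ancestors`, and `depends_on` are hypothetical names, and plain tuples stand in for a tuple of variables.

```python
class Var:
    """Hypothetical stand-in for a graph variable with explicit inputs."""
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = tuple(inputs)


def ancestors(var):
    """Return every variable reachable from var through inputs, var included."""
    seen, stack = set(), [var]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(v.inputs)
    return seen


def depends_on(y, x):
    """True if y is a function of x, where x may be a (nested) tuple."""
    if isinstance(x, tuple):
        # The tuple itself is never an ancestor of y; its elements may be.
        return any(depends_on(y, elem) for elem in x)
    return x in ancestors(y)


a, b, c = Var("a"), Var("b"), Var("c")
y = Var("y", inputs=[Var("h", inputs=[a])])

assert depends_on(y, a)
assert not depends_on(y, b)
assert depends_on(y, (b, a))      # some element of the tuple is an ancestor
assert not depends_on(y, (b, c))
```

This only covers ancestry through tuples; aliasing and the subtensor case mentioned above are not captured by it.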