Overloading Knode

Thomas Wiecki

Sep 13, 2012, 6:41:18 PM
to hddm-...@googlegroups.com, Matzke, Dora
Hi Guido and Dora,

A couple of times recently it came up that more control is needed over how the individual pymc nodes are created. I just made a small change to kabuki that should make this easier (it requires the develop branch of kabuki from git). I also thought I'd give a deeper explanation to help with those endeavors.

In general, Knodes are the descriptors that specify how new pymc nodes can be constructed. Among other things, Knodes have code to parcel the data appropriately, and they ultimately call create_node to create the actual pymc node. The function now looks like this:
    def create_node(self, node_name, kwargs, data):
        #actually create the node
        return self.pymc_node(name=node_name, **kwargs)
node_name is just a string, e.g. "v_g.cond1". self.pymc_node is the distribution class specified when creating the Knode (e.g. pymc.Normal). kwargs contains the parameters, such as the parents (e.g. mu and tau), and, if the node is observed, the data (the kwarg named value).

The data argument is new and allows more flexibility. It is a pandas DataFrame (very similar to a numpy array, but with named columns) that contains all columns of the data, but only the chunk of rows this node depends on.
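To make the data flow concrete, here is a minimal sketch of how a Knode might split the data on a depends column and hand each chunk to create_node. ToyKnode and fake_normal are hypothetical stand-ins written for this illustration, not kabuki or pymc code. The real kabuki Knode is more involved (group/subject structure, naming, database registration).

```python
import pandas as pd


def fake_normal(name, mu, tau, value=None):
    # Stand-in for a pymc distribution class such as pymc.Normal.
    return {"name": name, "mu": mu, "tau": tau, "value": value}


class ToyKnode:
    # Hypothetical, stripped-down imitation of a kabuki Knode.
    def __init__(self, pymc_node, name, depends=None, col_name=None, **parents):
        self.pymc_node = pymc_node  # distribution class to instantiate
        self.name = name
        self.depends = depends      # one column name, or None
        self.col_name = col_name    # observed data column, or None
        self.parents = parents      # e.g. mu and tau

    def create(self, data):
        # One chunk per value of the depends column (or the whole data).
        if self.depends is None:
            chunks = [(None, data)]
        else:
            chunks = list(data.groupby(self.depends))
        nodes = {}
        for key, chunk in chunks:
            node_name = self.name if key is None else "%s.%s" % (self.name, key)
            kwargs = dict(self.parents)  # fresh dict for every node
            if self.col_name is not None:
                kwargs["value"] = chunk[self.col_name].values
            nodes[node_name] = self.create_node(node_name, kwargs, chunk)
        return nodes

    def create_node(self, node_name, kwargs, data):
        # Same shape as the create_node method quoted above.
        return self.pymc_node(name=node_name, **kwargs)


data = pd.DataFrame({"rt": [0.5, 0.7, 0.6, 0.9], "cond": ["a", "b", "a", "b"]})
knode = ToyKnode(fake_normal, "v", depends="cond", col_name="rt", mu=0.0, tau=1.0)
nodes = knode.create(data)
print(sorted(nodes))  # ['v.a', 'v.b']
```

Each create_node call only sees its own chunk: here, the rows with cond == "a" or cond == "b".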

So if your pymc node requires input that is some transformation of the actual data (as in Dora's case), you can transform data and then replace kwargs['value'] with the result:
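import numpy as np
from kabuki.hierarchical import Knode

class KnodeInhib(Knode):
    def create_node(self, node_name, kwargs, data):
        # use the unique stop-signal delays in this data chunk as the node's value
        SSDs = np.unique(data['SSDs'])
        kwargs['value'] = SSDs
        return self.pymc_node(name=node_name, **kwargs)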
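Here is a self-contained sketch of the same override, run against a hypothetical ToyKnode base class and a stand-in distribution (record) instead of kabuki and pymc. It lets you check the effect of replacing kwargs['value'] without either library. Unlike the snippet above, it copies kwargs before changing it, which is an optional precaution.

```python
import numpy as np
import pandas as pd


class ToyKnode:
    # Hypothetical stand-in for kabuki's Knode: it only stores the pymc class.
    def __init__(self, pymc_node):
        self.pymc_node = pymc_node


class ToyKnodeInhib(ToyKnode):
    # Same idea as KnodeInhib above, on the toy base class.
    def create_node(self, node_name, kwargs, data):
        kwargs = dict(kwargs)                       # leave the caller's dict intact
        kwargs["value"] = np.unique(data["SSDs"])   # sorted, de-duplicated SSDs
        return self.pymc_node(name=node_name, **kwargs)


def record(name, **kwargs):
    # Stand-in "distribution" that just returns what it was given.
    return name, kwargs


chunk = pd.DataFrame({"rt": [0.4, 0.5, 0.6], "SSDs": [0.2, 0.1, 0.2]})
name, kw = ToyKnodeInhib(record).create_node(
    "inhib.subj1", {"value": chunk["rt"].values}, chunk)
print(kw["value"])  # [0.1 0.2]
```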
In Guido's case you want to check which condition the z node depends on and then replace it with 1-z. This could look as follows:
class KnodeZInv(Knode):
    def create_node(self, node_name, kwargs, data):
        # data is a DataFrame chunk, so reduce the comparison to a single bool
        if np.all(data['cond'] == 1):
            return 1 - self.pymc_node(node_name, **kwargs)
        else:
            return self.pymc_node(node_name, **kwargs)
It's not ideal to have to jump through these hoops, but currently it's the best we can do.

You would then use this new Knode when specifying that variable of your model.

Thomas

guido biele

Sep 14, 2012, 2:22:54 AM
to hddm-...@googlegroups.com
that's very helpful!
thanks a lot!
guido

Matzke, Dora

Sep 15, 2012, 8:42:42 AM
to hddm-...@googlegroups.com
Thanks a lot Thomas!
dora

From: hddm-...@googlegroups.com [hddm-...@googlegroups.com] on behalf of guido biele [g.p....@psykologi.uio.no]
Sent: Friday, September 14, 2012 8:22 AM
To: hddm-...@googlegroups.com
Subject: Re: [hddm-users] Overloading Knode

Guido Biele

Sep 17, 2012, 11:15:29 AM
to hddm-...@googlegroups.com, Matzke, Dora
Hi Thomas,

As far as we understand, the approach you suggested was to turn the z node of the second condition into a deterministic node whose value is determined by the z of the first condition and an evaluation function.

This involves:

- making the node for the 2nd condition an instance of 'pymc.PyMCObjects.Deterministic'
- this node should have the z node of the first condition as its parent
- this node should have an evaluation function y = 1-z

The question then is where to turn the node into a pymc.PyMCObjects.Deterministic:
- when defining knodes?
- when creating pymc nodes?
From looking at the code it seems that for now it is easier to do this when pymc nodes are created (though it might be more elegant to do it when knodes are defined, which is what you suggested in an earlier email ...*)

Then, from what we could glean from your code,
node = 1-self.pymc_node(node_name, **kwargs)
should make the node a pymc.PyMCObjects.Deterministic when the pymc objects are created.

What remains to be done would then be to:
- add the right parents, which would be the z of the first condition (and not the parameters governing the group distribution ...).
- make sure we have the right evaluation function (I'm not sure if the above has already generated the correct evaluation function.)
- remove the parent nodes governing the group distribution for the z of the second condition.

what do you think?

cheers - oystein & guido

*The problem here seems to be that the code for specifying knodes has no variable indicating which condition a knode is specified for (which is fine for hddm in general, but is an obstacle in our specific case).

Thomas Wiecki

Sep 17, 2012, 11:23:48 AM
to hddm-...@googlegroups.com
Hi Guido,

On Mon, Sep 17, 2012 at 11:15 AM, Guido Biele <g.p....@psykologi.uio.no> wrote:
> Hi Thomas,
>
> As far as we understand, the approach you suggested was to turn the z node of the second condition into a deterministic node whose value is determined by the z of the first condition and an evaluation function.
>
> This involves:
>
> - making the node for the 2nd condition an instance of 'pymc.PyMCObjects.Deterministic'
> - this node should have the z node of the first condition as its parent
> - this node should have an evaluation function y = 1-z
>
> The question then is where to turn the node into a pymc.PyMCObjects.Deterministic:
> - when defining knodes?
> - when creating pymc nodes?
> From looking at the code it seems that for now it is easier to do this when pymc nodes are created (though it might be more elegant to do it when knodes are defined, which is what you suggested in an earlier email ...*)

That sounds good. 

> Then, from what we could glean from your code,
> node = 1-self.pymc_node(node_name, **kwargs)
> should make the node a pymc.PyMCObjects.Deterministic when the pymc objects are created.
>
> What remains to be done would then be to:
> - add the right parents, which would be the z of the first condition (and not the parameters governing the group distribution ...).
> - make sure we have the right evaluation function (I'm not sure if the above has already generated the correct evaluation function.)
> - remove the parent nodes governing the group distribution for the z of the second condition.

I think all of this is already taken care of when using the approach I outlined. 1-self.pymc_node() will create the stochastic pymc node and then create a deterministic that computes 1-x (this is pymc functionality).
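The "1 - node yields a deterministic" behavior relies on Python operator overloading. The toy classes below are hypothetical and only illustrate the mechanism: a reverse-subtraction hook returns a new node whose value is recomputed from its parent. pymc's own node classes implement this far more completely.

```python
class ToyNode:
    # Hypothetical minimal node holding a value (think: a stochastic like z).
    def __init__(self, name, value):
        self.name = name
        self._value = value
        self.parents = {}

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value

    def __rsub__(self, other):
        # Python calls this for `other - node`, e.g. `1 - z`.
        return ToyDeterministic("(%r-%s)" % (other, self.name),
                                lambda z: other - z, {"z": self})


class ToyDeterministic(ToyNode):
    # Value is recomputed from the parents on every access.
    def __init__(self, name, eval_fn, parents):
        super().__init__(name, None)
        self.eval_fn = eval_fn
        self.parents = parents

    @property
    def value(self):
        return self.eval_fn(**{k: p.value for k, p in self.parents.items()})


z = ToyNode("z", 0.25)
z_inv = 1 - z                   # int can't subtract a ToyNode, so __rsub__ runs
print(z_inv.name, z_inv.value)  # (1-z) 0.75
z.value = 0.5                   # e.g. a new sample for z
print(z_inv.value)              # 0.5
```

Because the deterministic reads its parent on every access, there is only one free z: the "inverted" node just tracks it.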

You don't need to worry about the different conditions or about deleting group variables, as only one z group node will be created. With this approach you don't have to put z into depends_on.

Let me know if you have more questions.

Thomas

guido biele

Sep 17, 2012, 11:50:17 AM
to hddm-...@googlegroups.com
hi Thomas,
I think what I don't understand is how, in the approach you outlined, pymc/hddm learns that there are two observed nodes, one for each condition, and that z is the parent of one observed node while the deterministic node derived from z is the parent of the other.

cheers-guido

Thomas Wiecki

Sep 18, 2012, 11:34:42 AM
to hddm-...@googlegroups.com
On Mon, Sep 17, 2012 at 11:50 AM, guido biele <g.p....@psykologi.uio.no> wrote:
>
> hi Thomas,
> I think what I don't understand is how, in the approach you outlined, pymc/hddm learns that there are two observed nodes, one for each condition, and that z is the parent of one observed node while the deterministic node derived from z is the parent of the other.


Yes, unfortunately you are correct. It's way trickier than I originally thought.

I think the following would work however.

Rather than creating separate z and z_inv depending on the data you could just transform the z node when the wfpt likelihood gets created. For this you would have to do the same trick of inheriting from Knode (this time for wfpt) and then have the wfpt node depend on your stim column:
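from copy import copy
from kabuki.hierarchical import Knode

class KnodeWfptZInv(Knode):
    def create_node(self, name, kwargs, data):
        if data['stim'] == 1:
            # plug in 1-z instead of z for this part of the data
            z = copy(kwargs['z'])
            kwargs['z'] = 1-z
            return self.pymc_node(name, **kwargs)
        else:
            return self.pymc_node(name, **kwargs)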
Then, when you create the knode (in create_wfpt_knode(); you would inherit a new HDDM class from HDDMBase and override this method), replace the last line with something like:
wfpt = KnodeWfptZInv(self.wfpt_class, 'wfpt', observed=True, col_name='rt', depends='stim', **wfpt_parents)
Note the new depends flag that will create separate wfpt nodes.
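Here is a library-free sketch of what the depends='stim' split does for KnodeWfptZInv. It creates one observed node per stim value, all sharing a single z, and substitutes 1 - z where stim == 1. make_wfpt_nodes is a hypothetical helper. Plain dicts stand in for pymc nodes and a float stands in for the z stochastic; with a real pymc node, 1 - z would be a Deterministic.

```python
import numpy as np
import pandas as pd


def make_wfpt_nodes(data, z, depends="stim", flip_value=1):
    # Hypothetical helper mimicking KnodeWfptZInv.create_node per data chunk.
    nodes = {}
    for stim, chunk in data.groupby(depends):
        # Thanks to the split, every row in this chunk shares one stim value.
        assert np.all(chunk[depends] == stim)
        # Plug in 1 - z for the flipped stimulus, z itself otherwise.
        z_eff = 1 - z if stim == flip_value else z
        nodes["wfpt.%s" % stim] = {"z": z_eff, "value": chunk["rt"].values}
    return nodes


data = pd.DataFrame({"rt": [0.5, 0.8, 0.6, 0.7], "stim": [0, 1, 0, 1]})
nodes = make_wfpt_nodes(data, z=0.25)
print(sorted(nodes))         # ['wfpt.0', 'wfpt.1']
print(nodes["wfpt.1"]["z"])  # 0.75
```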

Does that make sense?

Thomas

Guido Biele

Sep 19, 2012, 6:43:37 AM
to hddm-...@googlegroups.com
Hi Thomas,

Thanks for your effort!!
From looking at the code it seems to me that this should do the job.

I'll just paraphrase the general approach to make sure that we have the same understanding.

1) When calling hddm to generate the model, depends_on will not be used for z.
2) Instead, the create_wfpt_knode method in HDDMBase is overloaded, such that the creation of observed nodes dependent on z is hard-coded.
3) To implement the manipulation of z, the create_node method in kabuki.hierarchical is additionally overloaded such that z changes its value depending on the stimulus.

I like this approach a lot, because one can easily see how it can be extended to other manipulations (e.g. one could flip the sign of the drift rate depending on the stimulus when the data are response coded). I also like the approach because it seems to me that it works in combination with a depends_on flag*. Hence, one could also implement a model where z depends on one experimental manipulation (like on/off drugs) and, in addition, the direction of z for accuracy-coded data depends on the stimulus type.
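Guido's extension can be expressed as a table of per-stimulus transforms, applied to the parent kwargs before the node is built. transform_parents and the rules format are hypothetical and shown here with floats instead of pymc nodes. The same expressions (1 - z, -v) would produce pymc deterministics if given pymc nodes.

```python
def transform_parents(kwargs, stim, rules):
    # rules maps a parent name to {stim_value: function}; hypothetical format.
    out = dict(kwargs)
    for param, by_stim in rules.items():
        if stim in by_stim:
            out[param] = by_stim[stim](out[param])
    return out


rules = {
    "z": {1: lambda z: 1 - z},   # accuracy-coded data: flip the bias
    "v": {1: lambda v: -v},      # response-coded data: flip the drift sign
}
parents = {"v": 1.5, "z": 0.25, "a": 2.0}
print(transform_parents(parents, 1, rules))  # {'v': -1.5, 'z': 0.75, 'a': 2.0}
print(transform_parents(parents, 0, rules))  # {'v': 1.5, 'z': 0.25, 'a': 2.0}
```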

Do you agree?

I'll test the code in the next few days.

Cheers - guido

PS: You are helping more than I would have expected, and I'll make sure to write a how-to from this once we have successfully tested it.


* It wasn't easy to trace the order in which the methods defined in kabuki and hddm are used. Here is what I think I understand so far:
1) Model creation starts by calling hddm.HDDM, which inherits from hddm.HDDMBase, which inherits from hddm.AccumulatorModel, which inherits from kabuki.Hierarchical.
2) kabuki.Hierarchical checks if the model is a group model, reorganizes the definition of dependencies into a dictionary "depends", and calls create_knodes to generate the Knodes.
3) create_knodes is overloaded in hddm.HDDM and creates both non-observed and observed knodes.
It is at this stage that the modification to create_wfpt_knode you proposed takes effect, because it is called by create_knodes. If I understand it correctly, 2 wfpt nodes will actually be created here (while typically only one knode would be created).
4) kabuki.Hierarchical takes the knodes, adds data, and constructs pymc nodes with kabuki.create_model.
5) kabuki.create_model calls Knode.create, which creates the pymc nodes.
Here the second change (to create_node) applies. Knode.create will ignore the dependency of z on the stimulus because it is not included in the online model specification, but when it loops through the grouped data, it will find 2 wfpt knodes and adjust them so that for one of them the parent is a deterministic object derived from the single parent z.
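The call order in steps 1 to 5 can be traced with stub classes. StubHierarchical, StubHDDM and StubKnode are hypothetical stand-ins that only record which method runs when. The real kabuki and hddm classes do much more in each step.

```python
calls = []


class StubKnode:
    # Hypothetical stand-in for kabuki.hierarchical.Knode.
    def __init__(self, name):
        self.name = name

    def create(self, data):
        calls.append(self.name + ".create")
        return self.create_node(self.name, {}, data)

    def create_node(self, name, kwargs, data):
        calls.append(name + ".create_node")
        return name


class StubHierarchical:
    # Hypothetical stand-in for kabuki.Hierarchical.
    def __init__(self, data):
        self.data = data
        self.knodes = self.create_knodes()  # steps 2-3
        self.nodes = self.create_model()    # steps 4-5

    def create_knodes(self):
        raise NotImplementedError

    def create_model(self):
        calls.append("create_model")
        return [knode.create(self.data) for knode in self.knodes]


class StubHDDM(StubHierarchical):
    # Like hddm.HDDM, the model subclass overrides create_knodes.
    def create_knodes(self):
        calls.append("create_knodes")
        return [StubKnode("z"), StubKnode("wfpt")]


model = StubHDDM(data=None)
print(calls)
```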

hm, I hope this captures the gist of it.
I still have to find a good IDE for Python that lets me set breakpoints so that I can examine objects at different processing stages...



--

 Guido Biele 
 Email: g.p....@psykologi.uio.no 
 Phone: +47 228 45172 
 Website 

 Visiting Address 
 Psykologisk Institutt  
 Forskningsveien 3 A 
 0373 OSLO 
  Mailing Address 
  Psykologisk Institutt 
  Postboks 1094 
  Blindern 0317 OSLO

Thomas Wiecki

Sep 19, 2012, 8:31:27 AM
to hddm-...@googlegroups.com
Hi Guido,

On Wed, Sep 19, 2012 at 6:43 AM, Guido Biele <g.p....@psykologi.uio.no> wrote:
> Hi Thomas,
>
> Thanks for your effort!!
> From looking at the code it seems to me that this should do the job.

I'm glad :).

> I'll just paraphrase the general approach to make sure that we have the same understanding.
>
> 1) When calling hddm to generate the model, depends_on will not be used for z.
> 2) Instead, the create_wfpt_knode method in HDDMBase is overloaded, such that the creation of observed nodes dependent on z is hard-coded.
> 3) To implement the manipulation of z, the create_node method in kabuki.hierarchical is additionally overloaded such that z changes its value depending on the stimulus.

That's right; just to make sure: I think you meant to write kabuki.hierarchical.Knode.
Also, z is never touched, and z_inv will not be explicitly registered in Hierarchical (i.e. it has no knode of its own; it will just exist in the pymc model graph) but will instead be plugged in directly before wfpt is created. It is also wfpt that gets the depends_on.

> I like this approach a lot, because one can easily see how it can be extended to other manipulations (e.g. one could flip the sign of the drift rate depending on the stimulus when the data are response coded). I also like the approach because it seems to me that it works in combination with a depends_on flag*. Hence, one could also implement a model where z depends on one experimental manipulation (like on/off drugs) and, in addition, the direction of z for accuracy-coded data depends on the stimulus type.

That's correct. As a knode can have multiple depends_on entries, z could depend on another data column (this then propagates to wfpt, which will depend on e.g. on/off in addition to the stim type).

> Do you agree?
>
> I'll test the code in the next few days.
>
> Cheers - guido
>
> PS: You are helping more than I would have expected, and I'll make sure to write a how-to from this once we have successfully tested it.

That's always helpful. The model in itself will also be useful. It's also good for me to think about the limitations of our current design. I think this method will work out and it's not totally unreasonable -- but it's not great either...
 
> * It wasn't easy to trace the order in which the methods defined in kabuki and hddm are used. Here is what I think I understand so far:
> 1) Model creation starts by calling hddm.HDDM, which inherits from hddm.HDDMBase, which inherits from hddm.AccumulatorModel, which inherits from kabuki.Hierarchical.
> 2) kabuki.Hierarchical checks if the model is a group model, reorganizes the definition of dependencies into a dictionary "depends", and calls create_knodes to generate the Knodes.
> 3) create_knodes is overloaded in hddm.HDDM and creates both non-observed and observed knodes.
> It is at this stage that the modification to create_wfpt_knode you proposed takes effect, because it is called by create_knodes. If I understand it correctly, 2 wfpt nodes will actually be created here (while typically only one knode would be created).

It's important to distinguish between a (pymc) node and a knode. A knode can be responsible for creating multiple pymc nodes (e.g. when depends_on is used). You would still need to create only 1 wfpt knode (of your new KnodeWfptZInv type), which will then create (at least) 2 pymc nodes: create_node() will get called twice, once for each dependent, producing one node with 1-z as its parent and one with z directly as its parent (depending on the stim column).
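Here is a rough illustration of the knode versus pymc-node distinction. Assume z depends_on a hypothetical drug column and the wfpt knode depends on both drug and stim. A single knode then yields one create_node call per observed (drug, stim) combination. This pandas sketch only enumerates those calls and labels which z each would receive; it builds no actual nodes.

```python
import pandas as pd

data = pd.DataFrame({
    "rt":   [0.5, 0.8, 0.6, 0.7, 0.9, 0.4],
    "stim": [0, 1, 0, 1, 0, 1],
    "drug": ["on", "on", "off", "off", "on", "off"],
})

# One wfpt knode depending on (drug, stim) -> one create_node call per group.
created = []
for (drug, stim), chunk in data.groupby(["drug", "stim"]):
    z_used = "1-z(%s)" % drug if stim == 1 else "z(%s)" % drug
    created.append(("wfpt(%s.%s)" % (drug, stim), z_used, len(chunk)))

for entry in created:
    print(entry)
```

Four pymc wfpt nodes come out of one knode, but only two z nodes (one per drug level) exist, each used directly by one wfpt node and through 1 - z by the other.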

> 4) kabuki.Hierarchical takes the knodes, adds data, and constructs pymc nodes with kabuki.create_model.
> 5) kabuki.create_model calls Knode.create, which creates the pymc nodes.
> Here the second change (to create_node) applies. Knode.create will ignore the dependency of z on the stimulus because it is not included in the online model specification, but when it loops through the grouped data, it will find 2 wfpt knodes and adjust them so that for one of them the parent is a deterministic object derived from the single parent z.
>
> hm, I hope this captures the gist of it.

Yep, very well put. I might take some parts from this and put them into develop docs.

Guido Biele

Sep 19, 2012, 9:06:08 AM
to hddm-...@googlegroups.com
thanks for the quick feedback and the clarification about the knode -
pymc node relationship!
(and yes, I meant to write kabuki.hierarchical.Knode)

cheers - guido


Guido Biele

Sep 28, 2012, 7:20:44 AM
to hddm-...@googlegroups.com
Hi Thomas,

I got a few steps further, which also raised two questions on which I'd like to hear your opinion:

If I understand it correctly, your method to fit data where bias = z for one part of the data and bias = 1-z for the other part (where the parts are defined by some condition) is to intervene when the pymc nodes are generated, which happens at line 135 of kabuki.hierarchical.Knode (development version of kabuki).

First the node is created:
node = self.create_node(node_name, kwargs, grouped_data)

And then it is appended to the node database: 
if node is not None:
    self.nodes[uniq_elem] = node
    self.append_node_to_db(node, uniq_elem)

My first question: Should the output of the create_node command generally be one node, so that the next append operation can work? (I think the answer is yes, I just want to be sure)

My next question is whether you intended to generate 1 or 2 nodes with the modified create_node method:
class KnodeWfptZInv(Knode):
    def create_node(self, name, kwargs, data):
        if data['stim'] == 1:
            z = copy(kwargs['z'])
            kwargs['z'] = 1-z
            return self.pymc_node(name, **kwargs)
        else:
            return self.pymc_node(name, **kwargs)

To me, it looks like this function would always return only one node. That would work if an observed node could be set up so that for one part of the data the parameter 'z' has the value z and for the other part the parameter 'z' has the value 1-z. I'm not sure that this is possible in pymc. In any case, the code above doesn't work because data['stim'] is an array and the other side of the comparison is not (Python returns the error message: "ValueError: The truth value of an array with more than one element is ambiguous").

My final question is what you think about the following modification to your approach to get this working:

- overload the create_node method such that, when the node to be created is an observed node, we further split the data along (for example) the stimulus condition and create 2 pymc nodes.
- (my approach here would be to copy some of your code to get the modified z, but then I would use logical indices (hope that works, but there are certainly other ways) to select only the relevant values for each node; see example below)
- overload the kabuki.hierarchical.create method such that it can append either one or 2 nodes to the node database, depending on how many nodes were returned by the modified create_node method


Cheers - Guido


# pseudo-code for an updated create_node method #
import numpy as np
from kabuki.hierarchical import Knode

class KnodeWfptZInv(Knode):
    def create_node(self, name, kwargs, data):
        if self.observed:
            # keep an unmodified copy of the arguments for the second node
            kwargs_tmp = dict(kwargs)
            stim = np.asarray(data['stim'])
            kwargs['value'] = kwargs['value'][stim == 1]  # this is pseudo code ...
            node1 = self.pymc_node(name, **kwargs)

            kwargs = kwargs_tmp
            kwargs['z'] = 1 - kwargs['z']  # pymc turns 1 - z into a Deterministic
            kwargs['value'] = kwargs['value'][stim == 2]  # this is pseudo code ...
            node2 = self.pymc_node(name, **kwargs)  # NOTE: same name as node1

            return (node1, node2)
        else:
            return self.pymc_node(name, **kwargs)

PS: Line 95 of https://github.com/hddm-devs/kabuki/blob/develop/kabuki/hierarchical.py reads "#create all the knodes"; should this be "#create all the nodes"?

Thomas Wiecki

Sep 28, 2012, 7:59:08 AM
to hddm-...@googlegroups.com
Hi Guido,


On Fri, Sep 28, 2012 at 7:20 AM, Guido Biele <g.p....@psykologi.uio.no> wrote:
> Hi Thomas,
>
> I got a few steps further, which also raised two questions on which I'd
> like to hear your opinion:

Great!


> If I understand it correctly, your method to fit data where bias = z for
> one part of the data and bias = 1-z for the other part (where the
> parts are defined by some condition) is to intervene when the pymc
> nodes are generated, which happens at line 135 of kabuki.hierarchical.Knode
> (development version of kabuki).
>
> First the node is created:
> node = self.create_node(node_name, kwargs, grouped_data)
>
> And then it is appended to the node database:
>
> if node is not None:
>     self.nodes[uniq_elem] = node
>     self.append_node_to_db(node, uniq_elem)
>
>
> My first question: Should the output of the create_node command generally be
> one node, so that the next append operation can work? (I think the answer is
> yes, I just want to be sure)

Yes I think so. Unless pandas dataframes are even more magical than I thought. 


> My next question is whether you intended to generate 1 or 2 nodes with the
> modified create_node method:
> class KnodeWfptZInv(Knode):
>     def create_node(self, name, kwargs, data):
>         if data['stim'] == 1:
>             z = copy(kwargs['z'])
>             kwargs['z'] = 1-z
>             return self.pymc_node(name, **kwargs)
>         else:
>             return self.pymc_node(name, **kwargs)
>
> To me, it looks like this function would always return only one node. That
> would work if an observed node could be set up so that for one part of the data
> the parameter 'z' has the value z and for the other part of the data the
> parameter 'z' has the value 1-z. I'm not sure that this is possible in pymc.
> In any case, the code above doesn't work because data['stim'] is an array
> and the other side of the comparison is not (Python returns the error message:
> "ValueError: The truth value of an array with more than one element is
> ambiguous").

Right, you need to check for all values, e.g.: if np.all(data['stim'] == 1)
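The failure Guido hit, and Thomas's fix, can be reproduced with pandas alone. Current pandas raises a ValueError that mentions a Series rather than the numpy message quoted above, so the check below relies only on the exception type. The last lines show why the np.all test only makes sense when depends_on has already split the data by stim.

```python
import numpy as np
import pandas as pd

# A chunk as produced by depends_on='stim': every row has the same stim value.
chunk = pd.DataFrame({"stim": [1, 1, 1], "rt": [0.5, 0.6, 0.7]})

raised = False
try:
    if chunk["stim"] == 1:  # a boolean Series has no single truth value
        pass
except ValueError:
    raised = True
print(raised)  # True

# Reduce the boolean Series to one bool before branching.
print(bool(np.all(chunk["stim"] == 1)))  # True

# Without the split, a chunk mixes stimuli and the same test is just False.
mixed = pd.DataFrame({"stim": [0, 1, 1], "rt": [0.5, 0.6, 0.7]})
print(bool(np.all(mixed["stim"] == 1)))  # False
```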


> My final question is what you think about the following modification to your
> approach to get this working:
>
> - overload the create_node method such that, when the node to be created is
> an observed node, we further split the data along (for example) the stimulus
> condition and create 2 pymc nodes.
> - (my approach here would be to copy some of your code to get the modified
> z, but then I would use logical indices (hope that works, but there are
> certainly other ways) to select only the relevant values for each node; see
> example below)
> - overload the kabuki.hierarchical.create method such that it can append
> either one or 2 nodes to the node database, depending on how many nodes were
> returned by the modified create_node method

That is actually what I originally envisioned, but I abandoned it because of the multiple-nodes problem. I think the above would still be easier and cleaner to integrate into the current architecture, but you are of course free to give this a shot.

Yeah, the fact that only one pymc node can be returned appears to be a limitation. We might add extra logic to check whether a list gets returned (if so, I think this should be changed in the Knode base class). However, this could cause some unforeseen problems when we later pass pymc nodes to their children (e.g. v_subj to wfpt). In your case it might not matter, as this never happens (wfpt has no children).
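The "check whether a list gets returned" idea could look something like this. It is sketched as a hypothetical register_nodes function that works on a plain dict rather than kabuki's actual node database.

```python
def register_nodes(registry, uniq_elem, created):
    # Hypothetical variant of the registration step quoted earlier: accepts
    # None, a single node, or a tuple/list of nodes returned by create_node.
    if created is None:
        return
    if isinstance(created, (list, tuple)):
        for i, node in enumerate(created):
            registry[(uniq_elem, i)] = node
    else:
        registry[uniq_elem] = created


registry = {}
register_nodes(registry, "subj1", "node_a")
register_nodes(registry, "subj2", ("node_b", "node_c"))
register_nodes(registry, "subj3", None)
print(registry)
```

The tuple keys are exactly where Thomas's caveat applies: a child that looks up its parent under uniq_elem alone would no longer find it. That is harmless only for nodes without children, such as wfpt.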


> PS: in line 95 of
> https://github.com/hddm-devs/kabuki/blob/develop/kabuki/hierarchical.py it
> reads "#create all the knodes" should this be "#create all the nodes" ?

Indeed, thanks. 

Guido Biele

Sep 28, 2012, 8:25:27 AM
to hddm-...@googlegroups.com
Hi Thomas,

thanks for the quick reply!
a quick follow up to make sure that I understand.

I think the command np.all(data['stim'] == 1) would check whether all values in the column stim are 1 and return False otherwise. Because in the general approach we discussed earlier the depends_on flag is not used with z, the result of this evaluation would always be False, so I think this would not solve the task. Or am I missing something?

I think the only way to avoid creating an additional node is if pymc can be set up so that the likelihood for some values in an observed node is calculated with one parameter (e.g. z), and the likelihood for other values in the same node with a different parameter (e.g. z' = 1-z). Do you know if pymc has this functionality? (I doubt it, but I know little about pymc.) If this functionality does not exist, I think there is really no way around creating an additional observed node.

Finally, if I end up using the approach with an additional node, I will have to give it its own name, too. Do you have a naming convention you would like me to use for this?

cheers - guido

PS: Does creating an additional observed node cause a problem for the plot_posterior_predictive function (e.g. such that only one of the observed nodes would be plotted)?




Thomas Wiecki

Sep 28, 2012, 9:08:15 AM
to hddm-...@googlegroups.com
On Fri, Sep 28, 2012 at 8:25 AM, Guido Biele <g.p....@psykologi.uio.no> wrote:
> Hi Thomas,
>
> thanks for the quick reply!
> a quick follow up to make sure that I understand.
>
> I think the command np.all(data['stim'] == 1) would check whether all values in the column stim are 1 and return False otherwise. Because in the general approach we discussed earlier the depends_on flag is not used with z, the result of this evaluation would always be False, so I think this would not solve the task. Or am I missing something?

Right, what I proposed was for the depends_on option. In that case, all values in data['stim'] will be the same, because depends_on causes kabuki to parcel the data in exactly that way. Your Knode then creates the node with 1-z for the part of the data where stim == 1 and with z where stim == 0.

> I think the only way to avoid creating an additional node is if pymc can be set up so that the likelihood for some values in an observed node is calculated with one parameter (e.g. z), and the likelihood for other values in the same node with a different parameter (e.g. z' = 1-z). Do you know if pymc has this functionality? (I doubt it, but I know little about pymc.) If this functionality does not exist, I think there is really no way around creating an additional observed node.

You could create a new likelihood that passes 1-z if stim ==1 and z otherwise. 
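The idea can be sketched as mapping z to a per-trial bias vector before the likelihood is evaluated. loglik_with_flipped_z is hypothetical. toy_logp is a placeholder density used only so the example runs; it is not the wfpt likelihood, which hddm implements separately.

```python
import numpy as np


def loglik_with_flipped_z(rt, stim, z, logp_trial, flip_value=1):
    # Hypothetical wrapper: trials with stim == flip_value use 1 - z, others z.
    rt = np.asarray(rt, dtype=float)
    stim = np.asarray(stim)
    z_trial = np.where(stim == flip_value, 1.0 - z, z)
    return float(np.sum(logp_trial(rt, z_trial)))


def toy_logp(rt, z):
    # Placeholder per-trial log density; NOT the wfpt likelihood.
    return np.log(z) - rt


ll = loglik_with_flipped_z([0.5, 0.5], [0, 1], z=0.25, logp_trial=toy_logp)
print(ll)
```

With this design a single observed node keeps all the data and one z parameter, and the stim column is passed to the likelihood instead of being used to split nodes.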

> Finally, if I end up using the approach with an additional node, I will have to give it its own name, too. Do you have a naming convention you would like me to use for this?

The depends_on option outlined a couple of emails ago is your safest and quickest bet, I think. You don't need to create an additional knode for this (just replace the wfpt Knode with your KnodeWfptZInv); instead you exploit the existing mechanism that splits knodes for different parts of the data. Then, for these different parts of the data, you replace z appropriately. If you don't mind, could you reread the emails from Sep 18 and 19 and let me know what you think is missing (it wouldn't be the first time I've missed a showstopper).

> cheers - guido
>
> PS: Does creating an additional observed node cause a problem for the plot_posterior_predictive function (e.g. such that only one of the observed nodes would be plotted)?

Not when you go with the depends option ;). 

Guido Biele

Sep 28, 2012, 9:42:50 AM
to hddm-...@googlegroups.com
Hi Thomas,

You mentioned that "You could create a new likelihood that passes 1-z
if stim == 1 and z otherwise." Can you point me to where in the code the
likelihood is calculated? Then I'll have a look, as this option sounds
promising to me! (I just assume that the likelihood will also have some
data structure that allows splitting the data...)

If you prefer, you can ignore the rest of this email until I have
figured out if the likelihood approach works. Otherwise you can read
why I think using the depends_on option creates its own problems ;-)

#####################

I just re-read the emails and think I might have found where we
misunderstood each other.

When I tried to summarize the approach, I started by saying that
depends_on is not used for z, which you seemed to confirm.

GB:
I'll just paraphrase the general approach to make sure that we have
the same understanding.
1) When calling hddm to generate the model, depends_on will not be
used for z.
...
TW:
That's right, just to make sure: I think you meant to write
kabuki.hierarchical.Knode.
Also, z is never touched, and z_inv will not be explicitly registered in
Hierarchical (i.e. it has no knode of its own; it will just exist in the
pymc model graph) but will instead be plugged in directly before wfpt is
created. It is also wfpt that gets the depends_on.

Generally, I don't have a preference if I should use the depends_on
flag or not.

The advantage of the approach with the depends_on flag is of course
that it will split the data as needed, and that one stays firmly within
the kabuki framework. The disadvantage I see is that 2 problems remain
unsolved for this approach, both arising from the fact that two z priors
will be created, one for each stimulus condition, whereas I actually want
only one. The specific problems would be:
- how to tell create_nodes that the parent z for the second observed
node is the same as for the first observed node, only that z(stim2) =
1-z(stim1) (the code you sent would simply calculate 1-z of the second
prior, so I would still have 2 z parameters instead of 1).
- what should one do with the priors that were generated through the
depends_on approach, but which are no longer needed?


The advantage of the approach without the depends_on flag is that one
does not need to deal with additional priors/parameters that were
created but are not needed, and that it is straightforward to connect
the calculation of z(stim2) to the parameter for z(stim1) (see code
below). Splitting the data in the create_node function is not a problem
either (see code below). However, as you alluded to, there is a major
disadvantage:
- how will kabuki deal with the fact that there is now one more
observed node than expected? I really have no idea, but I would guess
that retrieving parameter values should be easy, while getting
posterior predictives might be more complicated.

what do you think?

cheers- guido


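###### code, not very clean yet (should use more variables and less hard-coding) #####
def create_node(self, name, kwargs, data):
    if self.observed:
        grouped = data.groupby('stim')

        # 'lean' trials: pass the shared z node unchanged
        # (the '.lean'/'.rich' name suffixes are hypothetical, only to keep node names unique)
        kwargs['value'] = grouped.get_group('lean')['rt'].values
        node1 = self.pymc_node(name + '.lean', **kwargs)

        # 'rich' trials: 1 - z on a pymc variable yields a deterministic that
        # depends on the same z node, so no copy of z is needed
        kwargs['z'] = 1 - kwargs['z']
        kwargs['value'] = grouped.get_group('rich')['rt'].values
        node2 = self.pymc_node(name + '.rich', **kwargs)

        return node1, node2
    else:
        return self.pymc_node(name, **kwargs)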



--
------------------------------------------------------------------------

Guido Biele
Email: g.p....@psykologi.uio.no
Phone: +47 228 45172
Website <https://sites.google.com/a/neuro-cognition.org/guido/home>

Thomas Wiecki

Sep 28, 2012, 9:58:54 AM
to hddm-...@googlegroups.com
On Fri, Sep 28, 2012 at 9:42 AM, Guido Biele <g.p....@psykologi.uio.no> wrote:
> Hi Thomas,
>
> You mentioned that "You could create a new likelihood that passes 1-z if stim == 1 and z otherwise." Can you point me to where in the code the likelihood is calculated? Then I'll have a look, as this option sounds promising to me! (I just assume that the likelihood will also have some data structure that allows splitting the data...)

Ok, I'll point you to that if I haven't convinced you below :). 
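For comparison, a rough sketch of the single-likelihood alternative quoted above: one observed node covers all trials, and the starting point is mirrored per trial when stim == 1. It is plain Python rather than HDDM code; `mirrored_z`, `split_z_logp` and `toy_logp` are hypothetical names, and `toy_logp` is only a placeholder for the real wfpt log-density, which is not reproduced here.

```python
import math


def mirrored_z(z, stim):
    """Starting point used for one trial: 1 - z if stim == 1, z otherwise."""
    return 1.0 - z if stim == 1 else z


def split_z_logp(rts, stims, z, base_logp):
    """Sum per-trial log-likelihoods, mirroring z on stim == 1 trials.

    `base_logp(rt, z)` is a hypothetical per-trial log-density; a real
    model would also pass the remaining wfpt parameters (v, a, t).
    """
    if len(rts) != len(stims):
        raise ValueError("rts and stims must have the same length")
    return sum(base_logp(rt, mirrored_z(z, s)) for rt, s in zip(rts, stims))


def toy_logp(rt, z):
    # Toy stand-in only; NOT the wfpt first-passage-time density.
    return math.log(z) - rt


total = split_z_logp([0.5, 0.7], [0, 1], 0.25, toy_logp)
```

The difference from the depends_on route is that the data are never split into separate observed nodes; the split happens trial by trial inside the likelihood, with a single z throughout.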

> If you prefer, you can ignore the rest of this email until I have figured out whether the likelihood approach works. Otherwise you can read why I think using the depends_on option creates its own problems ;-)
>
> #####################
>
> I just re-read the emails and think I might have found where we misunderstood each other.
>
> When I tried to summarize the approach, I started by saying that depends_on is not used for z, which you seemed to confirm.

> GB:
>
> I'll just paraphrase the general approach to make sure that we have the same understanding.
> 1) When calling hddm to generate the model, no depends_on will be invoked for z.
> ...
> TW:
>
> That's right, just to make sure: I think you meant to write kabuki.hierarchical.Knode.
> Also, z is never touched and z_inv will not be explicitly registered in Hierarchical (i.e. it has no knode of its own; it will just exist in the pymc-model graph) but instead be plugged in directly before wfpt is created. It is also wfpt which gets the depends_on.
>
> Generally, I don't have a preference whether I should use the depends_on flag or not.
>
> The advantage of the approach with the depends_on flag is of course that it will split the data as needed, and that one stays firmly within the kabuki framework. The disadvantage I see is that there remain two unsolved problems with this approach, both due to the fact that two z priors will be created, one for each stimulus condition, whereas I actually want only one. The specific problems would be:

That's not correct. There will only be one z throughout. You are not splitting the z node or creating an additional z_inv knode (which was my initial suggestion and in which case you would be correct). Instead, you are splitting the wfpt node -- the z node remains the same. Critically, the resulting model would be identical to the likelihood approach you outlined above.
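A minimal sketch of what is described here, written without kabuki or pymc so that it runs on its own. It assumes the wfpt Knode has depends_on set to the stimulus column, so that create_node is called once per stimulus chunk and receives the same kwargs['z'] object each time. `ZParam`, `InvertedZ`, the 'stim'/'rt' keys and the dict returned in place of a pymc node are all hypothetical stand-ins.

```python
class ZParam:
    """Hypothetical stand-in for the single free z node (not a pymc class)."""

    def __init__(self, value):
        self.value = value


class InvertedZ:
    """Hypothetical stand-in for a deterministic node whose value is 1 - z."""

    def __init__(self, z):
        self.z = z  # reference to the one shared z, not a copy

    @property
    def value(self):
        return 1.0 - self.z.value


def create_node(node_name, kwargs, data, invert_stim="rich"):
    """Sketch of a wfpt create_node override (a Knode subclass method in kabuki).

    `data` is the chunk of trials this node depends on, given here as a list
    of dicts with hypothetical 'stim' and 'rt' keys instead of a DataFrame.
    """
    stims = {trial["stim"] for trial in data}
    if len(stims) != 1:
        raise ValueError("expected trials from exactly one stim condition")

    kwargs = dict(kwargs)  # leave the caller's kwargs untouched
    if stims.pop() == invert_stim:
        kwargs["z"] = InvertedZ(kwargs["z"])  # plug 1 - z in just before wfpt
    kwargs["value"] = [trial["rt"] for trial in data]
    # A real Knode would return self.pymc_node(name=node_name, **kwargs) here.
    return {"name": node_name, **kwargs}


z = ZParam(0.25)
lean = create_node("wfpt.lean", {"z": z}, [{"stim": "lean", "rt": 0.5}])
rich = create_node("wfpt.rich", {"z": z},
                   [{"stim": "rich", "rt": 0.7}, {"stim": "rich", "rt": 0.8}])
```

Both calls receive the same z, so the model still has one z parameter; the rich-condition node merely sees a function of it.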

> - how to tell create_nodes that the parent z for the second observed node is the same as for the first observed node, only that z(stim2) = 1-z(stim1) (the code you sent would simply calculate 1-z of the second prior, so I would still have two z parameters instead of one);

Since z is never touched before it gets to the wfpt node, you will get the identical node each time the wfpt create_node() is called. That's true for any other node like a, v, t. In one case you invert, in the other you don't.
 
> - what should one do with the priors that were generated through the depends_on approach but are no longer needed?

Only one prior will be generated. 
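To illustrate why only one prior results: with one stochastic z and a derived 1 - z, the relation z(stim2) = 1 - z(stim1) holds at every sampler step, since there is nothing else to sample. A toy sketch; `FreeZ`, `make_parents` and the proposal values are hypothetical, and no pymc code is involved.

```python
class FreeZ:
    """Hypothetical stand-in for the single stochastic z node."""

    def __init__(self, value):
        self.value = value


def make_parents(z):
    """Return the starting points seen by the two wfpt nodes.

    Both callables read the same FreeZ object, so z(rich) = 1 - z(lean)
    holds whatever value z currently has.
    """
    def lean_z():
        return z.value

    def rich_z():
        return 1.0 - z.value

    return lean_z, rich_z


z = FreeZ(0.5)
lean_z, rich_z = make_parents(z)
history = []
for proposal in (0.25, 0.5, 0.75):  # values a sampler might propose for z
    z.value = proposal
    history.append((lean_z(), rich_z()))
```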

> The advantage of the approach without the depends_on flag is that one does not need to deal with additional priors/parameters that were created but are not needed, and that it is straightforward to connect the calculation of z(stim2) to the parameter for z(stim1) (see code below). Splitting the data in the create_node function is not a problem either (see code below). However, as you alluded to, there is a major disadvantage:
> - how will kabuki deal with the fact that there is now one more observed node than expected? I really have no idea, but I would guess that retrieving parameter values should be easy, while getting posterior predictives might be more complicated.

Yeah, I suspect this to be one of potentially many issues. 

> What do you think?

Unless I am completely missing something, I think I haven't communicated the wfpt-depends_on option clearly. If it's still unclear, can you let me know why you think multiple z priors will be generated with this approach?

Thomas

Guido Biele

Sep 28, 2012, 10:02:43 AM
to hddm-...@googlegroups.com
Hi,

"I was being slow on the uptake there ..."
It dawns on me now / I begin to remember that the depends_on was for the wfpt,
not for the general model setup.
I'll check it out later tonight, as I have to take care of my kids now
... ;-)

cheers - guido
