
May 24, 2017, 4:54:13 AM

to Stan users mailing list

Dear all,

I have coded a very simple hierarchical regression in Stan with 2 predictors that ran reasonably fast.

Then I tried to make the code generic so that I could use it for an arbitrary number of predictors (I tried to code it the way the manual suggests in the section on Linear Regression). While this still gave me the same parameter estimates (so something must be right), it took much longer (~3 times slower). I'm not sure why, or whether there is anything I can do to speed it up. I've attached data and code.

Here is the original model:

data {
  int datp;                 // number of data points
  vector[datp] y;           // reaction time
  int nsub;                 // number of subjects
  int ntrs[nsub];           // number of trials for each person
  int<lower=1> nregs;       // number of regressors in this design (including constant)
  matrix[datp, nregs] x;    // design matrix: column 1 is the intercept, column 2 the regressor;
                            // rows are data points for all subjects concatenated
  int subID[datp];          // which subject each data point belongs to
}
parameters {
  // group-level inference parameters
  vector[2] beta_mu;        // constant (intercept) and effect of condition (slope)
  // structured noise: differences between individual subjects
  real<lower=0> sigma_e;    // error sd per measurement
  vector<lower=0>[2] sigma; // individual differences between people
  vector[nsub] betas0;      // RT adjustment for each person (intercept)
  vector[nsub] betas1;      // RT adjustment for slope
}
// This block runs the actual model
model {
  real mu;
  // Generate the RT adjustments for subjects.
  // They come from a normal distribution with the standard deviation ('sigma') defined above;
  // the mean is set to zero because the global constant captures this.
  betas0 ~ normal(0, sigma[1]);
  betas1 ~ normal(0, sigma[2]);
  // Compute the likelihood for each data point as before, now also adjusting mu
  // for individual differences between people
  for (i in 1:datp) { // likelihood for each data point
    mu = (beta_mu[1] + betas0[subID[i]]) * x[i, 1] + (beta_mu[2] + betas1[subID[i]]) * x[i, 2];
    y[i] ~ lognormal(mu, sigma_e); // '~' means rt is lognormally distributed, with location mu and scale sigma_e (both on the log scale)
  }
}

And here is the more generic model (I've also tried this without the non-centered parameterisation; it's about the same speed):

data {
  int datp;                 // number of data points
  vector[datp] y;           // reaction time
  int nsub;                 // number of subjects
  int ntrs[nsub];           // number of trials for each person
  int<lower=1> nregs;       // number of regressors in this design (including constant)
  matrix[datp, nregs] x;    // design matrix: column 1 is the intercept, column 2 the regressor;
                            // rows are data points for all subjects concatenated
  int subID[datp];          // which subject each data point belongs to
}
parameters {
  vector[nregs] beta_mu;        // group-level parameters
  vector<lower=0>[nregs] sigma; // individual differences
  real<lower=0> sigma_e;        // error sd per measurement
  vector[nsub] betasRaw[nregs]; // standardised per-subject effects
}
// Non-centered parameterisation; this speeds up the sampling
transformed parameters {
  vector[nsub] betas[nregs];
  for (ir in 1:nregs) {
    betas[ir] = beta_mu[ir] + sigma[ir] * betasRaw[ir];
  }
}
// This block runs the actual model
model {
  real mu;
  // Individual-subject effects
  for (ir in 1:nregs) {
    betasRaw[ir] ~ normal(0, 1);
  }
  // Run the regression
  for (i in 1:datp) { // likelihood for each data point
    mu = to_vector(betas[:, subID[i]])' * x[i]';
    y[i] ~ lognormal(mu, sigma_e);
  }
}
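(As a quick sanity check on the non-centered parameterisation itself, done outside Stan: the snippet below is an illustrative numpy sketch with made-up values of mu and sigma, showing that shifting and scaling standard-normal draws, betas = mu + sigma * raw, reproduces draws from normal(mu, sigma), which is exactly what the transformed parameters block relies on.)

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 2.0, 0.5, 200_000  # illustrative values, not from the model

# Centered draw: sample effects directly from normal(mu, sigma)
centered = rng.normal(mu, sigma, size=n)

# Non-centered draw: sample standardised effects, then shift and scale,
# mirroring betas[ir] = beta_mu[ir] + sigma[ir] * betasRaw[ir]
raw = rng.normal(0.0, 1.0, size=n)
non_centered = mu + sigma * raw

# Both should have (approximately) the same mean and sd
print(centered.mean(), non_centered.mean())
print(centered.std(), non_centered.std())
```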

May 24, 2017, 2:00:03 PM

to stan-...@googlegroups.com, jacquie...@gmail.com

We're shuttering this list and moving to: http://discourse.mc-stan.org

I'm surprised the revised model is three times slower. Did you do runs from the same initial points, or do multiple runs to collect average timings?

To speed up both of your models, there are a lot of things that can be vectorized. Most importantly, this:

> for (i in 1:datp) { // likelihood for each data point
>   mu = (beta_mu[1] + betas0[subID[i]]) * x[i, 1] + (beta_mu[2] + betas1[subID[i]]) * x[i, 2];
>   y[i] ~ lognormal(mu, sigma_e);
> }

can be

  y ~ lognormal((beta_mu[1] + betas0[subID]) .* x[, 1]
                  + (beta_mu[2] + betas1[subID]) .* x[, 2],
                sigma_e);

You could make this even more efficient by packing everything up into matrices, but that's probably not worth it. This will get most of the efficiency gain to be had.
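(To see why the vectorised statement computes the same thing as the loop, here is an illustrative numpy sketch with made-up data; all sizes and values below are invented for the demonstration.)

```python
import numpy as np

rng = np.random.default_rng(1)
datp, nsub = 10, 3
x = rng.normal(size=(datp, 2))
x[:, 0] = 1.0                      # intercept column
subID = rng.integers(0, nsub, size=datp)
beta_mu = np.array([0.5, -0.2])
betas0 = rng.normal(size=nsub)     # per-subject intercept adjustments
betas1 = rng.normal(size=nsub)     # per-subject slope adjustments

# Looped version: one data point at a time, as in the original model block
mu_loop = np.empty(datp)
for i in range(datp):
    mu_loop[i] = ((beta_mu[0] + betas0[subID[i]]) * x[i, 0]
                  + (beta_mu[1] + betas1[subID[i]]) * x[i, 1])

# Vectorised version: whole columns at once, as in the suggested statement
mu_vec = (beta_mu[0] + betas0[subID]) * x[:, 0] + (beta_mu[1] + betas1[subID]) * x[:, 1]

print(np.allclose(mu_loop, mu_vec))  # → True
```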

Did you mean to leave out a global intercept? It can be hard to fit a lot of models without it.

In the second model, are you sure this is right:

> for (i in 1:datp) { // likelihood for each data point
>   mu = to_vector(betas[:, subID[i]])' * x[i]';
>   y[i] ~ lognormal(mu, sigma_e);
> }

The way you have it, mu is going to be a vector, but y[i] is just a scalar, so it'll be equivalent to PROD_n y[i] ~ lognormal(mu[n], sigma_e), which is probably not what you want.

- Bob


> <ranIntSlopeSubjMyInp.stan><GenericRegressionRT2levelsReparam.stan><temp.data.R>

