How to define an array of vectors where the length of vectors are not fixed!

866 views
Skip to first unread message

sepehr akhavan

unread,
Apr 15, 2016, 11:35:06 AM4/15/16
to Stan users mailing list
Hi,

Let's Consider a longitudinal setting (multiple measurements per subject) where number of measurements across subjects is variable. As an example, we can consider measuring blood pressure in a year for N = 300 subjects. Number of times their blood pressure have been measured over a 1-year followup is a number between 6 to 12 (let's call it Mi = number of measurements for subject i). 

So, blood pressure per subject is a vector. I'd like to consider bloodPressure as an array of size N = 300 where each bloodPressure[i] is a vector of size Mi. How can I define bloodPressure array in the data block of my stan code:

data{
int<lower = 1> N;
vector<lower = 1>[N] Mi;
vector[Mi] bloodPressure[N]; // Can I put Mi as a size of vector where Mi itself is a vector ?
}

Thanks very much for your help,
-Sepehr

Ben Goodrich

unread,
Apr 15, 2016, 12:06:06 PM4/15/16
to Stan users mailing list
On Friday, April 15, 2016 at 11:35:06 AM UTC-4, sepehr akhavan wrote:
data{
int<lower = 1> N;
vector<lower = 1>[N] Mi;
vector[Mi] bloodPressure[N]; // Can I put Mi as a size of vector where Mi itself is a vector ?
}

No. You pretty much have to stack them into a long vector and when needed, pull out the segment of length Mi[n] for the n-th unit.

Ben

Bob Carpenter

unread,
Apr 15, 2016, 12:07:29 PM4/15/16
to stan-...@googlegroups.com
First of all, sizes have to be integers, so you
would have to have:

int<lower=1> Mi[N];

But we don't have ragged arrays, yet, so you either have
to use a melted encoding and put vectors back together if
you need them as such, or you have to declare a bigger array
of vectors

vector[max(Mi)] bP[N];

and then just ignore the unused vectors (you have to
slice them down manually when you use them).

We'll get ragged arrays eventually, but probably not this year.

- Bob
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> To post to this group, send email to stan-...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Bob Carpenter

unread,
Apr 15, 2016, 12:08:30 PM4/15/16
to stan-...@googlegroups.com
That's the third option! There's a discussion in
a chapter in the manual on how to do this.

- Bob

sepehr akhavan

unread,
Apr 15, 2016, 12:29:32 PM4/15/16
to stan-...@googlegroups.com
Thanks very much, Bob. For the slicing part, selecting 1:Mi is as simple as bP[i][1:Mi[i]]? 

Bob Carpenter

unread,
Apr 15, 2016, 12:37:29 PM4/15/16
to stan-...@googlegroups.com
Yes. There's also a chapter in the manual on indexing
syntax.

- Bob

> On Apr 15, 2016, at 12:29 PM, sepehr akhavan <akhavan...@gmail.com> wrote:
>
> Thanks very much, Bob. For the slicing part, selecting 1:Mi is as simple as bP[i][1:Mi]?

Christopher Peterson

unread,
Apr 15, 2016, 12:57:54 PM4/15/16
to Stan users mailing list
One method I've found helpful is to define a pair of index vectors:

int N; // number of measurements
int nPat; // number of patients
vector[N] bP;  // measurements
int
<lower=1,upper=N> start_Pos[nPat];  // the first position on bP for each patient
int<lower=1,upper=N> stop_Pos[nPat];  // the last position on bP for each patient

/// ...

bP[start_Pos[i]:stop_Pos[i]] ~ ... // select the measurements of patient i

This I suppose this could result in slightly more overhead, but I think it makes the code easier to write.  

Bob Carpenter

unread,
Apr 15, 2016, 3:19:46 PM4/15/16
to stan-...@googlegroups.com
Right. That's what Ben was suggesting. Usually you only
need a single index array though and can use

bP[start_pos[i], start_pos[i + 1] - i]

But then you have to be careful to pad the start_pos index to size N + 1
and add the final endpoint if you want the loops to be easy.

Think of all the integer arithmetic as a freebie --- it can
almost all be parallelized efficiently in the CPU beside the
floating point calls. In contrast, iterating over two arrays
is more expensive, because memory access can be more (or even much
more) costly than integer arithmetic.

- Bob

sepehr akhavan

unread,
Apr 17, 2016, 8:44:19 PM4/17/16
to Stan users mailing list
Hi Bob,
Following your solution on slicing and indexing my matrices and vectors, I wrote my code. 
I have to define some of my matrices and vectors in the model block as they both depend on parameters and data. In particular, in the model section, I define a matrix of the form below:

matrix[1,max(Mi)] K_transpose_div_SigmaPred;

As a reminder, Mi is a vector of number of measurements per subject. So if Mi[1] is 12, it means subject 1 has 12 within-subj measurements. Note that subjects may have different number measurements. So the idea is let's define a big row matrix and if a subject has fewer observations, let's slice this row matrix as K_transpose_div_SigmaPred[1, 1:Mi_i] where Mi_i <- Mi[i]

Later in my data block, I have a loop on subjects and in a statement in that loop, I compute K_transpose_div_SigmaPred as:

K_transpose_div_SigmaPred <- to_matrix((K[1:(Mi_i), 1])' * inverse(SigmaPred[1:Mi_i, 1:Mi_i]));  

I successfully compile the code. However, I get run-time error when I run the code and the run time is:
if a subject has 10 measurements, in the line above, the right hand side is a 1 by 10 row matrix where as the left hand side is a bigger row matrix (suppose max(Mi) is 12). I tried to do this:

K_transpose_div_SigmaPred[1, 1:Mi_i] <- to_matrix((K[1:(Mi_i), 1])' * inverse(SigmaPred[1:Mi_i, 1:Mi_i]));  

But the line above is not valid and leads to compile error. Also it's not possible to declare K_transpose_div_SigmaPred within the loop as it seems that all variable declaration should be done on top of the model block. What would you think I should do?

Also, I'm so grateful to your guidance and help and so sorry I'm asking a specific question. I tried to find a solution over the past 2 days but no success and decided to get some help here.

Regards,
-Sepehr

Bob Carpenter

unread,
Apr 18, 2016, 2:25:32 AM4/18/16
to stan-...@googlegroups.com

> On Apr 17, 2016, at 8:44 PM, sepehr akhavan <akhavan...@gmail.com> wrote:
>
> Hi Bob,
> Following your solution on slicing and indexing my matrices and vectors, I wrote my code.
> I have to define some of my matrices and vectors in the model block as they both depend on parameters and data. In particular, in the model section, I define a matrix of the form below:
>
> matrix[1,max(Mi)] K_transpose_div_SigmaPred;
>
> As a reminder, Mi is a vector of number of measurements per subject. So if Mi[1] is 12, it means subject 1 has 12 within-subj measurements. Note that subjects may have different number measurements. So the idea is let's define a big row matrix and if a subject has fewer observations, let's slice this row matrix as K_transpose_div_SigmaPred[1, 1:Mi_i] where Mi_i <- Mi[i]

Why do that? What is Sigma for and what are the model parameters?

> Later in my data block, I have a loop on subjects and in a statement in that loop, I compute K_transpose_div_SigmaPred as:
>
> K_transpose_div_SigmaPred <- to_matrix((K[1:(Mi_i), 1])' * inverse(SigmaPred[1:Mi_i, 1:Mi_i]));

You don't want to compute inverses. Rather than A * inv(B), it's more
stable to use A / B.

And you want to organize your data so it doesn't
need to be transposed if you can help it.

>
> I successfully compile the code. However, I get run-time error when I run the code and the run time is:
> if a subject has 10 measurements, in the line above, the right hand side is a 1 by 10 row matrix where as the left hand side is a bigger row matrix (suppose max(Mi) is 12).

They have to have compatible dimensions.

> I tried to do this:
>
> K_transpose_div_SigmaPred[1, 1:Mi_i] <- to_matrix((K[1:(Mi_i), 1])' * inverse(SigmaPred[1:Mi_i, 1:Mi_i]));
>
> But the line above is not valid and leads to compile error. Also it's not possible to declare K_transpose_div_SigmaPred within the loop as it seems that all variable declaration should be done on top of the model block.

You can define local variables within any block. You can even use
{ and } to create a new block any place just to define local variables.

> What would you think I should do?

I'm not sure what you're trying to do, but you can't multiply
matrices of different dimensionalities.

> Also, I'm so grateful to your guidance and help and so sorry I'm asking a specific question.

Specific is easier to answer than vague.
Reply all
Reply to author
Forward
0 new messages