How to subset an array based on indices satisfying some condition?

1,098 views
Skip to first unread message

Dalton Hance

unread,
Mar 1, 2016, 11:40:49 PM3/1/16
to Stan users mailing list

What I want to do is relatively simple, but I've been struggling to figure out how to implement it. I'm putting together a hierarchical model and I want to subset a two-dimensional integer array based on a one-dimensional integer array. I'm trying to write a function to do this on the Stan side as opposed to doing it on the R side and then passing to Stan.

What I want to do is demonstrated very simply in R:

k = 4
i = 25

set.seed(123)

x <- matrix(round(runif(i*k)), ncol =k)
group <- c(rep(1, 7), rep(2, 4), rep(3, 3), rep(4, 5), rep(5,6))
x[group==1,]


But I'm struggling to implement this in Stan. I'm sure the solution is simple, but my grasp of the language is still pretty rudimentary.

I'm going to give Stan the following data:

data {
    int<lower=2> K;                   // capture events
    int<lower=1> G;                   // number of groups
    int<lower=0> I;                   // number of individuals
    int<lower=0,upper=1> X[I,K];      // X[i,k]: ==1 if individual i captured at , 0 otherwise
    int<lower=1> group[I];            // vector of group membership
}

What I want to do is write a function that returns the subarray X[some number < I, K] given a specific g. I've got the function header written (not that I deserve a medal for that trivial step).

  int subset_capture_history(int I, int[,] X, int[] group, int j){}

I'm going to pass this function into another function that summarizes the two dimensional integer array into a vector, so I don't actually need to store the output in a declared container. 



Bob Carpenter

unread,
Mar 4, 2016, 7:14:04 PM3/4/16
to stan-...@googlegroups.com

> On Mar 1, 2016, at 11:40 PM, Dalton Hance <dalton...@gmail.com> wrote:
>
>
> What I want to do is relatively simple, but I've been struggling to figure out how to implement it. I'm putting together a hierarchical model and I want to subset a two-dimensional integer array based on a one-dimensional integer array. I'm trying to write a function to do this on the Stan side as opposed to doing it on the R side and then passing to Stan.
>
> What I want to do is demonstrated very simply in R:
>
> k = 4
> i = 25
>
> set.seed(123)
>
> x <- matrix(round(runif(i*k)), ncol =k)
> group <- c(rep(1, 7), rep(2, 4), rep(3, 3), rep(4, 5), rep(5,6))
> x[group==1,]


/**
* Return what R computes as x[is == target, ]
*/
matrix foo(matrix x, int[] is, int target) {
matrix[count, cols(x)] result;
int pos;

if (size(is) != rows(x))
reject("illegal input"); // stricter than R's broadcasting

// count number of rows in result
int count;
count <- 0;
for (n in 1:size(is))
if (is[n] == target)
count <- count + 1;

// assign rows in result
pos <- 1;
for (n in 1:size(is)) {
if (is[n] == target) {
result[pos] <- x[n]; // assigns entire row
pos <- pos + 1;
}
}
return result;
}

- Bob

Jonah Gabry

unread,
Mar 4, 2016, 7:34:25 PM3/4/16
to Stan users mailing list
That's a cool function! Minor thing: I think it will give an error because count is used in matrix[count, cols(x)] result; before it's declared.

Bob Carpenter

unread,
Mar 5, 2016, 3:07:04 PM3/5/16
to stan-...@googlegroups.com
Too bad, I had it right the first time then tried to
"simplify". You need to put that matrix declaration in
a local lower down, as in:

/**
* Return what R computes as x[is == target, ]
*/
matrix foo(matrix x, int[] is, int target) {
if (size(is) != rows(x))
reject("illegal input"); // stricter than R's broadcasting

// count number of rows in result
int count;
count <- 0;
for (n in 1:size(is))
if (is[n] == target)
count <- count + 1;

// assign rows in result
{
matrix[count, cols(x)] result;
int pos;
pos <- 1;
for (n in 1:size(is)) {
if (is[n] == target) {
result[pos] <- x[n]; // assigns entire row
pos <- pos + 1;
}
}
return result;
}
}

I still haven't tried it, but it should work. I love
these little programming puzzles --- it's like a job interview
for being a programmer these days :-)

- Bob

> On Mar 4, 2016, at 7:34 PM, Jonah Gabry <jga...@gmail.com> wrote:
>
> That's a cool function! Minor thing: I think it will give an error because count is used in matrix[count, cols(x)] result; before it's declared.
>
>
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> To post to this group, send email to stan-...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Jonah Gabry

unread,
Mar 5, 2016, 3:42:07 PM3/5/16
to stan-...@googlegroups.com
With one minor tweak it works nicely (the declaration int count; needs to come before the rejection for illegal inputs). 

For anyone interested, here's how you can compare it to the R version using RStan's expose_stan_functions feature: 

1) Put Bob's function (with my minor edit above) in a .stan file (I'm using expose_foo.stan) along with an empty model block (all Stan programs need a model block). So expose_foo.stan is:

functions {

  matrix foo(matrix x, int[] is, int target) { 
   int count; 
   if (size(is) != rows(x)) 
     reject("illegal input");  // stricter than R's broadcasting 

   // count number of rows in result 
   count <- 0; 
   for (n in 1:size(is)) 
     if (is[n] == target) 
       count <- count + 1; 

   // assign rows in result 
   { 
     matrix[count, cols(x)] result; 
     int pos; 
     pos <- 1; 
     for (n in 1:size(is)) { 
       if (is[n] == target) { 
         result[pos] <- x[n];  // assigns entire row 
         pos <- pos + 1; 
       } 
     } 
     return result; 
   } 
  }
}
model {

}


2) In an R session use RStan's expose_stan_functions and then check that foo gives the same result as doing the subsetting using the OP's R syntax:

library(rstan)

# Expose the foo function in global environment
expose_stan_functions(stanc(file = "expose_foo.stan"))

# Test it on an example

X <- matrix(rnorm(25), 5, 5)
a <- c(1, 0, 0, 0, 1)
identical(X[a == 1, , drop = FALSE], foo(X, a, 1)) 



Jonah
Reply all
Reply to author
Forward
0 new messages