categorical distribution in mixture model issue

retzerjj

unread,

May 7, 2014, 2:30:41 PM5/7/14

to stan-...@googlegroups.com

Hello all,
We were trying to run a finite mixture model which puts Dirichlet priors on both rows and columns using the data set from the R "blockcluster" package. The translation / compiliation to C++ appears to be working however an error occurs when the "categorical" command is executed.

   Error : Error in function stan::prob::categorical_log(i): Number of categories is 0, but must be between (1, 2)
     error occurred during calling the sampler; sampling not done

I don't "think" this is related to a discrete parameter issue since we have no discrete _parameters_, just discrete (integer) vectors created within the model block. I'm not sure if our argument for categorical command is correct or if there is some other issue.

Any advice would be greatly appreciated,
Many thanks,
Joe Retzer

-------------------------------------- R / Rstan Code ------------------------------------

library(rstan)
library(blockcluster); data(binarydata); data <- binarydata # for example data set

set_cppo("fast") # for best running speed
set_cppo('debug') # make debug easier

N = 1000; D = 100                      # data size
K = 2;    L = 3                        # row and column clusters

my_data <- list(x = data, K = K, L = L, N = N, D = D, alpha1 = rep(1,K), alpha2 = rep(1,L))

my_code <- '
data {
int<lower=1> N;
int<lower=1> D;
int<lower=1,upper=N> K;                // Number of row clusters
int<lower=1,upper=D> L;                // Number of col clusters
int<lower=0,upper=1> x[N,D];
vector<lower=0>[K] alpha1;
vector<lower=0>[L] alpha2;
}

parameters {
   real<lower=0,upper=1> theta[K,L];
   simplex[K] pi[N];
   simplex[L] pj[D];
}
model {
int rclass[N,D];
int cclass[N,D];
for(k in 1:K){                            # prior on theta - binomial prob for co-clusters
    for(l in 1:L){
      theta[k,l] ~ beta(1, 1);
    }
}

for(i in 1:N) {                           # hierarchical prior on rows
    pi[i] ~ dirichlet(alpha1);              # the individual class prob pi
}

for(j in 1:D) {                           # hierarchical prior on columns
    pj[j] ~ dirichlet(alpha2);              # the individual class prob pj
}

for(i in 1:N){                            # likelihood
    for(j in 1:D){
      rclass[i,j] ~ categorical(pi[i]);
      cclass[i,j] ~ categorical(softmax(pj[j]));
      x[i,j] ~ bernoulli(theta[rclass[i,j], cclass[i,j]]);
    }
}
}
'

fit <- stan(model_code = my_code, data = my_data, iter = 1000, chains = 4)

Bob Carpenter

unread,

May 7, 2014, 6:33:22 PM5/7/14

to stan-...@googlegroups.com

I don't know what the R package blockcluster does, but
our categoricals are indexed from 1 through K.

Your problem is here:

> model {
> int rclass[N,D];
...

> rclass[i,j] ~ categorical(pi[i]);

You don't ever set rclass. The default value is 0, and then
that causes a problem because 0 is not an appropriate outcome
for a categorical.

It looks like what you're trying to do is sample from the
categorical(pi[i]) and assign the result to rclass[i,j], which
is NOT how Stan sampling statements work. They just increment
the log probability with the value of

log(Categorical(rclass[i,j] | pi[i]))

which presupposes that rclass[i,j] is set to a valid value.

Also, given that you have a parameter declaration

> simplex[L] pj[D];

you almost certainly don't want to be doing this:

> cclass[i,j] ~ categorical(softmax(pj[j]));

pj[j] is already declared to be a simplex, so it's already
the right kind of thing to be a parameter to categorical().
Now you're transforming it again by exponentiating the
values in pj[j] and then normalizing.

- Bob

> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Retzer Joe

unread,

May 7, 2014, 11:47:52 PM5/7/14

to stan-...@googlegroups.com

Hi Bob,

Thanks so much for getting back to us. Just a couple of things:

On May 7, 2014, at 5:33 PM, Bob Carpenter wrote:

I don't know what the R package blockcluster does, but
our categoricals are indexed from 1 through K.

I only mention the blockcluster package to specify the source of our data, its functionality has nothing to do with the stan program. Sorry for not being more clear on this.

Your problem is here:

model {
int rclass[N,D];
...
rclass[i,j] ~ categorical(pi[i]);

You don't ever set rclass. The default value is 0, and then
that causes a problem because 0 is not an appropriate outcome
for a categorical.

Right, we saw that with some print statements we had but removed before posting. We did try to limit / set the range with:

int<lower=1, upper=K> rclass[N,D];

int<lower=1, upper=L> cclass[N,D];

in the model block, but got the error:

Error in stanc(file = file, model_code = model_code, model_name = model_name, :

failed to parse Stan model 'my_code' with error message:

EXPECTATION FAILURE LOCATION: file=input; line=20, column=36

int<lower=1,upper=K> rclass[N,D];

^-- here

DIAGNOSTIC(S) FROM PARSER:

require unconstrained. found range constraint.

Parser expecting: <eps>

However we may be setting rclass inappropriately. Is there another way to limit rclass range?

It looks like what you're trying to do is sample from the
categorical(pi[i]) and assign the result to rclass[i,j], which
is NOT how Stan sampling statements work. They just increment
the log probability with the value of

log(Categorical(rclass[i,j] | pi[i]))

Yes, that is exactly what we are trying to do. I suspect our approach therefore is not correct. We would need to both set rclass (and cclass) appropriately and change our likelihood specification code, is that correct?

which presupposes that rclass[i,j] is set to a valid value.

Also, given that you have a parameter declaration

simplex[L] pj[D];

you almost certainly don't want to be doing this:

cclass[i,j] ~ categorical(softmax(pj[j]));

pj[j] is already declared to be a simplex, so it's already
the right kind of thing to be a parameter to categorical().
Now you're transforming it again by exponentiating the
values in pj[j] and then normalizing.

Yes, absolutely correct. The "softmax" code was left over after some re-coding attempts and should have been removed before posting, sorry about that.

Thanks again Bob for looking at our code and your advice, its very much appreciated,

Joe

You received this message because you are subscribed to a topic in the Google Groups "Stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/LK0kGNdyGdE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.

Bob Carpenter

unread,

May 9, 2014, 5:20:11 AM5/9/14

to stan-...@googlegroups.com

On May 8, 2014, at 5:47 AM, Retzer Joe <retz...@gmail.com> wrote:
>
>> Your problem is here:
>>
>>> model {
>>> int rclass[N,D];
>> ...
>>> rclass[i,j] ~ categorical(pi[i]);
>>
>> You don't ever set rclass. The default value is 0, and then
>> that causes a problem because 0 is not an appropriate outcome
>> for a categorical.
>>
>
> Right, we saw that with some print statements we had but removed before posting. We did try to limit / set the range with:
>
> int<lower=1, upper=K> rclass[N,D];
> int<lower=1, upper=L> cclass[N,D];

The problem isn't not setting the bounds, the problem is that you're
trying to use "~" to do random number generation and assignment to
rclass, and that's not how Stan works. It requires rclass and pi to
already be defined (not just declared as local variables) when you call
it. The parameters are all defined by default externally by the sampler,
but any locals, transformed data, transformed parameters, and generated
quantities need to be set explicitly before being used.

>> It looks like what you're trying to do is sample from the
>> categorical(pi[i]) and assign the result to rclass[i,j], which
>> is NOT how Stan sampling statements work. They just increment
>> the log probability with the value of
>>
>> log(Categorical(rclass[i,j] | pi[i]))
>
>
> Yes, that is exactly what we are trying to do. I suspect our approach therefore is not correct. We would need to both set rclass (and cclass) appropriately and change our likelihood specification code, is that correct?
>

Correct.

What you really need to do is marginalize out any discrete
parameters.

- Bob

retzerjj

unread,

May 21, 2014, 1:48:26 PM5/21/14

to stan-...@googlegroups.com

Thanks Bob. I tried re-writing to address the likelihood updating and to marginalize out the discrete parameters (using somewhat similar examples found on the list and the manual) and came up with the code below.

my_code <- '
data {
int<lower=1> N;
int<lower=1> D;

int<lower=1> K; // Number of row clusters
int<lower=1> L; // Number of col clusters

   int<lower=0,upper=1> x[N,D];
   vector<lower=0>[K] alpha1;
   vector<lower=0>[L] alpha2;
}

parameters {
    real<lower=0,upper=1> theta[K,L];

// real pi[N,K];
// real pj[D,L];

simplex[K] pi[N];
simplex[L] pj[D];
}

model {

   int rowSegVec[N];
   int colSegVec[D];
   real ps[N,D];

   for(k in 1:K){                            # prior on theta - binomial prob for co-clusters
     for(l in 1:L){theta[k,l] ~ beta(1, 1);}
   }

   for(i in 1:N) {                           # hierarchical prior on rows
     pi[i] ~ dirichlet(alpha1);              # the individual class prob pi
   }

   for(j in 1:D) {                           # hierarchical prior on columns
     pj[j] ~ dirichlet(alpha2);    # the individual class prob pj
   }

   for(i in 1:N){                             # likelihood

     rowSegVec[i] ~ categorical(pi[i]);
     for(j in 1:D){
       colSegVec[j] ~ categorical(pj[j]);
       for(l in 1:L){
         for (k in 1:K){
            if (which.max(rowSegVec[i])) == k & which.max(colSegVec[j]) == l)
               increment_log_prob(x[i,j]*log(Theta[k,l])+(1-x[i,j])*log1m(Theta[k,l]));
           }
         }
       }
     }
   }
'
I'm now running into the error:

TRANSLATING MODEL 'my_code' FROM Stan CODE TO C++ CODE NOW.

Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model 'my_code' with error message:

EXPECTATION FAILURE LOCATION: file=input; line=42, column=40

if (which.max(rowSegVec[i])) == k & which.max(colSegVec[j]) == l)

^-- here

DIAGNOSTIC(S) FROM PARSER:

no matches for function name="which.max"
arg 0 type=int
available function signatures for which.max:
expression is ill formed
Parser expecting: <eps>

It would seem the R "which.max" function is not recognized by stan, though there could be other problems. I was wondering if there was an equivalent function in R and/or if there might be any additional obvious problems with the code re-write.
Many thanks,
Joe

Soga Shota

unread,

May 21, 2014, 8:28:36 PM5/21/14

to stan-...@googlegroups.com

probably a missing parenthesis or an extra one?

if (which.max(rowSegVec[i])) == k & which.max(colSegVec[j]) == l)

to

if ((which.max(rowSegVec[i])) == k & which.max(colSegVec[j]) == l)

or

if (which.max(rowSegVec[i]) == k & which.max(colSegVec[j]) == l)

Bob Carpenter

unread,

May 21, 2014, 8:31:25 PM5/21/14

to stan-...@googlegroups.com

Stan doesn't have support for any functions not in its manual.
This includes most of the R functions.

I have no idea what which.max does in R, but it's not in Stan.
Maybe someone who knows R can help you translate or you could
explain what you're trying to do.

You'll also find that Stan uses the "&&" notation for conjunction from
C++ (where "&" is bitwise and).

- Bob

Avraham Adler

unread,

May 22, 2014, 9:57:38 AM5/22/14

to stan-...@googlegroups.com

which.max in R cycles through a vector and returns the index with the largest value. To do that in Stan, you'll probably have to write an extra loop and store the value. One observation, though. RowSegVec is a vector, so RowSegVec[i] is a single observation, which.max shouldn't apply, unless I missed something.

--Avi

> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+unsubscribe@googlegroups.com.

Retzer Joe

unread,

May 22, 2014, 10:14:20 AM5/22/14

to stan-...@googlegroups.com

Thanks Avi, you are right, I need to assign and refer to rowSegVec and colSegVec without an index.

Joe

You received this message because you are subscribed to a topic in the Google Groups "Stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/LK0kGNdyGdE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.

Retzer Joe

unread,

May 22, 2014, 11:20:44 AM5/22/14

to stan-...@googlegroups.com

Hi all,

I re-defined my rowSegVec and colSegVec to be vectors (to be re-used within the relevant loop). I thought that an alternative to "which.max" could be as follows:

- given the dimension of the vector is say K, the max will be identified as having rank(rowSegVec) equal to K - 1.

The remainder of the code (listed below) did not change. I now get a parsing error:

TRANSLATING MODEL 'my_code' FROM Stan CODE TO C++ CODE NOW.

Error in stanc(file = file, model_code = model_code, model_name = model_name, :

failed to parse Stan model 'my_code' with error message:

EXPECTATION FAILURE LOCATION: file=input; line=35, column=6

rowSegVec ~ categorical(pi[i]);

^-- here

DIAGNOSTIC(S) FROM PARSER:

no matches for function name="categorical_log"

arg 0 type=vector

arg 1 type=vector

available function signatures for categorical_log:

0. categorical_log(int, vector) : real

unknown distribution=categorical

Parser expecting: "}"

Any advice would be appreciated,

Joe

my_code <- '

data {

int<lower=1> N;

int<lower=1> D;

int<lower=1> K; // Number of row clusters

int<lower=1> L; // Number of col clusters

int<lower=0,upper=1> x[N,D];

vector<lower=0>[K] alpha1;

vector<lower=0>[L] alpha2;

}

parameters {

real<lower=0,upper=1> theta[K,L];

simplex[K] pi[N];

simplex[L] pj[D];

}

model {

vector[K] rowSegVec;

vector[L] colSegVec;

real ps[N,D];

for(k in 1:K){ # prior on theta - binomial prob for co-clusters

for(l in 1:L){theta[k,l] ~ beta(1, 1);}

}

for(i in 1:N) { # hierarchical prior on rows

pi[i] ~ dirichlet(alpha1); # the individual class prob pi

}

for(j in 1:D) { # hierarchical prior on columns

pj[j] ~ dirichlet(alpha2); # the individual class prob pj

}

for(i in 1:N){ # likelihood

rowSegVec ~ categorical(pi[i]);

for(j in 1:D){

colSegVec ~ categorical(pj[j]);

for(l in 1:L){

for (k in 1:K){

// if (which.max(rowSegVec) == k && which.max(colSegVec) == l)

if (rank(rowSegVec[k]) == K-1 && rank(colSegVec[l]) == L-1)

increment_log_prob(x[i,j]*log(theta[k,l])+(1-x[i,j])*log1m(theta[k,l]));

}

'

On May 22, 2014, at 8:57 AM, Avraham Adler wrote:

You received this message because you are subscribed to a topic in the Google Groups "Stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/LK0kGNdyGdE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.

Soga Shota

unread,

May 23, 2014, 4:48:32 AM5/23/14

to

I fixed your code a little bit for you. The error said all. your return type should be int instead of vector.

Also, your use of rank is not correct (check input arguments in the manual). I do not know what is the purpose of rank function in your code, but I do not think which.max is equivalent to rank anyway.

Assuming you use rank as alternative to which.max, here is your code with my fixes(at least compiled, but not sure this is what you want).

data {

int<lower=1> N;

int<lower=1> D;

int<lower=1> K; // Number of row clusters

int<lower=1> L; // Number of col clusters

int<lower=0,upper=1> x[N,D];

vector<lower=0>[K] alpha1;

vector<lower=0>[L] alpha2;

}

parameters {

real<lower=0,upper=1> theta[K,L];

simplex[K] pi[N];

simplex[L] pj[D];

}

model {

//vector[K] rowSegVec;

int rowSegVec[K];

//vector[L] colSegVec;

int colSegVec[L];

real ps[N,D];

for(k in 1:K){ # prior on theta - binomial prob for co-clusters

for(l in 1:L){theta[k,l] ~ beta(1, 1);}

}

for(i in 1:N) { # hierarchical prior on rows

pi[i] ~ dirichlet(alpha1); # the individual class prob pi

}

for(j in 1:D) { # hierarchical prior on columns

pj[j] ~ dirichlet(alpha2); # the individual class prob pj

}

for(i in 1:N){ # likelihood

int row_max;

int col_max;

rowSegVec ~ categorical(pi[i]);

for(j in 1:D){

colSegVec ~ categorical(pj[j]);

for(l in 1:L){

for (k in 1:K){

row_max <- 0; // first candidate

col_max <- 0; // first candidate

for(kk in 1:K)

{

// not sure what to do if where are two max or more.

if(rowSegVec[kk] > row_max)

{

row_max <- kk;

}

for(ll in 1:L)

{

if(colSegVec[ll] > row_max)

{

col_max <- ll;

}

if (row_max == K-1 && col_max == L-1)

increment_log_prob(x[i,j]*log(theta[k,l])+(1-x[i,j])*log1m(theta[k,l]));

}

Enjoy!

Retzer Joe

unread,

May 24, 2014, 2:30:34 PM5/24/14

to stan-...@googlegroups.com

Hi Soga,

Thanks so much for looking at this. When I run the code you provided I still get the error

TRANSLATING MODEL 'my_code' FROM Stan CODE TO C++ CODE NOW.

Error in stanc(file = file, model_code = model_code, model_name = model_name, :

failed to parse Stan model 'my_code' with error message:

EXPECTATION FAILURE LOCATION: file=input; line=41, column=6

rowSegVec ~ categorical(pi[i]);

^-- here

DIAGNOSTIC(S) FROM PARSER:

no matches for function name="categorical_log"

arg 0 type=int[1]

arg 1 type=vector

available function signatures for categorical_log:

0. categorical_log(int, vector) : real

unknown distribution=categorical

Parser expecting: "}"

What I'd like to do is generate a categorical RV using the simplex pi (or pj). I'm assuming I'd get a vector of 0's and one "1". The position of the one in the vector would identify the row (or column) of my theta matrix from which to draw a bernoulli RV parameter. It seems I cannot get past the rowSegVec ~ categorical(pi[i]); statement.

I've tried many iterations one being to create rowSegVec and colSegVec in the generated quantities block using categorical_rng.(note: categorical_rng will not work in the model block) However the function uses a simplex vector created in the model block. Also, if I put generated quantities code before the model block I again get an error.

I'm also not able to create rowSegVec / colSegVec in the transformed parameter block as they need to be integer.

Joe

On May 22, 2014, at 11:24 PM, Soga Shota wrote:

I fixed a little bit for you. The error said all. your return type should be int instead of vector.

Also, your use of rank is not correct (check input arguments in the manual). I do not know what is the purpose of rank, but I do not think which.max is equivalent to rank anyway.

Assuming you use rank as alternative to which.max, here is your code with my fixes(at least compiled, but not sure this is what you want).

data {
int<lower=1> N;
int<lower=1> D;
int<lower=1> K; // Number of row clusters
int<lower=1> L; // Number of col clusters
int<lower=0,upper=1> x[N,D];
vector<lower=0>[K] alpha1;
vector<lower=0>[L] alpha2;
}

parameters {
real<lower=0,upper=1> theta[K,L];
simplex[K] pi[N];
simplex[L] pj[D];
}

model {

//vector[K] rowSegVec;
int rowSegVec[K];
//vector[L] colSegVec;
int colSegVec[L];

real ps[N,D];
for(k in 1:K){ # prior on theta - binomial prob for co-clusters
for(l in 1:L){theta[k,l] ~ beta(1, 1);}
}

for(i in 1:N) { # hierarchical prior on rows
pi[i] ~ dirichlet(alpha1); # the individual class prob pi
}

for(j in 1:D) { # hierarchical prior on columns
pj[j] ~ dirichlet(alpha2); # the individual class prob pj
}

for(i in 1:N){ # likelihood

int row_max;
int col_max;

rowSegVec ~ categorical(pi[i]);
for(j in 1:D){
colSegVec ~ categorical(pj[j]);
for(l in 1:L){
for (k in 1:K){

row_max <- 0; // first candidate
col_max <- 0; // first candidate

for(kk in 1:K)
{
// not sure what to do if where are two max or more.
if(rowSegVec[kk] > row_max)
{
row_max <- kk;
}
}
for(ll in 1:L)
{
if(colSegVec[ll] > row_max)
{
col_max <- ll;
}
}

if (row_max == K-1 && col_max == L-1)

increment_log_prob(x[i,j]*log(theta[k,l])+(1-x[i,j])*log1m(theta[k,l]));
}
}
}
}
}

Enjoy!

Bob Carpenter

unread,

May 24, 2014, 7:46:47 PM5/24/14

to stan-...@googlegroups.com

1. The problem is that you're trying to use an int array as
the outcome for a categorical, which Stan doesn't allow.

Stan distinguishes the categorical from the multinomial.
If theta is a K-simplex, then

y ~ categorical(theta);

assumes y is a single outcome coded as an integer in 1:K,
whereas

y ~ multinomial(theta);

assumes that y is a K-array of integer counts for each outcome.

2. The _rng functions are specified to only work in the generated
quantities block --- they're not part of the modeling and
can't be used as such.

- Bob

> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.

> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.

Soga Shota

unread,

May 25, 2014, 7:59:03 PM5/25/14

to stan-...@googlegroups.com

> Thanks so much for looking at this. When I run the code you provided I still get the error

I rechecked I do not get an error when I compiled my code I posted.

I used cmdstan-2.2.0 with clang++. Did you get an error at a run-time? I did not check it because I dont have data...

Anyway, see comments from Bob.

Reply all

Reply to author

Forward