Logit transformed probability parameter[1] is nan:0,

Ross Boylan

unread,

Oct 25, 2013, 2:27:57 PM10/25/13

to stan-users

Error message:
Error in function stan::prob::bernoulli_logit_log(N4stan5agrad3varE):
Logit transformed probability parameter[1] is nan:0, but must not be
nan!Rejecting proposed initial value with zero density.

What is causing this and what does nan:0 mean? When I print out the
relevant values, they also show the :0 suffix. I see the :0 in some
other output too, suggesting it indicates floating point numbers (rather
than an error).

print output:
1 theta_x=-8.28481:0
theta=[3.17291:0,0.70838:0,1.64412:0,0.960729:0,-2.74281:0,1.45019:0,-2.84672:0,-0.110114:0,-1.8066:0,-0.3441:0,-3.47609:0,0.574026:0,-1.21811:0,3.21109:0,0.584489:0,-2.99691:0,1.28483:0,-0.488034:0,-0.805839:0,0.035148:0,2.51912:0,0.652776:0,-2.47\
6:0,-0.260721:0,-2.36909:0,3.40766:0,2.68725:0,-1.36023:0]
alpha_m=-4.58418:0

But if the theta_x value is OK, what's going wrong? is -8 just too
extreme for it?

Also, does parameter[1] mean the first argument to bernoulli_logit_log?
That's just 0 or 1. mnDetail[1] = 1.

Code snippet:
for (n in 1:N) {
beta_x[n] <- (x[n] * beta)+alpha[iCluster[n]]+gamma[n];
theta_x[n] <- (x[n] * theta)+alpha_m[iCluster[n]];
// print(n, " mnDetail=", mnDetail[n], " beta_x=", beta_x[n], " theta_x=", theta_x[n]);
// print("x=", x[n], " beta=", beta, " iCluster=", iCluster[n], " alpha=", alpha[iCluster[n]],
// " gamma=", gamma[n]);
print(n, " theta_x=", theta_x[n], " theta=", theta, " alpha_m=", alpha_m[iCluster[n]]);
if (mnDetail[n] == 1)
//----> next line should be the relevant one for this case
increment_log_prob(log_sum_exp(bernoulli_logit_log(1, theta_x),
bernoulli_logit_log(0, theta_x) + poisson_log_log(0, beta_x[n])));
else {
increment_log_prob(bernoulli_logit_log(0, theta_x));
if (mnDetail[n] < 10)
increment_log_prob(poisson_log_log(mnDetail[n]-1, beta_x[n]));
else
increment_log_prob(poisson_ccdf_log(8, exp(beta_x[n])));
}

Bob Carpenter

unread,

Oct 25, 2013, 2:45:20 PM10/25/13

to stan-...@googlegroups.com

- Bob

On 10/25/13 2:27 PM, Ross Boylan wrote:
> Error message:
> Error in function stan::prob::bernoulli_logit_log(N4stan5agrad3varE):
> Logit transformed probability parameter[1] is nan:0, but must not be
> nan!Rejecting proposed initial value with zero density.
>
> What is causing this and what does nan:0 mean? When I print out the
> relevant values, they also show the :0 suffix. I see the :0 in some
> other output too, suggesting it indicates floating point numbers (rather
> than an error).

The ":0" is an artifact of the way parameters print.
Data will print without it, any parameters, transformed
parameters, or local variables in those blocks or in
the model block will be auto-diff variables. The :0 indicates
the gradient at the time of print.

We should clean up our diagnostic printing. I created an issue

https://github.com/stan-dev/stan/issues/330

but it's not prioritized because it's a lot of work given the
templating involved.

>
> print output:
> 1 theta_x=-8.28481:0
> theta=[3.17291:0,0.70838:0,1.64412:0,0.960729:0,-2.74281:0,1.45019:0,-2.84672:0,-0.110114:0,-1.8066:0,-0.3441:0,-3.47609:0,0.574026:0,-1.21811:0,3.21109:0,0.584489:0,-2.99691:0,1.28483:0,-0.488034:0,-0.805839:0,0.035148:0,2.51912:0,0.652776:0,-2.47\
> 6:0,-0.260721:0,-2.36909:0,3.40766:0,2.68725:0,-1.36023:0]
> alpha_m=-4.58418:0
>
> But if the theta_x value is OK, what's going wrong? is -8 just too
> extreme for it?

-8 isn't too extreme and should not be causing an issue.
The only time you should see the error you saw is if NaN is
passed in for the value. There's only one line in the code that
produces that error report, and it only checks for NaN inputs.

Your input shouldn't be causing an issue. I'm going to try
to create a reproducible isolated test case because with the above
theta_x value, I don't see how you can get the error message you're
getting. If I can't, I'll ask later if I can get your data to reproduce.

> Also, does parameter[1] mean the first argument to bernoulli_logit_log?
> That's just 0 or 1. mnDetail[1] = 1.

Ooh. We should change that, too. It's indexed C++ style from
0 because it's being reported through the API. I'm not sure what
the right thing to do here is, because it's broken in the API if
we do it from 0 and it's broken in the model output from the error
message. Here's another issue:

https://github.com/stan-dev/stan/issues/331

We should at least doc what's going on now.

And we should be giving you credit for the number of these
inconsistencies and errors you've brought to our attention!

>
> Code snippet:
> for (n in 1:N) {
> beta_x[n] <- (x[n] * beta)+alpha[iCluster[n]]+gamma[n];
> theta_x[n] <- (x[n] * theta)+alpha_m[iCluster[n]];
> // print(n, " mnDetail=", mnDetail[n], " beta_x=", beta_x[n], " theta_x=", theta_x[n]);
> // print("x=", x[n], " beta=", beta, " iCluster=", iCluster[n], " alpha=", alpha[iCluster[n]],
> // " gamma=", gamma[n]);
> print(n, " theta_x=", theta_x[n], " theta=", theta, " alpha_m=", alpha_m[iCluster[n]]);
> if (mnDetail[n] == 1)
> //----> next line should be the relevant one for this case
> increment_log_prob(log_sum_exp(bernoulli_logit_log(1, theta_x),
> bernoulli_logit_log(0, theta_x) + poisson_log_log(0, beta_x[n])));
> else {
> increment_log_prob(bernoulli_logit_log(0, theta_x));
> if (mnDetail[n] < 10)
> increment_log_prob(poisson_log_log(mnDetail[n]-1, beta_x[n]));
> else
> increment_log_prob(poisson_ccdf_log(8, exp(beta_x[n])));
> }

Thanks for the code and print output.

- Bob

Ross Boylan

unread,

Oct 25, 2013, 3:40:26 PM10/25/13

to stan-...@googlegroups.com, ro...@biostat.ucsf.edu

There were 100 tries (all of which failed). I only sent the last one.
The most extreme value I saw on a quick scan had aboslute value > 21.
I saw as small as 0.3.

So I guess whatever is going wrong, it's not an overflow problem.

Maybe it's something to do with being part of log_sum_exp.

Still on development code from a bit before 2.0. It's probably about
time to update now that the bugs in 2.0 are fixed.

By the way, congratulation to the whole devlopment team on your work,
and also to the fantastic help you provide. I really appreciate it.

Ross

>
> Your input shouldn't be causing an issue. I'm going to try
> to create a reproducible isolated test case because with the above
> theta_x value, I don't see how you can get the error message you're
> getting. If I can't, I'll ask later if I can get your data to reproduce.
>
>> Also, does parameter[1] mean the first argument to bernoulli_logit_log?
>> That's just 0 or 1. mnDetail[1] = 1.
>
> Ooh. We should change that, too. It's indexed C++ style from
> 0 because it's being reported through the API. I'm not sure what
> the right thing to do here is, because it's broken in the API if
> we do it from 0 and it's broken in the model output from the error
> message. Here's another issue:

The zero vs 1 based issue is perhaps a bit more confusing from R since it is 1-based.

>
> https://github.com/stan-dev/stan/issues/331
>
> We should at least doc what's going on now.
>
> And we should be giving you credit for the number of these
> inconsistencies and errors you've brought to our attention!
>
>>
>> Code snippet:
>> for (n in 1:N) {
>> beta_x[n] <- (x[n] * beta)+alpha[iCluster[n]]+gamma[n];
>> theta_x[n] <- (x[n] * theta)+alpha_m[iCluster[n]];
>> // print(n, " mnDetail=", mnDetail[n], " beta_x=", beta_x[n], " theta_x=", theta_x[n]);
>> // print("x=", x[n], " beta=", beta, " iCluster=", iCluster[n], " alpha=", alpha[iCluster[n]],
>> // " gamma=", gamma[n]);
>> print(n, " theta_x=", theta_x[n], " theta=", theta, " alpha_m=", alpha_m[iCluster[n]]);
>> if (mnDetail[n] == 1)
>> //----> next line should be the relevant one for this case
>> increment_log_prob(log_sum_exp(bernoulli_logit_log(1, theta_x),
>> bernoulli_logit_log(0, theta_x) + poisson_log_log(0, beta_x[n])));
>> else {
>> increment_log_prob(bernoulli_logit_log(0, theta_x));
>> if (mnDetail[n] < 10)
>> increment_log_prob(poisson_log_log(mnDetail[n]-1, beta_x[n]));
>> else
>> increment_log_prob(poisson_ccdf_log(8, exp(beta_x[n])));
>> }
>
> Thanks for the code and print output.
>
> - Bob
>

> --
> You received this message because you are subscribed to the Google Groups "stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

Bob Carpenter

unread,

Oct 25, 2013, 3:48:14 PM10/25/13

to stan-...@googlegroups.com

We're glad to help, especially when most of the problems
are caused by us in the first place! Taking user errors
seriously is the only way we're going to make Stan easier
to use and more robust.

I tried to recreate a simple model that shows it's not
the value being passed into bernoulli_logit that's causing
the problem. It can't be an overflow issue, because the
test is just on the raw input parameters to bernoulli_logit.

Here's my test model:

parameters {
real y;
}
transformed parameters {
real theta_x;
theta_x <- -8.28481;
}
model {
1 ~ bernoulli_logit(theta_x);
0 ~ bernoulli_logit(theta_x);
y ~ normal(0,1);
}

This runs just fine.

But it leaves me confused about how you could've gotten the error
you got. Could you send me the model and data (you don't need
to send to the list): ca...@alias-i.com. Or a smaller
reproducible example if you can create one.

Thanks,

- Bob

Ross Boylan

unread,

Oct 26, 2013, 2:30:12 PM10/26/13

to stan-users

The problem is that the arguments to bernoulli_Logit_log should have
been theta_x[n] (which is what I printed out) not theta_x!

So the only odd thing about stan's behavior is that it didn't get a type
error like "must be a real not a vector".

I suppose that, since I'm not saving theta_x[n] or beta_x[n], and since
their computation can't be vectorized, it would be better to declare
them as local variables (not vectors) inside the loop:

for (n in 1:N) {

real beta_x;
real theta_x;
beta_x <- (x[n] * beta)+alpha[iCluster[n]]+gamma[n];
theta_x <- (x[n] * theta)+alpha_m[iCluster[n]];
......

Amazing what becomes clear when you step away from a problem.

Ross

Bob Carpenter

unread,

Oct 27, 2013, 1:29:14 AM10/27/13

to stan-...@googlegroups.com

On 10/26/13 2:30 PM, Ross Boylan wrote:
> The problem is that the arguments to bernoulli_Logit_log should have
> been theta_x[n] (which is what I printed out) not theta_x!

Glad you found it.

> So the only odd thing about stan's behavior is that it didn't get a type
> error like "must be a real not a vector".

What was the operation and types that you expected to be
a problem? It may just be that they're vectorized.

> I suppose that, since I'm not saving theta_x[n] or beta_x[n], and since
> their computation can't be vectorized, it would be better to declare
> them as local variables (not vectors) inside the loop:
> for (n in 1:N) {
> real beta_x;
> real theta_x;
> beta_x <- (x[n] * beta)+alpha[iCluster[n]]+gamma[n];
> theta_x <- (x[n] * theta)+alpha_m[iCluster[n]];

That makes sense. Local variables and assignment is
cheap for real values (there's a copy involved for
vectors, matrices, arrays).

The group-level indexing in alpha[icluster[n]] really gets
in the way of converting to vector/array operations. It'd nice
to have an operation to compose alpha and icluster in
situations like this, with

compose(vals,indexes)[n] =def= vals[indexes[n]].

> Amazing what becomes clear when you step away from a problem.

:-)

- Bob

Ross Boylan

unread,

Oct 27, 2013, 4:01:07 AM10/27/13

to stan-...@googlegroups.com, ro...@biostat.ucsf.edu

On Sun, Oct 27, 2013 at 01:29:14AM -0400, Bob Carpenter wrote:
>
>
> On 10/26/13 2:30 PM, Ross Boylan wrote:
>> The problem is that the arguments to bernoulli_Logit_log should have
>> been theta_x[n] (which is what I printed out) not theta_x!
>
> Glad you found it.
>
>> So the only odd thing about stan's behavior is that it didn't get a type
>> error like "must be a real not a vector".
>
> What was the operation and types that you expected to be
> a problem? It may just be that they're vectorized.

The error seems to have been from, e.g.,
bernoulli_logit_log(1, theta_x)

If the second argument needs to a real, I would expect the compiler to complain
that it doesn't work with a vector.

It seems that instead it complained that the argument was NaN. I
suppose one could say that a vector is not a number, but that's not a
useage I would expect.

OTOH, if the second argument can be vectorized, I wouldn't expect any
complaint at all about that argument. Perhaps there would be a
complaint later in the evaluation (later in the sense of involving
terms for which bernoulli_logit_log() is just a component) from the
fact that the result of bernoulli_logit_log was a vector.

>
>> I suppose that, since I'm not saving theta_x[n] or beta_x[n], and since
>> their computation can't be vectorized, it would be better to declare
>> them as local variables (not vectors) inside the loop:
>> for (n in 1:N) {
>> real beta_x;
>> real theta_x;
>> beta_x <- (x[n] * beta)+alpha[iCluster[n]]+gamma[n];
>> theta_x <- (x[n] * theta)+alpha_m[iCluster[n]];
>
> That makes sense. Local variables and assignment is
> cheap for real values (there's a copy involved for
> vectors, matrices, arrays).
>
> The group-level indexing in alpha[icluster[n]] really gets
> in the way of converting to vector/array operations. It'd nice
> to have an operation to compose alpha and icluster in
> situations like this, with
>
> compose(vals,indexes)[n] =def= vals[indexes[n]].

I could compute

for (n in 1:N)

alpha2 <- alpha[iCluster[n]]
and then do
beta_x <- (x*beta)+alpha2+gamma
with beta_x a vector (I'm not sure if things line up exactly right,
but something like that).

Is there any advantage to that, or is it just making more work, or at
least more memory use?

I couldn't tell from the manual whether increment_log_prob can be
vectorized.

Ross

>
>> Amazing what becomes clear when you step away from a problem.
>
> :-)
>
> - Bob
>

Bob Carpenter

unread,

Oct 27, 2013, 4:27:54 PM10/27/13

to stan-...@googlegroups.com

On 10/27/13 4:01 AM, Ross Boylan wrote:
> On Sun, Oct 27, 2013 at 01:29:14AM -0400, Bob Carpenter wrote:
>>
>>
>> On 10/26/13 2:30 PM, Ross Boylan wrote:

...

> The error seems to have been from, e.g.,
> bernoulli_logit_log(1, theta_x)
>
> If the second argument needs to a real, I would expect the compiler to complain
> that it doesn't work with a vector.

This is the problem with overloading operations. This is the intended
behavior. The vectorized probability functions broadcast reals up to
vectors to match dimensions. For example, writing:

y ~ bernoulli(1, theta_x);

where y is a real and theta_x is a vector is equivalent to:

for (n in 1:size(theta_x))
y ~ bernoulli(1, theta_x[n]);

> It seems that instead it complained that the argument was NaN. I
> suppose one could say that a vector is not a number, but that's not a
> useage I would expect.

The NaN error would only arise if there's a NaN in the vector. The checks
won't confuse vectors with NaNs.

> OTOH, if the second argument can be vectorized, I wouldn't expect any
> complaint at all about that argument.

There are two points at which arguments are checked.

1. Types are checked for compatibility when the model is compiled.

2. Sizes and illegal values are checked at run time when the function
is evaluated.

...

>>> for (n in 1:N) {
>>> real beta_x;
>>> real theta_x;
>>> beta_x <- (x[n] * beta)+alpha[iCluster[n]]+gamma[n];
>>> theta_x <- (x[n] * theta)+alpha_m[iCluster[n]];
>>

...

>> The group-level indexing in alpha[icluster[n]] really gets
>> in the way of converting to vector/array operations. It'd nice
>> to have an operation to compose alpha and icluster in
>> situations like this, with
>>
>> compose(vals,indexes)[n] =def= vals[indexes[n]].
>
> I could compute
> for (n in 1:N)
> alpha2 <- alpha[iCluster[n]]
> and then do
> beta_x <- (x*beta)+alpha2+gamma
> with beta_x a vector (I'm not sure if things line up exactly right,
> but something like that).
>
> Is there any advantage to that, or is it just making more work, or at
> least more memory use?

The memory usage will be trivial. As will the time overhead in copying.
There's a bit of an advantage in the expression tree to add a scalar
to a whole vector than to each of its components, but it's not worth
going to this extreme for. That's why I'd like to have it built in.

> I couldn't tell from the manual whether increment_log_prob can be
> vectorized.

No. It has the signature

void increment_log_prob(real);

The vectorized ones have "reals" instead of "real" in the signatures
listed in the appendix. (And the "void" isn't a real Stan type --- it's
just there to let us write out increment_log_prob, which is technically
a statement.

I'll add vectorization to it. No reason it shouldn't be vectorized.
Here's the issue I just created to track our progress:

https://github.com/stan-dev/stan/issues/335

- Bob

Ross Boylan

unread,

Oct 27, 2013, 5:51:51 PM10/27/13

to stan-...@googlegroups.com, ro...@biostat.ucsf.edu

On Sun, Oct 27, 2013 at 04:27:54PM -0400, Bob Carpenter wrote:
>
>
> On 10/27/13 4:01 AM, Ross Boylan wrote:
>> On Sun, Oct 27, 2013 at 01:29:14AM -0400, Bob Carpenter wrote:
>>>
>>>
>>> On 10/26/13 2:30 PM, Ross Boylan wrote:
> ...
>
>> The error seems to have been from, e.g.,
>> bernoulli_logit_log(1, theta_x)
>>
>> If the second argument needs to a real, I would expect the compiler to complain
>> that it doesn't work with a vector.
>
> This is the problem with overloading operations. This is the intended
> behavior. The vectorized probability functions broadcast reals up to
> vectors to match dimensions. For example, writing:
>
> y ~ bernoulli(1, theta_x);
>
> where y is a real and theta_x is a vector is equivalent to:
>
> for (n in 1:size(theta_x))
> y ~ bernoulli(1, theta_x[n]);
>
>
>> It seems that instead it complained that the argument was NaN. I
>> suppose one could say that a vector is not a number, but that's not a
>> useage I would expect.
>
> The NaN error would only arise if there's a NaN in the vector. The checks
> won't confuse vectors with NaNs.
>

There's the problem. The problem code included
+ for (n in 1:N) {
+ beta_x[n] <- (x[n] * beta)+alpha[iCluster[n]]+gamma[n];
+ theta_x[n] <- (x[n] * theta)+alpha_m[iCluster[n]];
+ print(n, " theta_x=", theta_x[n], " theta=", theta, " alpha_m=", alpha_m[iCluster[n]]);
+ if (mnDetail[n] == 1)
+ increment_log_prob(log_sum_exp(bernoulli_logit_log(1, theta_x),
+ bernoulli_logit_log(0, theta_x) + poisson_log_log(0, beta_x[n])

So, on the first iteration theta_x[1] is set and every other value is
unset, presumably NaN. bernoulli_logit_log then tries to evalue all
the theta_x values, and finds the NaN.

OK. So I'll stick with what I've got since it's clearer.

>> I couldn't tell from the manual whether increment_log_prob can be
>> vectorized.
>
> No. It has the signature
>
> void increment_log_prob(real);
>
> The vectorized ones have "reals" instead of "real" in the signatures
> listed in the appendix. (And the "void" isn't a real Stan type --- it's
> just there to let us write out increment_log_prob, which is technically
> a statement.
>
> I'll add vectorization to it. No reason it shouldn't be vectorized.
> Here's the issue I just created to track our progress:
>
> https://github.com/stan-dev/stan/issues/335
>

Great. That sounds like a nice addition.

Ross

Reply all

Reply to author

Forward