real to integer

Krzysztof Sakrejda

unread,

Jan 14, 2015, 1:14:58 PM1/14/15

to stan-...@googlegroups.com

I'd like to write a function like this:

int tail_repeat(int x, int top, int period) {
  int z;
  if (x > top)
    z <- top - period + 1 + fmod(x-top-1, period);
  else
    z <- x;
  return z;
}

Given the manual, it understandably gives an error (below) because fmod returns a real, promoting top/period to real and the assignment to 'z' fails.

--- Translating Stan model to C++ code ---
bin/stanc /home/krzysiek/projekty/westbrook_ats_survival/analysis/cjs-finalize-2/cjs.stan --o=/home/krzysiek/projekty/westbrook_ats_survival/analysis/cjs-finalize-2/cjs.cpp --no_main
Model name=cjs_model
Input file=/home/krzysiek/projekty/westbrook_ats_survival/analysis/cjs-finalize-2/cjs.stan
Output file=/home/krzysiek/projekty/westbrook_ats_survival/analysis/cjs-finalize-2/cjs.cpp

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
base type mismatch in assignment; variable name = z, type = int; right-hand side type=real

ERROR at line 6
  4:        int z;
  5:      if (x > top) 
  6:          z <- top - period + 1 + fmod(x-top-1, period);
                                                          ^
  7:        else

Is it possible to rewrite this function in Stan to avoid this?

Here's output from a working R version:

> tail_repeat(1:20, 9, 4)
 [1] 1 2 3 4 5 6 7 8 9 6 7 8 9 6 7 8 9 6 7 8

The use case is for an 'age' index in a model of a seasonal system, so there are a few distinct classes for ages 1-9, and then age '10' is most like age 6, age 11 is like age 7, etc...

As I write this I realize I could pre-process the data in R, but the question about how to do it in Stan remains.

Krzysztof

Bob Carpenter

unread,

Jan 14, 2015, 2:05:30 PM1/14/15

to stan-...@googlegroups.com

There are no functions that convert reals to integers.
That was on purpose, because it loses derivative information.
In some limited cases it can be done in a loop or with an
if_else, but I wouldn't recommend it.

- Bob

> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Krzysztof Sakrejda

unread,

Jan 14, 2015, 2:26:40 PM1/14/15

to stan-...@googlegroups.com

Ok, that does leave the absurd while-loop/test implementation... oh well, it's only
a small number of iterations. My case is a data transformation so it doesn't really
loose the derivative information.

Is this just to keep users from doing silly things? I realize that can be a headache
for you guys but... you're already letting users write near-arbitrary likelihood functions...

Either way, thanks!

Krzysztof

Bob Carpenter

unread,

Jan 14, 2015, 2:39:28 PM1/14/15

to stan-...@googlegroups.com

> On Jan 14, 2015, at 2:26 PM, Krzysztof Sakrejda <krzysztof...@gmail.com> wrote:
>
> Ok, that does leave the absurd while-loop/test implementation... oh well, it's only
> a small number of iterations. My case is a data transformation so it doesn't really
> loose the derivative information.
>
> Is this just to keep users from doing silly things? I realize that can be a headache
> for you guys but... you're already letting users write near-arbitrary likelihood functions...
>

Yes, and I have to say the dev team's split on whether to
allow this kind of thing or not.

The perceived problem is that our users understand stats
and probability, but not necessarily HMC-based computation.
So what can make sense mathematically can kill the performance
of the sampler.

This would be another argument for some data-only operations.
Right now, there are a couple places where expressions have to
be data only: some of the args to the ODE solver, and the
size declarations for matrices and arrays.

- Bob

Krzysztof Sakrejda

unread,

Jan 14, 2015, 3:08:09 PM1/14/15

to stan-...@googlegroups.com

On Wednesday, January 14, 2015 at 2:39:28 PM UTC-5, Bob Carpenter wrote:

> On Jan 14, 2015, at 2:26 PM, Krzysztof Sakrejda <krzysztof...@gmail.com> wrote:
>
> Ok, that does leave the absurd while-loop/test implementation... oh well, it's only
> a small number of iterations. My case is a data transformation so it doesn't really
> loose the derivative information.
>
> Is this just to keep users from doing silly things? I realize that can be a headache
> for you guys but... you're already letting users write near-arbitrary likelihood functions...
>

Yes, and I have to say the dev team's split on whether to
allow this kind of thing or not.

The perceived problem is that our users understand stats
and probability, but not necessarily HMC-based computation.
So what can make sense mathematically can kill the performance
of the sampler.

I can see the wisdom of saving people from themselves and the
statistics/HMC distinction there... but for completeness and
with things that can kill performance firmly in mind:

int floor_real_to_int(real x) {
  real y;
  int z;
  y <- floor(x);
  if (y == 0.0) {
    z <- 0;

  if (y > 0) {
    z <- 0;
    while (z < y) {
      z <- z + 1;
    }
  }
  if (y < 0) {
    z <- 0;
    while (z > y) {
      z <- z - 1;
    }
  }
  return z;
}

Based on my understanding of Stan auto-diff this should be
safe to use to generate integers for indexing into parameters as long as
the argument (x) is data or transformed data but _not_ otherwise.

If this is used to generate integer transformations of things that get auto-diffed (parameters,
transformed parameters, or local variables) then you're introducing discontinuities
into the posterior and are likely to kill HMC performance.

Krzysztof

Bob Carpenter

unread,

Jan 14, 2015, 3:28:30 PM1/14/15

to stan-...@googlegroups.com

This is simpler, but still inefficient:

int int_floor(real x) {
int i;
i <- 0;
if (x >= 0) while (x > i + 1) i <- i + 1;
else while (x < i) i <- i - 1;
return i;
}

But it may still not terminate if x > max integer value.

To write a really robust and efficient version, you'd need
to test the outside values and throw an exception, and
then use a binary search on the inside to make the whole
thing logarithmic time complexity.

- Bob

Krzysztof Sakrejda

unread,

Jan 14, 2015, 3:53:37 PM1/14/15

to stan-...@googlegroups.com

On Wednesday, January 14, 2015 at 3:28:30 PM UTC-5, Bob Carpenter wrote:

This is simpler, but still inefficient:

int int_floor(real x) {
int i;
i <- 0;
if (x >= 0) while (x > i + 1) i <- i + 1;
else while (x < i) i <- i - 1;
return i;
}

But it may still not terminate if x > max integer value.

If somebody wants to cut and paste from a mailing list post into their model
without reading the rest of the post who are we to stop them?

To write a really robust and efficient version, you'd need
to test the outside values and throw an exception, and
then use a binary search on the inside to make the whole
thing logarithmic time complexity.

Hopefully the mere sight of the current implementation will make people stop
and think before trying large values of x... (?)

Krzysztof

Cody Ross

unread,

Feb 27, 2015, 7:37:54 PM2/27/15

to stan-...@googlegroups.com

I am trying to resample from a vector (with replacement) in the generated quantities block. It would be nice to be able to use an rng for a discrete uniform, but none come with Stan. I was trying to build my own, and got stuck by the floor function being typed to real only. Can't we introduce an integer-typed version for use in the generated quantities block only? It seems like that would be a more elegant fix, than having to run:

int int_floor(real x) {
    int i;
    i <- 0;
    if (x >= 0) while (x > i + 1) i <- i + 1;
    else while (x < i) i <- i - 1;
    return i;
  }

for every evaluation.

Bob Carpenter

unread,

Feb 27, 2015, 10:10:34 PM2/27/15

to stan-...@googlegroups.com

If you want to sample from a uniform discrete distribution from 1 to N,
I'd suggest:

y <- categorical_rng(rep_vector(0,N) / N);

Then if you need to move to a different starting point, just add an integer.

If you'd like to file a feature request for a discrete uniform RNG,
it wouldn't be hard to add. We could also add the pmf and cdfs for
consistency, though these aren't very useful anywhere in a Stan program
other than to add normalizing constants, because we don't have integer
parameters. We're pretty backed up at the moment, though.

- Bob

Cody Ross

unread,

Feb 27, 2015, 10:25:21 PM2/27/15

to stan-...@googlegroups.com

Thanks for that method of making the discrete uniform. I trying to do something less elegant with floor_int(uniform_rng(A,B+1)) that was probably a lot slower.

The discrete uniform dist is not urgent, but is there any reason to keep the gradient destroying functions typed to reals in the gen. quant. block?

Bob Carpenter

unread,

Feb 27, 2015, 10:35:50 PM2/27/15

to stan-...@googlegroups.com

> On Feb 28, 2015, at 2:25 PM, Cody Ross <ctr...@ucdavis.edu> wrote:
>
> Thanks for that method of making the discrete uniform. I trying to do something less elegant with floor_int(uniform_rng(A,B+1)) that was probably a lot slower.
>
> The discrete uniform dist is not urgent, but is there any reason to keep the gradient destroying functions typed to reals in the gen. quant. block?

The ones that return reals also destroy gradients, so there's
really no reason not to include integer based versions other than
that we don't want to mislead users into doing things that cause
problems. (They'd be an "attractive nuisance" in American legal
terms (not to imply our users are children!).)

We have the information in the parser to easily do this,
but the biggest downside is user-facing complexity in
the doc, which will probably lead to confusion, though restricting
the _rng functions hasn't seemed to have been a problem.

- Bob

Cody Ross

unread,

Feb 27, 2015, 11:09:50 PM2/27/15

to stan-...@googlegroups.com

BTW... Should

y <- categorical_rng(rep_vector(0,N) / N)

be

y <- categorical_rng(rep_vector(1,N) / N)

or does it matter?

Bob Carpenter

unread,

Feb 27, 2015, 11:54:31 PM2/27/15

to stan-...@googlegroups.com

Yes, it should definitely be 1. Thanks for the correction.

- Bob

Reply all

Reply to author

Forward