adding constraints for parameters in Stan model

2,979 views
Skip to first unread message

Linas Mockus

unread,
Apr 1, 2014, 10:13:23 AM4/1/14
to stan-...@googlegroups.com
Hi,

Can anybody suggest how to add constrains different than just upper/lower bounds to a model in Stan.  Specifically my model (2-compartmental PK model in closed form) may be expressed as:
A*exp(-k1*time)+B*exp(-k2*time)+C*exp(-k3*time)
where k1,2,3>=0, A,B,C are functions of k1,2,3.
I need to enforce constraints on parameters to make sure that model has non negative values.  Or maybe there is another way of dealing with this?
Thanks for any suggestions,
Linas

Ben Goodrich

unread,
Apr 1, 2014, 11:45:51 AM4/1/14
to stan-...@googlegroups.com

Do you mean that

A * exp(-k1 * time) + B * exp(-k2 * time) + C * exp(-k3 * time) > 0

or that A, B, C > 0? If the latter, I would tend to write

A <- exp(log_A);

and make log_A the function of k1,k2,k3. If A can be negative but the whole expression has to be positive, I would again probably model the log of it. Another way would be to do something like

exp(-k1 * time) + B * exp(-k2 * time)) / exp(-k3 * time)> C;
  ...
}

Ben

Linas Mockus

unread,
Apr 1, 2014, 12:17:21 PM4/1/14
to stan-...@googlegroups.com
Thanks Ben,

I meant
A*exp(-k1*time)+B*exp(-k2*time)+C*exp(-k3*time) >=0.

Basically I would love to write the code - and don't know where to put constraint:
for (time in 1 : 8) {
    model <-
A*exp(-k1*time)+B*exp(-k2*time)+C*exp(-k3*time);
    lny [time] ~ normal (log(model),sigma);
}


Linas

Ben Goodrich

unread,
Apr 1, 2014, 12:34:04 PM4/1/14
to stan-...@googlegroups.com
In this case, I think you will be fine. Ultimately, you can use the lognormal density, so just do

y ~ lognormal(log_model,sigma);

but first you have to create the log_model array. In your case,

A * exp(-k1 * time) + B * exp(-k2 * time) + C
* exp(-k3 * time)
==
exp
(log_A) *
exp(-k1 * time) + exp(log_B) * exp(-k2 * time)+exp(log_C) * exp(-k3 * time)
==
exp
(log_A - k1 * time) + exp(log_B - k2 * time) + exp(log_C - k3 * time)

Thus, log_model is in the form of a log_sum_exp() of a vector of length 3 (which also helps with the numerical stability of this thing):

for(time in 1:8) {
  vector
[3] temp;
  temp
[1] <- log_A - k1 * time;
  temp
[2] <- log_B - k2 * time;
  temp
[3] <- log_C - k3 * time;
  log_model
[time] <- log_sum_exp(temp);
}
y
~ lognormal(log_model, sigma);

You just need to declare log_A, log_B, and log_C as parameters rather than A, B, and C.

Ben

Linas Mockus

unread,
Apr 1, 2014, 2:28:45 PM4/1/14
to stan-...@googlegroups.com
Thanks, Ben.  How to allow situations such as A,B>0 but C<0?  The plasma concentration profile may be represented as a sum of exponentials with positive and negative coefficients.

I think when I am done with this model I am willing to donate the model (standard 2-compartmental) along with anti-cancer drug data (documented in open literature) as an example of PK modeling in Stan.

Linas
--
You received this message because you are subscribed to a topic in the Google Groups "stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/fAqBYSf3u6o/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bob Carpenter

unread,
Apr 1, 2014, 7:04:20 PM4/1/14
to stan-...@googlegroups.com

On Apr 1, 2014, at 8:28 PM, Linas Mockus <linasm...@gmail.com> wrote:

> Thanks, Ben. How to allow situations such as A,B>0 but C<0? The plasma concentration profile may be represented as a sum of exponentials with positive and negative coefficients.

Unless you know something about the constraints on values,
the bounds on C given A and B are data dependent. You can
just do the algebra, though:

0 < A * exp(-k1 * time) + B * exp(-k2 * time) + C * exp(-k3 * time)

iff

-[A * exp(-k1 * time) + B * exp(-k2 * time)] < C * exp(-k3 * time)

iff

-[A * exp(-k1 * time) + B * exp(-k2 * time)] / exp(-k3 * time) < C

So you need to take the lower bound for C to be

max_{k1,k2,k3,time}
-[A * exp(-k1 * time) + B * exp(-k2 * time)]
/ exp(-k3 * time)

It may not actually be possible to calculate this in Stan, because
you need the bound at the point you declare C, at least to support
the direct constraint approach.


>
> I think when I am done with this model I am willing to donate the model (standard 2-compartmental) along with anti-cancer drug data (documented in open literature) as an example of PK modeling in Stan.

Thanks. That'd be great, and we'd of course provide
attribution and proper credit!

- Bob
> You received this message because you are subscribed to the Google Groups "stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.

Linas Mockus

unread,
Apr 1, 2014, 8:28:00 PM4/1/14
to stan-...@googlegroups.com
Thanks, Bob.  It looks I have to do algebra myself - there is no magic.

One approach folks are using is to make likelihood -e+300 when the set of parameters is infeasible.  Can this be done in Stan? For example:
likelihood <- function (theta, time, y)
{
    prob <- dnorm (y - model (theta, time), mean = 0, sd = sigma) # to model y~normal(model,sigma)
    if (theta is feasible) {
        return (sum (prob))
    } else {
        return (-e+300)
    }
}
Linas 

Michael Betancourt

unread,
Apr 2, 2014, 3:42:22 AM4/2/14
to stan-...@googlegroups.com
It’s worth nothing that from a theoretical perspective such a constraint
indicates that model has broken down — you’re ignoring interactions
or saturation effects that would yield physically reasonable results.
Consequently just about everything you do is going to be a hack.

If you want to absolutely ensure that the sum of the tree exponentials
is positive then declare a transformed parameter

real<lower=0> sum = A exp () + B exp() + C exp()

When the RHS is negative a constraint violation will be raised and
the proposal immediately rejected but it’ll be up to you to ensure
that the posterior mass with sum < 0 isn’t too large.

Bob Carpenter

unread,
Apr 2, 2014, 7:54:40 AM4/2/14
to stan-...@googlegroups.com
From a more practical perspective, if you do this:

>> if (theta is feasible) {
>> return (sum (prob))
>> } else {
>> return (-e+300)
>> }


then when theta's not feasible, you lose all connection to
any of the parameters --- you just return a constant. This
breaks all the gradient information, and makes it very hard for
HMC to get back to where it should be.

Rejecting as Michael suggests is a better approach if you can't
formulate a constraint.

I take it that the A, B, and C being outside the exp() is
critical for some reason?

- Bob

Linas Mockus

unread,
Apr 2, 2014, 8:40:11 AM4/2/14
to stan-...@googlegroups.com
Thanks, Bob and Michael.

Since A, B, C are functions of k1,k2,k3 it is possible that one or two
of them are negative. I don't know beforehand which.

I understand the reason why fake likelihood approach is unacceptable.

This approach was used with MCMCmetrop1R quite successfully. However, I
thought moving to Stan might be beneficial. In fact I would love to
compare PKBugs, MCMCmetrop1R, and Stan.

Linas
4 AM, Bob Carpenter wrote:
> From a more practical perspective, if you do this:
>
>>> if (theta is feasible) {
>>> return (sum (prob))
>>> } else {
>>> return (-e+300)
>>> }
>
> then when theta's not feasible, you lose all connection to
> any of the parameters --- you just return a constant. This
> breaks all the gradient information, and makes it very hard for
> HMC to get back to where it should be.
>
> Rejecting as Michael suggests is a better approach if you can't
> formulate a constraint.
>
> I take it that the A, B, and C being outside the exp() is
> critical for some reason?
>
> - Bob
>
> On Apr 2, 2014, at 9:42 AM, Michael Betancourt <betan...@gmail.com> wrote:
>
>> It's worth nothing that from a theoretical perspective such a constraint
>> indicates that model has broken down -- you're ignoring interactions

Linas Mockus

unread,
Apr 2, 2014, 2:04:58 PM4/2/14
to stan-...@googlegroups.com
Thanks, Bob.

Few more questions. How to specify that parameter k1!=parameter k2 in
order to reject the proposal?

Should if be in transformed parameters or model block?

Linas

On 4/2/2014 7:54 AM, Bob Carpenter wrote:
> From a more practical perspective, if you do this:
>
>>> if (theta is feasible) {
>>> return (sum (prob))
>>> } else {
>>> return (-e+300)
>>> }
>
> then when theta's not feasible, you lose all connection to
> any of the parameters --- you just return a constant. This
> breaks all the gradient information, and makes it very hard for
> HMC to get back to where it should be.
>
> Rejecting as Michael suggests is a better approach if you can't
> formulate a constraint.
>
> I take it that the A, B, and C being outside the exp() is
> critical for some reason?
>
> - Bob
>
> On Apr 2, 2014, at 9:42 AM, Michael Betancourt <betan...@gmail.com> wrote:
>
>> It's worth nothing that from a theoretical perspective such a constraint
>> indicates that model has broken down -- you're ignoring interactions

Michael Betancourt

unread,
Apr 2, 2014, 4:10:17 PM4/2/14
to stan-...@googlegroups.com
transformed parameters {
real A;
real B;
real C;
real<lower=0> sum;

A <- …;
B <- …;
C <- ..;

sum <- A * exp(-k1 * time) + B * exp(-k2 * time) + C * exp(-k3 * time);

}

When the transformed parameters are computed an exception will be thrown if sum < 0
causing an immediate rejection even before the model block is executed.

Linas Mockus

unread,
Apr 2, 2014, 4:45:23 PM4/2/14
to stan-...@googlegroups.com
Thanks a lot, Michael.

I am happy that puzzle is being solved!  What about constraint that k1 != k2 in order to reject the p.

Linas


      On 4/2/2014 4:10 PM, Michael Betancourt wrote:

Michael Betancourt

unread,
Apr 2, 2014, 5:02:24 PM4/2/14
to stan-...@googlegroups.com
What do you mean by reject the p?  In this construction the proposal is immediately
rejected when sum evaluates to a negative value.

Linas Mockus

unread,
Apr 2, 2014, 6:13:06 PM4/2/14
to stan-...@googlegroups.com
Sorry, I meant to reject the proposal.  I am trying to do some algebra to make it more efficient.  Couple of constraints come to mind, some of them >=0 type, one of constraints being k1 != k2.

t is a variable: t=t[i], i=1,...,number of measurements so I thought it would be efficient to define constraints which do not involve time.  Since we may have 100s of patients and dozens of measurements per patient I would like to make the approach scalable.
Linas
You received this message because you are subscribed to a topic in the Google Groups "stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/fAqBYSf3u6o/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.

Michael Betancourt

unread,
Apr 2, 2014, 6:21:24 PM4/2/14
to stan-...@googlegroups.com
Yeah, that’s a measure zero event and not something you want to try to condition
on directly.  You could play some game of defining 

real<lower=0.0001> epsilon;
epsilon <- fabs(k1 - k2);

but the resulting model would be a mess in practice.

Again, it sounds like you have a very bad approximate model (linearized system of coupled ODEs?)
that does not make physical sense.  While these kinds of hacks can provide small band aids they 
won’t fix the underlying problem and I would advise investigating a better model rather than
adding more hacks like this.

Linas Mockus

unread,
Apr 2, 2014, 6:50:39 PM4/2/14
to stan-...@googlegroups.com
Actually this is a closed form solution of system of ODEs.
Linas

Michael Betancourt

unread,
Apr 2, 2014, 7:07:48 PM4/2/14
to stan-...@googlegroups.com
Then from where are the unsatisfied constrains coming?  If you have a self-consistent system
then the solutions should satisfy all of the necessary constraints tautologically and the only
issue would be one of parameterization.  Per this definition a system of ODEs that doesn’t
satisfy all of the constraints is an approximation, even if it has a closed-form solution.

As we have discussed, limiting the parameter space to where the approximate model is valid
can be feasible provided that the space of valid parameters is relatively well-behaved.  The
more constraints you add, however, are going to make the model harder and harder to
specify, let alone fit.

Were I Andrew I’d invoke the Folk Theorem, http://andrewgelman.com/2009/05/24/handy_statistic/.

Linas Mockus

unread,
Apr 3, 2014, 7:51:54 AM4/3/14
to stan-...@googlegroups.com
Michael,

Model is good as long as parameters are feasible.  For example if we take a bathtub and parameters are water rates entering and leaving the bathtub and the rate of leaving bathtub is higher than the rate entering bathtub then bathtub will get dry even if it was full initially.  Assuming constant parameters solution of ODE which describes this process will eventually lead to a negative level.  This is in essence 1-compartmental model.  2-compartmental model is a bit trickier.  So assumption  that rate parameters are constant is wrong.  Therefore we end up with enforcing constraint that the level in bathtub (or plasma concentration) > 0. Kind of hack... I am looking for an elegant solution - perhaps it exists, I don't know. This should be more general than PK modeling.
Linas

Michael Betancourt

unread,
Apr 3, 2014, 8:35:40 AM4/3/14
to stan-...@googlegroups.com
Right, this is what I was saying.  You have a model that is not consistent with the physical system outside a certain neighborhood
of parameter space.  Within that neighborhood fits are okay, but if the data pushes you outside of that neighborhood then (a) you
have strong evidence that the model validity has broken down an (b) your fitting will start to perform poorly.

Now you can ignore the conflict between the data and the model validity (not recommended, of course) and try to force inference
within the valid neighborhood, but this will only work okay in practice if the valid neighborhood is relatively well-behaved.  For example,
if a single parameter is constrained to be in a sweet spot then it's easy enough to constrain that one parameter and not limit performance
much.

But the constraints you have discussed are not simple.  The positivity constraint induces highly non-linear constraints in the parameter
space, and introducing singular k1 != k2 constraints will induce further pathologies.

Yes, you can hack all of this stuff into a Stan model but the resulting posterior is going to become ugly, and performance is going to suffer
dramatically and this is why I am recommending against it.  It's the Folk Theorem all over again.

Linas Mockus

unread,
Apr 3, 2014, 9:05:31 AM4/3/14
to stan-...@googlegroups.com
I wholeheartedly agree.  But I am stuck with the model.  Perhaps constraint A*exp(-k1*t)+...>=0 for all t is acceptable solution?  Really I am not interested in the extrapolation outside experimental time points (variable t).  How code below looks like:
transformed parameters {
    vector<lower=0> sum [N];

    real A;
    real B;
    real C;
    A<-fA(k1,k2,k3);
    B<-fB(k1,k2,k3);
    C<-fC(k1,k2,k3);
    for (i in 1 : N)
        sum[i]<-A*exp(-k1*t[i])+B*exp(-k2*t[i])+C*exp(-k3*t[i]);
}
model {
    for (i in 1 : N)
        y[i]~lognormal(log(sum[i]),sigma)
}
   
   
Linas

Michael Betancourt

unread,
Apr 3, 2014, 9:19:50 AM4/3/14
to stan-...@googlegroups.com
This will indeed impose the desired constraint, but there is no guarantee that the resulting model will fit well.
If you have any appreciable posterior mass near the constraint then you are going to have terrible performance
and there's not much Stan can do about that.


Bob Carpenter

unread,
Apr 3, 2014, 12:10:10 PM4/3/14
to stan-...@googlegroups.com
You can do the transforms yourself, but you should be
able to get away with letting A and B be free and only
constraining C so that the whole thing is OK.

I'd suggest just trying it without the constraint to see
where the solution lies. If, as Michael says, it wants to
be outside the constraint boundary, then you have a problem
with the model/data. The problem is that if you constrain the
parameters to be inside the boundary, they'll try to butt up
against the constraints, which is problematic for sampling (the scales
are difficult because the parameter wants to go to infinity on
the unconstrained space).

- Bob

Linas Mockus

unread,
May 5, 2014, 8:11:45 PM5/5/14
to stan-...@googlegroups.com
Bob,

Attached are the R+stan files for a 2 compartmental modeling: first hierarchical for a population then individual for each patient. 

Any suggestions how to improve the model are welcome.  The data is tabulated in the thesis https://www.duo.uio.no/bitstream/10852/12176/2/HOVEDOPPGAVE.pdf.

Linas
CsA_data_Storehagen_2007.csv
cyclosphamide2comp_multerr_h.r
2comp_multerr_hn.stan
2comp_multerrn.stan
util.r
Reply all
Reply to author
Forward
0 new messages