equality constraints


Daniel Berman

Jun 24, 2015, 9:39:38 AM
to stan-...@googlegroups.com
Hi,

I'm looking to set an equality constraint on one of my parameters, which is a matrix. There is a physicality constraint on the parameter such that some function of each matrix element must sum to zero. The function itself is easy to program, but I was wondering if there's an easy way to incorporate this constraint?

Ben Goodrich

Jun 24, 2015, 10:51:37 AM
to stan-...@googlegroups.com, daniel....@gmail.com
On Wednesday, June 24, 2015 at 9:39:38 AM UTC-4, Daniel Berman wrote:
I'm looking to set an equality constraint on one of my parameters, which is a matrix. There is a physicality constraint on the parameter such that some function of each matrix element must sum to zero. The function itself is easy to program, but I was wondering if there's an easy way to incorporate this constraint?

Only for very simple explicit functions where one element of the vector / matrix is conditionally deterministic given the other elements of the vector / matrix. For example, a sum-to-zero constraint is easy
parameters {
  vector[K-1] theta_free;
  ...
}
model {
  vector[K] theta;
  for (k in 1:(K-1)) theta[k] <- theta_free[k];
  theta[K] <- -sum(theta_free);
  ...
}

For implicit functions, we have talked about it a lot but have nothing currently. In general, Stan is not really designed for containers to have some unknown elements and some (conditionally) known elements.

Ben

Daniel Berman

Jun 24, 2015, 12:11:22 PM
to stan-...@googlegroups.com
Could you elaborate a bit on the example code? I'm not sure how you implement the check of the sum.

Bob Carpenter

Jun 24, 2015, 12:28:38 PM
to stan-...@googlegroups.com
By setting theta[K] to -sum(theta_free), you guarantee
that sum(theta) = 0.

You can then use theta in the model as a "sum to 0"
constrained K-vector.
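The same construction can be checked outside Stan. Here is a minimal Python sketch of the mapping Bob describes (the helper name `sum_to_zero` is made up for illustration):

```python
def sum_to_zero(theta_free):
    """Map a length-(K-1) list of free values to a length-K list
    whose elements always sum to zero, by appending the negative
    of the running sum as the last element."""
    return list(theta_free) + [-sum(theta_free)]

theta = sum_to_zero([0.3, -1.2, 2.5])
print(theta)       # the first K-1 entries are unchanged
print(sum(theta))  # zero, up to floating point
```

Any prior placed on theta then constrains theta_free implicitly through this deterministic map.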

- Bob

> On Jun 24, 2015, at 12:11 PM, Daniel Berman <daniel....@gmail.com> wrote:
>
> Could you elaborate a bit on the example code? I'm not sure how you implement the check of the sum.
>

Daniel Berman

Jun 24, 2015, 1:17:28 PM
to stan-...@googlegroups.com
Is that an example where the real parameter of interest is theta and you know that theta[K] has to be equal to -sum(theta[1:K-1])? This doesn't place any constraints on theta[1:K-1]. Correct?

Ben Goodrich

Jun 24, 2015, 1:52:05 PM
to stan-...@googlegroups.com, daniel....@gmail.com
On Wednesday, June 24, 2015 at 1:17:28 PM UTC-4, Daniel Berman wrote:
Is that an example where the real parameter of interest is theta and you know that theta[K] has to be equal to -sum(theta[1:K-1])? This doesn't place any constraints on theta[1:K-1]. Correct?

Correct, although you might be able to further constrain the first K - 1 elements. 

Daniel Berman

Jun 24, 2015, 3:44:10 PM
to stan-...@googlegroups.com, daniel....@gmail.com
I'm not sure if this would work as a constraint mechanism but my plan is to create a new parameter in the transformed parameter section, and in that transformed parameter (a matrix), I'll set <lower, upper> close to zero so that it will have elements close to 0. Does that sound feasible?
Here is the section of the code relevant to my question. Much of the actual code is missing because there's a lot of it.

data{
  int numberOfGates; #number of gates
  int lenPauliBasis;
  int chiDimensions;
  int chiDimensionsDoubled; #dimensions of a gate
  matrix[chiDimensions,chiDimensions] Sigma3; #random matrix
}

parameters{
  cholesky_factor_cov[chiDimensionsDoubled] chi[numberOfGates];
}

transformed parameters{
  matrix<lower=-.00001,upper=.00001>[numberOfGates,lenPauliBasis] constraintMat_real;
  matrix<lower=-.00001,upper=.00001>[numberOfGates,lenPauliBasis] constraintMat_imag;
  real kronDelt;

  #pauliTripleTrace_real and pauliTripleTrace_imag are declared in the omitted code
  for (k in 1:numberOfGates){
    for (r in 1:lenPauliBasis){
      constraintMat_real[k,r] <- 0; #initialize before accumulating
      constraintMat_imag[k,r] <- 0;
      for (m in 1:chiDimensions){
        for (n in 1:chiDimensions){
          kronDelt <- 0;
          if (r==1){
            kronDelt <- 1;
          }
          constraintMat_real[k,r] <- constraintMat_real[k,r] + chi[k][m,n]*pauliTripleTrace_real[r][m,n] - kronDelt;
          constraintMat_imag[k,r] <- constraintMat_imag[k,r] + chi[k][m+chiDimensions,n+chiDimensions]*pauliTripleTrace_imag[r][m,n];
        }
      }
    }
  }
}

model{
  vector[numberOfGates] w;
  w <- rep_vector(0,numberOfGates);

  constraintMat_real ~ multi_gp(Sigma3,w);
  constraintMat_imag ~ multi_gp(Sigma3,w);
....
}

Ben Goodrich

Jun 24, 2015, 4:17:16 PM
to stan-...@googlegroups.com, daniel....@gmail.com
On Wednesday, June 24, 2015 at 3:44:10 PM UTC-4, Daniel Berman wrote:
I'm not sure if this would work as a constraint mechanism but my plan is to create a new parameter in the transformed parameter section, and in that transformed parameter (a matrix), I'll set <lower, upper> close to zero so that it will have elements close to 0. Does that sound feasible?

No, but this is not explained very well in the manual. Any lower=a, upper=b constraints in the parameters block are "substantive" in the sense that they preclude any proposal for the parameters from violating those constraints. But lower=c, upper=d constraints in the transformed parameters block only serve to catch mistakes: they are for a user who intended those bounds to hold for every admissible proposal in the parameters block and wants Stan to flag it when they do not.

If 0 < p < 1 is the proportion of the posterior distribution where all the constraints in the transformed parameters block are simultaneously satisfied, then in principle you should divide the posterior distribution by p to account for the truncation. But if p < 1, then in general we do not know what value p is (there are some univariate exceptions where you can use the T[,] notation to enforce the truncation). So, if you are using the lower=c, upper=d constraints in the transformed parameters block "substantively" (the same holds for reject() statements), then the posterior distribution Stan is drawing from does not correspond to the posterior distribution in your mind.

Also, Stan will have a very difficult time sampling efficiently from the (wrong) posterior distribution unless p is very close to 1 (in which case it is not that wrong). But I doubt that would hold in your case or most cases.
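The role of p can be sketched numerically in plain Python (a made-up example, not Stan: a standard-normal stand-in for the posterior of an unconstrained parameter, with an artificial |x| <= 0.1 bound playing the part of the transformed-parameter constraint):

```python
import random

random.seed(0)
N = 100_000

# Stand-in "posterior" draws for an unconstrained parameter.
draws = [random.gauss(0.0, 1.0) for _ in range(N)]

# Pretend a transformed parameter carries the bound |x| <= 0.1,
# so any draw outside it would be rejected.
kept = [x for x in draws if abs(x) <= 0.1]
p_hat = len(kept) / N

# p_hat estimates the "p" above: here only ~8% of the posterior mass
# satisfies the bound, so sampling would be very inefficient and the
# retained draws follow a truncated, rescaled density, not the original.
print(p_hat)
```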

Ben

Daniel Berman

Jun 24, 2015, 4:27:19 PM
to stan-...@googlegroups.com, daniel....@gmail.com
If I'm understanding this correctly, then it seems like there isn't really a way to implement the constraint I need. Would you agree?

Ben Goodrich

Jun 24, 2015, 4:44:17 PM
to stan-...@googlegroups.com, daniel....@gmail.com
On Wednesday, June 24, 2015 at 4:27:19 PM UTC-4, Daniel Berman wrote:
If I'm understanding this correctly, then it seems like there isn't really a way to implement the constraint I need. Would you agree?

As far as I can tell, I don't see how to enforce such a constraint, but I don't know very much about your model. The key is to try to find a 1-to-1 mapping between the real numbers and the admissible values of the parameters. This is how Stan works pretty well for constrained types such as simplexes, correlation matrices, and unit vectors.
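For intuition, here is a Python sketch of such a 1-to-1 mapping in the simplex case. The softmax-style map below, with one coordinate pinned to zero, is a hypothetical illustration of the principle (Stan's actual simplex transform uses stick-breaking): K - 1 unconstrained reals map bijectively onto the interior of the K-simplex.

```python
import math

def simplex_from_unconstrained(y):
    """Map a list of K-1 unconstrained reals to a point on the
    K-simplex: pin one extra coordinate at 0 for identifiability,
    then apply a numerically stable softmax."""
    z = list(y) + [0.0]
    m = max(z)
    e = [math.exp(v - m) for v in z]  # subtract max to avoid overflow
    s = sum(e)
    return [v / s for v in e]

x = simplex_from_unconstrained([0.5, -1.0, 2.0])
print(x)       # four entries, all strictly positive
print(sum(x))  # one, up to floating point
```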

Ben

Michael Betancourt

Jun 24, 2015, 5:01:28 PM
to stan-...@googlegroups.com, daniel....@gmail.com
As Ben notes, the right answer is to reparameterize your model.

When you have a holonomic constraint of the form f(x_1, …, x_n) = 0
then you don’t have n degrees of freedom but rather (n - 1).  If you
try to sample from the overconstrained space, (x_1, …, x_n), directly
then you’ll spend an infinite amount of time trying to perturb your initial
state into one that satisfies the constraint*.

So what you really want to do is find (n - 1) parameters, (y_1, …, y_{n-1}),
and a map g : y -> x, such that f(g(y_1, …, y_{n-1})) = 0 exactly.
This is what Bob was recommending when he suggested taking any

y_1, …, y_{n - 1}

and defining the map

x_{i} = y_{i}, i < n
x_{n} = - sum_{j = 1}^{n - 1} y_{j}.

By construction you always satisfy the constraint.

The only tricky bit you’d have to worry about is that if you want to
assign a prior on x space then you’ll have to keep the Jacobian in mind,
as

p(y) = p(x(y)) | dx / dy |.

* There’s a clever geometric way to handle such constraints, which
is what the Shake and Rattle integrators exploit to simulate constrained
Hamiltonian systems, but we don’t do anything like this in Stan at the moment.
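The Jacobian identity is easy to verify numerically. The sketch below (plain Python, with x = exp(y) as a stand-in transform and y standard normal) checks p(y) = p(x(y)) |dx/dy| using a finite-difference Jacobian:

```python
import math

def normal_pdf(z):
    """Standard normal density: the density of y."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def lognormal_pdf(x):
    """Density of x = exp(y) when y ~ N(0, 1)."""
    return math.exp(-0.5 * math.log(x) ** 2) / (x * math.sqrt(2.0 * math.pi))

# Check p(y) = p(x(y)) |dx/dy| at a test point, estimating the
# Jacobian dx/dy by a central finite difference.
y0, h = 0.7, 1e-6
x0 = math.exp(y0)
dxdy = (math.exp(y0 + h) - math.exp(y0 - h)) / (2.0 * h)
print(abs(normal_pdf(y0) - lognormal_pdf(x0) * dxdy))  # essentially zero
```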

Daniel Berman

Jun 24, 2015, 5:30:07 PM
to stan-...@googlegroups.com, daniel....@gmail.com
It might not be evident looking at my code, but I do have the addition of a Kronecker delta function. Could that be used for reparameterization?

The constraint might be a bit difficult to piece together in the form I wrote it. I have an array of K m-by-n matrices G that I am trying to estimate. For the purposes of estimation, each is transformed into a form χ that has very nice properties. Since I have K matrices G, there are K χ matrices. P is also an array of matrices, with a total of R of them.

Σ_{m,n} (χ_G)_{mn} · Tr{P_m P_r P_n} − δ_{1r} = 0,  ∀ G

Do you think I could use simplexes for all k and r since I know that they have to either sum up to 1 or zero? If so, do you think that would be difficult to implement?

Michael Betancourt

Jun 24, 2015, 5:48:28 PM
to stan-...@googlegroups.com
I’m not sure if simplices would be appropriate here as they require that
the parameters not just sum to one but also be positive.

It looks like you’re trying to define a projection operator, in which case
the unconstrained space that you want to sample from is exactly the
image of the projection operator and likely has a very natural physical
interpretation.  My guess is that thinking about it this way would
motivate a much cleaner implementation of the model.

Bob Carpenter

Jun 24, 2015, 5:53:39 PM
to stan-...@googlegroups.com
Correct.

The constraints on parameters are fully stated in their
declarations. Transformed parameters are just functions of
parameters. The transformation here turns an unconstrained
(K - 1)-vector into a constrained K-vector.

- Bob


> On Jun 24, 2015, at 1:17 PM, Daniel Berman <daniel....@gmail.com> wrote:
>
> Is that an example where the real parameter of interest is theta and you know that theta[K] has to be equal to -sum(theta[1:K-1])? This doesn't place any constraints on theta[1:K-1]. Correct?
>