Diagonality constraints on the pairwise weights during CRF training

Seong Joon Oh

Feb 3, 2016, 4:29:15 AM

to pystruct

I'm using EdgeFeatureGraphCRF model to learn unary (w_u) and pairwise (w_p) parameters.

The problem is that the number of states is relatively large (e.g. 500 classes) compared to the number of examples to train all the pairwise parameters w_p (~10 node examples per class to train 500x500x1 dimensional w_p).

Therefore, I would like to put a diagonality constraint for w_p during CRF training in such a way that w_p consists of only two parameters: alpha (diagonal entries) and beta (off-diagonal entries).

My questions are:

1) Has parameter training with linear constraints already been implemented in pystruct?

2) Do you have other suggestions to reduce dimensionality of w_p or to prevent overfitting?

Andreas Mueller

Feb 3, 2016, 1:21:11 PM

to pyst...@googlegroups.com

On 02/03/2016 04:29 AM, Seong Joon Oh wrote:
> I'm using EdgeFeatureGraphCRF model to learn unary (w_u) and pairwise
> (w_p) parameters.
> The problem is that the number of states is relatively large (e.g. 500
> classes) compared to the number of examples to train all the pairwise
> parameters w_p (~10 node examples per class to train 500x500x1
> dimensional w_p).
> Therefore, I would like to put a diagonality constraint for w_p during
> CRF training in such a way that w_p consists of only two parameters:
> alpha (diagonal entries) and beta (off-diagonal entries).
>
> My questions are:
>
> 1) Has parameter training with linear constraints already been
> implemented in pystruct?

There is currently no way to specify additional constraints. But I'm not
sure that's the way I would go. I would rewrite the model in such a way
that the parameters of the model are just the diagonal and off-diagonal
entries.
Then you don't need to add constraints in the optimization.
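
A minimal NumPy sketch of that reparameterization, assuming a single edge feature (the 500x500x1 case). It does not use any pystruct API; the function names `tied_pairwise_matrix` and `tied_pairwise_features` are hypothetical and only illustrate that the pairwise score becomes linear in the two parameters (alpha, beta), with a 2-dimensional joint feature that counts same-label and different-label edges.

```python
import numpy as np


def tied_pairwise_matrix(alpha, beta, n_states):
    """Expand (alpha, beta) into the full n_states x n_states matrix:
    alpha on the diagonal, beta everywhere else."""
    return beta * np.ones((n_states, n_states)) + (alpha - beta) * np.eye(n_states)


def tied_pairwise_features(labels, edges):
    """Pairwise joint feature for the tied model:
    [number of edges with equal labels, number of edges with different labels]."""
    labels = np.asarray(labels)
    edges = np.asarray(edges)
    same = labels[edges[:, 0]] == labels[edges[:, 1]]
    return np.array([same.sum(), (~same).sum()], dtype=float)


# Toy example (made-up labels and edges, for illustration only).
n_states = 500
labels = np.array([3, 3, 7, 7, 7])
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])
alpha, beta = 1.5, -0.5

# Score computed from the full 500 x 500 matrix ...
w_p = tied_pairwise_matrix(alpha, beta, n_states)
full_score = w_p[labels[edges[:, 0]], labels[edges[:, 1]]].sum()

# ... equals the score computed from only two parameters.
tied_score = np.dot([alpha, beta], tied_pairwise_features(labels, edges))
```

Since the score is a dot product between (alpha, beta) and a feature vector, a model built this way has two pairwise parameters instead of 250,000, and no constraint has to be passed to the optimizer.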

> 2) Do you have other suggestions to reduce dimensionality of w_p or to
> prevent overfitting?

You could use a low-rank parametrization of the matrix (though that is
non-linear in the parameters and might be trickier), or use a different
form of feature representation.
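
To make the low-rank idea concrete, here is a rough NumPy sketch, assuming the factorization w_p = U V^T with a hypothetical rank of 5. The names `U`, `V`, `rank` and `low_rank_pairwise_score` are illustrative and not part of pystruct. It shows the drop in parameter count and why the parametrization is non-linear: the score is a product of entries of U and V.

```python
import numpy as np

n_states, rank = 500, 5  # rank is an arbitrary choice for illustration
rng = np.random.default_rng(0)
U = rng.normal(size=(n_states, rank))
V = rng.normal(size=(n_states, rank))


def low_rank_pairwise_score(U, V, y_i, y_j):
    """Pairwise score for labels (y_i, y_j) without forming the full matrix."""
    return U[y_i] @ V[y_j]


# The implied full matrix, built here only to check the factorization.
w_p = U @ V.T

n_full = n_states * n_states   # parameters in the unconstrained matrix
n_low_rank = U.size + V.size   # parameters in the factorized form

# Non-linearity: doubling both factors quadruples the score.
score = low_rank_pairwise_score(U, V, 3, 7)
scaled_score = low_rank_pairwise_score(2 * U, 2 * V, 3, 7)
```

With these numbers the factorized form has 5,000 parameters instead of 250,000. Because the score is bilinear in U and V rather than linear in a single weight vector, it does not fit directly into a learner that expects a linear model.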

If you use EdgeFeatureGraphCRF, do you use only a single edge feature?
Otherwise w_p would be 500x500xn_features.

How to parameterize your model also depends a lot on the semantics of
the edges. You propose to more or less learn a Potts model.
Whether that is sensible depends on your application.

If you only have one or two pairwise parameters, it might also be
easier to just do a brute-force search instead of learning a structured SVM.
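
One way such a brute-force search could look, as a toy sketch only: the unary scores are held fixed, (alpha, beta) are taken from a small grid, and the pair with the lowest Hamming error is kept. The grid values, the toy chain, the error measure and the exhaustive inference routine are all assumptions made for this example; on a real problem the inference step would need a proper solver.

```python
import itertools

import numpy as np


def map_labels(unary, edges, alpha, beta):
    """Exhaustive MAP inference; only feasible for tiny toy graphs."""
    n_nodes, n_states = unary.shape
    best_score, best_y = -np.inf, None
    for y in itertools.product(range(n_states), repeat=n_nodes):
        y = np.array(y)
        same = y[edges[:, 0]] == y[edges[:, 1]]
        score = (unary[np.arange(n_nodes), y].sum()
                 + alpha * same.sum() + beta * (~same).sum())
        if score > best_score:
            best_score, best_y = score, y
    return best_y


def grid_search(samples, grid):
    """Try every (alpha, beta) on the grid; return (error, alpha, beta)
    for the first pair that reaches the lowest mean Hamming error."""
    best = None
    for alpha, beta in itertools.product(grid, grid):
        err = np.mean([np.mean(map_labels(u, e, alpha, beta) != y)
                       for u, e, y in samples])
        if best is None or err < best[0]:
            best = (err, alpha, beta)
    return best


# Toy chain with 4 nodes and 3 states; node 2 has a misleading unary score.
unary = np.array([[1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.4, 0.6, 0.0],
                  [1.0, 0.0, 0.0]])
edges = np.array([[0, 1], [1, 2], [2, 3]])
y_true = np.array([0, 0, 0, 0])

baseline = map_labels(unary, edges, 0.0, 0.0)  # unaries only
err, alpha, beta = grid_search([(unary, edges, y_true)], grid=[-1.0, 0.0, 1.0])
```

In this toy case the unary-only prediction mislabels node 2, while any grid point with alpha sufficiently larger than beta recovers the true labeling. In practice the search would be scored on held-out data rather than on the training samples.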

Cheers,
Andy
