Adding dummy labels

Giulia

Jan 7, 2017, 12:49:43 PM
to pystruct
I want to add a dummy label for each substructure such that choosing this label incurs a lower cost. As an example, with the OCR letters data, I want to add a dummy label on top of the 26 letters. Note that this dummy label will never appear in the training or test data. The caveat is that if I label a substructure with the dummy label, I pay a different price than if I had chosen a usual label. Say we take the Hamming loss L(y,y'), where y' is the true labeling from the training data. If on the kth substructure I choose the dummy label, so y_k = dummy label, then I don't pay the usual price 1_{y_k\neq y'_k} but instead pay 0.05 * 1_{y_k\neq y'_k}. I have never used pystruct before, but it seems to me that all I have to do is

from pystruct.models import ChainCRF
ChainCRF(n_states=27, class_weight=[1] * 26 + [0.05])


I noticed that there is a check that catches the case where your data does not have the same number of classes as you specify in n_states. After removing this check, would the algorithm run as intended? That is, would it run as if there were 27 letters instead of 26 and apply the different weight to the Hamming loss when the dummy label is chosen?
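For concreteness, the loss described above can be written out in plain NumPy. This is only a sketch of the intended loss, not pystruct code: the function names, the dummy index 26, and the toy label sequences are made up for illustration. The second function shows a weighting indexed by the true label instead, which is how class weights appear to be applied in pystruct's Hamming loss; that is an assumption worth verifying against the pystruct source. If it holds, a weight attached to a label that never occurs in the ground truth would never be used.

```python
import numpy as np

N_LETTERS = 26
DUMMY = 26         # hypothetical index of the extra label
DUMMY_COST = 0.05  # discount factor from the post


def dummy_hamming_loss(y_true, y_pred, dummy=DUMMY, dummy_cost=DUMMY_COST):
    """Hamming loss where a wrong position costs dummy_cost (instead of 1)
    when the predicted label is the dummy label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    mismatch = (y_pred != y_true).astype(float)
    # Cost depends on the PREDICTED label at each position.
    cost = np.where(y_pred == dummy, dummy_cost, 1.0)
    return float(np.sum(cost * mismatch))


def true_label_weighted_loss(y_true, y_pred, class_weight):
    """Hamming loss weighted by the class of the TRUE label
    (assumed here to mirror how class_weight is applied)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    class_weight = np.asarray(class_weight, dtype=float)
    return float(np.sum(class_weight[y_true] * (y_pred != y_true)))


# Toy sequence: one correct letter, one wrong letter, two dummy predictions.
y_true = np.array([0, 1, 2, 3])
y_pred = np.array([0, 5, DUMMY, DUMMY])
class_weight = [1.0] * N_LETTERS + [DUMMY_COST]

loss_wanted = dummy_hamming_loss(y_true, y_pred)
loss_true_indexed = true_label_weighted_loss(y_true, y_pred, class_weight)
print(loss_wanted, loss_true_indexed)
```

On this toy sequence the two losses differ (1 + 0.05 + 0.05 versus 3 full-price mistakes), because the dummy label never occurs in `y_true` and so its weight is never looked up by the true-label version.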

Thank you very much in advance for the help! 

JL Meunier

Feb 17, 2017, 10:22:53 AM
to pystruct
Hi Giulia,

I'm curious: have you tried it?

Anyway, I would say that if this extra label never appears in the training set, then its (internal) weight vector will be set to 0 by the regularization mechanism at training time.
So at test time, this label will never be predicted (since assigning this label to a node would contribute 0 to the graph energy).
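A small NumPy sketch of this argument, under simplifying assumptions: only unary scores are considered (no pairwise terms, no actual ChainCRF inference), the weights are random stand-ins rather than learned ones, and the dummy row of the weight matrix is assumed to be exactly zero. With a zero weight vector the dummy label scores 0 at every node, so it can win the per-node argmax only where every real label scores below zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_features, n_states = 5, 4, 27

# Random stand-ins for node features and learned unary weights.
X = rng.normal(size=(n_nodes, n_features))
W = rng.normal(size=(n_states, n_features))
W[-1] = 0.0  # assumption: regularization drove the dummy weights to zero

scores = X @ W.T              # shape (n_nodes, n_states)
pred = scores.argmax(axis=1)  # per-node prediction, pairwise terms ignored

dummy_wins = pred == n_states - 1
all_real_negative = (scores[:, :-1] < 0).all(axis=1)
print(dummy_wins, all_real_negative)
```

The two boolean arrays agree at every node: the dummy label is picked exactly where all 26 real labels have negative scores, which is rare when the real labels have informative weights.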

JL

Giulia

Feb 27, 2017, 10:51:04 AM
to pystruct
Hi JL, 

Yes, I did try, and the final test scores did change, so something is clearly happening, but it is unclear what exactly. Do you have any better ideas on how to do this easily in pystruct?

Thank you,
Giulia

JL Meunier

Feb 28, 2017, 5:16:27 AM
to pystruct
Hi Giulia,

Off the top of my head: what about modifying your training data so that it contains reasonable evidence of the dummy label? For instance, by forging pure-noise character images?

JL

Giulia

Feb 28, 2017, 10:01:24 AM
to pystruct
Hi JL, 

Unfortunately, that won't work for what I am trying to do. In essence, the dummy class represents a type of example that is hard to learn, so I can't generate examples for it, even pure-noise ones: the hard-to-learn examples should come from the true distribution of the data.

Thank you,
Giulia