It is not so critical; I am just trying to get more insight into the real advantage of the dummy label in the context of classification. The claim in the ACL 2010 paper that "LP-ZGL is underregularized, its model parameters are not constrained enough, compared to MAD (Eq. 3, specifically the third term), resulting in overfitting in case of highly connected graphs" is a very interesting clue. I am wondering whether it is the dummy label or the L2 regularization that helps MAD avoid overfitting, since the third term reduces to the sum of squares of \hat{Y}, i.e. its squared L2 norm, over the non-dummy labels. If so, we might be able to achieve an equivalent result by introducing the L2 regularization alone, i.e. || \hat{Y}_l ||_2^2.
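
To make the question concrete, here is the MAD objective as I understand it (following Eq. 3 of the ACL 2010 paper; the seed matrix S, the Laplacian L, the prior R_l, and the weights \mu_1, \mu_2, \mu_3 are my notation and may not match the paper exactly):

  C(\hat{Y}) = \sum_l [ \mu_1 (Y_l - \hat{Y}_l)^T S (Y_l - \hat{Y}_l)   % fit to the seed labels
                      + \mu_2 \hat{Y}_l^T L \hat{Y}_l                   % smoothness over the graph
                      + \mu_3 || \hat{Y}_l - R_l ||_2^2 ]               % regularization toward the prior

If the prior R_l is zero for every label except the dummy label, then for each real label l the third term is exactly \mu_3 || \hat{Y}_l ||_2^2, a plain L2 penalty shrinking the label scores toward zero. That is the reduction I had in mind above.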
Best regards,
Phiradet