It is not so critical; I am just trying to get more insight into the real advantage of the dummy label in the context of classification. The claim in the ACL 2010 paper that "LP-ZGL is underregularized, its model parameters are not constrained enough, compared to MAD (Eq. 3, specifically the third term), resulting in overfitting in case of highly connected graphs" is a very interesting clue. I am wondering whether it is the dummy label or the L2 regularization that helps MAD avoid overfitting, since the third term reduces to the sum of squares of \hat{Y}, i.e. its squared L2 norm, over the non-dummy labels. If so, we might be able to achieve an equivalent result by introducing the L2 regularization alone, i.e. || \hat{Y}_l ||_2^2.
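
To make the question concrete, here is the MAD objective as I understand it (following Eq. 3 of the ACL 2010 paper; the seed matrix S, the Laplacian L, the prior R_l, and the weights \mu_1, \mu_2, \mu_3 are my notation and may not match the paper exactly):

  C(\hat{Y}) = \sum_l [ \mu_1 (Y_l - \hat{Y}_l)^T S (Y_l - \hat{Y}_l)   % fit to the seed labels
                      + \mu_2 \hat{Y}_l^T L \hat{Y}_l                   % smoothness over the graph
                      + \mu_3 || \hat{Y}_l - R_l ||_2^2 ]               % regularization toward the prior

If the prior R_l is zero for every label except the dummy label, then for each real label l the third term is exactly \mu_3 || \hat{Y}_l ||_2^2, a plain L2 penalty shrinking the label scores toward zero. That is the reduction I had in mind above.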
Best regards,
Phiradet