[QualityFlawPrediction] Legitimate Features


Oliver Ferschke

May 15, 2012, 6:22:41 AM
to pan-works...@googlegroups.com
Dear list,
Obviously, the cleanup labels (and categories) that tag the flaw we want to detect cannot be used as features.
I am wondering, however, whether it is legitimate to use features that might have been directly triggered by these tags.
For example, a user discussion about the article's quality would be a legitimate feature. But what about discussions about the cleanup tag itself, e.g., a discussion of whether the tag was legitimately assigned to the article, or a discussion of how to improve the article in order to get rid of the tag?
If we want to predict flaws, it should be done as independently of the tags as possible. Then again, many aspects of an article and its evolution are influenced by the quality markers.
So total independence cannot be guaranteed.
I was just wondering where we should draw the line. What can we include and still claim to have built a "flaw predictor"?

Regards,
Oliver






Maik Anderka

May 16, 2012, 3:49:03 AM
to PAN Workshop Series. Uncovering Plagiarism, Authorship, and Social Software Misuse.
Dear Oliver,

as you might expect, there is no simple answer to this question.

In general, a flaw predictor should not use any features that quantify
cleanup tag related information. A predictor that uses such features
would be able to identify articles that have already been tagged, but
this doesn't solve the actual problem, namely predicting flaws in
untagged articles.
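One practical consequence, sketched here under assumptions that are not part of the original message: if features are computed from an article's wikitext, the cleanup templates themselves can be stripped first, so that text-based features do not quietly encode the tag. The template list, the `strip_cleanup_tags` helper, and the regex are all hypothetical illustrations; the regex only handles simple, non-nested templates.

```python
import re

# Hypothetical, illustrative list of cleanup template names; the real set
# depends on which flaws the task targets.
CLEANUP_TEMPLATES = ["Unreferenced", "Refimprove", "Orphan", "Wikify"]

# Matches a simple, non-nested template such as {{Refimprove|date=May 2012}}.
# Case-insensitive matching is only an approximation of MediaWiki's rules,
# and templates nested inside the parameters are NOT handled.
_TEMPLATE_RE = re.compile(
    r"\{\{\s*(?:" + "|".join(map(re.escape, CLEANUP_TEMPLATES)) + r")\s*(?:\|[^{}]*)?\}\}",
    re.IGNORECASE,
)


def strip_cleanup_tags(wikitext: str) -> str:
    """Remove cleanup templates so text-based features cannot see them."""
    return _TEMPLATE_RE.sub("", wikitext)


if __name__ == "__main__":
    raw = "{{Infobox person}}{{Refimprove|date=May 2012}}Foo is a bar."
    print(strip_cleanup_tags(raw))
```

Non-cleanup templates such as the infobox are left in place, so features like template counts still see the rest of the article's markup.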

However, there might be certain article features that are in some way
affected by the cleanup tags. A very trivial example is the edit
count, which increases by one if a cleanup tag has been placed. We
cannot control all such (latent) influences. I think a good approach
is to omit those features that obviously (!) quantify cleanup tag
related information. If there is any doubt about a certain feature, we
can discuss its usefulness on this list.
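The rule of thumb above (omit features that obviously quantify cleanup-tag information, keep latently influenced ones such as the edit count, and raise doubtful cases for discussion) can be sketched as a simple filter. The feature names and the two sets below are hypothetical examples chosen for illustration, not features defined by the task.

```python
from typing import Dict, List, Set, Tuple


def select_features(
    features: Dict[str, float],
    obviously_tag_related: Set[str],
    doubtful: Set[str],
) -> Tuple[Dict[str, float], List[str]]:
    """Drop features that obviously quantify cleanup-tag information.

    Doubtful features are kept, but also returned separately so they can
    be raised for discussion rather than silently used or dropped.
    """
    kept = {name: value for name, value in features.items()
            if name not in obviously_tag_related}
    to_discuss = sorted(name for name in kept if name in doubtful)
    return kept, to_discuss


if __name__ == "__main__":
    # Hypothetical feature vector for one article.
    article = {
        "has_cleanup_tag": 1.0,      # obviously tag-related: omit
        "num_cleanup_tags": 2.0,     # obviously tag-related: omit
        "edit_count": 57.0,          # latent influence only: keep
        "talk_page_length": 1200.0,  # may include tag discussions: doubtful
    }
    kept, to_discuss = select_features(
        article,
        obviously_tag_related={"has_cleanup_tag", "num_cleanup_tags"},
        doubtful={"talk_page_length"},
    )
    print(kept, to_discuss)
```

The edit count stays in even though tagging increments it, mirroring the point that not every latent influence can be controlled.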

I hope this helps.

Best regards,
Maik


On May 15, 12:22 pm, Oliver Ferschke <oliver.fersc...@googlemail.com>
wrote: