Dear Oliver,
As you might expect, there is no simple answer to this question.
In general, a flaw predictor should not use any features that quantify
cleanup-tag-related information. A predictor that uses such features
would be able to identify articles that have already been tagged, but
this doesn't solve the actual problem, namely predicting flaws in
untagged articles.
However, there might be certain article features that are in some way
affected by the cleanup tags. A very trivial example is the edit
count, which increases by one if a cleanup tag has been placed. We
cannot control all such (latent) influences. I think a good approach
is to omit those features that obviously (!) quantify cleanup-tag-related
information. If there is any doubt about a certain feature, we
can discuss its usefulness on this list.
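
To make the idea concrete, here is a minimal sketch in Python of
dropping the obviously tag-related features before training. The
feature names and prefixes are made up for illustration; they are
assumptions, not names from any actual feature set.

```python
# Hypothetical prefixes marking features that directly quantify cleanup tags.
# These names are assumptions for illustration only.
TAG_RELATED_PREFIXES = ("cleanup_tag_", "template_cleanup_")


def drop_tag_features(features, prefixes=TAG_RELATED_PREFIXES):
    """Return a copy of the feature dict without obviously tag-related entries."""
    return {
        name: value
        for name, value in features.items()
        if not name.startswith(prefixes)  # str.startswith accepts a tuple
    }


# Hypothetical feature vector for a single article.
article = {
    "cleanup_tag_count": 2,
    "cleanup_tag_has_unreferenced": 1,
    "edit_count": 57,
    "word_count": 1340,
}

filtered = drop_tag_features(article)
```

Note that edit_count stays in the filtered set: its dependence on the
tags is only latent, so it is one of the doubtful cases rather than an
obvious one.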
I hope this helps.
Best regards,
Maik
On May 15, 12:22 pm, Oliver Ferschke <
oliver.fersc...@googlemail.com>
wrote: