Fake Accounts at WikiPathways.org


Alexander Pico

Oct 7, 2012, 4:07:59 PM
to wikipathw...@googlegroups.com
As you know, we've been tracking each new registration and every change made to the site via email notification for the past couple of months. I've personally looked at every registration and have noticed an unsettling number of confirmed fake accounts and highly suspicious accounts. Only a minuscule percentage of these new registrants actually make legitimate edits (though still a higher share than at most other wikis).

Most of these fake and potentially fake accounts seem harmless; that is, they don't even attempt to make spam edits. I end up banning spammers and confirmed fake accounts at a rate of about 5 per week. The other accounts (the vast majority at WikiPathways) just hang out there, doing nothing.

We are on top of the spam, so that's not a real problem, but I have two remaining concerns regarding fake accounts:

1. Our registered user count is becoming meaningless as we get more and more fake accounts just hanging out. The ones that don't spam still bloat that count. Based on the past few months, I don't think we should report this number anymore.

2. Theoretically, these fake accounts could become active at some later point and flood us with spam and even crash the site.

Proposals:
A. Make our own "captcha" system that bots won't bother to hack (because it's just used at our site and not a thousand other mediawikis). It could show a picture of a simple reaction and ask users to identify the reactant, product or enzyme, for example. Or ask if a reaction is a phosphorylation event or a metabolic process.

B. Institute a policy to delete accounts after 30 days if no edits are ever made. If the user makes a legitimate edit and then never returns, that's fine. I just want to target the completely unused accounts.

What do you think?
 - Alex

Egon Willighagen

Oct 8, 2012, 4:51:46 AM
to wikipathw...@googlegroups.com
On Sun, Oct 7, 2012 at 10:07 PM, Alexander Pico
<ap...@gladstone.ucsf.edu> wrote:
> Proposals:
>
> A. Make our own "captcha" system that bots won't bother to hack (because
> it's just used at our site and not a thousand other mediawikis). It could
> show a picture of a simple reaction and ask users to identify the reactant,
> product or enzyme, for example. Or ask if a reaction is a phosphorylation
> event or a metabolic process.

That sounds like significant work... how many new real users register
now? From this proposal I assume explicitly approving users is not an
option... ?

I guess email confirmation is already in place? (I do not remember
from when I registered...)

> B. Institute a policy to delete accounts after 30 days if no edits are ever
> made. If the user makes a legitimate edit and then never returns, that's
> fine. I just want to target the completely unused accounts.

That sounds good to me... then the WP adoption can be measured by
counting all accounts that exist after 30 days...

Would it be possible to send out a reminder after 21 days, to warn
genuine users that their account will be deleted if they do not make
any edit? It could be useful to point here to a list of small edit
tasks, like missing identifiers, so that genuine users have something
simple they can do to reconfirm their account (and have a nice spin
off effect too :)... in fact, it would act as kind of second
captcha...
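
The reminder-then-delete timeline suggested here could be sketched roughly as follows. This is only an illustration: the function name `account_action`, the constants, and the choice to treat "after N days" as "N or more days since registration" are all assumptions, not anything that exists in the WikiPathways code.

```python
from datetime import date

# Thresholds taken from the thread; the inclusive comparison is an assumption.
REMINDER_AFTER_DAYS = 21
DELETE_AFTER_DAYS = 30


def account_action(registered_on: date, edit_count: int, today: date) -> str:
    """Return "keep", "remind", or "delete" for one account (hypothetical helper)."""
    # Any edit at all protects the account, no matter how old it is.
    if edit_count > 0:
        return "keep"
    age_days = (today - registered_on).days
    if age_days >= DELETE_AFTER_DAYS:
        return "delete"
    if age_days >= REMINDER_AFTER_DAYS:
        # This is where a reminder email with a list of small edit tasks would go.
        return "remind"
    return "keep"
```

A daily job could call this once per account and send the reminder only on the day the result first becomes "remind".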

Egon

--
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers

Alexander Pico

Oct 8, 2012, 5:04:34 AM
to wikipathw...@googlegroups.com
On 10/8/12 1:51 AM, "Egon Willighagen" <egon.wil...@gmail.com> wrote:


>On Sun, Oct 7, 2012 at 10:07 PM, Alexander Pico
><ap...@gladstone.ucsf.edu> wrote:
>> Proposals:
>>
>> A. Make our own "captcha" system that bots won't bother to hack (because
>> it's just used at our site and not a thousand other mediawikis). It could
>> show a picture of a simple reaction and ask users to identify the reactant,
>> product or enzyme, for example. Or ask if a reaction is a phosphorylation
>> event or a metabolic process.
>
>That sounds like significant work... how many new real users register
>now? From this proposal I assume explicitly approving users is not an
>option... ?
>
>I guess email confirmation is already in place? (I do not remember
>from when I registered...)

A custom "captcha" would be pretty simple: a few images associated with
correct answers. We've avoided moderating new accounts via explicit
approval so that folks can contribute right away, when they are usually
most motivated to contribute. That is a key part of lowering the barrier
to entry. And, yes, email confirmation is already in place. Bots (and
cheap human labor) have circumvented this measure.
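
As a rough sketch of "a few images associated with correct answers" (purely illustrative: the image file names, questions, answers, and function names below are hypothetical placeholders, not an existing MediaWiki or WikiPathways API):

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    image: str     # hypothetical image file showing a simple reaction
    question: str  # what the registrant is asked about the image
    answer: str    # the single accepted answer


# Placeholder question bank; a real one would be written by curators.
CHALLENGES = [
    Challenge("reaction1.png", "Which molecule is the enzyme?", "hexokinase"),
    Challenge(
        "reaction2.png",
        "Is this a phosphorylation event or a metabolic process?",
        "phosphorylation event",
    ),
]


def pick_challenge(rng=random):
    """Choose one challenge at random to show on the registration form."""
    return rng.choice(CHALLENGES)


def check_answer(challenge, response):
    """Compare the response to the stored answer, ignoring case and outer spaces."""
    return response.strip().lower() == challenge.answer.lower()
```

Exact string matching is the simplest possible check; multiple-choice answers would avoid rejecting genuine users over spelling.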


>> B. Institute a policy to delete accounts after 30 days if no edits are ever
>> made. If the user makes a legitimate edit and then never returns, that's
>> fine. I just want to target the completely unused accounts.
>
>That sounds good to me... then the WP adoption can be measured by
>counting all accounts that exist after 30 days...
>
>Would it be possible to send out a reminder after 21 days, to warn
>genuine users that their account will be deleted if they do not make
>any edit? It could be useful to point here to a list of small edit
>tasks, like missing identifiers, so that genuine users have something
>simple they can do to reconfirm their account (and have a nice spin
>off effect too :)... in fact, it would act as kind of second
>captcha...

Great idea! I like the idea of encouraging specific, easy edits.
- Alex



Andra Waagmeester

Oct 8, 2012, 9:59:17 AM
to wikipathw...@googlegroups.com
I would like to add a third item, and that is to use Vivoweb: http://vivoweb.org. This network of researchers, currently being developed in the US, is an open source solution. I will be using it or a similar solution to capture curators' information for our semantic web data. This would allow users to seamlessly extend their work on WikiPathways to other VIVO resources. You don't have to use this functionality, but if you would like to include your WikiPathways activity in your output reports, you would be able to do so.

Proposal C:
So my proposal would be to go with Proposal B, but to exempt users who have filled in a WikiPathways VIVO entry.


Implementing a VIVO instance is straightforward, since it is open source. I can easily create a query to ask for all WikiPathways curators without a wpvivo profile, whose accounts we could then terminate after 30 days. In a way, this would give prospective users the ability to test-run our platform for 30 days, after which continued usage requires a proper registration.

Andra 

Thomas Kelder

Oct 10, 2012, 9:17:42 AM
to wikipathw...@googlegroups.com
Hi Alex,

Thanks for this analysis and summary of the problem. I agree that we
shouldn't use the registered user number as a usage statistic anymore.

Proposal B sounds good to me, except that the filter may need to
exclude accounts that did not edit but do have pathways in their
watchlist. It is a valid use case to have an account just to get
updates about a pathway of interest, without editing yet.
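
A minimal sketch of that filter, assuming the per-account figures have already been pulled from the wiki database (the `Account` class, its field names, and `is_deletable` are hypothetical, not MediaWiki structures):

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    age_days: int        # days since registration
    edit_count: int      # edits ever made
    watchlist_size: int  # pathways currently on the watchlist


def is_deletable(account: Account, max_idle_days: int = 30) -> bool:
    """True only for old accounts with no edits and an empty watchlist."""
    if account.edit_count > 0:
        return False  # has contributed at least once
    if account.watchlist_size > 0:
        return False  # exception: the account is used to follow pathways
    return account.age_days >= max_idle_days
```

Further exceptions could be added as extra early returns in the same function.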

I'm not enthusiastic about any technical solutions (Proposals A and C),
because I think the only right technical solution will be to update
MW. We switched to a different captcha system earlier, and that didn't
help. I've seen similar spam account problems on other outdated MWs.
This makes me suspect that there is a security leak independent of the
captcha. Because wikipedia.org/mediawiki.org do not seem to have these
spam account issues, I suspect this leak has been resolved in MW
updates.

Best wishes,
Thomas

Alexander Pico

Oct 10, 2012, 11:49:44 AM
to wikipathw...@googlegroups.com
Right. I was trying to think of use cases for accounts other than edits. Watchlists are a good example and something we can easily add to the exception list. Please share any others that come to mind for optimizing proposal B.

I'll wait and assess traffic after the update before starting on these alternative proposals for registration.

Thanks!
- Alex

Martijn van Iersel

Oct 10, 2012, 2:34:35 PM
to wikipathw...@googlegroups.com
>
> I'll wait and assess traffic after the update before starting on these alternative proposals for registration.
>


That may not be any time soon. Given our recent track record, I would
not hold my breath.

By the way, I'm fine with B.

--
Martijn