-cks
--
Christopher St. John
http://artofsystems.blogspot.com
http://getsatisfaction.com/pbwiki
As for PreviousBarCamps and News Archive, they served a purpose once,
when I spent more time maintaining them... It seems possible that we've
outgrown a wiki structure that relies on single individuals to
keep out spam...? Thoughts?
Chris
--
How about a captcha system? See the reCAPTCHA mod for MediaWiki;
maybe something similar can be implemented.
- Bryan
________________________________________
Bryan Bishop
http://heybryan.org/
It's hard to balance technology solutions with simply better policing
by the community... maybe we should add a page describing ways that
people can help garden the wiki? Christopher, maybe start by
documenting some of the things you do to garden the wiki?
Chris
--
Do we have contacts with PBWiki to suggest feature requests to make
life easy for administrators?
That should be done, of course, but it's only going to work in concert
with some of the other things discussed (like locking down old pages)
There's an awful lot of spam, and if you get behind just a bit it becomes
overwhelming. I just don't think it's practical to expect a small group
of humans to compete against an army of spam farmers using automated
attack tools (although it sounds like a fun movie :-) )
A big technical win would be the ability to detect (and reject) display:none
inline CSS, which seems to be a favorite trick. Detecting it after the fact is
nice, but still requires manual cleanup (see below).
Manual cleanup would be easier if the response time were faster. In fact,
that's the single biggest impediment to gardening right now: it can take up
to a minute to view the history for a page, and hitting "edit" gets you the
hiccup message maybe 20% of the time (or more on a bad day). I don't
want to sound too critical, I understand that the BarCamp wiki is a (very)
edge case as pbwikis go, but as far as technical fixes go, speedier editing
and management would definitely help.
I guess that would require a bit of engineering overhead and while I
don't expect PBWiki to do custom development for us, it would be nice
if we could be on the edge of improving PBWiki's scalability and spam
fighting...
David, any thoughts?
Chris
--
On MediaWiki installations, the reCAPTCHA system is set up as a
once-per-login deal, plus one at registration; anonymous editors get
one every time they edit a page. And if a bot registers and a human
then logs in with that account, you can ban those usernames. I have
always liked those "What's 2+2?" captchas, actually.
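The "What's 2+2?" style of captcha, combined with the policy just described (one at registration, one per login, one on every anonymous edit), fits in a few lines. The following Python sketch is hypothetical: it is not reCAPTCHA or any MediaWiki API, and every name in it is invented for illustration.

```python
import operator
import random

# Hypothetical "What's 2+2?"-style challenge; not reCAPTCHA or any MediaWiki API.
_OPERATORS = {"+": operator.add, "-": operator.sub}


def make_challenge(rng: random.Random) -> tuple:
    """Return (question, expected_answer) for a small arithmetic captcha."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    symbol = rng.choice(sorted(_OPERATORS))
    if symbol == "-" and b > a:
        a, b = b, a  # keep answers non-negative
    return f"What's {a}{symbol}{b}?", _OPERATORS[symbol](a, b)


def check_answer(expected: int, reply: str) -> bool:
    """Accept the reply only if it parses as the expected integer."""
    try:
        return int(reply.strip()) == expected
    except ValueError:
        return False


def needs_challenge(is_anonymous: bool, action: str) -> bool:
    """Challenge at registration and login, and on every edit by an anonymous user."""
    if action in ("register", "login"):
        return True
    return action == "edit" and is_anonymous
```

Passing in a `random.Random` instance keeps the challenge reproducible for testing; a live site would use an unseeded generator.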
Alright, how about this? We are mostly programmers here, right? That
sort of lag is a serious problem on nearly any wiki. Most of these
wikis use a MySQL or PostgreSQL database backend, and the following
would work for either: let's write a quick Perl client that connects to
the remote database, fetches the history of the last 500 pages along
with a quick comparison between the two versions of each (maybe the
'diff' information, which would require some server-side processing
first), and then offers a quick yes/no interface: "keep the changes?"
("revert?"). This would require the community to designate a certain
person for cleanup duty, yes, but otherwise it's a good fix.
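The review loop proposed here can be sketched roughly as follows. The thread suggests Perl; this sketch uses Python's standard `difflib` instead, and it assumes the revisions have already been fetched from the wiki's database (that step, and the `Revision` record, are hypothetical). The yes/no decision is passed in as a function, so it can be a terminal prompt or anything else.

```python
import difflib
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Revision:
    """One recent edit; how it is fetched from the wiki's database is left open."""
    page: str
    author: str
    previous_text: str
    current_text: str


def render_diff(rev: Revision) -> str:
    """Unified diff between the previous and current text of a page."""
    return "".join(
        difflib.unified_diff(
            rev.previous_text.splitlines(keepends=True),
            rev.current_text.splitlines(keepends=True),
            fromfile=f"{rev.page} (previous)",
            tofile=f"{rev.page} (edit by {rev.author})",
        )
    )


def review(revisions: Iterable[Revision], keep: Callable[[str], bool]) -> List[Revision]:
    """Show each diff to the reviewer; return the revisions they chose to revert."""
    return [rev for rev in revisions if not keep(render_diff(rev))]


def ask_on_terminal(diff_text: str) -> bool:
    """Interactive yes/no prompt for the person on cleanup duty."""
    print(diff_text)
    return input("Keep the changes? [y/n] ").strip().lower().startswith("y")
```

Something like `review(recent_revisions, ask_on_terminal)` would walk the person on cleanup duty through each diff; `recent_revisions` stands in for whatever the database query returns, and the returned list is what still needs reverting.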
And if we do the captcha system, we would instead ask whether to ban a
certain user, or put the user up for later review, etc. We could dump
usernames on a certain page and ask the community for yea/nay.
Chris
Sent from a typo-prone iPhone.
-eric
We're really sorry there are abusers here; we're making a best effort
to keep them out, though, and think you'll find our public wikis get
substantially less abuse for their traffic levels than many other
equivalent systems (e.g. forums).
-David
On Tue, Apr 22, 2008 at 2:44 AM, Frederic Baud <freder...@gmail.com> wrote:
>
--
Or, even: "Please email x...@yyy.com with "unlock <pagename>"
in the subject line and the page will be unlocked in about 5 minutes
or so". (And have a bot to do the work)
Or even "press this button and the page will unlock in 15 minutes"
(if less work required, need to make the wait longer).
Would the second two disrupt the spammers' workflows sufficiently
to deter them? They're amenable to automation, but it may not be
worth solving just for the BarCamp wiki...
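The "press this button and the page will unlock in 15 minutes" variant amounts to a delayed-unlock queue that a bot polls. Below is a minimal Python sketch; the class is hypothetical, and timestamps are passed in explicitly (in practice `time.time()`) so the behavior is easy to check. The delay is a parameter, since the suggestion was to lengthen it if less work is required of the requester.

```python
from dataclasses import dataclass, field
from typing import Dict, List

UNLOCK_DELAY_SECONDS = 15 * 60  # "the page will unlock in 15 minutes"


@dataclass
class UnlockQueue:
    delay: float = UNLOCK_DELAY_SECONDS
    pending: Dict[str, float] = field(default_factory=dict)  # page -> request time

    def request(self, page: str, now: float) -> None:
        """Record an unlock request; repeating it does not restart the clock."""
        self.pending.setdefault(page, now)

    def due(self, now: float) -> List[str]:
        """Pop and return pages whose wait has elapsed, for the bot to unlock."""
        ready = sorted(p for p, t in self.pending.items() if now - t >= self.delay)
        for page in ready:
            del self.pending[page]
        return ready
```

The email-driven version would differ only in how `request` gets called: a bot parsing "unlock <pagename>" subject lines instead of a button handler.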
I'm -1 on that change, it might reduce the flow of spam but it would
increase the number of admin requests (it's reasonably frequent
that someone needs to edit multiple pages), which nets me 0 in
terms of time saved.
For the BarCamp wiki there's just one pattern right now, and it's highly
accurate (legit users don't do it) and easy to detect:
<span style="display:none"><ul><li><a href="http://spamas.html">
Buy Spam Online</a></li></ul><ol><li><a href="http://spam.html">
Buy Spam Online</a></li></ol></span></span>
<u style="display: none">
<a href="http://crud.tr" title="mırc, mirc" target="_blank">mirc</a>
<a href="http://crap.tr" title="mırc, mirc" target="_blank">mirc</a>
<a href="http://asshat.com" title="mırc, mirc" target="_blank">mırc
</a>
Anything with the string style="display:none" can safely be assumed
to be spam. If you want to be more specific, anything with display set to
none that contains a link might be more accurate (but is there ever a
reason to have invisible content on a wiki page?)
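Both checks described above can be sketched with regular expressions in Python. This is an illustrative heuristic, not PBWiki's actual filter: it tolerates the spacing and case variants seen in the samples (`display:none`, `display: none`), but it only catches inline styles, so hiding via a class or stylesheet would slip past. The stricter check looks for a link anywhere after the hidden tag rather than truly parsing nesting.

```python
import re

# An opening tag whose inline style sets display to none, e.g.
# <span style="display:none"> or <u style="display: none">.
HIDDEN_TAG = re.compile(
    r"""<[a-z][a-z0-9]*\b[^>]*\bstyle\s*=\s*["'][^"']*display\s*:\s*none""",
    re.IGNORECASE,
)
LINK = re.compile(r"<a\b[^>]*\bhref\s*=", re.IGNORECASE)


def hides_content(html: str) -> bool:
    """True if any element is hidden with an inline display:none style."""
    return HIDDEN_TAG.search(html) is not None


def hides_links(html: str) -> bool:
    """Stricter check: a hidden element with a link somewhere after its opening tag.

    Rough heuristic: it does not verify that the link is actually nested
    inside the hidden element.
    """
    hidden = HIDDEN_TAG.search(html)
    return hidden is not None and LINK.search(html, hidden.end()) is not None
```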
> I don't feel that defacement is subsiding, quite the contrary.
>
> I'm a bit disappointed that by now we don't have new functions
> in PBWiki to help fight this, though I can understand that PBWiki has some
> paying customers to serve first. But I still think that BarCamp's wiki
> is a nice shop window, and I must say that the window does not look
> really clean today.
>
> I think many of us would appreciate it if Chris, Christopher
> or anyone with administrative rights made some tough decisions so we
> get fewer of these spammers, who are starting to entirely disrupt the use of
> the wiki.
Sorry, I haven't followed this, but if it's running on Apache, have a
look at mod_security. This lets you block requests (including posts)
based on lots of criteria. What's worked wonders for me is blocking
anonymous posts that contain URLs, and forwarding them to a page that
explains how to "mask" a URL so that it will get through (but of
course make it useless to spammers who just want clicks or Google
rankings for their sites).
Combine it with fail2ban, a log reader that lets you block repeat
offenders at the firewall level, and you've got a really powerful
defense.
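The same policy can be sketched at the application level. This is Python, not mod_security's rule language or fail2ban's configuration syntax: anonymous posts containing URLs are redirected to an explanatory page, and an address that trips the rule too often within a time window is flagged for banning, loosely mirroring fail2ban's `maxretry`/`findtime` idea. The help-page path and thresholds are placeholders.

```python
import re
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Set

URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
HELP_PAGE = "/help/posting-links"  # hypothetical page explaining how to "mask" a URL


def screen_post(body: str, logged_in: bool) -> Optional[str]:
    """Return None to accept the post, or the page to send an anonymous URL-poster to."""
    if not logged_in and URL_PATTERN.search(body):
        return HELP_PAGE
    return None


class RepeatOffenders:
    """Flag an address after max_hits rejections within window seconds."""

    def __init__(self, max_hits: int = 3, window: float = 600.0) -> None:
        self.max_hits = max_hits
        self.window = window
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)
        self.banned: Set[str] = set()

    def record_rejection(self, address: str, now: float) -> bool:
        """Log one rejected request; return True if the address is now banned."""
        recent = self.hits[address]
        recent.append(now)
        while now - recent[0] > self.window:
            recent.popleft()  # forget rejections older than the window
        if len(recent) >= self.max_hits:
            self.banned.add(address)
        return address in self.banned
```

In a request handler, a non-`None` result from `screen_post` would both redirect the poster and feed `record_rejection`; actually blocking banned addresses at the firewall is the part fail2ban itself would handle.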
Jeroen
So:
Which appears to be a complete fake BarCamp, set up to collect SEO
spam links. Which sets the bar kinda high for technological solutions.
It's just the tip of the iceberg. I started going through the pages in
alphabetical order, and it looks like as many pages are fake or vandalized
as not.
Holding new pages for moderation seems like it would be a good start,
but as far as I can tell you can't do that with PbWiki. In fact, PbWiki really
doesn't have the tools or infrastructure to handle really large wikis, and
they don't seem to be focused on that area (not meant to be a jab; it's
fair enough, since we're not on their primary business path).
I think it's probably time to start looking around for alternatives that better
fit the needs of BarCamp. Not sure what to do in the meantime...
1) the option to filter any edit with display="none" (or variants thereof)
would eliminate 90% of the current defacement.
2) the ability to filter out specific URLs would eliminate the majority of
the rest (since most of the spam points to the same sites again
and again)
3) the ability to retroactively apply spam controls to existing pages
would be huge.
4) pagination of the users list would help (there are thousands, so
converting somebody to read-only can take several minutes)
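Item 2 can be sketched in Python as a domain blocklist checked against every link in an edit, with URL shorteners as an optional, configurable extra list. The domain lists below are placeholders, and the function names are hypothetical; a real list would be maintained by the wiki's admins.

```python
import re
from typing import Iterable, Iterator, List, Set
from urllib.parse import urlsplit

HREF = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)

# Placeholder lists; in practice these would be editable by wiki admins.
BLOCKED_DOMAINS: Set[str] = {"spam.example", "crud.example"}
SHORTENER_DOMAINS: Set[str] = {"tinyurl.com"}


def linked_hosts(html: str) -> Iterator[str]:
    """Yield the host name of every href in the markup (relative links are skipped)."""
    for url in HREF.findall(html):
        host = urlsplit(url).hostname  # already lower-cased by urllib
        if host:
            yield host


def matches_domain(host: str, domains: Iterable[str]) -> bool:
    """True if host is one of the domains or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def blocked_links(html: str, block_shorteners: bool = True) -> List[str]:
    """Hosts in the edit that hit the blocklist; an empty list means the edit passes."""
    domains = BLOCKED_DOMAINS | (SHORTENER_DOMAINS if block_shorteners else set())
    return [h for h in linked_hosts(html) if matches_domain(h, domains)]
```

Matching subdomains as well as exact hosts matters here, because spam links tend to rotate through `www.` and other prefixes on the same registered domain.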
This is already done for <div>'s (it was implemented for BarCamp)
which is why the spammers switched to using <noembed> and <u
style="display: none">. We're updating our filters as I write this.
> 2) the ability to filter out specific URLs would eliminate the majority of
> the rest (since most of the spam points to the same sites again
> and again)
A good suggestion, though I would worry that TinyURL, etc., would make
this a challenging race to win.
> 3) the ability to retroactively apply spam controls to existing pages
> would be huge.
And a dangerous bulk-shoot-self-in-foot tool. But we'll think about this.
> 4) pagination of the users list would help (there are thousands, so
> converting somebody to read-only can take several minutes)
Absolutely agreed on this. We're revamping our user management system
to be able to better manage thousands of participants (needed for
site-wide corporate deploys anyhow), so you should see some
improvement in this in the coming weeks.
-David
hmm, is there ever a legit reason to use a tinyurl in a wiki? i'd be willing
to block those entirely on the barcamp wiki, but it would obviously have
to be configurable. but it's an arms race, and nobody expects
a long-term solution, just ammunition to get through this battle :-)
>> 3) the ability to retroactively apply spam controls to existing pages
>> would be huge.
>
> And a dangerous bulk-shoot-self-in-foot tool. But we'll think about this.
>
just identifying the pages might be enough. a "potential
spammed pages" report or something, then give the ability for a human
to select and bulk-delete. or something. i'm pushing on this one because
i did a quick run through a sample of the existing wiki pages via the "all
pages" report and got really depressed...
i especially like solutions that help an admin use their human judgement
rather than totally automating. power-tools to make humans stronger rather
than robots to replace them. in the longer-ish term i think that's the only
possibility, because as the fakecamp pages show, that's what the spammers
are doing...
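A "potential spammed pages" report in that power-tool spirit might look like the Python sketch below: cheap checks flag pages and say why, and nothing is deleted until a human approves each one. The two checks and their threshold are illustrative assumptions, not PBWiki features, and all names are hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# A check returns a short reason when it fires, or None when the page looks fine.
Check = Callable[[str], Optional[str]]

HIDDEN = re.compile(r"""style\s*=\s*["'][^"']*display\s*:\s*none""", re.IGNORECASE)
ANCHOR = re.compile(r"<a\b", re.IGNORECASE)


def hidden_content(text: str) -> Optional[str]:
    return "inline display:none" if HIDDEN.search(text) else None


def link_heavy(text: str, limit: int = 20) -> Optional[str]:
    count = len(ANCHOR.findall(text))
    return f"{count} links" if count > limit else None


DEFAULT_CHECKS: List[Check] = [hidden_content, link_heavy]


def spam_report(
    pages: Dict[str, str], checks: List[Check] = DEFAULT_CHECKS
) -> List[Tuple[str, List[str]]]:
    """List (page name, reasons) for every page that trips at least one check."""
    report = []
    for name in sorted(pages):
        reasons = [r for r in (check(pages[name]) for check in checks) if r]
        if reasons:
            report.append((name, reasons))
    return report


def select_for_deletion(
    report: List[Tuple[str, List[str]]], approve: Callable[[str, List[str]], bool]
) -> List[str]:
    """Nothing is deleted automatically: a human approves each flagged page."""
    return [name for name, reasons in report if approve(name, reasons)]
```

Keeping the decision in `approve` is the point of the design: the checks only narrow thousands of pages down to a reviewable list, and the bulk action stays with an admin.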
and, of course, as always, thanks. i've tried to be careful not to sound
ungrateful, it's just that cleaning up spam puts me in a bad mood :-)
-David
--
ok, this one may just be me failing to rtfm, but is there a way to turn off
comments entirely? there doesn't appear to be a legitimate use for them
on this wiki (since the comments on page content should generally go
inline), and they're an attractive nuisance.
Dave,
I appreciate your effort, but you've got to understand that
some simple hardcoded spam filters are just not sufficient. I've
locked the front page semi-permanently, and it looks like we're
going to need to lock down the entire Wiki for the foreseeable future.
Better bulk-editing tools are desperately needed, as well as the
ability for Wiki admins (not just PbWiki staff) to edit the spam filters
to respond quickly (and retroactively apply them, regardless of the
risk of newbies shooting themselves in the foot)
But that's just the beginning.
I'll try to find some time to research how "industrial strength" Wikis
like Wikipedia handle this sort of thing (I suspect it's mostly human-
level process plus the technology needed to monitor and enforce
the process, but I'd like to know the details). It might be an option
moving forward that allows a truly open wiki without the current
mess.
-cks
In retrospect, I see that this sounded like it was suggesting that PbWiki
is not "industrial strength". That suggestion would be both factually
incorrect (I see that PbWiki has more total pages than Wikipedia)
and not at all what I meant.
A better way to put it would be to say that I believe that the requirements
of the BarCamp Wiki are more similar to a massively collaborative system
like Wikipedia than to the Wikis normally supported by PbWiki. I base this
on the way, for example, the admin tools are obviously optimized for a
different page/user mix than the BarCamp Wiki has.
Which is fine, since PbWiki is a great solution for many, many people.
I just worry that it's unreasonable to ask PbWiki to totally revamp so much
of its admin interface to support just the one Wiki, especially when the
changes are needed very quickly. It just doesn't seem fair to ask that
level of change when PbWiki has already been so generous with time and
resources.
For bulk-editing tools I recommend interfacing directly with the
database, or some very well thought-out scripting frontends. I'm not
aware of anybody implementing this on large wiki farms, but allowing
shell access and modification of the local database accounts wouldn't be
too terrible. SourceForge does it, for instance.
- Bryan
________________________________________
http://heybryan.org/
I requested API access for the BarCamp Wiki a couple days ago,
but haven't heard back. Other than mailing a...@pbwiki.com is
there anything else I need to do?
also: slightly annoying to have your issues on this mailing list all
the time -- and it does not help to increase credibility of barcamps
either
just my 2p ;-)
/ Peter
Dr. Peter Troxler
[ k n w l d g ]
Upcoming isn't really a replacement for the Wiki.
> also: slightly annoying to have your issues on this mailing list all
> the time -- and it does not help to increase credibility of barcamps
> either
>
That's what this list is for, no? I mean, if you don't want to read about
organizations/community issues, this probably isn't the right list to
be subscribed to...
-cks