Smart spam


Chris Watkins

Sep 16, 2012, 2:43:27 AM
to Appropedia community
I just found an interesting new page on Appropedia, about car tyres. It seems like an OK article, mostly, and doesn't have any links at all, but looking at the user name of the contributor and the repeated reference to a business name, I'm sure it's from a spam bot.

What we have, presumably, is a bot that's intended to associate the business name with the keywords in search engine algorithms. What should our policy be for pages like this, assuming the ?
  1. Delete? (Problem: as the imitations of human writing get better, we'll make mistakes and delete articles by real people). Or...
  2. Remove any reference to the business?
The second option seemed safer, so in this case, I moved the article from "The basics of various types of car tyres from business name" to "The basics of various types of car tyres" and am removing all occurrences of business name in the article. Car tyres are pretty borderline for Appropedia's scope, but the content does no harm and might have value.
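A minimal sketch of option 2, assuming the business name is already known and the page is available as plain text. The function name `scrub_business_name` and the sample strings in the usage note are hypothetical; this is not an existing Appropedia tool. It removes every occurrence of the name, case-insensitively, together with a directly preceding "from" or "by".

```python
import re


def scrub_business_name(text: str, business_name: str) -> str:
    """Remove every occurrence of business_name from text (case-insensitive).

    A directly preceding "from" or "by" is dropped as well, so a title that
    ends in "... from <business name>" loses the whole trailing phrase.
    """
    name = re.escape(business_name)
    # Optional leading "from"/"by" plus whitespace, then the name itself.
    pattern = re.compile(r"(?:\b(?:from|by)\s+)?" + name, re.IGNORECASE)
    cleaned = pattern.sub("", text)
    # Tidy up the whitespace left behind by the removal.
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)
    cleaned = re.sub(r"[ \t]+([.,;:!?])", r"\1", cleaned)
    return cleaned.strip()
```

For example, `scrub_business_name("The basics of various types of car tyres from Acme Tyres", "Acme Tyres")` gives back the title without the trailing phrase. Sentences that used the name as their subject will still need a human edit afterwards.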

Thoughts?

I'm reminded of this cartoon... (language warning):


"Constructive" - from https://xkcd.com/810/


--
Chris Watkins

Appropedia.org - Sharing knowledge to build rich, sustainable lives.

nikki caputo

Sep 21, 2012, 2:24:06 PM
to appropedia...@googlegroups.com
I like your precautionary principles... I would tend toward this line of action myself.

--
For the public archive and subscription options, visit this group at
http://groups.google.com/group/appropedia-community?hl=en
To unsubscribe from this group, send email to
appropedia-commu...@googlegroups.com



--

He Waka Eke Noa

We are all in this boat together

~Maori Wisdom



Danyl Strype

Sep 22, 2012, 2:20:13 AM
to appropedia...@googlegroups.com
Kia ora koutou

Good spotting Chris. I think removing any reference to particular companies is wise for a number of reasons:
1) spam-proofing
2) protects appro from trademark violation accusations
3) reduces linkrot (companies come and go, generic technologies evolve)
4) protects appro from accusations of favouritism or advertising (elevating some companies over others for money)

Open-membership wikis are a spambot paradise (particularly those running on engines written in PHP), and we need to keep an eye out for camouflaged ad-spam. Like the XKCD cartoon ;) There are some excellent captchas which rely on answering questions which are obvious to humans, but very hard for computers. I quite like ESP-PIX:

Speaking of tyres, they are a pretty big issue. Most of them are synthetic rubber (made from crude oil), which means they are totally unsustainable, and very hard to dispose of at end-of-life (unless you build them into Earthships ;) This is bad news for anyone whose idea of appropriate tech involves bicycles, or cars and trucks powered by biofuel or renewably-generated electricity. The big car companies are doing research into bio-tyres that aren't made of oil, but AFAIK none of that research is orientated towards community-scale, low-energy production. This is going to become an issue for people like Open Source Ecology, as at least some of their machine designs involve wheels carrying heavy loads.

Hei kōnā
Strypey




--
Danyl Strype
Community Developer
Disintermedia.net.nz/strype

"Geeks are those who partake in our culture."
- .ISOcrates

"Uncomfortable alliances are not just necessary; they reflect and speak to the tremendous possibility of our political moment."
- Harmony Goldberg and Joshua Kahn Russell
http://www.nationofchange.org/new-radical-alliances-new-era-1337004193

"Both Marxists and Chicago-school libertarian economists can agree that free software is the best model."
- Keith C Curtis
http://keithcu.com/wordpress/?page_id=407

Christian Siefkes

Sep 24, 2012, 1:48:20 PM
to appropedia...@googlegroups.com
Hi,

On 09/21/2012 08:24 PM, nikki caputo wrote:
> On Sat, Sep 15, 2012 at 11:43 PM, Chris Watkins
> <chrisw...@appropedia.org> wrote:
>
> I just found an interesting new page on Appropedia, about car tyres. It
> seems like an ok article, mostly, and doesn't have any links at all, but
> looking at the user name of the contributor and the repeated
> reference to a business name, I'm sure it's from a spam bot.
>
> What we have, presumably, is a bot that's intended to associate the
> business name with the keywords in search engine algorithms. What should
> our policy be for pages like this, assuming the ?

as a software developer who happened to work for a tire-related startup this
summer, I can assure you that this is not smart spam, just very normal spam.
Happens all the time: spammers copy text from somewhere and paste it into
publicly accessible websites in order to get credible-looking (more or less)
links to their site.

Such very rudimentary information about tires can probably be found in about
a million places on the Web. It was added solely for spamming and has
nothing to do with Appropedia. Delete it.

Best regards
Christian

--
|------- Dr. Christian Siefkes ------- chri...@siefkes.net -------
| Homepage: http://www.siefkes.net/ | Blog: http://www.keimform.de/
| Peer Production Everywhere: http://peerconomy.org/wiki/
|---------------------------------- OpenPGP Key ID: 0x346452D8 --
Whenever you find yourself on the side of the majority, it is time to pause
and reflect.
-- Mark Twain



Chris Watkins

Sep 24, 2012, 10:28:16 PM
to appropedia...@googlegroups.com
On 22 September 2012 16:20, Danyl Strype <str...@disintermedia.net.nz> wrote:
> Kia ora koutou
>
> Good spotting Chris. I think removing any reference to particular companies is wise for a number of reasons:
> 1) spam-proofing
> 2) protects appro from trademark violation accusations
> 3) reduces linkrot (companies come and go, generic technologies evolve)
> 4) protects appro from accusations of favouritism or advertising (elevating some companies over others for money)

The approach that I've taken, and that I think others have taken, is to judge links on whether they offer something useful. Obviously links from spambots will fail that test.

As for linkrot...
  • We'll need a bot to do occasional checks of links. Something for the tech team.
  • I'm not sure that a .com page is less permanent than a .org or .edu or .gov page.
  • If a link disappears... there's always archive.org :-). I hope we'll have a dynamic community that helps to replace links and update content.
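A rough sketch of what such a link-checking bot could look like, using only the Python standard library. The function names, the URL pattern and the 4xx/5xx threshold are assumptions made for illustration, not an existing Appropedia or MediaWiki tool. The status lookup is passed in as a parameter so that the logic can be tried without touching the network.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List, Optional

# Bare http(s) URLs; stops at whitespace and common wiki delimiters.
URL_PATTERN = re.compile(r"https?://[^\s\[\]<>|{}]+")


def extract_links(wikitext: str) -> List[str]:
    """Return the unique external URLs in wikitext, in order of first appearance."""
    seen: List[str] = []
    for url in URL_PATTERN.findall(wikitext):
        url = url.rstrip(".,;:)")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for url, or None if the request fails outright."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, timeout, malformed URL, ...


def find_dead_links(
    urls: Iterable[str],
    fetch_status: Callable[[str], Optional[int]] = http_status,
) -> Dict[str, Optional[int]]:
    """Map each dead URL to its status (None = unreachable, else a 4xx/5xx code)."""
    dead: Dict[str, Optional[int]] = {}
    for url in urls:
        status = fetch_status(url)
        if status is None or status >= 400:
            dead[url] = status
    return dead
```

A real bot would also want to throttle its requests and retry before declaring a link dead, since some servers reject HEAD requests or are only down temporarily.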

One page in the wastewater area has been a major spam target - but it looks like human spam, linking particular companies' products in the wastewater treatment field, from the external links section. Relevant but not useful. I have the page on my watchlist and have removed many such links when they didn't have any useful info. After the most recent addition of spam links, I added a hidden comment in that section.

<!-- NOTE: PLEASE DO NOT ADD LINKS UNLESS THEY CONTAIN USEFUL INFORMATION. Commercial links will be promptly deleted *UNLESS* they contain useful information about the subject. -->

The humans will see that and I expect they'll think twice about what they add. (These are not the usual spammers, and it doesn't hurt to be more explicit about our links policy.)

We could take a harder line, but I don't see the necessity for that. Many companies do valuable work and share valuable info.


> Open-membership wikis are a spambot paradise (particularly those running on engines written in PHP), and we need to keep an eye out for camouflaged ad-spam. Like the XKCD cartoon ;) There are some excellent captchas which rely on answering questions which are obvious to humans, but very hard for computers. I quite like ESP-PIX:

Cool - I'll be interested in Lonny's thoughts on that.
 
Thanks



Lonny

Sep 24, 2012, 10:35:31 PM
to appropedia...@googlegroups.com
In response to Chris' question - http://server251.theory.cs.cmu.edu/cgi-bin/esp-pix/esp-pix: it is a cool idea. I think that if it got popular it would be very easy to break, unless made very difficult. I have tried it three times and found one that I could not answer. In addition, it is very English-centric (unless there are other language versions that I do not know about).

Thanks,
-Lonny

Chris Watkins

Sep 24, 2012, 10:38:30 PM
to appropedia...@googlegroups.com
Thanks Christian - useful to know. I won't waste time on such pages in future.

Since I'd already spent some time on this, I just culled it right down to a couple of relevant sentences, marked it as a stub, and renamed it.





Danyl Strype

Sep 24, 2012, 11:46:49 PM
to appropedia...@googlegroups.com
Kia ora

On 25 September 2012 14:35, Lonny <lo...@appropedia.org> wrote:
>> esp-pix: it is a cool idea. I think that if it got popular it would be very easy to break, unless made very difficult. <<

Can you explain why you think this?

>> I have tried it three times and found one that I could not answer. In addition, it is very English-centric (unless there are other language versions that I do not know about). <<

As with a lot of free code software, it would be necessary to run an
open source translation project for it (this may already be underway),
using something like LaunchPad.net. Translation is pretty trivial
compared to coming up with a test that a computer struggles to pass,
while being reasonably easy for a human.

This one only requires one sentence to be translated:
http://server251.theory.cs.cmu.edu/cgi-bin/sq-pix

What do you think? I like the idea (a kid could do it) but this
particular implementation didn't work on my GNU/Linux system (worked
fine using my girlfriend's Mac).

Ma te wā
Strypey

Lonny

Sep 25, 2012, 12:17:35 AM
to appropedia...@googlegroups.com
Kia ora,

In response to Danyl's question:
  1. A bot would somehow need to be prevented from just refreshing and trying a random selection. Even a list of 1000s of entries would be easy for a computer to cycle through, and annoying for a person (even with typing the letter).
  2. Also, is there a Google visual search API, or something similar, yet?
  3. If there is a set number of figures, it seems that one spammer could make a database of correct answers that could then be polled with an image-recognition package like recognize dot im.
I definitely support lots of different approaches and appreciate the work in that captcha code. I am sure that it will be useful and that many of the pitfalls can be forestalled. All that said, I am not an expert and will leave the rest of the conversation to those that know more.
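The first point can be made concrete with a little arithmetic. This is a sketch under stated assumptions, not a description of how ESP-PIX works: it assumes each refresh presents a fresh challenge with `num_choices` equally likely answers and that the bot guesses uniformly at random. The chance of at least one pass in k attempts is then 1 - (1 - 1/n)^k. The function names are hypothetical.

```python
import math


def chance_of_passing(num_choices: int, attempts: int) -> float:
    """Probability that a randomly guessing bot passes at least once."""
    if num_choices < 1 or attempts < 0:
        raise ValueError("num_choices must be >= 1 and attempts >= 0")
    miss = 1.0 - 1.0 / num_choices  # chance that a single guess is wrong
    return 1.0 - miss ** attempts


def attempts_needed(num_choices: int, target: float) -> int:
    """Smallest number of attempts giving at least `target` chance of one pass."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if num_choices < 1:
        raise ValueError("num_choices must be >= 1")
    if num_choices == 1:
        return 1  # a single possible answer is always guessed correctly
    miss = 1.0 - 1.0 / num_choices
    # Solve miss ** k <= 1 - target for the smallest whole number k.
    return math.ceil(math.log(1.0 - target) / math.log(miss))
```

Under these assumptions, even a pool of 1000 possible answers gives a bot better-than-even odds after a few hundred refreshes, which takes a script very little time. That suggests limiting retries matters at least as much as the size of the answer pool.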

Thanks,
-Lonny

Samuel Rose

Sep 25, 2012, 12:26:56 AM
to appropedia...@googlegroups.com


On Monday, September 24, 2012, Chris Watkins wrote:
> Thanks Christian - useful to know. I won't waste time on such pages in future.
>
> Since I'd already spent some time on this, I just culled it right down to a couple of relevant sentences, marked it as a stub, and renamed it.



I think Christian is right. If you find yourself dealing with lots of spam like this, you can start looking at banning IP addresses, which can make it harder for spammers to repeat their actions.


 


--
Sam Rose
Hollymead Capital Partners, LLC
Cel: +1-(517)-974-6451
email: samue...@gmail.com
http://hollymeadcapital.com
http://p2pfoundation.net
http://socialmediaclassroom.com

"The universe is not required to be in perfect harmony with human ambition." - Carl Sagan

Chris Watkins

Sep 25, 2012, 12:32:06 AM
to appropedia...@googlegroups.com
On 25 September 2012 14:26, Samuel Rose <samue...@gmail.com> wrote:
> I think Christian is right. If you find yourself dealing with lots of spam like this, you can start looking at banning IP addresses, which can make it harder for spammers to repeat their actions.

I'd assumed that spammers cloaked their IP addresses somehow - but Wikimedia sites have some kind of IP blacklist, so I guess this is still an issue. Thanks.
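As an illustration of the idea, here is a minimal sketch of checking an editor's address against a blocklist, using Python's standard `ipaddress` module. The function names are hypothetical and the addresses in the usage note come from the ranges reserved for documentation; this is not how Wikimedia's blacklist is implemented. Wiki engines normally ship their own blocking tools, which would be the first thing to reach for.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_blocklist(entries: Iterable[str]) -> List[Network]:
    """Parse single addresses ("203.0.113.7") and CIDR ranges ("198.51.100.0/24")."""
    networks: List[Network] = []
    for entry in entries:
        entry = entry.strip()
        if entry:  # skip blank lines
            # strict=False accepts ranges written with host bits set.
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_blocked(address: str, blocklist: Iterable[Network]) -> bool:
    """Return True if address falls inside any network on the blocklist."""
    ip = ipaddress.ip_address(address)
    # Only compare like with like: IPv4 against IPv4, IPv6 against IPv6.
    return any(ip.version == net.version and ip in net for net in blocklist)
```

With a blocklist parsed from `["203.0.113.7", "198.51.100.0/24"]`, `is_blocked("198.51.100.42", blocklist)` is true because the address falls inside the blocked range. Spammers who rotate through proxies will still get past a list like this, so it makes repeat visits harder rather than impossible.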