The description sure sounds abusive, particularly if the crawler does not obey robots.txt. Most crawlers can't be blocked easily by IP address, since many of them use some form of distributed crawler network, but they should at least obey robots.txt.
Can you block it by user agent?
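Blocking by user agent is usually the practical option here, since 80legs' crawler identifies itself as "008" and a User-Agent match works regardless of source IP. A minimal sketch as generic WSGI middleware (the function name and response body are illustrative, not from any particular server setup; note a bare substring check could also match an unrelated agent containing "008"):

```python
def block_008(app):
    """Hypothetical WSGI middleware: refuse requests whose User-Agent
    contains '008' (the 80legs crawler) with 403 Forbidden."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if "008" in user_agent:
            # Deny the crawler outright.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        # Anything else passes through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

The same check translates roughly to an Apache mod_rewrite rule (`RewriteCond %{HTTP_USER_AGENT} 008` followed by `RewriteRule .* - [F]`).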
Johannes Ullrich
jull...@euclidian.com
(757) 726 7528
"
If you block 008 using robots.txt, you will see crawl requests die
down gradually, rather than immediately. This happens because of our
distributed architecture. Our computers only periodically receive
robots.txt information for domains they are crawling."
So they should stop in a few...
thanks,
> See http://www.80legs.com/webcrawler.html
Scott,
I think you selectively edited the contents of their website a bit too much, which biases your case...
They actually say...
----------
Blocking our web crawler by IP address will not work. Due to the
distributed nature of our infrastructure, we have thousands of
constantly changing IP addresses. We strongly recommend you don't try
to block our web crawler by IP address, as you'll most likely spend
several hours of futile effort and be in a very bad mood at the end of
it. You really should just include us in your robots.txt or contact us
directly.
If you feel that 008 is crawling your website too quickly, please let us
know what an appropriate crawl rate is. If you'd like us to stop crawling
your website, the best thing to do is to block our web crawler using the
robots.txt specification. To do this, add the following to your robots.txt:

User-agent: 008
Disallow: /

If you block 008 using robots.txt, you will see crawl requests die down
gradually, rather than immediately. This happens because of our distributed
architecture. Our computers only periodically receive robots.txt information
for domains they are crawling.
----------
It appears to me that if you have a robots.txt, they will abide by it; and if that fails, they provide contact links so you can let them know.
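If you want to confirm those rules actually block "008" before relying on them, Python's stdlib `urllib.robotparser` can evaluate a robots.txt against a given user agent locally (the URLs below are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Parse the exact rules 80legs recommends, as a list of lines.
rp = RobotFileParser()
rp.parse([
    "User-agent: 008",
    "Disallow: /",
])

# The 008 crawler is denied everywhere; agents with no matching
# entry (and no wildcard entry) are allowed by default.
print(rp.can_fetch("008", "http://example.com/any/page"))  # False
print(rp.can_fetch("Mozilla/5.0", "http://example.com/"))  # True
```

Whether the crawler honors the rules is, of course, the point under dispute in this thread.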
Regards,
Brad
[Tomas L. Byrnes] Brad, Scott has tried the robots option, as have many
others who post about this net abuser, and it doesn't work.
Does anyone have a feed of their nodes that ThreatSTOP can publish?
> -----Original Message-----
> From: iscds...@googlegroups.com [mailto:iscds...@googlegroups.com]
> On Behalf Of d...@sucuri.net
> Sent: Thursday, November 18, 2010 1:40 PM
> To: iscds...@googlegroups.com
> Subject: Re: [dshield] 80legs spider is abusive
>
> They say on the page:
>
> "
> If you block 008 using robots.txt, you will see crawl requests die
> down gradually, rather than immediately. This happens because of our
> distributed architecture. Our computers only periodically receive
> robots.txt information for domains they are crawling."
>
> So they should stop in a few...
>
> thanks,
>
[Tomas L. Byrnes]
They lie:
http://www.wxforum.net/index.php?topic=7623.0;wap2