
Blocking Spiders


Matt K

Oct 4, 2001, 9:47:25 AM
Currently we block spiders with code like this:

dim hua
hua = lcase(Request.ServerVariables("HTTP_USER_AGENT"))
if instr(1,hua,"xenu")>0 then Response.Redirect (BASESITE & "/crawler.asp")
if instr(1,hua,"teleport")>0 then Response.Redirect (BASESITE & "/crawler.asp")
if instr(1,hua,"msiecrawler")>0 then Response.Redirect (BASESITE & "/crawler.asp")
if instr(1,hua,"webcopier")>0 then Response.Redirect (BASESITE & "/crawler.asp")
if instr(1,hua,"openfind")>0 then Response.Redirect (BASESITE & "/crawler.asp")
if instr(1,hua,"webzip")>0 then Response.Redirect (BASESITE & "/crawler.asp")
if instr(1,hua,"bordermanager")>0 then Response.Redirect (BASESITE & "/crawler.asp")


Has anyone found a better solution?

Matt K

Jesper Nielsen

Oct 4, 2001, 10:03:30 AM
> Has anyone found a better solution?

Create a robots.txt file in your root directory with the following content:
User-Agent: *
Disallow: /

That should do the trick.


Tom Pepper

Oct 4, 2001, 10:05:35 AM
Excluding Search Engine Robots (robots.txt)
http://info.webcrawler.com/mak/projects/robots/exclusion-admin.html
--
-----
Tom "Pepper" Willett
Microsoft MVP - FrontPage
-----
"Matt K" <xx...@xxxxx.com> wrote in message
news:uzY2esNTBHA.1860@tkmsftngp05...

George Hester

Oct 4, 2001, 10:37:14 AM
Please, what is a spider? I like spiders; I always let them eat the bugs
around the house. But something tells me I don't want them fooling around
in my Web. And just putting this txt file in C:\Inetpub\wwwroot will do
that? I don't refer to it in any way from anything?

--
George Hester
"Jesper Nielsen" <j...@nielsenit.dk> wrote in message
news:k0_u7.1530$uQ.2...@news010.worldonline.dk...

Jesper Nielsen

Oct 4, 2001, 11:01:47 AM
> I don't refer to it in any way from anything?

No, simply place the file in the root of your website.
The first file a serious spider looks for is robots.txt, and if it finds
that it is not allowed to spider the files, it will leave again without
spidering/indexing anything.

/jesper/


Matt K

Oct 4, 2001, 11:02:15 AM
I don't want to stop all spiders, just certain ones. Many (Google etc.) are
useful.

We get several rogue spiders that can use over 40% of a server, often run by
people on ADSL lines.

Matt K

"Tom Pepper" <tomp...@mvps.org> wrote in message
news:Ooaoy2NTBHA.928@tkmsftngp03...

Tom Pepper

Oct 4, 2001, 11:25:42 AM
The URL I gave you will tell you how to block just certain spiders.
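For example, a robots.txt along these lines excludes only the named robots and
leaves everyone else alone (the two names below are just illustrative; each
entry has to match the User-agent token the robot actually reports, and only
robots that honour robots.txt will obey it):

# Block these two robots from the whole site
User-agent: WebZIP
Disallow: /

User-agent: Teleport
Disallow: /

# All other robots may crawl everything
User-agent: *
Disallow: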

--
-----
Tom "Pepper" Willett
Microsoft MVP - FrontPage
-----
"Matt K" <xx...@xxxxx.com> wrote in message
news:uqoLTWOTBHA.2144@tkmsftngp05...