Message from discussion robot.txt
Ed Costello  
Aug 6 1995, 3:00 am
Newsgroups: comp.infosystems.www.authoring.cgi
From: coste...@netcom.com (Ed Costello)
Date: 1995/08/06
Subject: Re: robot.txt

In <npmDCwt59....@netcom.com>, n...@netcom.com wrote:
> Upon close examination of my log I can see there's some midnight marauder
> (of the programmatic kind) going around and looking for "robot.txt".
> Just what exactly is it looking for in that robot.txt file?  I'd like to
> feed the little devils.

robots.txt (note the plural) is a de facto standard file that
webwalkers, spiders, and similar robots look for at the top level of
a web site to determine what to index on that site (or whether to
index the site at all).
The file looks like:

#This is a file retrieved by webwalkers a.k.a. spiders that
#conform to a de facto standard.
#See <URL:http://web.nexor.co.uk/mak/doc/robots/norobots.html>
#The webmaster for this site is webmas...@www.ibm.com
#Format is:
#       User-agent: <name of spider>
#       Disallow: <nothing> | <path>
#---------------------------------------------------------------------
# The following keeps all spiders out of every path beginning with /misc
User-Agent: *
Disallow: /misc

#EOF

See <URL:http://web.nexor.co.uk/mak/doc/robots/norobots.html> for more
information.
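
If you want to check how a well-behaved spider would read those rules,
here is a minimal sketch using Python's standard urllib.robotparser
module. The host www.example.com, the agent name "ExampleSpider", and
the helper can_fetch() are made up for illustration; only the
User-Agent and Disallow lines come from the file above.

```python
from urllib.robotparser import RobotFileParser

# The two directive lines from the sample file above.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /misc
"""


def can_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under the given rules.

    Hypothetical helper: it parses the rules from a string instead of
    downloading /robots.txt from a live server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # www.example.com and ExampleSpider are placeholders, not real names.
    for path in ("/index.html", "/misc/notes.html", "/misc.html"):
        url = "http://www.example.com" + path
        print(path, can_fetch("ExampleSpider", url))
```

Python's parser treats the Disallow value as a path prefix, so under
this sketch "Disallow: /misc" blocks /misc.html as well as everything
under /misc/. Writing "Disallow: /misc/" would limit the rule to the
directory.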

--
//name  dd                                                       -ed costello
//email dd                                                coste...@netcom.com

