Group: Crawling, indexing, and ranking
From: O
Date: Fri, 09 Mar 2007 16:18:03 -0000
Local: Fri, Mar 9 2007 11:18 am
Subject: How to stop googlebot from downloading from my FTP site?
I know about robots.txt ... I have robots.txt on this server and I've
verified through the Google Webmaster tools that my robots.txt has
been read.  What I would like to know, and stop, is why googlebot is
downloading files, through FTP, from my FTP server: ftp.utexas.edu/ftp.the.net.

Here's some sample from my xferlog:

Fri Mar  9 10:07:39 2007 8 66.249.72.107 966294 /ftp2/ubuntu/pool/universe/a/axiom/axiom-databases_20050901-1ubuntu1_all.deb b _ o a [email address] ftp 0 * c
Fri Mar  9 10:07:57 2007 1 66.249.66.196 13687 /ftp2/freebsd/ports/amd64/packages-7-current/All/jpegoptim-1.2.2.tbz b _ o a [email address] ftp 0 * c
Fri Mar  9 10:08:12 2007 1 66.249.72.107 897 /ftp1/slackware/slackware-current/source/n/netwatch/netwatch.SlackBuild b _ o a [email address] ftp 0 * c
Fri Mar  9 10:08:32 2007 1 66.249.66.196 149444 /ftp2/ubuntu/pool/universe/x/xbanner/xbanner_1.31-23_powerpc.deb b _ o a [email address] ftp 0 * c
Fri Mar  9 10:08:46 2007 1 66.249.66.196 7924 /ftp1/opendarwin/darwinsource/projects/other/postfix-147/postfix/src/util/safe_open.c b _ o a [email address] ftp 0 * c
Fri Mar  9 10:08:52 2007 1 66.249.72.107 16928 /ftp2/freebsd/ports/amd64/packages-5-stable/All/whirlgif-3.04.tbz b _ o a [email address] ftp 0 * c
Fri Mar  9 10:09:32 2007 1 66.249.72.107 46050 /ftp2/freebsd/ports/amd64/packages-5-stable/All/ripit-3.5.1.tbz b _ o a [email address] ftp 0 * c
Fri Mar  9 10:09:40 2007 1 66.249.66.196 11476 /ftp1/opendarwin/darwinsource/projects/apsl/diskdev_cmds-143/fsck_hfs.tproj/dfalib/SKeyCompare.c b _ o a [email address] ftp 0 * c

There doesn't seem to be any rhyme or reason to what the googlebots are
downloading. Any information would be appreciated.

Oscar
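The log lines above appear to follow the standard wu-ftpd transfer-log layout, which vsftpd can also write (this assumes vsftpd's xferlog_std_format option is on). Below is a minimal Python sketch that pulls the fields out of one line. Transfer and parse_xferlog_line are hypothetical names. The sketch assumes file names contain no spaces, while the username field may contain them, as the "[email address]" placeholder does.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Transfer:
    when: datetime
    seconds: int      # transfer time in seconds
    host: str         # remote host (an IP address in the samples above)
    size: int         # bytes transferred
    path: str
    direction: str    # 'o' = outgoing (download), 'i' = incoming (upload)
    username: str
    complete: bool    # completion status 'c' (complete) vs 'i' (incomplete)


def parse_xferlog_line(line: str) -> Transfer:
    tokens = line.split()
    # Tokens 0-4: weekday, month, day, time, year.
    when = datetime.strptime(" ".join(tokens[:5]), "%a %b %d %H:%M:%S %Y")
    seconds, host, size, path = tokens[5:9]
    # Tokens 9-12: transfer type, special-action flag, direction, access mode.
    direction = tokens[11]
    # Last four tokens: service name, auth method, auth user id, completion status.
    status = tokens[-1]
    # Everything between the access mode and the service name is the username,
    # which may contain spaces (e.g. the "[email address]" placeholder).
    username = " ".join(tokens[13:-4])
    return Transfer(when, int(seconds), host, int(size), path,
                    direction, username, status == "c")
```

Parsing from both ends keeps the sketch working even when the middle field is not a single token.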


From: Sebastian
Date: Fri, 09 Mar 2007 09:13:58 -0800
Local: Fri, Mar 9 2007 12:13 pm
Subject: Re: How to stop googlebot from downloading from my FTP site?
Are you sure these are HTTP requests? Your robots.txt disallows Googlebot
correctly. The first IP I checked is indeed a crawler, not a
Googler.

On Mar 9, 5:18 pm, O wrote:
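Checking whether an IP "is indeed a crawler" is commonly done with a reverse DNS lookup, followed by a forward lookup of that hostname that must map back to the same address. Below is a minimal sketch of that check. is_verified_googlebot is a hypothetical name. The accepted suffixes (.googlebot.com, .google.com) are an assumption, based on the crawl-*.googlebot.com hostnames seen in this thread. The resolvers are parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List, Tuple

# Same shape as socket.gethostbyaddr / socket.gethostbyname_ex results.
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]

# Assumed set of hostname suffixes that identify Google's crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip: str,
                          reverse: Resolver = socket.gethostbyaddr,
                          forward: Resolver = socket.gethostbyname_ex) -> bool:
    # Step 1: reverse lookup (PTR record) of the connecting IP.
    try:
        host, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    host = host.rstrip(".").lower()
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    # Step 2: forward lookup of that hostname must include the original IP,
    # otherwise anyone controlling their own PTR record could spoof the name.
    try:
        _name, _aliases, addrs = forward(host)
    except OSError:
        return False
    return ip in addrs
```

With the default arguments it performs real DNS queries, e.g. is_verified_googlebot("66.249.72.107").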


From: O
Date: Fri, 09 Mar 2007 23:31:19 -0000
Local: Fri, Mar 9 2007 6:31 pm
Subject: Re: How to stop googlebot from downloading from my FTP site?
These aren't HTTP requests, they're FTP. The log samples are from
/var/log/xferlogs, which is kept by vsftpd. Looking at the past week, I
see these googlebot addresses downloading files:

# of accesses           IP address              Hostname
 267                    66.249.65.243           crawl-66-249-65-243.googlebot.com
 699                    66.249.66.146           crawl-66-249-66-146.googlebot.com
 762                    66.249.65.206           crawl-66-249-65-206.googlebot.com
3295                    66.249.65.136           crawl-66-249-65-136.googlebot.com
3339                    66.249.65.168           crawl-66-249-65-168.googlebot.com
3769                    66.249.72.107           crawl-66-249-72-107.googlebot.com
4792                    66.249.66.196           crawl-66-249-66-196.googlebot.com

On Mar 9, 11:13 am, Sebastian wrote:
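A per-IP tally like the table above can be produced straight from the transfer log. The sketch below works under the same standard-xferlog assumption and counts only outgoing ("o") transfers, i.e. downloads. downloads_per_host and format_table are hypothetical helpers. The hostname column would come from a separate reverse lookup, e.g. socket.gethostbyaddr.

```python
from collections import Counter
from typing import Iterable


def downloads_per_host(lines: Iterable[str]) -> Counter:
    """Count outgoing transfers (downloads) per remote host."""
    counts: Counter = Counter()
    for line in lines:
        tokens = line.split()
        # A standard xferlog entry has at least 18 whitespace-separated tokens:
        # 5 for the date plus 13 for the remaining fields.
        if len(tokens) < 18:
            continue  # skip blank or malformed lines
        host, direction = tokens[6], tokens[11]
        if direction == "o":  # 'o' = outgoing, i.e. the client downloaded a file
            counts[host] += 1
    return counts


def format_table(counts: Counter) -> str:
    """Render the counts in ascending order, like the table in the post."""
    rows = sorted(counts.items(), key=lambda kv: kv[1])
    return "\n".join(f"{n:>5}  {host}" for host, n in rows)
```

To run it on a real log, pass an open file object, e.g. downloads_per_host(open(path)), with path pointing at wherever vsftpd writes its transfer log (/var/log/xferlogs in O's setup).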


Discussion subject changed to "Hey Googlers: does Googlebot obey ftp://example.com/robots.txt ??" by Sebastian
From: Sebastian
Date: Fri, 09 Mar 2007 23:48:08 -0000
Local: Fri, Mar 9 2007 6:48 pm
Subject: Hey Googlers: does Googlebot obey ftp://example.com/robots.txt ??
Just guessing:

Google follows links to your files
http://www.google.com/support/webmasters/bin/answer.py?answer=33580&t...

Maybe Googlebot doesn't fetch and obey robots.txt when requesting
stuff via ftp? I'm curious ...

Sebastian

On Mar 10, 12:31 am, O wrote:
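A robots.txt-compliant FTP client would fetch ftp://host/robots.txt and check each path before downloading. Python's standard urllib.robotparser can evaluate rules for an ftp:// URL, because matching only looks at the path. The sketch below assumes a hypothetical robots.txt that disallows Googlebot entirely; the actual file on the poster's server is not shown in the thread. It calls parse() on already-fetched text so it runs offline. It illustrates what compliance would look like and says nothing about what Googlebot itself did.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as it might be served at ftp://ftp.example.com/robots.txt
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # parse text we already have; no network I/O

url = "ftp://ftp.example.com/ftp2/ubuntu/pool/universe/x/xbanner/xbanner_1.31-23_powerpc.deb"
print(rp.can_fetch("Googlebot", url))     # False: the Googlebot group disallows "/"
print(rp.can_fetch("SomeOtherBot", url))  # True: no group matches, so allowed
```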


From: MrGamma
Date: Fri, 09 Mar 2007 18:31:38 -0800
Local: Fri, Mar 9 2007 9:31 pm
Subject: Re: Hey Googlers: does Googlebot obey ftp://example.com/robots.txt ??
I know you can get it to read different robots.txt files from http and
https...

You can also sign up for a webmaster account under https://domain.com...

Maybe you can sign up for an account under ftp://domain.com and take a
look around...
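MrGamma's point rests on robots.txt being scoped per scheme, host and port, so http://, https:// and ftp:// on the same hostname can each have their own file. Below is a sketch of deriving that location for any URL, assuming that convention. robots_txt_url is a hypothetical helper, and it drops any user:password part of an ftp:// URL.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(url: str) -> str:
    """Return the robots.txt URL covering `url` (same scheme, host and port)."""
    parts = urlsplit(url)
    host = parts.hostname or ""   # lowercased, without any user:password@
    if ":" in host:               # re-bracket IPv6 literals
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, "/robots.txt", "", ""))


for u in ("http://example.com/a/page.html",
          "https://example.com/a/page.html",
          "ftp://anonymous:guest@ftp.example.com/ftp2/ubuntu/pool/x.deb"):
    print(robots_txt_url(u))
```

Under this convention, the three URLs map to three separate files: http://example.com/robots.txt, https://example.com/robots.txt and ftp://ftp.example.com/robots.txt.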


From: MrGamma
Date: Fri, 09 Mar 2007 18:33:31 -0800
Local: Fri, Mar 9 2007 9:33 pm
Subject: Re: Hey Googlers: does Googlebot obey ftp://example.com/robots.txt ??
yup... looks like you can sign up for an ftp webmasters account too...

From: softplus
Date: Sat, 10 Mar 2007 09:46:37 -0000
Local: Sat, Mar 10 2007 4:46 am
Subject: Re: Hey Googlers: does Googlebot obey ftp://example.com/robots.txt ??
LOL, how about gopher: ?

Actually, looking through my AOL search database, there are only very
few "clicks" on ftp:// URLs (perhaps 15?). Could it be that there is an
FTP "service" out there that is redirecting HTTP requests to your FTP
server? Perhaps a "file-search" service that has HTTP links which are
automatically redirected to the "closest" FTP server? I imagine if
Googlebot were to stumble into a service like that, it might get
tricked into downloading via FTP (but it would be strange if they didn't
check for that and halt the crawl in those cases...hmm)

John
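If softplus's guess is right, a crawler that follows an HTTP redirect to an ftp:// URL has moved into a different robots.txt scope. It would then need to check that host's ftp:// robots.txt before fetching. Below is a sketch of detecting that case. redirect_needs_new_robots and robots_scope are hypothetical names. Default ports are not normalised, so http://h and http://h:80 count as different scopes.

```python
from typing import Optional, Tuple
from urllib.parse import urljoin, urlsplit


def robots_scope(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """The (scheme, host, port) triple a robots.txt file applies to."""
    p = urlsplit(url)
    return (p.scheme, p.hostname, p.port)


def redirect_needs_new_robots(current_url: str, location: str) -> Tuple[str, bool]:
    """Resolve a redirect's Location value against the current URL and report
    whether the target lies in a different robots.txt scope."""
    target = urljoin(current_url, location)
    return target, robots_scope(target) != robots_scope(current_url)


# Hypothetical "file-search" page redirecting to an FTP mirror.
print(redirect_needs_new_robots("http://search.example.org/get?f=x.deb",
                                "ftp://ftp.example.com/ftp2/x.deb"))
```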


From: Sebastian
Date: Sat, 10 Mar 2007 09:49:45 -0000
Local: Sat, Mar 10 2007 4:49 am
Subject: Re: Hey Googlers: does Googlebot obey ftp://example.com/robots.txt ??
Hint:

Google's mission is to organize the world's information and make it
universally accessible and useful.
http://www.google.com/corporate/

Sebastian

On Mar 10, 10:46 am, softplus wrote:


From: softplus
Date: Sat, 10 Mar 2007 10:30:08 -0000
Local: Sat, Mar 10 2007 5:30 am
Subject: Re: Hey Googlers: does Googlebot obey ftp://example.com/robots.txt ??
What about Google Code Search? They trawl FTP sites, download packages
and index the contents of the files. I don't think they have a
separate crawler user-agent... but which robots.txt would they respect?

John


From: Sebastian
Date: Sun, 11 Mar 2007 21:26:38 -0000
Local: Sun, Mar 11 2007 5:26 pm
Subject: Re: Hey Googlers: does Googlebot obey ftp://example.com/robots.txt ??
bump

http://www.google.com/search?num=100&hl=en&safe=off&q=inurl:ftp+site:...
On Mar 10, 11:30 am, softplus wrote:


From: JLH
Date: Sun, 11 Mar 2007 16:59:44 -0700
Local: Sun, Mar 11 2007 7:59 pm
Subject: Re: Hey Googlers: does Googlebot obey ftp://example.com/robots.txt ??
If you get Philipp Lenssen to ask the question, Matt will answer it on
his blog.

http://www.mattcutts.com/blog/search-results-in-search-results/#comme...

On Mar 11, 4:26 pm, Sebastian wrote:

