msnbot won't stop downloading one page (rant)

Eli the Bearded

Feb 10, 2009, 7:38:45 PM
I noticed this month that msnbot is downloading one of my pages about
5000 times a day. Just the one page is getting this insane access. So
on the 5th I tried to email Microsoft about it, using the email address
that the bot has in its headers, but wouldn't you know it? That email
address bounces.

When I go to the URL in the bot's headers, there is no other way listed
to contact the bot's owner.

I updated my site so that the one page msnbot won't leave alone sends
a 403 Forbidden to all requests from msnbot, but the bot has not slowed
down much; it now averages about 4000 requests a day. All of the IP
addresses it is coming from are real Microsoft hosts:

requests ip
13785 65.55.120.25
12598 65.55.104.35
12522 65.55.51.7
8250 65.55.51.12

That's 47155 hits to one page so far for the calendar month.
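
A per-IP count like the one above can be pulled out of a web server
access log. The post doesn't say how the numbers were produced, so what
follows is only a hypothetical Python sketch. It assumes Apache Common
Log Format lines and a made-up page path, /~eli/page.html.

```python
from collections import Counter


def hits_per_ip(log_lines, path):
    """Count requests for one URL path per client IP.

    Assumes Apache Common Log Format, e.g.
    1.2.3.4 - - [07/Feb/2009:09:08:20 -0500] "GET /x HTTP/1.0" 403 55
    Lines that don't parse are skipped.
    """
    counts = Counter()
    for line in log_lines:
        try:
            ip, rest = line.split(" ", 1)
            request = rest.split('"', 2)[1]  # e.g. 'GET /x HTTP/1.0'
            _method, req_path, _proto = request.split(" ", 2)
        except (ValueError, IndexError):
            continue  # malformed line
        if req_path == path:
            counts[ip] += 1
    return counts


def print_table(counts):
    """Print counts in the same 'requests ip' layout as the post."""
    print("requests ip")
    for ip, n in counts.most_common():
        print(f"{n:8d} {ip}")
    print(f"total: {sum(counts.values())}")


if __name__ == "__main__":
    # Hypothetical sample lines; a real run would read the access log file.
    sample = [
        '65.55.120.25 - - [07/Feb/2009:09:08:20 -0500] "GET /~eli/page.html HTTP/1.0" 403 55',
        '65.55.51.7 - - [07/Feb/2009:09:10:12 -0500] "GET /~eli/page.html HTTP/1.0" 403 55',
    ]
    print_table(hits_per_ip(sample, "/~eli/page.html"))
```

Matching on the exact request path keeps hits on other pages out of the
count. A URL with a query string would count as a different path.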

If this was my server, I'd have msnbot in robots.txt. But it's not; it's
my ISP's, and I get charged for bandwidth if I go over a certain
threshold. I was halfway to that threshold from msnbot requests when I
noticed this on the 5th. Now that I'm sending a 55-byte message with the
403 response, I'm not going to exceed my allotted traffic, but I'm not
happy.
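
One way to send a short 403 to msnbot alone is to check the User-Agent
header before serving the page. The captured headers below show the
User-Agent "msnbot/1.1 (+http://search.msn.com/msnbot.htm)". This is a
hypothetical Python WSGI sketch and not the setup actually used here.
The substring "msnbot", the 403 body text and the placeholder page are
all assumptions.

```python
BLOCKED_SUBSTRING = "msnbot"        # assumption: match on User-Agent text
FORBIDDEN_BODY = b"Forbidden.\n"    # hypothetical short 403 body
PAGE_BODY = b"<html><body>The real page.</body></html>\n"  # placeholder


def app(environ, start_response):
    """WSGI app: 403 for msnbot, the normal page for everyone else."""
    user_agent = environ.get("HTTP_USER_AGENT", "")
    if BLOCKED_SUBSTRING in user_agent.lower():
        start_response("403 Forbidden", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(FORBIDDEN_BODY))),
        ])
        return [FORBIDDEN_BODY]
    start_response("200 OK", [
        ("Content-Type", "text/html"),
        ("Content-Length", str(len(PAGE_BODY))),
    ])
    return [PAGE_BODY]


if __name__ == "__main__":
    # Call the app directly with a fake environ; no server needed.
    def show_status(status, headers):
        print(status)

    ua = "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
    b"".join(app({"HTTP_USER_AGENT": ua}, show_status))
    b"".join(app({"HTTP_USER_AGENT": "Mozilla/5.0"}, show_status))
```

Python's standard library can serve this kind of app as a CGI script
with wsgiref.handlers.CGIHandler().run(app), if the host allows CGI. As
the post shows, a 403 only makes each response smaller. It does not stop
the requests.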

So I post this, not expecting it will get anything fixed, but to heap
infamy on this stupid bot.

CAPTURED: Sat, 07 Feb 2009 09:08:20 -0500
SERVER_PROTOCOL: HTTP/1.0
REMOTE_ADDR: 65.55.209.48
Accept: text/html, text/plain, text/xml, application/*, Model/vnd.dwf, drawing/x-dwf
Host: www.panix.com
Accept-Encoding: gzip, deflate
From: msnbot(at)microsoft.com
User-Agent: msnbot/1.1 (+http://search.msn.com/msnbot.htm)
Cache-Control: max-age=0
Connection: keep-alive

Elijah
------
is inclined to include profanity in the 403 response

D. Stussy

Feb 10, 2009, 11:59:29 PM
"Eli the Bearded" <*@eli.users.panix.com> wrote in message
news:eli$09021...@qz.little-neck.ny.us...

> I noticed this month that msnbot is downloading one of my pages about
> 5000 times a day....

>
> If this was my server, I'd have msnbot in robots.txt. But it's not; it's
> my ISP's, and I get charged for bandwidth if I go over a certain
> threshold. I was halfway to that threshold from msnbot requests when I
> noticed this on the 5th. Now that I'm sending a 55-byte message with the
> 403 response, I'm not going to exceed my allotted traffic, but I'm not
> happy.

Ask your service provider to ban them at the firewall level.


Eli the Bearded

Feb 11, 2009, 7:12:54 PM
In comp.infosystems.www.misc,
D. Stussy <rep...@newsgroups.kd6lvw.ampr.org> wrote:

> "Eli the Bearded" <*@eli.users.panix.com> wrote:
> > If this was my server, I'd have msnbot in robots.txt. But it's not; it's
> > my ISP's, and I get charged for bandwidth if I go over a certain
> > threshold. I was halfway to that threshold from msnbot requests when I
> > noticed this on the 5th. Now that I'm sending a 55-byte message with the
> > 403 response, I'm not going to exceed my allotted traffic, but I'm not
> > happy.
> Ask your service provider to ban them at the firewall level.

This is not likely to fly. I am using one of those antiquated things,
a Unix shell account, for my web hosting. The server gets used by many
other web users. They aren't going to make special rules in the firewall
(or apache) config for me.

Elijah
------
now on twelve years at the same provider

D. Stussy

Feb 11, 2009, 8:58:42 PM
"Eli the Bearded" <*@eli.users.panix.com> wrote in message
news:eli$09021...@qz.little-neck.ny.us...

Have you asked them? You might not be the only one with the problem.


Eli the Bearded

Feb 12, 2009, 2:58:31 AM
In comp.infosystems.www.misc,
D. Stussy <rep...@newsgroups.kd6lvw.ampr.org> wrote:
> Have you asked them? You might not be the only one with the problem.

It is only one page on one of the two accounts I have here. I've posted
about it to the internal web newsgroup and while people have commented,
no one else has observed this.

Elijah
------
now thinking of a shared robots.txt solution

D. Stussy

Feb 12, 2009, 3:31:36 PM
"Eli the Bearded" <*@eli.users.panix.com> wrote in message
news:eli$09021...@qz.little-neck.ny.us...
> In comp.infosystems.www.misc,
> D. Stussy <rep...@newsgroups.kd6lvw.ampr.org> wrote:
> > Have you asked them? You might not be the only one with the problem.
>
> It is only one page on one of the two accounts I have here. I've posted
> about it to the internal web newsgroup and while people have commented,
> no one else has observed this.

No one else who READS the internal group has observed it....
