
Repeated inbound SMTP failure (timeout) from specific domains


Dane

Dec 15, 2004, 2:56:43 AM
I've been trying to troubleshoot a very strange mail issue on an SBS2003
(Exchange 2003) server for about 6 weeks now and am desperately looking for
help.

Here is a sample of what I'm working with (failed inbound SMTP session):

date time c-ip cs-username cs-method cs-uri-query sc-status sc-win32-status sc-bytes cs-bytes time-taken
2004-12-14 16:09:42 65.54.187.77 hotmail.com EHLO +hotmail.com 250 0 171 16 0
2004-12-14 16:09:42 65.54.187.77 hotmail.com MAIL +FROM:<john...@hotmail.com> 250 0 46 33 0
2004-12-14 16:09:42 65.54.187.77 hotmail.com RCPT +TO:<Di...@recipientdomain.com> 250 0 0 25 93
2004-12-14 16:20:10 65.54.187.77 hotmail.com TIMEOUT hotmail.com 121 1304114753 84 4 627797
2004-12-14 16:20:10 65.54.187.77 hotmail.com QUIT hotmail.com 240 628063 84 4 627797

A small formatted Excel spreadsheet with more logs of both failed and
successful sessions can be quickly opened at:

http://pics.virtuality.org/linkto/email_troubles.xls

An Exchange server I manage has been processing inbound SMTP connections
that result in a 121 TIMEOUT. The messages never get delivered locally and
an NDR is not sent from my Exchange server, though the originating server
usually kicks back a failure notification to the sender after the retry
period expires.

Of the thousands of messages that the server processes daily, a fairly
steady group of sender domains consistently (but not always) have trouble
delivering email to my server while most messages come through fine. A log
review shows that the failed connections get "stuck" after my server
receives the RCPT command.

The sessions on my side look like:

EHLO remotemailserver.com
MAIL ...
RCPT ...
(10 minute wait)
TIMEOUT ...
QUIT ...
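
For anyone who wants to scan a large log for this RCPT-then-TIMEOUT pattern, here is a minimal Python sketch. It assumes each log record sits on a single line with the eleven fields from the header in the sample above. The function names parse_log and stalled_sessions are made up for illustration and are not part of any Exchange tool.

```python
from datetime import datetime

# Field order taken from the header line of the posted log sample.
FIELDS = ("date time c-ip cs-username cs-method cs-uri-query "
          "sc-status sc-win32-status sc-bytes cs-bytes time-taken").split()

# The failed session from the post, one record per line.
SAMPLE = """\
2004-12-14 16:09:42 65.54.187.77 hotmail.com EHLO +hotmail.com 250 0 171 16 0
2004-12-14 16:09:42 65.54.187.77 hotmail.com MAIL +FROM:<john...@hotmail.com> 250 0 46 33 0
2004-12-14 16:09:42 65.54.187.77 hotmail.com RCPT +TO:<Di...@recipientdomain.com> 250 0 0 25 93
2004-12-14 16:20:10 65.54.187.77 hotmail.com TIMEOUT hotmail.com 121 1304114753 84 4 627797
2004-12-14 16:20:10 65.54.187.77 hotmail.com QUIT hotmail.com 240 628063 84 4 627797
"""


def parse_log(text):
    """Turn log text into a list of dicts, skipping lines that are not records."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue
        row = dict(zip(FIELDS, parts))
        try:
            row["when"] = datetime.strptime(
                row["date"] + " " + row["time"], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # header line or other non-record text
        rows.append(row)
    return rows


def stalled_sessions(rows):
    """Return (client IP, seconds waited) for each TIMEOUT that directly follows RCPT."""
    previous = {}  # client IP -> last record seen from that client
    stalled = []
    for row in rows:
        prev = previous.get(row["c-ip"])
        if prev and prev["cs-method"] == "RCPT" and row["cs-method"] == "TIMEOUT":
            waited = (row["when"] - prev["when"]).total_seconds()
            stalled.append((row["c-ip"], waited))
        previous[row["c-ip"]] = row
    return stalled


print(stalled_sessions(parse_log(SAMPLE)))
# -> [('65.54.187.77', 628.0)]
```

The 628 seconds between RCPT and TIMEOUT agrees with the time-taken value of 627797 on the TIMEOUT line, if that field is in milliseconds.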

At first I assumed it was a firewall issue on my side that was blocking the
BDAT verb which would normally come after RCPT when advertising ESMTP. I
removed the CHUNKING advertisement to prevent binary data formats from being
used for inbound SMTP and forced HELO for outbound SMTP, but the problem
persisted.
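
To double-check that CHUNKING (the extension that enables BDAT) really is no longer advertised, the EHLO reply can be inspected. The sketch below only parses reply text. The sample reply is a hypothetical one written for illustration, not captured from the server in this thread, and ehlo_extensions is an invented helper name.

```python
# Hypothetical EHLO reply, for illustration only.
SAMPLE_REPLY = """\
250-mail.example.com Hello [192.0.2.10]
250-SIZE 10485760
250-PIPELINING
250-8BITMIME
250-BINARYMIME
250-CHUNKING
250 OK
"""


def ehlo_extensions(reply):
    """Return the upper-cased keywords from a multi-line 250 EHLO reply."""
    keywords = set()
    lines = reply.splitlines()
    for line in lines[1:]:          # the first line is the greeting
        if not line.startswith("250"):
            continue
        rest = line[4:].strip()     # drop the "250-" or "250 " prefix
        if rest:
            keywords.add(rest.split()[0].upper())
    return keywords


print("CHUNKING" in ehlo_extensions(SAMPLE_REPLY))
```

Against a live server, Python's smtplib offers the same check: call ehlo() on an smtplib.SMTP connection and then has_extn("chunking").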

I then decided to replace the consumer model SMC firewall with a Cisco 2651
router with the firewall feature set (NBAR and CBAC). Unfortunately the
firewall upgrade didn't change or improve the mail symptoms one bit, so I
can't imagine it's still a firewall issue at this point.

One consistent anomaly in the logs is the sc-win32-status value on the
connections that time out, though I don't know what the value means. The
TIMEOUT line for a failed connection has an sc-win32-status value that is a
very large number, such as 2175011793 (more examples in the .xls file linked
above).

So far I've seen failed inbound sessions from 6 legitimate businesses that
communicate with users on my Exchange server, and occasionally from domains
like hotmail.com, ebay.com and other very large mail domains, but never any
UCE or junk mail sessions. I've attempted to recreate the problem using my
own email accounts, both with and without attachments, but have not been
able to recreate the problem.

The failed messages never successfully get delivered after retries... they
permanently fail.

I've searched both the web and newsgroups and found similar symptoms from
folks going back to 2002 and using both Exch2k and 2k3, but never any
solutions that were documented.

Any suggestions would be greatly appreciated. I'm going nuts trying to
figure this out!


parc...@gmail.com

Jan 28, 2005, 4:54:06 PM
I was having the same problem, also with a small business server.
However, I don't think the server was the problem.

After spending about 3 days on the phone with Microsoft Support and
analyzing many SMTP conversations at packet level, *some* of the packets
from the servers causing the problem sessions seemed to become malformed
before reaching our server.

We did many tests with the MTU size, trying to get it as high as
possible, without any success. Then I called Linksys support, and they
suggested setting the MTU to the lowest possible value (576) for
troubleshooting purposes. Amazingly, I started to receive all the test
messages. I then slowly raised the MTU and am now receiving email from
hotmail at an MTU of 1400.

My best guess is that some router on the backbone of the Internet is not
configured properly and is messing with the packets that pass through it
(which would explain why only *some* of the mail servers have problems).
Setting the MTU lower on our router/firewall makes the mail servers
negotiate a smaller frame size, which doesn't end up being corrupted
somewhere along the way.
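
As a rough illustration of what "negotiating a smaller frame size" means in numbers: each end of a TCP connection announces a maximum segment size (MSS) derived from its MTU. The sketch below assumes plain IPv4 and TCP headers of 20 bytes each with no options. The MTU values are the ones mentioned in this thread plus the usual Ethernet 1500.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tcp_mss(mtu):
    """Largest TCP payload that fits in one unfragmented packet at this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


for mtu in (576, 1400, 1492, 1500):
    print(mtu, tcp_mss(mtu))
# 576 536
# 1400 1360
# 1492 1452
# 1500 1460
```

So at an MTU of 1400 the remote mail server should send at most 1360 bytes of SMTP data per packet, instead of 1460 at the Ethernet default.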

I hope this helps.
--Johan

Carol Chisholm

Feb 1, 2005, 1:50:46 PM
Here is my latest update on my particular case.

I had changed the MTU down to 1400 but not lower...

This concerns a new company where I set up a new Exchange 2003 server
.....and e-mail started to arrive...

Not all mail arrived, and after a considerable amount of reading
logfiles I found that in fact mail from some mail hosts did not
arrive.

One ISP affected is bezquint.com. They have two blocks of servers, one
of which we can receive from and one of which we can not receive from.

Another affected ISP is VTX, and where I do have some cooperative
contacts to help with testing. There are many more.

When I called these ISPs they all obligingly checked my DNS, reverse
DNS, telnetted into my server and sent me a message saying all was
well.

I tested further with VTX and we found that when mail was sent
"manually" with telnet, ehlo and so on, it was transferred, and when
it was sent "batch" through the mail system, it sat in the queue at
VTX until it timed out. No NDR, no nothing at my end.
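
One way to narrow down the telnet-versus-batch difference is to send test messages of increasing size from an outside host. This rests on an assumption that is not confirmed anywhere in this thread: that a hand-typed telnet session sends one short line per packet, while a mail system sends the message body in full-size packets. The Python sketch below uses only the standard smtplib module; probe_body and send_probe are hypothetical helper names, and the host and addresses in the usage note are placeholders.

```python
import smtplib


def probe_body(size):
    """Return exactly `size` characters of filler text in lines of at most 71 characters."""
    line = "x" * 71 + "\n"
    repeats = size // len(line) + 1
    return (line * repeats)[:size]


def send_probe(host, sender, recipient, size, timeout=60):
    """Send one test message whose body is `size` characters long.

    Raises OSError or smtplib.SMTPException if the session fails or stalls
    for longer than `timeout` seconds.
    """
    message = (
        f"From: {sender}\n"
        f"To: {recipient}\n"
        f"Subject: size probe {size}\n"
        "\n" + probe_body(size)
    )
    with smtplib.SMTP(host, 25, timeout=timeout) as server:
        server.sendmail(sender, [recipient], message)
```

For example, calling send_probe("mail.example.com", "probe@example.net", "postmaster@example.com", size) for sizes such as 200, 1000, 1400, 1500 and 3000 would show whether small messages arrive while larger ones stall. Since the failures here depend on the sending host, a probe from an unaffected network may well not reproduce them.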

At my end I see a connection, an EHLO, and no DATA or BDAT or
whatever... Here is a snip from a logfile. I had the timeout set to 20
minutes at this stage so there is no immediate timeout, but it does
come later.

>>2004-11-29 13:24:53 212.37.192.53 smtp1.internet-fr.net SMTPSVC1 NBINEUS001 10.1.1.1 0 EHLO - +smtp1.internet-fr.net 250 0 214 26 0 SMTP - - - -
>>2004-11-29 13:24:53 212.37.192.53 smtp1.internet-fr.net SMTPSVC1 NBINEUS001 10.1.1.1 0 MAIL - +FROM 250 0 54 51 31 SMTP - - - -
>>2004-11-29 13:24:53 212.37.192.53 smtp1.internet-fr.net SMTPSVC1 NBINEUS001 10.1.1.1 0 RCPT - +TO: 250 0 27 24 0 SMTP -

Anyway, having identified the problem, I changed the firewall, changed
the ADSL router, and built the second server as an Exchange server (it was
supposed to be the terminal server for Access Accounts). I reduced all
the MTUs (NIC, firewall, ADSL modems & routers) to 1400, 1464, 1492. I
have no blacklists, and I uninstalled the anti-virus software. Finally, in
desperation, I took the second server to another building, with one of
the firewalls I had been testing with, and plugged it into a cable
modem rather than an ADSL modem. And all the mail arrived!

So now I have one server offsite, two firewalls, a firewall-firewall
VPN and two Exchange servers.

Today when you called I was doing more testing with VTX. I built a
third server on some very old hardware I had lying around, and took it
into the office. It has the same software as the other two (which do
not get mail over the ADSL connection in the office).

I set up a test domain (link216.ch) and had VTX send me mail. When
they sent to my old hardware, mail came through instantaneously. When
I changed the route to send incoming SMTP to the new Proliant server
(same modem, same firewall, same switch, same LAN), mail did not
arrive (well, some mail did, but not from certain hosts). As before,
it sat in the queue at VTX until it timed out.

Of course I need to bring the server back so I can set up a terminal
server. I'd also like to elucidate the problem, of course.

My conclusion at this stage is that it has to be a network setting.
The new servers are off-the-shelf Proliants with 10/100/1000 NICs.
(NC7760 and NC7761 if I remember correctly). My old machine has
probably got a 3C905 or 3C590 or some such in it.

At this stage I'm wondering whether to add a very standard NIC to the
new server and try again.

Now I've read Johan's post I'll reduce the MTU size further as well.

Carol Chisholm

Carol Chisholm

Feb 5, 2005, 3:32:28 AM
I have found that changing the NIC in the Exchange server from a
gigabit one to a very standard 10/100 can also resolve this problem.

This is not actually a solution, because the problem is domain
dependent, but it is a relatively easy workaround, which I imagine is
effective because the less sophisticated, older NIC will make smaller
packets, and the drivers for a cheap NIC are less likely to use any
sophisticated techniques which might get lost in misconfigured
routers.

There are also the Extended DNS issues mentioned in MS article 828731
when you have Exchange 2003 SP1.

Carol

Donald@discussions.microsoft.com Heath Donald

Feb 28, 2005, 7:45:02 PM
Hi,

I have been messing around with this exact problem. I resolved it by
uninstalling the AV. It seems that the new versions of Symantec AV have a
native SMTP/POP3 proxy "service" that intercepts ALL SMTP/POP3 traffic,
even from the Exchange SMTP virtual server!!! It's designed for clients, not
Exchange servers. If you uninstall this part of Symantec AV and reboot
the server, mail starts to flow.

I really thought this was an MTU issue, but it isn't!!

Well, so much for that wasted 48 hours of my life.

http://service1.symantec.com/SUPPORT/ent-security.nsf/docid/2004052415562048?Open&sr

Carol Chisholm

Mar 20, 2005, 6:03:40 AM
I am not running Symantec anti-virus. I have tried uninstalling my AV
products and this has no effect.

Hartono

Mar 20, 2005, 8:21:00 PM
I also had the same problem. The good thing is that I could identify the
affected domain, which is hotmail.

I have tried fixing DNS (reverse lookup) and SMTP (smarthost), but no luck.

Any help would be much appreciated.

Regards,

Hartono
Consultant

