How to Trace "TCP Receive Error"

Barry Finkel

Jan 6, 2008, 11:05:58 AM

I am seeing lots of messages like this one from BIND-9.4.1-P1:

[ID 873579 daemon.info] dispatch b090ef8:
shutting down due to TCP receive error: 69.59.189.68#53:
connection reset
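Before tracing, it can help to see how the 300 or so messages are spread across remote servers. A minimal Python sketch, assuming each message sits on a single line in /var/adm/messages (the wrapping above comes from the mail client) and that remote addresses are IPv4; the function name is hypothetical:

```python
import re
from collections import Counter

# Matches the BIND 9 dispatch message quoted above once it is on one line.
DISPATCH_RE = re.compile(
    r"dispatch (?P<dispatch>[0-9a-f]+): "
    r"shutting down due to TCP receive error: "
    r"(?P<addr>[0-9.]+)#(?P<port>\d+):"
)


def tcp_receive_errors(lines):
    """Count 'TCP receive error' messages per remote (address, port)."""
    counts = Counter()
    for line in lines:
        m = DISPATCH_RE.search(line)
        if m:
            counts[(m.group("addr"), int(m.group("port")))] += 1
    return counts
```

For example, `with open("/var/adm/messages", errors="replace") as fh: print(tcp_receive_errors(fh).most_common(10))` lists the busiest remote servers first.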

I tried a Solaris snoop trace of all traffic between the DNS server
(which has three IP addresses) and the IP address in the message:

snoop -v -s3000 -o /tmp/snoop.trace 69.59.189.68

but I did not get any packets captured. I ran the trace for one hour,
and after not capturing anything, I looked in /var/adm/messages.
There were about 300 such messages logged. What snoop trace parameters
do I have to specify to trace this activity? I am assuming (maybe
incorrectly) that snoop is tracing activity on all three IP addresses.
I have BIND query logging on, and I do not see this address in the
query.log file. Thanks.
----------------------------------------------------------------------
Barry S. Finkel
Computing and Information Systems Division
Argonne National Laboratory Phone: +1 (630) 252-7277
9700 South Cass Avenue Facsimile:+1 (630) 252-4601
Building 222, Room D209 Internet: BSFi...@anl.gov
Argonne, IL 60439-4828 IBMMAIL: I1004994

Dave Knight

Jan 6, 2008, 11:48:13 AM

On 6-Jan-08, at 11:05 AM, Barry Finkel wrote:

> I am seeing lots of messages like this one from BIND-9.4.1-P1:
>
> [ID 873579 daemon.info] dispatch b090ef8:
> shutting down due to TCP receive error: 69.59.189.68#53:
> connection reset
>
> I tried a Solaris snoop trace of all traffic between the DNS server
> (which has three IP addresses) to the IP address in the message:
>
> snoop -v -s3000 -o /tmp/snoop.trace 69.59.189.68

Snoop listens on the first non-loopback interface it finds; I
would guess that in this case it has picked the wrong one.

You can list the available interfaces with:

netstat -i

Then instruct snoop to listen on the correct one with:

-d <interface>

Furthermore, snoop captures whole packets by default, so setting the
snaplen with -s is probably redundant. If for some reason it is
required, you shouldn't need to set it higher than the MTU of the
interface on which you are capturing traffic, which you'll see in the
output of the netstat command above.
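Picking the candidate interfaces out of `netstat -i` output can be scripted. This sketch assumes the Solaris layout shown later in this thread (a header line containing `Name` and `Mtu`) and stops at the first blank line, since netstat may print a separate IPv6 table after it. It returns each non-loopback interface with its MTU, the value suggested above as an upper bound for `-s`; the function name is hypothetical:

```python
def capture_candidates(netstat_text):
    """Return (interface, mtu) pairs for non-loopback interfaces in `netstat -i` output."""
    lines = netstat_text.strip().splitlines()
    header = lines[0].split()
    name_col, mtu_col = header.index("Name"), header.index("Mtu")
    candidates = []
    for line in lines[1:]:
        if not line.strip():
            break  # end of the IPv4 table; an IPv6 table may follow
        fields = line.split()
        if len(fields) <= mtu_col:
            continue  # truncated line
        name = fields[name_col]
        if name.startswith("lo"):
            continue  # loopback, which snoop also skips when choosing a default
        candidates.append((name, int(fields[mtu_col])))
    return candidates
```

Passing it `subprocess.run(["netstat", "-i"], capture_output=True, text=True).stdout` on the DNS server would list every interface worth a `snoop -d`.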

Mark Andrews

Jan 6, 2008, 6:21:40 PM

> I am seeing lots of messages like this one from BIND-9.4.1-P1:
>
> [ID 873579 daemon.info] dispatch b090ef8:
> shutting down due to TCP receive error: 69.59.189.68#53:
> connection reset
>
> I tried a Solaris snoop trace of all traffic between the DNS server
> (which has three IP addresses) to the IP address in the message:
>
> snoop -v -s3000 -o /tmp/snoop.trace 69.59.189.68
>
> but I did not get any packets captured. I ran the trace for one hour,
> and after not capturing anything, I looked in /var/adm/messages.
> There were about 300 such messages logged. What snoop trace parameters
> do I have to specify to trace this activity? I am assuming (maybe
> incorrectly) that snoop is tracing activity on all three IP addresses.
> I have BIND query logging on, and I do not see this address in the
> query.log file. Thanks.

I suspect the nameserver has some sort of filtering box in
front of it that is attempting to determine whether the client
is real or spoofed. A "real" client will retry over TCP on
seeing "tc" (although this is not strictly true of UDP-only
clients/stacks). This then turns just about all the UDP
queries into TCP queries. If the nameserver behind the box gets
overwhelmed with TCP connections, it will start sending out
RSTs. Self-inflicted TCP SYN DoS. There is a reason DNS
uses UDP in the first place.
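The `tc` flag is a single bit in the DNS header, and a client that honours it repeats the question over TCP, which is what dig's `+vc` forces in the trace below. A minimal sketch of building a query like `dig any . +norec` and decoding the flags of a reply, using only the header layout in RFC 1035 section 4.1.1; the function names are hypothetical:

```python
import struct

# Flag bits in the second 16-bit word of the DNS header (RFC 1035, 4.1.1).
QR, AA, TC, RD, RA = 0x8000, 0x0400, 0x0200, 0x0100, 0x0080


def build_query(qid, qname, qtype, recursion_desired=False):
    """Minimal DNS query: header plus one class-IN question."""
    flags = RD if recursion_desired else 0
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 0)
    labels = [label for label in qname.rstrip(".").split(".") if label]
    wire_name = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return header + wire_name + b"\x00" + struct.pack("!HH", qtype, 1)


def decode_flags(message):
    """Return (id, names of the flags that are set, rcode) from a DNS header."""
    qid, flags = struct.unpack("!HH", message[:4])
    bits = (("qr", QR), ("aa", AA), ("tc", TC), ("rd", RD), ("ra", RA))
    return qid, [name for name, bit in bits if flags & bit], flags & 0x000F
```

Given the header of the first reply below, `decode_flags` would report `qr`, `aa` and `tc` with rcode 0 (NOERROR), the same thing dig prints on its `flags:` line.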

This sort of "solution" does not scale.

From the trace below, the filtering box is keeping state,
because subsequent UDP queries get through. This doesn't
help much, as many clients only ask a single question of a
nameserver: A and AAAA queries often go to different
nameservers. If the filtering boxes were to share state,
there would be fewer problems.
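The point about state can be made concrete with a toy model. Everything here is an assumption about how such a box might behave (nothing in the thread identifies the product): UDP from an unknown source gets a TC reply, and a source is remembered once it has completed a TCP query. The client honours TC by retrying over TCP, and all names are hypothetical:

```python
class SpoofFilter:
    """Hypothetical filtering box: UDP from unverified sources gets a TC reply."""

    def __init__(self, verified=None):
        # Passing the same set to two boxes models boxes that share state.
        self.verified = verified if verified is not None else set()

    def udp_query(self, src):
        return "pass" if src in self.verified else "tc"

    def tcp_query(self, src):
        self.verified.add(src)  # a completed TCP query marks the source as real
        return "pass"


def resolve(box, src):
    """A TC-honouring client: return the transports it needed for one question."""
    if box.udp_query(src) == "pass":
        return ["udp"]
    box.tcp_query(src)
    return ["udp", "tcp"]
```

A resolver that sends its A query to one box and its AAAA query to another pays the TCP cost on both; constructing both boxes with the same set (`SpoofFilter(shared)`) lets the second question through on UDP, which is the shared-state case.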

Mark

farside.isc.org:marka {1} % dig any . +norec @69.59.189.68 +ignore

; <<>> DiG 9.3.3 <<>> any . +norec @69.59.189.68 +ignore
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40882
;; flags: qr aa tc; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;. IN ANY

;; Query time: 4 msec
;; SERVER: 69.59.189.68#53(69.59.189.68)
;; WHEN: Sun Jan 6 22:57:56 2008
;; MSG SIZE rcvd: 17

farside.isc.org:marka {2} % dig any . +norec @69.59.189.68 +ignore

; <<>> DiG 9.3.3 <<>> any . +norec @69.59.189.68 +ignore
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58042
;; flags: qr aa tc; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;. IN ANY

;; Query time: 3 msec
;; SERVER: 69.59.189.68#53(69.59.189.68)
;; WHEN: Sun Jan 6 22:58:09 2008
;; MSG SIZE rcvd: 17

farside.isc.org:marka {3} % dig any . +norec @69.59.189.68 +ignore +vc

; <<>> DiG 9.3.3 <<>> any . +norec @69.59.189.68 +ignore +vc
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 40817
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;. IN ANY

;; Query time: 3 msec
;; SERVER: 69.59.189.68#53(69.59.189.68)
;; WHEN: Sun Jan 6 22:58:22 2008
;; MSG SIZE rcvd: 17

farside.isc.org:marka {4} % dig any . +norec @69.59.189.68 +ignore

; <<>> DiG 9.3.3 <<>> any . +norec @69.59.189.68 +ignore
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 10118
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;. IN ANY

;; Query time: 5 msec
;; SERVER: 69.59.189.68#53(69.59.189.68)
;; WHEN: Sun Jan 6 22:58:25 2008
;; MSG SIZE rcvd: 17

farside.isc.org:marka {5} %
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: Mark_A...@isc.org

Mark Andrews

Jan 6, 2008, 6:43:10 PM

> I suspect the nameserver has some sort of filtering box in
> front of it that is attempting to determine if the client
> is real or spoofed. A "real" client will try TCP on seeing
> "tc" even if this is not strictly true for UDP only
> client/stacks. This then turns just about all the UDP
> queries into TCP queries. If the nameserver behind gets
> overwhelmed with TCP connections it will start sending out
> RST. Self inflicted TCP SYN DoS. There is a reason DNS
> uses UDP in the first place.

Note: named supports "dataready" as an accept filter and can
tune the listen queue depth via named.conf. Both of these
can help minimise the impact on the server of putting such a
filtering box in front of it.

Also, tuning the DNS TTL to be less than the TTL of the filtering
box's state table may help if the state table's TTL gets
reset on UDP queries. If the DNS TTL is bigger, almost
all transactions will trigger the TC response.
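The TTL interaction can be checked with a small simulation. It assumes (hypothetically) that a resolver re-asks a name exactly when its cached answer expires, that the filter forgets a source a fixed time after its last query, and that every passing query resets that timer, as the "reset on UDP queries" case supposes. Real resolvers ask about many names, so this shows the trend for a single name only:

```python
def tc_responses(dns_ttl, state_ttl, horizon):
    """Count TC replies one resolver sees over `horizon` seconds (toy model).

    The resolver queries at t = 0, dns_ttl, 2*dns_ttl, ...; the filter's
    state for it expires state_ttl seconds after the last query it saw.
    """
    last_seen = None
    tc = 0
    for t in range(0, horizon, dns_ttl):
        if last_seen is None or t - last_seen >= state_ttl:
            tc += 1  # no live state: TC reply, client retries over TCP
        last_seen = t  # the passing UDP query or the TCP retry refreshes state
    return tc
```

With these assumptions `tc_responses(300, 600, 3600)` is 1 (only the first query is truncated), while `tc_responses(3600, 600, 86400)` is 24, one TC reply for every refresh.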

Mark

Barry Finkel

Jan 6, 2008, 9:58:19 PM

On 6-Jan-08, at 11:05 AM, Barry Finkel wrote:

>> I am seeing lots of messages like this one from BIND-9.4.1-P1:
>>
>> [ID 873579 daemon.info] dispatch b090ef8:
>> shutting down due to TCP receive error: 69.59.189.68#53:
>> connection reset
>>
>> I tried a Solaris snoop trace of all traffic between the DNS server
>> (which has three IP addresses) to the IP address in the message:
>>
>> snoop -v -s3000 -o /tmp/snoop.trace 69.59.189.68
>>
>> but I did not get any packets captured. I ran the trace for one hour,
>> and after not capturing anything, I looked in /var/adm/messages.
>> There were about 300 such messages logged. What snoop trace parameters
>> do I have to specify to trace this activity? I am assuming (maybe
>> incorrectly) that snoop is tracing activity on all three IP addresses.
>> I have BIND query logging on, and I do not see this address in the
>> query.log file. Thanks.

and Dave Knight <da...@knig.ht> replied:

>Snoop will listen to the first non-loopback interface it finds, I
>would guess in this case it has picked the wrong one.
>
>You can list the available interfaces with:
>
> netstat -i
>
>Then instruct snoop to listen on the correct one with:
>
> -d <interface>

I do not understand your reply. The DNS server has three IP addresses,
and ALL THREE are advertised and in use. So, there is no "correct" one.

oberon% netstat -i
Name  Mtu   Net/Dest           Address       Ipkts      Ierrs  Opkts      Oerrs  Collis  Queue
lo0   8232  loopback           localhost     465553     0      465553     0      0       0
bge0  1500  oberon.it.anl.gov  oberon        5358043    0      1668993    0      0       0
bge1  1500  dns2.anl.gov       dns2.anl.gov  340299637  0      154842     0      0       0
bge2  1500  dns2.anl.gov       dns2.anl.gov  286178523  0      689428381  0      0       0

oberon%

and I have no idea what interface is being used for these queries.
The DNS server is an internal server for our anl.gov clients. It
is inaccessible for internet queries (but it will accept response
packets), so the queries that are triggering these messages must be
from one or more internal machines here.

On the DNS server I did an "rndc dumpdb", and these records appear in
the database dump:

; glue
support-intelligence.NET. 134497 NS dns-eu1.powerdns.net.
134497 NS dns-eu2.powerdns.net.
; authauthority
a.support-intelligence.NET. 1775 \-AAAA ;-$NXRRSET
; glue
1891 A 69.59.189.68
; authauthority
b.support-intelligence.NET. 1775 \-AAAA ;-$NXRRSET
; glue
1891 A 69.59.189.68
; glue
dob.sibl.support-intelligence.NET. 1891 NS a.support-intelligence.net.
1891 NS b.support-intelligence.net.
; glue

;
; Unassociated entries
;
; 69.59.189.68 [srtt 374780] [flags 00000000] [ttl 1773]

I assume that the comment lines come before the data line(s).
The queries seem to be associated somehow with the domain

support-intelligence.net
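That reading of the dump matches the master-file convention (RFC 1035, section 5.1): comment lines annotate what follows, and a record line that begins with whitespace reuses the previous owner name. A sketch that applies the rule to find which names carry an A record for a given address; it assumes the continuation lines were indented in the real dump (the indentation is lost in the post above), and the function name is hypothetical:

```python
def owners_of_address(dump_lines, address):
    """Owner names with an A record for `address` in a master-format cache dump."""
    owners = []
    owner = None
    for line in dump_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.lstrip().startswith(";"):
            continue  # blank lines and comments such as "; glue" hold no records
        fields = line.split()
        if not line[0].isspace():
            owner, fields = fields[0], fields[1:]  # a new owner name
        # What remains is: [TTL] [class] type rdata...
        if fields and fields[0].isdigit():
            fields = fields[1:]
        if fields and fields[0] in ("IN", "CH", "HS"):
            fields = fields[1:]
        if len(fields) >= 2 and fields[0] == "A" and fields[1] == address and owner not in owners:
            owners.append(owner)
    return owners
```

Run over the dump file (named_dump.db by default), the excerpt above would yield a.support-intelligence.NET. and b.support-intelligence.NET. for 69.59.189.68.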

A check of our BIND query log shows lots of queries from one of our
mail machines; here is one query.

06-Jan-2008 17:38:01.101 queries: info:
client 146.137.96.51#41548: query:
achilles.ctd.anl.gov.dob.sibl.support-intelligence.net IN A +
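Counting which clients look up names under the suspect domain narrows the search to particular hosts. A sketch, assuming the BIND 9.4 query-log layout of the entry above but on a single line (the mailer wrapped it) and IPv4 clients; the function name is hypothetical:

```python
import re
from collections import Counter

# One query-log entry: "... client 1.2.3.4#port: query: NAME IN TYPE ..."
QUERY_RE = re.compile(r"client (?P<ip>[0-9.]+)#\d+: query: (?P<qname>\S+) IN (?P<qtype>\S+)")


def clients_querying(lines, zone):
    """Count queries per client for names at or below `zone`."""
    zone = zone.lower().rstrip(".")
    counts = Counter()
    for line in lines:
        m = QUERY_RE.search(line)
        if not m:
            continue
        qname = m.group("qname").lower().rstrip(".")
        # Match whole labels only, so "intelligence.net" does not match this zone.
        if qname == zone or qname.endswith("." + zone):
            counts[m.group("ip")] += 1
    return counts
```

For example, `with open("query.log") as fh: print(clients_querying(fh, "support-intelligence.net").most_common(5))`.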

I do not have access to that mail machine, so I am copying the
administrators of that machine, who might be able to tell me why these
queries are happening.

Mark Andrews

Jan 6, 2008, 10:18:46 PM

The correct interface is the one the kernel will select to send
packets to 69.59.189.68.
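One way to ask the kernel that question without sending anything is a connected UDP socket: connect() on a datagram socket only selects a route and a source address, which getsockname() then reports. A sketch, assuming an IPv4 destination and that named's outgoing connections are not pinned to another address (for example by a query-source setting). The address still has to be matched to an interface with `ifconfig -a` or `netstat -rn`, and since bge1 and bge2 both carry dns2.anl.gov, it may not settle the question on its own:

```python
import socket


def source_address_for(dest, port=53):
    """Local IPv4 address the kernel would use to reach `dest`; no packet is sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((dest, port))  # UDP connect: route and source selection only
        return s.getsockname()[0]
```

On the DNS server, `source_address_for("69.59.189.68")` would print the address to look for in the interface list.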

> oberon% netstat -i
> Name  Mtu   Net/Dest           Address       Ipkts      Ierrs  Opkts      Oerrs  Collis  Queue
> lo0   8232  loopback           localhost     465553     0      465553     0      0       0
> bge0  1500  oberon.it.anl.gov  oberon        5358043    0      1668993    0      0       0
> bge1  1500  dns2.anl.gov       dns2.anl.gov  340299637  0      154842     0      0       0
> bge2  1500  dns2.anl.gov       dns2.anl.gov  286178523  0      689428381  0      0       0
>
> oberon%
>
> and I have no idea what interface is being used for these queries.

Examine the routing tables or just run snoop on all three
interfaces.

It looks like dob.sibl.support-intelligence.net is in a search
list and the application is not RFC 1535 compliant.
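The search-list hypothesis can be pictured with a model of how a stub resolver orders its candidate names. This sketch assumes resolv.conf-style `ndots` behaviour and a hypothetical search list that contains the RBL zone; under it, a name with enough dots is tried as-is first, so the appended form in the log would only appear after the plain name had failed. (SM's reply later in the thread identifies the query as a blacklist lookup instead.)

```python
def candidate_names(name, search, ndots=1):
    """Order in which a resolv.conf-style stub tries names (hypothetical model).

    A name ending in a dot is absolute and never gets a suffix. A name with
    at least `ndots` dots is tried as-is first, then with each search suffix;
    shorter names get the suffixes first.
    """
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{suffix.rstrip('.')}." for suffix in search]
    as_is = [name + "."]
    return as_is + expanded if name.count(".") >= ndots else expanded + as_is
```
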

> I do not have access to that mail machine, so I am copying the
> administrators of that machine, who might be able to tell me why these
> queries are happening.

Dave Knight

Jan 6, 2008, 10:27:10 PM

> oberon% netstat -i
> Name  Mtu   Net/Dest           Address       Ipkts      Ierrs  Opkts      Oerrs  Collis  Queue
> lo0   8232  loopback           localhost     465553     0      465553     0      0       0
> bge0  1500  oberon.it.anl.gov  oberon        5358043    0      1668993    0      0       0
> bge1  1500  dns2.anl.gov       dns2.anl.gov  340299637  0      154842     0      0       0
> bge2  1500  dns2.anl.gov       dns2.anl.gov  286178523  0      689428381  0      0       0
>
> oberon%
>
> and I have no idea what interface is being used for these queries.

> The DNS server is an internal server for our anl.gov clients. It
> is inaccessible for internet queries (but it will accept response
> packets), so the queries that are triggering these messages must be
> from one or more internal machines here.

# man snoop

[..]

-d device

Receive packets from the network using the interface
specified by device, for example, eri0 or hme0. The
program netstat(1M), when invoked with the -i flag, lists
all the interfaces that a machine has. Normally, snoop
will automatically choose the first non-loopback
interface it finds.

snoop can only capture packets on one interface at a time, so if you
are unsure which interface the packets you are looking for will
arrive on, you might try running one instance per interface
concurrently:

# snoop -v -s 1500 -o /tmp/snoop.bge0.trace -d bge0 host 69.59.189.68 &
# snoop -v -s 1500 -o /tmp/snoop.bge1.trace -d bge1 host 69.59.189.68 &
# snoop -v -s 1500 -o /tmp/snoop.bge2.trace -d bge2 host 69.59.189.68 &

which will capture traffic on all three Ethernet interfaces.
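The three captures can also be started together from a script. A sketch that builds the same snoop argument lists as the commands above and starts one capture per interface in parallel; it assumes it runs as root on the Solaris host, and the function names are hypothetical:

```python
import subprocess


def snoop_argv(interface, host, snaplen=1500, outdir="/tmp"):
    """Arguments for one snoop capture of traffic to/from `host` on one interface."""
    return ["snoop", "-v", "-s", str(snaplen),
            "-o", f"{outdir}/snoop.{interface}.trace",
            "-d", interface, "host", host]


def start_captures(interfaces, host):
    """Start one snoop per interface; snoop captures on a single interface at a time."""
    return [subprocess.Popen(snoop_argv(name, host)) for name in interfaces]
```

For example, `procs = start_captures(["bge0", "bge1", "bge2"], "69.59.189.68")`, then later `for p in procs: p.terminate()` to stop all three.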

SM

Jan 7, 2008, 2:32:27 PM

At 19:18 06-01-2008, Mark Andrews wrote:
> > A check of our BIND query log shows lots of queries from one of our
> > mail machines; here is one query.
> >
> > 06-Jan-2008 17:38:01.101 queries: info:
> > client 146.137.96.51#41548: query:
> > achilles.ctd.anl.gov.dob.sibl.support-intelligence.net IN A +
>

> Looks like a dob.sibl.support-intelligence.net is in a search
> list and the application is not RFC 1535 compliant.

It's not a search list issue. dob.sibl.support-intelligence.net is a
DNSRBL. The above is a URI lookup.
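The two kinds of blacklist lookup produce differently shaped names. A sketch of both, using the common DNSBL conventions (an assumption; the thread does not document this list's query format): IP-based lists take the IPv4 octets in reverse order, name-based (URI) lists take the hostname as-is, and each is followed by the list's zone. The query logged above fits the name-based form, while the query in Barry's later trace, 19.36.221.140.dob.sibl.support-intelligence.net, has the shape of an IP-based lookup for 140.221.36.19:

```python
import ipaddress


def ip_dnsbl_name(ip, zone):
    """IP-based DNSBL query name: IPv4 octets reversed, then the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


def uri_dnsbl_name(hostname, zone):
    """Name-based (URI) blacklist query name: the hostname, then the list's zone."""
    return hostname.rstrip(".") + "." + zone.rstrip(".")
```
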

Regards,
-sm

Barry Finkel

Jan 7, 2008, 4:42:14 PM

On 6-Jan-08, at 11:05 AM, Barry Finkel wrote:

>>>> I am seeing lots of messages like this one from BIND-9.4.1-P1:
>>>>
>>>> [ID 873579 daemon.info] dispatch b090ef8:
>>>> shutting down due to TCP receive error: 69.59.189.68#53:
>>>> connection reset

Mark Andrews replied,

>>> I suspect the nameserver has some sort of filtering box in
>>> front of it that is attempting to determine if the client
>>> is real or spoofed. A "real" client will try TCP on seeing
>>> "tc" even if this is not strictly true for UDP only
>>> client/stacks. This then turns just about all the UDP
>>> queries into TCP queries. If the nameserver behind gets
>>> overwhelmed with TCP connections it will start sending out
>>> RST. Self inflicted TCP SYN DoS. There is a reason DNS
>>> uses UDP in the first place.

>>> This sort of "solution" does not scale.

>>> From the trace below the filtering box is keeping state
>>> because subsequent UDP queries get through. This doesn't
>>> help much as many clients only ask a single question of a
>>> nameserver as A and AAAA queries often go to different
>>> nameservers. If the filtering boxes were to share state
>>> there would be less problems.

At 18:58 06-01-2008, Barry Finkel replied:

>>A check of our BIND query log shows lots of queries from one of our
>>mail machines; here is one query.
>>
>> 06-Jan-2008 17:38:01.101 queries: info:
>> client 146.137.96.51#41548: query:
>> achilles.ctd.anl.gov.dob.sibl.support-intelligence.net IN A +
>>

>>I do not have access to that mail machine, so I am copying the
>>administrators of that machine, who might be able to tell me why these
>>queries are happening.

And SM replied to me off-list:
>dob.sibl.support-intelligence.net is a blacklist. These queries are
>most probably generated by SpamAssassin when mail is scanned. There
>are reports of similar problems for DNS queries against that blacklist.
>
>Regards,
>-sm

I have gotten a snoop trace, and I have reviewed it. A UDP query

Address(19.36.221.140.dob.sibl.support-intelligence.net.)

(packet #6) gets a return UDP packet (#11) with

AA and TC set, and empty answer, authority, and additional sections.

There is truncation involved. In the trace, I see these TCP packets
following the TC UDP packet:

oberon ==> dob.sibl.support-intelligence.net.
oberon <== dob.sibl.support-intelligence.net.
P#  Time         Dir  Proto  Flags      Seq         Ack
--  -----------  ---  -----  ---------  ----------  ----------
14  08:03:16.18  ==>  (DNS)  syn        3389810912  0
21  08:03:16.24  <==  (DNS)  ack syn    984205940   3389810913
22  08:03:16.24  ==>  (DNS)  ack        3389810913  984205941
31  08:03:16.29  <==  (DNS)  ack reset  984205941   3389810913

The dob server resets the TCP connection very quickly.
The trace is confusing, as my DNS server oberon sent five queries
in a short period of time (packets 6-10, all at 08:03:16.13), so
there are five successive TC responses and then five different TCP
streams to the dob nameserver. I think the four TCP packets I have
extracted above all belong to the same stream.
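When several handshakes are interleaved, sequence numbers are what tie packets to one stream: the SYN-ACK acknowledges the client's initial sequence number plus one, the client's ACK acknowledges the server's plus one, and the RST continues the server's sequence. A sketch that checks that chain for the four packets above and measures the delay before the reset; the tuple layout and function names are assumptions made for the example:

```python
from datetime import datetime

# (packet, time, direction, flags, seq, ack) from the reduced trace above.
TRACE = [
    (14, "08:03:16.18", "==>", {"syn"}, 3389810912, 0),
    (21, "08:03:16.24", "<==", {"ack", "syn"}, 984205940, 3389810913),
    (22, "08:03:16.24", "==>", {"ack"}, 3389810913, 984205941),
    (31, "08:03:16.29", "<==", {"ack", "reset"}, 984205941, 3389810913),
]


def _time(packet):
    return datetime.strptime(packet[1], "%H:%M:%S.%f")


def check_stream(packets):
    """Check SYN, SYN-ACK, ACK and RST belong to one stream.

    Returns (seconds from the client's ACK to the RST, whether the RST
    acknowledges only the SYN, i.e. no client data had been received).
    """
    syn, synack, ack, rst = packets
    client_isn, server_isn = syn[4], synack[4]
    if synack[5] != client_isn + 1:
        raise ValueError("SYN-ACK does not acknowledge this SYN")
    if (ack[4], ack[5]) != (client_isn + 1, server_isn + 1):
        raise ValueError("ACK does not belong to this stream")
    if rst[4] != server_isn + 1:
        raise ValueError("RST does not belong to this stream")
    elapsed = (_time(rst) - _time(ack)).total_seconds()
    return elapsed, rst[5] == client_isn + 1
```

For these packets it reports about 0.05 seconds between oberon's ACK and the reset, and the RST acknowledges only the SYN (3389810912 + 1). Assuming no packets were missed, that suggests the dob server reset the connection before it had received any query bytes on this stream.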

A Google search of the dob domain retrieves web pages with similar
complaints.