OK, so I've adjusted the MTU on my Linux machine's ethernet device
down, way down, all the way down to 576. I can't get the problem to
go away consistently.
I wrote the following little shell script to see if I could catch the
problem occurring.
#!/bin/sh
# Fetch the page every 15 seconds until wget fails.
h=${1:-74.125.67.100}
while wget "http://$h" -O /dev/null
do
    sleep 15
done
# Then ping with DF set and decreasing sizes until one gets through.
for n in 1500 1464 1444 1424 1404 1384 1364 1344 1324 1304 1284 1264 \
         1244 1224 1204
do
    output=$(ping -c 1 -M do -s "$n" "$h")
    echo "$output"
    echo "$output" | grep "Frag needed" > /dev/null || break
done
It does a repeated wget on google.com or some other web site supplied
on the command line, until the wget fails. Then it immediately does a
ping with several different packet sizes until it finds the largest
MTU that works. (Yes, I know 1500 is never going to work.) So when
the problem is happening, I ought to be able to catch it and
approximate the best MTU, right?
But noooo. In every case when wget fails, the ping -s 1464 works,
which means the maximum MTU of 1492 for my PPPoE connection is
perfectly good.
--2009-11-18 18:33:37-- http://74.125.67.100/
Connecting to 74.125.67.100:80... failed: Network is unreachable.
PING 74.125.67.100 (74.125.67.100) 1500(1528) bytes of data.
From 192.168.0.197 icmp_seq=1 Frag needed and DF set (mtu = 1492)
--- 74.125.67.100 ping statistics ---
0 packets transmitted, 0 received, +1 errors
PING 74.125.67.100 (74.125.67.100) 1464(1492) bytes of data.
72 bytes from 74.125.67.100: icmp_seq=1 ttl=44 (truncated)
--- 74.125.67.100 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 127.479/127.479/127.479/0.000 ms
So how can the problem be a bad MTU?
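The sizes in the ping output above follow from header arithmetic: the number given to -s is the ICMP payload, and the figure in parentheses adds 8 bytes of ICMP header and 20 bytes of IPv4 header (1464 + 28 = 1492). A minimal sketch of that conversion, assuming IPv4 with no IP options; payload_for_mtu and reported_mtu are names made up for illustration:

```shell
#!/bin/sh
# Header overhead for an IPv4 ICMP echo request, assuming no IP options.
IP_HEADER=20
ICMP_HEADER=8

# Largest "ping -s" payload that fits in one packet of the given MTU.
payload_for_mtu() {
    echo $(( $1 - $IP_HEADER - $ICMP_HEADER ))
}

# Read ping output on stdin and print the MTU from a "Frag needed" line, if any.
reported_mtu() {
    sed -n 's/.*Frag needed.*(mtu = \([0-9][0-9]*\)).*/\1/p'
}

payload_for_mtu 1492    # prints 1464
echo "From 192.168.0.197 icmp_seq=1 Frag needed and DF set (mtu = 1492)" | reported_mtu    # prints 1492
```

So a successful ping -s 1464 with DF set only shows that a 1492-byte packet got through at that moment.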
Just to add complications to this, I'm unable to reproduce the problem
when I connect a Windows box directly to the Internet by PPPoE,
bypassing the router. This would seem to implicate the router (a
Linksys WRT54G v 8.0 running DD-WRT). I tried replacing it with a
brand new D-Link DIR-655, and then another one (!) with essentially the
same results. This sure feels like a configuration problem rather
than three bad routers in a row.
Should the MTU setting on the router match the one on the computer?
Has anyone seen a similar issue that was not related to MTU?
> Should the MTU setting on the router match the one on the computer?
> Has anyone seen a similar issue that was not related to MTU?
Getting a tcpdump of the problem will likely be very helpful. But to
answer your question -- all devices on the same LAN should have the
same MTU, but a router need not have the same MTU on all of its
interfaces. My bet is the router has a 1,500-byte MTU on its LAN
interface, and so every other device on that LAN should also have a
1,500-byte MTU.
DS
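One way to check the Linux side of this is to read the interface's MTU from sysfs. A sketch, assuming a Linux kernel that exposes /sys/class/net; eth0 is an assumed interface name, iface_mtu is a hypothetical helper, and its second argument exists only so the function can be pointed at a different directory:

```shell
#!/bin/sh
# Print the MTU of a network interface as reported by Linux sysfs.
# $1 = interface name, $2 = sysfs root (defaults to /sys/class/net).
iface_mtu() {
    mtu_file="${2:-/sys/class/net}/$1/mtu"
    if [ -r "$mtu_file" ]; then
        cat "$mtu_file"
    else
        echo "no MTU file for interface $1" >&2
        return 1
    fi
}

# Example: iface_mtu eth0
```

The router's LAN-side MTU has to be read from the router's own configuration pages; this only covers the computer.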
> I've been troubleshooting an intermittent network problem for several
> days and could use a clue. I have a high-speed wireless broadband
> connection from my ISP. My router uses PPPoE to connect to the
> Internet. At intervals, I'm unable to load a web page or I only get the
> page partially loaded. When I try loading the page from the command
> line using wget, I get a "Network is unreachable" error. In every case
> I'm able to ping the site. Searching for answers always leads to
> instructions for adjusting my MTU settings.
Thorough analysis, and I cannot make heads or tails of it either. Some
suggestions:
- I would start tcpdump and look which host is sending the Network is
unreachable error. This may give some clue on where to start searching
further.
- Is there maybe a transparent proxy involved? Does this problem occur on
other tcp connections as well or only with http over port 80?
- Do note that many Linksyses do some form of clamp-mss-to-mtu. If this
implementation is broken, it may actually increase the mss so lowering
your mtu may not do what you think it does. However, this alone does not
explain your problems.
As it is an intermittent problem, I doubt it is an MTU problem. MTU
problems only appear intermittent because they occur only when packet
sizes exceed the MTU. When you do a repeated wget of the same page, you
get (roughly) the same stream every time, so it should either always
trigger the problem or always succeed. I really, really doubt this is
MTU related. Having seen many MTU-related problems, I'm actually fairly
confident in this, but your problem does not make sense right now, so
don't rule it out completely.
HTH,
M4
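For reference on the clamp-mss-to-mtu point: the MSS a host advertises is normally its MTU minus the IPv4 and TCP headers, so a 1500-byte MTU gives 1460 and a 1300-byte MTU gives 1260. A small sketch of that calculation, assuming 20-byte IPv4 and TCP headers with no options counted; mss_for_mtu is an illustrative name, not a real tool:

```shell
#!/bin/sh
# MSS that corresponds to a given MTU, assuming an IPv4 header and a
# TCP header of 20 bytes each (options not counted).
mss_for_mtu() {
    echo $(( $1 - 20 - 20 ))
}

for mtu in 1500 1492 1300 576; do
    echo "mtu $mtu -> mss $(mss_for_mtu "$mtu")"
done
```

A SYN that leaves the host with one MSS and shows up further along with a smaller one has probably been clamped by a device in between.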
I ran wget 74.125.67.100 repeatedly, and waited for symptoms. Output
of
tcpdump 'tcp port 80 and host 74.125.67.100'
at the time of the symptoms, is:
16:06:14.628041 IP don-desktop.local.34499 > gw-in-f100.1e100.net.www:
Flags [S], seq 2205816329, win 5840, options [mss 1460,sackOK,TS val
2409508 ecr 0,nop,wscale 7], length 0
FWIW, right now, MTU on the router happens to be set to 1300.
> - Is there maybe a transparent proxy involved? Does this problem occur on
> other tcp connections as well or only with http over port 80?
To my knowledge there's no proxy. Certainly I'm not configured at my
end for a proxy. When I hit a web page that's 404, I don't get a page-
not-found from Squid or some other proxy, I get it from the web server
I expect.
When it's really bad (no port 80 connections seem to work at all) I am
also unable to connect to POP mail (port 110), and connections over
Tor start to fail as well.
Don wrote:
> On Nov 19, 2:43 pm, Martijn Lievaart <m...@rtij.nl.invlalid> wrote:
>>
>> - I would start tcpdump and look which host is sending the Network is
>> unreachable error. This may give some clue on where to start searching
>> further.
>
> I ran wget 74.125.67.100 repeatedly, and waited for symptoms. Output
> of
>
> tcpdump 'tcp port 80 and host 74.125.67.100'
This won't capture the ICMP error messages that Martijn suggested.
>> - I would start tcpdump and look which host is sending the Network is
Correction, get wireshark so you can really look at the packets.
>> unreachable error. This may give some clue on where to start searching
>> further.
>
> I ran wget 74.125.67.100 repeatedly, and waited for symptoms. Output of
>
> tcpdump 'tcp port 80 and host 74.125.67.100'
Ah, but now you don't get the icmp errors. Start a full session, no
filter (except filtering out known local traffic and the ssh session you
use) and find out which host exactly sends those icmp-errors.
>
> at the time of the symptoms, is:
>
> 16:06:14.628041 IP don-desktop.local.34499 > gw-in-f100.1e100.net.www:
> Flags [S], seq 2205816329, win 5840, options [mss 1460,sackOK,TS val
> 2409508 ecr 0,nop,wscale 7], length 0
>
> FWIW, right now, MTU on the router happens to be set to 1300.
As I don't see a DF in the above packet, that should be immaterial.
M4
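A capture along the lines Martijn describes could drop only the SSH session and LAN-internal chatter while always keeping ICMP, so that errors from any host still show up. A sketch that just builds the filter expression; port 22 and 192.168.0.0/24 are assumptions based on the addresses seen in this thread, and capture_filter is a made-up helper:

```shell
#!/bin/sh
# Build a pcap filter that always keeps ICMP, and otherwise drops SSH
# and traffic that stays inside the LAN.
# $1 = LAN prefix (assumed 192.168.0.0/24), $2 = SSH port (assumed 22).
capture_filter() {
    lan="${1:-192.168.0.0/24}"
    ssh_port="${2:-22}"
    echo "icmp or (not port $ssh_port and not (src net $lan and dst net $lan))"
}

# Example (needs root): tcpdump -n -v "$(capture_filter)"
capture_filter
```

Keeping ICMP unconditionally matters because the router itself sits on the LAN and may be the host sending the error.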
Oops.
tcpdump -v '(tcp port 80 or icmp) and host 74.125.67.100'
17:42:49.031583 IP (tos 0x0, ttl 64, id 65297, offset 0, flags [DF],
proto TCP (6), length 60)
don-desktop.local.52416 > gw-in-f100.1e100.net.www: Flags [S],
cksum 0x69ee (correct), seq 2862310532, win 5840, options [mss
1460,sackOK,TS val 2988949 ecr 0,nop,wscale 7], length 0
In a separate session, I simultaneously ran:
tcpdump -v icmp
17:42:35.654586 IP (tos 0xc0, ttl 63, id 21591, offset 0, flags
[none], proto ICMP (1), length 92)
204.11.182.193 > don-desktop.local: ICMP net 204.9.177.195
unreachable, length 72
IP (tos 0x0, ttl 62, id 22579, offset 0, flags [DF], proto TCP (6),
length 64)
don-desktop.local.40506 > 204.9.177.195.www: Flags [.], ack
2471659815, win 501, options [nop,nop,TS[|tcp]>
17:42:38.074492 IP (tos 0xc0, ttl 63, id 21592, offset 0, flags
[none], proto ICMP (1), length 80)
204.11.182.193 > don-desktop.local: ICMP net 204.9.177.195
unreachable, length 60
IP (tos 0x0, ttl 62, id 22580, offset 0, flags [DF], proto TCP (6),
length 52)
don-desktop.local.40506 > 204.9.177.195.www: Flags [F.], seq 0,
ack 1, win 501, options [nop,nop,TS[|tcp]>
17:42:38.484544 IP (tos 0xc0, ttl 63, id 21593, offset 0, flags
[none], proto ICMP (1), length 80)
204.11.182.193 > don-desktop.local: ICMP net 204.9.177.195
unreachable, length 60
IP (tos 0x0, ttl 62, id 22581, offset 0, flags [DF], proto TCP (6),
length 52)
don-desktop.local.40506 > 204.9.177.195.www: Flags [F.], seq 0,
ack 1, win 501, options [nop,nop,TS[|tcp]>
17:42:39.299537 IP (tos 0xc0, ttl 63, id 21594, offset 0, flags
[none], proto ICMP (1), length 80)
204.11.182.193 > don-desktop.local: ICMP net 204.9.177.195
unreachable, length 60
IP (tos 0x0, ttl 62, id 22582, offset 0, flags [DF], proto TCP (6),
length 52)
don-desktop.local.40506 > 204.9.177.195.www: Flags [F.], seq 0,
ack 1, win 501, options [nop,nop,TS[|tcp]>
17:42:40.949609 IP (tos 0xc0, ttl 63, id 21595, offset 0, flags
[none], proto ICMP (1), length 80)
204.11.182.193 > don-desktop.local: ICMP net 204.9.177.195
unreachable, length 60
IP (tos 0x0, ttl 62, id 22583, offset 0, flags [DF], proto TCP (6),
length 52)
don-desktop.local.40506 > 204.9.177.195.www: Flags [F.], seq 0,
ack 1, win 501, options [nop,nop,TS[|tcp]>
17:42:49.049416 IP (tos 0xc0, ttl 63, id 21596, offset 0, flags
[none], proto ICMP (1), length 88)
204.11.182.193 > don-desktop.local: ICMP net gw-in-f100.1e100.net
unreachable, length 68
IP (tos 0x0, ttl 62, id 65297, offset 0, flags [DF], proto TCP (6),
length 60)
don-desktop.local.52416 > gw-in-f100.1e100.net.www: Flags [S], seq
2862310532, win 5840, options [mss 1260,sackOK,TS[|tcp]>
tracepath 74.125.67.100
1: don-desktop.local (192.168.0.197) 0.083ms
pmtu 1500
1: DD-WRT (192.168.0.1) 1.462ms
1: DD-WRT (192.168.0.1) 1.137ms
2: DD-WRT (192.168.0.1) 1.082ms
pmtu 1300
2: 204.11.182.193 (204.11.182.193) 40.953ms
3: 10.194.7.129 (10.194.7.129) 41.181ms
4: 10.194.5.129 (10.194.5.129) 40.230ms
5: ip-204-11.183.245.jagwireless.net (204.11.183.245) 40.533ms
6: ip-204-11.183.250.jagwireless.net (204.11.183.250) 43.520ms
7: cosentry-edge.loganet.net (204.11.183.254) 53.501ms
8: 216.58.226.33 (216.58.226.33) 46.663ms
9: 216.58.224.157 (216.58.224.157) 49.714ms
10: 64.253.175.189 (64.253.175.189) 44.649ms
11: 66.37.238.201 (66.37.238.201) 49.398ms
12: so-9-1.car2.StLouis1.Level3.net (4.79.134.45) 69.054ms
asymm 13
13: ae-4-4.ebr2.Chicago1.Level3.net (4.69.132.190) 66.600ms
asymm 14
14: ae-3.ebr2.Atlanta2.Level3.net (4.69.132.74) 86.359ms
asymm 15
15: ae-21-52.car1.Atlanta1.Level3.net (4.68.103.34) 91.338ms
asymm 16
16: no reply
17: no reply
...
31: no reply
Too many hops: pmtu 1300
Resume: pmtu 1300
This won't capture the ICMP error messages from intermediate routers or
firewalls.
> 17:42:49.031583 IP (tos 0x0, ttl 64, id 65297, offset 0, flags [DF],
> proto TCP (6), length 60)
> don-desktop.local.52416 > gw-in-f100.1e100.net.www: Flags [S],
> cksum 0x69ee (correct), seq 2862310532, win 5840, options [mss
> 1460,sackOK,TS val 2988949 ecr 0,nop,wscale 7], length 0
Please use -n to disable name lookups.
Doesn't tcpdump tell you all you need to know about the TCP layer?
Additionally it outputs in a text format which can be quoted here.
I find it's often more convenient to capture the traffic with tcpdump
and analyze it later with Wireshark and/or tcpdump and various other
tools.
/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
tcpdump -v -n
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size
96 bytes
08:05:29.687557 IP (tos 0x0, ttl 64, id 50377, offset 0, flags [DF],
proto TCP (6), length 60)
192.168.0.197.53412 > 74.125.67.100.80: Flags [S], cksum 0x471c
(correct), seq 2994236676, win 5840, options [mss 1460,sackOK,TS val
8825293 ecr 0,nop,wscale 7], length 0
08:05:29.702671 IP (tos 0xc0, ttl 63, id 4495, offset 0, flags [none],
proto ICMP (1), length 88)
204.11.182.193 > 192.168.0.197: ICMP net 74.125.67.100
unreachable, length 68
IP (tos 0x0, ttl 62, id 50377, offset 0, flags [DF], proto TCP (6),
length 60)
192.168.0.197.53412 > 74.125.67.100.80: Flags [S], seq 2994236676,
win 5840, options [mss 1260,sackOK,TS[|tcp]>
So the answer is, the "unreachable" message came from 204.11.182.193,
which is the default gateway for the router after connecting.
Time to start beating up my ISP?
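Over a longer capture, the same question (which host sends the "unreachable" errors?) can be answered by counting senders. A rough sketch, assuming tcpdump -n text output in which each "src > dst: ICMP ... unreachable" message sits on one line (the lines quoted in this thread were wrapped when posted); unreachable_senders is a hypothetical helper name:

```shell
#!/bin/sh
# Read "tcpdump -n" text output on stdin and count ICMP unreachable
# messages per sending host. The sender is the field just before ">".
unreachable_senders() {
    awk '/ICMP .*unreachable/ {
             for (i = 2; i <= NF; i++)
                 if ($i == ">") { count[$(i-1)]++; break }
         }
         END { for (host in count) print count[host], host }'
}

# Example (needs root; counts are printed when the capture ends):
#   tcpdump -n -l icmp | unreachable_senders
```

If a single upstream address accounts for every error, that is a concrete detail to hand to the ISP.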
Seems like it. If it is a really good ISP you can send this TCP dump and
they should be able to investigate on this basis. Unfortunately, many
ISPs have too many layers between the customer and the techie.
Would be perfect if you could run tcpdump with -w and send them the file
too.
M4
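A sketch of what capturing with -w might look like. The -s 0 option asks for full packets, since the earlier capture was limited to 96 bytes per packet and the quoted headers came out truncated. The interface eth0, the filter, and the function names are assumptions for illustration; setting DRY_RUN=1 prints the commands instead of running them:

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Save full packets (-s 0) to a file, without name lookups (-n).
# $1 = output file, $2 = interface (eth0 assumed).
capture_to_file() {
    run tcpdump -n -s 0 -i "${2:-eth0}" -w "$1" 'tcp port 80 or icmp'
}

# Print a saved capture verbosely, e.g. to quote it in a support ticket.
show_capture() {
    run tcpdump -n -v -r "$1"
}

# Example (capture needs root; stop it with Ctrl-C once the failure has occurred):
#   capture_to_file unreachable.pcap
#   show_capture unreachable.pcap
```

The saved file can also be opened in Wireshark, as suggested earlier in the thread.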