I have the same problem.
OS: Vista SP1
Chrome: 4.0.249.78 with the FlashBlock, Google Dictionary and Speed Dial extensions
modem: TP-Link TD-8817
wireless router: D-Link DIR-451
both with the latest firmware
The problem starts with unresolved DNS, then my wireless router restarts.
Whether the wireless router uses my ISP's DNS server or OpenDNS, the
problem persists.
It was solved by turning off the DNS prefetching option.
Chrome seems faster that way than when the option was on (before the
problem occurred).
I'm running Chromium 4.0.249.78 (36714) and I'm having similar issues.
What I've determined is that Chromium's DNS lookups are creating too many
active connections. My router's /proc/net/ip_conntrack lists several open
UDP connections with a time to live of up to 3600 seconds (1 hour).
Turning off DNS prefetch helps the situation, but open enough websites and
you can quickly build up a good number of these and reach your router's
maximum allowed connections. An example ip_conntrack line is:
udp 17 3590 src=192.168.1.2 dst=208.67.222.222 sport=50254 dport=53
src=208.67.222.222 dst=68.5.68.1 sport=53 dport=50254 [ASSURED] use=1
rate=85 mark=0
Firefox's DNS/UDP lookups tend to have a maximum TTL of only 30 seconds,
which is far more bearable for the routers.
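As a rough illustration of the symptom above, a script along these lines can count long-lived UDP port-53 entries in a conntrack dump. The field layout is taken from the sample line quoted in this comment; real /proc/net/ip_conntrack output varies by kernel version, so treat this as a sketch, not a diagnostic tool:

```python
# Sketch: count UDP DNS conntrack entries whose remaining TTL is large,
# the pattern reported in this comment. Field positions are assumed from
# the sample line above, not from any kernel documentation.
SAMPLE = """\
udp 17 3590 src=192.168.1.2 dst=208.67.222.222 sport=50254 dport=53 src=208.67.222.222 dst=68.5.68.1 sport=53 dport=50254 [ASSURED] use=1
udp 17 25 src=192.168.1.2 dst=208.67.222.222 sport=50255 dport=53 src=208.67.222.222 dst=68.5.68.1 sport=53 dport=50255 use=1
"""

def long_lived_dns_entries(conntrack_text, min_ttl=600):
    """Return UDP port-53 conntrack lines whose remaining TTL >= min_ttl."""
    hits = []
    for line in conntrack_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "udp":
            continue
        ttl = int(fields[2])            # seconds until the router expires the flow
        if "dport=53" in fields and ttl >= min_ttl:
            hits.append(line)
    return hits

print(len(long_lived_dns_entries(SAMPLE)))  # prints 1: only the 3590-second entry
```

With a router-side TTL of 3600 seconds, each such entry occupies a NAT table slot for up to an hour, which is how a burst of lookups can exhaust a small connection table.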
I had a problem with Scottrade's online trading platform
(trading.scottrade.com) where the browser would just hang after a few
clicks. Once I disabled DNS prefetching the problem went away and the
site's performance improved overall. It was so bad I was considering
leaving the brokerage; they were unaware of anyone else reporting the
problem. If everyone were experiencing this, they would have heard about
it. I'm using Chromium version 4.0.285.0 (0).
The following revision refers to this bug:
http://src.chromium.org/viewvc/chrome?view=rev&revision=42181
------------------------------------------------------------------------
r42181 | j...@chromium.org | 2010-03-19 22:41:42 -0700 (Fri, 19 Mar 2010) |
33 lines
Changed paths:
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/browser_main.cc?r1=42181&r2=42180
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/net/dns_global.cc?r1=42181&r2=42180
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/renderer/render_view.cc?r1=42181&r2=42180
M http://src.chromium.org/viewvc/chrome/trunk/src/net/http/http_stream_parser.cc?r1=42181&r2=42180
2 experiments: DNS prefetch limit concurrency; TCP split a packet
Some firewalls apparently try to preclude a "syn flood to host" by limiting
the number of SYNs (used to open a TCP/IP socket) that are outstanding
without having received a SYN-ACK. Presumably this is to prevent a user
from participating in a syn-flood attack (which traditionally sends a lot
of SYN packets, with false return addresses, resulting in no responses).
Apparently this firewall technology has in some cases been extended
to include UDP sessions for which there has been no response, and this
may include DNS resolutions. Since the prefetcher currently resolves
as many as 8 names simultaneously, this is remarkably close to the
reported threshold of 10 un-answered connections. This test attempts
to limit connections to 2, 4, or 6, so that we can see if this helps
users.
In TCP, the RTO remains (under Windows) at a full 3 seconds until after the
first ACK is received. As a result, if the first data packet sent (after
the SYN) is lost, then TCP won't resend until 3 seconds have passed without
an ACK.
As a test, we split up the first packet into two parts (the second part
containing only one byte). This is done as an A/B test, and we'll see
if we get a measurable improvement in page-load-time latency.
Finally, to get better page load stats, I adjusted the PLT histograms
so that we record a "final" time for abandoned pages when they are
closed (even if they didn't finish rendering, etc.). This should give
a much more fair PLT comparison for all network latency experiments.
BUG=3041
BUG=12754
r=mbelshe,darin
Review URL: http://codereview.chromium.org/1088002
------------------------------------------------------------------------
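The packet-split experiment from the commit above can be pictured with a small sketch. This is a hedged illustration of the idea, not Chrome's actual http_stream_parser.cc logic; the function name is invented:

```python
def split_first_write(payload):
    """Split the first data payload into two pieces, the second only one
    byte long, so that loss of a single packet does not necessarily stall
    the whole request for the full 3-second initial RTO described above.
    Sketch only; Chrome's real code is more involved."""
    if len(payload) <= 1:
        return [payload]                 # nothing worth splitting
    return [payload[:-1], payload[-1:]]

parts = split_first_write(b"GET / HTTP/1.1\r\n")
print(len(parts))  # prints 2
```

Each piece would then be written to the socket as a separate send, which is what makes this measurable as an A/B experiment on page-load-time latency.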
Comment #55 on issue 3041 by da...@chromium.org: DNS pre-fetching causes
frequent internet loss
http://code.google.com/p/chromium/issues/detail?id=3041
(No comment was entered for this change.)
Hi,
I didn't read the whole bug, but the problem described here is almost
certainly the same one described on the
corp wiki here:
http://wiki.corp.google.com/twiki/bin/view/Main/OutboundSynfloodProtectionBug
I investigated the same problem being caused by Google Maps in late '07,
across different browsers.
Briefly, some makes of wireless routers are buggy. They attempt to
implement a form of DoS protection that is
completely broken. They track "number of un-ACKd SYNs" between two IPs and
when it gets too high, they
disconnect the source IP. The intention is to automatically block an IP
that synfloods the access point.
Unfortunately they do not distinguish between inbound and outbound synflood
attacks and thus it's possible
for a sufficiently fast computer to get itself disconnected from the
internet.
I suspect the DNS prefetching is being reported as the trigger because it
generally speeds up the web and
makes it more likely that you'll have enough un-ACKd SYNs in flight
simultaneously to go over the limit.
There is no good solution to this problem. The wiki page has a list of
routers that are impacted by this
remarkably common piece of mis-design - it occurs across different
manufacturers, router models and even
firmware versions. The solution we toyed with for Maps (but never
implemented) involved separating traffic
out over several IP addresses to keep the number of un-ACKd SYNs per IP
pair below the threshold.
For Chrome, I don't know how to solve this. It will start to get worse and
worse as the web generally gets
faster. The only thing I can think of is to track unACKd SYNs in Chrome
itself somehow and then notice if the
connections all drop once it reaches a certain number. The user could then
be informed they need to restart
their router and Chrome could throttle itself to avoid crossing that
threshold again.
Comment #58 on issue 3041 by j...@chromium.org: DNS pre-fetching causes
frequent internet loss
http://code.google.com/p/chromium/issues/detail?id=3041
Summary: Work in progress. Data being gathered. We should know more this
week or
next. Hopefully we can quantify the problem better, and then decide on an
mstone-5
or 6.
There is a chance that this problem is related to the IPv6 issue in bug
http://code.google.com/p/chromium/issues/detail?id=12754
I've been fearful that the "syn flood on host" protection(?) mentioned by
Hearn was a
large part of the issue. I just did a test to see if reducing the number of
concurrent DNS speculative resolutions helped... but I did *not* see an
impact on
page load time. Hence I *think* that DNS resolutions are not significantly
tickling
the syn-flood problem.
There is a good chance (some chance?) that the IPv6 problem was having more
of an
impact on "unanswered DNS resolution counts" than the DNS prefetching
code. As a
result, it may be that the resolution of bug 12754 tremendously mitigates
this
issue.... so we'll have to see.
There is also a chance that Chrome's willingness to (rapidly) open many
connections,
such as during startup with multiple tabs, may be tickling the syn flood
detectors as
well. IF that is the case, then we'll need to pace our socket opens to
avoid the
thresholds.... but we don't yet know (for sure) what the thresholds
commonly are
(10??), and we don't know what our current distribution of socket-opens is
like.
I'll add a histogram to try to look for the high unanswered-syn counts, and
we should
be able to get more data.
One other thing to bear in mind is that DNS lookups can occur via TCP too
in some cases (don't remember exactly
when). Maybe the bug reporter has some kind of weird setup in which DNS
lookups are always done via TCP, so
aggressive DNS prefetching triggers lots of connections and thus the
synflood detectors.
Tracking un-ACKed SYNs and reporting them back to us when Chrome detects an
internet hangup/disconnection event sounds like a great way to prove or
disprove the hypothesis.
@hearn,
Some routers that have the "Syn Flood Blocking" misfeature reportedly have a
threshold that includes not only un-ACKed SYNs, but also unresponded UDP
transmissions :-/ which might then include DNS lookups over UDP. As a
result, for
those users, rapid parallel DNS resolutions could plausibly activate the
flood
blocking "protection."
Your suggestion about proving or disproving a hypothesis sounds good... but
I'm not
clear on the hypothesis, or the stats that would supply the dis/proof.
For example, we could track disconnect events (I think). We can also track
un-ackd
SYNs. What is the correlation or such that I'd want to look for and/or
expect?
There will certainly be some number of users that will have a network go
down (loss
of signal on a cable modem).
Unless I have a clear model of what the router is specifically looking for,
I can't
detect a router's response to the tripping-events. In addition, unless I'm
sure what
a syn-flood block router is going to do, I can't tell for sure that it
tripped. For
example, I've heard of some folk getting disconnected, and needing to reboot
routers... but I don't know if this is unrelated... and I've read about
short
connectivity disruption... but I haven't ever gotten details.
Can you suggest the specific stats you'd like to contrast/correlate, and
the precise hypothesis we could be dis/proving?
@60: TCP is used when the response data size exceeds 512 bytes, or for
tasks such as
zone transfers (http://en.wikipedia.org/wiki/Domain_Name_System)
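For reference, a resolver typically retries over TCP when a UDP reply comes back with the TC (truncation) bit set, meaning the answer did not fit in the 512-byte UDP limit. A minimal check of that bit, using the RFC 1035 header layout, might look like this (sketch; the function name is invented):

```python
import struct

def needs_tcp_fallback(dns_reply: bytes) -> bool:
    """True if the DNS reply sets the TC (truncated) bit: bit 9 of the
    flags word at bytes 2-3 of the header, per RFC 1035."""
    if len(dns_reply) < 4:
        return False
    (flags,) = struct.unpack("!H", dns_reply[2:4])
    return bool(flags & 0x0200)

# Flags word 0x8380: QR=1 (response), TC=1, RD=1, RA=1.
print(needs_tcp_fallback(b"\x12\x34\x83\x80"))  # prints True
```

So a setup where DNS "always" runs over TCP, as speculated above, would be unusual but not impossible (e.g. a forwarder that truncates every UDP answer).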
Yeah, I was thinking of a histogram of un-ACKed SYNs just before disconnect
events. But you're right that this would probably be very noisy due to
general internet flakiness, people suspending their laptops, etc.
I have to admit, for Maps we kind of gave up trying to measure it and just
started
work on the solution. That work of course never made it to completion
because we
couldn't really prove it'd achieve anything :-(
Maybe I should keep quiet and go back to thinking about it.
One possible workaround if the situation can be detected is to have Chrome
spread DNS
lookups over many IP addresses. Google Public DNS has only one, I think,
but more
could be added. However the team that asked me to prove my case last time
is the same
team that runs the public DNS service, heh.
@hearn:
I'm VERY interested in making progress on this... so any concrete
suggestions would be MOST appreciated... or possibly even a brainstorming
session.
I'm also tempted to move toward (as you call it) "a solution," such as by
pacing TCP opens, and exercising care that no more than N un-ACKed
connections be pending at any time. It is plausible that an A/B test may
show that (for example) we get better page load stats, or fewer disconnects
(per attempted connect), or something??
We already limit the number of outstanding DNS resolutions pending at any
point in time for DNS pre-resolution. Experiments where we varied that
number have not produced a visible difference in DNS resolution time.
Perhaps I wasn't measuring the ratio of events (name not found? vs found?)
caused by no resolver response.
Please, someone fix this problem! Many users (like myself) have pages not
loading at random, getting the "Oops!" message, then hitting refresh, a
time or two or twenty, until finally getting the page to load.
The Resolving Host issue is the same way; it just happens at random.
The Not Loading Images issue is the same way; it happens at random.
One day soon, IE9 will be released, but not for XP users, and IE8 is going
to be VERY unsafe to use, and many people like me will eventually have to
switch to Firefox or Chrome because we refuse to upgrade to Vista or
Windows 7. So please, someone, FIX THESE ISSUES so that pages will load
ALL THE TIME, just like in IE8.
I even had IE6 loading pages 99% of the time. But Chrome, of course, only
70% of the time, if even that. IE8 is even better than IE6. So come on,
Chrome developers, GET THIS BROWSER BETTER AT BEING COMPATIBLE WITH THE WEB
than IE8, for goodness' sake!
Actually, Google Public DNS doesn't keep personally identifiable logs; see
the FAQ:
http://code.google.com/speed/public-dns/faq.html#privacy
It's offered as a service to make the net faster, not as a way to improve
advertising.
It sounds like your issue may not be the one discussed in this bug, though.
Switching to a different DNS server shouldn't resolve this issue unless,
for some reason, your system was connecting to your ISP's DNS via TCP,
which is unlikely. A more likely explanation is that there's some problem
with Comcast's DNS. Issues with ISP-provided DNS are one reason we set up
Google Public DNS, so it's good to hear it worked for you.
Comment #70 on issue 3041 by laf...@chromium.org: DNS pre-fetching causes
frequent internet loss
http://code.google.com/p/chromium/issues/detail?id=3041
It's not clear to me what is actionable on this bug, the thread has gone on
since 2008. It looks like we've made progress, though perhaps we should
break whatever remaining work there is out into smaller separate bugs
(which hopefully themselves won't span 2 years).
For me the use of a different DNS provider has had little effect (I tried
Google Public DNS, OpenDNS and the ISP's own). Chrome bizarrely still has
the slowest DNS resolution of any app I use, regardless of whether
prefetching is enabled (though prefetching does make it worse in most
cases).
UPDATE
Well, so far everything has been great. No major problems with pages
loading. So the solution is to get a faster DNS, obviously. (I've tried
unchecking the DNS prefetching option, but it didn't make much difference,
if any.)
Hopefully Chrome will eventually improve to the point that it can function
properly with nearly ALL DNS servers.
This issue looks very much like an issue with the router's firmware, which
does not track the NAT routing for incoming UDP packets correctly, and then
hangs somewhere while trying to determine which host to reply to.
Or this could be an issue in the router's built-in DNS proxy server (which
intercepts the UDP requests from hosts on the private LAN, and then
performs outgoing UDP requests to the upstream DNS provider): if hosts on
the LAN are performing DNS requests too fast (and the upstream DNS server
from your ISP is too slow to reply), then the number of ongoing requests
initiated by hosts on the LAN will exceed the DNS proxy firmware's
capacity, possibly causing a buffer overflow in the router's DNS proxy
implementation as it attempts to allocate another slot for more DNS
queries. Such an overflow in the maximum number of ongoing requests may
cause the router's firmware to hang (and then the router to reboot, closing
your connection and all other Internet sessions).
You should contact the manufacturer of your DSL modem-router (or your
provider, because it may be your ISP that leased you this modem at no
additional cost with your subscription) to fix the implementation of the
DNS proxy, or the implementation of the NAT routing (if the router does not
implement a DNS proxy): request a firmware upgrade.
And you should contact your ISP because its upstream DNS server is really
too slow, and this causes multiple DNS requests to fail on your PC because
your router's proxy cannot store many ongoing requests.
If you get no solution from the manufacturer, but the bug is effectively
within the DNS proxy (you'll know that your router implements a DNS proxy
if the DNS server configured on your PC via DHCP uses the same private IP
address as the router itself on your LAN, such as 192.168.1.1, when your
PC gets an address via DHCP in the same block, like 192.168.1.x), I can
suggest one thing:
Change your PC configuration to NOT use the default DNS server address
coming from DHCP (i.e. the private IP address of your router), but instead
specify the IP address (in public Internet space) of another DNS provider:
this could be the DNS servers of your ISP, but other public DNS servers are
available, such as OpenDNS (visit www.opendns.org for more details).
Anyway, I would suggest that Google limit more strictly the maximum number
of ongoing DNS requests, depending on how fast it can get positive replies
(DNS resolution gave results or not) or failure replies (DNS error status,
can't determine whether the domain actually exists):
If there are failure replies coming from the DNS server, or if there are
too many timeouts without any reply, reduce the maximum number of parallel
requests that the Chrome prefetcher will initiate.
In other words, Google, maybe you could use a reduction algorithm similar
to the one used to control the TCP window size:
- a slow incremental linear growth, up to a reasonable maximum probably not
exceeding 16 pending requests, if the DNS query succeeded;
- a fast exponential reduction if there's a failure, by an immediate
division by 2, or even less (if a failure is detected when there are
already fewer than this number of outgoing requests);
- you may also use an optimized/smarter tuning to allow retrying a single
failed request selectively, notably if the failed request was detected
after other requests succeeded (despite being initiated AFTER the failed
request): this can be similar to the behavior of SACK retries in TCP, which
avoids a dramatic reduction of the window size just because of a single
randomly occurring lost packet, by allowing up to one half of the pending
DNS requests to be retried selectively before dividing the window size by 2
(in this case, you'll just have to reduce the window size by 1 for each
request retried);
- manage the delays for retrying any failed DNS request correctly;
- include in Chrome a special diagnostic page (about:dns) that users should
be able to open in a tab, showing some statistics about the internal DNS
resolver. This page should be reachable from the advanced "Tools" menu,
showing the internal status of the browser, and also providing links to the
special page "about:plugins".
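The AIMD-style pacing proposed above can be sketched as follows; the class name, the constants (initial 4, maximum 16), and the API are illustrative assumptions matching the suggestion, not anything Chrome implements:

```python
class DnsRequestWindow:
    """Additive-increase / multiplicative-decrease controller for the
    number of parallel DNS requests, loosely modeled on TCP congestion
    control as suggested above. Sketch only."""
    def __init__(self, initial=4, maximum=16):
        self.size = initial
        self.maximum = maximum

    def on_success(self):
        # Slow linear growth, capped at a reasonable maximum.
        self.size = min(self.size + 1, self.maximum)

    def on_failure(self):
        # Fast exponential reduction: halve, but never drop below 1.
        self.size = max(self.size // 2, 1)

w = DnsRequestWindow()
for _ in range(20):
    w.on_success()
print(w.size)      # prints 16 (capped)
w.on_failure()
print(w.size)      # prints 8 (halved)
```

The selective-retry refinement (reducing by 1 per retried request instead of halving, as with SACK in TCP) would layer on top of this, but the basic grow-slow/shrink-fast shape is the core of the idea.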
Note that some ISPs have implemented limitations on the frequency of DNS
requests you can perform. This was not really for avoiding request floods
(even if this may be useful against real malware attacks on the ISP's DNS
servers), but as an attempt to limit the usage of P2P networks trying to
connect to thousands of hosts everywhere in the world, using various domain
names to get the list of IP addresses of other hosts to query or connect
to...
In some cases, this "pacing" parameter for the number of allowed DNS
resolutions is **very small** (and in fact very inefficient at fighting the
use of P2P protocols).
But this pacing is still very frequent on mobile Internet access through
3G/4G mobile networks and public WiFi hotspots, where the use of P2P
protocols, as well as some direct messaging or telephony applications, is
explicitly forbidden by the contractual subscription terms or by a less
explicit "reasonable usage" policy requiring strictly personal usage.
Normally these terms of service allow simple web browsing, but the pacing
parameters were designed at a time when browsers did not perform so many
DNS requests, and at least did not perform them in a "prefetch" manner.
That's why Google Chrome should be smarter about how it paces its own usage
of DNS requests, and use an auto-adaptive behavior that will keep ISPs
happy with respect to their DNS servers, as well as owners of buggy/limited
routers.
Note also that DNS pacing limits are used by ISPs as a way to limit the
damage done by spamware infecting subscribers' PCs, as some classes of
spamware try to send tons of emails to lots of distinct domains.
But more recent classes of spamware download lists of recipients grouped by
domain name, so their usage of DNS requests is considerably slower. The
pacing parameter on permitted DNS requests implemented by those ISPs is not
very successful at limiting the damage caused by spamware in this case.
But ISPs will still keep these pacing rules at a "reasonable" limit, as it
helps keep this avenue closed to spamware authors. These limits should not
affect the normal usage of browsers if they also implement a smarter policy
when performing their own DNS resolutions.
So, DNS prefetching is an interesting optimization for the browser, to
speed up navigation a bit (and it is certainly much less costly, in terms
of wasted bandwidth and requests, than performing multiple page preloads of
all link targets found when parsing the loaded HTML page, as the browser
would then behave exactly like an indexing bot, which should, at minimum,
implement and respect the "/robots.txt" policy requested by the visited
websites).
But such DNS prefetching should not be implemented at the price of
interoperability with what ISPs consider "reasonable" usage of the Internet
access and of ISP services such as the DNS servers they provide to their
subscribers.
And this prefetching should also not consume all the socket resources
available on the local OS, nor leak the sockets used if a request never
terminates normally: such leakage of OS sockets (within the DNS prefetcher)
may explain why Chrome may completely block all new IP traffic from any
other application on the same host after some time, including a simple
"ping" command (which requires the allocation and use of one new "raw"
socket or one ICMP datagram socket), until the browser is completely
closed.
I don't know if this helps, but all my tests mentioned above were done on a
clean install of Windows XP SP3, meaning:
I wrote zeros (full) to the hard drive.
I installed a copy of Windows XP SP3.
I installed IE8.
I installed all security updates.
I installed Chrome.
I had all the problems mentioned. None with IE8; IE8 ALWAYS loaded any
page. Chrome did not. It was nearly always an "Oops!" message, or
"Resolving Host", or some other blank-page message.
Then I used Google's DNS. Problem solved.
I also use a wired connection, no wireless.
The tests suggest that the number of parallel resolutions may be correlated
with the failure rate of resolutions (ratio of successful resolutions to
failed resolutions). It also appears that the speculative resolutions are
not significantly correlated with the rate of failed resolutions (perhaps
because they are already constrained to a smallish parallelism, and have
congestion avoidance built in to discard the speculative queue ASAP when
resolution delays appear).
As a result, adding the following command line flag may be helpful to users
on this bug list that are encountering this problem:
--host-resolver-parallelism=8
The current (internal) default is 50 parallel resolutions. FF reportedly
has a maximum of 8 parallel resolutions (as would be set by the above flag
in Chromium).
I'll craft a CL to reduce the max parallelism to this value of 8, and also
reduce the maximum simultaneous speculative resolutions (closer to the FF
max of 3).
With those overall maximums in place, we'll then work to fine-tune these
settings with additional experiments... and we'll watch this bug with
interest to see if the above command line switch helps anyone.
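The cap that the flag above sets can be pictured with a small sketch: gate resolutions through a semaphore so that no more than 8 are ever in flight at once, no matter how many are requested. The helper names here are assumptions for illustration; Chrome's real host resolver is C++ and considerably more elaborate:

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 8                      # mirrors --host-resolver-parallelism=8
_slots = threading.BoundedSemaphore(MAX_PARALLEL)

def resolve(hostname):
    """Resolve hostname, with at most MAX_PARALLEL lookups outstanding
    at any moment; returns None on resolution failure."""
    with _slots:
        try:
            return socket.gethostbyname(hostname)
        except socket.gaierror:
            return None

# Even with a 50-thread pool, at most 8 lookups run concurrently.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(resolve, ["localhost"] * 20))
print(results[0])
```

Capping parallelism this way trades a little resolution latency under load for staying below the un-answered-request thresholds of the buggy routers discussed throughout this bug.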