Signed-off-by: Andreas Petlund <apet...@simula.no>
---
Documentation/networking/tcp-thin.txt | 47 +++++++++++++++++++++++++++++++++
include/net/tcp.h | 7 +++++
2 files changed, 54 insertions(+), 0 deletions(-)
create mode 100644 Documentation/networking/tcp-thin.txt
diff --git a/Documentation/networking/tcp-thin.txt b/Documentation/networking/tcp-thin.txt
new file mode 100644
index 0000000..151e229
--- /dev/null
+++ b/Documentation/networking/tcp-thin.txt
@@ -0,0 +1,47 @@
+Thin-streams and TCP
+====================
+A wide range of Internet-based services that use reliable transport
+protocols display what we call thin-stream properties. This means
+that the application sends data with such a low rate that the
+retransmission mechanisms of the transport protocol are not fully
+effective. In time-dependent scenarios (like online games, control
+systems, stock trading etc.) where the user experience depends
+on the data delivery latency, packet loss can be devastating for
+the service quality. Extreme latencies are caused by TCP's
+dependency on the arrival of new data from the application to trigger
+retransmissions effectively through fast retransmit instead of
+waiting for long timeouts.
+
+After analysing a large number of time-dependent interactive
+applications, we have seen that they often produce thin streams
+and also stay with this traffic pattern throughout its entire
+lifespan. The combination of time-dependency and the fact that the
+streams provoke high latencies when using TCP is unfortunate.
+
+In order to reduce application-layer latency when packets are lost,
+a set of mechanisms has been made, which address these latency issues
+for thin streams. In short, if the kernel detects a thin stream,
+the retransmission mechanisms are modified in the following manner:
+
+1) If the stream is thin, fast retransmit on the first dupACK.
+2) If the stream is thin, do not apply exponential backoff.
+
+These enhancements are applied only if the stream is detected as
+thin. This is accomplished by defining a threshold for the number
+of packets in flight. If there are less than 4 packets in flight,
+fast retransmissions can not be triggered, and the stream is prone
+to experience high retransmission latencies.
+
+Since these mechanisms are targeted at time-dependent applications,
+they must be specifically activated by the application using the
+TCP_THIN_LINEAR_TIMEOUTS and TCP_THIN_DUPACK IOCTLS or the
+tcp_thin_linear_timeouts and tcp_thin_dupack sysctls. Both
+modifications are turned off by default.
+
+References
+==========
+More information on the modifications, as well as a wide range of
+experimental data can be found here:
+"Improving latency for interactive, thin-stream applications over
+reliable transport"
+http://simula.no/research/nd/publications/Simula.nd.477/simula_pdf_file
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 87d164b..e5e2056 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1386,6 +1386,13 @@ static inline void tcp_highest_sack_combine(struct sock *sk,
tcp_sk(sk)->highest_sack = new;
}
+/* Determines whether this is a thin stream (which may suffer from
+ * increased latency). Used to trigger latency-reducing mechanisms.*/
+static inline unsigned int tcp_stream_is_thin(struct tcp_sock *tp)
+{
+ return tp->packets_out < 4 && !tcp_in_initial_slowstart(tp);
+}
+
/* /proc */
enum tcp_seq_states {
TCP_SEQ_STATE_LISTENING,
--
1.6.3.3
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
>
> +/* Determines whether this is a thin stream (which may suffer from
> + * increased latency). Used to trigger latency-reducing mechanisms.*/
> +static inline unsigned int tcp_stream_is_thin(struct tcp_sock *tp)
> +{
> + return tp->packets_out < 4 && !tcp_in_initial_slowstart(tp);
> +}
> +
> /* /proc */
Please format comments:
/* Like this. */
or:
/* Like this.
* And this.
*/
Thanks.
Signed-off-by: Andreas Petlund <apet...@simula.no>
---
Documentation/networking/tcp-thin.txt | 47 +++++++++++++++++++++++++++++++++
include/net/tcp.h | 8 +++++
2 files changed, 55 insertions(+), 0 deletions(-)
create mode 100644 Documentation/networking/tcp-thin.txt
index 75a00c8..0bdc3f6 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1386,6 +1386,14 @@ static inline void tcp_highest_sack_combine(struct sock *sk,
tcp_sk(sk)->highest_sack = new;
}
+/* Determines whether this is a thin stream (which may suffer from
+ * increased latency). Used to trigger latency-reducing mechanisms.
+ */
+static inline unsigned int tcp_stream_is_thin(struct tcp_sock *tp)
+{
+ return tp->packets_out < 4 && !tcp_in_initial_slowstart(tp);
+}
+
/* /proc */
enum tcp_seq_states {
TCP_SEQ_STATE_LISTENING,
--
1.6.3.3
Signed-off-by: Andreas Petlund <apet...@simula.no>
---
Documentation/networking/ip-sysctl.txt | 12 ++++++++++++
include/linux/tcp.h | 4 +++-
include/net/tcp.h | 1 +
net/ipv4/sysctl_net_ipv4.c | 7 +++++++
net/ipv4/tcp.c | 7 +++++++
net/ipv4/tcp_input.c | 12 ++++++++++++
6 files changed, 42 insertions(+), 1 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index f147310..2571a62 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -499,6 +499,18 @@ tcp_thin_linear_timeouts - BOOLEAN
Documentation/networking/tcp-thin.txt
Default: 0
+tcp_thin_dupack - BOOLEAN
+ Enable dynamic triggering of retransmissions after one dupACK
+ for thin streams. If set, a check is performed upon reception
+ of a dupACK to determine if the stream is thin (less than 4
+ packets in flight). As long as the stream is found to be thin,
+ data is retransmitted on the first received dupACK. This
+ improves retransmission latency for non-aggressive thin
+ streams, often found to be time-dependent.
+ For more information on thin streams, see
+ Documentation/networking/tcp-thin.txt
+ Default: 0
+
UDP variables:
udp_mem - vector of 3 INTEGERs: min, pressure, max
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 3ba8b07..a778ee0 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -104,6 +104,7 @@ enum {
#define TCP_MD5SIG 14 /* TCP MD5 Signature (RFC2385) */
#define TCP_COOKIE_TRANSACTIONS 15 /* TCP Cookie Transactions */
#define TCP_THIN_LINEAR_TIMEOUTS 16 /* Use linear timeouts for thin streams*/
+#define TCP_THIN_DUPACK 17 /* Fast retrans. after 1 dupack */
/* for TCP_INFO socket option */
#define TCPI_OPT_TIMESTAMPS 1
@@ -343,7 +344,8 @@ struct tcp_sock {
u8 frto_counter; /* Number of new acks after RTO */
u8 nonagle : 4,/* Disable Nagle algorithm? */
thin_lto : 1,/* Use linear timeouts for thin streams */
- unused : 3;
+ thin_dupack : 1,/* Fast retransmit on first dupack */
+ unused : 2;
/* RTT measurement */
u32 srtt; /* smoothed round trip time << 3 */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6278fc7..56f0aec 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -245,6 +245,7 @@ extern int sysctl_tcp_slow_start_after_idle;
extern int sysctl_tcp_max_ssthresh;
extern int sysctl_tcp_cookie_size;
extern int sysctl_tcp_thin_linear_timeouts;
+extern int sysctl_tcp_thin_dupack;
extern atomic_t tcp_memory_allocated;
extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index e6a2460..c1bc074 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -582,6 +582,13 @@ static struct ctl_table ipv4_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec
},
+ {
+ .procname = "tcp_thin_dupack",
+ .data = &sysctl_tcp_thin_dupack,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec
+ },
{
.procname = "udp_mem",
.data = &sysctl_udp_mem,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 21bae9a..5901010 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2236,6 +2236,13 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
tp->thin_lto = val;
break;
+ case TCP_THIN_DUPACK:
+ if (val < 0 || val > 1)
+ err = -EINVAL;
+ else
+ tp->thin_dupack = val;
+ break;
+
case TCP_CORK:
/* When set indicates to always queue non-full frames.
* Later the user clears this option and we transmit
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3fddc69..8d950b9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -89,6 +89,8 @@ int sysctl_tcp_frto __read_mostly = 2;
int sysctl_tcp_frto_response __read_mostly;
int sysctl_tcp_nometrics_save __read_mostly;
+int sysctl_tcp_thin_dupack __read_mostly;
+
int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
int sysctl_tcp_abc __read_mostly;
@@ -2447,6 +2449,16 @@ static int tcp_time_to_recover(struct sock *sk)
return 1;
}
+ /* If a thin stream is detected, retransmit after first
+ * received dupack. Employ only if SACK is supported in order
+ * to avoid possible corner-case series of spurious retransmissions
+ * Use only if there are no unsent data.
+ */
+ if ((tp->thin_dupack || sysctl_tcp_thin_dupack) &&
+ tcp_stream_is_thin(tp) && tcp_dupack_heuristics(tp) > 1 &&
+ tcp_is_sack(tp) && sk->sk_send_head == NULL)
+ return 1;
+
return 0;
Signed-off-by: Andreas Petlund <apet...@simula.no>
---
Documentation/networking/ip-sysctl.txt | 12 ++++++++++++
include/linux/tcp.h | 5 ++++-
include/net/tcp.h | 4 ++++
net/ipv4/sysctl_net_ipv4.c | 7 +++++++
net/ipv4/tcp.c | 7 +++++++
net/ipv4/tcp_timer.c | 21 ++++++++++++++++++++-
6 files changed, 54 insertions(+), 2 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 2dc7a1d..f147310 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -487,6 +487,18 @@ tcp_dma_copybreak - INTEGER
and CONFIG_NET_DMA is enabled.
Default: 4096
+tcp_thin_linear_timeouts - BOOLEAN
+ Enable dynamic triggering of linear timeouts for thin streams.
+ If set, a check is performed upon retransmission by timeout to
+ determine if the stream is thin (less than 4 packets in flight).
+ As long as the stream is found to be thin, up to 6 linear
+ timeouts may be performed before exponential backoff mode is
+ initiated. This improves retransmission latency for
+ non-aggressive thin streams, often found to be time-dependent.
+ For more information on thin streams, see
+ Documentation/networking/tcp-thin.txt
+ Default: 0
+
UDP variables:
udp_mem - vector of 3 INTEGERs: min, pressure, max
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 7fee8a4..3ba8b07 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -103,6 +103,7 @@ enum {
#define TCP_CONGESTION 13 /* Congestion control algorithm */
#define TCP_MD5SIG 14 /* TCP MD5 Signature (RFC2385) */
#define TCP_COOKIE_TRANSACTIONS 15 /* TCP Cookie Transactions */
+#define TCP_THIN_LINEAR_TIMEOUTS 16 /* Use linear timeouts for thin streams*/
/* for TCP_INFO socket option */
#define TCPI_OPT_TIMESTAMPS 1
@@ -340,7 +341,9 @@ struct tcp_sock {
u32 frto_highmark; /* snd_nxt when RTO occurred */
u16 advmss; /* Advertised MSS */
u8 frto_counter; /* Number of new acks after RTO */
- u8 nonagle; /* Disable Nagle algorithm? */
+ u8 nonagle : 4,/* Disable Nagle algorithm? */
+ thin_lto : 1,/* Use linear timeouts for thin streams */
+ unused : 3;
/* RTT measurement */
u32 srtt; /* smoothed round trip time << 3 */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0bdc3f6..6278fc7 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -196,6 +196,9 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
#define TCP_NAGLE_CORK 2 /* Socket is corked */
#define TCP_NAGLE_PUSH 4 /* Cork is overridden for already queued data */
+/* TCP thin-stream limits */
+#define TCP_THIN_LINEAR_RETRIES 6 /* After 6 linear retries, do exp. backoff */
+
extern struct inet_timewait_death_row tcp_death_row;
/* sysctl variables for tcp */
@@ -241,6 +244,7 @@ extern int sysctl_tcp_workaround_signed_windows;
extern int sysctl_tcp_slow_start_after_idle;
extern int sysctl_tcp_max_ssthresh;
extern int sysctl_tcp_cookie_size;
+extern int sysctl_tcp_thin_linear_timeouts;
extern atomic_t tcp_memory_allocated;
extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 7e3712c..e6a2460 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -576,6 +576,13 @@ static struct ctl_table ipv4_table[] = {
.proc_handler = proc_dointvec
},
{
+ .procname = "tcp_thin_linear_timeouts",
+ .data = &sysctl_tcp_thin_linear_timeouts,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec
+ },
+ {
.procname = "udp_mem",
.data = &sysctl_udp_mem,
.maxlen = sizeof(sysctl_udp_mem),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e471d03..21bae9a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2229,6 +2229,13 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
}
break;
+ case TCP_THIN_LINEAR_TIMEOUTS:
+ if (val < 0 || val > 1)
+ err = -EINVAL;
+ else
+ tp->thin_lto = val;
+ break;
+
case TCP_CORK:
/* When set indicates to always queue non-full frames.
* Later the user clears this option and we transmit
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index de7d1bf..a17629b 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -29,6 +29,7 @@ int sysctl_tcp_keepalive_intvl __read_mostly = TCP_KEEPALIVE_INTVL;
int sysctl_tcp_retries1 __read_mostly = TCP_RETR1;
int sysctl_tcp_retries2 __read_mostly = TCP_RETR2;
int sysctl_tcp_orphan_retries __read_mostly;
+int sysctl_tcp_thin_linear_timeouts __read_mostly;
static void tcp_write_timer(unsigned long);
static void tcp_delack_timer(unsigned long);
@@ -415,7 +416,25 @@ void tcp_retransmit_timer(struct sock *sk)
icsk->icsk_retransmits++;
out_reset_timer:
- icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
+ /* If stream is thin, use linear timeouts. Since 'icsk_backoff' is
+ * used to reset timer, set to 0. Recalculate 'icsk_rto' as this
+ * might be increased if the stream oscillates between thin and thick,
+ * thus the old value might already be too high compared to the value
+ * set by 'tcp_set_rto' in tcp_input.c which resets the rto without
+ * backoff. Limit to TCP_THIN_LINEAR_RETRIES before initiating
+ * exponential backoff behaviour to avoid continue hammering
+ * linear-timeout retransmissions into a black hole
+ */
+ if (sk->sk_state == TCP_ESTABLISHED &&
+ (tp->thin_lto || sysctl_tcp_thin_linear_timeouts) &&
+ tcp_stream_is_thin(tp) &&
+ icsk->icsk_retransmits <= TCP_THIN_LINEAR_RETRIES) {
+ icsk->icsk_backoff = 0;
+ icsk->icsk_rto = min(__tcp_set_rto(tp), TCP_RTO_MAX);
+ } else {
+ /* Use normal (exponential) backoff */
+ icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
+ }
inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX);
if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1))
__sk_dst_reset(sk);
Use tcp_send_head(sk) instead.
> + return 1;
> +
> return 0;
> }
Other than that,
Acked-by: Ilpo Jᅵrvinen <ilpo.j...@helsinki.fi>
--
i.
index 3fddc69..788851c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -89,6 +89,8 @@ int sysctl_tcp_frto __read_mostly = 2;
int sysctl_tcp_frto_response __read_mostly;
int sysctl_tcp_nometrics_save __read_mostly;
+int sysctl_tcp_thin_dupack __read_mostly;
+
int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
int sysctl_tcp_abc __read_mostly;
@@ -2447,6 +2449,16 @@ static int tcp_time_to_recover(struct sock *sk)
return 1;
}
+ /* If a thin stream is detected, retransmit after first
+ * received dupack. Employ only if SACK is supported in order
+ * to avoid possible corner-case series of spurious retransmissions
+ * Use only if there are no unsent data.
+ */
+ if ((tp->thin_dupack || sysctl_tcp_thin_dupack) &&
+ tcp_stream_is_thin(tp) && tcp_dupack_heuristics(tp) > 1 &&
+ tcp_is_sack(tp) && !tcp_send_head(sk))
+ return 1;
+
return 0;
}
> +After analysing a large number of time-dependent interactive
> +applications, we have seen that they often produce thin streams
> +and also stay with this traffic pattern throughout its entire
> +lifespan. The combination of time-dependency and the fact that the
> +streams provoke high latencies when using TCP is unfortunate.
> +
> +In order to reduce application-layer latency when packets are lost,
> +a set of mechanisms has been made, which address these latency issues
> +for thin streams. In short, if the kernel detects a thin stream,
> +the retransmission mechanisms are modified in the following manner:
> +
> +1) If the stream is thin, fast retransmit on the first dupACK.
> +2) If the stream is thin, do not apply exponential backoff.
2) seems very dangerous/unfair. If network congestion is caused just
by thin streams, will the network just fall apart?
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> Hi!
>
>> +After analysing a large number of time-dependent interactive
>> +applications, we have seen that they often produce thin streams
>> +and also stay with this traffic pattern throughout its entire
>> +lifespan. The combination of time-dependency and the fact that the
>> +streams provoke high latencies when using TCP is unfortunate.
>> +
>> +In order to reduce application-layer latency when packets are lost,
>> +a set of mechanisms has been made, which address these latency issues
>> +for thin streams. In short, if the kernel detects a thin stream,
>> +the retransmission mechanisms are modified in the following manner:
>> +
>> +1) If the stream is thin, fast retransmit on the first dupACK.
>> +2) If the stream is thin, do not apply exponential backoff.
>
> 2) seems very dangerous/unfair. If network congestion is caused just
> by thin streams, will the network just fall apart?
and 1) can also be dangerous if we have reordering on the path.
I strongly suggest that we discuss Andreas' idea on IETF TCPM *before*
we integrate it in the kernel and enable it for everyone
Alex,
as an netdev reader and TCPM member
>
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majo...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22220
// email: zimme...@cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//
Agree with Alexander. It is not at all clear to me that these changes are safe for non-exerimental use in the wider Internet.
Lars
>> +1) If the stream is thin, fast retransmit on the first dupACK.
>> +2) If the stream is thin, do not apply exponential backoff.
>
>2) seems very dangerous/unfair. If network congestion is caused just
>by thin streams, will the network just fall apart?
I question the fairness of this modification! The modification sounds _really_
far-reaching. Did you analyse the behaviour of this modification? What about
fairness analysis? Great work is available in the network simulator sector for
this kind of analysis (e.g. jain's fairness index).
HGN
--
Hagen Paul Pfeifer <ha...@jauu.net> || http://jauu.net/
Telephone: +49 174 5455209 || Key Id: 0x98350C22
Key Fingerprint: 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22
>
> Am 21.02.2010 um 11:21 schrieb Pavel Machek:
>
> > Hi!
> >
> >> +After analysing a large number of time-dependent interactive
> >> +applications, we have seen that they often produce thin streams
> >> +and also stay with this traffic pattern throughout its entire
> >> +lifespan. The combination of time-dependency and the fact that the
> >> +streams provoke high latencies when using TCP is unfortunate.
> >> +
> >> +In order to reduce application-layer latency when packets are lost,
> >> +a set of mechanisms has been made, which address these latency issues
> >> +for thin streams. In short, if the kernel detects a thin stream,
> >> +the retransmission mechanisms are modified in the following manner:
> >> +
> >> +1) If the stream is thin, fast retransmit on the first dupACK.
> >> +2) If the stream is thin, do not apply exponential backoff.
> >
> > 2) seems very dangerous/unfair. If network congestion is caused just
> > by thin streams, will the network just fall apart?
>
> and 1) can also be dangerous if we have reordering on the path.
>
> I strongly suggest that we discuss Andreas' idea on IETF TCPM *before*
> we integrate it in the kernel and enable it for everyone
What difference you see with 1) and early rexmit when cwnd = 2, the latter
being afaict "discussed already" on TCPM?
--
i.
> Hi!
>
> > +After analysing a large number of time-dependent interactive
> > +applications, we have seen that they often produce thin streams
> > +and also stay with this traffic pattern throughout its entire
> > +lifespan. The combination of time-dependency and the fact that the
> > +streams provoke high latencies when using TCP is unfortunate.
> > +
> > +In order to reduce application-layer latency when packets are lost,
> > +a set of mechanisms has been made, which address these latency issues
> > +for thin streams. In short, if the kernel detects a thin stream,
> > +the retransmission mechanisms are modified in the following manner:
> > +
> > +1) If the stream is thin, fast retransmit on the first dupACK.
> > +2) If the stream is thin, do not apply exponential backoff.
>
> 2) seems very dangerous/unfair. If network congestion is caused just
> by thin streams, will the network just fall apart?
...Network will not fall apart. Two possible cases:
a) cwnd > 1 segment, on RTO the window is reduced, thus the sender backs
off regardless of exponential backoff.
b) All flows have window of 1 already... Well, what do you suggest? I'd
say that obviously the network is seriously underprovisioned for the
workload and since the bottleneck can only serve some of them anyway
dropping the rest, no useless work gets done at the bottleneck. RTT
estimator then gets a new samples whenever a flow belongs into the lucky
group. In case any unnecessary retransmission happened (if there was
very dramatic and sudden increase transient in the workload) they won't
happen thereafter (unless we increase the workload toward infinity).
...Thus no problem of "falling apart".
--
i.
> Agree with Alexander. It is not at all clear to me that these
> changes are safe for non-exerimental use in the wider Internet.
It's off by default and we provide all sorts of experimental
congestion control algorithms which can cause just as much
if not more trouble than these thin-stream bits.
There is no problem integrating these changes.
> If it's off by default there is no issue.
Still a sufficient large number of people might turn it on
and then the network would melt.
IMHO TCP changes with badly understood congestion behaviour
should not be merged until the necessary analysis/simulation etc.
is done.
-Andi
--
a...@linux.intel.com -- Speaking for myself only.
> Lars Eggert <lars....@nokia.com> writes:
>
>> If it's off by default there is no issue.
>
> Still a sufficient large number of people might turn it on
> and then the network would melt.
>
> IMHO TCP changes with badly understood congestion behaviour
> should not be merged until the necessary analysis/simulation etc.
> is done.
>
> -Andi
100% ack.
See for example here: http://www.umic-mesh.net/~zimmermann/linux.png
It's a screenshot of the ps3mediaserver. Since it's a streaming server, it's
very useful to deactivate Nagle :-)
Alex
>
> --
> a...@linux.intel.com -- Speaking for myself only.
//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22220
// email: zimme...@cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//
--
Graphs from our experiments can be found here:
http://simula.no/research/nd/publications/Simula.nd.477/simula_pdf_file
Section 5.2.5 covers experiments on fairness-issues. We have found no
adverse effects of the modifications submitted in this patch set (mFR
and LT in the plot), but further testing cannot hurt.
Best regards,
Andreas
I also encourage you to read the discussions from the first posted
version of the patch set, since I here reference a lot of the
supporting research and related material from other researchers.
http://thread.gmane.org/gmane.linux.network/141394
http://thread.gmane.org/gmane.linux.network/141395
http://thread.gmane.org/gmane.linux.network/141396
http://thread.gmane.org/gmane.linux.network/141397
Cheers,
Andreas
Then we should apply the same policy as used for the congestion control
algorithms!? Restrict the usage to change the standard behavior via Kconfig
(e.g. menuconfig TCP_CONG_ADVANCED) or even more restricted demand for
CAP_NET_ADMIN.
Hagen
> Lars Eggert <lars....@nokia.com> writes:
>
>> If it's off by default there is no issue.
>
> Still a sufficient large number of people might turn it on
> and then the network would melt.
And they can do the same by changing the initial congestion window
setting, which we also support.
The foundation of your argument is as solid as quicksand.