
[net-next PATCH v4 1/3] net: TCP thin-stream detection


Andreas Petlund

Feb 16, 2010, 9:50:01 AM

Major changes: Added thin-stream info in new file:
Documentation/networking/tcp-thin.txt

Signed-off-by: Andreas Petlund <apet...@simula.no>
---
Documentation/networking/tcp-thin.txt | 47 +++++++++++++++++++++++++++++++++
include/net/tcp.h | 7 +++++
2 files changed, 54 insertions(+), 0 deletions(-)
create mode 100644 Documentation/networking/tcp-thin.txt

diff --git a/Documentation/networking/tcp-thin.txt b/Documentation/networking/tcp-thin.txt
new file mode 100644
index 0000000..151e229
--- /dev/null
+++ b/Documentation/networking/tcp-thin.txt
@@ -0,0 +1,47 @@
+Thin-streams and TCP
+====================
+A wide range of Internet-based services that use reliable transport
+protocols display what we call thin-stream properties. This means
+that the application sends data with such a low rate that the
+retransmission mechanisms of the transport protocol are not fully
+effective. In time-dependent scenarios (like online games, control
+systems, stock trading etc.) where the user experience depends
+on the data delivery latency, packet loss can be devastating for
+the service quality. Extreme latencies are caused by TCP's
+dependency on the arrival of new data from the application to trigger
+retransmissions effectively through fast retransmit instead of
+waiting for long timeouts.
+
+After analysing a large number of time-dependent interactive
+applications, we have seen that they often produce thin streams
+and also stay with this traffic pattern throughout its entire
+lifespan. The combination of time-dependency and the fact that the
+streams provoke high latencies when using TCP is unfortunate.
+
+In order to reduce application-layer latency when packets are lost,
+a set of mechanisms has been made, which address these latency issues
+for thin streams. In short, if the kernel detects a thin stream,
+the retransmission mechanisms are modified in the following manner:
+
+1) If the stream is thin, fast retransmit on the first dupACK.
+2) If the stream is thin, do not apply exponential backoff.
+
+These enhancements are applied only if the stream is detected as
+thin. This is accomplished by defining a threshold for the number
+of packets in flight. If there are less than 4 packets in flight,
+fast retransmissions can not be triggered, and the stream is prone
+to experience high retransmission latencies.
+
+Since these mechanisms are targeted at time-dependent applications,
+they must be specifically activated by the application using the
+TCP_THIN_LINEAR_TIMEOUTS and TCP_THIN_DUPACK IOCTLS or the
+tcp_thin_linear_timeouts and tcp_thin_dupack sysctls. Both
+modifications are turned off by default.
+
+References
+==========
+More information on the modifications, as well as a wide range of
+experimental data can be found here:
+"Improving latency for interactive, thin-stream applications over
+reliable transport"
+http://simula.no/research/nd/publications/Simula.nd.477/simula_pdf_file
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 87d164b..e5e2056 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1386,6 +1386,13 @@ static inline void tcp_highest_sack_combine(struct sock *sk,
tcp_sk(sk)->highest_sack = new;
}

+/* Determines whether this is a thin stream (which may suffer from
+ * increased latency). Used to trigger latency-reducing mechanisms.*/
+static inline unsigned int tcp_stream_is_thin(struct tcp_sock *tp)
+{
+ return tp->packets_out < 4 && !tcp_in_initial_slowstart(tp);
+}
+
/* /proc */
enum tcp_seq_states {
TCP_SEQ_STATE_LISTENING,
--
1.6.3.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
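
The per-application activation described in tcp-thin.txt can be sketched from user space. This is a minimal sketch, assuming a kernel that also carries patches 2/3 and 3/3 of this series, and assuming the options are set with setsockopt() at the IPPROTO_TCP level, as the do_tcp_setsockopt() hunks in those patches suggest. The helper name enable_thin_stream() is hypothetical, and the fallback option numbers are copied from the patches for the case where the libc headers do not define them.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Option numbers as introduced by this patch series. Fall back to the
 * values from the patches if the installed headers lack them.
 */
#ifndef TCP_THIN_LINEAR_TIMEOUTS
#define TCP_THIN_LINEAR_TIMEOUTS 16
#endif
#ifndef TCP_THIN_DUPACK
#define TCP_THIN_DUPACK 17
#endif

/* Hypothetical helper: request both thin-stream mechanisms on one TCP
 * socket. Returns 0 on success, -1 on failure (errno set by setsockopt).
 */
static int enable_thin_stream(int fd)
{
	int on = 1;

	if (setsockopt(fd, IPPROTO_TCP, TCP_THIN_LINEAR_TIMEOUTS,
		       &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_THIN_DUPACK,
		       &on, sizeof(on)) < 0)
		return -1;
	return 0;
}
```

Setting the options only requests the behaviour. Whether it is applied at any given moment still depends on the kernel classifying the stream as thin.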

David Miller

Feb 17, 2010, 7:40:01 PM

From: Andreas Petlund <apet...@simula.no>
Date: Tue, 16 Feb 2010 15:40:32 +0100

>
> +/* Determines whether this is a thin stream (which may suffer from
> + * increased latency). Used to trigger latency-reducing mechanisms.*/
> +static inline unsigned int tcp_stream_is_thin(struct tcp_sock *tp)
> +{
> + return tp->packets_out < 4 && !tcp_in_initial_slowstart(tp);
> +}
> +
> /* /proc */

Please format comments:

/* Like this. */

or:

/* Like this.
* And this.
*/

Thanks.

Andreas Petlund

Feb 18, 2010, 7:50:02 AM

Inline function to dynamically detect thin streams based on
the number of packets in flight. Used to dynamically trigger
thin-stream mechanisms if enabled by ioctl or sysctl.

Signed-off-by: Andreas Petlund <apet...@simula.no>
---
Documentation/networking/tcp-thin.txt | 47 +++++++++++++++++++++++++++++++++
include/net/tcp.h | 8 +++++
2 files changed, 55 insertions(+), 0 deletions(-)
create mode 100644 Documentation/networking/tcp-thin.txt

index 75a00c8..0bdc3f6 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1386,6 +1386,14 @@ static inline void tcp_highest_sack_combine(struct sock *sk,
tcp_sk(sk)->highest_sack = new;
}


+/* Determines whether this is a thin stream (which may suffer from
+ * increased latency). Used to trigger latency-reducing mechanisms.
+ */
+static inline unsigned int tcp_stream_is_thin(struct tcp_sock *tp)
+{
+ return tp->packets_out < 4 && !tcp_in_initial_slowstart(tp);
+}
+
/* /proc */
enum tcp_seq_states {
TCP_SEQ_STATE_LISTENING,
--
1.6.3.3
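
The detection rule in tcp_stream_is_thin() can be exercised outside the kernel with a small model. This is only a sketch under stated assumptions: struct flow_state is a hypothetical, reduced stand-in for struct tcp_sock, and in_initial_slowstart() assumes that the kernel helper tcp_in_initial_slowstart() means "ssthresh has never been lowered from its initial, effectively infinite value".

```c
#include <stdbool.h>
#include <stdint.h>

#define THIN_PACKETS_LIMIT 4           /* threshold used in the patch */
#define INFINITE_SSTHRESH  0x7fffffffu /* assumed "never reduced" marker */

/* Hypothetical, reduced stand-in for struct tcp_sock. */
struct flow_state {
	uint32_t packets_out;  /* segments currently in flight */
	uint32_t snd_ssthresh; /* slow-start threshold */
};

/* Assumption: initial slow start lasts until ssthresh is first lowered. */
static bool in_initial_slowstart(const struct flow_state *fs)
{
	return fs->snd_ssthresh >= INFINITE_SSTHRESH;
}

/* Mirrors the patch: thin means fewer than 4 packets in flight, but a
 * connection still in its initial slow start is never treated as thin.
 */
static bool stream_is_thin(const struct flow_state *fs)
{
	return fs->packets_out < THIN_PACKETS_LIMIT &&
	       !in_initial_slowstart(fs);
}
```

The slow-start exclusion matters because every connection begins with few packets in flight. Without it, a bulk transfer would be classified as thin during its first round trips.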

Andreas Petlund

Feb 18, 2010, 7:50:03 AM

This patch enables fast retransmissions after one dupACK for
TCP if the stream is identified as thin. This will reduce
latencies for thin streams that are not able to trigger fast
retransmissions due to high packet interarrival time. This
mechanism is only active if enabled by iocontrol or syscontrol
and the stream is identified as thin.


Signed-off-by: Andreas Petlund <apet...@simula.no>
---
Documentation/networking/ip-sysctl.txt | 12 ++++++++++++
include/linux/tcp.h | 4 +++-
include/net/tcp.h | 1 +
net/ipv4/sysctl_net_ipv4.c | 7 +++++++
net/ipv4/tcp.c | 7 +++++++
net/ipv4/tcp_input.c | 12 ++++++++++++
6 files changed, 42 insertions(+), 1 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index f147310..2571a62 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -499,6 +499,18 @@ tcp_thin_linear_timeouts - BOOLEAN
Documentation/networking/tcp-thin.txt
Default: 0

+tcp_thin_dupack - BOOLEAN
+ Enable dynamic triggering of retransmissions after one dupACK
+ for thin streams. If set, a check is performed upon reception
+ of a dupACK to determine if the stream is thin (less than 4
+ packets in flight). As long as the stream is found to be thin,
+ data is retransmitted on the first received dupACK. This
+ improves retransmission latency for non-aggressive thin
+ streams, often found to be time-dependent.
+ For more information on thin streams, see
+ Documentation/networking/tcp-thin.txt
+ Default: 0
+
UDP variables:

udp_mem - vector of 3 INTEGERs: min, pressure, max
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 3ba8b07..a778ee0 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -104,6 +104,7 @@ enum {
#define TCP_MD5SIG 14 /* TCP MD5 Signature (RFC2385) */
#define TCP_COOKIE_TRANSACTIONS 15 /* TCP Cookie Transactions */
#define TCP_THIN_LINEAR_TIMEOUTS 16 /* Use linear timeouts for thin streams*/
+#define TCP_THIN_DUPACK 17 /* Fast retrans. after 1 dupack */

/* for TCP_INFO socket option */
#define TCPI_OPT_TIMESTAMPS 1
@@ -343,7 +344,8 @@ struct tcp_sock {
u8 frto_counter; /* Number of new acks after RTO */
u8 nonagle : 4,/* Disable Nagle algorithm? */
thin_lto : 1,/* Use linear timeouts for thin streams */
- unused : 3;
+ thin_dupack : 1,/* Fast retransmit on first dupack */
+ unused : 2;

/* RTT measurement */
u32 srtt; /* smoothed round trip time << 3 */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6278fc7..56f0aec 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -245,6 +245,7 @@ extern int sysctl_tcp_slow_start_after_idle;
extern int sysctl_tcp_max_ssthresh;
extern int sysctl_tcp_cookie_size;
extern int sysctl_tcp_thin_linear_timeouts;
+extern int sysctl_tcp_thin_dupack;

extern atomic_t tcp_memory_allocated;
extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index e6a2460..c1bc074 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -582,6 +582,13 @@ static struct ctl_table ipv4_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec
},
+ {
+ .procname = "tcp_thin_dupack",
+ .data = &sysctl_tcp_thin_dupack,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec
+ },
{
.procname = "udp_mem",
.data = &sysctl_udp_mem,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 21bae9a..5901010 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2236,6 +2236,13 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
tp->thin_lto = val;
break;

+ case TCP_THIN_DUPACK:
+ if (val < 0 || val > 1)
+ err = -EINVAL;
+ else
+ tp->thin_dupack = val;
+ break;
+
case TCP_CORK:
/* When set indicates to always queue non-full frames.
* Later the user clears this option and we transmit
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3fddc69..8d950b9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -89,6 +89,8 @@ int sysctl_tcp_frto __read_mostly = 2;
int sysctl_tcp_frto_response __read_mostly;
int sysctl_tcp_nometrics_save __read_mostly;

+int sysctl_tcp_thin_dupack __read_mostly;
+
int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
int sysctl_tcp_abc __read_mostly;

@@ -2447,6 +2449,16 @@ static int tcp_time_to_recover(struct sock *sk)
return 1;
}

+ /* If a thin stream is detected, retransmit after first
+ * received dupack. Employ only if SACK is supported in order
+ * to avoid possible corner-case series of spurious retransmissions
+ * Use only if there are no unsent data.
+ */
+ if ((tp->thin_dupack || sysctl_tcp_thin_dupack) &&
+ tcp_stream_is_thin(tp) && tcp_dupack_heuristics(tp) > 1 &&
+ tcp_is_sack(tp) && sk->sk_send_head == NULL)
+ return 1;
+
return 0;

Andreas Petlund

Feb 18, 2010, 7:50:03 AM

This patch will make TCP use only linear timeouts if the
stream is thin. This will help to avoid the very high latencies
that thin streams suffer because of exponential backoff. This
mechanism is only active if enabled by iocontrol or syscontrol
and the stream is identified as thin. A maximum of 6 linear
timeouts is tried before exponential backoff is resumed.

Signed-off-by: Andreas Petlund <apet...@simula.no>
---
Documentation/networking/ip-sysctl.txt | 12 ++++++++++++
include/linux/tcp.h | 5 ++++-
include/net/tcp.h | 4 ++++
net/ipv4/sysctl_net_ipv4.c | 7 +++++++
net/ipv4/tcp.c | 7 +++++++
net/ipv4/tcp_timer.c | 21 ++++++++++++++++++++-
6 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 2dc7a1d..f147310 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -487,6 +487,18 @@ tcp_dma_copybreak - INTEGER
and CONFIG_NET_DMA is enabled.
Default: 4096

+tcp_thin_linear_timeouts - BOOLEAN
+ Enable dynamic triggering of linear timeouts for thin streams.
+ If set, a check is performed upon retransmission by timeout to
+ determine if the stream is thin (less than 4 packets in flight).
+ As long as the stream is found to be thin, up to 6 linear
+ timeouts may be performed before exponential backoff mode is
+ initiated. This improves retransmission latency for
+ non-aggressive thin streams, often found to be time-dependent.
+ For more information on thin streams, see
+ Documentation/networking/tcp-thin.txt
+ Default: 0
+
UDP variables:

udp_mem - vector of 3 INTEGERs: min, pressure, max
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 7fee8a4..3ba8b07 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -103,6 +103,7 @@ enum {
#define TCP_CONGESTION 13 /* Congestion control algorithm */
#define TCP_MD5SIG 14 /* TCP MD5 Signature (RFC2385) */
#define TCP_COOKIE_TRANSACTIONS 15 /* TCP Cookie Transactions */
+#define TCP_THIN_LINEAR_TIMEOUTS 16 /* Use linear timeouts for thin streams*/

/* for TCP_INFO socket option */
#define TCPI_OPT_TIMESTAMPS 1

@@ -340,7 +341,9 @@ struct tcp_sock {
u32 frto_highmark; /* snd_nxt when RTO occurred */
u16 advmss; /* Advertised MSS */
u8 frto_counter; /* Number of new acks after RTO */
- u8 nonagle; /* Disable Nagle algorithm? */
+ u8 nonagle : 4,/* Disable Nagle algorithm? */
+ thin_lto : 1,/* Use linear timeouts for thin streams */
+ unused : 3;

/* RTT measurement */
u32 srtt; /* smoothed round trip time << 3 */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0bdc3f6..6278fc7 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -196,6 +196,9 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
#define TCP_NAGLE_CORK 2 /* Socket is corked */
#define TCP_NAGLE_PUSH 4 /* Cork is overridden for already queued data */

+/* TCP thin-stream limits */
+#define TCP_THIN_LINEAR_RETRIES 6 /* After 6 linear retries, do exp. backoff */
+
extern struct inet_timewait_death_row tcp_death_row;

/* sysctl variables for tcp */
@@ -241,6 +244,7 @@ extern int sysctl_tcp_workaround_signed_windows;
extern int sysctl_tcp_slow_start_after_idle;
extern int sysctl_tcp_max_ssthresh;
extern int sysctl_tcp_cookie_size;
+extern int sysctl_tcp_thin_linear_timeouts;

extern atomic_t tcp_memory_allocated;
extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 7e3712c..e6a2460 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -576,6 +576,13 @@ static struct ctl_table ipv4_table[] = {
.proc_handler = proc_dointvec
},
{
+ .procname = "tcp_thin_linear_timeouts",
+ .data = &sysctl_tcp_thin_linear_timeouts,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec
+ },
+ {
.procname = "udp_mem",
.data = &sysctl_udp_mem,
.maxlen = sizeof(sysctl_udp_mem),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e471d03..21bae9a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2229,6 +2229,13 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
}
break;

+ case TCP_THIN_LINEAR_TIMEOUTS:
+ if (val < 0 || val > 1)
+ err = -EINVAL;
+ else
+ tp->thin_lto = val;
+ break;
+
case TCP_CORK:
/* When set indicates to always queue non-full frames.
* Later the user clears this option and we transmit
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index de7d1bf..a17629b 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -29,6 +29,7 @@ int sysctl_tcp_keepalive_intvl __read_mostly = TCP_KEEPALIVE_INTVL;
int sysctl_tcp_retries1 __read_mostly = TCP_RETR1;
int sysctl_tcp_retries2 __read_mostly = TCP_RETR2;
int sysctl_tcp_orphan_retries __read_mostly;
+int sysctl_tcp_thin_linear_timeouts __read_mostly;

static void tcp_write_timer(unsigned long);
static void tcp_delack_timer(unsigned long);
@@ -415,7 +416,25 @@ void tcp_retransmit_timer(struct sock *sk)
icsk->icsk_retransmits++;

out_reset_timer:
- icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
+ /* If stream is thin, use linear timeouts. Since 'icsk_backoff' is
+ * used to reset timer, set to 0. Recalculate 'icsk_rto' as this
+ * might be increased if the stream oscillates between thin and thick,
+ * thus the old value might already be too high compared to the value
+ * set by 'tcp_set_rto' in tcp_input.c which resets the rto without
+ * backoff. Limit to TCP_THIN_LINEAR_RETRIES before initiating
+ * exponential backoff behaviour to avoid continue hammering
+ * linear-timeout retransmissions into a black hole
+ */
+ if (sk->sk_state == TCP_ESTABLISHED &&
+ (tp->thin_lto || sysctl_tcp_thin_linear_timeouts) &&
+ tcp_stream_is_thin(tp) &&
+ icsk->icsk_retransmits <= TCP_THIN_LINEAR_RETRIES) {
+ icsk->icsk_backoff = 0;
+ icsk->icsk_rto = min(__tcp_set_rto(tp), TCP_RTO_MAX);
+ } else {
+ /* Use normal (exponential) backoff */
+ icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
+ }
inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX);
if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1))
__sk_dst_reset(sk);
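
The effect of the out_reset_timer change on the retransmission schedule can be illustrated with a small model. This is a sketch, not the kernel code: next_rto_ms() is a hypothetical function, base_rto_ms stands in for the value __tcp_set_rto() would derive from the RTT estimate, linear_active stands in for the combined condition (established socket, mechanism enabled, stream currently thin), and the 120 second cap is an assumed stand-in for TCP_RTO_MAX.

```c
#include <stdbool.h>
#include <stdint.h>

#define THIN_LINEAR_RETRIES 6       /* TCP_THIN_LINEAR_RETRIES in the patch */
#define RTO_MAX_MS          120000u /* assumed cap, stands in for TCP_RTO_MAX */

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Returns the RTO to arm after a retransmission timeout.
 * cur_rto_ms:    RTO of the timer that just expired
 * base_rto_ms:   RTO derived from the RTT estimate, without backoff
 * retransmits:   number of timeouts so far, including this one
 * linear_active: mechanism enabled and stream currently thin
 */
static uint32_t next_rto_ms(uint32_t cur_rto_ms, uint32_t base_rto_ms,
			    unsigned int retransmits, bool linear_active)
{
	/* Linear phase: keep the un-backed-off RTO for the first 6 retries. */
	if (linear_active && retransmits <= THIN_LINEAR_RETRIES)
		return min_u32(base_rto_ms, RTO_MAX_MS);

	/* Normal (exponential) backoff, guarded against overflow. */
	if (cur_rto_ms >= RTO_MAX_MS / 2)
		return RTO_MAX_MS;
	return cur_rto_ms * 2;
}
```

With a 200 ms base RTO, the model keeps the timer at 200 ms for retransmissions 1 through 6 and then doubles it from there (400 ms, 800 ms, and so on). Plain exponential backoff would already be at 400 ms after the first timeout.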

Ilpo Järvinen

Feb 18, 2010, 8:00:02 AM

Use tcp_send_head(sk) instead.

> + return 1;
> +
> return 0;
> }

Other than that,

Acked-by: Ilpo Järvinen <ilpo.j...@helsinki.fi>

--
i.

Andreas Petlund

Feb 18, 2010, 9:50:02 AM

index 3fddc69..788851c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -89,6 +89,8 @@ int sysctl_tcp_frto __read_mostly = 2;
int sysctl_tcp_frto_response __read_mostly;
int sysctl_tcp_nometrics_save __read_mostly;

+int sysctl_tcp_thin_dupack __read_mostly;
+
int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
int sysctl_tcp_abc __read_mostly;

@@ -2447,6 +2449,16 @@ static int tcp_time_to_recover(struct sock *sk)
return 1;
}

+ /* If a thin stream is detected, retransmit after first
+ * received dupack. Employ only if SACK is supported in order
+ * to avoid possible corner-case series of spurious retransmissions
+ * Use only if there are no unsent data.
+ */
+ if ((tp->thin_dupack || sysctl_tcp_thin_dupack) &&
+ tcp_stream_is_thin(tp) && tcp_dupack_heuristics(tp) > 1 &&
+ tcp_is_sack(tp) && !tcp_send_head(sk))
+ return 1;
+
return 0;
}
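
The condition added to tcp_time_to_recover() combines five checks, which can be read more easily as a plain predicate. The sketch below is a model only: struct dupack_input and thin_fast_retransmit() are hypothetical names, and each field is an assumed stand-in for the kernel expression named in its comment.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the inputs to the check in the hunk above. */
struct dupack_input {
	bool thin_dupack_enabled;     /* tp->thin_dupack || sysctl_tcp_thin_dupack */
	bool stream_is_thin;          /* tcp_stream_is_thin(tp) */
	bool sack_enabled;            /* tcp_is_sack(tp) */
	bool unsent_data_queued;      /* tcp_send_head(sk) != NULL */
	unsigned int dupack_estimate; /* tcp_dupack_heuristics(tp) */
};

/* True when the model would enter recovery on this dupACK. */
static bool thin_fast_retransmit(const struct dupack_input *in)
{
	return in->thin_dupack_enabled &&
	       in->stream_is_thin &&
	       in->dupack_estimate > 1 &&
	       in->sack_enabled &&
	       !in->unsent_data_queued;
}
```

The last two checks narrow the mechanism: it is skipped without SACK, and it is skipped while unsent data is still queued, since new data can itself generate the dupACKs needed for an ordinary fast retransmit.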

Pavel Machek

Feb 21, 2010, 5:30:02 AM

Hi!

> +After analysing a large number of time-dependent interactive
> +applications, we have seen that they often produce thin streams
> +and also stay with this traffic pattern throughout its entire
> +lifespan. The combination of time-dependency and the fact that the
> +streams provoke high latencies when using TCP is unfortunate.
> +
> +In order to reduce application-layer latency when packets are lost,
> +a set of mechanisms has been made, which address these latency issues
> +for thin streams. In short, if the kernel detects a thin stream,
> +the retransmission mechanisms are modified in the following manner:
> +
> +1) If the stream is thin, fast retransmit on the first dupACK.
> +2) If the stream is thin, do not apply exponential backoff.

2) seems very dangerous/unfair. If network congestion is caused just
by thin streams, will the network just fall apart?


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Alexander Zimmermann

Feb 21, 2010, 6:30:01 AM

Am 21.02.2010 um 11:21 schrieb Pavel Machek:

> Hi!
>
>> +After analysing a large number of time-dependent interactive
>> +applications, we have seen that they often produce thin streams
>> +and also stay with this traffic pattern throughout its entire
>> +lifespan. The combination of time-dependency and the fact that the
>> +streams provoke high latencies when using TCP is unfortunate.
>> +
>> +In order to reduce application-layer latency when packets are lost,
>> +a set of mechanisms has been made, which address these latency issues
>> +for thin streams. In short, if the kernel detects a thin stream,
>> +the retransmission mechanisms are modified in the following manner:
>> +
>> +1) If the stream is thin, fast retransmit on the first dupACK.
>> +2) If the stream is thin, do not apply exponential backoff.
>
> 2) seems very dangerous/unfair. If network congestion is caused just
> by thin streams, will the network just fall apart?

and 1) can also be dangerous if we have reordering on the path.

I strongly suggest that we discuss Andreas' idea on IETF TCPM *before*
we integrate it in the kernel and enable it for everyone

Alex,

as a netdev reader and TCPM member


//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22220
// email: zimme...@cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//

Lars Eggert

Feb 21, 2010, 3:10:01 PM

On 2010-2-21, at 13:23, Alexander Zimmermann wrote:
> I strongly suggest that we discuss Andreas' idea on IETF TCPM *before*
> we integrate it in the kernel and enable it for everyone

Agree with Alexander. It is not at all clear to me that these changes are safe for non-experimental use in the wider Internet.

Lars

Hagen Paul Pfeifer

Feb 21, 2010, 5:20:03 PM

* Pavel Machek | 2010-02-21 11:21:03 [+0100]:

>> +1) If the stream is thin, fast retransmit on the first dupACK.
>> +2) If the stream is thin, do not apply exponential backoff.
>
>2) seems very dangerous/unfair. If network congestion is caused just
>by thin streams, will the network just fall apart?

I question the fairness of this modification! The modification sounds _really_
far-reaching. Did you analyse the behaviour of this modification? What about
fairness analysis? Great work is available in the network simulator sector for
this kind of analysis (e.g. jain's fairness index).


HGN

--
Hagen Paul Pfeifer <ha...@jauu.net> || http://jauu.net/
Telephone: +49 174 5455209 || Key Id: 0x98350C22
Key Fingerprint: 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22
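
For readers unfamiliar with the metric mentioned above, Jain's fairness index for n flows with throughputs x_1..x_n is (sum of x_i)^2 / (n * sum of x_i^2). It is 1 when all flows get equal shares and 1/n when a single flow takes everything. A minimal sketch follows; the function name jain_index() and the choice to return 0.0 for an empty or all-zero input are assumptions of this example.

```c
#include <stddef.h>

/* Jain's fairness index: (sum x)^2 / (n * sum x^2).
 * Returns 0.0 for an empty or all-zero input (a convention chosen here).
 */
static double jain_index(const double *x, size_t n)
{
	double sum = 0.0, sum_sq = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		sum += x[i];
		sum_sq += x[i] * x[i];
	}
	if (n == 0 || sum_sq == 0.0)
		return 0.0;
	return (sum * sum) / ((double)n * sum_sq);
}
```

Applied to per-flow throughput measured with and without the thin-stream options enabled, a drop in the index would indicate that the modified flows are taking bandwidth from competing flows.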

Ilpo Järvinen

Feb 21, 2010, 5:40:02 PM

On Sun, 21 Feb 2010, Alexander Zimmermann wrote:

>
> Am 21.02.2010 um 11:21 schrieb Pavel Machek:
>
> > Hi!
> >
> >> +After analysing a large number of time-dependent interactive
> >> +applications, we have seen that they often produce thin streams
> >> +and also stay with this traffic pattern throughout its entire
> >> +lifespan. The combination of time-dependency and the fact that the
> >> +streams provoke high latencies when using TCP is unfortunate.
> >> +
> >> +In order to reduce application-layer latency when packets are lost,
> >> +a set of mechanisms has been made, which address these latency issues
> >> +for thin streams. In short, if the kernel detects a thin stream,
> >> +the retransmission mechanisms are modified in the following manner:
> >> +
> >> +1) If the stream is thin, fast retransmit on the first dupACK.
> >> +2) If the stream is thin, do not apply exponential backoff.
> >
> > 2) seems very dangerous/unfair. If network congestion is caused just
> > by thin streams, will the network just fall apart?
>
> and 1) can also be dangerous if we have reordering on the path.
>
> I strongly suggest that we discuss Andreas' idea on IETF TCPM *before*
> we integrate it in the kernel and enable it for everyone

What difference do you see between 1) and early rexmit when cwnd = 2, the latter
being afaict "discussed already" on TCPM?

--
i.

Ilpo Järvinen

Feb 21, 2010, 5:40:03 PM

On Sun, 21 Feb 2010, Pavel Machek wrote:

> Hi!
>
> > +After analysing a large number of time-dependent interactive
> > +applications, we have seen that they often produce thin streams
> > +and also stay with this traffic pattern throughout its entire
> > +lifespan. The combination of time-dependency and the fact that the
> > +streams provoke high latencies when using TCP is unfortunate.
> > +
> > +In order to reduce application-layer latency when packets are lost,
> > +a set of mechanisms has been made, which address these latency issues
> > +for thin streams. In short, if the kernel detects a thin stream,
> > +the retransmission mechanisms are modified in the following manner:
> > +
> > +1) If the stream is thin, fast retransmit on the first dupACK.
> > +2) If the stream is thin, do not apply exponential backoff.
>
> 2) seems very dangerous/unfair. If network congestion is caused just
> by thin streams, will the network just fall apart?

...Network will not fall apart. Two possible cases:

a) cwnd > 1 segment, on RTO the window is reduced, thus the sender backs
off regardless of exponential backoff.
b) All flows have window of 1 already... Well, what do you suggest? I'd
say that obviously the network is seriously underprovisioned for the
workload and since the bottleneck can only serve some of them anyway
dropping the rest, no useless work gets done at the bottleneck. The RTT
estimator then gets new samples whenever a flow belongs to the lucky
group. In case any unnecessary retransmissions happened (if there was a
very dramatic and sudden transient increase in the workload) they won't
happen thereafter (unless we increase the workload toward infinity).

...Thus no problem of "falling apart".

--
i.

David Miller

Feb 21, 2010, 6:10:01 PM

From: Lars Eggert <lars....@nokia.com>
Date: Sun, 21 Feb 2010 22:04:46 +0200

> Agree with Alexander. It is not at all clear to me that these
> changes are safe for non-experimental use in the wider Internet.

It's off by default and we provide all sorts of experimental
congestion control algorithms which can cause just as much
if not more trouble than these thin-stream bits.

There is no problem integrating these changes.

Lars Eggert

Feb 22, 2010, 1:50:01 AM

If it's off by default there is no issue.

Andi Kleen

Feb 22, 2010, 10:20:01 AM

Lars Eggert <lars....@nokia.com> writes:

> If it's off by default there is no issue.

Still a sufficient large number of people might turn it on
and then the network would melt.

IMHO TCP changes with badly understood congestion behaviour
should not be merged until the necessary analysis/simulation etc.
is done.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.

Alexander Zimmermann

Feb 22, 2010, 10:40:02 AM

Am 22.02.2010 um 16:17 schrieb Andi Kleen:

> Lars Eggert <lars....@nokia.com> writes:
>
>> If it's off by default there is no issue.
>
> Still a sufficient large number of people might turn it on
> and then the network would melt.
>
> IMHO TCP changes with badly understood congestion behaviour
> should not be merged until the necessary analysis/simulation etc.
> is done.
>
> -Andi

100% ack.

See for example here: http://www.umic-mesh.net/~zimmermann/linux.png

It's a screenshot of the ps3mediaserver. Since it's a streaming server, it's
very useful to deactivate Nagle :-)

Alex

>
> --
> a...@linux.intel.com -- Speaking for myself only.

//
// Dipl.-Inform. Alexander Zimmermann
// Department of Computer Science, Informatik 4
// RWTH Aachen University
// Ahornstr. 55, 52056 Aachen, Germany
// phone: (49-241) 80-21422, fax: (49-241) 80-22220
// email: zimme...@cs.rwth-aachen.de
// web: http://www.umic-mesh.net
//

--

Andreas Petlund

Feb 22, 2010, 10:50:02 AM

On 22. feb. 2010 16:17, Andi Kleen wrote:
> Lars Eggert <lars....@nokia.com> writes:
>
>> If it's off by default there is no issue.
>
> Still a sufficient large number of people might turn it on
> and then the network would melt.
>
> IMHO TCP changes with badly understood congestion behaviour
> should not be merged until the necessary analysis/simulation etc.
> is done.
>
> -Andi

Graphs from our experiments can be found here:
http://simula.no/research/nd/publications/Simula.nd.477/simula_pdf_file

Section 5.2.5 covers experiments on fairness-issues. We have found no
adverse effects of the modifications submitted in this patch set (mFR
and LT in the plot), but further testing cannot hurt.

Best regards,
Andreas

Andreas Petlund

Feb 22, 2010, 11:00:03 AM

On 22. feb. 2010 16:17, Andi Kleen wrote:
> Lars Eggert <lars....@nokia.com> writes:
>
>> If it's off by default there is no issue.
>
> Still a sufficient large number of people might turn it on
> and then the network would melt.
>
> IMHO TCP changes with badly understood congestion behaviour
> should not be merged until the necessary analysis/simulation etc.
> is done.
>
> -Andi
>

I also encourage you to read the discussions from the first posted
version of the patch set, since there I reference much of the
supporting research and related material from other researchers.

http://thread.gmane.org/gmane.linux.network/141394
http://thread.gmane.org/gmane.linux.network/141395
http://thread.gmane.org/gmane.linux.network/141396
http://thread.gmane.org/gmane.linux.network/141397

Cheers,
Andreas

Hagen Paul Pfeifer

Feb 22, 2010, 11:10:02 AM

On Mon, 22 Feb 2010 16:17:33 +0100, Andi Kleen <an...@firstfloor.org> wrote:
> Lars Eggert <lars....@nokia.com> writes:
>
>> If it's off by default there is no issue.
>
> Still a sufficient large number of people might turn it on
> and then the network would melt.

Then we should apply the same policy as used for the congestion control
algorithms!? Restrict the usage to change the standard behavior via Kconfig
(e.g. menuconfig TCP_CONG_ADVANCED) or even more restricted demand for
CAP_NET_ADMIN.

Hagen

David Miller

Feb 22, 2010, 5:20:03 PM

From: Andi Kleen <an...@firstfloor.org>
Date: Mon, 22 Feb 2010 16:17:33 +0100

> Lars Eggert <lars....@nokia.com> writes:
>
>> If it's off by default there is no issue.
>
> Still a sufficient large number of people might turn it on
> and then the network would melt.

And they can do the same by changing the initial congestion window
setting, which we also support.

The foundation of your argument is as solid as quicksand.
