Re: New Set of High Performance Networking Patches Available

Darren Tucker

Aug 4, 2005, 9:30:54 AM

Chris Rapier wrote:
> http://www.psc.edu/networking/projects/hpn-ssh/

Looking at these has been on my to-do list for a while and I finally
took a look.

> 1) HPN performance even without both sides of the connection being HPN
> enabled. As long as the bulk data flow is in the direction of the HPN
> side you should see improved performance. I've measured 200Mb/s to an HPN
> server from a non HPN client and vice versa.

I've been testing with tunbridge[1] on OpenBSD to add latency. I've
seen an improvement of around 50% throughput on scp with 100ms of
latency (each way, ie 200ms rtt) simulated link with Linux endpoints.

Using -w doesn't seem to make any difference (or sometimes it's a net
loss) although it's quite possible something in my test environment is
responsible for that. (Yes, I did the stack tuning, both netstat and
getsockopt show the buffers are 1MB or more.)
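
One way to double-check what a socket actually ends up with is to request a size with setsockopt() before connecting and read it back with getsockopt(). The following is only a minimal sketch assuming a POSIX sockets API; the helper name set_and_read_rcvbuf is made up for illustration and is not part of OpenSSH or the HPN patch.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from OpenSSH or the HPN patch): request a
 * receive buffer of `want` bytes on a not-yet-connected TCP socket and
 * return the size the kernel reports afterwards, or -1 on error.
 */
static int
set_and_read_rcvbuf(int want)
{
	int sock, got = 0;
	socklen_t len = sizeof(got);

	sock = socket(AF_INET, SOCK_STREAM, 0);
	if (sock < 0)
		return -1;
	/* Set the size before connect() so window scaling can be negotiated. */
	if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want)) < 0 ||
	    getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &got, &len) < 0)
		got = -1;
	close(sock);
	return got;
}
```

Calling set_and_read_rcvbuf(1048576) returns whatever the kernel granted. On Linux the value read back is typically double the request, and requests above net.core.rmem_max are silently capped, so the result need not equal the argument.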

> 2) HPN client can now set the local tcp receive buffer on a per
> connection basis. Using the -w option allows the client to override the
> local tcp receive window settings up to the maximum tcp buffer size.
> This is just a setsockopt() call really.

I think this should be a ssh_config(5) option (maybe "TCPReceiveBuffer"
?) rather than a command-line switch (ssh already has enough switches...)

This would allow it to be set either per-connection or globally, and may
be passed through from the scp command line with the "-o" option.

The latter would also mean that scp would need less modification (and
scp's code is mostly shared with rcp, so that's also a plus).

Attached is a diff relative to openssh-4.1p1-hpn11.diff with a couple of
proposed changes:
* move the sshconnect.c setsockopt code into its own function
* make that function style(9) compliant
* fix a bug where strerror was used on the non-error path
* make BUFFER_MAX_HPN_LEN an unsigned to placate gcc -Wsign-compare
* replace magic numbers in channels.h with symbolic names

I don't think I changed any functionality (but I could have missed
something...)
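
For reference, here is a minimal sketch of what the two forms of BUFFER_MAX_HPN_LEN in the attached buffer.h hunk evaluate to. The OLD_ and NEW_ prefixes and the helper functions are made up for illustration, and a 32-bit unsigned int is assumed. As written in the diff, both forms use a right shift; whether that is what was intended is not discussed in this message.

```c
#include <limits.h>

/* Illustrative copies of the two definitions from the buffer.h hunk. */
#define OLD_BUFFER_MAX_HPN_LEN (2>>29)-1
#define NEW_BUFFER_MAX_HPN_LEN ((2U>>29)-1)

/* Old form: signed arithmetic, 2>>29 is 0, so the result is -1. */
static int
old_hpn_len(void)
{
	return OLD_BUFFER_MAX_HPN_LEN;
}

/* New form: unsigned arithmetic, 0U - 1 wraps around to UINT_MAX. */
static unsigned int
new_hpn_len(void)
{
	return NEW_BUFFER_MAX_HPN_LEN;
}

/*
 * Without outer parentheses the old macro is fragile inside expressions:
 * 2 * (2>>29)-1 parses as (2 * 0) - 1, not as 2 * ((2>>29) - 1).
 */
static int
old_hpn_len_doubled(void)
{
	return 2 * OLD_BUFFER_MAX_HPN_LEN;
}
```

The signed -1 compared against an unsigned length is what triggers gcc's -Wsign-compare; the outer parentheses in the new form also keep the macro safe when it is used inside a larger expression.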

[1] http://www.iijlab.net/~kjc/software/dist/tunbridge-0.1.tar.gz

--
Darren Tucker (dtucker at zip.com.au)
GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4 37C9 C982 80C7 8FF4 FA69
Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.

Attachment: openssh-hpn.patch

--- openssh-hpn.orig/sshconnect.c 2005-08-04 22:37:10.000000000 +1000
+++ openssh-hpn/sshconnect.c 2005-08-04 21:59:37.000000000 +1000
@@ -145,6 +145,27 @@
 }

 /*
+ * Set TCP receive buffer if requested.
+ * Note: tuning needs to happen after the socket is
+ * created but before the connection happens
+ * so winscale is negotiated properly -cjr
+ */
+static void
+ssh_set_socket_recvbuf(int sock)
+{
+	void *buf = (void *)&options.tcp_rcv_buf;
+	int sz = sizeof(options.tcp_rcv_buf);
+
+	if (options.tcp_rcv_buf == 0)
+		return;
+	if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, buf, sz) >= 0)
+		debug("setsockopt SO_RCVBUF set to %d", options.tcp_rcv_buf);
+	else
+		error("Couldn't set socket receive buffer to %d: %.100s",
+		    options.tcp_rcv_buf, strerror(errno));
+}
+
+/*
  * Creates a (possibly privileged) socket for use as the ssh connection.
  */
 static int
@@ -167,58 +188,16 @@
 			    strerror(errno));
 		else
 			debug("Allocated local port %d.", p);
-
-
-		/* tuning needs to happen after the socket is */
-		/* created but before the connection happens */
-		/* so winscale is negotiated properly -cjr */
-
-		/* Set tcp receive buffer if requested */
-		if (options.tcp_rcv_buf)
-		{
-			if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
-			    (void *)&options.tcp_rcv_buf,
-			    sizeof(options.tcp_rcv_buf)) >= 0)
-			{
-				debug("setsockopt SO_RCVBUF: %.100s", strerror(errno));
-			}
-			else
-			{
-				/* coudln't set the socket size to use spec. */
-				/* should default to system param and continue */
-				/* warn the user though - cjr */
-				error("Couldn't set socket receive buffer as requested. Continuing anyway.");
-			}
-		}
+		ssh_set_socket_recvbuf(sock);
 		return sock;
 	}
 	sock = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
 	if (sock < 0)
 		error("socket: %.100s", strerror(errno));
-
-	/* tuning needs to happen after the socket is */
-	/* created but before the connection happens */
-	/* so winscale is negotiated properly -cjr */
-
-	/* Set tcp receive buffer if requested */
-	if (options.tcp_rcv_buf)
-	{
-		if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
-		    (void *)&options.tcp_rcv_buf,
-		    sizeof(options.tcp_rcv_buf)) >= 0)
-		{
-			debug("setsockopt SO_RCVBUF: %.100s", strerror(errno));
-		}
-		else
-		{
-			/* coudln't set the socket size to use spec. */
-			/* should default to system param and continue */
-			/* warn the user though - cjr */
-			error("Couldn't set socket receive buffer as requested. Continuing anyway.");
-		}
-	}
-
-	/* Bind the socket to an alternative local IP address */
+
+	ssh_set_socket_recvbuf(sock);
+
+	/* Bind the socket to an alternative local IP address */
 	if (options.bind_address == NULL)
 		return sock;

--- openssh-hpn.orig/buffer.h 2005-08-04 22:37:10.000000000 +1000
+++ openssh-hpn/buffer.h 2005-08-04 21:08:22.000000000 +1000
@@ -25,7 +25,7 @@

 #define BUFFER_MAX_CHUNK 0x100000
 #define BUFFER_MAX_LEN 0xa00000
-#define BUFFER_MAX_HPN_LEN (2>>29)-1
+#define BUFFER_MAX_HPN_LEN ((2U>>29)-1)

 void buffer_init(Buffer *);
 void buffer_clear(Buffer *);
--- openssh-hpn.orig/channels.h 2005-08-04 22:37:10.000000000 +1000
+++ openssh-hpn/channels.h 2005-08-04 23:10:08.000000000 +1000
@@ -120,11 +120,11 @@

 /* default window/packet sizes for tcp/x11-fwd-channel */
 #define CHAN_SES_PACKET_DEFAULT (32*1024)
-#define CHAN_SES_WINDOW_DEFAULT (0xa00000/2)
+#define CHAN_SES_WINDOW_DEFAULT (BUFFER_MAX_LEN/2)
 #define CHAN_TCP_PACKET_DEFAULT (32*1024)
-#define CHAN_TCP_WINDOW_DEFAULT (0xa00000/2)
+#define CHAN_TCP_WINDOW_DEFAULT (BUFFER_MAX_LEN/2)
 #define CHAN_X11_PACKET_DEFAULT (16*1024)
-#define CHAN_X11_WINDOW_DEFAULT (0xa00000/2)
+#define CHAN_X11_WINDOW_DEFAULT (BUFFER_MAX_LEN/2)

 /* possible input states */
 #define CHAN_INPUT_OPEN 0

_______________________________________________
openssh-unix-dev mailing list
openssh-...@mindrot.org
http://www.mindrot.org/mailman/listinfo/openssh-unix-dev


Darren Tucker

Aug 4, 2005, 9:37:58 AM

Darren Tucker wrote:
> +#define CHAN_SES_WINDOW_DEFAULT (BUFFER_MAX_LEN/2)

Thinking about it, those ought to be (BUFFER_MAX_LEN - BUFFER_MAX_CHUNK)
since in 4.0 and up, the buffers will be compacted once the buffer
offset is beyond BUFFER_MAX_CHUNK, rather than half of the allocated
size in previous versions.


Chris Rapier

Aug 4, 2005, 12:35:23 PM

Darren Tucker wrote:
> Chris Rapier wrote:
>
>> http://www.psc.edu/networking/projects/hpn-ssh/
>
>
> Looking at these has been on my to-do list for a while and I finally
> took a look.

Excellent.

>> 1) HPN performance even without both sides of the connection being HPN
>> enabled. As long as the bulk data flow is in the direction of the HPN
>> side you should see improved performance. I've measured 200Mb/s to an
>> HPN server from a non HPN client and vice versa.
>
>
> I've been testing with tunbridge[1] on OpenBSD to add latency. I've
> seen an improvement of around 50% throughput on scp with 100ms of
> latency (each way, ie 200ms rtt) simulated link with Linux endpoints.

In the real-world test environments we have set up, we're commonly seeing
10 to 30x greater throughput: 25 MBytes/s+ where before we were seeing
less than 1 MByte/s.

> Using -w doesn't seem to make any difference (or sometimes it's a net
> loss) although it's quite possible something in my test environment is
> responsible for that. (Yes, I did the stack tuning, both netstat and
> getsockopt show the buffers are 1MB or more.)

It will only make a difference when the client is acting as the data
sink. Obviously the max buffer size has to be large enough to handle the
user defined setting. In our test environments it has been shown to
make a difference - some of our users have reported good results with it
as well. If it's possible, maybe we could get you on one of our test
machines to try it out. Shoot me a note if you are interested and I'll
see what our policies on this are.

>> 2) HPN client can now set the local tcp receive buffer on a per
>> connection basis. Using the -w option allows the client to override
>> the local tcp receive window settings up to the maximum tcp buffer
>> size. This is just a setsockopt() call really.
>
>
> I think this should be a ssh_config(5) option (maybe "TCPReceiveBuffer"
> ?) rather than a command-line switch (ssh already has enough switches...)

Well, the problem is that you only want to set a large buffer size when
you really need it and even then, it's often best to tune the buffer to
the specific path. Providing this option on the command line, in our
view, allows for the greatest flexibility. I have no objection to there
being a default in the ssh_config but I think it's important for the user
to be able to override it.

> This would allow it to be set either per-connection or globally, and may
> be passed through from the scp command line with the "-o" option.

Okay, so that's basically the same thing. I'd suggest using a shorter
name, though that's not a major point to get hung up on.

> The latter would also mean that scp would need less modification (and
> scp's code is mostly shared with rcp, so that's also a plus).

I honestly think changing scp's code isn't a bad thing. :)

> Attached is a diff relative to openssh-4.1p1-hpn11.diff with a couple of
> proposed changes:
> * move the sshconnect.c setsockopt code into its own function
> * make that function style(9) compliant
> * fix a bug where strerror was used on the non-error path
> * make BUFFER_MAX_HPN_LEN an unsigned to placate gcc -Wsign-compare
> * replace magic numbers in channels.h with symbolic names
>
> I don't think I changed any functionality (but I could have missed
> something...)
>
> [1] http://www.iijlab.net/~kjc/software/dist/tunbridge-0.1.tar.gz

Mike Stevens and I will take a look at it next week, I think. We're working
on some other stuff based on SSH (basically a more advanced port
forwarding system) that we're trying to nail down. That should be done
by the end of this week though.

Chris

Darren Tucker

Aug 4, 2005, 1:00:45 PM

Chris Rapier wrote:
> Darren Tucker wrote:
[hpn patch]

>> I've been testing with tunbridge[1] on OpenBSD to add latency. I've
>> seen an improvement of around 50% throughput on scp with 100ms of
>> latency (each way, ie 200ms rtt) simulated link with Linux endpoints.
>
> In the real-world test environments we have set up, we're commonly seeing
> 10 to 30x greater throughput: 25 MBytes/s+ where before we were seeing
> less than 1 MByte/s.
>
>> Using -w doesn't seem to make any difference (or sometimes it's a net
>> loss) although it's quite possible something in my test environment is
>> responsible for that. (Yes, I did the stack tuning, both netstat and
>> getsockopt show the buffers are 1MB or more.)
>
> It will only make a difference when the client is acting as the data
> sink. Obviously the max buffer size has to be large enough to handle the
> user defined setting.

I can see 1+ MB sitting in the server's TCP send queue. I suspect it's
some local problem limiting TCP throughput in the high-BDP configuration
(they're not super beefy hosts but a direct connect gives me ~6MB/s so
they're capable of more than the ~ 500-600 KB/s I'm seeing with the
latency).
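
These figures can be related through the bandwidth-delay product: sustained TCP throughput is bounded by the amount of data in flight divided by the round-trip time. The following is only a rough sketch using the 200ms RTT and the throughput numbers quoted in this thread; the helper names are made up for illustration, and MB and KB are taken as 10^6 and 10^3 bytes, which is an assumption.

```c
/* Bytes that must be in flight to sustain `bytes_per_sec` at `rtt_ms`. */
static long long
window_needed(long long bytes_per_sec, long long rtt_ms)
{
	return bytes_per_sec * rtt_ms / 1000;
}

/* Upper bound on throughput (bytes/s) for a given window and RTT. */
static long long
throughput_limit(long long window_bytes, long long rtt_ms)
{
	return window_bytes * 1000 / rtt_ms;
}
```

Under those assumptions, window_needed(6000000, 200) gives 1200000 bytes to sustain 6MB/s across the simulated link, while window_needed(550000, 200) gives 110000 bytes, so the observed 500-600 KB/s corresponds to an effective window of roughly 110 KB. A full 1 MB (1048576-byte) window would allow up to throughput_limit(1048576, 200), or 5242880 bytes/s.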

> In our test environments it has been shown to
> make a difference - some of our users have reported good results with it
> as well. If it's possible, maybe we could get you on one of our test
> machines to try it out. Shoot me a note if you are interested and I'll
> see what our policies on this are.
>
>>> 2) HPN client can now set the local tcp receive buffer on a per
>>> connection basis. Using the -w option allows the client to override
>>> the local tcp receive window settings up to the maximum tcp buffer
>>> size. This is just a setsockopt() call really.
>>
>> I think this should be a ssh_config(5) option (maybe
>> "TCPReceiveBuffer" ?) rather than a command-line switch (ssh already
>> has enough switches...)
>
> Well, the problem is that you only want to set a large buffer size when
> you really need it and even then, it's often best to tune the buffer to
> the specific path. Providing this option on the command line, in our
> view, allows for the greatest flexibility. I have no objection to there
> being a default in the ssh_config but I think it's important for the user
> to be able to override it.

You can specify it in the config file on a per-host basis and/or as a
default so I think it's OK. In fact, it's probably more convenient for
end users (as opposed to people testing ssh mods :-) since they can just
enable it for the appropriate hosts then forget about it.

>> This would allow it to be set either per-connection or globally, and
>> may be passed through from the scp command line with the "-o" option.
>
> Okay, so that's basically the same thing. I'd suggest using a shorter
> name, though that's not a major point to get hung up on.

There's an existing TCP config option (TCPKeepAlive); I picked the name to
be consistent with that.

>> The latter would also mean that scp would need less modification (and
>> scp's code is mostly shared with rcp, so that's also a plus).
>
> I honestly think changing scp's code isn't a bad thing. :)

Yeah but you don't have to maintain it :-)

[patch]


> Mike Stevens and I will take a look at it next week, I think. We're working
> on some other stuff based on SSH (basically a more advanced port
> forwarding system) that we're trying to nail down. That should be done
> by the end of this week though.

Thanks.


Chris Rapier

Aug 4, 2005, 1:09:49 PM

Darren Tucker wrote:
> Chris Rapier wrote:

> I can see 1+ MB sitting in the server's TCP send queue. I suspect it's
> some local problem limiting TCP throughput in the high-BDP configuration
> (they're not super beefy hosts but a direct connect gives me ~6MB/s so
> they're capable of more than the ~ 500-600 KB/s I'm seeing with the
> latency).

Yeah, that's definitely low. Can you try a loopback to your localhost and
dump the data into /dev/null to see what sort of limit the CPU is
imposing? Also, what cipher? 3des is just a nightmare. Most of our tests
are using arcfour and blowfish.


> You can specify it in the config file on a per-host basis and/or as a
> default so I think it's OK. In fact, it's probably more convenient for
> end users (as opposed to people testing ssh mods :-) since they can just
> enable it for the appropriate hosts then forget about it.

That works. Honestly, this was mostly an afterthought - one of the users
that connects to us needed something like this for various reasons. They
did good stuff with it though (near term storm forecasting
http://www.psc.edu/publicinfo/news/2005/2005-07-05-caps.html - all data
xfered with hpn-ssh).


>>> This would allow it to be set either per-connection or globally, and
>>> may be passed through from the scp command line with the "-o" option.
>>
>>
>> Okay, so that's basically the same thing. I'd suggest using a shorter
>> name, though that's not a major point to get hung up on.
>
>
> There's an existing config TCP option (TCPKeepAlive), I picked it to be
> consistent with that.

I just hate having really long command lines, but consistent nomenclature
is important. I have no objections to a longer, more descriptive name.

>>> The latter would also mean that scp would need less modification (and
>>> scp's code is mostly shared with rcp, so that's also a plus).
>>
>>
>> I honestly think changing scp's code isn't a bad thing. :)
>
>
> Yeah but you don't have to maintain it :-)

Yeah, but if we change it enough we'll be forced to maintain it. So this
would be an excellent opportunity to pass the torch :D

Darren Tucker

Aug 4, 2005, 1:28:04 PM

Chris Rapier wrote:
[throughput]

> Yeah, that's definitely low. Can you try a loopback to your localhost and
> dump the data into /dev/null to see what sort of limit the CPU is
> imposing? Also, what cipher? 3des is just a nightmare. Most of our tests
> are using arcfour and blowfish.

Tried all that. All tests done with arcfour, same (low) throughput
copying to /dev/null. And as I mentioned, systems did 6MB/s throughput
w/out the high-latency network (which was disk-to-disk).

I will probably try measuring with ttcp or netperf next.


Rick Jones

Aug 4, 2005, 1:51:41 PM

Random observations, some more obvious than others :)

*) In the face of packet losses, the TCP congestion window may not get and/or
stay as large as the classic TCP window, so check the netstat statistics.

*) If you cannot get the perf level with, say, a netperf TCP_STREAM test :) it
ain't gonna happen with crypto.

rick jones

Chris Rapier

Aug 4, 2005, 2:55:37 PM

Darren Tucker wrote:
> Chris Rapier wrote:
> [throughput]
>
>>Yeah, that's definitely low. Can you try a loopback to your localhost and
>>dump the data into /dev/null to see what sort of limit the CPU is
>>imposing? Also, what cipher? 3des is just a nightmare. Most of our tests
>>are using arcfour and blowfish.
>
>
> Tried all that. All tests done with arcfour, same (low) throughput
> copying to /dev/null. And as I mentioned, systems did 6MB/s throughput
> w/out the high-latency network (which was disk-to-disk).
>
> I will probably try measuring with ttcp or netperf next.

You might want to try iPerf too. It's a good tool: better in some
environments, not as good in others.

ni...@bitgnome.net

Aug 13, 2005, 12:09:13 PM

Darren Tucker wrote:
> Darren Tucker wrote:
> > +#define CHAN_SES_WINDOW_DEFAULT (BUFFER_MAX_LEN/2)
>
> Thinking about it, those ought to be (BUFFER_MAX_LEN - BUFFER_MAX_CHUNK)
> since in 4.0 and up, the buffers will be compacted once the buffer
> offset is beyond BUFFER_MAX_CHUNK, rather than half of the allocated
> size in previous versions.

Sure about that? I just tried BUFFER_MAX_LEN - BUFFER_MAX_CHUNK
instead of BUFFER_MAX_LEN/2 under sshd 4.1p1 with the HPN patches
copying via tar from a non-HPN patched ssh client 3.8.1p1 and kept
getting "buffer_append_space: alloc 10498048 not supported" very
shortly into the transfer over a 100Mbps network.

Using BUFFER_MAX_LEN/2 gave me no problems at all with the same
versions.
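
The sizes involved can be checked with a little arithmetic. This is only a minimal sketch, assuming the BUFFER_MAX_LEN and BUFFER_MAX_CHUNK values shown in the buffer.h context of the patch earlier in the thread; the helper functions are made up for illustration and are not part of OpenSSH.

```c
#define BUFFER_MAX_CHUNK 0x100000	/* 1048576 bytes */
#define BUFFER_MAX_LEN   0xa00000	/* 10485760 bytes */

/* Window default as in the patch: half the maximum buffer length. */
static unsigned int
window_half(void)
{
	return BUFFER_MAX_LEN / 2;
}

/* Window default as suggested in the follow-up message. */
static unsigned int
window_minus_chunk(void)
{
	return BUFFER_MAX_LEN - BUFFER_MAX_CHUNK;
}

/* How far a requested allocation overshoots BUFFER_MAX_LEN (0 if it fits). */
static unsigned int
overshoot(unsigned int alloc)
{
	return alloc > BUFFER_MAX_LEN ? alloc - BUFFER_MAX_LEN : 0;
}
```

With these values the halved window is 5242880 bytes and the alternative is 9437184 bytes, which leaves 1048576 bytes of headroom below BUFFER_MAX_LEN instead of 5242880. The 10498048-byte request in the error message is overshoot(10498048), or 12288 bytes, past BUFFER_MAX_LEN.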

For what it's worth....

--
Mark Nipper
