I've run in to a strange issue and I'm wondering if anyone has any
insight on this. So I have a modified version of OpenSSH called hpnssh
that uses larger receive buffer sizes to improve throughput. When I
ported that patch set to 8.9 I found out after an initial burst of data
the throughput would drop to zero after an initial burst and just sort
of stay there. This happens regardless of the RTT; same issues at
sub-millisecond RTTs and 150ms.
I've been able to replicate this behaviour in OpenSSH by increasing the
size of CHAN_SES_DEFAULT_WINDOW from 64*CHAN_SES_PACKET_DEFAULT to
higher values. You kind of see it in the debug log at 96*CSPD and you
can start to see it impact throughput at 128*CSDP. With my tests I have
it at 512*CSPD and the pause will last upwards of 5 seconds. When it's
at 1024*CSPD (corresponding to a 32MB receive buffer) the pause will
last 18 seconds or more.
I dropped in a debug statement in to sshbuf.c in sshbuf_allocate to get
a better view of whats happening. The following are the results of that
in both 8.9p1 and 8.8p1
Version 8.9p1
Start of buffer growth
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 98304, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 131072, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 163840, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 196608, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 229376, len: 32768 for 0x55d548211550
…
debug1: adjusted rlen: 22003712, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22036480, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22069248, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22085632, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22102016, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22134784, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22167552, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22200320, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22216704, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22233088, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22265856, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22298624, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22331392, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22347776, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
— end of buffer growth
So there is a buffer that kicks in after an short initial period of good
throughput that has then has to grow larger that the size of
CHAN_SES_WINDOW_DEFAULT (128*CSPD in this case) before throughput can
start to pick up again. During this growth period CPU usage is pegged at
100% even though throughput is minimal.
Version 8.8p1
Start buffer adjustment
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 49152, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 81920, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 114688, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 147456, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 180224, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 212992, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 245760, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 278528, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 311296, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 344064, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 376832, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 409600, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 442368, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 475136, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 507904, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 540672, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 573440, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 606208, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 638976, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 671744, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 704512, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 737280, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 770048, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 802816, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 835584, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 868352, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 901120, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 933888, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
End buffer adjustment
In version 8.8 I also see a buffer growing during the course of the
transfer but it looks like it doesn’t need to grow to the same size as
in 8.9p1 (in this case WINDOW_DEFAULT was set to 4096*CSPD). This is on
the same test bed, same set up, and the tests are being run right after
each other.
The behaviour in 8.9 is very consistent. After a certain number of
adjusted rlens for, what I am assuming is the packet buffer, it goes
into this growth phase for another buffer. In my test that been about
700 of these adjusts which is unrelated to the size of
CHAN_SES_WINDOW_DEFAULT
So I’m wondering what changed between 8.8 and 8.9 that might account for
this and if this is expected/desired behaviour. I'm still working
through the diff from 8.8 to 8.9 but nothing has leaped out at me yet.
Thanks,
Chris
_______________________________________________
openssh-unix-dev mailing list
openssh-...@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
"upstream: add a ssh_packet_process_read() function that reads from
a fd directly into the transport input buffer.
Use this in the client and server mainloops to avoid unnecessary
copying. It also lets us use a more greedy read size without penalty.
Yields a 2-3% performance gain on cipher-speed.sh (in a fairly
unscientific test tbf)"
I still don't know why it's causing this behaviour so if anyone has a
clue please share?
Thanks
Chris
> Hey all,
>
> I've run in to a strange issue and I'm wondering if anyone has any
> insight on this. So I have a modified version of OpenSSH called hpnssh
> that uses larger receive buffer sizes to improve throughput. When I
> ported that patch set to 8.9 I found out after an initial burst of data
> the throughput would drop to zero after an initial burst and just sort
> of stay there. This happens regardless of the RTT; same issues at
> sub-millisecond RTTs and 150ms.
>
> I've been able to replicate this behaviour in OpenSSH by increasing the
> size of CHAN_SES_DEFAULT_WINDOW from 64*CHAN_SES_PACKET_DEFAULT to
> higher values. You kind of see it in the debug log at 96*CSPD and you
> can start to see it impact throughput at 128*CSDP. With my tests I have
> it at 512*CSPD and the pause will last upwards of 5 seconds. When it's
> at 1024*CSPD (corresponding to a 32MB receive buffer) the pause will
> last 18 seconds or more.
>
> I dropped in a debug statement in to sshbuf.c in sshbuf_allocate to get
> a better view of whats happening. The following are the results of that
> in both 8.9p1 and 8.8p1
[snip]
> In version 8.8 I also see a buffer growing during the course of the
> transfer but it looks like it doesn’t need to grow to the same size as
> in 8.9p1 (in this case WINDOW_DEFAULT was set to 4096*CSPD). This is on
> the same test bed, same set up, and the tests are being run right after
> each other.
>
> The behaviour in 8.9 is very consistent. After a certain number of
> adjusted rlens for, what I am assuming is the packet buffer, it goes
> into this growth phase for another buffer. In my test that been about
> 700 of these adjusts which is unrelated to the size of
> CHAN_SES_WINDOW_DEFAULT
>
> So I’m wondering what changed between 8.8 and 8.9 that might account for
> this and if this is expected/desired behaviour. I'm still working
> through the diff from 8.8 to 8.9 but nothing has leaped out at me yet.
8.9 switch from select() to poll() and included a couple of bugs that
could cause weird problems. IMO you should try to port to what's on
the top of the V_9_0 git branch, which is 9.0 + one more poll()-
related fix.
-d
> then aggressively grow that buffer (4MB at a time)
Does this impact low-memory systems like my SPARCstation
and that old 80486 laptop where growing this much may fail?
bye,
//mirabilos
--
>> Why don't you use JavaScript? I also don't like enabling JavaScript in
> Because I use lynx as browser.
+1
-- Octavio Alvarez, me and ⡍⠁⠗⠊⠕ (Mario Lang) on debian-devel