SFTPGo: a Golang performance story and some questions


Nicola Murino

Mar 11, 2020, 8:43:51 AM3/11/20
to golang-nuts
Hi all,

I want to share the performance analysis I recently did for SFTPGo, a fully featured and highly configurable SFTP server written in Go (https://github.com/drakkan/sftpgo).

When I decided to write an SFTP server, I evaluated the available libraries and did some quick performance tests too.

Golang's SSH stack and pkg/sftp were able to easily saturate a Gigabit connection, and this seemed enough. I noticed that OpenSSH was faster, but I didn't investigate further.

So I chose Golang for SFTPGo.

The project is growing fast and one of the users also noticed that SFTPGo has lower performance than OpenSSH. He opened an issue providing some stats when using a 40Gb Ethernet card.

I did some more profiling and discovered that the main bottlenecks are, unsurprisingly, the cipher used and the message authentication. So we can get a huge performance boost by using a fast cipher with implicit message authentication, for example aes12...@openssh.com; however, this cipher is not widely supported.

The most used cipher, and the one used in the user's tests, is AES-CTR, and the Golang implementation seems quite slow.

He noticed that an unmerged patch is available for Golang, greatly improving AES-CTR performance:


I applied this patch and, while performance improved, the AES-CTR SFTP transfers were still slower than the AES-GCM ones. The bottleneck is now the MAC computation.

The tested hardware supports Intel SHA extensions but Golang's SHA256 implementation only uses the AVX2 extension.

Again, I was lucky: I can simply use minio/sha256-simd as a drop-in replacement for Golang's SHA256 implementation:


The performance improved again, but OpenSSH was still faster.

To my great surprise I noticed that my very simple SCP implementation (https://github.com/drakkan/sftpgo/blob/master/sftpd/scp.go) was now as fast as OpenSSH!

So this time I had to look at pkg/sftp: I found some extraneous copies/allocations in critical paths and sent some pull requests that are now merged in the master branch. Still, SFTP transfers were slower than OpenSSH ones.

Compared to my SCP implementation, the main difference is that pkg/sftp allocates a new slice for each SFTP packet, while my SCP implementation allocates a slice once and then reuses it.

Basically for each SFTP packet pkg/sftp does something like this:

data := make([]byte, size)

So I wrote a proof-of-concept allocator that tries to avoid all these extra allocations by reusing the previously allocated slices:
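The idea, in a minimal sketch (hypothetical names, not the actual SFTPGo code, and without the locking a concurrent server would need):

```go
package main

import "fmt"

// Allocator reuses previously released byte slices instead of
// allocating a fresh one for every SFTP packet.
// Illustrative sketch only; not goroutine-safe.
type Allocator struct {
	free [][]byte // released slices available for reuse
}

// GetBuffer returns a slice of the requested size, reusing a released
// slice when one with enough capacity is available.
func (a *Allocator) GetBuffer(size int) []byte {
	for i, buf := range a.free {
		if cap(buf) >= size {
			// remove it from the free list and reslice to the requested size
			a.free = append(a.free[:i], a.free[i+1:]...)
			return buf[:size]
		}
	}
	return make([]byte, size)
}

// ReleaseBuffer puts a slice back so later packets can reuse it.
func (a *Allocator) ReleaseBuffer(buf []byte) {
	a.free = append(a.free, buf)
}

func main() {
	a := &Allocator{}
	b1 := a.GetBuffer(1024)
	a.ReleaseBuffer(b1)
	b2 := a.GetBuffer(512) // reuses b1's backing array
	fmt.Println(len(b2), cap(b2))
}
```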


And bingo! Now SFTPGo performance is very close to OpenSSH! You can find the full benchmark results here:


Conclusion: I see several complaints about Go performance, especially compared to Rust, but, at least in my use case, Go can be as fast as a C project (such as OpenSSH). However, some special attention is required, and so this improved performance is not available by default to all users.

Now some questions:

1) For the pkg/sftp allocator, I'm discussing with the pkg/sftp maintainers how to get it included. Do you have smarter ideas to avoid these allocations?
2) There has been a patch available for AES-CTR in Golang since 2017: I wonder why it is not yet merged?
3) To improve SHA computation performance, I cannot find anything usable in Golang itself. Is there any plan to support Intel SHA Extensions and AVX512 directly in Golang anytime soon?

Thank you for this great programming language, it makes it really simple to add new features to SFTPGo!

regards,
Nicola

Benjamin

Mar 11, 2020, 8:59:26 AM3/11/20
to Nicola Murino, golang-nuts
I think that's why Go should be driven by the community, not by Bell Labs or Google labs.
People at Bell Labs or Google labs may leave the job, and may not know everything.
But the community has enough human resources and a wide variety of knowledge.

Some scientists and researchers worked for a commercial company their whole lives, but never did a thing owned by themselves. Like Anders Hejlsberg, the architect of Delphi and C#. What a pity.

I think the Go programming language should also belong to the community, not Google. If one day Bell Labs or Google is closed, the community will still exist.



Benjamin

Mar 11, 2020, 9:44:11 AM3/11/20
to Benjamin, Nicola Murino, golang-nuts
I tell you the truth: workers at commercial companies dislike community pull requests; these waste their free time or vacation time. They only work for capitalists.



Jesper Louis Andersen

Mar 11, 2020, 10:24:55 AM3/11/20
to Nicola Murino, golang-nuts
On Wed, Mar 11, 2020 at 1:43 PM Nicola Murino <nicola...@gmail.com> wrote:
So I wrote a proof of concept allocator that tries to avoid all these extra allocations reusing the previously allocated slices:



Any particular reason you avoided something like sync.Pool? It could be useful in your case, depending on the allocation pattern.
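For reference, a typical sync.Pool setup for packet buffers might look like this (the buffer size is made up for illustration; storing a *[]byte rather than a []byte avoids an extra allocation when boxing the slice into an interface on every Put):

```go
package main

import (
	"fmt"
	"sync"
)

// A pool of fixed-size packet buffers. The New function is called
// only when the pool is empty.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 32*1024)
		return &b
	},
}

func main() {
	bp := bufPool.Get().(*[]byte)
	buf := *bp
	// ... fill buf with packet data ...
	fmt.Println(len(buf))
	bufPool.Put(bp) // return it so other goroutines can reuse it
}
```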

Nicola Murino

Mar 11, 2020, 10:47:34 AM3/11/20
to Jesper Louis Andersen, golang-nuts
On 11/03/20 15:23, Jesper Louis Andersen wrote:
Hi,

I initially evaluated sync.Pool, then I read this article:

https://dzone.com/articles/memory-pooling-in-go-where-why-and-how

and this issue:

https://github.com/golang/go/issues/23199

so I decided to avoid sync.Pool.

Anyway, I'm currently reviewing the allocator to submit a pull request against pkg/sftp. I'll probably do a quick test with sync.Pool too, and I'll take a look at the memory profiling results,

Nicola

Nicola Murino

Mar 11, 2020, 7:45:38 PM3/11/20
to Adrian Ratnapala, Jesper Louis Andersen, golang-nuts
On 11/03/20 22:55, Adrian Ratnapala wrote:
> > > Any particular reason you avoided something like sync.Pool ? It
> > > could be useful in your case, allocation pattern depending.
> >
> > Hi,
> >
> > I initially evaluated the sync.Pool then I read this article:
> >
> > https://dzone.com/articles/memory-pooling-in-go-where-why-and-how
> >
> > and this issue:
> >
> > https://github.com/golang/go/issues/23199
>
> So if I understand go/issues/23199 correctly, the problem is that
> large allocations from the past hang around in the pool (and
> apparently this is made worse by small allocations somehow pinning them).
>
> I can see how this can be a problem, but is it a problem *for you*?
> You say that the allocations in question are for packets -- which I
> would assume all count as small.

What I understand from reading that issue is that sync.Pool is not the
best choice for storing variable-length buffers, and my first allocator
implementation accepts buffers of any size; each received packet can
have a different size (between 20 and 262144 bytes).

As said before, I'm reviewing the allocator and the allocation strategy
in general. Currently, in my not yet public branch, I only allocate
packets of 2 different sizes, so sync.Pool could be appropriate. I'll
do some more tests and benchmarks before submitting the pull request for
the allocator,

thanks
Nicola


Adrian Ratnapala

Mar 11, 2020, 7:55:10 PM3/11/20
to nicola...@gmail.com, Jesper Louis Andersen, golang-nuts

Adrian Ratnapala

Mar 11, 2020, 8:57:55 PM3/11/20
to nicola...@gmail.com, golang-nuts
BTW: Thanks for all this investigation and the writeups; they are
interesting, and I look forward to your test results.

Despite my question, I think using a custom allocator is perfectly
reasonable. go/issues/23199 is showing us that sync.Pool's interface
isn't great for allocating byte buffers (it'd be better to explicitly
ask for objects of a given size). In that sense, a custom allocator
with an appropriate interface is more future-proof.

> what I understand reading that issue is that sync.Pool is not the best
> choice to store variable-length buffers and my first allocator

I think the problem is that the total memory usage of the pool ends up
O(size of large objects * total number of objects kept). This is very
wasteful if you need lots of small objects and only a few large ones.

> implementation accepts buffers of any size, each received packet can
> have different sizes (between 20 and 262144 bytes).

So every 20 byte object will pin down ~100kiB of actual memory, which
sounds pretty wasteful, but might be acceptable.

> in general and currently, in my not yet public branch, I only allocate
> packets of 2 different sizes and so sync.Pool could be appropriate, I'll

You could have two separate pools, one for each size. This is
basically a bucketed-allocator, but simpler because you already know
the structure of your buckets.


--
Adrian Ratnapala

Ian Lance Taylor

Mar 11, 2020, 9:00:14 PM3/11/20
to Benjamin, Nicola Murino, golang-nuts
On Wed, Mar 11, 2020 at 6:44 AM 'Benjamin' via golang-nuts
<golan...@googlegroups.com> wrote:
>
> Why not merge https://go-review.googlesource.com/c/go/+/51670? It has been available since 2017.
> I tell you the truth: workers at commercial companies dislike community pull requests; these waste their free time or vacation time. They only work for capitalists.

The revision history of the Go project shows that this argument is
clearly incorrect. There are numerous high-quality contributions from
people who do not work at Google.

Regarding CL 51670 in particular, new assembly code for crypto
packages is a subject of ongoing discussion. See, for example,
https://golang.org/issue/37168.

Ian

Nicola Murino

Mar 12, 2020, 6:33:12 PM3/12/20
to Adrian Ratnapala, golang-nuts
On 12/03/20 01:54, Adrian Ratnapala wrote:
> BTW: Thanks for all this investigation and writeups, they are
> interesting, and I look forward to your test results.
>
> Despite my question, I think using a custom allocator is perfectly
> reasonable. go/issues/23199 is showing us that sync.Pool's interface
> isn't great for allocating byte buffers (it'd be better to explicitly
> ask for objects of a given size). In that sense, a custom allocator
> with an appropriate interface is more future-proof.
Hi Adrian,

I did some more tests; you can find some test implementations in my
branches here:

https://github.com/drakkan/sftp

- the allocator branch uses a variable-size allocator
- the allocator2 branch uses a fixed-size allocator with a sync.Pool
- the allocator1 branch uses a fixed-size allocator: the first commit has an
implementation very similar to the other ones; in the last commit I replaced
the pagelist with a list of byte slices

In a single SFTP request more than one allocation could be required, so
we need to keep track of all the allocations and release them after the
request is served. Inside the allocator, each allocated byte slice has
a reference to the request's unique ID.
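That bookkeeping could look roughly like this (an illustrative sketch with hypothetical names, not the code in the branches above, and without the locking a real implementation needs):

```go
package main

import "fmt"

// TrackingAllocator remembers which buffers belong to which SFTP
// request, so everything can be released in one call when the
// request is served. Sketch only; not goroutine-safe.
type TrackingAllocator struct {
	free [][]byte            // released buffers, ready for reuse
	used map[uint32][][]byte // in-flight buffers, keyed by request ID
}

func NewTrackingAllocator() *TrackingAllocator {
	return &TrackingAllocator{used: make(map[uint32][][]byte)}
}

// GetBuffer hands out a buffer and records it under the request ID.
func (a *TrackingAllocator) GetBuffer(requestID uint32, size int) []byte {
	var buf []byte
	if n := len(a.free); n > 0 && cap(a.free[n-1]) >= size {
		// reuse the most recently released buffer
		buf, a.free = a.free[n-1][:size], a.free[:n-1]
	} else {
		buf = make([]byte, size)
	}
	a.used[requestID] = append(a.used[requestID], buf)
	return buf
}

// ReleaseBuffers frees every buffer tracked for a request at once.
func (a *TrackingAllocator) ReleaseBuffers(requestID uint32) {
	a.free = append(a.free, a.used[requestID]...)
	delete(a.used, requestID)
}

func main() {
	a := NewTrackingAllocator()
	a.GetBuffer(1, 1024)
	a.GetBuffer(1, 1024) // a single request may need several buffers
	a.ReleaseBuffers(1)
	fmt.Println(len(a.free)) // 2
}
```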

Based on the benchmarks here:

https://github.com/drakkan/sftp/blob/allocator1/allocator_test.go#L97

allocator1 seems the fastest implementation: both versions (with and
without the pagelist) are a bit faster than allocator2, which uses
sync.Pool. The allocator branch has the slowest implementation.

If you have suggestions for improving my implementations they are welcome!

In a real SFTP test I think the performance of these allocators is
very similar.

I'll add an interface later, after discussing the allocation
strategies with the pkg/sftp maintainers,

thanks
Nicola

Jesper Louis Andersen

Mar 13, 2020, 10:16:30 AM3/13/20
to Nicola Murino, Adrian Ratnapala, golang-nuts
On Thu, Mar 12, 2020 at 12:45 AM Nicola Murino <nicola...@gmail.com> wrote:

what I understand reading that issue is that sync.Pool is not the best
choice to store variable-length buffers and my first allocator
implementation accepts buffers of any size, each received packet can
have different sizes (between 20 and 262144 bytes).


A trick I learned from the CircleMUD code base (I think!) was that you often needed a 1024-byte buffer to write to a telnet socket. So you would keep a list of those handy for fast allocation, in the style of sync.Pool. If you exceeded the buffer, you would allocate a larger buffer to handle it. If you know a typical packet size, you can cut down the allocation rate by a factor of how often those packets occur. Chances are that maintaining a few pools of the correct sizes will work as well as rolling your own.
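That trick might be sketched in Go like this, using a buffered channel as a simple non-blocking free list (the typical size and names are hypothetical; a sketch only):

```go
package main

import "fmt"

const typicalSize = 1024

// A CircleMUD-style free list: keep a handful of typical-size buffers
// around, and fall back to a plain allocation for oversized requests.
var freeList = make(chan []byte, 16)

func getBuffer(size int) []byte {
	if size > typicalSize {
		// atypical request: just allocate, don't pool it
		return make([]byte, size)
	}
	select {
	case b := <-freeList:
		return b[:size] // reuse a pooled buffer
	default:
		return make([]byte, size, typicalSize)
	}
}

func putBuffer(b []byte) {
	if cap(b) != typicalSize {
		return // oversized buffers are left for the GC
	}
	select {
	case freeList <- b:
	default: // free list is full; drop the buffer
	}
}

func main() {
	b := getBuffer(512)
	fmt.Println(cap(b)) // 1024: sized for the typical case
	putBuffer(b)
	big := getBuffer(4096) // exceeds the typical size, allocated directly
	fmt.Println(cap(big))
}
```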

My initial question was just about why it was rejected in the first place.

Nicola Murino

Apr 14, 2020, 7:07:50 AM4/14/20
to golan...@googlegroups.com
On 11/03/20 13:43, Nicola Murino wrote:
Hi all,

a quick update on this.

SFTPGo's master branch now includes the following performance improvements:

- an allocator for pkg/sftp
- a crypto fork in which minio/sha256-simd replaces Go's sha256, plus some other non-performance-related changes (https://github.com/drakkan/crypto/tree/sftpgo)

so only the AES-CTR patch for Go has to be applied manually.

Here you can find the updated performance doc:

https://github.com/drakkan/sftpgo/blob/master/docs/performance.md