Hi hackers!
At Yandex Cloud, we are working on the compression of motion data
transfers [0]. The basic idea is that many analytical computations
seem to be constrained by cross-section bandwidth.
Although the bandwidth of a closely connected network might be very
good, modern compression codecs might be effective even just in terms
of the cost of putting data into the network stack of the OS. There are
things like zram which benefit from compressing data stored in RAM.
That's why we believe compression might be useful here and want to
experiment with it.
There are many open questions:
1. Which codecs should we employ? So far we have only experimented with
Zstd, but I think we should support LZ4 as well. Do we need a knob to
tune the compression level? I think yes.
2. Should we exempt short packets from compression? In the proposed
libpq compression, messages shorter than 60 bytes are not compressed.
AFAIK, this number was chosen by fair dice roll. [1]
3. Should we compress TCP, UDP, or both?
4. We would like to see this improvement in GP6, but the new feature
must be aimed at v7 first.
5. How do we know that the feature is effective? There will be cases
where it brings some benefit, and cases where compression harms
performance and memory usage. Will we be able to give the user advice
on when to use it and when to avoid it?
6. What set of GUCs should we use to control motion compression?
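To make questions 1, 2 and 5 a bit more concrete, here is a minimal
Python sketch of the idea. It is not the code from the PR. It assumes a
hypothetical one-byte flag in front of each packet, uses zlib from the
standard library purely as a stand-in for Zstd or LZ4, and borrows the
60-byte threshold from the libpq proposal. All names in it
(MIN_COMPRESS_SIZE, COMPRESSION_LEVEL, encode_packet, decode_packet,
wire_ratio) are made up for illustration.

```python
import zlib

# Hypothetical settings, for illustration only.
MIN_COMPRESS_SIZE = 60   # threshold borrowed from the libpq compression proposal
COMPRESSION_LEVEL = 1    # stand-in for a tunable compression-level knob

FLAG_RAW = b"\x00"         # payload follows unmodified
FLAG_COMPRESSED = b"\x01"  # payload follows zlib-compressed


def encode_packet(payload: bytes, level: int = COMPRESSION_LEVEL) -> bytes:
    """Prefix the payload with a flag byte; compress only when it saves bytes."""
    if len(payload) >= MIN_COMPRESS_SIZE:
        packed = zlib.compress(payload, level)
        if len(packed) < len(payload):
            return FLAG_COMPRESSED + packed
    # Short or incompressible payloads are sent as-is.
    return FLAG_RAW + payload


def decode_packet(packet: bytes) -> bytes:
    """Inverse of encode_packet."""
    flag, body = packet[:1], packet[1:]
    if flag == FLAG_COMPRESSED:
        return zlib.decompress(body)
    return body


def wire_ratio(payloads) -> float:
    """Bytes sent divided by original bytes; below 1.0 means a net saving."""
    original = sum(len(p) for p in payloads)
    sent = sum(len(encode_packet(p)) for p in payloads)
    return sent / original
```

A ratio like the one computed by wire_ratio, together with CPU time
spent in the codec, is the kind of number that could back up advice to
users on when to enable the feature. Falling back to the raw payload
when compression does not shrink it bounds the overhead at one byte
per packet in this sketch.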
We would be happy to get some ideas on this project from the community.
Thank you!
Best regards, Andrey Borodin.
[0] https://github.com/greenplum-db/gpdb/pull/16045
[1] https://commitfest.postgresql.org/38/3499/