Compression of the motion traffic

Andrey Borodin

Jul 25, 2023, 5:03:55 AM
to Greenplum Developers, Кирилл Решке
Hi hackers!

At Yandex Cloud, we are working on compressing motion data
transfers [0]. The basic idea is that many analytical computations
seem to be constrained by cross-section bandwidth.
Even though the bandwidth of a closely connected network can be very
good, modern compression codecs might pay off simply by reducing the
amount of data pushed through the network stack of the OS. There are
things like zram that benefit from compressing data stored in RAM.
That's why we believe compression might be useful here and want to
experiment with it.

There are many open questions:
1. Which codecs should we employ? So far we have only experimented with
Zstd, but I think we should support LZ4 as well. Do we need a knob to
tune the compression level? I think yes.
2. Should we exempt short packets from compression? In proposed libpq
compression we do not compress messages shorter than 60 bytes. AFAIK,
this number was chosen by fair dice roll. [1]
3. Should we compress TCP, UDP, or both?
4. We would like to see this improvement in GP6. But the new feature
must be aimed at v7 first.
5. How do we know that the feature is effective? There will be cases
when it brings some benefits, and cases where compression harms
performance and memory usage. Will we be able to give the user advice
on when to use it and when to avoid it?
6. What set of GUCs should we use to control motion compression?

We would be happy to get some ideas on this project from the community.
Thank you!


Best regards, Andrey Borodin.

[0] https://github.com/greenplum-db/gpdb/pull/16045
[1] https://commitfest.postgresql.org/38/3499/

Ivan Novick

Jul 25, 2023, 10:53:02 AM
to Andrey Borodin, Greenplum Developers, Кирилл Решке, Zhenghua Lyu

Zhenghua Lyu

Jul 31, 2023, 9:23:53 PM
to Andrey Borodin, Greenplum Developers, Кирилл Решке
Hi,

Thanks so much for starting the discussion. I also see there is a PR against 6X_STABLE.

First of all, interconnect traffic compression is a very good and important feature that
Greenplum should and will consider.

I have some rough thoughts; let me try my best to reply to your questions and comments.

> 1. Which codecs should we employ? So far we have only experimented with
> Zstd, but I think we should support LZ4 as well. Do we need a knob to
> tune the compression level? I think yes.

I think this can sit behind an abstraction layer, so that we can test and tune different algorithms
to better understand their memory and CPU trade-offs.
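
As a rough illustration of what such an abstraction layer could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Codec` type, the `CODECS` registry and the `roundtrip_ratio` helper are hypothetical names, and the standard-library `zlib` and `lzma` modules stand in for Zstd and LZ4, which are not in the Python standard library. It is not the design used in the PR.

```python
import lzma
import zlib
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Codec:
    """Hypothetical pluggable codec; the motion layer would see only this interface."""
    name: str
    compress: Callable[[bytes, int], bytes]   # (data, level) -> compressed bytes
    decompress: Callable[[bytes], bytes]


# Stand-ins from the standard library. A real build would register
# Zstd and LZ4 bindings here instead (assumption, not taken from the PR).
CODECS: Dict[str, Codec] = {
    "zlib": Codec("zlib", lambda d, lvl: zlib.compress(d, lvl), zlib.decompress),
    "lzma": Codec("lzma", lambda d, lvl: lzma.compress(d, preset=lvl), lzma.decompress),
}


def roundtrip_ratio(codec_name: str, payload: bytes, level: int) -> float:
    """Compress and decompress once; return compressed size / original size."""
    codec = CODECS[codec_name]
    packed = codec.compress(payload, level)
    if codec.decompress(packed) != payload:
        raise RuntimeError(f"codec {codec_name} failed the round trip")
    return len(packed) / len(payload)
```

With the codec name and the level passed in as plain parameters, comparing algorithms becomes a loop over `CODECS` and a range of levels, which is the kind of testing and tuning described above.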

------------

> 2. Should we exempt short packets from compression? In proposed libpq
> compression we do not compress messages shorter than 60 bytes. AFAIK,
> this number was chosen by fair dice roll. [1]

Greenplum IC UDP sends data in blocks; the default gp_max_packet_size is 8 KB.
When a motion only needs to send a very small amount of data, I think it is reasonable not to compress it.
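
A minimal Python sketch of exempting small payloads could look like the following. All names here are hypothetical: the 60-byte threshold is simply borrowed from the libpq proposal quoted above, the one-byte flag is an invented framing for illustration only, and `zlib` stands in for Zstd or LZ4.

```python
import zlib

MIN_COMPRESS_SIZE = 60      # bytes; hypothetical threshold taken from the libpq proposal
FLAG_RAW = b"\x00"          # hypothetical one-byte marker: payload sent as-is
FLAG_COMPRESSED = b"\x01"   # hypothetical one-byte marker: payload is compressed


def encode_packet(payload: bytes, level: int = 1) -> bytes:
    """Compress only when the payload is large enough and actually shrinks."""
    if len(payload) >= MIN_COMPRESS_SIZE:
        packed = zlib.compress(payload, level)
        if len(packed) < len(payload):
            return FLAG_COMPRESSED + packed
    # Short or incompressible payloads go out unchanged.
    return FLAG_RAW + payload


def decode_packet(packet: bytes) -> bytes:
    """Reverse encode_packet by dispatching on the leading flag byte."""
    flag, body = packet[:1], packet[1:]
    if flag == FLAG_COMPRESSED:
        return zlib.decompress(body)
    if flag == FLAG_RAW:
        return body
    raise ValueError(f"unknown packet flag: {flag!r}")
```

Falling back to the raw payload when compression does not shrink it also protects already-compressed or high-entropy data, so the threshold matters less than it might seem.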

--------
> 3. Should we compress TCP, UDP, or both?

I think most users use UDP, so we could try UDP first.

------
> 4. We would like to see this improvement in GP6. But the new feature
> must be aimed at v7 first.

Yes. And we are probably not going to backport such a huge feature to Greenplum 6X_STABLE.
But for community products, you may make your own decision.

-------

> 5. How do we know that the feature is effective? There will be cases
> when it brings some benefits, and cases where compression harms
> performance and memory usage. Will we be able to give the user advice
> on when to use it and when to avoid it?

Good question. It is still an open question. We can run OLAP queries concurrently so that they
saturate the network, and compare the throughput with and without compression.
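
Before running such a benchmark, a back-of-the-envelope model can hint at when compression is likely to help. The Python sketch below is only that: it assumes compression and sending overlap perfectly, ignores decompression cost and memory usage, and uses `zlib` as a stand-in codec. The function names are hypothetical.

```python
import time
import zlib


def measure_codec(sample: bytes, level: int = 1, repeats: int = 5):
    """Return (ratio, compress_bytes_per_sec) measured on a sample payload."""
    start = time.perf_counter()
    for _ in range(repeats):
        packed = zlib.compress(sample, level)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    ratio = len(packed) / len(sample)                 # below 1.0 means the data shrank
    speed = repeats * len(sample) / elapsed
    return ratio, speed


def effective_throughput(link_bytes_per_sec: float, ratio: float,
                         codec_bytes_per_sec: float) -> float:
    """Original bytes delivered per second, assuming compress and send overlap.

    The link moves link_bytes_per_sec compressed bytes, which correspond to
    link_bytes_per_sec / ratio original bytes; the codec caps the rate.
    """
    return min(codec_bytes_per_sec, link_bytes_per_sec / ratio)


def compression_helps(link_bytes_per_sec: float, ratio: float,
                      codec_bytes_per_sec: float) -> bool:
    """True when the modelled throughput beats sending uncompressed data."""
    return effective_throughput(link_bytes_per_sec, ratio,
                                codec_bytes_per_sec) > link_bytes_per_sec
```

Under this model, compression loses whenever the codec is slower than the link, whatever the ratio, which is one candidate rule of thumb to check against the concurrent OLAP test.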

------

> 6. What set of GUCs should we use to control motion compression?

This should be part of the design and implementation.

-----------------------------------------------------------------------------------------------------


Below are some of my thoughts on this topic.

We have some ideas about using QUIC for IC-UDP. QUIC comes with a data compression feature and
builds reliable data communication on top of unreliable UDP.

That would be a much larger project. Investigating this topic the way you are doing is still a good step.

--------

Thanks!


Best,
Zhenghua Lyu


