Edge Computing for GPDB


wk...@vitessedata.com

Mar 25, 2022, 7:55:44 PM
to gpdb...@greenplum.org
Hi All,

In the past few months, we have been developing an external
computation+storage solution for GPDB called KITE. This solution
provides a way to cache data files in parquet or csv format
originating from S3/HDFS on a local device, and a SQL interface to
extract selected condensed data from the files. The local device
itself is composed of Samsung Smart SSDs, which provides additional
compute capabilities before the data is transferred into main memory,
i.e. computation can be pushed all the way to the computational
storage device (Smart SSD).


We have also modified GPDB6 so that it works with KITE devices using
external tables.

E.g.

CREATE EXTERNAL TABLE lineitem (
    l_orderkey ...
    l_partkey ...
    ...
) LOCATION (
    'kite://s3bucket/path/lineitem/*.par',
    'kite://s3bucket/path/lineitem/*.par'
);

Each GPDB segment will connect to an assigned kite host and submit
queries to read data when the lineitem table is selected. Each segment
will read a disjoint set of rows. The returned results can be joined
to other external or local tables seamlessly. Thus, the solution
essentially gives GPDB access to read-only data stored on S3/HDFS and
speeds up access, since data files are cached and processed
externally, reducing both the load on the GPDB cluster and the number
of calls to S3 services.
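
For example, a query like the following (the orders table and the
filter here are purely illustrative, not part of our patch) treats the
KITE-backed lineitem table like any other table:

-- Illustrative only: join the KITE-backed external lineitem table to
-- a local orders table; the orders table and predicate are examples.
SELECT o.o_orderpriority, count(*) AS order_count
FROM orders o
JOIN lineitem l ON l.l_orderkey = o.o_orderkey
WHERE l.l_shipdate >= date '1995-03-01'
GROUP BY o.o_orderpriority
ORDER BY o.o_orderpriority;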

On a high level, we made the following changes to GPDB6:

1. Add handling of kite:// URLs. These changes are made in the
backend/access/external/ dir. There are approx. 400 lines of
diffs.

2. External scans. These changes are in the backend/executor/ dir,
mostly in the nodeExternalScan.c file. There are again about 400
lines of diffs. The external scan node can now submit a SQL query,
schema, and xid to an external device.

3. The above two changes allow for a simple scan of the tables on
KITE. We also wanted to support aggregation, so that more of the
processing happens on KITE and only aggregated data, which is much
smaller in volume, is returned to GPDB. To that end, we modified the
plan tree to push aggregates down to the external scan. The changes
here are more extensive, but still manageable at ~2k lines of diffs.

With these changes, Q1 now sends the aggregate to KITE, which returns
4 rows of data per segment. The whole GPDB cluster only processes
(4 * #seg) rows returned by the KITE devices, instead of scanning the
big lineitem table.
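
(For context, assuming Q1 here refers to TPC-H Q1: abridged, it looks
like the query below. Grouping on (l_returnflag, l_linestatus)
produces 4 groups, which is why each KITE host returns only 4
aggregated rows per segment.)

-- TPC-H Q1, abridged for illustration.
SELECT l_returnflag, l_linestatus,
       sum(l_quantity) AS sum_qty,
       sum(l_extendedprice) AS sum_base_price,
       avg(l_discount) AS avg_disc,
       count(*) AS count_order
FROM lineitem
WHERE l_shipdate <= date '1998-12-01' - interval '90 days'
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;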

We hope to submit patches as they become more mature. Please let us
know if the community is interested in checking them out.


Ivan Novick

Mar 25, 2022, 8:18:57 PM
to wk...@vitessedata.com, gpdb...@greenplum.org
Awesome thanks!

Looking forward to this

And to feedback from the Greenplum community

-----------------------------------------
Ivan Novick
Director of Product Management
VMware Tanzu Greenplum



Jasper Li

Mar 28, 2022, 1:16:09 PM
to Ivan Novick, wk...@vitessedata.com, gpdb...@greenplum.org
+1 looking forward to the PR 


Ashwin Agrawal

Mar 28, 2022, 6:36:06 PM
to wk...@vitessedata.com, Greenplum Developers

On Fri, Mar 25, 2022 at 4:55 PM <wk...@vitessedata.com> wrote:
Hi All,

In the past few months, we have been developing an external
computation+storage solution for GPDB called KITE. This solution
provides a way to cache data files in parquet or csv format
originating from S3/HDFS on a local device, and a SQL interface to
extract selected condensed data from the files. The local device
itself is composed of Samsung Smart SSDs, which provides additional
compute capabilities before the data is transferred into main memory,
i.e. computation can be pushed all the way to the computational
storage device (Smart SSD).

Very Interesting, looking forward!
I wish to understand (I am not an expert in this area, hence looking forward to learning as well) why the custom protocol [1] is not a fit for this implementation. The current S3 implementation in GPDB seems to use it.

Curious to learn whether you have given thought to GPDB7 and onward (the master branch of GPDB): how will this implementation tie into the FDW framework, which we wish to leverage going forward?

We hope to submit patches as they become more mature. Please let us
know if the community is interested in checking them out.

Yes, we are interested, and we would be happy to have the code submitted so that we can have a more focused and constructive dialogue and help on this front.


Thank you,
--
Ashwin Agrawal (VMware)

Eric Lam

Feb 8, 2023, 4:08:53 AM
to Greenplum Developers, Eric Lam, wk...@vitessedata.com
Hi,

I have attached the diff between the gpdb6 source and our changes for Kite, the external
computation+storage solution for GPDB.

In addition to the changes, we have used the following external open source libraries as well:

1. Arrow library for Decimal operations

2. Kite client SDK for communication between the client and the Kite server

3. xrg file format library -- the internal data file format sent over the network

4. xexpr library for data serialization of the aggregate plan node.
Please let us know if you have any comments.
Eric
gpdb6x.diff

Eric Lam

Feb 8, 2023, 5:29:06 AM
to Greenplum Developers, Ashwin Agrawal, Greenplum Developers, wk...@vitessedata.com
Hi All,

For GPDB7 FDW support, we do have an FDW for PostgreSQL which should be able to run with GPDB7 (master or any mode). For all segments mode, we are wondering whether GPDB7 will push down the aggregate plan in a distributed way so that aggregates can be pushed down to the Kite server correctly.

E.g., with a SQL query like "SELECT AVG(quantity) FROM TABLE", GPDB7 needs to create a plan/SQL like "SELECT COUNT(quantity), SUM(quantity) FROM TABLE" in all segments mode, so that GPDB can combine the data from all segments and calculate AVG(quantity) afterwards.
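
In other words, a sketch of the intended rewrite (the foreign table
and column names below are illustrative only):

-- What the user writes against the foreign table:
SELECT AVG(quantity) FROM lineitem_ft;

-- What each segment would push to its Kite host in all segments mode
-- (partial aggregation):
SELECT COUNT(quantity), SUM(quantity) FROM lineitem_ft;

-- GPDB then combines the per-segment partials as SUM(sum)/SUM(count)
-- to compute the final AVG(quantity).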

Here is the link for Kite FDW
Eric

Rui Zhao

Mar 6, 2023, 10:27:55 PM
to Greenplum Developers, Eric Lam, Ashwin Agrawal, Greenplum Developers, wk...@vitessedata.com
Hi Eric,

Thanks for the contribution. 
This commit for 6X is awesome. It provides a template for implementing pushdown for external tables. We will look into it.
But as for the current release plan of GPDB6, we don't plan to add big new features for external tables, especially since this commit makes a big modification to the external table code.
Our plan is to support more FDWs in GPDB7, so it is more likely that we will support this external data source through your FDW.
Currently, GPDB7 does not support pushdown for all segments mode, so we plan to support simple aggregate (count(*)/avg) pushdown for all segments mode in the short term. But this will definitely come after the release of GPDB7.

Thank you!
Rui Zhao(VMware)

Ashwin Agrawal

Mar 8, 2023, 7:55:02 PM
to Eric Lam, Greenplum Developers, wk...@vitessedata.com
On Wed, Feb 8, 2023 at 1:08 AM Eric Lam <eri...@vitessedata.com> wrote:
Hi,

I have attached the diff between the gpdb6 source and our changes for Kite, the external
computation+storage solution for GPDB.

In addition to the changes, we have used the following external open source libraries as well:

1. Arrow library for Decimal operations

2. Kite client SDK for communication between the client and the Kite server

3. xrg file format library -- the internal data file format sent over the network

4. xexpr library for data serialization of the aggregate plan node.
Please let us know if you have any comments.

Curious to hear thoughts on this aspect - the objective is to provide
a solution that caches data files in parquet or csv format
originating from S3/HDFS on a local device and pushes as much of the
computation down to storage as possible. These seem very similar to
the objectives towards which S3 itself is heading as well.

Given that, isn't it better if we can interact with this thing using
the S3/HDFS APIs themselves instead of separate KITE APIs? If we can
talk to KITE using the same S3 APIs, then GPDB already has code via
PXF to read/write the data and no change is required on the GPDB
front. Downstream it doesn't matter to GPDB what magic KITE is
performing in data storage format/caching/computation.

PXF supports column projection as well as predicate pushdown for the
AND, OR, and NOT operators when using S3 Select, so those would work
out of the box as well. I think S3 Select currently doesn't have
aggregate push-down and hence support for that is not added to PXF,
but if the KITE service extends the functionality then that could be
added as well.

The main aim here is similar to what SQL achieves as a language: if
S3 can act as the standard interface for this kind of thing, then
applications like GPDB do not all have to be modified to work with
it.
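
For reference, a PXF external table over S3 Parquet data looks
roughly like the sketch below (option names are as I recall them from
the PXF documentation; the bucket, path, and server name are
placeholders). A KITE endpoint that speaks the S3 API could, in
principle, be addressed the same way:

-- Illustrative only: PXF external table over S3 Parquet data.
-- S3_SELECT=AUTO lets PXF push column projection and supported
-- predicates down via S3 Select where applicable.
CREATE EXTERNAL TABLE lineitem_pxf (
    l_orderkey bigint,
    l_quantity numeric,
    l_shipdate date
) LOCATION ('pxf://s3bucket/path/lineitem?PROFILE=s3:parquet&SERVER=s3srv&S3_SELECT=AUTO')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');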

(I am yet to look into the code and implementation details)

--
Ashwin Agrawal (VMware)
