Hi All,
In the past few months, we have been developing an external
computation+storage solution for GPDB called KITE. This solution
provides a way to cache data files in parquet or csv format
originating from S3/HDFS on a local device, and a SQL interface to
extract selected condensed data from the files. The local device
itself is composed of Samsung Smart SSDs, which provide additional
compute capabilities before the data is transferred into main memory,
i.e. computation can be pushed all the way to the computational
storage device (Smart SSD).
We hope to submit patches as they become more mature. Please let us
know if the community is interested in checking them out.
Hi,
I have attached the diff between the gpdb6 source and our changes for Kite, the external computation+storage solution for GPDB.
In addition to the changes, we have used the following external open source libraries:
1. arrow library for Decimal operations
2. kite client sdk for communication between the client and the kite server
3. xrg file format library -- the internal data file format sent through the network
4. xexpr library for data serialization of the aggregate plan node.
Please let us know if you have any comments.