Need TPC-H plans for gpdb


Arun Marathe

Sep 9, 2020, 5:28:25 PM
to Greenplum Developers
Hi all,

I would like to study gpdb's plans for the TPC-H queries. Because it's a popular benchmark, many of you may already have it installed. Would you mind sharing the explain plans for the 22 queries (scale factor doesn't matter much)? You could post them here as a zip file, or on a website if you keep one.

Thanks,
Arun

Jesse Zhang

Sep 10, 2020, 1:44:27 PM
to Greenplum Developers, Arun Marathe

What is TPC-H? Do you mind elaborating on what you are asking for here?

Arun Marathe

Sep 10, 2020, 5:22:54 PM
to Jesse Zhang, Greenplum Developers
TPC-H is a well-known decision support benchmark consisting of 22 queries.
I just need "explain" plans for those queries.
I have a CentOS VM, and I am having trouble installing gpdb on it.
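
For anyone who does have TPC-H loaded, here is a minimal sketch of one way to turn the generated query files into "explain" scripts. It assumes each file contains plain SQL separated by semicolons, with no semicolons inside string literals; the function name `explain_script` is made up for illustration. Statements that are not queries (for example the view that the standard Q15 text creates and drops) are passed through unchanged.

```python
import re


def explain_script(sql_text: str) -> str:
    """Return sql_text with EXPLAIN prefixed to every SELECT/WITH statement.

    Assumption: statements are separated by ';' and no string literal
    contains a semicolon. Full-line '--' comments are dropped first.
    """
    # Remove full-line SQL comments so they do not hide the leading keyword.
    without_comments = re.sub(r"(?m)^\s*--.*$", "", sql_text)

    statements = []
    for raw in without_comments.split(";"):
        body = raw.strip()
        if not body:
            continue  # skip empty fragments, e.g. after the final ';'
        if re.match(r"(?i)(select|with)\b", body):
            body = "EXPLAIN " + body
        statements.append(body + ";")
    return "\n".join(statements) + "\n"


if __name__ == "__main__":
    print(explain_script("-- Q1 (abbreviated)\nselect 1;\n"), end="")
```

The resulting text can be saved to a file and run with something like `psql -d tpch -f q01_explain.sql`, where the database and file names are placeholders.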

Thanks,
Arun

Hans Zeller

Sep 10, 2020, 5:57:00 PM
to Arun Marathe, pvtl-cont-sbjesse, Greenplum Developers

Hi Arun,

If you have an AWS account, an easier way to get GPDB running is to use a ready-made instance of VMware Tanzu Greenplum Database: https://aws.amazon.com/marketplace/pp/Pivotal-Software-Inc-Pivotal-Greenplum-BYOL-by-Piv/B06XKQ8Z3H. The listing says “bring your own license”, but if you send an email to sup...@pivotal.io, I assume they will be able to give you a free evaluation license to try your benchmark.

There are similar VM images available on Google Cloud and on Azure, if you prefer those.

The VMware product is based on open-source GPDB, so the explain plans for TPC-H should be the same for the VMware and the open-source versions.

Hans

Amit Khandekar

Sep 11, 2020, 2:05:26 AM
to Arun Marathe, Jesse Zhang, Greenplum Developers
On Fri, 11 Sep 2020 at 02:52, Arun Marathe <ap.ma...@gmail.com> wrote:
>
> TPC-H is a well-known decision support benchmark consisting of 22 queries.
> I just need "explain" plans for those queries.
> I have a CentOS VM, and I am having trouble installing gpdb on it.

If you end up running TPC-H yourself on a gpdb instance, I believe
you would also have to make the same modifications to the data and
queries that are required for PostgreSQL. But someone can
correct me if I am wrong.
The repo below has instructions as well as scripts that make those
modifications for running TPC-H on PostgreSQL:
https://github.com/tvondra/pg_tpch
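
As an illustration of the kind of data modification meant here: as far as I know, dbgen ends every row of its .tbl files with a trailing '|', which PostgreSQL's COPY treats as an extra column. A rough Python sketch of stripping it follows; the function names and file paths are made up for this example and are not taken from the repo.

```python
def strip_trailing_delimiter(line: str, delimiter: str = "|") -> str:
    """Remove one trailing delimiter from a row, keeping its newline."""
    newline = "\n" if line.endswith("\n") else ""
    body = line.rstrip("\r\n")
    if body.endswith(delimiter):
        body = body[: -len(delimiter)]
    return body + newline


def convert_tbl(src_path: str, dst_path: str) -> None:
    """Copy a dbgen .tbl file to dst_path without the trailing delimiters."""
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(strip_trailing_delimiter(line))


if __name__ == "__main__":
    print(strip_trailing_delimiter("0|AFRICA|some comment|\n"), end="")
```

The converted file can then be loaded with COPY using '|' as the delimiter.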



--
Thanks,
-Amit Khandekar
Huawei Technologies

Alastair Turner

Sep 11, 2020, 2:32:54 AM
to Arun Marathe, Jesse Zhang, Greenplum Developers
Hi Arun

If you want to get a VM running, the scripts at
https://github.com/cantzakas/gpdb-packaging will build a VM for you
with the Pivotal binary distribution.

You also need to be careful about how you use the explain plans for
any standard workload - like TPC-H or TPC-DS - because they will vary
depending on the cluster configuration (number of segments, memory
allocation per query backend, ...), physical data structures
(partitioning, distribution) and data volumes.
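
One way to keep shared plans interpretable is to store that context next to each plan. Below is a small sketch, assuming you collect the values yourself (for example with `SHOW statement_mem;`, `SHOW optimizer;`, and the query in `SEGMENT_COUNT_SQL`); the `plan_header` helper and the dictionary keys are hypothetical.

```python
# Query that should count primary segments on a Greenplum cluster
# (assumption: run it yourself and paste the result into the context).
SEGMENT_COUNT_SQL = (
    "SELECT count(*) FROM gp_segment_configuration "
    "WHERE role = 'p' AND content >= 0;"
)


def plan_header(context: dict) -> str:
    """Render cluster and data context as SQL comment lines, sorted by key."""
    lines = ["-- plan context"]
    for key in sorted(context):
        lines.append(f"-- {key}: {context[key]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values, not measurements from a real cluster.
    example = {"segments": 4, "scale_factor": 1, "optimizer": "on"}
    print(plan_header(example), end="")
```

Prepending this header to each saved plan makes it easier to tell why two people's plans for the same query differ.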

Regards
Alastair