How to capture queries data in Greenplum?

173 views
Skip to first unread message

Daniel Barale

unread,
Jul 12, 2018, 11:09:22 AM7/12/18
to Greenplum Users

I was given the task of tuning their resource queues (RQs) to avoid performance degradation, query waiting, out of memory events, etc.

Now, to do that properly and to make decisions based on reality, I will need to collect historic data on query execution. I am familiar with most data dictionary views reporting activity on the system (pg_stat_activity and the statistics collector as well as perfmon APIs that give a pretty accurate picture of what’s going on NOW) but to know resource demand over time, I will need to collect all these data over a meaningful period (such as 3 months) on some kind of ad hoc schema.

Do you know of any function or set of functions in GP to do this query execution activity data capture/collection? I mean, capture data into some history tables without relying on GP logs exclusively, like a stream pipeline taking snapshots of the system query activity every given interval.

Any suggestion on that would be awesome. And also, any best practices tweaking RQs. We are upgrading production July 28th from 4.3.12 to 5.9 (big jump) and also we will start using GPCC 4 with workload manager (WLM) that we didn’t use before. But still, resource queues have to be setup properly.

Daniel Barale

Ivan Novick

unread,
Jul 12, 2018, 11:42:15 AM7/12/18
to Daniel Barale, Greenplum Users
Have you tried using the data in the gpperfmon database?

--
You received this message because you are subscribed to the Google Groups "Greenplum Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gpdb-users+unsubscribe@greenplum.org.
To post to this group, send email to gpdb-...@greenplum.org.
Visit this group at https://groups.google.com/a/greenplum.org/group/gpdb-users/.
For more options, visit https://groups.google.com/a/greenplum.org/d/optout.



--
Ivan Novick, Product Manager Pivotal Greenplum

Daniel Barale

unread,
Jul 12, 2018, 12:08:20 PM7/12/18
to Ivan Novick, Greenplum Users
The gpperfmon query _ history table is very useful but it doesn't tell you in the query had waited for execution on the resource queue or not. It tells you priority that is handy, status as done or abort, but not waited yes or no. Also cost column always has zero. :(

DB

On Thu, Jul 12, 2018, 11:42 AM Ivan Novick <ino...@pivotal.io> wrote:
Have you tried using the data in the gpperfmon database?
On Thu, Jul 12, 2018 at 8:09 AM, Daniel Barale <dba...@gmail.com> wrote:

I was given the task of tuning their resource queues (RQs) to avoid performance degradation, query waiting, out of memory events, etc.

Now, to do that properly and to make decisions based on reality, I will need to collect historic data on query execution. I am familiar with most data dictionary views reporting activity on the system (pg_stat_activity and the statistics collector as well as perfmon APIs that give a pretty accurate picture of what’s going on NOW) but to know resource demand over time, I will need to collect all these data over a meaningful period (such as 3 months) on some kind of ad hoc schema.

Do you know of any function or set of functions in GP to do this query execution activity data capture/collection? I mean, capture data into some history tables without relying on GP logs exclusively, like a stream pipeline taking snapshots of the system query activity every given interval.

Any suggestion on that would be awesome. And also, any best practices tweaking RQs. We are upgrading production July 28th from 4.3.12 to 5.9 (big jump) and also we will start using GPCC 4 with workload manager (WLM) that we didn’t use before. But still, resource queues have to be setup properly.

Daniel Barale

--
You received this message because you are subscribed to the Google Groups "Greenplum Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gpdb-users+...@greenplum.org.

To post to this group, send email to gpdb-...@greenplum.org.
Visit this group at https://groups.google.com/a/greenplum.org/group/gpdb-users/.
For more options, visit https://groups.google.com/a/greenplum.org/d/optout.

tinku john varghese

unread,
Jul 12, 2018, 12:25:14 PM7/12/18
to Daniel Barale, Greenplum Users
Hi Daniel

My 2 cents.

for historical queries information try queries_history
table available in gpperfmon database.

Thanks

--

Daniel Barale

unread,
Jul 12, 2018, 12:30:54 PM7/12/18
to tinku john varghese, Greenplum Users
Dear Luis, thanks for the advice / warning ;)

Abração!

Daniel Barale 

Hao Wang

unread,
Jul 12, 2018, 10:13:28 PM7/12/18
to dba...@gmail.com, johnka...@gmail.com, gpdb-...@greenplum.org
If your Greenplum is in RQ mode, in query_history's tstart - tsubmit should be the actual queued time.

Thanks
Hao

Daniel Barale

unread,
Jul 12, 2018, 10:25:43 PM7/12/18
to Hao Wang, johnka...@gmail.com, Greenplum Users
True that. But I'm not talking elapsed time. I'm talking when the query had to wait before getting executed because the RQ was full. 

GPCC has a column on the query monitor screen called "wait time" but the query _ history table wasn't. I'm sure joining with other dictionary tables you can obtain it. Just I don't know how yet.

Daniel Barale 

Hao Wang

unread,
Jul 12, 2018, 10:33:59 PM7/12/18
to dba...@gmail.com, johnka...@gmail.com, gpdb-...@greenplum.org
There are three timestamps in gpperfmon
 - tsubmit is the timestamp a query finished planning and submitted to RQ
 - tstart is the timestamp query finished waiting in RQ and get into executor
 - tfinish is the query done execution

So if you wonder the RQ wait time, the tstart - tsubmit should works.

BTW, elapsed time is tfinish - tstart

- Hao

Daniel Barale

unread,
Jul 12, 2018, 10:49:32 PM7/12/18
to Hao Wang, johnka...@gmail.com, Greenplum Users
Thanks Hao. That's a very useful thing to know. I'll use try it next.

Daniel 
Reply all
Reply to author
Forward
0 new messages