Memory utilization on GP and spill to disk


Matias Montroull

unread,
Nov 2, 2021, 8:40:56 AM11/2/21
to Greenplum Users
Hi,

We've installed GP 6.13 and I have a question about memory: we've noticed that memory utilization on the cluster (through Command Center and the AWS console) doesn't go beyond 15%, although there's available memory.

We have a 10-node cluster with 60 primary segments and 60 mirrors. Each host has 128GB RAM + 32GB swap.

We're using resource groups. Here's some configuration from the cluster that may help figure out the problem:

gp_resource_group_memory_limit = 0.70
shared_buffers =  125MB
memory policy = eager free
work_mem = 32MB (I think this is deprecated and replaced by statement_mem)
statement_mem = 1574MB
spill ratio = 0 (master)
spill ratio = 20 (hosts)

There's one resource group we created with concurrency = 15, memory limit = 50%, and spill ratio = 100%. When a single query is running in this group, it barely reaches 40GB before spilling to disk. We would expect more RAM to be used before spilling when the resource group is otherwise idle, with no other queries running on it.

Is this assumption correct?
1) on each host, we have 89.6GB available for GP (128GB * 0.7)
2) in a resource group set to memory limit 50%, we have 44.8GB available for the resource group
3) since we've set concurrency to 15, each query initially gets 2.98GB allocated per host, which is roughly 30GB overall per query (2.98GB × 10 hosts). Could this explain why queries don't get more memory?
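As a quick sanity check, the arithmetic in points 1–3 can be written out. This is a back-of-the-envelope sketch using this cluster's numbers; GPDB's internal accounting is more involved (see later replies in this thread):

```python
# Back-of-the-envelope check of the per-query quota in points 1-3 above.
# Values come from this cluster's settings; GPDB's internal accounting
# is more involved than this.

host_ram_gb = 128
gp_resource_group_memory_limit = 0.70   # fraction of host memory for GP
group_memory_limit = 0.50               # this group's MEMORY_LIMIT (50%)
concurrency = 15
hosts = 10

gp_memory_per_host = host_ram_gb * gp_resource_group_memory_limit  # 89.6 GB
group_memory_per_host = gp_memory_per_host * group_memory_limit    # 44.8 GB
per_query_per_host = group_memory_per_host / concurrency           # ~2.99 GB
per_query_cluster = per_query_per_host * hosts                     # ~29.9 GB

print(f"{per_query_per_host:.2f} GB per host, "
      f"{per_query_cluster:.1f} GB across the cluster")
```

The ~2.99GB per host matches the 2.98GB figure in point 3; the difference is rounding.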

How can we increase memory utilization? Which parameters play into this?

I'd be happy to provide any additional information that can help troubleshoot.

Regards


Zhenghua Lyu

unread,
Nov 2, 2021, 10:49:21 AM11/2/21
to Matias Montroull, Greenplum Users
Hi,
    Your computation of the per-concurrency quota for the resource group is correct.

    Let me explain the spill ratio parameter here.

    The spill ratio is applied on the master: after the plan is generated, just as the executor starts to dispatch it, GPDB first computes the query_mem of the plan, and based on this value it walks the plan and sets each plan node's operatorMemKB.

    First, we have to understand the memSpill mode and how it relates to query_mem:
    1. memSpill in percentage mode:
          * the GUC memory_spill_ratio is not 0
          * query_mem = this_group's_mem_quota * memory_spill_ratio / concurrency

    2. memSpill in absolute value mode:
          * the GUC memory_spill_ratio is 0
          * query_mem is just statement_mem

    And the GUC memory_spill_ratio's value is taken from the current group.

    Next, GPDB sets each plan node's operatorMemKB based on query_mem and some policy (meaning we divide query_mem into pieces according to the policy and assign each plan node a piece).

    Then we dispatch the plan to the segments to run the query. Take hash agg as an example: when ExecAgg runs on a segment and the hash table has not been created yet, it creates one and, based on the operatorMemKB stored in the hash agg node, estimates how many entries the hash table can hold. When later execution would insert more entries than that, we spill to disk.
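The two memSpill modes can be expressed as a small sketch. This is my paraphrase of the rules described above in code; the function name and MB units are illustrative, not GPDB source:

```python
def query_mem_mb(group_mem_quota_mb, memory_spill_ratio, concurrency,
                 statement_mem_mb):
    """Per-query memory budget under the two memSpill modes described above.

    memory_spill_ratio is a percentage (0-100). Illustrative names and
    units; not GPDB internals.
    """
    if memory_spill_ratio > 0:
        # percentage mode: group quota * spill ratio / concurrency
        return group_mem_quota_mb * memory_spill_ratio / 100 / concurrency
    # absolute value mode: fall back to statement_mem
    return statement_mem_mb

# 44.8 GB group quota, spill ratio 100, concurrency 15 -> ~2987 MB
print(query_mem_mb(44800, 100, 15, 1574))
# spill ratio 0 -> statement_mem wins: 1574 MB
print(query_mem_mb(44800, 0, 15, 1574))
```

This also shows why setting the spill ratio to 0 and raising statement_mem (as discussed later in the thread) changes how much memory a query gets before spilling.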

      ---------

      General advice for using more memory:
      1. increase the group's memory limit
      2. decrease concurrency
      3. use a larger spill ratio
 



From: Matias Montroull <mati...@gmail.com>
Sent: Tuesday, November 2, 2021 8:40 PM
To: Greenplum Users <gpdb-...@greenplum.org>
Subject: [gpdb-users] Memory utilization on GP and spill to disk
 

Ivan Novick

unread,
Nov 2, 2021, 12:23:08 PM11/2/21
to Zhenghua Lyu, Matias Montroull, Greenplum Users
Hi Zhenghua and Matias,

How about setting memory_spill_ratio to 0 and then increasing statement_mem to see if you can get more utilization? What do you think?

Ivan



Matias Montroull

unread,
Nov 2, 2021, 12:33:53 PM11/2/21
to Ivan Novick, Zhenghua Lyu, Greenplum Users
Thanks for all the suggestions. Yeah, I tried a couple of queries with the spill ratio set to 0 and statement_mem set manually, and it works: I can see far more memory usage and the queries finish in half the time. But I was wondering how I could achieve higher memory utilization by tweaking the resource groups (and possibly other parameters).

Ivan Novick

unread,
Nov 2, 2021, 12:37:15 PM11/2/21
to Matias Montroull, Zhenghua Lyu, Greenplum Users
OK, so we've established that when the computed per-query memory allocation is higher, more memory is used.

To get it higher, you can:

-- globally increase statement_mem and set memory_spill_ratio to 0
or
-- increase memory_spill_ratio to 100 and make sure you haven't configured more concurrency than you need, or more groups than you need

This should provide an increased or maximal per-query allocation.

Ivan



Matias Montroull

unread,
Nov 2, 2021, 12:39:10 PM11/2/21
to Greenplum Users, Matias Montroull, zlyu, Greenplum Users, inovick
How does the memory shared quota affect the overall available memory for the group?

Luis Filipe de Macedo

unread,
Nov 2, 2021, 12:40:51 PM11/2/21
to Matias Montroull, Greenplum Users, Matias Montroull, Zhenghua Lyu, Greenplum Users, Ivan Novick
You can also increase gp_resource_group_memory_limit to 80% to start with. 70 is kind of conservative in my opinion... especially for your RAM-to-core ratio.


Ivan Novick

unread,
Nov 2, 2021, 1:13:08 PM11/2/21
to Matias Montroull, Zhenghua Lyu, Greenplum Users
As it says in the documentation:

MEMORY_SHARED_QUOTA sets aside a percentage of the memory allotted to the resource group to share across transactions. This shared memory is allocated on a first-come, first-served basis as available. A transaction may use none, some, or all of this memory. The minimum memory shared quota percentage you can specify for a resource group is 0. The maximum is 100. The default MEMORY_SHARED_QUOTA value is 80.

So if the per-query memory requirement varies within the group, setting this higher can help the single queries that need more memory get it allocated.
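The split the doc describes can be sketched as follows. The function and the 44.8GB figure (taken from earlier in the thread) are illustrative, not GPDB internals:

```python
def group_memory_split(group_mem_gb, memory_shared_quota_pct, concurrency):
    """Split a group's memory into the shared pool and a fixed per-slot
    quota, per the MEMORY_SHARED_QUOTA description above. Illustrative
    names; not GPDB internals."""
    shared = group_mem_gb * memory_shared_quota_pct / 100
    fixed_per_slot = (group_mem_gb - shared) / concurrency
    return fixed_per_slot, shared

# 44.8 GB group (the figure from earlier in the thread), default
# MEMORY_SHARED_QUOTA = 80, concurrency 15
fixed, shared = group_memory_split(44.8, 80, 15)
print(f"fixed per slot: {fixed:.2f} GB, shared pool: {shared:.1f} GB")
```

With the default of 80, most of the group's memory sits in the first-come, first-served shared pool rather than in per-slot quotas.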

Rereading your original email you said:
since we've set the concurrency to 15, each query gets 2.98GB initially allocated per host, that is roughly 30GB overall per query (2.98 times 10 hosts). This could explain why we don't get more memory on queries?

Are you planning on having 15 queries running in this group all the time? If not, then the shared quota should help.

Ivan


Matias Montroull

unread,
Nov 2, 2021, 2:09:10 PM11/2/21
to Greenplum Users, inovick, zlyu, Greenplum Users, Matias Montroull
Right, we won't have 15 running all the time. I'll increase the shared quota to 80 for that group.

Another question: I increased the memory limit in one group and we started to see out-of-memory errors on some queries. What would trigger that?

Ivan Novick

unread,
Nov 2, 2021, 4:03:11 PM11/2/21
to Matias Montroull, Greenplum Users, Zhenghua Lyu
What kind of OOM did you receive?  Can you share the error message?




Matias Montroull

unread,
Nov 2, 2021, 4:48:00 PM11/2/21
to Ivan Novick, Greenplum Users, Zhenghua Lyu
They're "high VMEM usage" type errors. gp_vmem_protect_limit is set to 8192. Should we increase this limit?

Ivan Novick

unread,
Nov 2, 2021, 4:54:32 PM11/2/21
to Matias Montroull, Greenplum Users, Zhenghua Lyu
Can you share the exact error message, please? I don't think the GUC you mentioned is relevant with resource groups.
thx
Ivan




Matias Montroull

unread,
Nov 2, 2021, 10:11:46 PM11/2/21
to Greenplum Users, inovick, Greenplum Users, zlyu, Matias Montroull
I get this:

SQL Error [XX000]: ERROR: Canceling query because of high VMEM usage. current group id is 25420, group memory usage 8836 MB, group shared memory quota is 4496 MB, slot memory quota is 298 MB, global freechunks memory is 444 MB, global safe memory threshold is 448 MB (runaway_cleaner.c:197)  (seg42 slice18 10.0.14.40:6000 pid=10875) (runaway_cleaner.c:197)

attached the current Resource Group set up
2021-11-02_23-10.png

Matias Montroull

unread,
Nov 3, 2021, 6:55:55 AM11/3/21
to Greenplum Users, Matias Montroull, inovick, Greenplum Users, zlyu
I just noticed something by looking at gp_resgroup_status_per_host and gp_resgroup_status_per_segment.

The available memory for this resource group per segment (6 segments) is 8966MB, so maybe the query went over this value on a particular segment (bad distribution, maybe?). Or it just hit gp_vmem_protect_limit on a segment (currently set to 8192MB). Probably bad query distribution. I think I'm starting to get it.

Yao Wang

unread,
Nov 3, 2021, 7:11:57 AM11/3/21
to Greenplum Users, mati...@gmail.com, inovick, Greenplum Users, zlyu
Let me clarify the situation based on the info you provided.

1. We don't have 89.6GB (128GB*0.7) available for GP. The correct calculation is as follows (maybe not precise because you are changing the settings; just an example):

total_memory = total_RAM * overcommit_ratio/100 + total_Swap = 128*0.5 + 32 = 96G

The doc about vm.overcommit_ratio says: "This Linux kernel parameter, set in /etc/sysctl.conf, identifies the percentage of RAM that is used for application processes; the remainder is reserved for the operating system." You didn't mention this parameter, so I assume 50% for the calculation.

2. This is the memory for GP per segment:

total_segment_memory = total_memory * gp_resource_group_memory_limit/segment_count = 96*0.7/6 = 11.2G

3. This is the memory for the current group per segment:

resource_group_memory_per_segment = total_segment_memory*memory_limit/100 = 11.2*50/100 = 5.6G

4. For every transaction in this group, the memory limit consists of two parts:

Fixed memory quota: transaction_fixed_memory_per_segment = resource_group_memory_per_segment * (100-memory_shared_quota)/(100*concurrency) = 5.6 * (100 - 80)/ (100*15) ~= 74.7M

Shared memory quota: transaction_shared_memory_per_segment = resource_group_memory_per_segment * memory_shared_quota/100 = 5.6 * 80/100 ~= 4.48G

(Note this matches "group shared memory quota is 4496 MB" in the OOM message.)

5. For any new transaction, GPDB allocates some initial memory quota according to memory_spill_ratio:

transaction_initial_quota_per_segment = (resource_group_memory_per_segment * MEMORY_SPILL_RATIO)/(100*concurrency) = (5.6 * 100)/(100*15) ~= 373M

Please note the actual initial memory might be smaller than this value. To my understanding, it will not be larger than the fixed memory quota (74.7M in this case).
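Steps 1–5 above can be reproduced in one place. This uses the same assumed values, including the assumed vm.overcommit_ratio of 50; adjust to your actual settings:

```python
# Reproduces steps 1-5 above with the same assumed values
# (vm.overcommit_ratio assumed to be 50, 6 primary segments per host).

total_ram_gb, swap_gb = 128, 32
overcommit_ratio = 50            # assumption from step 1
gp_resgroup_memory_limit = 0.70
segments_per_host = 6
memory_limit = 50                # this group's MEMORY_LIMIT, percent
memory_shared_quota = 80         # percent
concurrency = 15
memory_spill_ratio = 100         # percent

# 1. total memory available on the host
total_memory = total_ram_gb * overcommit_ratio / 100 + swap_gb        # 96 GB
# 2. memory for GP per segment
total_segment_memory = (total_memory * gp_resgroup_memory_limit
                        / segments_per_host)                          # 11.2 GB
# 3. memory for the current group per segment
group_mem = total_segment_memory * memory_limit / 100                 # 5.6 GB
# 4. fixed and shared quotas per transaction
fixed_quota = group_mem * (100 - memory_shared_quota) / (100 * concurrency)
shared_quota = group_mem * memory_shared_quota / 100                  # 4.48 GB
# 5. initial quota from the spill ratio
initial_quota = group_mem * memory_spill_ratio / (100 * concurrency)  # ~0.373 GB

print(f"group shared quota per segment: {shared_quota * 1000:.0f} MB")
```

The ~4480 MB shared quota is in the same ballpark as the "4496 MB" in the OOM message; the small gap comes from the real per-segment memory being slightly larger than these assumed round numbers.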

Some suggestions:

  1. Check vm.overcommit_ratio and increase it if possible.
  2. Increase gp_resource_group_memory_limit (e.g. 90%) if there is no other workload on the host. (Luis already mentioned this.)
  3. Try decreasing memory_shared_quota (e.g. to 20%). Shared memory is used when the fixed quota is exceeded, and it is not dedicated. 80% seems too high unless you do have some special queries that occasionally require a large amount of memory.
  4. Set MEMORY_SPILL_RATIO to an appropriate value. MEMORY_SPILL_RATIO defines the percentage of memory at which queries start spilling to file. If it's 0, statement_mem and max_statement_mem are used (but I don't recommend that with resource groups). If it's 100%, you will hit the memory limit before spilling to file. 30% to 50% might be appropriate, but it depends on the actual workload.
  5. Decrease the concurrency number if possible.
  6. gp_vmem_protect_limit is NOT used when resource groups are enabled. Just ignore it.

Please let me know the result. Thanks.

Yao Wang
VMWare Greenplum

Matias Montroull

unread,
Nov 3, 2021, 7:31:04 AM11/3/21
to Greenplum Users, wa...@vmware.com, Matias Montroull, inovick, Greenplum Users, zlyu
Thanks! That clarifies a lot.

We currently have the overcommit ratio set to 95 in sysctl.conf.
We lowered the shared quota to 50 to see if it helps, and apparently it has so far.

Based on the above, here's what I have in my cluster:
total_memory = total_RAM * overcommit_ratio/100 + total_Swap = 128*0.95 + 32 = 153.6G

total_segment_memory = total_memory * gp_resource_group_memory_limit/segment_count = 153.6*0.7/6 = 17.92G

resource_group_memory_per_segment = total_segment_memory*memory_limit/100 = 17.92*50/100 = 8.96G

Matias Montroull

unread,
Nov 3, 2021, 10:38:40 AM11/3/21
to Greenplum Users, Matias Montroull, wa...@vmware.com, inovick, Greenplum Users, zlyu
One last question: how does the global shared pool (100 minus the sum of all memory limits) work into all this math?

If I leave 25% for the global shared pool, how much memory is that for a given query? Can it use up to the remaining non-resource-group-allocated memory, or does it also depend on overall concurrency across all the groups?

Yao Wang

unread,
Nov 3, 2021, 10:56:02 PM11/3/21
to Greenplum Users, mati...@gmail.com, Yao Wang, inovick, Greenplum Users, zlyu
It's from doc:

The sum of the MEMORY_LIMITs configured for all resource groups (including the default admin_group and default_group groups) identifies the percentage of reserved resource group memory. If this sum is less than 100, Greenplum Database allocates any unreserved memory to a resource group global shared memory pool.

When available, Greenplum Database allocates global shared memory to a transaction after first allocating slot and resource group shared memory (if applicable). Greenplum Database allocates resource group global shared memory to transactions on a first-come first-served basis.

For example, if you have two groups with MEMORY_LIMIT = 30%/50%, the size of global shared memory is (100 - 30 - 50) = 20(%).

Global shared memory is shared among groups. A transaction allocates from its slot memory first, then (if it hits that limit) from resource group shared memory, then (if it hits that limit) from global shared memory.

 Global shared memory is available only to resource groups for now.
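The doc's example in miniature (a trivial sketch, not GPDB code):

```python
def global_shared_pct(memory_limits):
    """Global shared pool = 100 minus the sum of all groups' MEMORY_LIMITs,
    per the doc quoted above."""
    return 100 - sum(memory_limits)

# The doc's example: two groups with MEMORY_LIMIT 30% and 50%
print(global_shared_pct([30, 50]))        # 20
# A transaction draws on slot memory first, then group shared memory,
# then this global pool, first-come first-served.
```

Remember to include admin_group and default_group in the list; forgetting them is an easy way to miscount the pool (as comes up later in this thread).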

Matias Montroull

unread,
Nov 4, 2021, 4:47:56 PM11/4/21
to Yao Wang, Greenplum Users, inovick, zlyu
Thanks, is there a way to monitor this?

The same way I can monitor the resource groups per host/segment using the available views?

Yao Wang

unread,
Nov 4, 2021, 10:38:54 PM11/4/21
to Greenplum Users, mati...@gmail.com, Greenplum Users, inovick, zlyu, Yao Wang
You can use gp_toolkit.gp_resgroup_status_per_host, see the doc:


Note the column memory_shared_used:

The group shared memory used by the resource group on the host. If any global shared memory is used by the resource group, this amount is included in the total as well.

I don't know of any special way to monitor global shared memory.

Matias Montroull

unread,
Nov 5, 2021, 7:06:27 AM11/5/21
to Yao Wang, Greenplum Users, inovick, zlyu
Interesting. When I do the math, no global shared memory is added to the resource group's memory shared quota. Why could this be?

Matias Montroull

unread,
Nov 5, 2021, 7:17:18 AM11/5/21
to Greenplum Users, Matias Montroull, Greenplum Users, inovick, zlyu, wa...@vmware.com
Never mind, my math was incorrect; it does have it.

Matias Montroull

unread,
Nov 5, 2021, 7:26:11 AM11/5/21
to Greenplum Users, Matias Montroull, Greenplum Users, inovick, zlyu, wa...@vmware.com
Sorry, it doesn't.

According to my math, I have 110G available for resource groups. 50% is allocated to one resource group, which is 55G; another 3 groups sum up to 25%; and the remaining 25% I don't see in the shared_memory_quota. The sum of shared_quota + memory_quota is just 55G. I was expecting some of the 27.5GB available in the global shared pool to show up somewhere in this view...

Yao Wang

unread,
Nov 8, 2021, 2:57:51 AM11/8/21
to Greenplum Users, mati...@gmail.com, Greenplum Users, inovick, zlyu, Yao Wang
Did you take into account the default admin_group and default_group groups?

Matias Montroull

unread,
Nov 8, 2021, 5:14:17 AM11/8/21
to Greenplum Users, wa...@vmware.com, Matias Montroull, Greenplum Users, inovick, zlyu
Yes, I've taken all groups into account in my math, and the total memory does get up to 75% of the available GP memory, but I'm not seeing the global shared memory anywhere; it's either missing from the view or not exposed. I'm pretty sure it exists somewhere, as my default group has memory_limit 0 and queries still get executed in that group at times with no failures. But in that particular group, the total memory_shared_available + memory_shared_used is 0, which is quite strange, since I was expecting to see some of the global shared memory in the memory_shared bucket.

Yao Wang

unread,
Nov 9, 2021, 8:43:21 AM11/9/21
to Greenplum Users, mati...@gmail.com, Yao Wang, Greenplum Users, inovick, zlyu
Could you please share the complete statements used to create/modify each group, and the query result of gp_toolkit.gp_resgroup_status_per_host? Thanks.

Matias Montroull

unread,
Nov 9, 2021, 8:57:15 AM11/9/21
to Yao Wang, Greenplum Users, inovick, zlyu
I noticed today that the available shared quota memory goes negative at times, so I'm assuming it is using the global shared memory pool.

I ran a few queries in a group with memory limit 0 and noticed that the available memory shared quota goes from 0 to negative values.

Let me know if you still need the info you asked for, or if this assumption is OK.

Yao Wang

unread,
Nov 10, 2021, 10:38:16 AM11/10/21
to Greenplum Users, mati...@gmail.com, Greenplum Users, inovick, zlyu, Yao Wang
That's OK; the behavior makes sense to me. Just let me know if you have more questions. Thanks.

Matias Montroull

unread,
Nov 16, 2021, 6:46:49 AM11/16/21
to Greenplum Users, Yao Wang, Matias Montroull, Greenplum Users, inovick, zlyu
Thanks, I have one more. I noticed that for some big queries, the mdw host's resource group memory gets consumed and the query fails before it's sent to the segment hosts (I think it fails at the planning stage, before being sent to the segments to run?). Even EXPLAIN ANALYZE fails, since it executes the query. How can we debug this if we can't even run EXPLAIN ANALYZE on it?

Thanks

Ashwin Agrawal

unread,
Nov 18, 2021, 3:09:18 PM11/18/21
to Matias Montroull, Greenplum Users, Yao Wang, inovick, zlyu
On Tue, Nov 16, 2021 at 3:46 AM Matias Montroull <mati...@gmail.com> wrote:
Thanks, I have one more, I noticed for some big queries, the mdw host resource group memory gets consumed and the query fails before it's sent to the segment hosts (I think it fails at planning stage before sent to the segments to run?) Even explain Analyze fails as it executes the query, how can we debug this if we can't even run explain analyze on it?

The easiest way to find out whether it's failing during the planning stage or later during execution is to just run EXPLAIN and see whether the plan is generated.