HELP! Why is IO-wait so high on my instance running PostgreSQL?


jerry-soc

Jun 11, 2014, 9:27:01 AM
to gce-dis...@googlegroups.com
I have an instance running PostgreSQL with about 100GB of data.

It is running on an n1-standard-2 (2 vCPUs, 7.5GB memory) instance, but the system is constantly hitting high IO-wait:

An iostat 1 log:

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.00         0.00         0.00          0          0
sdb             159.00      2112.00         8.00       2112          8

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.02    0.00    2.02   95.96    0.00    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.00         0.00         0.00          0          0
sdb             169.00      1360.00        40.00       1360         40

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.49    0.00    1.49   97.01    0.00    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               8.00         0.00        76.00          0         76
sdb             107.00       952.00        48.00        952         48

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.61    0.00    5.03   72.36    0.00    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              18.00       296.00       140.00        296        140
sdb             162.00      1944.00         0.00       1944          0


And a vmstat log:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  9  26396 118436  63788 6461500    0    0  1336    24  357  587  5  1  0 95
 0  8  26396 144088  63780 6435288    0    0  1472     0  331  471  2  2  0 96
 0  8  26396 140244  63780 6439124    0    0  3896     0  382  585  2  1  0 98
 0  8  26396 138756  63780 6440628    0    0  1416     0  242  381  1  1  0 98
 0  8  26396 137020  63780 6442340    0    0  1776     0  291  462  0  1  0 99
 0  9  26396 135408  63788 6443964    0    0  1616    40  295  454  1  1  0 98
 0  9  26396 133548  63788 6445568    0    0  1664     0  523 1203  2  3  0 95
 0  9  26396 132004  63788 6446704    0    0  1032     0  673 1157 15  5  0 80
 0  8  26396 130820  63788 6448156    0    0  1528     0  415 1199  1  1  0 99
 0  7  26396 129596  63788 6449512    0    0  1320     0  400 1177  0  0  0 100
 0  7  26396 128604  63788 6450608    0    0  1024     0  380 1151  1  1  0 99
 0  7  26396 126116  63788 6451772    0    0  1216     0  433 1130  1  3  0 96
 0  7  26396 125000  63796 6453012    0    0  1152    60  363 1097  0  2  0 99
 0  7  26396 123652  63796 6454344    0    0  1368     0  375 1163  1  0  0 99
 0  7  26396 122164  63796 6455616    0    0  1336     0  392 1119  0  2  0 98
 0  6  26396 120800  63796 6457040    0    0  1424     0  403 1160  1  1  0 99
 0  6  26396 120560  63796 6458540    0    0  1488     0  350  636  2  1  0 97
 0  6  26396 119088  63804 6459956    0    0  1448    12  243  405  0  1  0 99
 0  6  26396 145500  63796 6433172    0    0   928     0  181  315  0  0  0 100
 1  5  26396 141408  63796 6434244    0    0  1184     0  308  474  7  1  0 93
 0  5  26396 140044  63796 6435596    0    0  1264     0  211  361  0  1  0 99





Looking at iostat and vmstat, why is IO-wait so high even though the TPS is not that high?

I measured an average TPS of about 180 and an average throughput of about 10M/s, but IO-wait is always over 50%!

The system is running only PostgreSQL and a Ruby on Rails app.
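
(A quick way to confirm that the disk itself is the bottleneck, rather than CPU or memory, is the extended iostat view, which also reports per-request latency and device utilization. This is a sketch; it assumes sdb is the PostgreSQL data disk.)

# Extended per-device statistics, refreshed every second.
# In the sdb row, a high await (per-request latency in ms) together with
# %util close to 100 means the device is saturated even at a modest
# request rate, which then shows up as IO-wait on the CPUs.
iostat -x 1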



jerry-soc

Jun 11, 2014, 9:28:45 AM
to gce-dis...@googlegroups.com
Forgot to mention: the volume size is 1TB.

Eran Sandler

Jun 11, 2014, 11:37:27 AM
to jerry-soc, gce-discussion

You might be hitting the max IO for that instance type. Bigger instances have higher IOPS caps.

Can you try the same test with a bigger instance?

Eran


Anthony Voellm

Jun 11, 2014, 11:44:38 AM
to Eran Sandler, jerry-soc, gce-dis...@googlegroups.com

PD IOPS scales with the size of the disk. In this case, a 100GB disk is expected to get on the order of 30 read IOPS. There are burst tokens that make it act like a 1TB disk for a period of time, but with heavy use those expire quickly.

Think about how many transactions (i.e. IOPS) you need and choose a disk size that will meet that rate.

A bigger instance will also help, but only after you address the IOPS limit.

The PD docs have more details.
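
(As a rough sizing sketch, assuming the linear per-GB rate implied by the figure above, roughly 0.3 sustained read IOPS per GB of standard PD; the PD performance docs have the authoritative numbers.)

# Sustained read IOPS expected from a 1TB (1000GB) standard PD,
# assuming ~0.3 read IOPS per GB:
echo "1000 * 0.3" | bc -l        # 300.0

# Minimum standard PD size (in GB) needed to sustain a target of
# 200 read IOPS under the same assumption:
echo "200 / 0.3" | bc -l         # ~666.7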

jerry-soc

Jun 11, 2014, 11:53:52 AM
to gce-dis...@googlegroups.com, er...@sandler.co.il, yao...@socmetrics.com
Thanks Tony,

About IOPS: I monitor the system with iostat 1 and get an average TPS (that's just IOPS, right?) of about 200. I have a 1TB persistent disk, which I thought would be enough, but IO-wait is still high. Maybe I should upgrade to a bigger instance type.

But where do the docs talk about instance types and IOPS?
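
(On the TPS vs. IOPS point: iostat's tps column counts completed IO requests per second, so it is effectively the IOPS the disk is delivering. Dividing read throughput by tps gives the average request size; the figures below are taken from the first iostat sample in this thread.)

# Average read request size for sdb in the first sample:
# 2112 kB/s read at 159 tps.
echo "scale=2; 2112 / 159" | bc        # ~13 KB per read

Requests that small behave like random reads, so the disk hits its IOPS ceiling long before its throughput ceiling, which is consistent with ~95% IO-wait at only ~2MB/s.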

Eran Sandler

Jun 11, 2014, 12:01:15 PM
to jerry-soc, gce-discussion

Here is the doc:

https://developers.google.com/compute/docs/disks

I can't link to a specific section, but you will find it under "persistent disk performance".

Anthony Voellm

Jun 11, 2014, 1:11:19 PM
to Eran Sandler, jerry-soc, gce-discussion
If you want lower latency, please sign up for our PD on SSD Limited Preview here: https://docs.google.com/a/google.com/forms/d/1SJAg8liHkJqqbySZ6srizFH_HSrUZmuxGjKFIAGQvhw/viewform



--
Anthony F. Voellm (aka Tony)
Google Voice:  (650) 516-7382

jerry-soc

Jun 11, 2014, 1:12:42 PM
to gce-dis...@googlegroups.com, er...@sandler.co.il, yao...@socmetrics.com
Yes, I have signed up, but got no response...

Jay Judkowitz

Jun 11, 2014, 2:05:17 PM
to jerry-soc, gce-dis...@googlegroups.com, er...@sandler.co.il

We sent a response about 10 minutes ago.  Did you get it?

jerry-soc

Jun 12, 2014, 4:31:55 AM
to gce-dis...@googlegroups.com, yao...@socmetrics.com, er...@sandler.co.il
Got it, thanks, trying it now.

jerry-soc

Jun 12, 2014, 10:26:40 PM
to gce-dis...@googlegroups.com, yao...@socmetrics.com, er...@sandler.co.il
Is it stable enough for production use?
