HELP! Why is IO-wait so high on my instance running PostgreSQL?


jerry-soc

Jun 11, 2014, 9:27:01 AM
to gce-dis...@googlegroups.com
I have an instance running PostgreSQL with about 100GB of data.

It is running on an n1-standard-2 (2 vCPUs, 7.5GB memory) instance, but the system is constantly hitting high IO-wait:

An iostat 1 log:

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.00         0.00         0.00          0          0
sdb             159.00      2112.00         8.00       2112          8

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.02    0.00    2.02   95.96    0.00    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.00         0.00         0.00          0          0
sdb             169.00      1360.00        40.00       1360         40

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.49    0.00    1.49   97.01    0.00    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               8.00         0.00        76.00          0         76
sdb             107.00       952.00        48.00        952         48

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          22.61    0.00    5.03   72.36    0.00    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              18.00       296.00       140.00        296        140
sdb             162.00      1944.00         0.00       1944          0


And a vmstat log:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  9  26396 118436  63788 6461500    0    0  1336    24  357  587  5  1  0 95
 0  8  26396 144088  63780 6435288    0    0  1472     0  331  471  2  2  0 96
 0  8  26396 140244  63780 6439124    0    0  3896     0  382  585  2  1  0 98
 0  8  26396 138756  63780 6440628    0    0  1416     0  242  381  1  1  0 98
 0  8  26396 137020  63780 6442340    0    0  1776     0  291  462  0  1  0 99
 0  9  26396 135408  63788 6443964    0    0  1616    40  295  454  1  1  0 98
 0  9  26396 133548  63788 6445568    0    0  1664     0  523 1203  2  3  0 95
 0  9  26396 132004  63788 6446704    0    0  1032     0  673 1157 15  5  0 80
 0  8  26396 130820  63788 6448156    0    0  1528     0  415 1199  1  1  0 99
 0  7  26396 129596  63788 6449512    0    0  1320     0  400 1177  0  0  0 100
 0  7  26396 128604  63788 6450608    0    0  1024     0  380 1151  1  1  0 99
 0  7  26396 126116  63788 6451772    0    0  1216     0  433 1130  1  3  0 96
 0  7  26396 125000  63796 6453012    0    0  1152    60  363 1097  0  2  0 99
 0  7  26396 123652  63796 6454344    0    0  1368     0  375 1163  1  0  0 99
 0  7  26396 122164  63796 6455616    0    0  1336     0  392 1119  0  2  0 98
 0  6  26396 120800  63796 6457040    0    0  1424     0  403 1160  1  1  0 99
 0  6  26396 120560  63796 6458540    0    0  1488     0  350  636  2  1  0 97
 0  6  26396 119088  63804 6459956    0    0  1448    12  243  405  0  1  0 99
 0  6  26396 145500  63796 6433172    0    0   928     0  181  315  0  0  0 100
 1  5  26396 141408  63796 6434244    0    0  1184     0  308  474  7  1  0 93
 0  5  26396 140044  63796 6435596    0    0  1264     0  211  361  0  1  0 99





Looking at iostat and vmstat, why is IO-wait so high even though the TPS is not that high?

I measured an average TPS of about 180 and an average throughput of about 10M/s, but IO-wait is always over 50%!

The system is running only PostgreSQL and a Ruby on Rails app.
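
(A quick way to confirm that the disk itself is the bottleneck, rather than CPU or memory, is the extended iostat view, which also reports per-request latency and device utilization. This is a sketch; it assumes sdb is the PostgreSQL data disk.)

# Extended per-device statistics, refreshed every second.
# In the sdb row, a high await (per-request latency in ms) together with
# %util close to 100 means the device is saturated even at a modest
# request rate, which then shows up as IO-wait on the CPUs.
iostat -x 1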



jerry-soc

Jun 11, 2014, 9:28:45 AM
to gce-dis...@googlegroups.com
Forgot to mention: the volume size is 1TB.

Eran Sandler

Jun 11, 2014, 11:37:27 AM
to jerry-soc, gce-discussion

You might be hitting the max IO for that instance type. Bigger instances have higher IOPS caps.

Can you try the same test with a bigger instance?

Eran


Anthony Voellm

Jun 11, 2014, 11:44:38 AM
to Eran Sandler, jerry-soc, gce-dis...@googlegroups.com

PD IOPS scales with the size of the disk. In this case, a 100GB disk is expected to get on the order of 30 read IOPS. There are burst tokens that make it act like a 1TB disk for a period of time, but with heavy use those expire quickly.

Think about how many transactions (i.e. IOPS) you need and choose a disk size that will meet that rate.

A bigger instance will also help, but only after you address the IOPS limit.

The PD docs have more details.
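
(As a rough sizing sketch, assuming the linear per-GB rate implied by the figure above, roughly 0.3 sustained read IOPS per GB of standard PD; the PD performance docs have the authoritative numbers.)

# Sustained read IOPS expected from a 1TB (1000GB) standard PD,
# assuming ~0.3 read IOPS per GB:
echo "1000 * 0.3" | bc -l        # 300.0

# Minimum standard PD size (in GB) needed to sustain a target of
# 200 read IOPS under the same assumption:
echo "200 / 0.3" | bc -l         # ~666.7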

jerry-soc

Jun 11, 2014, 11:53:52 AM
to gce-dis...@googlegroups.com, er...@sandler.co.il, yao...@socmetrics.com
Thanks Tony,

About IOPS: I monitor the system with iostat 1 and get an average TPS (that's just IOPS, right?) of about 200. I have a 1TB persistent disk, which I thought would be enough, but IO-wait is still high. Maybe I should upgrade to a bigger instance type.

But where do the docs talk about instance types and IOPS?
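
(On the TPS vs. IOPS point: iostat's tps column counts completed IO requests per second, so it is effectively the IOPS the disk is delivering. Dividing read throughput by tps gives the average request size; the figures below are taken from the first iostat sample in this thread.)

# Average read request size for sdb in the first sample:
# 2112 kB/s read at 159 tps.
echo "scale=2; 2112 / 159" | bc        # ~13 KB per read

Requests that small behave like random reads, so the disk hits its IOPS ceiling long before its throughput ceiling, which is consistent with ~95% IO-wait at only ~2MB/s.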

Eran Sandler

Jun 11, 2014, 12:01:15 PM
to jerry-soc, gce-discussion

Here is the doc:

https://developers.google.com/compute/docs/disks

I can't link to a specific section, but you will find it under "persistent disk performance".

Anthony Voellm

Jun 11, 2014, 1:11:19 PM
to Eran Sandler, jerry-soc, gce-discussion
If you want lower latency, please sign up for our PD on SSD Limited Preview here: https://docs.google.com/a/google.com/forms/d/1SJAg8liHkJqqbySZ6srizFH_HSrUZmuxGjKFIAGQvhw/viewform



--
Anthony F. Voellm (aka Tony)
Google Voice:  (650) 516-7382

jerry-soc

Jun 11, 2014, 1:12:42 PM
to gce-dis...@googlegroups.com, er...@sandler.co.il, yao...@socmetrics.com
Yes, I have signed up, but got no response...

Jay Judkowitz

Jun 11, 2014, 2:05:17 PM
to jerry-soc, gce-dis...@googlegroups.com, er...@sandler.co.il

We sent a response about 10 minutes ago.  Did you get it?

jerry-soc

Jun 12, 2014, 4:31:55 AM
to gce-dis...@googlegroups.com, yao...@socmetrics.com, er...@sandler.co.il
Got it, thanks, trying it now.

jerry-soc

Jun 12, 2014, 10:26:40 PM
to gce-dis...@googlegroups.com, yao...@socmetrics.com, er...@sandler.co.il
Is it stable enough for production use?
