google-fluentd stresses our cluster node


Salvador Gonzalez
Nov 23, 2016, 4:41:41 PM
to Fluentd Google Group
Some days ago we updated our GKE cluster to version 1.4.5. Everything seems to work as expected, but I'm observing strange behavior on one of the cluster nodes.

If I run *top* I can see a high I/O wait (wa) percentage:

    top - 21:32:09 up 6 days,  9:48,  1 user,  load average: 1.67, 1.66, 1.64
    Tasks: 124 total,   1 running, 123 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  2.4 us,  4.0 sy,  0.0 ni,  0.0 id, 93.6 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem:   3801000 total,  3599260 used,   201740 free,   167304 buffers
    KiB Swap:        0 total,        0 used,        0 free,  2056748 cached

      PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
    20296 root      20   0  723m 184m    0 S   2.0  5.0  28:20.27 google-fluentd


If I run *iotop* I can see that google-fluentd is doing a lot of disk I/O, almost all of it reads:

    Total DISK READ:      20.96 M/s | Total DISK WRITE:       0.00 B/s

    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
    20298 be/4 root        2.89 M/s    0.00 B/s  0.00 % 49.48 % ruby /usr/sbin/google-fluentd -q
    20329 be/4 root        6.76 M/s    0.00 B/s  0.00 % 33.28 % ruby /usr/sbin/google-fluentd -q
    20331 be/4 root        3.60 M/s    0.00 B/s  0.00 % 21.26 % ruby /usr/sbin/google-fluentd -q
    20334 be/4 root        2.95 M/s    0.00 B/s  0.00 % 16.17 % ruby /usr/sbin/google-fluentd -q
    20350 be/4 root     1455.94 K/s    0.00 B/s  0.00 % 14.73 % ruby /usr/sbin/google-fluentd -q
    20335 be/4 root      908.98 K/s    0.00 B/s  0.00 %  7.88 % ruby /usr/sbin/google-fluentd -q
    20336 be/4 root     1794.35 K/s    0.00 B/s  0.00 %  7.23 % ruby /usr/sbin/google-fluentd -q
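
To confirm what those threads are actually reading, something like the following can be run on the node itself. This is only a diagnostic sketch: the PID (20296) and TID (20329) are taken from the top/iotop output above, and the commands only look at /proc, so they don't assume anything about how google-fluentd is installed:

    # Per-process I/O counters (read_bytes / write_bytes) for the google-fluentd PID seen in top
    sudo cat /proc/20296/io

    # Which files the process has open (tailed container logs, .pos files, buffer chunks)
    sudo ls -l /proc/20296/fd

    # The same counters for one of the busy threads listed by iotop
    sudo cat /proc/20296/task/20329/io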


These are the fluentd pod logs:

    2016-11-22 16:44:50 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-22 16:13:48 +0000 error_class="Faraday::ConnectionFailed" error="end of file reached" plugin_id="object:20fcccc"
    2016-11-22 16:46:46 +0000 [warn]: suppressed same stacktrace
    2016-11-22 17:15:21 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 17:15:25 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 17:14:57 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-22 16:43:52 +0000 error_class="Faraday::ConnectionFailed" error="end of file reached" plugin_id="object:20fcccc"
    2016-11-22 17:17:15 +0000 [warn]: suppressed same stacktrace
    2016-11-22 17:44:11 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 17:44:20 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 17:44:33 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 17:43:37 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-22 17:14:00 +0000 error_class="Faraday::ConnectionFailed" error="end of file reached" plugin_id="object:20fcccc"
    2016-11-22 17:44:32 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 17:45:30 +0000 [warn]: suppressed same stacktrace
    2016-11-22 18:12:34 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 18:12:58 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 18:12:20 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-22 17:42:49 +0000 error_class="Faraday::ConnectionFailed" error="end of file reached" plugin_id="object:20fcccc"
    2016-11-22 18:14:43 +0000 [warn]: suppressed same stacktrace
    2016-11-22 18:43:27 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 18:43:23 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 18:43:36 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 18:42:39 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-22 18:11:12 +0000 error_class="Faraday::ConnectionFailed" error="end of file reached" plugin_id="object:20fcccc"
    2016-11-22 18:43:45 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 18:44:45 +0000 [warn]: suppressed same stacktrace

I already restarted the fluentd pod several times.
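
For context, the excerpt above shows the Stackdriver output repeatedly failing to flush ("Faraday::ConnectionFailed", "end of file reached") and then retrying later, so buffered chunks sit on disk and get re-read on each retry attempt, which would be consistent with the read-heavy iotop numbers. As a rough sketch only, these are the fluentd v0.12 buffer/retry parameters that control that behavior; the values and the buffer_path are illustrative assumptions, not the configuration GKE actually ships:

    # Illustrative fluentd v0.12 output section; the values and paths are
    # assumptions for discussion, not the stock GKE manifest.
    <match **>
      # fluent-plugin-google-cloud output used by google-fluentd
      type google_cloud
      # file buffering: chunks are persisted on disk and re-read on every retry
      buffer_type file
      buffer_path /var/log/google-fluentd/buffers
      buffer_chunk_limit 2M
      buffer_queue_limit 64
      flush_interval 5s
      # backoff between flush retries, capped at max_retry_wait
      retry_wait 10s
      max_retry_wait 5m
      num_threads 4
    </match>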

On our Kubernetes cluster, Stackdriver Logging is enabled, as you can see here:

> Stackdriver Logging      Enabled
> Stackdriver Monitoring   Disabled
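
The same settings can also be read from the CLI via the loggingService/monitoringService fields of the cluster description; the cluster name and zone below are placeholders:

    # Placeholders: substitute your own cluster name and zone
    gcloud container clusters describe my-cluster --zone us-central1-a \
      --format='value(loggingService, monitoringService)'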


Any idea what's happening?

Mr. Fiber

Dec 5, 2016, 2:26:16 AM
to Fluentd Google Group
Does fluentd work without problems when you use v1.4.4 or earlier?
Or is fluentd newly introduced in v1.4.5?

I'm not familiar with k8s, so I want to know what the differences are between
v1.4.5 and the other versions.
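
If it helps narrow that down, one thing to compare would be the fluentd version running on a 1.4.5 node versus an older node. This is only a sketch; the kube-system namespace and the "fluentd" pod name fragment are guesses at the usual GKE logging setup:

    # On the node: version of the bundled fluentd (assuming the wrapper forwards --version)
    /usr/sbin/google-fluentd --version

    # From the Kubernetes side: find the logging pod and the image it runs
    kubectl get pods --namespace=kube-system | grep fluentd
    kubectl describe pod <fluentd-pod-name> --namespace=kube-system | grep -i image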


Masahiro

