google-fluentd stresses our cluster node


Salvador Gonzalez
Nov 23, 2016, 4:41:41 PM
to Fluentd Google Group
Some days ago we updated our GKE cluster to version 1.4.5. Everything seems to work as expected, but I'm observing strange behavior on one of the cluster nodes.

If I run *top* I can see a high I/O wait (wa) percentage:

    top - 21:32:09 up 6 days,  9:48,  1 user,  load average: 1.67, 1.66, 1.64
    Tasks: 124 total,   1 running, 123 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  2.4 us,  4.0 sy,  0.0 ni,  0.0 id, 93.6 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem:   3801000 total,  3599260 used,   201740 free,   167304 buffers
    KiB Swap:        0 total,        0 used,        0 free,  2056748 cached

      PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
    20296 root      20   0  723m 184m    0 S   2.0  5.0  28:20.27 google-fluentd


If I run *iotop* I can see that google-fluentd is doing a lot of disk I/O, almost all of it reads:

    Total DISK READ:      20.96 M/s | Total DISK WRITE:       0.00 B/s

    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
    20298 be/4 root        2.89 M/s    0.00 B/s  0.00 % 49.48 % ruby /usr/sbin/google-fluentd -q
    20329 be/4 root        6.76 M/s    0.00 B/s  0.00 % 33.28 % ruby /usr/sbin/google-fluentd -q
    20331 be/4 root        3.60 M/s    0.00 B/s  0.00 % 21.26 % ruby /usr/sbin/google-fluentd -q
    20334 be/4 root        2.95 M/s    0.00 B/s  0.00 % 16.17 % ruby /usr/sbin/google-fluentd -q
    20350 be/4 root     1455.94 K/s    0.00 B/s  0.00 % 14.73 % ruby /usr/sbin/google-fluentd -q
    20335 be/4 root      908.98 K/s    0.00 B/s  0.00 %  7.88 % ruby /usr/sbin/google-fluentd -q
    20336 be/4 root     1794.35 K/s    0.00 B/s  0.00 %  7.23 % ruby /usr/sbin/google-fluentd -q
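
To confirm what those threads are actually reading, something like the following can be run on the node itself. This is only a diagnostic sketch: the PID (20296) and TID (20329) are taken from the top/iotop output above, and the commands only look at /proc, so they don't assume anything about how google-fluentd is installed:

    # Per-process I/O counters (read_bytes / write_bytes) for the google-fluentd PID seen in top
    sudo cat /proc/20296/io

    # Which files the process has open (tailed container logs, .pos files, buffer chunks)
    sudo ls -l /proc/20296/fd

    # The same counters for one of the busy threads listed by iotop
    sudo cat /proc/20296/task/20329/io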


These are the fluentd pod logs:

    2016-11-22 16:44:50 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-22 16:13:48 +0000 error_class="Faraday::ConnectionFailed" error="end of file reached" plugin_id="object:20fcccc"
    2016-11-22 16:46:46 +0000 [warn]: suppressed same stacktrace
    2016-11-22 17:15:21 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 17:15:25 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 17:14:57 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-22 16:43:52 +0000 error_class="Faraday::ConnectionFailed" error="end of file reached" plugin_id="object:20fcccc"
    2016-11-22 17:17:15 +0000 [warn]: suppressed same stacktrace
    2016-11-22 17:44:11 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 17:44:20 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 17:44:33 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 17:43:37 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-22 17:14:00 +0000 error_class="Faraday::ConnectionFailed" error="end of file reached" plugin_id="object:20fcccc"
    2016-11-22 17:44:32 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 17:45:30 +0000 [warn]: suppressed same stacktrace
    2016-11-22 18:12:34 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 18:12:58 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 18:12:20 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-22 17:42:49 +0000 error_class="Faraday::ConnectionFailed" error="end of file reached" plugin_id="object:20fcccc"
    2016-11-22 18:14:43 +0000 [warn]: suppressed same stacktrace
    2016-11-22 18:43:27 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 18:43:23 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 18:43:36 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 18:42:39 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-11-22 18:11:12 +0000 error_class="Faraday::ConnectionFailed" error="end of file reached" plugin_id="object:20fcccc"
    2016-11-22 18:43:45 +0000 [warn]: retry succeeded. plugin_id="object:20fcccc"
    2016-11-22 18:44:45 +0000 [warn]: suppressed same stacktrace

I already restarted the fluentd pod several times.
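
For context, the excerpt above shows the Stackdriver output repeatedly failing to flush ("Faraday::ConnectionFailed", "end of file reached") and then retrying later, so buffered chunks sit on disk and get re-read on each retry attempt, which would be consistent with the read-heavy iotop numbers. As a rough sketch only, these are the fluentd v0.12 buffer/retry parameters that control that behavior; the values and the buffer_path are illustrative assumptions, not the configuration GKE actually ships:

    # Illustrative fluentd v0.12 output section; the values and paths are
    # assumptions for discussion, not the stock GKE manifest.
    <match **>
      # fluent-plugin-google-cloud output used by google-fluentd
      type google_cloud
      # file buffering: chunks are persisted on disk and re-read on every retry
      buffer_type file
      buffer_path /var/log/google-fluentd/buffers
      buffer_chunk_limit 2M
      buffer_queue_limit 64
      flush_interval 5s
      # backoff between flush retries, capped at max_retry_wait
      retry_wait 10s
      max_retry_wait 5m
      num_threads 4
    </match>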

On our Kubernetes cluster, Stackdriver Logging is enabled, as you can see here:

> Stackdriver Logging      Enabled
> Stackdriver Monitoring   Disabled
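
The same settings can also be read from the CLI via the loggingService/monitoringService fields of the cluster description; the cluster name and zone below are placeholders:

    # Placeholders: substitute your own cluster name and zone
    gcloud container clusters describe my-cluster --zone us-central1-a \
      --format='value(loggingService, monitoringService)'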


Any idea what's happening?

Mr. Fiber

Dec 5, 2016, 2:26:16 AM
to Fluentd Google Group
Does fluentd work without problems when you use v1.4.4 or earlier?
Or is fluentd newly introduced in v1.4.5?

I'm not familiar with k8s, so I want to know what the differences are between
v1.4.5 and the other versions.
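
If it helps narrow that down, one thing to compare would be the fluentd version running on a 1.4.5 node versus an older node. This is only a sketch; the kube-system namespace and the "fluentd" pod name fragment are guesses at the usual GKE logging setup:

    # On the node: version of the bundled fluentd (assuming the wrapper forwards --version)
    /usr/sbin/google-fluentd --version

    # From the Kubernetes side: find the logging pod and the image it runs
    kubectl get pods --namespace=kube-system | grep fluentd
    kubectl describe pod <fluentd-pod-name> --namespace=kube-system | grep -i image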


Masahiro

