Only some GKE workload logs appear in "Logs Explorer"


Fredrik Blom

Jul 6, 2021, 11:24:53 AM
to Google Stackdriver Discussion Forum
Hi everyone!

I have a GKE cluster (1.19.9-gke.1900) running Spring Boot apps. Under normal operation, all the STDOUT logs from the Spring Boot apps end up in "Logs Explorer".

With "kubectl logs <pod name>", all the logs show up fine. Perfect. But in the "Logs Explorer", only about 5% of the logs end up. This is pretty critical for us, as we are basically blind in our production environment.

So: does anyone know what could cause only some of our workload logs to appear in the Logs Explorer?

(We haven't done anything with Fluent Bit; it's totally managed by GCP. To my understanding, the error has to be somewhere there?)

Thanks,
Fredrik

Charles Baer

Jul 6, 2021, 1:10:55 PM
to Fredrik Blom, Nathan Beach, Rich Gowman, Google Stackdriver Discussion Forum

Hi Fredrik,
I'm a product manager in Cloud Logging. Thanks for reaching out about this issue. 

Without fully troubleshooting the issue, it's not possible to identify the specific cause. However, the managed GKE logging agent guarantees at least 100 KiB/s throughput per node, and performance can be higher depending on other node factors. I've included the relevant documentation below for reference.


"The dedicated Logging agent guarantees at least 100 KB per second log throughput per node for workload logs. If a node is underutilized, then depending on the type of log load (for example, text or structured log entries, very few containers on the node or many containers), the dedicated logging agent might provide throughput as much as 500 KB per second or more. Be aware, however, that at higher throughputs, some logs may be lost."

If your workloads on a GKE node are generating significantly more than 100 KiB/s, then it's possible that some logs are not being collected due to the log volume.
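As a rough sanity check (a sketch, not an official method; the pod name, the measurement window, and the example byte count are all placeholders you'd substitute), you can estimate a pod's recent log rate from `kubectl logs`:

```shell
# Rough per-pod log rate estimate (illustrative values).
# In a real cluster you would capture the byte count like this:
#   bytes=$(kubectl logs my-spring-boot-pod --since=60s | wc -c)
bytes=12000000   # example: 12 MB of logs emitted in the last 60 s
window=60        # seconds
rate_kb=$(( bytes / window / 1024 ))
echo "approx ${rate_kb} KB/s for this pod"
```

Summing this across all pods scheduled on a node gives a rough number to compare against the 100 KiB/s per-node guarantee.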

+Rich Gowman for comment on GKE log volume and investigating partial log collection from GKE.

Thanks,
-Charles


Rich Gowman

Jul 7, 2021, 10:50:08 PM
to Charles Baer, Fredrik Blom, Nathan Beach, Google Stackdriver Discussion Forum
(Second attempt at posting this to the group; apologies to Charles/Fredrik/Nathan who are probably seeing this for the second time.)

Hi Fredrik,

I'd guess that Charles is correct; it's probably a log volume issue. If you're generating less than 100 KB/s per node, or are simply unsure how much you're generating, feel free to send me your GCP project number and cluster name (via a private email), and I can take a look at some of our internal metrics to see if I can figure out what's going on.

If you're generating more than 100 KB/s, then there are a few workarounds:
1. Generate fewer logs.
2. Leave the node in question partially idle. This will allow Fluent Bit to pick up extra CPU cycles and process more logs.
3. Run your own instance of Fluent Bit with a higher resource allocation.

We realize none of these workarounds is ideal...  :(

The underlying root cause of the 100 KB/s limitation is that we give only a small resource allocation to Fluent Bit, so as to leave more resources available for your workloads.
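For workaround 3, the relevant knob is the Fluent Bit container's resource allocation. A minimal sketch of that section of a self-managed DaemonSet spec might look like this (the container name, image tag, and all request/limit values are illustrative assumptions, not GKE's managed settings):

```yaml
# Fragment of a self-managed Fluent Bit DaemonSet pod spec (illustrative values).
containers:
  - name: fluent-bit
    image: fluent/fluent-bit:1.8   # pin whichever version you've validated
    resources:
      requests:
        cpu: 200m
        memory: 200Mi
      limits:
        cpu: 1        # more CPU headroom than a minimal allocation
        memory: 512Mi
```

The trade-off is the one Rich describes: whatever you grant the logging agent is no longer available to your workloads on that node.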

Thanks,

Rich.