(Second attempt at posting this to the group; apologies to Charles/Fredrik/Nathan who are probably seeing this for the second time.)
Hi Fredrik,
I'd guess that Charles is correct; it's probably a log volume issue. If you're generating less than 100kb/s per node, or are simply unsure how much you're generating, feel free to send me your GCP project number and cluster name (via private email), and I can look at some of our internal metrics to see whether I can figure out what's going on.
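If you'd like a rough number of your own first, here is a minimal sketch that samples the total size of a node's container log files twice and reports the average growth rate. It rests on several assumptions that are mine, not anything confirmed in this thread: that the logs live under /var/log/containers (the usual Kubernetes location), that "100kb/s" means roughly 100,000 bytes per second, and that log rotation during the sampling window is rare (rotation makes the figure read low). The function names are purely illustrative.

```python
import os
import time

# Assumption: "100kb/s" is read here as ~100,000 bytes per second.
LIMIT_BYTES_PER_SEC = 100 * 1000


def total_log_bytes(log_dir):
    """Sum the sizes of all regular files directly under log_dir."""
    total = 0
    for entry in os.scandir(log_dir):
        # is_file() and stat() follow symlinks by default, which matters
        # because container log entries are usually symlinks.
        if entry.is_file():
            total += entry.stat().st_size
    return total


def bytes_per_second(before, after, elapsed_seconds):
    """Average growth rate between two size samples.

    A rotated log can make the total shrink, so negative growth is
    clamped to zero instead of being reported as a negative rate.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return max(after - before, 0) / elapsed_seconds


def measure(log_dir, interval_seconds=60.0):
    """Sample log_dir twice, interval_seconds apart, and return bytes/s."""
    start = time.monotonic()
    before = total_log_bytes(log_dir)
    time.sleep(interval_seconds)
    after = total_log_bytes(log_dir)
    return bytes_per_second(before, after, time.monotonic() - start)


if __name__ == "__main__":
    # Assumption: default Kubernetes container log directory on the node.
    rate = measure("/var/log/containers")
    status = "above" if rate > LIMIT_BYTES_PER_SEC else "at or below"
    print(f"~{rate:.0f} bytes/s, {status} the assumed 100kb/s threshold")
```

This has to run on the node itself (or in a pod with the host's log directory mounted), and a single 60-second sample can easily miss bursts, so treat the result as a ballpark figure only.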
If you're generating more than 100kb/s, then there are a few workarounds:
1. Generate fewer logs.
2. Leave the node in question partially idle. This will allow fluentbit to pick up extra CPU cycles and process more logs.
3. Run your own instance of fluentbit with a higher resource allocation.
We realize that none of these workarounds is ideal... :(
The root cause of the 100kb/s limitation is that we give fluentbit only a small resource allocation, so as to leave more resources available for your workloads.
Thanks,
Rich.