Hello,
We are using the Ops Agent (https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent) on our instances to monitor log files on disk and push them to Google Cloud Logging (Stackdriver). The log files to monitor are configured in /etc/google-cloud-ops-agent/config.yaml, where each log file has its own receiver and its own pipeline. This has worked well so far.
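To make the layout concrete, here is a hypothetical Python sketch (not our actual tooling) that writes config.yaml text with one `files` receiver and one pipeline per log file. It follows the Ops Agent logging config structure (`logging.receivers.<id>` with `type: files` and `include_paths`, and `logging.service.pipelines.<id>.receivers`). The receiver ids and file paths are made up for illustration, and the YAML is emitted by hand so that only the standard library is needed.

```python
"""Hypothetical sketch: build an Ops Agent logging config with one
`files` receiver and one pipeline per log file (our current layout)."""


def build_config(log_files: dict[str, str]) -> str:
    """Return config.yaml text. `log_files` maps a receiver id to a file path."""
    lines = ["logging:", "  receivers:"]
    for name, path in log_files.items():
        # One `files` receiver per log file.
        lines += [
            f"    {name}:",
            "      type: files",
            "      include_paths:",
            f"      - {path}",
        ]
    lines += ["  service:", "    pipelines:"]
    for name in log_files:
        # One pipeline per receiver, named after it.
        lines += [
            f"      {name}_pipeline:",
            f"        receivers: [{name}]",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical receiver ids and paths, for illustration only.
    files = {
        "app_log": "/var/log/myapp/app.log",
        "worker_log": "/var/log/myapp/worker.log",
    }
    print(build_config(files), end="")
```

With the new file list, this layout produces roughly 90 receivers and 90 pipelines instead of roughly 60 of each.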
Over time we have added more files. Since yesterday, after we added some further log files, the Ops Agent no longer seems to start. As a result, no logs or metrics are being pushed anymore. If we revert to the older config.yaml with fewer files, the Ops Agent starts normally again. The old (working) config file monitors roughly 60 log files; the new (non-working) one monitors roughly 90.
We have checked the newly added log files for permission problems, unusually large sizes and similar issues, but could not find a likely cause. When the issue is present, the status of the Ops Agent service looks like this:
● google-cloud-ops-agent.service - Google Cloud Ops Agent
Loaded: loaded (/usr/lib/systemd/system/google-cloud-ops-agent.service; enabled; vendor preset: enabled)
Active: active (exited) since Wed 2022-04-20 12:49:07 UTC; 20h ago
Process: 28604 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
Process: 28563 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -in /etc/google-cloud-ops-agent/config.yaml (code=exited, status=0/SUCCESS)
Main PID: 28604 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/google-cloud-ops-agent.service
When the issue occurs, there are no startup log entries from the Ops Agent in /var/log/google-cloud-ops-agent/subagents/logging-module.log. Normally the startup is logged there, but in our case, after a restart, the last entries we see are from the Ops Agent shutting down, and then nothing. So it does not appear to start correctly, yet there is no error message.
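For reference, the per-file checks mentioned earlier (existence, permissions, size) can be repeated for every configured path with a small script. This is a hypothetical helper, not part of the Ops Agent: the paths are passed on the command line rather than parsed from config.yaml, and the 1 GiB "large" threshold is an arbitrary assumption. Because `os.access` checks the current user, it should be run as the same user the agent runs under.

```python
import os
import stat
import sys


def check_log_file(path: str, max_bytes: int = 1 << 30) -> list[str]:
    """Return a list of problems for one log file (empty list if none).

    `max_bytes` (default 1 GiB) is an arbitrary, assumed threshold.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return ["missing"]
    except PermissionError:
        return ["cannot stat (permission denied)"]
    problems = []
    if not stat.S_ISREG(st.st_mode):
        problems.append("not a regular file")
    # Checks readability for the user running this script.
    if not os.access(path, os.R_OK):
        problems.append("not readable by current user")
    if st.st_size > max_bytes:
        problems.append(f"large: {st.st_size} bytes")
    return problems


if __name__ == "__main__":
    for p in sys.argv[1:]:
        issues = check_log_file(p)
        print(p, "OK" if not issues else "; ".join(issues))
```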
What could prevent the Ops Agent from starting without producing any logs or errors? Is there a limit, for example on the number of files that can be monitored?
Best regards
Frank Shimizu