Proposal: Add retries for repo downloads

Per Olofsson

5:26 AM
to munk...@googlegroups.com
When munki needs to fetch a resource from the repo, any network interruption causes the current run to fail. This is normally not an issue, as munki will try again an hour later, or the user can check again in MSC. It is, however, a missed opportunity, and it causes a fair amount of noise in the logs. We're seeing a steady stream of errors in Grafana and MunkiReport: not a huge issue, but it raises the noise floor.

I've experimented a bit with implementing retries in network/middleware/fetch.swift:getHTTPfileIfChangedAtomically(). The change adds two new pref keys:

DownloadRetries: int between 0 and 10. Defaults to 0, which means no retries and thus keeps the old behavior.
RetrySleepSeconds: int between 1 and 30. Defaults to 10 seconds.
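
To make the intended semantics of the two keys concrete, here is a rough, synchronous sketch of a retry loop. It is not the actual patch: fetchWithRetries, clamped, and the sleeper parameter are hypothetical names for illustration, and the fetch closure stands in for the real download call. Only the key names, ranges, and defaults come from the description above.

```swift
import Foundation

// Hypothetical helper: force a preference value into its allowed range.
func clamped(_ value: Int, to range: ClosedRange<Int>) -> Int {
    return min(max(value, range.lowerBound), range.upperBound)
}

// Hypothetical retry wrapper, not munki's real code.
// `retries` mirrors DownloadRetries (0...10, where 0 means a single attempt).
// `sleepSeconds` mirrors RetrySleepSeconds (1...30).
// `sleeper` is injectable so the wait can be replaced in tests.
func fetchWithRetries<T>(
    retries: Int,
    sleepSeconds: Int,
    sleeper: (Int) -> Void = { Thread.sleep(forTimeInterval: TimeInterval($0)) },
    fetch: () throws -> T
) throws -> T {
    let maxRetries = clamped(retries, to: 0...10)
    let delay = clamped(sleepSeconds, to: 1...30)
    var retriesUsed = 0
    while true {
        do {
            return try fetch()
        } catch {
            // Out of retries: surface the last error, as the old behavior did.
            if retriesUsed >= maxRetries {
                throw error
            }
            retriesUsed += 1
            sleeper(delay)
        }
    }
}
```

With retries set to 0 the closure runs exactly once and any error propagates immediately, which matches the current behavior. Note that this sketch retries on every error; a real implementation would probably want to retry only on network interruptions and not on, say, a missing file.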


Are you also seeing these issues and is this something you would find useful?

--
Per Olofsson, IT-service, University of Gothenburg

Allister Banks

7:49 AM
to munk...@googlegroups.com
This sounds right up our alley. I can gather more rigorous metrics, such as percentages, but roughly speaking the failures that stand out are with downloading manifests. Our bootstrap (which starts with --download-only) can get hung up and miss parts of the relatively minimal set we configure and iterate on when testing new versions of it. For the bootstrap we’ve recently built in retries and an approximation of a trivial backoff to increase the chances of success. But considering we’re using AWS to CDN’ify our pkgs via CloudFront, and manifests have been in ElastiCache (reducing the likelihood that the cause is source/server-side), we’d appreciate it if the client/agent side could gain this functionality.
Allister