*Problem statement*
Bulk pluginsync operations can unexpectedly fail due to transient issues outside of Puppet's control, such as network failures. This can result in messages such as:
{code:java}
2021-02-04T10:51:02.967+13:00 E: 'curl --tlsv1 -k -o /tmp/tmp.LLHDefu2lQ -L --write-out %{http_code} -s https://pe.example.com:8140/packages/bulk_pluginsync.tar.gz ' failed (exit_code: 18)
{code}
Where curl exit code 18 is:
{quote}CURLE_PARTIAL_FILE (18)
A file transfer was shorter or larger than expected. This happens when the server first reports an expected transfer size, and then delivers data that doesn't match the previously given size.
{quote}
*Steps to reproduce*
During bulk pluginsync's curl transfer, externally halt networking (disconnect or sever network traffic to the recipient).
OR
During bulk pluginsync's curl transfer, fill local storage to capacity on the recipient.
*Expected behavior*
The curl transfer is retried at least once, to automatically rule out transient disruptions.
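The retry behavior described above could look roughly like the following. This is a minimal sketch only, not the actual installer code: the {{retry}} helper, its argument order, and the attempt count and delay shown in the usage comment are all assumptions made for illustration. curl's own {{--retry}} option may be an alternative, although whether it retries a partial transfer depends on the curl version and options in use.

```shell
#!/bin/sh
# Hypothetical helper (not part of Puppet): run a command up to
# max_attempts times, sleeping delay seconds between failed attempts.
# Returns 0 on the first success, otherwise the last exit code seen.
# Note: POSIX sh has no local variables, so these names are global.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  rc=1
  while [ "$attempt" -le "$max_attempts" ]; do
    "$@"
    rc=$?
    if [ "$rc" -eq 0 ]; then
      return 0
    fi
    echo "attempt $attempt/$max_attempts failed (exit_code: $rc)" >&2
    # Only pause when another attempt will follow.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return "$rc"
}

# Example usage (assumed values: 3 attempts, 5 seconds apart):
#   retry 3 5 curl --tlsv1 -k -o "$tmpfile" -L -s \
#     https://pe.example.com:8140/packages/bulk_pluginsync.tar.gz
```

Because the helper returns the last exit code, a transfer that fails with exit code 18 on every attempt is still reported as such, which keeps the original error visible to the caller.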
*Observed behavior*
The curl transfer fails, causing installation to fail.
*User impact*
Many Puppet nodes are automatically provisioned, and destroyed on failure. If bulk pluginsync fails during provisioning, it cannot be retried manually. Automated node provisioning workflows can therefore fail without recourse because of an unhandled transient issue external to Puppet, such as network instability, especially when the nodes are provisioned in cloud providers whose networks the user cannot manage.
Retrying the failure can also help determine whether the issue is related to networking at all, since CURLE_PARTIAL_FILE can also occur when local storage is full or when the response is larger than expected. In both of those cases the transfer should fail consistently on retry, which rules out transient network effects. |
|