Jira (PUP-10887) Retry bulk pluginsync curl operations if they fail with CURLE_PARTIAL_FILE


Garrett Guillotte (Jira)

Feb 4, 2021, 5:34:03 PM
to puppe...@googlegroups.com
Garrett Guillotte moved an issue
 
Puppet / Bug PUP-10887
Retry bulk pluginsync curl operations if they fail with CURLE_PARTIAL_FILE
Change By: Garrett Guillotte
Key: PE-31089 → PUP-10887
Project: Puppet Enterprise [Internal]
 
This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)

Garrett Guillotte (Jira)

Feb 4, 2021, 5:34:04 PM
to puppe...@googlegroups.com
Garrett Guillotte updated an issue
 
Puppet / Improvement PUP-10887
Change By: Garrett Guillotte
Method Found: Customer Feedback
Issue Type: Bug → Improvement

zendesk.jira (Jira)

Feb 4, 2021, 5:35:04 PM
to puppe...@googlegroups.com
zendesk.jira updated an issue
Change By: zendesk.jira
Zendesk Ticket Count: 1
Zendesk Ticket IDs: 43034

Garrett Guillotte (Jira)

Feb 4, 2021, 5:36:02 PM
to puppe...@googlegroups.com
Garrett Guillotte updated an issue
Change By: Garrett Guillotte
*Problem statement*

Bulk pluginsync operations can unexpectedly fail due to transient issues outside of Puppet's control, such as network failures. This can result in messages such as:
{code:java}
2021-02-04T10:51:02.967+13:00 E: 'curl --tlsv1 -k -o /tmp/tmp.LLHDefu2lQ -L --write-out %{http_code} -s https://pe.example.com:8140/packages/bulk_pluginsync.tar.gz ' failed (exit_code: 18)
{code}
Where curl exit code 18 is:
{quote}CURLE_PARTIAL_FILE (18)

A file transfer was shorter or larger than expected. This happens when the server first reports an expected transfer size, and then delivers data that doesn't match the previously given size.
{quote}
*Steps to reproduce*

During bulk pluginsync's curl transfer, externally halt/disconnect/sever network traffic to the recipient.

OR

During bulk pluginsync's curl transfer, fill local storage to capacity on the recipient.

*Expected behavior*

The curl transfer is retried at least once, to automatically rule out transient disruptions.

*Observed behavior*

The curl transfer fails, causing installation to fail.

*User impact*

Many Puppet nodes are automatically provisioned and destroyed on failure. If bulk pluginsync fails during provisioning, it cannot be retried manually. An unhandled transient issue external to Puppet, such as network instability, can therefore cause automated node provisioning workflows to fail without recourse, especially if the nodes are provisioned in cloud providers whose networks the user cannot manage.

Retrying the failed transfer can also help confirm whether the issue is unrelated to networking, since CURLE_PARTIAL_FILE can also occur when local storage is full or when the response is larger than expected. In both cases, the transfer should fail consistently on retry, which rules out transient network effects.

Garrett Guillotte (Jira)

Feb 4, 2021, 5:37:03 PM
to puppe...@googlegroups.com
Garrett Guillotte updated an issue
*Problem statement*

Bulk pluginsync operations can unexpectedly fail due to transient issues outside of Puppet's control, such as network failures. This can result in messages such as:
{code:java}2021-02-04T10:51:02.967+13:00 E: 'curl --tlsv1 -k -o /tmp/tmp.LLHDefu2lQ -L --write-out %{http_code} -s https://pe.example.com:8140/packages/bulk_pluginsync.tar.gz ' failed (exit_code: 18) {code}
Where curl exit code 18 is:
{quote}CURLE_PARTIAL_FILE (18)

A file transfer was shorter or larger than expected. This happens when the server first reports an expected transfer size, and then delivers data that doesn't match the previously given size.
{quote}
*Steps to reproduce*

During bulk pluginsync's curl transfer, externally halt/disconnect/sever network traffic to the recipient.


OR

During bulk pluginsync's curl transfer, fill local storage to capacity on the recipient.

*Expected behavior*

The curl transfer is retried at least once, to automatically rule out transient disruptions.

*Observed behavior*

The curl transfer fails, causing bulk pluginsync to fail.


*User impact*

Many Puppet nodes are automatically provisioned and destroyed on failure. If bulk pluginsync fails during provisioning, it cannot be retried manually. An unhandled transient issue external to Puppet, such as network instability, can therefore cause automated node provisioning workflows to fail without recourse, especially if the nodes are provisioned in cloud providers whose networks the user cannot manage.

Retrying the failed transfer can also help confirm whether the issue is unrelated to networking, since CURLE_PARTIAL_FILE can also occur when local storage is full or when the response is larger than expected. In both cases, the transfer should fail consistently on retry, which rules out transient network effects.
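
For illustration only, the expected behavior described above could be sketched as a small POSIX shell wrapper that re-runs a command only when it exits with curl's CURLE_PARTIAL_FILE (18). This is not the installer's actual code: the function name retry_on_partial_file and the MAX_ATTEMPTS and RETRY_DELAY variables are hypothetical, and the defaults are arbitrary.

```shell
#!/bin/sh
# Sketch under stated assumptions: retry a command only when it exits with
# curl's CURLE_PARTIAL_FILE (18). All names here are hypothetical.
CURLE_PARTIAL_FILE=18

retry_on_partial_file() {
  max_attempts="${MAX_ATTEMPTS:-3}"   # assumed default: 3 attempts in total
  delay="${RETRY_DELAY:-5}"           # assumed default: 5 seconds between attempts
  attempt=1
  while :; do
    rc=0
    "$@" || rc=$?
    # Success, or any failure other than a partial transfer: pass it through.
    if [ "$rc" -ne "$CURLE_PARTIAL_FILE" ]; then
      return "$rc"
    fi
    # Repeated partial transfers suggest a persistent cause, not a transient one.
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "transfer still partial after $attempt attempts" >&2
      return "$rc"
    fi
    echo "partial transfer (exit code $rc), retrying ($attempt/$max_attempts)" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

The wrapper would be placed in front of the existing command, for example `retry_on_partial_file curl --tlsv1 -k -o "$tmpfile" -L --write-out '%{http_code}' -s https://pe.example.com:8140/packages/bulk_pluginsync.tar.gz`, where `$tmpfile` stands in for the temporary file path. A transfer that keeps failing with exit code 18 after every attempt points to a persistent cause, such as full local storage, rather than a transient network disruption.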

Garrett Guillotte (Jira)

Feb 4, 2021, 9:16:02 PM
to puppe...@googlegroups.com
Garrett Guillotte updated an issue

Thanks, Josh Cooper! I've reset the Team field to Installer and Management.

Change By: Garrett Guillotte
Team: Night's Watch → Installer and Management