Jira (PUP-8300) File resource can't handle HTTP redirects when server does not accept HEAD requests


Mariusz Gronczewski (JIRA)

Jan 9, 2018, 3:50:03 AM
to puppe...@googlegroups.com
Mariusz Gronczewski updated an issue
 
Puppet / Bug PUP-8300
File resource can't handle HTTP redirects when server does not accept HEAD requests
Change By: Mariusz Gronczewski
If the URL is a redirect and the target server does not handle HEAD requests (as with the Amazon API in this case, which returns 403 when the post-redirect URL receives a HEAD request), Puppet fails with an unhelpful message:

{noformat}
/opt/puppetlabs/bin/puppet apply /tmp/1.pp        
Notice: Compiled catalog for hydra.devrandom.pl in environment production in 0.01 seconds
Error: Could not retrieve information from environment production source(s) https://github.com/XANi/go-dpp/releases/download/v0.0.4/dpp.aarch64
Error: /Stage[main]/Main/File[/tmp/dpp]/ensure: change from 'absent' to 'present' failed: Could not retrieve information from environment production source(s) https://github.com/XANi/go-dpp/releases/download/v0.0.4/dpp.aarch64
{noformat}

A file without a redirect works fine:
{code:puppet}
File {
    ensure => present,
}

file { '/tmp/dpp':
    source         => 'https://github.com/XANi/go-dpp/releases/download/v0.0.4/dpp.aarch64',
    mode           => '644',
    checksum       => 'sha256',
    checksum_value => '6bda5ca841bf47d81283ee5b73b030c3b82e33e6817d8bb36a1e1006c1b5dd81',
    backup         => false,
}

file { '/tmp/rnd':
    source         => 'http://192.168.1.1/rnd',
    mode           => '644',
    checksum       => 'sha256',
    checksum_value => '19248907fa84d23721987abf90f9f7cbac44c547fd38ef2c0ce31e194eb4d9ed',
    backup         => false,
}
{code}

This was present in the 4.8.x versions too, so it seems to have been broken since HTTP sources were added.
 
This message was sent by Atlassian JIRA (v7.0.2#70111-sha1:88534db)

Josh Cooper (JIRA)

Jan 9, 2018, 5:01:04 PM
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-8300
 
Re: File resource can't handle HTTP redirects when server does not accept HEAD requests

Thanks [~Mariusz Gronczewski], I think this is a duplicate of PUP-6380. Can you review, and close this ticket if that's the case?

One note: Puppet makes a HEAD request to get metadata about the file, so that it doesn't download the file when it doesn't need to. If it weren't for that, Puppet could skip HEAD requests altogether.

Mariusz Gronczewski (JIRA)

Jan 10, 2018, 4:52:03 AM
to puppe...@googlegroups.com

Okay, but why does it make 3 HEAD requests then?

On my localhost server (no redirect, just nginx on localhost serving a directory), requesting a single file yields an access log like this:

{noformat}
127.0.0.1 - - [10/Jan/2018:10:42:18 +0100] "HEAD /rnd HTTP/1.1" 200 0 "-" "Ruby"
127.0.0.1 - - [10/Jan/2018:10:42:18 +0100] "HEAD /rnd HTTP/1.1" 200 0 "-" "Ruby"
127.0.0.1 - - [10/Jan/2018:10:42:18 +0100] "HEAD /rnd HTTP/1.1" 200 0 "-" "Ruby"
127.0.0.1 - - [10/Jan/2018:10:42:18 +0100] "GET /rnd HTTP/1.1" 200 1024 "-" "Ruby"
{noformat}

Then it downloads the file and saves it to the destination... *without checking the checksum*. On the next run, it checks the checksum of the file on disk and re-downloads it:

{noformat}
Notice: /Stage[main]/Main/File[/tmp/rnd]/checksum_value: checksum_value changed '34bcf0e1286630afc465ac5f6219cabc97f63567e9f206442d8314a451e406a3' to '19248907fa84d23721987abf90f9f7cbac44c547fd38ef2c0ce31e194eb4d9ed'
Notice: Applied catalog in 0.12 seconds
{noformat}

and that is repeated in every run.

So currently the checksum functionality is utterly broken with HTTP sources (do you want me to file a separate ticket about it?), since a "bad" file on the server is still downloaded to the client.

As for this ticket: yes, it's a dupe.

Josh Cooper (JIRA)

Jan 19, 2018, 6:34:02 PM
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-8300

The 4 requests are due to a bug. We first make a HEAD request to get the Content-MD5 or Last-Modified "checksums". Note the server must return one of these headers for Puppet not to download the file every time it runs. Puppet also doesn't support ETag.

For reasons unknown, the code made 2 HEAD requests. I eliminated the second request in https://github.com/puppetlabs/puppet/commit/deac457fe305b487614eb280fe9e67d518d13d97.

If the destination file is not "insync" (it's missing, the "checksums" don't match, or the server didn't send a checksum), then we try to download the file contents. We make another HEAD request, apparently to see whether we will be redirected, and then issue the GET. I don't believe this HEAD request is necessary, and it should probably be eliminated. It seems more correct to just issue the GET request and follow the redirect(s), if any.
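The "just issue the GET and follow redirects" approach can be sketched as follows. This is an illustration, not Puppet's actual code: `fetch` is a hypothetical stand-in for the real HTTP call, taking a URL and returning a `[status, location_or_body]` pair.

```ruby
require 'uri'

REDIRECT_LIMIT = 10

# Follow Location headers from GET responses directly, instead of
# probing with a separate pre-flight HEAD request first.
def resolve(url, fetch, limit = REDIRECT_LIMIT)
  raise 'too many redirects' if limit.zero?
  status, payload = fetch.call(url)
  case status
  when 301, 302, 303, 307, 308
    # Location may be relative; resolve it against the current URL.
    resolve(URI.join(url, payload).to_s, fetch, limit - 1)
  else
    [status, url, payload]
  end
end
```

A limit on the number of hops guards against redirect loops; the final URL and body come back from a single GET chain, so servers that reject HEAD (like the S3 presigned URLs in this ticket) would still work.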

Whenever Puppet downloads a file, it should write to a tempfile and verify the downloaded content matches the expected checksum. Note that if the HTTP server doesn't send the Content-MD5 header, we have to fall back to Last-Modified, which won't ensure the downloaded file's checksum is correct.
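The tempfile-then-verify flow could look roughly like this. Again a sketch, not Puppet's internals: `content` stands in for the HTTP response body, and the method name is illustrative.

```ruby
require 'digest'
require 'tempfile'
require 'fileutils'

# Write downloaded content to a tempfile, verify its checksum, and
# only move it into place if the checksum matches.
def safe_install(content, dest, expected_sha256)
  tmp = Tempfile.new('download')
  begin
    tmp.binmode
    tmp.write(content)
    tmp.close
    actual = Digest::SHA256.file(tmp.path).hexdigest
    unless actual == expected_sha256
      raise "checksum mismatch: expected #{expected_sha256}, got #{actual}"
    end
    FileUtils.mv(tmp.path, dest)
  ensure
    tmp.close! if File.exist?(tmp.path) # unlink the tempfile on failure
  end
end
```

The key property is that a mismatching download never reaches the destination path, so a "bad" file on the server can't replace a good local copy.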

So for this ticket it sounds like there are two things:

  • Eliminate the one extra HEAD request
  • Document that the server must reply with Content-MD5 or Last-Modified in the HEAD response, so that the agent doesn't download the file on every run.
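The metadata comparison that the HEAD request enables can be sketched as a pure function. This is illustrative, not Puppet's code; it assumes header names have been lowercased, and per RFC 1864 Content-MD5 carries the base64-encoded MD5 digest of the body.

```ruby
require 'digest'
require 'base64'
require 'time'

# Decide from HEAD-response headers whether the local copy can be
# considered current without downloading.
def insync?(headers, local_content, local_mtime)
  if (md5 = headers['content-md5'])
    Base64.strict_encode64(Digest::MD5.digest(local_content)) == md5
  elsif (lm = headers['last-modified'])
    local_mtime >= Time.httpdate(lm)
  else
    false # no usable metadata: the file is downloaded on every run
  end
end
```

The final `false` branch is exactly the documentation point above: without Content-MD5 or Last-Modified, every agent run re-downloads the file.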

And we can keep PUP-6380 open for how to handle servers like S3 that don't allow HEAD requests. Does that make sense?

Mariusz Gronczewski (JIRA)

Jan 21, 2018, 9:43:02 AM
to puppe...@googlegroups.com

Note that if you already know the checksum of the file does not match, you don't need to make any HEAD requests, because you already know you need to download a new version of the file. You only need a HEAD request for resources without a checksum.

And in that case you could probably just do a GET with an If-Modified-Since header, though that would be a problem if the server doesn't support it (if that is even a problem? I'm not sure, I'd imagine most do).
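The If-Modified-Since idea collapses the HEAD + GET pair into one conditional request. A minimal sketch, assuming a hypothetical `fetch` that takes a URL and a header hash and returns `[status, body]`:

```ruby
require 'time'

# Conditional GET: a 304 means the local copy is still current,
# a 200 carries the new body in a single round trip.
def conditional_get(url, local_mtime, fetch)
  headers = { 'If-Modified-Since' => local_mtime.httpdate }
  status, body = fetch.call(url, headers)
  status == 304 ? [:not_modified, nil] : [:modified, body]
end
```

Servers that don't implement If-Modified-Since simply answer 200 with the full body, so the fallback behavior is a plain download rather than a failure.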

{quote}
Whenever puppet downloads a file, it should write to a tempfile, and verify the downloaded content matches the expected checksum. Note if the http server doesn't send the content-md5 header, then we have to fall back to last-modified, which won't ensure the downloaded file checksum is correct.
{quote}

At that point, just fail the resource with a sensible error. The current behavior is a horrible security disaster waiting to happen. Just think about it:

  • A user changes the MD5 hash (which is currently broken) to something secure like SHA-256. Effect: an attacker can serve any file under that URL, because the SHA-256 checksum is silently ignored when the file is downloaded, and the only indication is the same resource being re-applied over and over.
  • The software on the web server changes to something that doesn't serve the Content-MD5 header. Same effect as above: the checksum is silently ignored.
  • Everything works and the MD5 hash is checked. Still vulnerable, because producing an MD5 collision is fairly trivial with current computing power.

With the current implementation, checksum + http(s) is utterly pointless.

It should do one of the following:

  • checksum the temp file before moving it into place
  • checksum while downloading; most (all?) checksum algorithms support incremental hashing, and that way will probably also be a bit faster for big files.
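The second option above, incremental checksumming, can be sketched like this. `chunks` is a hypothetical stand-in for the HTTP body streamed in pieces; the digest is fed as each chunk is written, so no second pass over the file is needed.

```ruby
require 'digest'

# Hash each chunk as it is written out, returning the final digest.
def checksum_while_writing(chunks, io)
  digest = Digest::SHA256.new
  chunks.each do |chunk|
    digest.update(chunk)
    io.write(chunk)
  end
  digest.hexdigest
end
```

Because `Digest::SHA256` (like the other `Digest` classes) accumulates state across `update` calls, the result is identical to hashing the whole file afterwards, without re-reading it from disk.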

Josh Cooper (JIRA)

Mar 15, 2018, 7:56:03 PM
to puppe...@googlegroups.com
Josh Cooper updated an issue
 
Change By: Josh Cooper
Sub-team: Coremunity

Josh Cooper (Jira)

May 19, 2020, 2:51:04 PM
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-8300
 
Re: File resource can't handle HTTP redirects when server does not accept HEAD requests

The issue with HEAD requests failing due to Amazon presigned URLs is filed as PUP-6380.

The issue with duplicate HEAD requests has been resolved now that we're using the HTTP client for all network requests.

The issue with checksums is filed as PUP-10368, and I'm going to close this as a duplicate.
