Jira (PUP-8300) File resource can't handle HTTP redirects when server does not accept HEAD requests


Mariusz Gronczewski (JIRA)

Jan 9, 2018, 3:50:03 AM
to puppe...@googlegroups.com
Mariusz Gronczewski updated an issue
 
Puppet / Bug PUP-8300
File resource can't handle HTTP redirects when server does not accept HEAD requests
Change By: Mariusz Gronczewski
If the URL is a redirect and the target server does not handle HEAD requests (as with the Amazon API in this case, which returns 403 when the post-redirect URL receives a HEAD request), Puppet fails with an unhelpful message:

{noformat}
/opt/puppetlabs/bin/puppet apply /tmp/1.pp        
Notice: Compiled catalog for hydra.devrandom.pl in environment production in 0.01 seconds
Error: Could not retrieve information from environment production source(s) https://github.com/XANi/go-dpp/releases/download/v0.0.4/dpp.aarch64
Error: /Stage[main]/Main/File[/tmp/dpp]/ensure: change from 'absent' to 'present' failed: Could not retrieve information from environment production source(s) https://github.com/XANi/go-dpp/releases/download/v0.0.4/dpp.aarch64
{noformat}

A file without a redirect works fine:
{code:puppet}
File {
    ensure => present,
}

file { '/tmp/dpp':
    source         => 'https://github.com/XANi/go-dpp/releases/download/v0.0.4/dpp.aarch64',
    mode           => '644',
    checksum       => 'sha256',
    checksum_value => '6bda5ca841bf47d81283ee5b73b030c3b82e33e6817d8bb36a1e1006c1b5dd81',
    backup         => false,
}

file { '/tmp/rnd':
    source         => 'http://192.168.1.1/rnd',
    mode           => '644',
    checksum       => 'sha256',
    checksum_value => '19248907fa84d23721987abf90f9f7cbac44c547fd38ef2c0ce31e194eb4d9ed',
    backup         => false,
}
{code}

This was present in the 4.8.x versions too, so it seems to have been broken since HTTP sources were added.
 
This message was sent by Atlassian JIRA (v7.0.2#70111-sha1:88534db)

Josh Cooper (JIRA)

Jan 9, 2018, 5:01:04 PM
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-8300
 
Re: File resource can't handle HTTP redirects when server does not accept HEAD requests

Thanks [~Mariusz Gronczewski], I think this is a duplicate of PUP-6380. Can you review, and close this ticket if that's the case?

One note: Puppet makes a HEAD request to get metadata about the file, so that it doesn't download the file when it doesn't need to. If it weren't for that, Puppet could skip HEAD requests altogether.

Mariusz Gronczewski (JIRA)

Jan 10, 2018, 4:52:03 AM
to puppe...@googlegroups.com

Okay, but why does it make 3 HEAD requests then?

On my localhost server (no redirect, just nginx on localhost serving a directory), requesting a single file yields an access log like this:

{noformat}
127.0.0.1 - - [10/Jan/2018:10:42:18 +0100] "HEAD /rnd HTTP/1.1" 200 0 "-" "Ruby"
127.0.0.1 - - [10/Jan/2018:10:42:18 +0100] "HEAD /rnd HTTP/1.1" 200 0 "-" "Ruby"
127.0.0.1 - - [10/Jan/2018:10:42:18 +0100] "HEAD /rnd HTTP/1.1" 200 0 "-" "Ruby"
127.0.0.1 - - [10/Jan/2018:10:42:18 +0100] "GET /rnd HTTP/1.1" 200 1024 "-" "Ruby"
{noformat}

Then it downloads the file and saves it to the destination... *without checking the checksum*. On the next run, it checks the checksum of the file on disk and re-downloads it:

{noformat}
Notice: /Stage[main]/Main/File[/tmp/rnd]/checksum_value: checksum_value changed '34bcf0e1286630afc465ac5f6219cabc97f63567e9f206442d8314a451e406a3' to '19248907fa84d23721987abf90f9f7cbac44c547fd38ef2c0ce31e194eb4d9ed'
Notice: Applied catalog in 0.12 seconds
{noformat}

and that is repeated in every run.

So currently the checksum functionality is utterly broken with HTTP sources (do you want me to file a separate ticket about it?), since a "bad" file on the server is still downloaded to the client.

As for this ticket: yes, it's a dupe.

Josh Cooper (JIRA)

Jan 19, 2018, 6:34:02 PM
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-8300

The 4 requests are due to a bug. We first make a HEAD request to get the Content-MD5 or Last-Modified "checksums". Note the server must return one of these headers for Puppet not to download the file every time it runs. Puppet also doesn't support ETag.

For reasons unknown, the code made 2 HEAD requests. I eliminated the second request in https://github.com/puppetlabs/puppet/commit/deac457fe305b487614eb280fe9e67d518d13d97.

If the destination file is not "insync" (it's missing, the "checksums" don't match, or the server didn't send a checksum), then we try to download the file contents. We make another HEAD request, apparently to see whether we will be redirected, and then issue the GET. I don't believe this HEAD request is necessary, and it should probably be eliminated. It seems more correct to just issue the GET request and follow the redirect(s), if any.
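The "just issue the GET and follow redirects" approach can be sketched as follows. This is an illustration, not Puppet's actual code: `fetch` is a hypothetical stand-in for the real HTTP call, taking a URL and returning a `[status, location_or_body]` pair.

```ruby
require 'uri'

REDIRECT_LIMIT = 10

# Follow Location headers from GET responses directly, instead of
# probing with a separate pre-flight HEAD request first.
def resolve(url, fetch, limit = REDIRECT_LIMIT)
  raise 'too many redirects' if limit.zero?
  status, payload = fetch.call(url)
  case status
  when 301, 302, 303, 307, 308
    # Location may be relative; resolve it against the current URL.
    resolve(URI.join(url, payload).to_s, fetch, limit - 1)
  else
    [status, url, payload]
  end
end
```

A limit on the number of hops guards against redirect loops; the final URL and body come back from a single GET chain, so servers that reject HEAD (like the S3 presigned URLs in this ticket) would still work.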

Whenever Puppet downloads a file, it should write to a tempfile and verify the downloaded content matches the expected checksum. Note that if the HTTP server doesn't send the Content-MD5 header, we have to fall back to Last-Modified, which won't ensure the downloaded file's checksum is correct.
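The tempfile-then-verify flow could look roughly like this. Again a sketch, not Puppet's internals: `content` stands in for the HTTP response body, and the method name is illustrative.

```ruby
require 'digest'
require 'tempfile'
require 'fileutils'

# Write downloaded content to a tempfile, verify its checksum, and
# only move it into place if the checksum matches.
def safe_install(content, dest, expected_sha256)
  tmp = Tempfile.new('download')
  begin
    tmp.binmode
    tmp.write(content)
    tmp.close
    actual = Digest::SHA256.file(tmp.path).hexdigest
    unless actual == expected_sha256
      raise "checksum mismatch: expected #{expected_sha256}, got #{actual}"
    end
    FileUtils.mv(tmp.path, dest)
  ensure
    tmp.close! if File.exist?(tmp.path) # unlink the tempfile on failure
  end
end
```

The key property is that a mismatching download never reaches the destination path, so a "bad" file on the server can't replace a good local copy.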

So for this ticket it sounds like there are two things:

  • Eliminate the one extra HEAD request
  • Document that the server must reply with Content-MD5 or Last-Modified in the HEAD response, so that the agent doesn't download the file on every run.
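The metadata comparison that the HEAD request enables can be sketched as a pure function. This is illustrative, not Puppet's code; it assumes header names have been lowercased, and per RFC 1864 Content-MD5 carries the base64-encoded MD5 digest of the body.

```ruby
require 'digest'
require 'base64'
require 'time'

# Decide from HEAD-response headers whether the local copy can be
# considered current without downloading.
def insync?(headers, local_content, local_mtime)
  if (md5 = headers['content-md5'])
    Base64.strict_encode64(Digest::MD5.digest(local_content)) == md5
  elsif (lm = headers['last-modified'])
    local_mtime >= Time.httpdate(lm)
  else
    false # no usable metadata: the file is downloaded on every run
  end
end
```

The final `false` branch is exactly the documentation point above: without Content-MD5 or Last-Modified, every agent run re-downloads the file.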

And we can keep PUP-6380 open for how to handle servers like S3 that don't allow HEAD requests. Does that make sense?

Mariusz Gronczewski (JIRA)

Jan 21, 2018, 9:43:02 AM
to puppe...@googlegroups.com

Note that if you already know the checksum of the file does not match, you don't need to make any HEAD requests, because you already know you need to download a new version of the file. You only need a HEAD request for resources without a checksum.

And in that case you could probably just do a GET with an If-Modified-Since header, though that would be a problem if the server doesn't support it (if that is even a problem? I'm not sure, I'd imagine most do).
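The If-Modified-Since idea collapses the HEAD + GET pair into one conditional request. A minimal sketch, assuming a hypothetical `fetch` that takes a URL and a header hash and returns `[status, body]`:

```ruby
require 'time'

# Conditional GET: a 304 means the local copy is still current,
# a 200 carries the new body in a single round trip.
def conditional_get(url, local_mtime, fetch)
  headers = { 'If-Modified-Since' => local_mtime.httpdate }
  status, body = fetch.call(url, headers)
  status == 304 ? [:not_modified, nil] : [:modified, body]
end
```

Servers that don't implement If-Modified-Since simply answer 200 with the full body, so the fallback behavior is a plain download rather than a failure.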

{quote}
Whenever puppet downloads a file, it should write to a tempfile, and verify the downloaded content matches the expected checksum. Note if the http server doesn't send the content-md5 header, then we have to fall back to last-modified, which won't ensure the downloaded file checksum is correct.
{quote}

At that point, just fail the resource with a sensible error. The current behavior is a horrible security disaster waiting to happen. Just think about it:

  • A user changes the MD5 hash (which is currently broken) to something secure like SHA-256. Effect: an attacker can serve any file under that URL, because the SHA-256 checksum is silently ignored when the file is downloaded, and the only indication is the same resource being re-applied over and over.
  • The software on the web server changes to something that doesn't serve the Content-MD5 header. Same effect as above: the checksum is silently ignored.
  • Everything works and the MD5 hash is checked. Still vulnerable, because producing an MD5 collision is fairly trivial with current computing power.

With the current implementation, checksum + http(s) is utterly pointless.

It should do one of the following:

  • checksum the temp file before moving it into place
  • checksum while downloading; most (all?) checksum algorithms support incremental hashing, and that way will probably also be a bit faster for big files.
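The second option above, incremental checksumming, can be sketched like this. `chunks` is a hypothetical stand-in for the HTTP body streamed in pieces; the digest is fed as each chunk is written, so no second pass over the file is needed.

```ruby
require 'digest'

# Hash each chunk as it is written out, returning the final digest.
def checksum_while_writing(chunks, io)
  digest = Digest::SHA256.new
  chunks.each do |chunk|
    digest.update(chunk)
    io.write(chunk)
  end
  digest.hexdigest
end
```

Because `Digest::SHA256` (like the other `Digest` classes) accumulates state across `update` calls, the result is identical to hashing the whole file afterwards, without re-reading it from disk.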

Josh Cooper (JIRA)

Mar 15, 2018, 7:56:03 PM
to puppe...@googlegroups.com
Josh Cooper updated an issue
 
Change By: Josh Cooper
Sub-team: Coremunity

Josh Cooper (Jira)

May 19, 2020, 2:51:04 PM
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-8300
 
Re: File resource can't handle HTTP redirects when server does not accept HEAD requests

The issue with HEAD requests failing due to Amazon presigned URLs is filed as PUP-6380.

The issue with duplicate HEAD requests has been resolved now that we're using the HTTP client for all network requests.

The issue with checksums is filed as PUP-10368, and I'm going to close this as a duplicate.
