Puppet's HTTP implementation always sets the response body encoding to Encoding::BINARY. This is fine when downloading file content (application/octet-stream), but not for text/plain, where we can't assume UTF-8. For example, if you append the Mozilla CA cert bundle to puppetserver's CA bundle, the result will contain CA certs with non-US-ASCII comments like:
NetLock Arany (Class Gold) Főtanúsítvány
========================================
-----BEGIN CERTIFICATE-----
MIIEFTCCAv2gAwIBAgIGSUEs5AAQMA0GCSqGSIb3DQEBCwUAMIGnMQswCQYDVQQGEwJIVTERMA8G
A1UEBwwIQnVkYXBlc3QxFTATBgNVBAoMDE5ldExvY2sgS2Z0LjE3MDUGA1UECwwuVGFuw7pzw610
dsOhbnlraWFkw7NrIChDZXJ0aWZpY2F0aW9uIFNlcnZpY2VzKTE1MDMGA1UEAwwsTmV0TG9jayBB
cmFueSAoQ2xhc3MgR29sZCkgRsWRdGFuw7pzw610dsOhbnkwHhcNMDgxMjExMTUwODIxWhcNMjgx
MjA2MTUwODIxWjCBpzELMAkGA1UEBhMCSFUxETAPBgNVBAcMCEJ1ZGFwZXN0MRUwEwYDVQQKDAxO
ZXRMb2NrIEtmdC4xNzA1BgNVBAsMLlRhbsO6c8OtdHbDoW55a2lhZMOzayAoQ2VydGlmaWNhdGlv
biBTZXJ2aWNlcykxNTAzBgNVBAMMLE5ldExvY2sgQXJhbnkgKENsYXNzIEdvbGQpIEbFkXRhbsO6
c8OtdHbDoW55MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxCRec75LbRTDofTjl5Bu
0jBFHjzuZ9lk4BqKf8owyoPjIMHj9DrTlF8afFttvzBPhCf2nx9JvMaZCpDyD/V/Q4Q3Y1GLeqVw
/HpYzY6b7cNGbIRwXdrzAZAj/E4wqX7hJ2Pn7WQ8oLjJM2P+FpD/sLj916jAwJRDC7bVWaaeVtAk
H3B5r9s5VA1lddkVQZQBr17s9o3x/61k/iCa11zr/qYfCGSji3ZVrR47KGAuhyXoqq8fxmRGILdw
fzzeSNuWU7c5d+Qa4scWhHaXWy+7GRWF+GmF9ZmnqfI0p6m2pgP8b4Y9VHx2BJtr+UBdADTHLpl1
neWIA6pN+APSQnbAGwIDAKiLo0UwQzASBgNVHRMBAf8ECDAGAQH/AgEEMA4GA1UdDwEB/wQEAwIB
BjAdBgNVHQ4EFgQUzPpnk/C2uNClwB7zU/2MU9+D15YwDQYJKoZIhvcNAQELBQADggEBAKt/7hwW
qZw8UQCgwBEIBaeZ5m8BiFRhbvG5GK1Krf6BQCOUL/t1fC8oS2IkgYIL9WHxHG64YTjrgfpioTta
YtOUZcTh5m2C+C8lcLIhJsFyUR+MLMOEkMNaj7rP9KdlpeuY0fsFskZ1FSNqb4VjMIDw1Z4fKRzC
bLBQWV2QWzuoDTDPv31/zvGdg73JRm4gpvlhUbohL3u+pRVjodSVh/GeufOJ8z2FuLjbvrW5Kfna
NwUASZQDhETnv0Mxz3WLJdH0pmT1kvarBes96aULNmLazAZfNou2XjG4Kvte9nHfRCaexOYNkbQu
dZWAUWpLMKawYqGT8ZvYzsRjdT9ZR7E=
-----END CERTIFICATE-----

puppetserver's certificate REST API sets the charset to ISO-8859-1:
<- "GET /puppet-ca/v1/certificate/ca HTTP/1.1\r\nAccept: text/plain\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nUser-Agent: Puppet/6.0.5 Ruby/2.5.1-p57 (x86_64-darwin15)\r\nConnection: close\r\nHost: localhost:8140\r\n\r\n"
-> "HTTP/1.1 200 OK\r\n"
-> "Connection: close\r\n"
-> "Date: Thu, 01 Nov 2018 22:15:23 GMT\r\n"
-> "Content-Type: text/plain;charset=iso-8859-1\r\n"

We end up calling String#scan on the binary string, which surprisingly works, because the PEM delimiters are plain ASCII:
delimiters = /-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----/m
certs = bundle_string.scan(delimiters)

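A minimal, self-contained sketch of why the scan succeeds. The bundle below is a made-up stand-in (placeholder certificate bodies and an ISO-8859-1 comment line), labeled as binary just like the response body. Since the regexp contains only ASCII, Ruby treats it as compatible with an ASCII-8BIT string, and the non-ASCII comment bytes are simply never matched.

```ruby
# Hypothetical stand-in for the downloaded bundle: the first line holds the
# ISO-8859-1 bytes for "Tanúsítvány" (\xFA is "ú"), and the certificate
# bodies are placeholders, not real base64-encoded DER.
bundle_string = ("Tan\xFAs\xEDtv\xE1ny\n" \
  "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n" \
  "-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----\n").b

puts bundle_string.encoding # => ASCII-8BIT

# The delimiters are pure ASCII, so the regexp can be matched against binary data
delimiters = /-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----/m
certs = bundle_string.scan(delimiters)

puts certs.length # => 2
```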
But if we try to transcode it to UTF-8, it fails:
> bundle_string.encode('UTF-8')
Encoding::UndefinedConversionError Exception: "\xFA" from ASCII-8BIT to UTF-8
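The failure can be reproduced in isolation. In this sketch the byte string is a made-up example rather than a captured response: as binary, Ruby has no idea what character "\xFA" stands for, so #encode raises; once the bytes are relabeled with the charset the server declared, the conversion to UTF-8 is well defined.

```ruby
# ISO-8859-1 bytes for "Tanúsítvány", labeled as binary (as Puppet does today)
bytes = "Tan\xFAs\xEDtv\xE1ny".b

transcode_error = nil
begin
  bytes.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  # Binary bytes above 0x7F have no defined UTF-8 mapping
  transcode_error = e
end
puts transcode_error.class # => Encoding::UndefinedConversionError

# Relabel a copy with the declared charset, then transcode
utf8 = bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts utf8 # => Tanúsítvány
```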

For this to work properly, the Puppet::Network::HTTP::Compression.uncompress_body methods for both Active (gzip, zlib, identity) and None should:
1. Extract the charset parameter of the Content-Type response header (if present).
2. Map the charset to a Ruby encoding.
3. Call String#force_encoding(<encoding>) on the response body.
4. Fall back to Encoding::BINARY if there isn't a specified charset.

This is a follow-up to the work done in PUP-7251.
/cc Maggie Dreyer, Justin Stoller
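A rough sketch of those four steps, assuming the raw body bytes and the Content-Type header value are both available where the body is uncompressed. The helper name apply_charset is hypothetical and not part of Puppet's API, and the header parsing is a deliberately simple split on ';' and '=' rather than a full media-type parser. As an extra assumption beyond the ticket, a charset that Ruby doesn't recognize also falls back to binary.

```ruby
# Hypothetical helper (not an existing Puppet method) sketching the proposal.
def apply_charset(body, content_type)
  # 1. Extract the charset parameter from the Content-Type header, if present
  charset = nil
  content_type.to_s.split(';').drop(1).each do |param|
    key, value = param.split('=', 2)
    next if value.nil?
    if key.strip.casecmp('charset').zero?
      charset = value.strip.delete('"')
      break
    end
  end

  # 2. Map the charset to a Ruby encoding (Encoding.find is case-insensitive)
  # 4. Fall back to BINARY when there is no charset, or (assumption) an unknown one
  encoding =
    begin
      charset ? Encoding.find(charset) : Encoding::BINARY
    rescue ArgumentError
      Encoding::BINARY
    end

  # 3. Relabel the bytes in place; force_encoding never transcodes
  body.force_encoding(encoding)
end

body = apply_charset("Tan\xFAs\xEDtv\xE1ny".b, 'text/plain;charset=iso-8859-1')
puts body.encoding        # => ISO-8859-1
puts body.encode('UTF-8') # => Tanúsítvány
```

Because force_encoding only relabels, the body bytes are unchanged; callers that need UTF-8 still call #encode, which now succeeds because the source encoding is known. If the response object is a Net::HTTPResponse, Net::HTTPHeader#type_params already returns the Content-Type parameters as a hash and could replace the manual parsing.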