Jira (PUP-9299) Observe the charset parameter of Content-Type response header

Josh Cooper (JIRA)

Nov 1, 2018, 6:33:02 PM
to puppe...@googlegroups.com
Josh Cooper created an issue
 
Puppet / Bug PUP-9299
Observe the charset parameter of Content-Type response header
Issue Type: Bug
Assignee: Unassigned
Created: 2018/11/01 3:32 PM
Priority: Normal
Reporter: Josh Cooper

Puppet's HTTP implementation always sets the response body encoding to Encoding::BINARY. This is fine when downloading file content (application/octet-stream), but not for text/plain, since we can't assume UTF-8. For example, if you append the Mozilla CA cert bundle to puppetserver's CA bundle, it will contain CA certs with non-US-ASCII comments like:

NetLock Arany (Class Gold) Főtanúsítvány
========================================
-----BEGIN CERTIFICATE-----
MIIEFTCCAv2gAwIBAgIGSUEs5AAQMA0GCSqGSIb3DQEBCwUAMIGnMQswCQYDVQQGEwJIVTERMA8G
A1UEBwwIQnVkYXBlc3QxFTATBgNVBAoMDE5ldExvY2sgS2Z0LjE3MDUGA1UECwwuVGFuw7pzw610
dsOhbnlraWFkw7NrIChDZXJ0aWZpY2F0aW9uIFNlcnZpY2VzKTE1MDMGA1UEAwwsTmV0TG9jayBB
cmFueSAoQ2xhc3MgR29sZCkgRsWRdGFuw7pzw610dsOhbnkwHhcNMDgxMjExMTUwODIxWhcNMjgx
MjA2MTUwODIxWjCBpzELMAkGA1UEBhMCSFUxETAPBgNVBAcMCEJ1ZGFwZXN0MRUwEwYDVQQKDAxO
ZXRMb2NrIEtmdC4xNzA1BgNVBAsMLlRhbsO6c8OtdHbDoW55a2lhZMOzayAoQ2VydGlmaWNhdGlv
biBTZXJ2aWNlcykxNTAzBgNVBAMMLE5ldExvY2sgQXJhbnkgKENsYXNzIEdvbGQpIEbFkXRhbsO6
c8OtdHbDoW55MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxCRec75LbRTDofTjl5Bu
0jBFHjzuZ9lk4BqKf8owyoPjIMHj9DrTlF8afFttvzBPhCf2nx9JvMaZCpDyD/V/Q4Q3Y1GLeqVw
/HpYzY6b7cNGbIRwXdrzAZAj/E4wqX7hJ2Pn7WQ8oLjJM2P+FpD/sLj916jAwJRDC7bVWaaeVtAk
H3B5r9s5VA1lddkVQZQBr17s9o3x/61k/iCa11zr/qYfCGSji3ZVrR47KGAuhyXoqq8fxmRGILdw
fzzeSNuWU7c5d+Qa4scWhHaXWy+7GRWF+GmF9ZmnqfI0p6m2pgP8b4Y9VHx2BJtr+UBdADTHLpl1
neWIA6pN+APSQnbAGwIDAKiLo0UwQzASBgNVHRMBAf8ECDAGAQH/AgEEMA4GA1UdDwEB/wQEAwIB
BjAdBgNVHQ4EFgQUzPpnk/C2uNClwB7zU/2MU9+D15YwDQYJKoZIhvcNAQELBQADggEBAKt/7hwW
qZw8UQCgwBEIBaeZ5m8BiFRhbvG5GK1Krf6BQCOUL/t1fC8oS2IkgYIL9WHxHG64YTjrgfpioTta
YtOUZcTh5m2C+C8lcLIhJsFyUR+MLMOEkMNaj7rP9KdlpeuY0fsFskZ1FSNqb4VjMIDw1Z4fKRzC
bLBQWV2QWzuoDTDPv31/zvGdg73JRm4gpvlhUbohL3u+pRVjodSVh/GeufOJ8z2FuLjbvrW5Kfna
NwUASZQDhETnv0Mxz3WLJdH0pmT1kvarBes96aULNmLazAZfNou2XjG4Kvte9nHfRCaexOYNkbQu
dZWAUWpLMKawYqGT8ZvYzsRjdT9ZR7E=
-----END CERTIFICATE-----

puppetserver's certificate REST API will set the charset to ISO-8859-1:

<- "GET /puppet-ca/v1/certificate/ca HTTP/1.1\r\nAccept: text/plain\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nUser-Agent: Puppet/6.0.5 Ruby/2.5.1-p57 (x86_64-darwin15)\r\nConnection: close\r\nHost: localhost:8140\r\n\r\n"
-> "HTTP/1.1 200 OK\r\n"
-> "Connection: close\r\n"
-> "Date: Thu, 01 Nov 2018 22:15:23 GMT\r\n"
-> "Content-Type: text/plain;charset=iso-8859-1\r\n"

We end up calling String#scan on the binary string, which amazingly works:

delimiters = /-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----/m
certs = bundle_string.scan(delimiters)
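
The match most likely succeeds because the pattern contains only ASCII characters, and Ruby allows such a regexp to match any ASCII-compatible string, binary included. A minimal sketch, assuming a made-up two-entry bundle (the `AAAA` payload and the comment lines are placeholders, not real certificate data):

```ruby
# Hypothetical stand-in for the response body: a comment containing
# ISO-8859-1 bytes, tagged as binary the way the HTTP client leaves it.
pem = "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
bundle_string = ("Tan\xFAs\xEDtv\xE1ny\n" + pem + "Other CA\n" + pem).b

# Same pattern as in the issue: ASCII-only, so it is not tied to one encoding.
delimiters = /-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----/m
certs = bundle_string.scan(delimiters)

puts certs.size
puts certs.first.encoding
```

The extracted substrings keep the binary tag of the string they came from, so the encoding question is only deferred, not answered.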

But if we try to transcode to UTF-8, then it will fail:

 

> bundle_string.encode('UTF-8')
*** Encoding::UndefinedConversionError Exception: "\xFA" from ASCII-8BIT to UTF-8
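
The transcode fails because a binary string carries no character set, so Ruby has no mapping for bytes above 0x7F. A small sketch, using a hypothetical body rather than the real bundle, of how relabeling the bytes first lets the same call succeed:

```ruby
# Hypothetical response body: the ISO-8859-1 byte for "ú" (0xFA), tagged binary.
body = "Tan\xFAs".b

failed = false
begin
  body.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  failed = true
  puts "binary -> UTF-8 failed: #{e.message}"
end

# force_encoding only relabels the bytes; it does not change them.
# A copy is used here so the original binary string stays untouched.
latin1 = body.dup.force_encoding(Encoding::ISO_8859_1)
utf8 = latin1.encode('UTF-8')
puts utf8
```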

         

For this to work properly, the Puppet::Network::HTTP::Compression.uncompress_body methods for Active (gzip, zlib, identity), and None should:

1. Extract the charset parameter of the Content-Type response header (if present)
2. Map the charset to a Ruby encoding
3. Call String#force_encoding(<encoding>) on the response body
4. Fall back to Encoding::BINARY if no charset is specified.
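
The steps above can be sketched roughly as follows. This is an illustration only, not Puppet's code: `encoding_for` and `tag_body` are hypothetical helper names, and the regexp is a simplified reading of the Content-Type parameter syntax.

```ruby
# Hypothetical helper: choose a Ruby encoding from a Content-Type header value.
def encoding_for(content_type)
  # Step 1: extract the charset parameter, if present (quotes are optional).
  match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.match(content_type.to_s)
  return Encoding::BINARY unless match

  # Step 2: map the charset name to a Ruby encoding (lookup is case-insensitive).
  Encoding.find(match[1]) || Encoding::BINARY
rescue ArgumentError
  # Unknown charset name: treat the body as raw bytes.
  Encoding::BINARY
end

# Hypothetical helper: steps 3 and 4, relabel the body or leave it binary.
def tag_body(body, content_type)
  body.force_encoding(encoding_for(content_type))
end

body = "Tan\xFAs".b
tag_body(body, 'text/plain;charset=iso-8859-1')
puts body.encoding
puts body.encode('UTF-8')
```

Falling back to binary for an unrecognized charset is a design assumption of this sketch; the issue only specifies the fallback for a missing charset.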

This is a follow-up to the work done in PUP-7251.

/cc Maggie Dreyer, Justin Stoller


Josh Cooper (JIRA)

Nov 1, 2018, 6:34:05 PM
to puppe...@googlegroups.com
Josh Cooper updated an issue
Change By: Josh Cooper
Puppet's HTTP implementation always sets the response body encoding to {{Encoding::BINARY}}. This is fine when downloading file content ({{application/octet-stream}}), but not for {{text/plain}}, since we can't assume {{UTF-8}}. For example, if you append the [Mozilla CA cert bundle|https://curl.haxx.se/ca/cacert-2018-10-17.pem] to puppetserver's CA bundle, it will contain CA certs with non-US-ASCII comments like:

{noformat}

NetLock Arany (Class Gold) Főtanúsítvány
========================================
-----BEGIN CERTIFICATE-----
MIIEFTCCAv2gAwIBAgIGSUEs5AAQMA0GCSqGSIb3DQEBCwUAMIGnMQswCQYDVQQGEwJIVTERMA8G
A1UEBwwIQnVkYXBlc3QxFTATBgNVBAoMDE5ldExvY2sgS2Z0LjE3MDUGA1UECwwuVGFuw7pzw610
dsOhbnlraWFkw7NrIChDZXJ0aWZpY2F0aW9uIFNlcnZpY2VzKTE1MDMGA1UEAwwsTmV0TG9jayBB
cmFueSAoQ2xhc3MgR29sZCkgRsWRdGFuw7pzw610dsOhbnkwHhcNMDgxMjExMTUwODIxWhcNMjgx
MjA2MTUwODIxWjCBpzELMAkGA1UEBhMCSFUxETAPBgNVBAcMCEJ1ZGFwZXN0MRUwEwYDVQQKDAxO
ZXRMb2NrIEtmdC4xNzA1BgNVBAsMLlRhbsO6c8OtdHbDoW55a2lhZMOzayAoQ2VydGlmaWNhdGlv
biBTZXJ2aWNlcykxNTAzBgNVBAMMLE5ldExvY2sgQXJhbnkgKENsYXNzIEdvbGQpIEbFkXRhbsO6
c8OtdHbDoW55MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxCRec75LbRTDofTjl5Bu
0jBFHjzuZ9lk4BqKf8owyoPjIMHj9DrTlF8afFttvzBPhCf2nx9JvMaZCpDyD/V/Q4Q3Y1GLeqVw
/HpYzY6b7cNGbIRwXdrzAZAj/E4wqX7hJ2Pn7WQ8oLjJM2P+FpD/sLj916jAwJRDC7bVWaaeVtAk
H3B5r9s5VA1lddkVQZQBr17s9o3x/61k/iCa11zr/qYfCGSji3ZVrR47KGAuhyXoqq8fxmRGILdw
fzzeSNuWU7c5d+Qa4scWhHaXWy+7GRWF+GmF9ZmnqfI0p6m2pgP8b4Y9VHx2BJtr+UBdADTHLpl1
neWIA6pN+APSQnbAGwIDAKiLo0UwQzASBgNVHRMBAf8ECDAGAQH/AgEEMA4GA1UdDwEB/wQEAwIB
BjAdBgNVHQ4EFgQUzPpnk/C2uNClwB7zU/2MU9+D15YwDQYJKoZIhvcNAQELBQADggEBAKt/7hwW
qZw8UQCgwBEIBaeZ5m8BiFRhbvG5GK1Krf6BQCOUL/t1fC8oS2IkgYIL9WHxHG64YTjrgfpioTta
YtOUZcTh5m2C+C8lcLIhJsFyUR+MLMOEkMNaj7rP9KdlpeuY0fsFskZ1FSNqb4VjMIDw1Z4fKRzC
bLBQWV2QWzuoDTDPv31/zvGdg73JRm4gpvlhUbohL3u+pRVjodSVh/GeufOJ8z2FuLjbvrW5Kfna
NwUASZQDhETnv0Mxz3WLJdH0pmT1kvarBes96aULNmLazAZfNou2XjG4Kvte9nHfRCaexOYNkbQu
dZWAUWpLMKawYqGT8ZvYzsRjdT9ZR7E=
-----END CERTIFICATE-----
{noformat}


puppetserver's certificate REST API will set the {{charset}} to {{ISO-8859-1}}:

{noformat}

<- "GET /puppet-ca/v1/certificate/ca HTTP/1.1\r\nAccept: text/plain\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nUser-Agent: Puppet/6.0.5 Ruby/2.5.1-p57 (x86_64-darwin15)\r\nConnection: close\r\nHost: localhost:8140\r\n\r\n"
-> "HTTP/1.1 200 OK\r\n"
-> "Connection: close\r\n"
-> "Date: Thu, 01 Nov 2018 22:15:23 GMT\r\n"
-> "Content-Type: text/plain;charset=iso-8859-1\r\n"
{noformat}


We end up calling {{String#scan}} on the binary string, which amazingly works:

{code:ruby}

delimiters = /-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----/m
certs = bundle_string.scan(delimiters)
{code}


But if we try to transcode to UTF-8, then it will fail:

{code:ruby}
> bundle_string.encode('UTF-8')
*** Encoding::UndefinedConversionError Exception: "\xFA" from ASCII-8BIT to UTF-8
{code}


For this to work properly, the {{Puppet::Network::HTTP::Compression.uncompress_body}} methods for Active (gzip, zlib, identity), and None should:

# Extract the {{charset}} parameter of the {{Content-Type}} response header (if present)
# Map the {{charset}} to a Ruby encoding
# Call {{String#force_encoding(<encoding>)}} on the response body
# Fall back to {{Encoding::BINARY}} if no {{charset}} is specified.


This is a follow-up to the work done in PUP-7251.

/cc [~maggie], [~justin]

Josh Cooper (JIRA)

Nov 1, 2018, 6:54:07 PM
to puppe...@googlegroups.com

Josh Cooper (JIRA)

Jan 13, 2020, 8:13:04 PM
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-9299

I verified that the new HTTP client downloads the catalog using gzip compression and correctly deserializes a catalog containing Unicode characters:

irb(main):001:0> require 'puppet'
irb(main):002:0> Puppet.initialize_settings
irb(main):003:0> Puppet[:http_debug] = true
irb(main):004:0> client = Puppet::HTTP::Client.new
irb(main):005:0> session = client.create_session
irb(main):006:0> puppet = session.route_to(:puppet)
irb(main):007:0> catalog = puppet.get_catalog('localhost', environment: 'production', facts: Puppet::Node::Facts.indirection.find('localhost'))
...
-> "HTTP/1.1 200 OK\r\n"
-> "Date: Tue, 14 Jan 2020 01:10:39 GMT\r\n"
-> "Content-Type: application/vnd.puppet.rich+json; charset=utf-8\r\n"
-> "X-Puppet-Version: 6.11.1\r\n"
-> "Vary: Accept-Encoding, User-Agent\r\n"
-> "Content-Encoding: gzip\r\n"
-> "Content-Length: 440\r\n"
-> "\r\n"
...
irb(main):008:0> catalog.resources[4]
=> File[/tmp/cert静硤.pem]{:path=>"/tmp/cert静硤.pem", :ensure=>"file", :source=>"puppet:///modules/foo/cert.pem"}

Given that, and the controversy on the Ruby ticket, I'm going to close this.
