Henrik Lindberg mentioned:
It would be great to file another PUP ticket for the String new with / without %s - both of them should result in a UTF-8 encoded String, as that is expected of all strings in the puppet language.
But based on https://puppet.com/docs/puppet/latest/function.html#creating-a-binary, the %s format string converts the data as follows:
The data is a puppet string. The string must be valid UTF-8, or convertible to UTF-8, or an error is raised.
If given a binary ruby string, it is possible that the string ends partway through what would otherwise be a valid UTF-8 multi-byte sequence. For example, remember all the problems we had with the CET timezone name when localized to German, where the string is cut off after the first byte (0xC3) of a two-byte character:
irb(main):031:0> garbage = [77, 105, 116, 116, 101, 108, 101, 117, 114, 111, 112, 195]
=> [77, 105, 116, 116, 101, 108, 101, 117, 114, 111, 112, 195]
irb(main):032:0> str = garbage.pack("C*")
=> "Mitteleurop\xC3"
irb(main):033:0> str.force_encoding('UTF-8')
=> "Mitteleurop\xC3"
irb(main):034:0> str.valid_encoding?
=> false
irb(main):035:0> str.encode('UTF-8')
=> "Mitteleurop\xC3"
So ruby claims the encode to UTF-8 was successful, but that is only because the string's encoding label already matches the target encoding, so encode short-circuits without validating the bytes. If you actually try to perform string manipulation, it will fail mysteriously some time later:
irb(main):038:0> str =~ /c/
ArgumentError: invalid byte sequence in UTF-8
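The short-circuit can be confirmed outside irb; this is a minimal reproduction, where the round-trip through UTF-16LE is just one way of forcing Ruby to actually transcode (and therefore validate) the bytes rather than compare encoding labels:

```ruby
# A binary string cut off mid-way through a multi-byte UTF-8 sequence:
# "Mitteleurop" followed by 0xC3, the first byte of a two-byte character.
garbage = [77, 105, 116, 116, 101, 108, 101, 117, 114, 111, 112, 195]
str = garbage.pack("C*").force_encoding("UTF-8")

# encode is effectively a no-op when source and target labels match,
# so the invalid byte survives untouched.
encoded = str.encode("UTF-8")
puts encoded.valid_encoding?   # => false

# Forcing a real transcode exposes the problem immediately.
begin
  str.encode("UTF-16LE")
rescue Encoding::InvalidByteSequenceError => e
  puts e.class                 # => Encoding::InvalidByteSequenceError
end
```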
My preference would be for String.new(..) and String.new(.., '%s') to both reject binary ruby strings, even if they happen to match a valid UTF-8 encoded string. A more lenient approach would be to force-encode as UTF-8, check the result with String#valid_encoding?, and likely warn.