Henrik Lindberg mentioned:
It would be great to file another PUP ticket for the String new with / without %s - both of them should result in a UTF-8 encoded String, as that is expected of all strings in the puppet language.
But based on https://puppet.com/docs/puppet/latest/function.html#creating-a-binary, the %s format string converts the data as follows:
The data is a puppet string. The string must be valid UTF-8, or convertible to UTF-8, or an error is raised.
If given a binary ruby string, it is possible that the string ends partway through what would otherwise be a valid UTF-8 multi-byte sequence. For example, remember all the problems we had with the CET timezone name when localized to German, where the string is cut off after the first byte (0xC3) of a two-byte character:
irb(main):031:0> garbage = [77, 105, 116, 116, 101, 108, 101, 117, 114, 111, 112, 195]
=> [77, 105, 116, 116, 101, 108, 101, 117, 114, 111, 112, 195]
irb(main):032:0> str = garbage.pack("C*")
=> "Mitteleurop\xC3"
irb(main):033:0> str.force_encoding('UTF-8')
=> "Mitteleurop\xC3"
irb(main):034:0> str.valid_encoding?
=> false
irb(main):035:0> str.encode('UTF-8')
=> "Mitteleurop\xC3"
So ruby claims the encode to UTF-8 was successful, but that is only because the string's encoding label already matches the target encoding, so encode short-circuits without validating the bytes. If you actually try to perform string manipulation, it will fail mysteriously some time later:
irb(main):038:0> str =~ /c/
ArgumentError: invalid byte sequence in UTF-8
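The short-circuit can be confirmed outside irb; this is a minimal reproduction, where the round-trip through UTF-16LE is just one way of forcing Ruby to actually transcode (and therefore validate) the bytes rather than compare encoding labels:

```ruby
# A binary string cut off mid-way through a multi-byte UTF-8 sequence:
# "Mitteleurop" followed by 0xC3, the first byte of a two-byte character.
garbage = [77, 105, 116, 116, 101, 108, 101, 117, 114, 111, 112, 195]
str = garbage.pack("C*").force_encoding("UTF-8")

# encode is effectively a no-op when source and target labels match,
# so the invalid byte survives untouched.
encoded = str.encode("UTF-8")
puts encoded.valid_encoding?   # => false

# Forcing a real transcode exposes the problem immediately.
begin
  str.encode("UTF-16LE")
rescue Encoding::InvalidByteSequenceError => e
  puts e.class                 # => Encoding::InvalidByteSequenceError
end
```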
My preference would be for String.new(..) and String.new(.., '%s') to both reject binary ruby strings, even if they happen to match a valid UTF-8 encoded string. A more lenient approach would be to force-encode as UTF-8, check the result with String#valid_encoding?, and likely warn.