Jira (PUP-10098) String.new without formater does not convert ruby binary string to puppet string

9 views
Skip to first unread message

Reinhard Vicinus (JIRA)

unread,
Oct 10, 2019, 11:39:05 AM10/10/19
to puppe...@googlegroups.com
Reinhard Vicinus created an issue
 
Puppet / Bug PUP-10098
String.new without formater does not convert ruby binary string to puppet string
Issue Type: Bug Bug
Assignee: Unassigned
Created: 2019/10/10 8:38 AM
Priority: Minor Minor
Reporter: Reinhard Vicinus

Puppet Version: 6.10
Puppet Server Version: 6.10
OS Name/Version: Ubuntu 18.04

I would expect that both String.new($binary_string) and String.new($binary_string, '%s') convert the binary string to a puppet string. But the first one does not. Here is an example:

$x = { 'y' => 'test'}
# This is a really ugly hack to create ASCII-8BIT encoded data:
inline_template('<% @x["y"] = "test".force_encoding("ASCII-8BIT")  %>')
$v = $x['y']
 
$converts_string = String.new($v, '%s')
$does_not_convert_string = String.new($v)
 
notice("v type: ${inline_template('<%= @v.encoding %>')}")
notice("converts_string type: ${inline_template('<%= @converts_string.encoding %>')}")
notice("does_not_convert_string type: ${inline_template('<%= @does_not_convert_string.encoding %>')}")

Result with puppet apply:

> puppet apply test.pp 
Notice: Scope(Class[main]): v type: ASCII-8BIT
Notice: Scope(Class[main]): converts_string type: UTF-8
Notice: Scope(Class[main]): does_not_convert_string type: ASCII-8BIT
Notice: Compiled catalog for node1 in environment production in 0.13 seconds
Notice: Applied catalog in 0.18 seconds

 

Add Comment Add Comment
 
This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)
Atlassian logo

Josh Cooper (JIRA)

unread,
Oct 10, 2019, 2:25:04 PM10/10/19
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-10098
 
Re: String.new without formater does not convert ruby binary string to puppet string

Henrik Lindberg mentioned:

It would be great to file another PUP ticket for the String new with / without %s - both of them should result in an UTF-8 encoded String as that is expected of all strings in the puppet language.

But based on https://puppet.com/docs/puppet/latest/function.html#creating-a-binary, the %s format string converts as:

The data is a puppet string. The string must be valid UTF-8, or convertible to UTF-8 or an error is raised.

If given a binary ruby string, it is possible that the string matches the start of a valid UTF-8 sequence. For example remember all the problems we had with the CET timezone when localized to German:

irb(main):031:0> garbage = [77, 105, 116, 116, 101, 108, 101, 117, 114, 111, 112, 195]
=> [77, 105, 116, 116, 101, 108, 101, 117, 114, 111, 112, 195]
irb(main):032:0> str = garbage.pack("C*")
=> "Mitteleurop\xC3"
irb(main):033:0> str.force_encoding('UTF-8')
=> "Mitteleurop\xC3"
irb(main):034:0> str.valid_encoding?
=> false
irb(main):035:0> str.encode('UTF-8')
=> "Mitteleurop\xC3"

So ruby claims encode to UTF-8 was successful, but that's because the encoding label matches, and it short-circuits. If you actually try to perform string manipulation, it will fail mysteriously sometime later:

irb(main):038:0> str =~ /c/
ArgumentError: invalid byte sequence in UTF-8

My preference would be for String.new(..) and String.new(.., '%s') to both reject binary ruby strings, even if they happen to match a valid UTF-8 encoded string. A more lenient approach would be to force encode as UTF-8, but check that the resulting string is String.valid_encoding? and likely warn?

Henrik Lindberg (JIRA)

unread,
Oct 10, 2019, 5:38:02 PM10/10/19
to puppe...@googlegroups.com

Josh Cooper You referenced Binary.new and that %s rule is not really relevant here - that is for creating a Binary from a String, not creating a String. If there is something wrong with the Binary %s conversion then that is another issue.

What is going on with String vs String(x, %s) is that the function new() starts by checking if there is only one argument and that argument is an instance of the data type in question. That short circuits the call so that the type specific function is never consulted. That is slightly different than if a format is given as that makes new() relay the call.

If we remove the optimization we will lose some performance as the call is not cheap. Hard to say how much. Can probably be optimized in other ways

If the call would have been relayed to the String specific new() function- it would have ended up with a %s default which in turn calls Kernel.format (which takes place in StringConverter.string_PStringType().

Rob Braden (JIRA)

unread,
Oct 14, 2019, 1:03:04 PM10/14/19
to puppe...@googlegroups.com

Rob Braden (JIRA)

unread,
Oct 14, 2019, 1:03:04 PM10/14/19
to puppe...@googlegroups.com

Josh Cooper (JIRA)

unread,
Oct 31, 2019, 5:51:03 PM10/31/19
to puppe...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages