Jira (FACT-1902) Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8


Morgan Rhodes (Jira)

Sep 13, 2022, 4:31:02 PM
to puppe...@googlegroups.com
Morgan Rhodes updated an issue
 
Facter / Improvement FACT-1902
Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8
Change By: Morgan Rhodes
Summary: Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8 (changed from "Facter 3 should validate")
 
This message was sent by Atlassian Jira (v8.20.11#820011-sha1:0629dd8)

Morgan Rhodes (Jira)

Sep 13, 2022, 4:32:02 PM
to puppe...@googlegroups.com
Morgan Rhodes commented on Improvement FACT-1902
 
Re: Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Updated title to reflect that we should confirm this is the behavior for facter4

Josh Cooper (Jira)

Sep 14, 2022, 8:15:02 PM
to puppe...@googlegroups.com
Josh Cooper updated an issue
 
Change By: Josh Cooper
Acceptance Criteria: Verify the following behavior with Facter 4 and add unit tests if they are missing.

1. If a custom, external data (ini/json/yaml) or external executable fact emits a string whose byte sequence is not a valid UTF-8 encoding, then facter should substitute those bytes with the Unicode replacement character (U+FFFD � )
2. If a custom, external data or executable fact emits a valid UTF-8 string containing an embedded null byte, then facter should do whatever facter 3 did, for example:

{code:ruby}
Facter.add(:nullbyte) do
  setcode { "a\0b" }
end
{code}
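As a quick illustration of why the two criteria behave differently in Ruby: the built-in `String#scrub` replaces an invalid byte sequence with U+FFFD, while an embedded null byte is already valid UTF-8 and is left alone. This only demonstrates Ruby's string behavior; it is not Facter's implementation.

```ruby
# Illustrative only, not Facter's implementation.
# 0xC3 starts a two-byte UTF-8 sequence, but 0x28 ("(") is not a continuation byte.
invalid = "\xc3\x28".dup.force_encoding(Encoding::UTF_8)
invalid.valid_encoding?          # => false
fixed = invalid.scrub("\uFFFD")  # => "�("

# An embedded null byte is a valid (if surprising) UTF-8 code point,
# so validation passes and scrub leaves the string unchanged.
nullbyte = "a\0b"
nullbyte.valid_encoding?         # => true
nullbyte.scrub("\uFFFD")         # => "a\u0000b"
```

This is why criterion 2 cannot rely on encoding validation alone: a null byte has to be detected separately.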

Morgan Rhodes (Jira)

Sep 20, 2022, 4:31:03 PM
to puppe...@googlegroups.com

Alvin Rodis (Jira)

Sep 29, 2022, 10:06:02 AM
to puppe...@googlegroups.com
Alvin Rodis updated an issue
Change By: Alvin Rodis
Zendesk Ticket Count: 4 (was 3)
Zendesk Ticket IDs: 45908, 45956, 48390, 49408

David Piekny (Jira)

Oct 20, 2022, 1:29:04 PM
to puppe...@googlegroups.com

Charmaine Pritchett (Jira)

Feb 9, 2023, 10:51:01 PM
to puppe...@googlegroups.com
Charmaine Pritchett updated an issue
Change By: Charmaine Pritchett
Zendesk Ticket Count: 5 (was 4)
Zendesk Ticket IDs: 45908, 45956, 48390, 49408, 51041

Josh Cooper (Jira)

Mar 21, 2023, 10:25:01 PM
to puppe...@googlegroups.com
Josh Cooper commented on Improvement FACT-1902
 
Re: Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Facter 4:

$ bx facter --custom-dir ./custom_facts -j nullbyte
{
  "nullbyte": "a\u0000b"
}
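The `\u0000` in the JSON above is the standard JSON escape for the NUL character, so the null byte passes through intact. Ruby's bundled `json` library renders it the same way; this sketch reproduces only the serialization step, not Facter's fact resolution.

```ruby
require 'json'

# JSON requires control characters to be escaped, so NUL becomes \u0000.
json = JSON.pretty_generate({ "nullbyte" => "a\0b" })
puts json
# {
#   "nullbyte": "a\u0000b"
# }

# Parsing it back restores the original three-character string.
roundtrip = JSON.parse(json)["nullbyte"]
```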

Josh Cooper (Jira)

Mar 22, 2023, 12:52:02 PM
to puppe...@googlegroups.com

Josh Cooper (Jira)

Mar 23, 2023, 6:22:02 PM
to puppe...@googlegroups.com
Josh Cooper commented on Improvement FACT-1902
 
Re: Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Facter 3 is the same:

# rpm -qa puppet-agent
puppet-agent-6.28.0-1.el7.x86_64
# /opt/puppetlabs/puppet/bin/facter --version
3.14.24 (commit 91ed8a2de5c9d686345859fe12ea2914415758f0)
# /opt/puppetlabs/puppet/bin/facter -j --custom-dir custom nullbyte
{
  "nullbyte": "a\u0000b"
}
 

Josh Cooper (Jira)

Apr 25, 2023, 12:30:01 PM
to puppe...@googlegroups.com
Josh Cooper updated an issue
Change By: Josh Cooper
Modern versions of Puppet require that the data they serialize to JSON is proper UTF-8. Since facter collects data from different external sources, it's possible for facter data to be incorrectly encoded. Examples include:
* Unicode code points are encoded as a UTF-16LE byte sequence, but the string's "encoding" method returns UTF-8 (Windows Registry)
* String contains binary data, but "encoding" returns UTF-8 (EC2 userdata)
* String contains the start of a valid multibyte UTF-8 sequence, e.g.

When facts have an incorrect encoding (either the encoding is mislabeled and doesn't match the underlying byte sequence, or the byte sequence itself is invalid), this currently does not raise an error until the data is serialized, at which point it is far too late, and the error message is not helpful.

Instead, Facter itself should encode the fact data as UTF-8, replacing invalid byte sequences with the Unicode replacement character, and issue a warning identifying the fact key or value to aid debugging.

Josh Cooper (Jira)

Apr 25, 2023, 12:36:01 PM
to puppe...@googlegroups.com
Josh Cooper updated an issue
Modern versions of Puppet require that the data they serialize to JSON is proper UTF-8. Since facter collects data from different external sources, it's possible for facter data to be incorrectly encoded. Examples include:
* String contains a valid UTF-16LE byte sequence, but the string's "encoding" method returns UTF-8 (Windows Registry)
* String contains binary data, but "encoding" returns UTF-8 (EC2 userdata)
* String contains the start of a valid multibyte UTF-8 sequence, e.g. ("\xc3\x28")
* String contains embedded nulls. Strictly speaking the "\u0000" code point is valid and is encoded as a single null byte, but it is surprising and can't be stored in Postgres.
* String was generated by a child process based on the active code page (Windows CP1252), but the output is interpreted as UTF-8

Facter's normalization should ensure:
* All fact data contains valid UTF-8 data
* If the string data is not valid, the invalid byte sequence should be replaced with the unicode replacement character, so that it is valid
* Same for embedded null values
* A warning should be generated specifying the fact key or value with invalid data
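A minimal Ruby sketch of these normalization rules (replace invalid bytes with U+FFFD, warn with the fact name). `normalize_fact` is a hypothetical helper, not part of Facter's API. It assumes strings arrive labeled UTF-8 or binary, as in the examples, and it does not handle embedded nulls.

```ruby
require 'logger'

# Hypothetical helper, not Facter's API. Assumes every string is labeled
# UTF-8 or binary (ASCII-8BIT), so its bytes can be reinterpreted as UTF-8
# without transcoding.
def normalize_fact(name, value, logger)
  case value
  when String
    utf8 = value.dup.force_encoding(Encoding::UTF_8)
    return utf8 if utf8.valid_encoding?

    # Name the fact so the bad data can be traced back to its source.
    logger.warn("Fact '#{name}' contains invalid UTF-8; replacing invalid bytes with U+FFFD")
    utf8.scrub("\uFFFD")
  when Array
    value.map { |v| normalize_fact(name, v, logger) }
  when Hash
    value.to_h { |k, v| [normalize_fact(name, k, logger), normalize_fact(name, v, logger)] }
  else
    value # numbers, booleans and nil pass through untouched
  end
end

# Usage: a structured fact with one bad element.
logger = Logger.new($stderr)
facts = { "userdata" => { "raw" => ["ok", "\xc3\x28".b] } }
clean = facts.to_h { |name, value| [name, normalize_fact(name, value, logger)] }
p clean
```

Recursing into arrays and hashes matters because structured facts can hide a bad string several levels deep.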

Josh Cooper (Jira)

Apr 25, 2023, 12:36:02 PM
to puppe...@googlegroups.com

Aria Li (Jira)

May 31, 2023, 1:28:01 PM
to puppe...@googlegroups.com
Aria Li assigned an issue to Aria Li
Change By: Aria Li
Assignee: Aria Li

Aria Li (Jira)

May 31, 2023, 5:28:01 PM
to puppe...@googlegroups.com
Aria Li updated an issue
Change By: Aria Li
Acceptance Criteria:
Verify the following behavior with Facter 4 and add unit tests if they are missing.

1. If a custom, external data (ini/json/yaml) or external executable fact emits a string whose byte sequence is not a valid UTF-8 encoding, then facter should clearly identify the source of the fact, so it's easy to tell where the invalid data came from

Aria Li (Jira)

Jun 1, 2023, 12:03:02 PM
to puppe...@googlegroups.com

Josh Cooper (Jira)

Jun 6, 2023, 5:50:02 PM
to puppe...@googlegroups.com

Tony Vu (Jira)

Jun 7, 2023, 1:16:02 PM
to puppe...@googlegroups.com
Tony Vu updated an issue
Change By: Tony Vu
Sprint: Phoenix 2023-06-07, Phoenix 2023-06-21

Tony Vu (Jira)

Jun 7, 2023, 1:46:02 PM
to puppe...@googlegroups.com

Josh Cooper (Jira)

Jun 8, 2023, 12:28:02 PM
to puppe...@googlegroups.com
Josh Cooper updated an issue
Change By: Josh Cooper
Modern versions of Puppet require that the data they serialize to JSON is proper UTF-8. Since facter collects data from different external sources, it's possible for facter data to be incorrectly encoded. Examples include:
* String contains a valid UTF-16LE byte sequence, but the string's "encoding" method returns UTF-8 (Windows Registry)
* String contains binary data, but "encoding" returns UTF-8 (EC2 userdata)
* String contains the start of a valid multibyte UTF-8 sequence, e.g. ("\xc3\x28")
* String contains embedded nulls. Strictly speaking the "\u0000" code point is valid and is encoded as a single null byte, but it is surprising and can't be stored in Postgres.
* String was generated by a child process based on the active code page (Windows CP1252), but the output is interpreted as UTF-8
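Several of the examples above can be reproduced in plain Ruby, which also shows how reinterpreting the bytes with the right source encoding recovers the intended text. The sample strings are made up for illustration; the Registry and child-process cases are simulated, not read from real sources.

```ruby
# Illustrative only: strings labeled UTF-8 whose bytes don't hold UTF-8 text.

# "hi" as UTF-16LE bytes (as a Windows Registry read might return them), labeled UTF-8.
# NUL is a valid UTF-8 code point, so this passes validation despite being wrong.
registry = "h\x00i\x00".dup.force_encoding(Encoding::UTF_8)
registry.valid_encoding?   # => true
registry_text = registry.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8) # => "hi"

# Start of a multibyte sequence that is never completed.
truncated = "\xc3\x28".dup.force_encoding(Encoding::UTF_8)
truncated.valid_encoding?  # => false

# Child-process output in Windows CP1252 (0x93/0x94 are curly quotes) read as UTF-8.
cp1252 = "\x93hi\x94".dup.force_encoding(Encoding::UTF_8)
cp1252.valid_encoding?     # => false
cp1252_text = cp1252.dup.force_encoding(Encoding::Windows_1252).encode(Encoding::UTF_8) # => "“hi”"
```

Note that the UTF-16LE case is not caught by validation at all; only the embedded nulls hint that something is wrong.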

Facter's normalization should ensure:
* All fact data contains valid UTF-8 data
* If the string data is not valid, the invalid byte sequence should be replaced with the unicode replacement character, so that it is valid
* Same for embedded null values
* A warning should be generated specifying the fact key - or value - with invalid data

Josh Cooper (Jira)

Jun 8, 2023, 12:28:02 PM
to puppe...@googlegroups.com
Josh Cooper updated an issue
Modern versions of Puppet require that the data they serialize to JSON is proper UTF-8. Since facter collects data from different external sources, it's possible for facter data to be incorrectly encoded. Examples include:
* String contains a valid UTF-16LE byte sequence, but the string's "encoding" method returns UTF-8 (Windows Registry)
* String contains binary data, but "encoding" returns UTF-8 (EC2 userdata)
* String contains the start of a valid multibyte UTF-8 sequence, e.g. ("\xc3\x28")
* String contains embedded nulls. Strictly speaking the "\u0000" code point is valid and is encoded as a single null byte, but it is surprising and can't be stored in Postgres.
* String was generated by a child process based on the active code page (Windows CP1252), but the output is interpreted as UTF-8

Facter's normalization should ensure:
* All fact data contains valid UTF-8 data
* If the string data is not valid, the invalid byte sequence should be replaced with the unicode replacement character, so that it is valid
* Same for embedded null values (different ticket)
* A warning should be generated specifying the fact key -or value- with invalid data

Christopher Thorn (Jira)

Jun 8, 2023, 6:49:02 PM
to puppe...@googlegroups.com
Christopher Thorn updated an issue
Change By: Christopher Thorn
Fix Version/s: FACT 4.4.1
Fix Version/s: FACT 4.4.2

Josh Cooper (Jira)

Jun 12, 2023, 3:32:02 PM
to puppe...@googlegroups.com
Josh Cooper updated an issue
Change By: Josh Cooper
Modern versions of Puppet require that the data they serialize to JSON is proper UTF-8. Since facter collects data from different external sources, it's possible for facter data to be incorrectly encoded. Examples include:
* String contains a valid UTF-16LE byte sequence, but the string's "encoding" method returns UTF-8 (Windows Registry)
* String contains binary data, but "encoding" returns UTF-8 (EC2 userdata)
* String contains the start of a valid multibyte UTF-8 sequence, e.g. ("\xc3\x28")
* String contains embedded nulls. Strictly speaking the "\u0000" code point is valid and is encoded as a single null byte, but it is surprising and can't be stored in Postgres.
* String was generated by a child process based on the active code page (Windows CP1252), but the output is interpreted as UTF-8

Facter's normalization should ensure:
* All fact data contains valid UTF-8 data
* If the string data is not valid, an error should be logged stating the custom or external fact that caused the issue. The fact should be omitted from the fact collection sent to the server and the agent run should continue
* Same for embedded null values (different ticket)
* A warning should be generated specifying the fact key -or value- with invalid data
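The final rules above (log an error naming the fact, omit it, keep the run going) could look roughly like this in Ruby. `filter_facts` and `valid_utf8?` are hypothetical names for illustration, not Facter's implementation; the sketch assumes strings are labeled UTF-8 or binary.

```ruby
require 'logger'

# Hypothetical helper: true if every string inside the value is valid UTF-8.
def valid_utf8?(value)
  case value
  when String then value.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  when Array  then value.all? { |v| valid_utf8?(v) }
  when Hash   then value.all? { |k, v| valid_utf8?(k) && valid_utf8?(v) }
  else true
  end
end

# Hypothetical helper: keep valid facts, log and drop the rest, never raise,
# so the remaining facts can still be sent to the server.
def filter_facts(facts, logger)
  facts.each_with_object({}) do |(name, value), kept|
    if valid_utf8?(value)
      kept[name] = value
    else
      logger.error("Fact '#{name}' contains invalid UTF-8 and will be omitted")
    end
  end
end

# Usage
facts = { "good" => "ok", "bad" => "\xc3\x28".b }
kept = filter_facts(facts, Logger.new($stderr))
```

Dropping the whole fact, rather than scrubbing it, avoids sending partially corrupted values to the server while still letting the agent run continue.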

Josh Cooper (Jira)

Jun 13, 2023, 12:07:01 PM
to puppe...@googlegroups.com

Josh Cooper (Jira)

Jun 13, 2023, 3:10:03 PM
to puppe...@googlegroups.com