Jira (FACT-1902) Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Morgan Rhodes (Jira)

Sep 13, 2022, 4:31:02 PM

to puppe...@googlegroups.com

Morgan Rhodes updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Morgan Rhodes
Summary:	Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8 (was: Facter 3 should validate that external/custom/executable facts output proper UTF-8)

This message was sent by Atlassian Jira (v8.20.11#820011-sha1:0629dd8)

Morgan Rhodes (Jira)

Sep 13, 2022, 4:32:02 PM

to puppe...@googlegroups.com

Morgan Rhodes commented on

FACT-1902

Re: Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Updated title to reflect that we should confirm this is the behavior for Facter 4.

Josh Cooper (Jira)

Sep 14, 2022, 8:15:02 PM

to puppe...@googlegroups.com

Josh Cooper updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:

Josh Cooper

Acceptance Criteria:

Verify the following behavior with Facter 4 and add unit tests if they are missing.

1. If a custom, external data (ini/json/yaml) or external executable fact emits a string whose byte sequence is not a valid UTF-8 encoding, then facter should substitute those bytes with the Unicode replacement character (U+FFFD �)
2. If a custom, external data or executable fact emits a valid UTF-8 string containing an embedded null byte, then facter should do whatever facter 3 did, for example:

{code:ruby}
Facter.add(:nullbyte) do
  setcode { "a\0b" }
end
{code}
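
Both criteria can be tried in plain Ruby without Facter. This is a minimal sketch that assumes the substitution is done with Ruby's `String#scrub`; the ticket does not say how Facter itself implements it.

```ruby
# 0xC3 starts a two-byte UTF-8 character, but 0x28 is not a continuation byte,
# so this string is not valid UTF-8.
invalid = "\xc3\x28".dup.force_encoding(Encoding::UTF_8)
puts invalid.valid_encoding?      # false

# String#scrub replaces each invalid byte sequence with U+FFFD by default.
scrubbed = invalid.scrub
puts scrubbed.valid_encoding?     # true
puts scrubbed == "\uFFFD("        # true

# An embedded null is a valid code point, so scrub leaves it untouched.
nullbyte = "a\0b"
puts nullbyte.valid_encoding?     # true
puts nullbyte.scrub == nullbyte   # true
```

Because the null byte passes `valid_encoding?`, the second criterion needs its own check rather than falling out of UTF-8 validation.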

Morgan Rhodes (Jira)

Sep 20, 2022, 4:31:03 PM

to puppe...@googlegroups.com

Morgan Rhodes updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Morgan Rhodes
Epic Link:	PUP-11619

Alvin Rodis (Jira)

Sep 29, 2022, 10:06:02 AM

to puppe...@googlegroups.com

Alvin Rodis updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Alvin Rodis
Zendesk Ticket Count:	4 (was 3)
Zendesk Ticket IDs:	45908,45956,48390,49408

David Piekny (Jira)

Oct 20, 2022, 1:29:04 PM

to puppe...@googlegroups.com

David Piekny updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	David Piekny
Epic Link:	PUP-11658 (was PUP-11619)

Charmaine Pritchett (Jira)

Feb 9, 2023, 10:51:01 PM

to puppe...@googlegroups.com

Charmaine Pritchett updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Charmaine Pritchett
Zendesk Ticket Count:	5 (was 4)
Zendesk Ticket IDs:	45908,45956,48390,49408,51041

Josh Cooper (Jira)

Mar 21, 2023, 10:25:01 PM

to puppe...@googlegroups.com

Josh Cooper commented on

FACT-1902

Re: Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Facter 4:

$ bx facter --custom-dir ./custom_facts -j nullbyte

  "nullbyte": "a\u0000b"

Josh Cooper (Jira)

Mar 22, 2023, 12:52:02 PM

to puppe...@googlegroups.com

Josh Cooper updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Josh Cooper
Priority:	High (was Major)

Josh Cooper (Jira)

Mar 23, 2023, 6:22:02 PM

to puppe...@googlegroups.com

Josh Cooper commented on

FACT-1902

Re: Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Facter 3 is the same:

# rpm -qa puppet-agent

puppet-agent-6.28.0-1.el7.x86_64

# /opt/puppetlabs/puppet/bin/facter --version

3.14.24 (commit 91ed8a2de5c9d686345859fe12ea2914415758f0)

# /opt/puppetlabs/puppet/bin/facter -j --custom-dir custom nullbyte

  "nullbyte": "a\u0000b"
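
The `\u0000` in both outputs is ordinary JSON escaping of a control character. A small sketch with Ruby's standard `json` library shows the same round trip; it is an illustration only and does not call Facter's own formatter.

```ruby
require 'json'

# A fact value with an embedded null byte, as in the custom fact above.
fact = { 'nullbyte' => "a\0b" }

# The generator escapes the control character as \u0000.
json = JSON.generate(fact)
puts json                                  # {"nullbyte":"a\u0000b"}

# Parsing restores the original three bytes.
roundtrip = JSON.parse(json)
puts roundtrip['nullbyte'].bytes.inspect   # [97, 0, 98]
```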

Josh Cooper (Jira)

Apr 25, 2023, 12:30:01 PM

to puppe...@googlegroups.com

Josh Cooper updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Josh Cooper

Modern versions of Puppet require that the data they serialize to JSON is proper UTF-8. Since facter collects data from different external sources, it's possible for facter data to be incorrectly encoded. Examples include:
* Unicode code points are encoded as a UTF-16LE byte sequence, but the string's "encoding" method returns UTF-8 (Windows Registry)
* String contains binary data, but "encoding" returns UTF-8 (EC2 userdata)
* String contains the start of a valid multibyte UTF-8 sequence, e.g.

When facts have an incorrect encoding (either the encoding is mislabeled and doesn't match the underlying byte sequence, or the byte sequence is itself invalid), this currently does not raise an error until it is serialized, at which point it is far too late, and the error message is not helpful.

Instead, Facter itself should encode the fact data as UTF-8, replacing invalid byte sequences with the Unicode replacement character, and issue a warning for the fact key or value for debugging.

Josh Cooper (Jira)

Apr 25, 2023, 12:36:01 PM

to puppe...@googlegroups.com

Josh Cooper updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Josh Cooper

Modern versions of Puppet require that the data they serialize to JSON is proper UTF-8. Since facter collects data from different external sources, it's possible for facter data to be incorrectly encoded. Examples include:
* String contains a valid UTF-16LE byte sequence, but the string's "encoding" method returns UTF-8 (Windows Registry)
* String contains binary data, but "encoding" returns UTF-8 (EC2 userdata)
* String contains the start of a valid multibyte UTF-8 sequence, e.g. ("\xc3\x28")
* String contains embedded nulls. Strictly speaking the "\u0000" code point is valid and is encoded as a single null byte, but it is surprising and can't be stored in Postgres.
* String was generated by a child process based on the active code page (Windows CP1252), but the output is interpreted as UTF-8

Facter's normalization should ensure:
* All fact data contains valid UTF-8 data
* If the string data is not valid, the invalid byte sequence should be replaced with the Unicode replacement character, so that it is valid
* Same for embedded null values
* A warning should be generated specifying the fact key or value with invalid data
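
The cases and the replace-and-warn behavior listed above can be sketched in plain Ruby. The sample strings and the `normalize_fact` helper are hypothetical and are not Facter's implementation; the sketch also leaves embedded nulls alone, since they are valid UTF-8 and would need a separate rule.

```ruby
# Hypothetical samples, one per case in the list above. Each is labeled UTF-8
# regardless of what its bytes actually are.
samples = {
  'utf16le_mislabeled' => "\u00e9".encode(Encoding::UTF_16LE).force_encoding(Encoding::UTF_8),
  'binary_data'        => "\xff\xfe\x00\x01".dup.force_encoding(Encoding::UTF_8),
  'truncated_sequence' => "\xc3\x28".dup.force_encoding(Encoding::UTF_8),
  'embedded_null'      => "a\0b",
  'cp1252_output'      => "caf\xe9".dup.force_encoding(Encoding::UTF_8)
}

# Hypothetical helper: returns the value unchanged when it is valid, otherwise
# records a warning naming the fact and replaces invalid bytes with U+FFFD.
def normalize_fact(name, value, warnings)
  return value if value.valid_encoding?

  warnings << "Fact '#{name}' contained invalid UTF-8; invalid bytes were replaced"
  value.scrub
end

warnings = []
normalized = samples.map { |name, value| [name, normalize_fact(name, value, warnings)] }.to_h

puts normalized.values.all?(&:valid_encoding?)   # true
puts warnings.length                              # 4
puts normalized['cp1252_output'] == "caf\uFFFD"   # true
```

Only `embedded_null` passes through without a warning, which matches the ticket's observation that a null byte is valid UTF-8 even though it causes problems downstream.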

Josh Cooper (Jira)

Apr 25, 2023, 12:36:02 PM

to puppe...@googlegroups.com

Josh Cooper updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Josh Cooper
Issue Type:	Bug (was Improvement)

Aria Li (Jira)

May 31, 2023, 1:28:01 PM

to puppe...@googlegroups.com

Aria Li assigned an issue to Aria Li

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Aria Li
Assignee:	Aria Li

Aria Li (Jira)

May 31, 2023, 5:28:01 PM

to puppe...@googlegroups.com

Aria Li updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Aria Li
Acceptance Criteria:

Verify the following behavior with Facter 4 and add unit tests if they are missing.

1. If a custom, external data (ini/json/yaml) or external executable fact emits a string whose byte sequence is not a valid UTF-8 encoding, then facter should clearly identify the source of the fact so it's easy to identify where it came from

Aria Li (Jira)

Jun 1, 2023, 12:03:02 PM

to puppe...@googlegroups.com

Aria Li updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Aria Li
Sprint:	Phoenix 2023-06-07

Josh Cooper (Jira)

Jun 6, 2023, 5:50:02 PM

to puppe...@googlegroups.com

Josh Cooper updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Josh Cooper
Fix Version/s:	FACT 4.4.1

Tony Vu (Jira)

Jun 7, 2023, 1:16:02 PM

to puppe...@googlegroups.com

Tony Vu updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Tony Vu
Sprint:	Phoenix 2023-06-07, Phoenix 2023-06-21

Tony Vu (Jira)

Jun 7, 2023, 1:46:02 PM

to puppe...@googlegroups.com

Tony Vu updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Tony Vu
Story Points:	2

Josh Cooper (Jira)

Jun 8, 2023, 12:28:02 PM

to puppe...@googlegroups.com

Josh Cooper updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Josh Cooper

Modern versions of Puppet require that the data they serialize to JSON is proper UTF-8. Since facter collects data from different external sources, it's possible for facter data to be incorrectly encoded. Examples include:

* String contains a valid UTF-16LE byte sequence, but the string's "encoding" method returns UTF-8 (Windows Registry)
* String contains binary data, but "encoding" returns UTF-8 (EC2 userdata)
* String contains the start of a valid multibyte UTF-8 sequence, e.g. ("\xc3\x28")
* String contains embedded nulls. Strictly speaking the "\u0000" code point is valid and is encoded as a single null byte, but it is surprising and can't be stored in Postgres.
* String was generated by a child process based on the active code page (Windows CP1252), but the output is interpreted as UTF-8

Facter's normalization should ensure:
* All fact data contains valid UTF-8 data
* If the string data is not valid, the invalid byte sequence should be replaced with the Unicode replacement character, so that it is valid
* Same for embedded null values
* A warning should be generated specifying the fact key -or value- with invalid data

Josh Cooper (Jira)

Jun 8, 2023, 12:28:02 PM

to puppe...@googlegroups.com

Josh Cooper updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Josh Cooper

Modern versions of Puppet require that the data they serialize to JSON is proper UTF-8. Since facter collects data from different external sources, it's possible for facter data to be incorrectly encoded. Examples include:
* String contains a valid UTF-16LE byte sequence, but the string's "encoding" method returns UTF-8 (Windows Registry)
* String contains binary data, but "encoding" returns UTF-8 (EC2 userdata)
* String contains the start of a valid multibyte UTF-8 sequence, e.g. ("\xc3\x28")
* String contains embedded nulls. Strictly speaking the "\u0000" code point is valid and is encoded as a single null byte, but it is surprising and can't be stored in Postgres.
* String was generated by a child process based on the active code page (Windows CP1252), but the output is interpreted as UTF-8

Facter's normalization should ensure:
* All fact data contains valid UTF-8 data
* If the string data is not valid, the invalid byte sequence should be replaced with the Unicode replacement character, so that it is valid
* Same for embedded null values (different ticket)
* A warning should be generated specifying the fact key -or value- with invalid data

Christopher Thorn (Jira)

Jun 8, 2023, 6:49:02 PM

to puppe...@googlegroups.com

Christopher Thorn updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Christopher Thorn
Fix Version/s:	FACT 4.4.2 (was FACT 4.4.1)

Josh Cooper (Jira)

Jun 12, 2023, 3:32:02 PM

to puppe...@googlegroups.com

Josh Cooper updated an issue

Facter /

FACT-1902

Confirm Facter 4 validates that external/custom/executable facts output proper UTF-8

Change By:	Josh Cooper

Modern versions of Puppet require that the data they serialize to JSON is proper UTF-8. Since facter collects data from different external sources, it's possible for facter data to be incorrectly encoded. Examples include:
* String contains a valid UTF-16LE byte sequence, but the string's "encoding" method returns UTF-8 (Windows Registry)
* String contains binary data, but "encoding" returns UTF-8 (EC2 userdata)
* String contains the start of a valid multibyte UTF-8 sequence, e.g. ("\xc3\x28")
* String contains embedded nulls. Strictly speaking the "\u0000" code point is valid and is encoded as a single null byte, but it is surprising and can't be stored in Postgres.
* String was generated by a child process based on the active code page (Windows CP1252), but the output is interpreted as UTF-8

Facter's normalization should ensure:
* All fact data contains valid UTF-8 data
* If the string data is not valid, an error should be logged stating the custom or external fact that caused the issue. The fact should be omitted from the fact collection sent to the server and the agent run should continue
* Same for embedded null values (different ticket)
* A warning should be generated specifying the fact key -or value- with invalid data
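
The final behavior (log an error, omit the fact, keep going) can be sketched as a filter over a fact hash. The `valid_fact_value?` and `filter_facts` helpers are hypothetical names for illustration, and the sketch uses Ruby's standard `Logger` rather than Facter's own logging.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: true when every string nested in a fact value
# (including hash keys) is valid in its labeled encoding.
def valid_fact_value?(value)
  case value
  when String then value.valid_encoding?
  when Array  then value.all? { |v| valid_fact_value?(v) }
  when Hash   then value.all? { |k, v| valid_fact_value?(k) && valid_fact_value?(v) }
  else true
  end
end

# Hypothetical helper: keeps valid facts, logs an error naming each omitted one.
def filter_facts(facts, logger)
  facts.select do |name, value|
    ok = valid_fact_value?(value)
    logger.error("Fact '#{name}' contains invalid UTF-8 and was omitted") unless ok
    ok
  end
end

log = StringIO.new
logger = Logger.new(log)

facts = {
  'good'   => 'ok',
  'bad'    => "\xc3\x28".dup.force_encoding(Encoding::UTF_8),
  'nested' => { 'list' => ['fine', "caf\xe9".dup.force_encoding(Encoding::UTF_8)] }
}

kept = filter_facts(facts, logger)
puts kept.keys.inspect   # ["good"]
puts log.string          # one ERROR line each for 'bad' and 'nested'
```

Omitting the whole top-level fact, rather than only the offending nested value, keeps the remaining data consistent with what the fact author produced; whether Facter does the same for structured facts is not stated in the ticket.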