Modern versions of Puppet require that the data they serialize to JSON be valid UTF-8. Since Facter collects data from many different external sources, fact data can end up incorrectly encoded. Examples include:
* Unicode code points are encoded as a UTF-16LE byte sequence, but the string's "encoding" method returns UTF-8 (Windows Registry)
* String contains binary data, but "encoding" returns UTF-8 (EC2 userdata)
* String contains the start of a valid multibyte UTF-8 sequence, e.g.

When facts have an incorrect encoding (either the encoding is mislabeled and doesn't match the underlying byte sequence, or the byte sequence is invalid), no error is currently raised until the data is serialized. At that point it is far too late, and the error message is not helpful.
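To make the failure mode concrete, the sketch below builds synthetic strings that mimic the cases above (they are not real Registry or userdata values) and passes them through Ruby's `json` gem, used here only as a stand-in for the serializer. Each string claims to be UTF-8, yet `valid_encoding?` is false, and nothing complains until serialization. The exact exception class and message depend on the serializer and gem version, so the sketch rescues broadly.

```ruby
require 'json'

# Synthetic stand-ins for the mis-encoded fact values described above.
samples = {
  # "é" encoded as UTF-16LE (bytes E9 00) but labeled as UTF-8
  'utf16le_mislabeled' => 'é'.encode(Encoding::UTF_16LE).force_encoding(Encoding::UTF_8),
  # arbitrary binary bytes labeled as UTF-8
  'binary_mislabeled' => [0xFF, 0xD8, 0xFF].pack('C*').force_encoding(Encoding::UTF_8),
  # first two bytes of the three-byte UTF-8 sequence for "€" (E2 82 AC)
  'truncated_multibyte' => '€'.b[0, 2].force_encoding(Encoding::UTF_8)
}

samples.each do |name, value|
  # Every value reports UTF-8 as its encoding, but its bytes are not valid UTF-8.
  puts "#{name}: encoding=#{value.encoding} valid=#{value.valid_encoding?}"
  begin
    # The problem only surfaces here, far from the fact that produced the data.
    JSON.generate({ name => value })
    puts '  serialized without error'
  rescue JSON::GeneratorError, EncodingError => e
    puts "  serialization failed: #{e.class}"
  end
end
```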
Instead, Facter itself should detect this and encode the fact key or value as UTF-8, replacing invalid byte sequences with the Unicode replacement character (U+FFFD), and issue a warning indicating the specific fact which returned bad data. This will provide better context for debugging.
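A minimal sketch of the proposed behaviour, assuming facts are nested Hash/Array/String structures. The names `normalize`, `to_utf8`, and `clean_utf8?` are hypothetical and not Facter's API, and warnings are collected into an array where a real implementation would presumably use Facter's logger. It relies on Ruby's `String#scrub` for strings labeled UTF-8 or binary, and on `String#encode` with `invalid: :replace, undef: :replace` for other declared encodings.

```ruby
# Hypothetical sketch; not Facter's actual implementation.
REPLACEMENT_CHAR = "\uFFFD"

# True if +str+ is labeled UTF-8 and its bytes are valid UTF-8.
def clean_utf8?(str)
  str.encoding == Encoding::UTF_8 && str.valid_encoding?
end

# Returns a UTF-8 copy of +str+ with invalid byte sequences replaced by U+FFFD.
def to_utf8(str)
  if [Encoding::UTF_8, Encoding::BINARY].include?(str.encoding)
    # Reinterpret the raw bytes as UTF-8 and scrub anything invalid.
    str.dup.force_encoding(Encoding::UTF_8).scrub(REPLACEMENT_CHAR)
  else
    # Transcode from the declared encoding, replacing invalid/unmappable bytes.
    str.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: REPLACEMENT_CHAR)
  end
end

# Recursively normalizes keys and values; +path+ names the fact in warnings.
def normalize(value, path, warnings)
  case value
  when String
    return value if clean_utf8?(value)

    warnings << "#{path}: value in #{value.encoding} is not valid UTF-8; invalid bytes replaced with U+FFFD"
    to_utf8(value)
  when Array
    value.each_with_index.map { |v, i| normalize(v, "#{path}[#{i}]", warnings) }
  when Hash
    value.each_with_object({}) do |(k, v), out|
      key = k
      if k.is_a?(String) && !clean_utf8?(k)
        warnings << "#{path}: key #{k.inspect} is not valid UTF-8; invalid bytes replaced with U+FFFD"
        key = to_utf8(k)
      end
      child = path.empty? ? key.to_s : "#{path}.#{key}"
      out[key] = normalize(v, child, warnings)
    end
  else
    value
  end
end

# Usage: a fact whose value is binary data mislabeled as UTF-8.
facts = {
  'os' => { 'name' => 'windows' },
  'userdata' => [0xFF, 0x41].pack('C*').force_encoding(Encoding::UTF_8)
}
warnings = []
clean = normalize(facts, '', warnings)
warnings.each { |w| warn w }
```

Note that a UTF-16LE byte sequence mislabeled as UTF-8 is only scrubbed here, not decoded; recovering the original text would require knowing the real encoding at the point where the fact is collected.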