| With the changes in Puppet 5 and 6 to ensure we're using UTF-8 everywhere, reporting non-UTF-8 data from EC2 metadata is going to be even more complicated than it was when this ticket was filed. We pretty much can't send that data to the server now, so we probably shouldn't even be collecting it. The ideal solution would be to plumb Puppet's data types down into Facter, so we could report the userdata as binary. The next best solutions are to base64-encode it unconditionally, or to stop reporting it entirely. The first is basically wishful thinking, and the other two are both breaking changes (though certainly more correct than the current behavior). The compromise would be to omit or base64-encode the fact only when it's not valid UTF-8. This has some potentially gross impacts for certain users (for example, Latin-1 text that happens to use only ASCII characters is also valid UTF-8, so the same kind of userdata could be reported as plain text in some cases and encoded or omitted in others). But I think that'd only happen for cases that currently require a PSON or fact-blocking workaround, so those users wouldn't be any worse off. We could then switch to always encoding (or always omitting the fact) in Facter 4. |
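
To make the compromise concrete, here is a rough sketch of the "encode only when it's not valid UTF-8" check. It is written in Python purely for illustration and is not Facter code; the function name `encode_userdata` and the `(value, encoding)` return shape are hypothetical, and a real fix would have to settle how (or whether) the encoding is reported alongside the fact.

```python
import base64


def encode_userdata(raw: bytes):
    """Return (value, encoding) for raw EC2 userdata bytes.

    Hypothetical helper: valid UTF-8 is passed through as text,
    anything else is base64-encoded so it can be sent safely.
    """
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Fall back to base64, which is always plain ASCII.
        return base64.b64encode(raw).decode("ascii"), "base64"


# A typical shell-script userdata payload is reported unchanged.
print(encode_userdata(b"#!/bin/sh\necho hi\n"))

# Latin-1 text with a non-ASCII character is not valid UTF-8, so it is encoded.
print(encode_userdata("café".encode("latin-1")))

# Latin-1 text that only uses ASCII is indistinguishable from UTF-8,
# so it takes the plain-text path. This is the inconsistency noted above.
print(encode_userdata("plain".encode("latin-1")))
```

The "always encode" behavior proposed for Facter 4 would amount to dropping the `try` branch and returning the base64 form unconditionally, which removes the content-dependent ambiguity at the cost of a breaking change.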