On Tue, Aug 11, 2020 at 1:36 AM Martijn Dekker <mar...@inlv.org> wrote:
> On 11-08-20 at 02:25, Andras Farkas wrote:
> > &nbsp; is incompatible with XML (and the documentation currently
> > claims output of %H is compatible with both HTML and XML, which only
> > lets us use four named values: gt, lt, amp, quot)
>
> Good point, I'd forgotten about XML.
>
> > Options include:
> > 1. Just outputting the non-breaking space exactly as it was input
> > 2. Using the numeric version (like what was chosen instead of apos)
> > which is  
> > I greatly prefer the first of these two options, as I think only five
> > characters should be changed in any way by %H: < > & " ' as these are
> > the characters semantically meaningful to XML and HTML and what a user
> > expects to be changed.
>
> The second option is what would happen by default, as all non-printable
> and non-graph ("invisible") characters except the ASCII space are
> changed to numeric values. This most closely matches the old behaviour
> while fixing what was broken about it, so I'm inclined to keep it,
> although I'm open to further arguments.
Ah, true! I'm fine with this, too.
So yes, I'm now definitely fine with the second of these two options
being implemented.
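
To make the two options concrete, here is a rough sketch in Python. It
is not ksh's actual code: the function name escape_h and the
keep_invisible flag are made up for illustration, and the decimal form
of the numeric references is an assumption.

```python
def escape_h(text, keep_invisible=False):
    """Illustrative stand-in for printf %H (hypothetical, not ksh's code)."""
    # The four named values that are safe in both HTML and XML.
    named = {"<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;"}
    out = []
    for ch in text:
        if ch in named:
            out.append(named[ch])
        elif ch == "'":
            # Numeric reference instead of &apos;; decimal form is assumed.
            out.append("&#39;")
        elif ch in "\r\n" or ch.isprintable() or keep_invisible:
            # Option 1 (keep_invisible=True) passes everything else through.
            out.append(ch)
        else:
            # Option 2: invisible characters become numeric references.
            out.append("&#%d;" % ord(ch))
    return "".join(out)


print(escape_h("a\u00a0b"))  # a&#160;b
print(escape_h("a\u00a0b", keep_invisible=True) == "a\u00a0b")  # True
```

With keep_invisible=True the non-breaking space comes out exactly as it
went in (option 1); by default it becomes a numeric reference (option 2).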
> I also think encoding non-graph characters (other than the regular ASCII
> space, carriage return and newline) is more user friendly, because it
> makes the difference between the various kinds of spaces visible when
> reading the code. This difference is important, because adjacent ASCII
> spaces and control characters are collapsed into one space in XML and
> HTML, whereas the various other Unicode spaces are rendered as they are.
> So if the rendering ends up with strange spacing, it would be mystifying
> if you cannot see the Unicode spaces causing it in the code.
Ah, true, I don't disagree with this! It really could go either way:
there are cases where numerically-encoding Unicode spaces will make
HTML more readable, and some cases where it'll be less readable. I
still do have a preference for the first option, as it means reading
the text of an HTML file in a text editor is more natural.
I suppose the question is "What should ksh do versus what should be
left for the user to do before or after it?" What's a more generic
solution?
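
For what it's worth, the "left for the user" side of that question could
look something like the Python sketch below. Both helper names are
hypothetical, and the collapsing rule is only a rough model of how HTML
renders runs of ASCII whitespace.

```python
import re
import unicodedata


def collapse_ascii_whitespace(text):
    # Rough model of rendering: a run of ASCII whitespace becomes one space.
    return re.sub(r"[ \t\r\n\f]+", " ", text)


def list_unicode_spaces(text):
    # Report position, code point and name of every non-ASCII space.
    return [
        (i, "U+%04X" % ord(ch), unicodedata.name(ch, "?"))
        for i, ch in enumerate(text)
        if ch.isspace() and ord(ch) > 127
    ]


sample = "a   b\u00a0\u00a0c\u2003d"
print(collapse_ascii_whitespace(sample) == "a b\u00a0\u00a0c\u2003d")  # True
print(list_unicode_spaces("a\u00a0b\u2003c"))
# [(1, 'U+00A0', 'NO-BREAK SPACE'), (3, 'U+2003', 'EM SPACE')]
```

The three ASCII spaces collapse into one, while the non-breaking spaces
and the em space survive untouched, which is exactly the kind of thing
you can't see in a text editor unless something points it out.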
> So it appears we have different views on user expectations. Could you
> elaborate on why you think a user would expect only those five
> characters to be changed?
Well, my reasoning was that someone acquainted with HTML and XML
probably only expects the five semantically important characters to be
escaped in text they intend to put in HTML/XML elements.
You do remind me, though, that someone already using Korn shell for
this might be interested in some other characters being encoded too.
So, this is fine.
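
As a point of comparison for the "only five characters" expectation,
Python's standard html.escape behaves that way. This is just an
illustration of that expectation, not a statement about what ksh does;
the wrapper name escape_five_only is made up.

```python
import html


def escape_five_only(text):
    # html.escape with quote=True touches only & < > " and '
    return html.escape(text, quote=True)


print(escape_five_only("<b>") == "&lt;b&gt;")  # True
# A non-breaking space is passed through unchanged.
print(escape_five_only("a\u00a0b") == "a\u00a0b")  # True
```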
Again, thanks so much. :D