Thanks for that background, Greg. From the thread, this is what I can
see in terms of a problem to fix:
> The presence of such characters within the text degrades functionality
> by interfering with operations such as search, indexing, copy/paste to
> other environments, etc. Their presence is typically the result of
> broken authoring tools/workflows, but as long as browsers ignore them
> for rendering, authors generally remain unaware that their data is bad,
> and readers will usually be unaware that their searches, etc., may be
> missing content they would have expected to match.
Honestly, this seems like a problem best solved in the browser itself,
by ignoring control characters, just like how you would ignore the
empty element in A<b></b>C and thus find a match for "AC". Compare
that to "A" and "C" separated by a control character, where I cannot
match "AC", at least not in Opera or Chrome.
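To make the idea concrete, here is a rough sketch of a search that
ignores control characters. It is not how any browser implements
find-in-page. The function names are made up for illustration, and the
set of characters treated as "control characters" (C0 and C1 controls
other than tab, line feed, form feed and carriage return) is my
assumption, not something the thread has settled.

```python
import re

# Assumption: C0/C1 controls and DEL, minus the ones that act as
# whitespace (tab \x09, line feed \x0a, form feed \x0c, CR \x0d).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0e-\x1f\x7f-\x9f]")


def strip_controls(text: str) -> str:
    """Return text with the assumed control characters removed."""
    return _CONTROL_CHARS.sub("", text)


def find_ignoring_controls(haystack: str, needle: str) -> bool:
    """Hypothetical search that skips control characters on both sides."""
    return strip_controls(needle) in strip_controls(haystack)


page_text = "A\x01C"  # "A", U+0001, "C"

plain_match = "AC" in page_text                           # naive search
lenient_match = find_ignoring_controls(page_text, "AC")   # lenient search
```

With this, the naive substring test fails on the text containing
U+0001 while the lenient one succeeds, which is the behavior I would
expect from the browser's own search.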
Some estimate of the expected impact is really needed here. Making
some proportion of the web look worse is a very tangible downside,
while "follow the Unicode spec" and "test synced release of breaking
changes" are relatively weak upsides, IMHO.
(I think synced release of breaking changes has great potential,
perhaps for restricting Geolocation to secure origins, and there are
lots of other changes that would benefit from such coordination.)
Philip