Contact emails
Spec
https://github.com/tc39/proposal-well-formed-stringify
Summary
A Stage 3 proposal changes JSON.stringify to prevent it from returning ill-formed Unicode strings.
Motivation
RFC 8259 section 8.1 requires JSON text exchanged outside the scope of a closed ecosystem to be encoded using UTF-8, but JSON.stringify can return strings including code points that have no representation in UTF-8 (specifically, surrogate code points U+D800 through U+DFFF). Moreover, contrary to the description of JSON.stringify, such strings are not “in UTF-16”: “isolated UTF-16 code units in the range D800₁₆..DFFF₁₆ are ill-formed” per The Unicode Standard, Version 10.0.0, Section 3.4 at definition D91, and are therefore excluded from being “in UTF-16” per definition D89.
However, returning such invalid Unicode strings is unnecessary, because JSON strings can include Unicode escape sequences.
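As a rough illustration of the idea, the sketch below post-processes JSON.stringify output so that lone surrogates become \uXXXX escapes while well-formed surrogate pairs are left alone. The helper name escapeLoneSurrogates is hypothetical and is not how any engine implements the proposal; it only approximates the observable result, assuming lowercase hex digits in the escapes.

```javascript
// Hypothetical helper for illustration only; not part of the proposal or of V8.
// Takes already-serialized JSON text and escapes any unpaired surrogate code unit.
function escapeLoneSurrogates(json) {
  let out = '';
  for (let i = 0; i < json.length; i++) {
    const unit = json.charCodeAt(i);
    const isLead = unit >= 0xd800 && unit <= 0xdbff;
    const isTrail = unit >= 0xdc00 && unit <= 0xdfff;
    if (isLead && i + 1 < json.length) {
      const next = json.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Well-formed pair: copy both code units unchanged and skip the trail.
        out += json[i] + json[i + 1];
        i++;
        continue;
      }
    }
    if (isLead || isTrail) {
      // Unpaired surrogate: emit a six-character escape sequence instead.
      out += '\\u' + unit.toString(16).padStart(4, '0');
    } else {
      out += json[i];
    }
  }
  return out;
}

// A lone trail surrogate is escaped; the result is plain ASCII.
console.log(escapeLoneSurrogates(JSON.stringify('\uDEAD'))); // prints "\udead"
```

Because the helper leaves text without lone surrogates untouched, it should give the same result whether or not the underlying JSON.stringify already implements the proposal.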
Interoperability and compatibility risk
This change is backwards-compatible, assuming consumers comply with the JSON specification. User-visible effects are limited to the replacement of some rare single UTF-16 code units in JSON.stringify output with equivalent six-character escape sequences that can be represented in both UTF-16 and UTF-8. Any consumer accepting the current ill-formed output is unaffected by this change (this is true in particular of ECMAScript JSON.parse). Any consumer rejecting the current ill-formed output will have a new opportunity to accept its well-formed representation. Such consumers may still reject input that specifies strings including Unicode code points that are not scalar values (e.g. because they only accept I-JSON input); those that accept it must have mechanisms for dealing with unpaired surrogates (as mentioned in the specification of JSON).
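The compatibility claim for JSON.parse can be checked with a small sketch like the one below. It hard-codes the old-style (raw) and new-style (escaped) JSON text rather than calling JSON.stringify, so it does not depend on which behavior the engine ships. The utf8RoundTrip helper is a hypothetical name, and the example assumes a runtime that provides the standard TextEncoder and TextDecoder globals.

```javascript
// Old-style output: the lone surrogate appears raw between the quotes.
const raw = '"\uD800"';
// New-style output: the same code unit as a six-character escape sequence.
const escaped = '"\\ud800"';

// JSON.parse accepts both forms and yields the same one-code-unit string.
const fromRaw = JSON.parse(raw);
const fromEscaped = JSON.parse(escaped);
console.log(fromRaw === fromEscaped); // prints true

// Hypothetical helper: encode to UTF-8 bytes and decode again.
function utf8RoundTrip(text) {
  return new TextDecoder().decode(new TextEncoder().encode(text));
}

// The escaped form is plain ASCII, so it survives UTF-8 encoding intact.
console.log(utf8RoundTrip(escaped) === escaped); // prints true
// The raw form does not: the encoder substitutes U+FFFD for the lone surrogate.
console.log(utf8RoundTrip(raw) === raw); // prints false
```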
This feature has a ready-to-land Firefox/SpiderMonkey implementation. There are tracking bugs for Edge/Chakra and Safari/JavaScriptCore.
Is this feature fully tested?
Yes. In addition to V8’s own tests (v8/test/mjsunit/harmony/well-formed-json-stringify*.js and v8/test/cctest/test-strings*), Test262 includes tests for this feature.
Tracking bug
https://bugs.chromium.org/p/v8/issues/detail?id=7782
Link to entry on the Chrome Platform Status dashboard
https://www.chromestatus.com/feature/5752304045129728
Requesting approval to ship?
Yes. Note that since this is a V8/JS feature, this post is just an FYI to blink-dev — no signoff from Blink API owners is required.
LGTM
On Wed, Oct 3, 2018 at 8:09 AM 'Mathias Bynens' via v8-users <v8-u...@googlegroups.com> wrote: