Intent to ship: well-formed JSON.stringify


Mathias Bynens

Oct 3, 2018, 11:09:29 AM
to blink-dev, v8-users

Contact emails

mat...@chromium.org


Spec

https://github.com/tc39/proposal-well-formed-stringify


Summary

A Stage 3 proposal changes JSON.stringify to prevent it from returning ill-formed Unicode strings.


Motivation

RFC 8259 section 8.1 requires JSON text exchanged outside the scope of a closed ecosystem to be encoded using UTF-8, but JSON.stringify can return strings including code points that have no representation in UTF-8 (specifically, surrogate code points U+D800 through U+DFFF). Contrary to the description of JSON.stringify, such strings are also not “in UTF-16”: “isolated UTF-16 code units in the range D800₁₆..DFFF₁₆ are ill-formed” per The Unicode Standard, Version 10.0.0, Section 3.4, definition D91, and are excluded from being “in UTF-16” per definition D89.

However, returning such invalid Unicode strings is unnecessary, because JSON strings can include Unicode escape sequences.
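
For illustration, here is a small sketch of the difference, assuming an engine that already ships the proposal. The helper name hasLoneSurrogate is hypothetical and introduced only for this example; it is not part of the proposal or of V8.

```javascript
// Hypothetical helper (not part of the proposal): detects unpaired surrogates.
// Without the `u` flag, the regular expression matches UTF-16 code units.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/;

function hasLoneSurrogate(str) {
  return LONE_SURROGATE.test(str);
}

const lone = '\uD800';          // unpaired high surrogate
const paired = '\uD83D\uDE00';  // well-formed surrogate pair (U+1F600)

const loneJson = JSON.stringify(lone);
const pairedJson = JSON.stringify(paired);

console.log(hasLoneSurrogate(lone));          // true
console.log(hasLoneSurrogate(loneJson));      // false with the proposal (escaped as \ud800)
console.log(pairedJson === '"\uD83D\uDE00"'); // true: valid pairs are left as-is
```

Only unpaired surrogates are escaped; well-formed pairs still appear literally in the output.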


Interoperability and compatibility risk

This change is backwards-compatible, assuming consumers comply with the JSON specification. User-visible effects are limited to the replacement of some rare single UTF-16 code units in JSON.stringify output with equivalent six-character escape sequences that can be represented in both UTF-16 and UTF-8. Any consumer accepting the current ill-formed output is unaffected by this change (this is true in particular of ECMAScript JSON.parse). Any consumer rejecting the current ill-formed output will have a new opportunity to accept its well-formed representation. Such consumers may still reject input that specifies strings including Unicode code points that are not scalar values (e.g. because they only accept I-JSON input), but those that accept it must have mechanisms for dealing with unpaired surrogates (as mentioned in the specification of JSON).
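
A rough sketch of the compatibility argument, assuming an engine that ships the proposal and provides the TextEncoder/TextDecoder globals. The viaUtf8 helper is a hypothetical name used only here.

```javascript
// Assumes the proposal is implemented and TextEncoder/TextDecoder are available.
const original = 'a\uDC00b';            // contains an unpaired low surrogate
const json = JSON.stringify(original);

// The lone code unit becomes a six-character escape:
// a backslash, 'u', and four hex digits.
console.log(json.length);               // 10 with the proposal

// JSON.parse accepts both the new and the old (ill-formed) output,
// so round trips through JSON.parse are unchanged.
console.log(JSON.parse(json) === original);          // true
console.log(JSON.parse('"a\uDC00b"') === original);  // true

// Hypothetical helper: encode to UTF-8 and decode again.
// TextEncoder replaces unpaired surrogates with U+FFFD.
const viaUtf8 = (s) => new TextDecoder().decode(new TextEncoder().encode(s));
console.log(viaUtf8(json) === json);          // true: new output survives UTF-8
console.log(viaUtf8(original) === original);  // false: the raw string does not
```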

This feature has a ready-to-land Firefox/SpiderMonkey implementation. There are tracking bugs for Edge/Chakra and Safari/JavaScriptCore.

Is this feature fully tested?

Yes. In addition to V8’s own tests (v8/test/mjsunit/harmony/well-formed-json-stringify*.js and v8/test/cctest/test-strings*), Test262 includes tests for this feature.


Tracking bug

https://bugs.chromium.org/p/v8/issues/detail?id=7782


Link to entry on the Chrome Platform Status dashboard

https://www.chromestatus.com/feature/5752304045129728


Requesting approval to ship?

Yes. Note that since this is a V8/JS feature, this post is just an FYI to blink-dev — no signoff from Blink API owners is required.


Adam Klein

Oct 3, 2018, 5:42:45 PM
to v8-users, blink-dev
LGTM

Sathya Gunasekaran

Oct 3, 2018, 7:43:12 PM
to v8-users, blink-dev
LGTM2

Note: Mathias rolled the latest test262 tests into V8 (https://chromium-review.googlesource.com/c/v8/v8/+/1259865) and V8 passes the test262 tests for this feature.