Proposal to switch from Unibrow to ICU in V8

Daniel Ehrenberg

Jan 12, 2016, 2:31:29 AM
to v8-u...@googlegroups.com
Hi V8 users,

V8 has its own built-in, custom, minimal Unicode library called
Unibrow. I am looking into replacing Unibrow with the ICU library.
There would be a fallback for non-ICU builds which supports the UTF-8
and UTF-16 encodings for source code, but only gets operations like
case mapping right on ASCII. This has a few advantages:
- The long-term maintenance burden of our separate Unicode
implementation would be removed, freeing up time for other work,
including better internationalization.
- Unibrow has a few long-standing bugs, and its Unicode data sometimes
lags behind the ICU version in the same Chrome build. Replacing it with
ICU or ASCII support, depending on build-time flags, would remove these
bugs.
- The binary size would be slightly reduced, especially on builds
without internationalization enabled.

I'm wondering whether anyone who builds without ICU depends on the
following features, which I'm proposing to make dependent on ICU:
- Unicode-aware case mapping (e.g., String.prototype.toUpperCase())
- JavaScript code with non-ASCII identifiers
- Unicode-aware case-insensitive RegExp matching
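
To check whether a given embedding relies on any of the three features listed above, a small script along these lines could be run in it. This is only a rough sketch: `asciiToUpperCase` and `probeUnicodeSupport` are hypothetical names made up for illustration, not V8 APIs, and the ASCII-only helper is an assumption about what the fallback behaviour would roughly look like, not the proposed implementation.

```javascript
// Hypothetical helper: ASCII-only upper-casing, i.e. only a-z are mapped
// and every other character is passed through unchanged.
function asciiToUpperCase(s) {
  return s.replace(/[a-z]/g, (c) => String.fromCharCode(c.charCodeAt(0) - 32));
}

// Hypothetical probe: reports which of the three features work beyond
// ASCII in the engine that runs this script.
function probeUnicodeSupport() {
  let identifiers;
  try {
    // "\u00e9" (e with acute accent) is resolved inside the string literal,
    // so the compiled source contains a raw non-ASCII identifier.
    identifiers = new Function("var \u00e9 = 1; return \u00e9;")() === 1;
  } catch (e) {
    identifiers = false;
  }
  return {
    // Unicode-aware case mapping
    caseMapping: "\u00e9".toUpperCase() === "\u00c9",
    // JavaScript code with non-ASCII identifiers
    identifiers,
    // Unicode-aware case-insensitive RegExp matching
    regExpIgnoreCase: /\u00e9/i.test("\u00c9"),
  };
}

console.log(probeUnicodeSupport());
// Compare the ASCII-only mapping with the engine's built-in one.
console.log(asciiToUpperCase("caf\u00e9"), "caf\u00e9".toUpperCase());
```

If application code only ever produces results where the two upper-casing functions agree, it is probably not relying on Unicode-aware case mapping.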

Thanks,
Dan

Yang Guo

Jan 20, 2016, 5:07:23 AM
to v8-users
FWIW I'm already implementing Unicode-aware case-insensitive RegExp matching relying on ICU [0]. Unibrow simply does not support getting case fold closures, and I don't think it makes sense to extend it further to add that support.

It would be nice to get rid of Unibrow altogether. One issue we have is that Unibrow and ICU may get out of sync with respect to the Unicode version they implement. However, I don't think this has high priority, and it can be done incrementally.

Yang