Integrating ICU into Mozilla build

Norbert Lindenberg

Dec 3, 2012, 2:32:15 PM

to dev-pl...@lists.mozilla.org, Norbert Lindenberg

As part of implementing the ECMAScript Internationalization API [1, 2] in SpiderMonkey, and as an aid in internationalizing other functionality in Mozilla products [3], I need to integrate the ICU library (International Components for Unicode [4]) into the source tree and the build.

For integrating ICU into the source tree, I see two main alternatives:

- Add the required set of ICU source files as separate files to the Mozilla repository. The current version of ICU (50.1, C/C++ version) has about 5350 source files; stripping out files that aren't needed for the internationalization API (but might be needed later) brings this to about 3250 files. The complete Mozilla tree before this addition has about 70,000 files.

- Add the source bundles (zip/tar) to the Mozilla repository, and then extract the source files as part of the build.

One might also imagine leaving ICU out of the tree entirely, and either downloading the sources as part of the build, or building ICU completely separately and only installing the binaries, but neither of these options seems appropriate.

Comments?

Thanks,
Norbert

[1] http://wiki.ecmascript.org/doku.php?id=globalization:specification_drafts
[2] http://norbertlindenberg.com/2012/10/ecmascript-internationalization-api/index.html
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=724529
[4] http://site.icu-project.org

Kyle Huey

Dec 3, 2012, 2:35:59 PM

to Norbert Lindenberg, dev-pl...@lists.mozilla.org

We should just add the ICU sources to the tree. The complexity of
extracting/downloading the sources during the build would be high, and we
don't do that for any other third-party library.

I'm far more worried about ICU's impacts on the size of the binaries we
ship to users than I am about the impact to the size of our source tree ;-)

- Kyle

Benjamin Smedberg

Dec 3, 2012, 2:48:06 PM

to Norbert Lindenberg, dev-pl...@lists.mozilla.org

On 12/3/2012 2:32 PM, Norbert Lindenberg wrote:
> As part of implementing the ECMAScript Internationalization API [1, 2] in SpiderMonkey, and as an aid in internationalizing other functionality in Mozilla products [3], I need to integrate the ICU library (International Components for Unicode [4]) into the source tree and the build.

This has been brought up many times over the years, and each time
previously we decided not to import ICU. At first, the license was
incompatible; that has since been fixed. Now the question is mainly
about whether the features ICU provides are worth the real cost in
terms of binary size.

How much size does ICU cost, if we took the entire library? How much of
that is data (which can be shared in 32/64 mac universal builds) and how
much is code which cannot be shared?

How much smaller can we make it if we take only the bits we need for the
JS APIs?

Can we remove or replace some of our existing intl code with ICU, to
reduce the impact? (Have you discussed this with smontagu?)

Is ICU already available on Android, or would we have to include it in
our Android package?

What are our other options if we decide the ICU is just too large or
unwieldy?

--BDS

Mike Hommey

Dec 3, 2012, 3:11:58 PM

to Benjamin Smedberg, dev-pl...@lists.mozilla.org, Norbert Lindenberg

On Mon, Dec 03, 2012 at 02:48:06PM -0500, Benjamin Smedberg wrote:
> On 12/3/2012 2:32 PM, Norbert Lindenberg wrote:
> >As part of implementing the ECMAScript Internationalization API [1, 2] in SpiderMonkey, and as an aid in internationalizing other functionality in Mozilla products [3], I need to integrate the ICU library (International Components for Unicode [4]) into the source tree and the build.
> This has been brought up many times over the years, and each time
> previously we decided not to import ICU. At first, the license was
> incompatible; that has since been fixed. Now the question is mainly
> about whether the features ICU provides are worth the real cost in
> terms of binary size.
>
> How much size does ICU cost, if we took the entire library? How much
> of that is data (which can be shared in 32/64 mac universal builds)
> and how much is code which cannot be shared?

ICU doesn't come with data files. Data is enclosed in libraries, and
such data is not shared between the 32-bit and 64-bit parts of
universal binaries.

Mike

Mike Hommey

Dec 3, 2012, 3:14:36 PM

to Benjamin Smedberg, dev-pl...@lists.mozilla.org, Norbert Lindenberg

FWIW, the libicudata.so file on my Debian install has about 18MB of
.rodata.

Mike

Jonathan Kew

Dec 3, 2012, 5:10:46 PM

to Mike Hommey, Norbert Lindenberg, Benjamin Smedberg, dev-pl...@lists.mozilla.org

On 3/12/12 20:11, Mike Hommey wrote:
> On Mon, Dec 03, 2012 at 02:48:06PM -0500, Benjamin Smedberg wrote:
>> On 12/3/2012 2:32 PM, Norbert Lindenberg wrote:
>>> As part of implementing the ECMAScript Internationalization API [1, 2] in SpiderMonkey, and as an aid in internationalizing other functionality in Mozilla products [3], I need to integrate the ICU library (International Components for Unicode [4]) into the source tree and the build.
>> This has been brought up many times over the years, and each time
>> previously we decided not to import ICU. At first, the license was
>> incompatible; that has since been fixed. Now the question is mainly
>> about whether the features ICU provides are worth the real cost in
>> terms of binary size.
>>
>> How much size does ICU cost, if we took the entire library? How much
>> of that is data (which can be shared in 32/64 mac universal builds)
>> and how much is code which cannot be shared?
>
> ICU doesn't come with data files. Data is enclosed in libraries, and
> such data is not shared between the 32-bits and 64-bits parts of
> universal binaries.
>

Actually, ICU has several options for how its data is packaged. One
option is libraries (which are not sharable between architectures,
AFAIK), but another possibility is to use data package (.dat) files,
which I believe *could* be shared between the 32- and 64-bit builds.

http://userguide.icu-project.org/icudata

JK

Rafael Ávila de Espíndola

Dec 3, 2012, 5:25:55 PM

to Jonathan Kew, Mike Hommey, Benjamin Smedberg, Norbert Lindenberg, dev-pl...@lists.mozilla.org

>
> Actually, ICU has several options for how its data is packaged. One option is libraries (which are not sharable between architectures, AFAIK), but another possibility is to use data package (.dat) files, which I believe *could* be shared between the 32- and 64-bit builds.

Getting a bit off topic, but since we don't support 10.5 anymore, can't we build just a 32-bit plugin container instead of building the full browser as a universal binary? Would the plugin container need to link with ICU too?

> http://userguide.icu-project.org/icudata
>
> JK

Cheers,
Rafael

Norbert Lindenberg

Dec 3, 2012, 5:39:55 PM

to Benjamin Smedberg, dev-pl...@lists.mozilla.org, Norbert Lindenberg

On Dec 3, 2012, at 11:48 , Benjamin Smedberg wrote:

> On 12/3/2012 2:32 PM, Norbert Lindenberg wrote:
>> As part of implementing the ECMAScript Internationalization API [1, 2] in SpiderMonkey, and as an aid in internationalizing other functionality in Mozilla products [3], I need to integrate the ICU library (International Components for Unicode [4]) into the source tree and the build.
> This has been brought up many times over the years, and each time previously we decided not to import ICU. At first, the license was incompatible; that has since been fixed. Now the question is mainly about whether the features ICU provides are worth the real cost in terms of binary size.

OK, just as an introduction, why we're doing this: The ECMAScript Internationalization API (which has been approved by Ecma TC 39 and is on track to become an Ecma standard next week) provides web applications with the ability to format numbers, dates, and times and sort strings according to the rules of the language that the application is using, not the one that browser and OS default to. Many users are multilingual and go to web sites in different languages, and even users who aren't sometimes have to use browsers that don't support their language. The API in addition lets applications tailor the results to their specific needs, e.g., specify the currency with which numbers are displayed, select the date-time components used in a date format, or ignore punctuation in sorting.

To implement that, we need good library support, and ICU fits the bill.

> How much size does ICU cost, if we took the entire library? How much of that is data (which can be shared in 32/64 mac universal builds) and how much is code which cannot be shared?

On my Mac (64 bit), the full data library is 20.8 MB, the six code libraries combined 4.2 MB.

> How much smaller can we make it if we take only the bits we need for the JS APIs?

I have currently trimmed it to 9.7 MB for the data library and 3.1 MB for two code libraries.
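
For reference, the figures quoted so far can be put side by side. The following is only a back-of-the-envelope sketch in Python: the sizes are the ones reported above for a 64-bit Mac build, while the variable and function names are made up for illustration.

```python
# Sizes in MB as reported above for a 64-bit Mac build of ICU 50.1.
FULL = {"data": 20.8, "code": 4.2}      # six code libraries combined
TRIMMED = {"data": 9.7, "code": 3.1}    # two code libraries


def total(sizes):
    """Sum data and code sizes, rounded to one decimal place."""
    return round(sum(sizes.values()), 1)


def saving_percent(before, after):
    """Percentage reduction when going from `before` to `after`."""
    return round(100 * (before - after) / before, 1)


full_total = total(FULL)                                      # 25.0 MB
trimmed_total = total(TRIMMED)                                # 12.8 MB
overall_saving = saving_percent(full_total, trimmed_total)    # 48.8 %
data_saving = saving_percent(FULL["data"], TRIMMED["data"])   # 53.4 %
code_saving = saving_percent(FULL["code"], TRIMMED["code"])   # 26.2 %
```

In other words, the trimming so far roughly halves the total, with most of the saving coming from the data library rather than from the code.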

> Can we remove or replace some of our existing intl code with ICU, to reduce the impact? (Have you discussed this with smontagu?)

The functionality needed for the internationalization API and for the existing intl code don't overlap much - the existing code relies on the OS for number and date formatting and string comparison. The biggest chunk of intl code that could be replaced with ICU is encoding conversion, which I've currently cut from my ICU build.

> Is ICU already available on Android, or would we have to include it in our Android package?

I'm told Android has ICU, but it's only available for system apps. Chrome installs its own copy.

> What are our other options if we decide the ICU is just too large or unwieldy?

Well, the first question is what size increase would be acceptable given the benefits that ICU provides. Obviously, there's a wide range of options, from cutting the set of locales and possibly some features from ICU, to trying to build as much of the internationalization API as possible on top of the various OS APIs, with various trade-offs between download size, available functionality, supported locales, compliance with the standard, and engineering effort.

Norbert

Justin Lebar

Dec 3, 2012, 7:15:20 PM

to Norbert Lindenberg, Benjamin Smedberg, dev-pl...@lists.mozilla.org

On Mon, Dec 3, 2012 at 5:39 PM, Norbert Lindenberg
<mozill...@lindenbergsoftware.com> wrote:
>
> Well, the first question is what size increase would be acceptable given the benefits that ICU provides.

> I have currently trimmed it to 9.7 MB for the data library and 3.1 MB for two code libraries.

Ignoring download size for a moment, I want to consider the memory
usage of this at runtime.

Does all of this data need to be loaded into memory?

If so, 13mb of code + data is likely unacceptable to B2G. That's
roughly 10% of all memory we have available to gecko on our devices.
I would lobby hard against turning this on for B2G.

If we can avoid loading most of that data into memory, the situation
is much better. But even 3mb of code is dicey; we consider a 3mb
memory win to be substantial and worthy of a large amount of effort to
obtain.

I'd imagine that the Fennec folks working on project 256mb [1] would
have similar reactions.

On Windows desktop, our median memory usage is ~500mb, and the 5th
percentile is ~175mb, so an extra 13mb, while not great, might be
acceptable. 3mb wouldn't be a big deal.
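
To make the proportions above concrete, here is a rough Python sketch. It uses only the numbers from this message; the roughly 130 MB B2G budget is merely what "roughly 10%" implies, not a measured figure, and the names are made up for illustration.

```python
ICU_TRIMMED_MB = 13.0   # 9.7 MB data + 3.1 MB code, rounded as in the message
ICU_CODE_MB = 3.0       # code libraries only


def share_percent(part_mb, whole_mb):
    """Share of `whole_mb` taken up by `part_mb`, in percent."""
    return round(100 * part_mb / whole_mb, 1)


# "Roughly 10% of all memory we have available to gecko" implies this budget.
implied_b2g_budget_mb = round(ICU_TRIMMED_MB / 0.10)      # 130 MB

desktop_median = share_percent(ICU_TRIMMED_MB, 500)       # 2.6 % of ~500 MB
desktop_p5 = share_percent(ICU_TRIMMED_MB, 175)           # 7.4 % of ~175 MB
code_only_median = share_percent(ICU_CODE_MB, 500)        # 0.6 % of ~500 MB
```

These are worst-case shares that assume all 13 MB end up resident in memory, which is exactly the open question here.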

-Justin

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=792131

Norbert Lindenberg

Dec 3, 2012, 8:04:57 PM

to Justin Lebar, dev-pl...@lists.mozilla.org, Benjamin Smedberg, Norbert Lindenberg

On Dec 3, 2012, at 16:15 , Justin Lebar wrote:

> On Mon, Dec 3, 2012 at 5:39 PM, Norbert Lindenberg
> <mozill...@lindenbergsoftware.com> wrote:
>>
>> Well, the first question is what size increase would be acceptable given the benefits that ICU provides.
>
>> I have currently trimmed it to 9.7 MB for the data library and 3.1 MB for two code libraries.
>
> Ignoring download size for a moment, I want to consider the memory
> usage of this at runtime.

Memory usage on small devices certainly warrants some investigation and discussion. Unfortunately, I don't have real data yet.

> Does all of this data need to be loaded into memory?

Most of the data is locale data for several hundred locales, separated by locale and functionality, so as long as applications don't request a specific locale/functionality combination, it doesn't need to be loaded. Note though that the size of the data isn't uniform per locale - e.g., Chinese collation data is huge.
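
The idea of data that is only loaded when a locale/functionality combination is requested can be modelled with a toy Python sketch. This is not how ICU is implemented: the class, its methods, and the sizes are all made up for illustration, with the sizes chosen only to echo the remark that Chinese collation data is much larger than the rest.

```python
class LazyLocaleData:
    """Toy model: data is split into (locale, feature) bundles loaded on first use."""

    def __init__(self, bundle_sizes_kb):
        self._sizes = dict(bundle_sizes_kb)   # everything that is available on disk
        self._loaded = {}                     # bundles brought into memory so far

    def get(self, locale, feature):
        key = (locale, feature)
        if key not in self._loaded:
            # Stand-in for reading or mapping the bundle; only its size is recorded.
            self._loaded[key] = self._sizes[key]
        return self._loaded[key]

    def resident_kb(self):
        return sum(self._loaded.values())

    def available_kb(self):
        return sum(self._sizes.values())


# Hypothetical sizes in KB, not taken from ICU.
store = LazyLocaleData({
    ("en", "collation"): 10,
    ("en", "dates"): 20,
    ("zh", "collation"): 2000,
    ("zh", "dates"): 30,
})

store.get("en", "dates")
after_english_dates = store.resident_kb()      # 20

store.get("zh", "collation")
after_chinese_collation = store.resident_kb()  # 2020
```

Under this model the memory cost depends on which combinations pages actually request, not on the total size of the data shipped.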

> If so, 13mb of code + data is likely unacceptable to B2G. That's
> roughly 10% of all memory we have available to gecko on our devices.
> I would lobby hard against turning this on for B2G.

Understood. Now, how does B2G support internationalization in the absence of ICU?

> If we can avoid loading most of that data into memory, the situation
> is much better. But even 3mb of code is dicey; we consider a 3mb
> memory win to be substantial and worthy of a large amount of effort to
> obtain.

Is all code loaded into memory as one lump on B2G, or is it paged in as needed? There may be quite a bit of code in there that's not commonly needed. The ICU documentation suggests static linking as a way to reduce code size - I haven't tried yet how much that would help.

> I'd imagine that the Fennec folks working on project 256mb [1] would
> have similar reactions.
>
> On Windows desktop, our median memory usage is ~500mb, and the 5th
> percentile is ~175mb, so an extra 13mb, while not great, might be
> acceptable. 3mb wouldn't be a big deal.

Good to know.

Norbert

Norbert Lindenberg

Dec 3, 2012, 8:10:43 PM

to Jonathan Kew, Mike Hommey, Benjamin Smedberg, Norbert Lindenberg, dev-pl...@lists.mozilla.org

On Dec 3, 2012, at 14:10 , Jonathan Kew wrote:

> On 3/12/12 20:11, Mike Hommey wrote:

>> On Mon, Dec 03, 2012 at 02:48:06PM -0500, Benjamin Smedberg wrote:
>>> On 12/3/2012 2:32 PM, Norbert Lindenberg wrote:
>>>> As part of implementing the ECMAScript Internationalization API [1, 2] in SpiderMonkey, and as an aid in internationalizing other functionality in Mozilla products [3], I need to integrate the ICU library (International Components for Unicode [4]) into the source tree and the build.
>>> This has been brought up many times over the years, and each time
>>> previously we decided not to import ICU. At first, the license was
>>> incompatible; that has since been fixed. Now the question is mainly
>>> about whether the features ICU provides are worth the real cost in
>>> terms of binary size.
>>>

>>> How much size does ICU cost, if we took the entire library? How much
>>> of that is data (which can be shared in 32/64 mac universal builds)
>>> and how much is code which cannot be shared?
>>

>> ICU doesn't come with data files. Data is enclosed in libraries, and
>> such data is not shared between the 32-bits and 64-bits parts of
>> universal binaries.
>

> Actually, ICU has several options for how its data is packaged. One option is libraries (which are not sharable between architectures, AFAIK), but another possibility is to use data package (.dat) files, which I believe *could* be shared between the 32- and 64-bit builds.

How important is this on Macs? My ICU build compresses to about 4MB, so even if we double everything, it's about 8MB. Apple routinely sends software updates with hundreds of MB, with some recent ones going above 1GB.

Norbert

Chris Double

Dec 3, 2012, 10:09:45 PM

to Norbert Lindenberg, dev-pl...@lists.mozilla.org, Benjamin Smedberg, Justin Lebar

On Tue, Dec 4, 2012 at 2:04 PM, Norbert Lindenberg
<mozill...@lindenbergsoftware.com> wrote:
> Understood. Now, how does B2G support internationalization in the absence of ICU?

Possibly it uses ICU already. There's an external/icu4c in the B2G repository.

Chris.
--
http://www.bluishcoder.co.nz

Justin Lebar

Dec 3, 2012, 10:54:34 PM

to Norbert Lindenberg, Benjamin Smedberg, dev-pl...@lists.mozilla.org

> Is all code loaded into memory as one lump on B2G, or is it paged in as needed? There may be quite a bit of code in there that's not commonly needed.

I don't know how it works, to be honest. I can't imagine that library
pages are loaded only on demand with no pre-fetching, since that would
be slow. And I imagine once a code page is in memory it's not going
out. Our devices don't have swap, and I've never observed them
dropping clean code pages (which don't have to go into a swap file) to
make room for data.

-Justin

On Mon, Dec 3, 2012 at 8:04 PM, Norbert Lindenberg
<mozill...@lindenbergsoftware.com> wrote:
>
> On Dec 3, 2012, at 16:15 , Justin Lebar wrote:
>

>> On Mon, Dec 3, 2012 at 5:39 PM, Norbert Lindenberg
>> <mozill...@lindenbergsoftware.com> wrote:
>>>
>>> Well, the first question is what size increase would be acceptable given the benefits that ICU provides.
>>
>>> I have currently trimmed it to 9.7 MB for the data library and 3.1 MB for two code libraries.
>>
>> Ignoring download size for a moment, I want to consider the memory
>> usage of this at runtime.
>
> Memory usage on small devices certainly warrants some investigation and discussion. Unfortunately, I don't have real data yet.
>
>> Does all of this data need to be loaded into memory?
>
> Most of the data is locale data for several hundred locales, separated by locale and functionality, so as long as applications don't request a specific locale/functionality combination, it doesn't need to be loaded. Note though that the size of the data isn't uniform per locale - e.g., Chinese collation data is huge.
>
>> If so, 13mb of code + data is likely unacceptable to B2G. That's
>> roughly 10% of all memory we have available to gecko on our devices.
>> I would lobby hard against turning this on for B2G.
>

> Understood. Now, how does B2G support internationalization in the absence of ICU?
>

Makoto Kato

Dec 3, 2012, 11:15:12 PM

A year ago, I investigated replacing uconv with ICU.

Many converters in ICU (mostly the IBM code page ones) aren't compatible
with uconv, and ICU doesn't have all the character set converters that we
already use in uconv.

If we used ICU for uconv, we could remove some of the table code in uconv,
but not all of it.

-- Makoto

Dave Hylands

Dec 4, 2012, 12:48:18 AM

to Justin Lebar, Norbert Lindenberg, Benjamin Smedberg, dev-pl...@lists.mozilla.org

Hi,

----- Original Message -----
> From: "Justin Lebar" <justin...@gmail.com>
> To: "Norbert Lindenberg" <mozill...@lindenbergsoftware.com>
> Cc: "Benjamin Smedberg" <benj...@smedbergs.us>, dev-pl...@lists.mozilla.org
> Sent: Monday, December 3, 2012 7:54:34 PM
> Subject: Re: Integrating ICU into Mozilla build
>
> > Is all code loaded into memory as one lump on B2G, or is it paged
> > in as needed? There may be quite a bit of code in there that's not
> > commonly needed.
>
> I don't know how it works, to be honest. I can't imagine that
> library
> pages are loaded only on demand with no pre-fetching, since that
> would
> be slow. And I imagine once a code page is in memory it's not going
> out. Our devices don't have swap, and I've never observed them
> dropping clean code pages (which don't have to go into a swap file)
> to
> make room for data.

I'm pretty sure that the library pages are brought in on a demand basis, although I seem to recall that the kernel also does some read-ahead. The pages that are paged in become the actual code that is run, so I'm pretty sure that they don't get released, but conceptually there's no reason why they couldn't be.

I dug around a bit and came up with this paper which gives some background on the algorithms used in the kernel.
http://landley.net/kdocs/ols/2007/ols2007v2-pages-273-284.pdf

Dave Hylands

Mike Hommey

Dec 4, 2012, 1:16:13 AM

to Rafael Ávila de Espíndola, Jonathan Kew, dev-pl...@lists.mozilla.org, Benjamin Smedberg, Norbert Lindenberg

On Mon, Dec 03, 2012 at 05:25:55PM -0500, Rafael Ávila de Espíndola
wrote:

> >
> > Actually, ICU has several options for how its data is packaged. One
> > option is libraries (which are not sharable between architectures,
> > AFAIK), but another possibility is to use data package (.dat) files,
> > which I believe *could* be shared between the 32- and 64-bit builds.
>

> getting a bit off topic, but since we don't support 10.5 anymore,
> can't we build just a 32 bit plugin container instead of the full
> browser as a universal binary? Would the plugin container need to link
> with ICU too?

The plugin container is only slightly smaller than the browser. Moreover,
there are 32-bit-only Macs out there that run 10.6 (the first
generation of Intel Macs).

Mike

Jean-Marc Desperrier

Dec 4, 2012, 5:53:23 AM

Norbert Lindenberg wrote:
> The ECMAScript Internationalization API [...] provides web applications

> with the ability to format numbers, dates, and times and sort strings
> according to the rules of the language that the application is using,
> not the one that browser and OS default to.

If the OS doesn't support the language, those features are icing on the
cake when there's no cake.

Only a minority of users are multilingual, and the first thing they do
is install the support for the language they need on the OS.
In Internet cafés and the like, you'll also see people installing
multiple-language support on the OS, so that users get correct support
for it, with adequate fonts. In places where the need is real, they
frequently have a set of separate computers, each of them properly
configured for one specific language.

> and even users who aren't
> sometimes have to use browsers that don't support their language

ECMAScript i18n is not going to properly solve that problem

> To implement that, we need good library support, and ICU fits the bill.

ICU is a massive, huge juggernaut. It fits the bill in professional
applications that have no download size constraints, and no requirement
to support the low end of installed memory size. OS support is
incredibly more efficient. It does require more work, and has fewer
guarantees about always getting the same behavior. They both fit
different needs and constraints. The professional application is also a
context where properly formatting the string will be enough for the
language support of a remote user who has installed OS support locally.

I wished for proper ECMAScript i18n support for years, but never that
it'd force-feed ICU to everybody.

Axel Hecht

Dec 4, 2012, 6:45:28 AM

On 04.12.12 11:53, Jean-Marc Desperrier wrote:
> Norbert Lindenberg wrote:
>> The ECMAScript Internationalization API [...] provides web applications
>> with the ability to format numbers, dates, and times and sort strings
>> according to the rules of the language that the application is using,
>> not the one that browser and OS default to.
>
> If the OS doesn't support the language, those features are icing on the
> cake when there's no cake.
>
> Only a minority of users are multilingual, and the first thing they do
> is install the support for the language they need on the OS.
> In Internet café and the like, you'll see also people installing
> multiple-language support on the OS, so that users get a correct support
> for it, with the adequate fonts. In places where the need is real, they
> frequently have a set of separate computer with each of them properly
> configured for one specific language.

Actually, we just had a bug on adding our own fonts for people whose
OS isn't fully fleshed out.

Additionally, quite a few users are multilingual. If I look at en-US
usage, only half of that is within the US, followed by India and Indonesia.

Also, OS support usually means support for the language the OS is
running in, not the language we use for Firefox.

On that note,
https://bugzilla.mozilla.org/buglist.cgi?quicksearch=summary:toLocaleString
is a nice little subset of the issues we have with getting useful data
back from the OS; I just took the simplest search I could come up with.

Axel

Justin Wood (Callek)

Dec 4, 2012, 7:51:21 AM

to Rafael Ávila de Espíndola

Rafael Ávila de Espíndola wrote:
>>

>> Actually, ICU has several options for how its data is packaged. One option is libraries (which are not sharable between architectures, AFAIK), but another possibility is to use data package (.dat) files, which I believe *could* be shared between the 32- and 64-bit builds.
>
> getting a bit off topic, but since we don't support 10.5 anymore, can't we build just a 32 bit plugin container instead of the full browser as a universal binary? Would the plugin container need to link with ICU too?
>

Not yet, there are supported hardware models running 10.6 that *do not*
have 64-bit available. Granted, they are on the older end of things, but they
do exist.

--
~Justin Wood (Callek)

Henri Sivonen

Dec 4, 2012, 8:22:19 AM

to dev-pl...@lists.mozilla.org

On Tue, Dec 4, 2012 at 6:15 AM, Makoto Kato <m_k...@ga2.so-net.ne.jp> wrote:
> A year ago, I investigated to replace uconv with ICU.
>
> Many (most is IBM code page) converters in ICU isn't compatible with uconv,
> and ICU isn't have all character set converters that we already use in
> uconv.
>
> If using ICU for uconv, we can remove a few table code into uconv, but not
> all.

The main problem uconv has is that its API design is wrong: It doesn’t
have a way to signal the end of the stream, so uconv fails to emit a
REPLACEMENT CHARACTER as the last character of the output if the input
ends with an incomplete multibyte sequence. (FWIW, iconv has the same
API design flaw, AFAICT.)
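
The end-of-stream problem can be illustrated with Python's standard incremental decoder, whose `final` flag plays the role of the missing signal. This is only an analogy: it shows neither uconv's nor ICU's API, and the `decode_stream` helper is made up for illustration.

```python
import codecs


def decode_stream(chunks, signal_end):
    """Decode UTF-8 chunks incrementally, optionally signalling end of stream."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    out = []
    for chunk in chunks:
        # final=False: an incomplete trailing sequence is buffered, not reported.
        out.append(decoder.decode(chunk, final=False))
    if signal_end:
        # final=True: anything still buffered is now known to be incomplete.
        out.append(decoder.decode(b"", final=True))
    return "".join(out)


# "café" cut off in the middle of the two-byte sequence for "é".
truncated = [b"caf", b"\xc3"]

without_end = decode_stream(truncated, signal_end=False)   # "caf"
with_end = decode_stream(truncated, signal_end=True)       # "caf\ufffd"
```

Without an end-of-stream signal the decoder cannot tell a truncated input from one that is merely waiting for more bytes, so the trailing REPLACEMENT CHARACTER is silently lost.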

ICU gets the design of the conversion function right, but otherwise ICU’s
converter API surface is complete overkill. But most importantly,
ICU implements the de jure Unicode specs, IANA labels and a bunch of
encodings that aren’t needed for the Web instead of implementing
what’s Web-compatible and only what’s Web-compatible. Gecko needs
converters and label alias resolution that conforms to the Encoding
Standard (http://encoding.spec.whatwg.org/).

We already have code for Encoding Standard-compliant label handling in
the tree and uconv is being patched to be more and more Encoding
Standard-compliant where it isn’t already.

I think for encoders, decoders and encoding label handling, the end
result should be compliance with the Encoding Standard and no dead
weight in the release binaries arising from compiled-but-unused
Encoding Standard-incompliant ICU converter or label handling
code/data.

(My understanding is that Chrome has a patched fork of ICU converters
in order to have characteristics that are closer to the Encoding
Standard than vanilla ICU. Hence, we shouldn’t expect to be able to
use unpatched system ICU for converters and label handling and get
the right behavior. At least not without upstreaming Encoding
Standard-compliant code into ICU and waiting for it to propagate to
system libraries first.)

--
Henri Sivonen
hsiv...@iki.fi
http://hsivonen.iki.fi/

Mike Hommey

Dec 5, 2012, 9:40:51 AM

to Justin Wood (Callek), dev-pl...@lists.mozilla.org

Note that apparently, this is even worse than that. 10.6 didn't enable
64-bit by default on 64-bit-capable hardware. (I just figured this out while
looking at something unrelated on my wife's Mac running 10.6.8)

Mike

Justin Wood (Callek)

Dec 5, 2012, 10:19:25 AM

to Mike Hommey

Yes, I meant that as well, since some of these older machines are set
like that by default, with no UI way to change it.

I noticed this as well on the x64 Minis SeaMonkey has that run 10.6, but
my research showed that x64 was unstable on the version of Mini we have,
so I didn't turn it on :-)

--
~Justin Wood (Callek)

Benjamin Smedberg

Dec 5, 2012, 11:07:42 AM

to Norbert Lindenberg, dev-pl...@lists.mozilla.org

On 12/3/2012 5:39 PM, Norbert Lindenberg wrote:
> OK, just as an introduction, why we're doing this: The ECMAScript Internationalization API (which has been approved by Ecma TC 39 and is on track to become an Ecma standard next week) provides web applications with the ability to format numbers, dates, and times and sort strings according to the rules of the language that the application is using, not the one that browser and OS default to. Many users are multilingual and go to web sites in different languages, and even users who aren't sometimes have to use browsers that don't support their language. The API in addition lets applications tailor the results to their specific needs, e.g., specify the currency with which numbers are displayed, select the date-time components used in a date format, or ignore punctuation in sorting.
>

> To implement that, we need good library support, and ICU fits the bill.

If I may be a skeptic:

Is this feature really worth the costs? Right now xul.dll is about 18MB
on Windows, and the entire install size of Firefox on disk is 91MB.
Assuming that the weight is roughly similar to mac, we'd be talking a
15% increase in on-disk size for a feature which seems on the surface to
be relatively obscure. Maybe it would be better to just not implement
this ECMA specification? What does this feature really buy us in terms
of strategic importance?

How well do the data files compress? Even more important than the
installed size, a 15% increase in download size could have a noticeable
impact on our install conversion rates.

--BDS

Axel Hecht

Dec 5, 2012, 12:06:03 PM

I think we're having a more general issue with intl data right now.
Hyphenation dicts are in a similar bucket as collation data, I think.

Like, if Chinese isn't hyphenated well, or sorted perfectly, I wouldn't
be able to tell. I can't read the script.

I'd love it if we had a way to be sub-optimal the first time someone hits
the problem and then download language-specific data on demand later.

Which raises the question, how large would ICU be with mostly no data?
Or, say, just the data for Chinese, assuming we could actually build
just the UI locale, and Chinese was said to be big earlier in this thread.

Axel

Robert Strong

Dec 5, 2012, 2:16:25 PM

to dev-pl...@lists.mozilla.org

On 12/5/2012 8:07 AM, Benjamin Smedberg wrote:
> On 12/3/2012 5:39 PM, Norbert Lindenberg wrote:
>> OK, just as an introduction, why we're doing this: The ECMAScript
>> Internationalization API (which has been approved by Ecma TC 39 and
>> is on track to become an Ecma standard next week) provides web
>> applications with the ability to format numbers, dates, and times and
>> sort strings according to the rules of the language that the
>> application is using, not the one that browser and OS default to.
>> Many users are multilingual and go to web sites in different
>> languages, and even users who aren't sometimes have to use browsers
>> that don't support their language. The API in addition lets
>> applications tailor the results to their specific needs, e.g.,
>> specify the currency with which numbers are displayed, select the
>> date-time components used in a date format, or ignore punctuation in
>> sorting.
>>
>> To implement that, we need good library support, and ICU fits the bill.
> If I may be a skeptic:
>
> Is this feature really worth the costs? Right now xul.dll is about
> 18MB on Windows, and the entire install size of Firefox on disk is 91MB.

Just an FYI: I suspect that you have a staged update since the size on
Windows is actually around 45 MB.
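
The baseline matters for the percentage under discussion. As a rough Python sketch, assuming (as the quoted message does) that the trimmed Mac sizes of 9.7 MB data plus 3.1 MB code carry over to Windows uncompressed, and with names made up for illustration:

```python
ICU_TRIMMED_MB = 9.7 + 3.1   # trimmed data + code libraries, about 12.8 MB


def increase_percent(added_mb, base_mb):
    """On-disk growth in percent when `added_mb` is added to `base_mb`."""
    return round(100 * added_mb / base_mb, 1)


vs_91_mb_install = increase_percent(ICU_TRIMMED_MB, 91)   # 14.1 %
vs_45_mb_install = increase_percent(ICU_TRIMMED_MB, 45)   # 28.4 %
```

Against a 45 MB install, the same addition is therefore closer to a 28% increase than to the 15% estimated from the 91 MB figure.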

Robert

> Assuming that the weight is roughly similar to mac, we'd be talking a
> 15% increase in on-disk size for a feature which seems on the surface
> to be relatively obscure. Maybe it would be better to just not
> implement this ECMA specification? What does this feature really buy
> us in terms of strategic importance?
>
> How well do the data files compress? Even more important than the
> installed size, a 15% increase in download size could have a
> noticeable impact on our install conversion rates.
>

> --BDS

Gervase Markham

Dec 5, 2012, 5:54:32 PM12/5/12

to Norbert Lindenberg

This thread has got off-topic (reasonably; the new issues are important)
but back on the original point, if we do add ICU then:

On 03/12/12 11:32, Norbert Lindenberg wrote:
> - Add the required set of ICU source files as separate files to the
> Mozilla repository. The current version of ICU (50.1, C/C++ version)
> has about 5350 source files; stripping out files that aren't needed
> for the internationalization API (but might be needed later) brings
> this to about 3250 files. The complete Mozilla tree before this
> addition has about 70,000 files.

This,

> - Add the source bundles (zip/tar) to the Mozilla repository, and
> then extract the source files as part of the build.

Not this. Having source bundled up like this makes code search and
license compliance, to name but two things, much harder.

Gerv

Asa Dotzler

Dec 6, 2012, 7:08:10 PM12/6/12

On 12/3/2012 2:39 PM, Norbert Lindenberg wrote:
> Well, the first question is what size increase would be acceptable
> given the benefits that ICU provides.

I don't understand what benefits this actually provides. How are users'
online lives improved by this change, either today or in the future?

Adding to the download size costs us in user acquisition so we cannot be
OK with taking on megabytes of additional download size for features of
questionable value.

- A

Axel Hecht

Dec 6, 2012, 7:33:02 PM12/6/12

I think there are folks outside of mozilla that have been evaluating the
app development, and then said "to make metro a compelling ecosystem for
js apps, we need at least X apis for internationalized Y". That's what's
shaping the js i18n api. Nobody ever said that literally, but it's been
in between every two lines.

I don't think it serves us well to debate the necessity of the API.

I think that other competitors implement this for the languages they
have on the device, not so much for the languages on the web.

I think this is a challenge for us, and our approach to languages on the
web in general. But I do think it's essential that we take on that
challenge and win.

Axel

Norbert Lindenberg

Dec 6, 2012, 9:21:32 PM12/6/12

to Asa Dotzler, dev-pl...@lists.mozilla.org, Norbert Lindenberg

The benefit is that the ECMAScript Internationalization API lets developers create a more consistent localized experience for their users, with the correct date, time, and number formats, the culturally appropriate calendar, correct currency symbols, and correct sorting. It also helps avoid latency by removing the need to send lists back to the server for sorting.

The functionality provided by the ECMAScript Language Specification for this purpose is basically useless, because the behavior of its locale-sensitive functions is totally unpredictable. There are a number of JavaScript libraries for number and date formatting, but they require applications to load these libraries and the associated locale data, and their coverage for different calendars, time zones, and currencies is usually very limited (and where there's more, you pay with a bigger download size). As far as I know, there's no JavaScript library that supports localized sorting, so the only solution for applications is to do all sorting on the server.
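
As a minimal sketch of what this looks like from the application side (assuming a runtime that implements the API and carries German, Swedish, and US English locale data; the helper names `sortFor` and `formatPrice` are made up for illustration and are not part of the API):

```typescript
// Sort a copy of `items` using the collation rules of `locale`.
// `sortFor` is an illustrative helper, not part of the API.
function sortFor(locale: string, items: string[]): string[] {
  return items.slice().sort(new Intl.Collator(locale).compare);
}

// Format an amount as a currency value for a given locale.
// `formatPrice` is likewise an illustrative helper.
function formatPrice(locale: string, currency: string, amount: number): string {
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(amount);
}

const words = ["z", "ä", "a"];
console.log(sortFor("de", words)); // German sorts ä next to a
console.log(sortFor("sv", words)); // Swedish sorts ä after z
console.log(formatPrice("en-US", "USD", 1234.5));
```

The application names the locale explicitly, so the result does not depend on the browser's UI language or the OS default, and no list has to travel to the server to be sorted.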

Google has already implemented the Internationalization API and is shipping it in Chrome (still prefixed in Chrome 23), also by bundling ICU into their downloads.

User acquisition is an important goal of course. Has Mozilla studied how it correlates with download size, e.g., by measuring what percentage of users cancel out of downloads if the size is artificially inflated? Also, when I tried to get download size numbers on Windows, I couldn't get a number for Chrome, but I noticed that there were fewer messages about security risks and fewer buttons to click than for Firefox. What impact does this have on user acquisition?

Norbert

Norbert Lindenberg

Dec 6, 2012, 9:25:17 PM12/6/12

to Axel Hecht, dev-pl...@lists.mozilla.org

On Dec 6, 2012, at 16:33 , Axel Hecht wrote:

> On 07.12.12 01:08, Asa Dotzler wrote:
>> On 12/3/2012 2:39 PM, Norbert Lindenberg wrote:
>>> Well, the first question is what size increase would be acceptable
>>> given the benefits that ICU provides.
>>
>> I don't understand what benefits this actually provides. How are users'
>> online lives improved by this change, either today or in the future?
>>
>> Adding to the download size costs us in user acquisition so we cannot be
>> OK with taking on megabytes of additional download size for features of
>> questionable value.
>
> I think there are folks outside of mozilla that have been evaluating the app development, and then said "to make metro a compelling ecosystem for js apps, we need at least X apis for internationalized Y". That's what's shaping the js i18n api. Nobody ever said that literally, but it's been in between every two lines.

"Metro" seems to imply Microsoft. While Microsoft has actively participated in the development of this API, it was Google that kicked off the project, and the API differs from similar functionality in the Windows 8 JavaScript API.

> I don't think it serves us well to debate the necessity of the API.
>
> I think that other competitors implement this for the languages they have on the device, not so much for the languages on the web.

Google Chrome is bundling ICU, so they're not limited by what's on the device.

> I think this is a challenge for us, and our approach to languages on the web in general. But I do think it's essential that we take on that challenge and win.

Agreed.

Norbert

Robert O'Callahan

Dec 6, 2012, 9:36:50 PM12/6/12

to Norbert Lindenberg, Axel Hecht, dev-pl...@lists.mozilla.org

How hard would it be to incrementally download data for the locales we need?

It seems that most users won't ever need the collation tables for Chinese,
for example. If we could figure out a way to make them available
just-in-time, that could be a win.

I assume the relevant APIs are synchronous, so this might not be trivial.

Rob
--
Jesus called them together and said, “You know that the rulers of the
Gentiles lord it over them, and their high officials exercise authority
over them. Not so with you. Instead, whoever wants to become great among
you must be your servant, and whoever wants to be first must be your
slave — just
as the Son of Man did not come to be served, but to serve, and to give his
life as a ransom for many.” [Matthew 20:25-28]

Norbert Lindenberg

Dec 6, 2012, 10:08:00 PM12/6/12

to rob...@ocallahan.org, dev-pl...@lists.mozilla.org

On Dec 6, 2012, at 18:36 , Robert O'Callahan wrote:

> How hard would it be to incrementally download data for the locales we need?
>
> It seems that most users won't ever need the collation tables for Chinese,
> for example. If we could figure out a way to make them available
> just-in-time, that could be a win.
>
> I assume the relevant APIs are synchronous, so this might not be trivial.

This sounds like non-trivial surgery on ICU. Yes, the APIs are synchronous. And we don't know whether the time when a user stumbles onto a Chinese web page that requests Chinese collation is really the best time to download the data - at that time the user may be roaming in China with a U.S. data plan...

Norbert

Chris Peterson

Dec 6, 2012, 10:11:57 PM12/6/12

On 12/6/12 6:36 PM, Robert O'Callahan wrote:
> How hard would it be to incrementally download data for the locales we need?
>
> It seems that most users won't ever need the collation tables for Chinese,
> for example. If we could figure out a way to make them available
> just-in-time, that could be a win.

Can the ICU data be split by locale and bundled with the localized
Firefox builds?

For example, someone who downloads Firefox's Portuguese build is
probably interested in ECMAScript internationalization, but only as it
pertains to Portuguese.

chris

Robert O'Callahan

Dec 6, 2012, 10:25:57 PM12/6/12

to Norbert Lindenberg, dev-pl...@lists.mozilla.org

On Fri, Dec 7, 2012 at 4:08 PM, Norbert Lindenberg <
mozill...@lindenbergsoftware.com> wrote:

> This sounds like non-trivial surgery on ICU. Yes, the APIs are
> synchronous. And we don't know whether the time when a user stumbles onto a
> Chinese web page that requests Chinese collation is really the best time to
> download the data - at that time the user may be roaming in China with a
> U.S. data plan...
>

It may not be easy, but I think it's worth jumping through some hoops to
avoid the download size hit. (Not to mention the B2G and Android footprint
hit.)

Robert O'Callahan

Dec 6, 2012, 10:35:10 PM12/6/12

to Norbert Lindenberg, dev-pl...@lists.mozilla.org

On Fri, Dec 7, 2012 at 4:25 PM, Robert O'Callahan <rob...@ocallahan.org> wrote:

> On Fri, Dec 7, 2012 at 4:08 PM, Norbert Lindenberg <
> mozill...@lindenbergsoftware.com> wrote:
>
>> This sounds like non-trivial surgery on ICU. Yes, the APIs are
>> synchronous. And we don't know whether the time when a user stumbles onto a
>> Chinese web page that requests Chinese collation is really the best time to
>> download the data - at that time the user may be roaming in China with a
>> U.S. data plan...
>>
>
> It may not be easy, but I think it's worth jumping through some hoops to
> avoid the download size hit. (Not to mention the B2G and Android footprint
> hit.)

One way this could work (straw-man proposal) is that when a locale is
requested that we support but haven't yet downloaded data for, we simply
start the download and pretend not to support that locale until the
download has happened. When the download completes, show an infobar
explaining what happened and suggesting the user reload the page.

The data-plan issue you mention is real but automated Firefox updates have
the same issue ... we could adopt whatever strategy we already have for
that.
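
The straw-man above could be sketched roughly as follows. This is only an illustration under stated assumptions: `LocaleDataStore`, `fetchData`, and `onReady` are hypothetical names, the downloader is passed in as a callback, and nothing here reflects how ICU or Firefox actually load data.

```typescript
type LocaleState = "missing" | "downloading" | "ready";

// Hypothetical store tracking which locales have data on disk.
class LocaleDataStore {
  private state = new Map<string, LocaleState>();

  constructor(
    known: string[],                                       // locales we could support
    bundled: string[],                                     // locales shipped in the installer
    private fetchData: (locale: string) => Promise<void>,  // hypothetical downloader
    private onReady: (locale: string) => void              // e.g. show the infobar
  ) {
    known.forEach((l) =>
      this.state.set(l, bundled.indexOf(l) !== -1 ? "ready" : "missing")
    );
  }

  // Synchronous answer for the page: a locale counts as supported only once
  // its data is present. A miss on a known locale starts a background download.
  isSupported(locale: string): boolean {
    const s = this.state.get(locale);
    if (s === undefined) return false; // not a locale we know about at all
    if (s === "ready") return true;
    if (s === "missing") {
      this.state.set(locale, "downloading");
      this.fetchData(locale).then(
        () => {
          this.state.set(locale, "ready");
          this.onReady(locale); // suggest a reload to the user
        },
        () => this.state.set(locale, "missing") // allow a retry later
      );
    }
    return false; // pretend not to support it until the data has arrived
  }
}
```

The page-facing call stays synchronous, which matches the constraint that the relevant APIs are synchronous; the download policy (for example, respecting the same rules as automated updates) would live inside `fetchData`.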

Henri Sivonen

Dec 7, 2012, 6:49:16 AM12/7/12

to dev-platform

On Fri, Dec 7, 2012 at 5:11 AM, Chris Peterson <cpet...@mozilla.com> wrote:
> Can the ICU data be split by locale and bundled with the localized Firefox
> builds?
>
> For example, someone who downloads Firefox's Portuguese build is probably
> interested in ECMAScript internationalization, but only as it pertains to
> Portuguese.

I (strongly) think we should not make the capabilities of the Web
platform dependent on the UI locale of the browser. Precedent exists
for fallback encoding for unlabeled HTML pages, but I think the
precedent should be taken as a “never do this again” learning
opportunity rather than an excuse to introduce more of that.

Given the alternatives of Web apps downloading their own JS-based
collation, sorting and date formatting code and the platform providing
collation, sorting and date formatting functionality inconsistently
depending on the UI locale, I’d much rather have each Web app download
their own code than to provide an inconsistent platform.

Also, the notion that a user with a given browser UI language only
cares about stuff working for that language is incorrect. There are
plenty of people who use Web apps/sites from outside their home locale
box.

Jonathan Kew

Dec 7, 2012, 7:33:40 AM12/7/12

to dev-pl...@lists.mozilla.org

On 7/12/12 03:35, Robert O'Callahan wrote:
> On Fri, Dec 7, 2012 at 4:25 PM, Robert O'Callahan <rob...@ocallahan.org> wrote:
>
>> On Fri, Dec 7, 2012 at 4:08 PM, Norbert Lindenberg <
>> mozill...@lindenbergsoftware.com> wrote:
>>
>>> This sounds like non-trivial surgery on ICU. Yes, the APIs are
>>> synchronous. And we don't know whether the time when a user stumbles onto a
>>> Chinese web page that requests Chinese collation is really the best time to
>>> download the data - at that time the user may be roaming in China with a
>>> U.S. data plan...
>>>
>>
>> It may not be easy, but I think it's worth jumping through some hoops to
>> avoid the download size hit. (Not to mention the B2G and Android footprint
>> hit.)
>
>
> One way this could work (straw-man proposal) is that when a locale is
> requested that we support but haven't yet downloaded data for, we simply
> start the download and pretend not to support that locale until the
> download has happened. When the download completes, show an infobar
> explaining what happened and suggesting the user reload the page.

This is somewhat analogous to the solution (proposed and prototyped in
bug 619521 and bug 648548) to provide downloaded-on-demand fonts to
extend the character coverage when the device lacks any preinstalled
fonts for a given script/writing system.

Given that ICU data does not have to be built into a monolithic library,
but can be packaged into multiple data files that are explicitly loaded
at runtime by the application, I don't think it should be overly
difficult to engineer such a solution.

JK

Benjamin Smedberg

Dec 7, 2012, 9:18:04 AM12/7/12

to Norbert Lindenberg, Asa Dotzler, dev-pl...@lists.mozilla.org

On 12/6/2012 9:21 PM, Norbert Lindenberg wrote:
> The benefit is that the ECMAScript Internationalization API lets developers create a more consistent localized experience for their users, with the correct date, time, and number formats, the culturally appropriate calendar, correct currency symbols, and correct sorting. It also helps avoid latency by removing the need to send lists back to the server for sorting.

Is it possible to self-host this functionality in JS? Would it make more
sense to just build this functionality as a JS library?

This really feels to me like a small feature which isn't worth 20% of
our footprint, and we should be pushing back to find ways to make it
possible to implement as a JS library or build in asynchronous
functionality for dynamic download of locale data as needed, or both.

--BDS

Ehsan Akhgari

Dec 7, 2012, 11:39:04 AM12/7/12

to Chris Peterson, dev-pl...@lists.mozilla.org

On 2012-12-06 10:11 PM, Chris Peterson wrote:
> For example, someone who downloads Firefox's Portuguese build is
> probably interested in ECMAScript internationalization, but only as it
> pertains to Portuguese.

There is nothing special about a Portuguese build compared to, let's
say, an English US build, or a build of any other languages.

Cheers,
Ehsan

Norbert Lindenberg

Dec 7, 2012, 4:39:35 PM12/7/12

to Benjamin Smedberg, dev-pl...@lists.mozilla.org, Asa Dotzler, Norbert Lindenberg

On Dec 7, 2012, at 6:18 , Benjamin Smedberg wrote:

> On 12/6/2012 9:21 PM, Norbert Lindenberg wrote:
>> The benefit is that the ECMAScript Internationalization API lets developers create a more consistent localized experience for their users, with the correct date, time, and number formats, the culturally appropriate calendar, correct currency symbols, and correct sorting. It also helps avoid latency by removing the need to send lists back to the server for sorting.
> Is it possible to self-host this functionality in JS? Would it make more sense to just build this functionality as a JS library?

I discussed this in the paragraph that you cut off:

>> [...] There are a number of JavaScript libraries for number and date formatting, but they require applications to load these libraries and the associated locale data, and their coverage for different calendars, time zones, and currencies is usually very limited (and where there's more, you pay with a bigger download size). As far as I know, there's no JavaScript library that supports localized sorting, so the only solution for applications is to do all sorting on the server.

For sorting in particular, while the (rather complex) algorithms could be ported to JavaScript, it also requires big data tables. Download size is an even bigger concern for web applications than for browsers, and even if you could assume a shared CDN and caching in the browser, nobody wants to be the first one to take the hit.

This is really basic infrastructure that for native applications is provided by the OS and for web applications should be provided by the browser.

Norbert

Brian Smith

Dec 7, 2012, 11:25:15 PM12/7/12

to Jean-Marc Desperrier, dev-pl...@lists.mozilla.org

Jean-Marc Desperrier wrote:
> ICU is a massive, huge juggernaut. It fits the bill in professional
> application that have no download size constraints, and no
> requirement to support the low end of installed memory size. OS
> support is incredibly more efficient.

Jean-Marc, I don't agree with everything you said but I do agree with this part, which I think people might be glossing over too easily. I don't understand the fixation on ICU as *the* solution to this problem. If the ECMAScript specification is so complicated and so unusual in its design that it cannot be easily implemented using widely-deployed system i18n APIs, then IMO that spec is broken. But, I highly suspect that it is quite possible to implement that spec with OS-provided libraries. Windows and Mac have extensive internationalization APIs. Why not use them?

I am also unsure about the comments implying that even stock ICU is not a good choice for implementing this API. Is it *required* to modify ICU to implement the JS API, or is it just inconvenient or (slightly?) inefficient to use the stock ICU API?

We can ship ICU as a system library on B2G. Some Linux distributions apparently ship ICU as a system library so we may be able to make an ICU system library a runtime prerequisite for Firefox on Linux, or we could just make Firefox on Linux 20% bigger (I don't think Linux users are that particular about the download size).

According to previous messages in this thread, Android has ICU as a system library, that just isn't exposed as an official NDK library. However, I've read that it is possible to dlopen the system libraries and use them; you should just be extra-careful about handling the case where the libraries are different or missing (e.g. renamed). I think it is worth exploring doing this, and falling back to "no JS i18n support" or "we must download a bunch of ICU data" when things fail. Also, Android is similar to an open-source project. Perhaps we could contribute the glue to provide a usable system ICU to NDK applications as a long-term solution. Then the pain and uncertainty for Android would be somewhat bounded in time.

Granted, the above ideas are a lot more work than just using ICU everywhere. I don't know *how* much more work it would be. But, I think that if an engineer came to us and said "Give me one year and I will reduce your download size by 20%" then I hope we'd consider hiring him to do that. So, IMO, the extra work to save download size is justifiable if the feature itself is really a high priority.

We may be able to just take the 20% hit on download size on Mac too without being too concerned. We didn't implement (and aren't implementing) a stub installer on Mac, right? And we've been shipping universal binaries on Mac (did we stop that yet?). Those two things indicate to me that we're less concerned about download size on Mac. If so, then we may be able to get away with just two implementations: one using ICU, and one using the Windows API.

Even if we decided that ICU is the only choice for all platforms, there is a middle ground between "Must block the startup of Firefox during installation on the download of ICU data" and "Delay downloading the ICU data until a web page requests it." We could add an updater for the ICU data that downloads/installs/updates the ICU data into the Firefox profile separately from Firefox installation and update (so the user doesn't have to wait on the ICU data to download to use Firefox during installation or update). Note that we already do (did?) something similar, downloading ~45MB of safe browsing data on first use. (Actually, I think that we could maybe do something like this for WebRTC stuff too, which IIRC is about 1.5MB of object code.)

Cheers,
Brian

Asa Dotzler

Dec 8, 2012, 3:40:48 PM12/8/12

On 12/6/2012 6:25 PM, Norbert Lindenberg wrote:
> Google Chrome is bundling ICU, so they're not limited by what's on the device.

How much does this bundling add to the Google Chrome download size?
Presumably someone can compile/package Chromium with and without or
compare the size of the build from before the feature landed and after.

- A

Neil Harris

Dec 8, 2012, 7:22:23 PM12/8/12

to Asa Dotzler, dev-pl...@lists.mozilla.org

Also, has anyone looked at whether the ICU tables would benefit from
special-purpose data compression being applied to them, and then
decompressed at install time (or even run-time, if RAM use is an issue)?

This might be an ideal case for using a compression algorithm with a
compute-intensive compression phase and a fast decompression phase (PPM
and LZMA come to mind), possibly after a custom hand-tuned
pre-processing step to make the job of the statistical compressor
faster, if appropriate.

-- Neil

Gervase Markham

Dec 10, 2012, 8:12:43 AM12/10/12

to Norbert Lindenberg, Asa Dotzler

On 07/12/12 02:21, Norbert Lindenberg wrote:
> The benefit is that the ECMAScript Internationalization API lets
> developers create a more consistent localized experience for their
> users, with the correct

* date, time, and number formats,
* the culturally appropriate calendar,
* correct currency symbols, and
* correct sorting.

Are you able to quantify the relative code/download size impact of these
features? I suspect sorting is the largest by far, but it would be
useful to know if that's correct.

> It also helps avoid latency by removing the need to
> send lists back to the server for sorting.

Of course, you can add per-column data indexes server-side to avoid this
without needing client-side collation information.

> Google has already implemented the Internationalization API and is
> shipping it in Chrome (still prefixed in Chrome 23), also by bundling
> ICU into their downloads.

Did they use ICU anyway before they implemented this API?

> User acquisition is an important goal of course. Has Mozilla studied
> how it correlates with download size, e.g., by measuring what
> percentage of users cancel out of downloads if the size is
> artificially inflated?

I don't think we've ever artificially inflated the size, but as it
varies over time, we may have figures on how that's affected retention
rates. Asa?

Gerv

Gervase Markham

Dec 10, 2012, 8:14:31 AM12/10/12

to Jonathan Kew

On 07/12/12 12:33, Jonathan Kew wrote:
> This is somewhat analogous to the solution (proposed and prototyped in
> bug 619521 and bug 648548) to provide downloaded-on-demand fonts to
> extend the character coverage when the device lacks any preinstalled
> fonts for a given script/writing system.

And we could use the same for dictionaries. And hyphenation tables with
inappropriate licenses.

Perhaps it's time we had a generic "extra bits downloader/updater
service" that we can plug all of these requirements into.

Gerv

Benjamin Smedberg

Dec 10, 2012, 11:37:08 AM12/10/12

to Norbert Lindenberg, Asa Dotzler, dev-pl...@lists.mozilla.org

On 12/7/2012 4:39 PM, Norbert Lindenberg wrote:
> On Dec 7, 2012, at 6:18 , Benjamin Smedberg wrote:
>
>> On 12/6/2012 9:21 PM, Norbert Lindenberg wrote:
>>> The benefit is that the ECMAScript Internationalization API lets developers create a more consistent localized experience for their users, with the correct date, time, and number formats, the culturally appropriate calendar, correct currency symbols, and correct sorting. It also helps avoid latency by removing the need to send lists back to the server for sorting.
>> Is it possible to self-host this functionality in JS? Would it make more sense to just build this functionality as a JS library?
> I discussed this in the paragraph that you cut off:

I think you misunderstood. I was suggesting that the functionality we
require be part of the browser and implemented in JS. That would really
only affect the "code" weight, not the "data" weight, but since C++ code
weight affects the number of initial pages in memory, that could still
be significant. And I imagine that a JS library performing those
functions might be significantly smaller than the equivalent binary code
weight.

And the other messages in this thread discuss ways that we could spread
out the download weight of the data tables after the initial
installation to reduce the costs, or even make them download-on-demand.
Does the API allow for the possibility that a sorting algorithm may not
be immediately available?
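
For reference, the API as drafted is synchronous, but it does let a page ask which of its requested locales are supported (`supportedLocalesOf`) and inspect which locale a collator actually resolved to (`resolvedOptions().locale`); a requested locale without data resolves to a fallback rather than failing. A minimal sketch, assuming a runtime that implements the API (`collationStatus` is an illustrative helper, not part of the API):

```typescript
// Report whether collation data for `requested` is available, and which
// locale a collator created for it actually resolves to.
// `collationStatus` is an illustrative helper, not part of the API.
function collationStatus(requested: string): { supported: boolean; resolved: string } {
  const supported = Intl.Collator.supportedLocalesOf([requested]).length > 0;
  const resolved = new Intl.Collator(requested).resolvedOptions().locale;
  return { supported, resolved };
}

console.log(collationStatus("de"));
console.log(collationStatus("zh"));
```

An application can therefore detect that it did not get the collation it asked for, but the check has to be repeated (for example, after a reload) if the data arrives later.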

> This is really basic infrastructure that for native applications is provided by the OS and for web applications should be provided by the browser.

We still need to weigh the cost of this "basic infrastructure" against
its value or the alternatives. Having a client-side sorting algorithm is
basically a performance optimization, though a significant one on slow
networks.

--BDS

Steve Fink

Dec 10, 2012, 12:37:25 PM12/10/12

to Benjamin Smedberg, Norbert Lindenberg, Asa Dotzler, dev-pl...@lists.mozilla.org

On Mon 10 Dec 2012 08:37:08 AM PST, Benjamin Smedberg wrote:

> On 12/7/2012 4:39 PM, Norbert Lindenberg wrote:
>> This is really basic infrastructure that for native applications is
>> provided by the OS and for web applications should be provided by the
>> browser.
> We still need to weigh the cost of this "basic infrastructure" against
> its value or the alternatives. Having a client-side sorting algorithm
> is basically a performance optimization, though a significant one on
> slow networks.

Unless you're writing one of those offline HTML5 apps I keep hearing
about.

Jean-Marc Desperrier

Dec 12, 2012, 11:51:18 AM12/12/12

Axel Hecht a écrit :
> Additionally, quite a few users are multilingual. If I look at en-US
> usage, only half of that is within the US, followed by India and Indonesia.
> Also, OS support usually means support for the language the OS is
> running in, not the language we use for Firefox.

Bi-lingual, not multi-lingual. If you look at the requirements for
Devanagari input and display, you will see that the user will not go very
far if the OS does not have some strong support for it included.

> On that note,
> https://bugzilla.mozilla.org/buglist.cgi?quicksearch=summary:toLocaleString
> is a nice little subset of the issues we have with getting useful data
> back from the OS, just took the simplest search I could come up with.

From what I've seen, all the issues here are Firefox not correctly
calling the OS, and not the OS being unable to do the job correctly in
the context given.
Maybe just calling ICU once requires a bit less effort, but it comes at
a very significant price. In the case of complex line breaking (Thai),
the team just made the effort to correctly call the OS, and it was just
not that huge.
cf https://bugzilla.mozilla.org/show_bug.cgi?id=336959
https://bugzilla.mozilla.org/show_bug.cgi?id=389520

Note that if you use ICU, those bugs will be replaced by a list of
inconsistencies between the ICU result and the OS. They will never be
perfectly synchronized.

Jean-Marc Desperrier

Dec 12, 2012, 12:10:37 PM12/12/12

Norbert Lindenberg a écrit :

> This is really basic infrastructure that for native applications is
> provided by the OS and for web applications should be provided by the
> browser.

And why do you set the aim for the web application to be able to do what
the native application next to it will be unable to do?

Knowing that real support also requires input, and there you really
can't do that properly without OS support.

Knowing that in most cases you will be reimplementing in parallel the
support the user has added to the OS so that native application can get it.
And doing it in parallel means never doing it perfectly the same way.

Robert O'Callahan

Dec 12, 2012, 5:10:33 PM12/12/12

to Jean-Marc Desperrier, dev-pl...@lists.mozilla.org

On Thu, Dec 13, 2012 at 6:10 AM, Jean-Marc Desperrier <jmd...@gmail.com> wrote:

> Knowing that in most cases you will be reimplementing in parallel the
> support the user has added to the OS so that native application can get it.
> And doing it in parallel means never doing it perfectly the same way.

This argument doesn't always work. Often, the OS support for a particular
language is really terrible and we can and should do better even if it
means being inconsistent with the OS. This is certainly true for the case
of font shaping, for example.

Jean-Marc Desperrier

Dec 13, 2012, 2:58:33 AM12/13/12

Robert O'Callahan a écrit :

> Often, the OS support for a particular
> language is really terrible and we can and should do better even if it
> means being inconsistent with the OS. This is certainly true for the case
> of font shaping, for example.

I've seen the references to font shaping in the start of the discussion,
but didn't follow what's happening on that front.
Very probably should. Which OSes are that weak on this point?

Initially Arabic shaping under Windows was very weak in Firefox until
the rewrite that made it properly use Uniscribe.

Gervase Markham

Dec 20, 2012, 8:42:28 AM12/20/12

On 03/12/12 19:32, Norbert Lindenberg wrote:
> As part of implementing the ECMAScript Internationalization API [1,
> 2] in SpiderMonkey, and as an aid in internationalizing other
> functionality in Mozilla products [3], I need to integrate the ICU
> library (International Components for Unicode [4]) into the source
> tree and the build.

Has a conclusion been reached on the next step to take towards
implementing this JS feature (I deliberately say that rather than
'integrating ICU')?

Gerv

Norbert Lindenberg

Dec 20, 2012, 1:12:02 PM12/20/12

to Gervase Markham, dev-pl...@lists.mozilla.org, Norbert Lindenberg

I'm working on a document discussing the issues and possible solutions, including some that haven't come up in the discussion yet. This will hopefully provide a basis for further discussion and decision.

Thanks,
Norbert
