emscripten locales

Nicholas Wilson

Nov 15, 2014, 5:06:54 PM
to emscripte...@googlegroups.com
Hi,

Emscripten currently supports only the C locale (and its "POSIX" alias), and doesn't even have "C.UTF-8" (emscripten's "C" locale claims its codeset is ASCII, although actually its functions like printf and strftime are already completely UTF-8-safe). We ship complete translations for several languages in our products, and will soon have to have multi-language support in our emscripten product.

Really, it's just the date and time that we need from C, when it comes down to it. If I submitted support for just this, would that be OK, or would you want "complete" support for all the odd number-parsing and monetary functions too?

These are the functions that are locale-dependent:
* atoi/atol/sscanf/printf/strtol and all the other digit-parsing and printing functions
These can be supported using Intl.NumberFormat in browsers to output nice Arabic digits; there isn't a handy ECMAScript function for parsing though, oddly. Getting the currency symbol for strfmon is also possible using NumberFormat.
* strftime
This can be implemented using Intl.DateTimeFormat
* strcoll, isupper, etc (character functions)
Intl.Collator should be helpful here
* nl_langinfo
Some of the things like D_FMT are a right nuisance to scrape from Intl.DateTimeFormat but should probably be possible.
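
As a rough illustration of the mapping in the list above, here is a minimal TypeScript sketch of the browser-side calls. The helper names (formatNumberFor, formatDateFor, collateFor) are made up for this example and are not part of Emscripten; only the Intl.* constructors are standard ECMAScript. Note that Intl.Collator only covers the strcoll side; it does not classify characters the way isupper does.

```typescript
// Hypothetical helpers; only the Intl.* calls are standard ECMAScript.

// Digit printing: the locale decides grouping, decimal mark and digit set
// (for example, "ar-EG" normally selects Arabic-Indic digits).
function formatNumberFor(locale: string, value: number): string {
  return new Intl.NumberFormat(locale).format(value);
}

// Date printing, roughly the job strftime is asked to do for a full date.
// The time zone is pinned to UTC so the result does not depend on the host.
function formatDateFor(locale: string, epochMs: number): string {
  const fmt = new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC",
  });
  return fmt.format(new Date(epochMs));
}

// Collation with strcoll-style results: negative, zero or positive.
function collateFor(locale: string, a: string, b: string): number {
  return new Intl.Collator(locale).compare(a, b);
}
```

None of this addresses parsing (atoi, sscanf and friends), for which, as noted above, there is no ready-made ECMAScript function.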

I don't know much about C++ locale stuff, never had need for it (I stick to the C functions that work, plus bits of ICU to replace broken ISO functions).

Maybe struggling to retro-fit the ECMAScript functions into the C ones is just too error-prone though? Another alternative I'd be happy to code up would be an Emscripten-specific set of date/time functions to go in <emscripten.h> that expose the browser's locale functions without munging them into a legacy C interface. You already have to call platform-specific functions on Mac and Windows, because the C strftime is buggy and incomplete, so having a solid set of emscripten functions might be both easier and preferable to trying to make setlocale+strftime work just like on Linux.
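
To make the second alternative a little more concrete, here is a hedged sketch of what the JavaScript side of such a high-level function could look like. Everything here is an assumption for illustration: the name formatTimestamp, the three style names, and the option sets chosen for each style are invented, and the C declaration that would sit in <emscripten.h> is not shown.

```typescript
// Hypothetical style-based date API; names and option sets are illustrative.
type DateStyle = "short" | "medium" | "long";

const DATE_STYLE_OPTIONS: Record<DateStyle, Intl.DateTimeFormatOptions> = {
  short: { year: "numeric", month: "numeric", day: "numeric" },
  medium: { year: "numeric", month: "short", day: "numeric" },
  long: { year: "numeric", month: "long", day: "numeric" },
};

// Formats a Unix timestamp (seconds) using the browser's locale data.
// Leaving locale or timeZone undefined falls back to the browser defaults.
function formatTimestamp(
  epochSeconds: number,
  style: DateStyle,
  locale?: string,
  timeZone?: string
): string {
  const options: Intl.DateTimeFormatOptions = { ...DATE_STYLE_OPTIONS[style] };
  if (timeZone !== undefined) {
    options.timeZone = timeZone;
  }
  return new Intl.DateTimeFormat(locale, options).format(
    new Date(epochSeconds * 1000)
  );
}
```

The caller picks a style rather than a format string, so the browser stays in charge of field order and separators; that is the main difference from a strftime-shaped interface.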

Any thoughts on these possibilities?

Nick

Alon Zakai

Nov 17, 2014, 12:58:43 PM
to emscripte...@googlegroups.com
In general, adding such support is great. The only issue is if it is in code we get from upstream musl, in which case we would need to coordinate with them; we don't want to diverge more than we have to.

- Alon


--
You received this message because you are subscribed to the Google Groups "emscripten-discuss" group.

Jukka Jylänki

Nov 17, 2014, 4:56:57 PM
to emscripte...@googlegroups.com
Agreed - I think any added support for these should first check whether migrating over to upstream musl code to replace older handwritten code will provide a fix, and if not, it would be preferable to first add the feature to musl and then bring it to Emscripten as an updated musl version.

If not, the best course of action is probably to identify the smallest surface area of API modifications needed so that the existing musl code we already build can also recognize the new locale features. For example, how would sprintf, which we implement from upstream musl, hook into this? It sounds to me like the browser-specific parts that we port are in the locale info fields, rather than changes to existing algorithms, i.e. we would not use Intl.DateTimeFormat as a replacement for strftime, but would instead figure out how to feed the browser locale info into strftime using the musl machinery.
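
One way to picture "feeding the browser locale info into strftime" is to derive nl_langinfo-style fields from Intl.DateTimeFormat on the JavaScript side. The sketch below is only an illustration under assumptions: the function names are invented, nothing here touches musl, and the D_FMT scraping relies on formatting a probe date and substituting the known digit runs, which will not hold for every locale. The "-u-ca-gregory-nu-latn" suffix forces the Gregorian calendar and Latin digits so that the substitution has something to match; it assumes the locale tag passed in does not already carry a "-u-" extension.

```typescript
// Hypothetical derivation of locale info fields from the browser's Intl data.

// Approximates a D_FMT-style pattern by formatting a probe date whose
// year, month and day are distinct digit runs, then swapping each run
// for the matching strftime conversion.
function deriveDateFormat(locale: string): string {
  const probe = new Date(Date.UTC(1999, 10, 22)); // 22 November 1999, UTC
  const text = new Intl.DateTimeFormat(locale + "-u-ca-gregory-nu-latn", {
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    timeZone: "UTC",
  }).format(probe);
  return text.replace("1999", "%Y").replace("11", "%m").replace("22", "%d");
}

// Collects the twelve long month names (the MON_1..MON_12 items).
// Some locales use a different form for standalone month names, so this
// is an approximation.
function deriveMonthNames(locale: string): string[] {
  const fmt = new Intl.DateTimeFormat(locale, { month: "long", timeZone: "UTC" });
  const names: string[] = [];
  for (let month = 0; month < 12; month++) {
    names.push(fmt.format(new Date(Date.UTC(1999, month, 15))));
  }
  return names;
}
```

How such strings would then be handed to the C library's locale structures is the open question raised above, and this sketch does not attempt to answer it.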

Nicholas Wilson

Nov 20, 2014, 3:29:47 AM
to emscripte...@googlegroups.com
Hmm. I've had a good look into musl's locale code now. On the plus
side, it's small and tidy. On the down side, it comes as a unit that
we'd need to take in one lump: replace library.js locale functions in
one swoop, from setlocale() to nl_langinfo(). Feeding the browser's
date formatting data into this would be messy, because musl's really not
designed to access that data except through its config files. (We'd
have similar difficulty if we tried to replace our custom time
functions (localtime(), gmtime()) with musl ones.)

strftime() is fundamentally an outdated/bad API, so it's good that the
browser doesn't offer it and instead has a high-level API that allows
a proper implementation using UTR#35. On the other hand, it makes
getting the data into musl hard from that side too.

For the time being, how about this very underwhelming suggestion:
https://github.com/NWilson/emscripten/commit/9ebe0270bcb9e6b61ce2a9e35e65a82edd818a16

For our product here, I've implemented some NSDate-like functions in a
JS library that give us tidy high-level access to the
Intl.DateTimeFormat data. I'm sorry, I'm not being very good at
sending code upstream, we're sitting on about 20 JS library files here
providing a C interface to WebSockets, Flash applets, window.navigator,
WebCrypto, DOM storage, and more; it might be useful to someone but we
always have too much to do here!

I would like to make an effort at getting the musl locale and date
functions into emscripten, I just don't know when I'll get round to
it.

Nick

Alon Zakai

Nov 21, 2014, 6:37:56 PM
to emscripte...@googlegroups.com
Hmm, what does that commit cause to happen?

- Alon



