metalua lexer locale

David Manura

Nov 26, 2011, 4:27:35 PM
to met...@googlegroups.com
I see an `os.setlocale('C')` call was added to the module loader of
lexer.lua. In theory, this is the worst type of side effect: the
locale is process-wide state, so the change leaks across Lua states
and likely OS threads as well, and most code depends on the locale in
some way. But it may be warranted. Personally, I stay with the
C locale, which makes these bugs a non-issue.

I've added some background notes on locales to the wiki:
http://lua-users.org/wiki/LuaLocales .

Ways to avoid the setlocale in lexer.lua:

(1) Replace the affected tonumber calls, pattern character classes
(%d/%s), etc. with locale-independent versions. This in fact is what
the 5.2 lexer does with things like lctype.c / `lisspace`. I don't
know what the performance impact is in pure Lua. It may not even be
worth it, since this is a non-issue under the usual C locale. Moreover,
even under the C locale, tonumber in 5.1 is not guaranteed to parse
5.2 hex floats: it relies on the C library's strtod, which may or may
not accept them (in MSVC it fails). So maybe a pure Lua version of this
is warranted as a fallback to ensure 5.1 can always lex 5.2 source.

(2) Move the `setlocale` call from the module loader into the actual
lexer construction call. This may be more reliable in the sense that
simply loading the module but not using it doesn't cause a
side-effect, and you are free to do whatever you want with the locale
between loading and using the module. This still assumes, of course,
that no one changes the locale between reading individual tokens.
On the other hand, calling setlocale might not be thread safe.

(3) Don't invoke `setlocale` yourself but rather check the current
locale and error out if it's not the C locale, or at least not some
locale compatible with lexer.lua. This is probably the most polite
approach of all. It relies on the caller dealing with locales
appropriately: rather than silently changing the locale, the module
raises an error at load time if the locale is wrong.
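
To make options (2) and (3) concrete, here is a rough sketch. The
names `check_c_locale` and `with_c_locale` are made up for
illustration and are not part of metalua. I'm assuming that only the
"ctype" and "numeric" categories matter to the lexer, and that a
locale named "POSIX" can be treated like "C".

```lua
-- Hypothetical helpers; none of these names exist in metalua.

-- Assumption: these are the only categories that affect lexing
-- (character classes and number parsing).
local CATEGORIES = { "ctype", "numeric" }

-- Option (3): raise an error unless the relevant categories are "C".
local function check_c_locale()
  for _, category in ipairs(CATEGORIES) do
    -- With nil as the locale, os.setlocale only queries the current name.
    local name = os.setlocale(nil, category)
    if name ~= "C" and name ~= "POSIX" then
      error(("lexer requires the C locale, but %s is %q")
        :format(category, tostring(name)), 2)
    end
  end
  return true
end

local function restore(saved)
  for _, category in ipairs(CATEGORIES) do
    os.setlocale(saved[category], category)
  end
end

-- Restores the saved locale, then re-raises or returns the pcall results.
local function finish(saved, ok, ...)
  restore(saved)
  if not ok then error((...), 0) end
  return ...
end

-- Option (2), scoped: switch to "C" around a single call, then restore.
-- Still not thread safe: the locale is shared by the whole process.
local function with_c_locale(f, ...)
  local saved = {}
  for _, category in ipairs(CATEGORIES) do
    saved[category] = os.setlocale(nil, category)
    os.setlocale("C", category)
  end
  return finish(saved, pcall(f, ...))
end
```

A lexer constructor could call `check_c_locale()` once on entry, or
wrap a whole lexing pass as `with_c_locale(lex_all, src)` (where
`lex_all` stands for whatever function does the work) so that the
locale is at least restored afterwards.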

Note also that one change in Lua 5.2 is that, during lexing,
identifiers should not have locale-dependent characters (or, for that
matter, any dependence on locales, due to lctype.c). This is good
practice for 5.1 as well, and 5.1 programs that also need to run in
5.2 should follow it.
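
For the locale-independent route in option (1), and the identifier
rule just mentioned, a pure Lua fallback might look something like the
sketch below. All names here are hypothetical. It assumes that
explicit byte ranges in a pattern set (as opposed to %a, %d, %s) are
compared by byte value and so do not consult the locale, which is my
reading of lstrlib.c. The hex float parser avoids tonumber entirely;
it is exact for short literals but I have not checked that it rounds
the same way as strtod for very long mantissas.

```lua
-- Hypothetical helpers for illustration; not metalua's or Lua's own code.

-- ASCII-only identifier and whitespace patterns, in place of %a / %w / %s.
local IDENT = "^[A-Za-z_][A-Za-z0-9_]*"
local SPACE = "^[ \t\n\r\f\v]+"

-- Returns the identifier starting at pos, or nil.
local function match_identifier(src, pos)
  return src:match(IDENT, pos)
end

-- Returns the run of whitespace starting at pos, or nil.
local function match_space(src, pos)
  return src:match(SPACE, pos)
end

-- Hex digit values built from byte codes, avoiding string.lower/tonumber.
local HEXVAL = {}
for i = 0, 9 do HEXVAL[string.char(48 + i)] = i end
for i = 0, 5 do
  HEXVAL[string.char(97 + i)] = 10 + i   -- a-f
  HEXVAL[string.char(65 + i)] = 10 + i   -- A-F
end

-- Parses 5.2-style hex numerals such as "0x1.8p1"; returns nil on failure.
local function parse_hex_float(s)
  local mant, sign, digits =
    s:match("^0[xX]([0-9a-fA-F.]+)[pP]([+-]?)([0-9]+)$")
  if not mant then
    -- No binary exponent part.
    mant, sign, digits = s:match("^0[xX]([0-9a-fA-F.]+)$"), "", ""
  end
  if not mant then return nil end

  -- At most one '.', and at least one hex digit overall.
  local int, frac = mant:match("^([0-9a-fA-F]*)%.?([0-9a-fA-F]*)$")
  if not int or #int + #frac == 0 then return nil end

  local value = 0
  for i = 1, #int do value = value * 16 + HEXVAL[int:sub(i, i)] end
  local scale = 1 / 16
  for i = 1, #frac do
    value = value + HEXVAL[frac:sub(i, i)] * scale
    scale = scale / 16
  end

  -- Decimal exponent, accumulated by hand from byte codes.
  local exponent = 0
  for i = 1, #digits do exponent = exponent * 10 + (digits:byte(i) - 48) end
  if sign == "-" then exponent = -exponent end
  return value * 2 ^ exponent
end
```

A lexer could try tonumber first and fall back to `parse_hex_float`
only when that returns nil for something starting with "0x".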
