XL case insensitivity

Henry

Mar 6, 2011, 5:26:53 PM

to XL programming language and runtime

I have been writing some code in XL2 and hit an unexpected problem --
identifiers appear to be case-insensitive. I looked around the XLR
code base, which supports UTF8 but is also case-insensitive, although it
is not quite clear to me how

    while (isalnum(c) || c == '_' || IS_UTF8_FIRST(c) ||
           IS_UTF8_NEXT(c))
    {
        if (c == '_')
            IGNORE_CHAR(c);
        else
            NEXT_LOWER_CHAR(c);
    }

in scanner.cpp works with UTF8, given that NEXT_LOWER_CHAR calls
tolower rather than towlower, but then I am not quite sure what level
of character differentiation is intended in the identifiers.
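
To illustrate the concern, here is a minimal sketch (not taken from scanner.cpp; lower_bytes is a hypothetical helper) of what byte-wise lowering does to UTF8 text. It assumes the default "C" locale, where tolower only changes the ASCII letters A-Z, so the bytes of a multi-byte UTF8 sequence pass through untouched.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: lower-case a name one byte at a time, the way a
// byte-oriented scanner calling tolower would. Assumes the default "C"
// locale, in which bytes >= 0x80 are returned unchanged.
std::string lower_bytes(const std::string &name)
{
    std::string out;
    for (unsigned char c : name)      // unsigned char avoids UB in tolower
        out += static_cast<char>(std::tolower(c));
    return out;
}
```

Under that assumption, lower_bytes("T") and lower_bytes("t") are the same string, while "\xCE\x94" (capital delta) and "\xCE\xB4" (small delta) stay distinct, so ASCII names would be case-insensitive but Greek names would not. In another locale the result could differ.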

For implementation of complex simulation codes with symbolic rather
than descriptive identifiers, e.g. t for time and T for temperature,
case-sensitivity is really useful, or better still, support for fully
differentiated UTF8 characters as in Fortress, so that e.g. Greek
characters and mathematical symbols can be used.

What is the rationale for limiting the distinction between identifiers
in XL2 and XLR? Would it be a problem to support case-sensitivity or
even full support for UTF8 character strings as identifiers?

Thanks

Henry

Christophe de Dinechin

Mar 6, 2011, 7:13:00 PM

to xlr-...@googlegroups.com

You are right that the tolower code is wrong. UTF8 was added a bit as an afterthought. The intent is to support UTF8 identifiers. XLR is a bit closer to that objective compared to XL2.

In the native XL2 compiler, name normalization occurs in XLNormalize, defined in xl.parser.tree.xl. It should be easy enough to add an option here to keep names as they show.

The idea of name normalization was to eliminate "style wars" that have plagued C++. In other words, you are free to write openFile or open_file or Open_File or whatever style suits you best. Better yet, the library doesn't impose its style on your code.
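
As a rough sketch of that normalization (not the actual XLNormalize code, which is written in XL and may differ), the idea is to lower-case ASCII letters and drop underscores. The function name normalize_name and the keep_as_written flag are hypothetical; the flag stands in for the option mentioned above.

```cpp
#include <string>

// Hypothetical sketch of name normalization: ignore underscores and
// fold ASCII upper case to lower case. Non-ASCII bytes are left alone.
// keep_as_written models a possible option that disables normalization.
std::string normalize_name(const std::string &name,
                           bool keep_as_written = false)
{
    if (keep_as_written)
        return name;                  // names stay exactly as typed

    std::string out;
    for (char c : name)
    {
        if (c == '_')
            continue;                 // underscores do not count
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c - 'A' + 'a');
        out += c;
    }
    return out;
}
```

With this sketch, openFile, open_file and Open_File all normalize to openfile, whereas with keep_as_written set, t and T remain different names.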

I understand the problem with simulation code. Someone else already made that point. So it makes a lot of sense to make this an option.

Regards
Christophe


Henry Weller

Mar 7, 2011, 9:32:17 AM

to xlr-...@googlegroups.com

> You are right that the tolower code is wrong. UTF8 was added a bit as an
> afterthought. The intent is to support UTF8 identifiers.

Excellent! But with this flexibility, is it then consistent to remove case
sensitivity? Would Greek letters also be handled case-insensitively?

> XLR is a bit closer to that objective compared to XL2. In the native XL2
> compiler, name normalization occurs in XLNormalize, defined in
> xl.parser.tree.xl. It should be easy enough to add an option here to keep
> names as they show.

Would this option be supplied on the compiler command line or as an attribute of
the module? If the former, how would modules developed under different
case-sensitivity options be used together? If the latter, would you support an
identifier import rename option as in EuLisp and other module systems?

> The idea of name normalization was to eliminate "style wars" that have plagued
> C++. In other words, you are free to write openFile or open_file or Open_File
> or whatever style suits you best. Better yet, the library doesn't impose its
> style on your code.

OK, I see where you are coming from. However, XL is designed to be very
flexible, so people will develop code in different ways and styles, which is part
of the attraction of the language. It therefore doesn't seem consistent to me to
limit the flexibility in choosing identifier naming conventions, which can help
code comprehension. I am personally against the use of underscores in names and
would much prefer hyphens as in Lisp, but this would dictate a need for spaces
between operators in the syntax (which I would not be against). However, I am
now finding that _ is perhaps the best way to denote a subscript in a symbolic
identifier. Perhaps UTF8 would be better all round, but perhaps not if case
sensitivity is enforced.

> I understand the problem with simulation code. Someone else already made that
> point. So it makes a lot of sense to make this an option.

That sounds fine if it could be done in such a way that code developed under the
two options can still inter-operate.