Column numbers, I like.
I assumed removing pmod syntax was the first step in a complete deprecation process.
>> On 17/10/2012, at 2:51 AM, Patrik Nyblom wrote:
>>>
>>> The OTP Technical board decisions from last Thursday are now published on the erlang.org website, which means that the answers to some questions about changes in R16 are finally officially answered.
>
> Great initiative to publish these decisions and explain them!
>
> On Wed, Oct 17, 2012 at 1:22 AM, Richard O'Keefe <o...@cs.otago.ac.nz> wrote:
>>
>> "Variable names will continue to be limited to Latin characters."
>>
>> I hope that means "for this release."
>
> That's an interesting problem. Variable names are defined as starting
> with an upper case letter, but the only scripts that I know of that
> have those are roman, greek, cyrillic and armenian.
"Variables are defined as starting with an upper case letter"
isn't exactly true, unless you do what Quintus did back in the
80s and redefine "_" from being a 'punctuation connector' to
an 'upper case letter'. Quintus did that for CJK, so that 日付
was an unquoted atom and _日付 was a Prolog variable.
This was apparently acceptable, and the same practice is followed in
other Prologs. I see no reason why it would not work for Erlang,
where _1 is a perfectly good variable.
> So would variables
> with names in other scripts be forced to start with an uppercase latin
> letter? We might just as well have them start with '§' or something,
> and drop the capitalization rule.
It is _already_ the case that Erlang variables are not forced to
start with an uppercase latin letter, but may start with "_".
(The section sign is not allowed in Unicode identifiers.)
Having Latin, Greek, Cyrillic, and Armenian scripts already covers a
lot of languages.
Here I am in New Zealand. There are two official languages in this country.
English isn't actually one of them, although it is in practice the language
of government, commerce, and practically everything. One of the two
official languages is New Zealand Sign Language. The other is Māori. Note
the little bar over the "a"? It's called a macron. And a-with-macron is
not a Latin-1 character. The city I'm living in has the name Dunedin in
English and Ōtepoti in Māori. (Note the macron on the "O".) The organisation
I work for is the University of Otago/Te Whare Wānanga o Otāgo. Notice a
pattern? Do I look forward to being able to tell those of my students who
are Māori that it is now possible for them to use words of their own
language as Erlang identifiers? You bet. Did I mention that although the language of
_instruction_ in this University is English, by official decree students may
submit assignments and answers to examination questions in Māori? If I ask
them to write programs in Erlang (which I did last year and will again next
year), am I actually _allowed_ to do this if they cannot use Māori words as
freely as English ones? I'd rather not find out, thanks. I'd definitely
rather not be told to require C, C++, Java, or Ada, which _do_ allow
non-Latin-1 letters in identifiers.
Unicode Standard Annex #31 (UAX 31),
'Unicode Identifier and Pattern Syntax',
http://www.unicode.org/reports/tr31/
says how to handle identifiers in Unicode.
In particular, Coptic, Deseret, and Glagolitic are in table 4:
"candidate characters for exclusion from identifiers".
Section 5 recommends NFC for case-sensitive identifiers.
_ is not an ID_Start character, but if C can have that extension, so can we.
As for '§', SECTION SIGN is _not_ allowed in identifiers.
Ada 2012 identifier syntax (section 2.3) is closely based on Unicode
(technically, on ISO 10646). Let's see what they say:
    identifier ::= identifier_start {identifier_start | identifier_extend}
    identifier_start ::= letter_uppercase | letter_lowercase |
                         letter_titlecase | letter_modifier |
                         letter_other | number_letter
    identifier_extend ::= mark_non_spacing | mark_spacing_combining |
                          number_decimal | punctuation_connector
An identifier shall not contain two consecutive characters in
category punctuation_connector or end with a character in that category.
Ada's restrictions on underscores (the one Latin-1 character that is a
punctuation_connector) have always been idiosyncratic. This is otherwise
pretty close to what UAX31 says.
Ada is case-insensitive, so they don't greatly care about which letters
are upper+title-case and which are not. So adapt the rules like this:
    unquoted_atom ::= atom_start identifier_continuation*
    atom_start ::= letter_lowercase | letter_modifier | letter_other |
                   number_letter | "."   % the last is Erlang-specific
    variable ::= variable_start identifier_continuation*
    variable_start ::= letter_uppercase | letter_titlecase |
                       punctuation_connector   % this includes "_" and some others
    identifier_continuation ::= atom_start | variable_start |
                                number_decimal | mark_non_spacing |
                                mark_spacing_combining |
                                "@"   % this is Erlang-specific
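A rough way to try the adapted rules out is to classify a token by the Unicode general category of its first character. The sketch below is only an illustration in Python, not a lexer: the names `VARIABLE_START`, `ATOM_START`, `CONTINUE_ONLY` and `classify` are made up here, and the Erlang-specific "." handling is left out for brevity.

```python
import unicodedata

# General categories taken from the adapted grammar.
VARIABLE_START = {"Lu", "Lt", "Pc"}          # upper case, title case, connectors
ATOM_START = {"Ll", "Lm", "Lo", "Nl"}        # lower case, modifier, other, letter numbers
CONTINUE_ONLY = {"Nd", "Mn", "Mc"}           # digits and combining marks

def classify(token):
    """Return "variable", "atom", or None for a candidate identifier.

    Assumption: dots are not handled, only "@" as an extra continuation.
    """
    if not token:
        return None
    first = unicodedata.category(token[0])
    if first in VARIABLE_START:
        kind = "variable"
    elif first in ATOM_START:
        kind = "atom"
    else:
        return None
    allowed = VARIABLE_START | ATOM_START | CONTINUE_ONLY
    for ch in token[1:]:
        if ch != "@" and unicodedata.category(ch) not in allowed:
            return None
    return kind

for token in ["X", "_1", "日付", "_日付", "Ōtepoti", "māori", "§x"]:
    print(token, classify(token))
```

Under these assumptions a caseless-script word such as 日付 comes out as an atom, and prefixing it with any connector punctuation mark turns it into a variable.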
> My guess is that atoms will be allowed to contain unicode just so that
> atom_to_list and list_to_atom can still be used, but usage of such
> atoms in source code will be discouraged because these will have to be
> written as quoted.
Why _should_ Unicode identifiers that would be legal identifiers in Ada
but do not begin with an upper case letter, title case letter, or
connector punctuation mark require quoting? You might as well require
atoms containing the letter 'z' to be quoted.
On Wed, Oct 17, 2012 at 5:23 PM, Loïc Hoguin <es...@ninenines.eu> wrote:
> I'm sure you guys use it a lot in Elixir, and that'd surely slow Elixir
> programs down, but that'd also probably make it a good test case for doing
> them right.
And maybe one day a feature you use in Cowboy to great effect will be
decried by a small minority otherwise unbothered by such feature -- a
feature whose existence has no bearing on their everyday use -- on
which day I will make sure not to suggest that you instead do it the
"right" way.
-Devin
Cowboy doesn't use undocumented features. That's one of Cowboy's biggest
claims. It protects both the project and its users from bad surprises.
--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu
That's a great suggestion! An alternative to a global setting could be
to let the error_handler setting get inherited by child processes from
their parents. This way just a few roots need to get configured
manually and it also leaves open the possibility to have different
values for different processes.
best regards,
Vlad
> how their implementation somehow limited future optimizations that
> could have been possible. Does it mean their support for Pmod
> utilization means further optimizations will be cancelled?
Well, the apply instruction will need to handle it - there is an extra
branch that cannot be removed :(
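To make the "extra branch" concrete: as far as I understand tuple-module dispatch, a call whose module position holds a tuple is redirected to the module named by the tuple's first element, with the whole tuple appended as an extra last argument. The toy model below is written in Python purely for illustration; `apply_call` and the `modules` table are hypothetical and say nothing about how the emulator actually implements it.

```python
def apply_call(modules, mod, func, args):
    """Toy model of apply(Mod, Func, Args) with tuple-module support."""
    # This is the extra branch: every call has to check for a tuple.
    if isinstance(mod, tuple):
        real_mod = mod[0]
        # The tuple itself is passed along as one extra, final argument.
        return modules[real_mod][func](*args, mod)
    return modules[mod][func](*args)

# A hypothetical "module" table: module name -> function name -> callable.
modules = {
    "greeter": {
        "hello": lambda name, this: f"{this[1]}, {name}",
        "plain": lambda name: f"Hello, {name}",
    }
}

print(apply_call(modules, "greeter", "plain", ["Joe"]))
print(apply_call(modules, ("greeter", "Hi"), "hello", ["Joe"]))
```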
Missing the point, but okay. The point is: not removing the feature
affects nobody any differently today. You can continue to ignore it forever
and it will never make a difference to your daily life. Removing the
feature adversely affects some.
Instead of removing features, why don't we let the OTP team focus on
adding features that will make the lives of everyday Erlang
programmers better? Like frames. :)
-Devin
Didn't miss it, just ignored it. The point makes no sense.
Why would I want something that I consider bad practice to stay in a
language I want to improve?
> Instead of removing features, why don't we let the OTP team focus on
> adding features that will make the lives of everyday Erlang
> programmers better? Like frames. :)
Or, why not remove the bad features to allow them to spend more time on
adding good features instead of forever maintaining bad ones?
Plus that's not removing a feature, that's removing an experiment gone bad.
--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu
I must say, I'm still wondering why you (and others) consider
parameterized modules an "experiment gone bad". Most of the criticism
I've seen has been about the implementation (which was just intended as
a proof of concept), i.e., tuples instead of a separate datatype, no
additional support in tracing makes debugging hard, and so on. Those
things could be fixed with a proper implementation. But the current OTP
proposal is to keep only the crappy part of it, which makes very little
sense to me. And if you dislike parameterized modules, do you dislike
funs as well? They're both just closures.
/Richard
Yes, we dislike the implementation.
I said "gone bad" as in "got used in real products", it evolved from an
honest experimentation into something that we have to deal with
depending on what dependencies we need.
I think a proper implementation would be great, I am especially
intrigued by functors which from what I understand would be made
possible with a proper pmod implementation.
--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu
> I must say, I'm still wondering why you (and others) consider
> parameterized modules an "experiment gone bad". Most of the criticism
> I've seen has been about the implementation
Yes, we dislike the implementation.
I'm speaking for "you (and others)", the people who consider the current
parameterized modules to be wrong.
You weren't included, obviously.
Those who feel misrepresented can interject. There's a pretty good
consensus in that camp AFAIK, though.
When I say "proper implementation", I simply mean a separate opaque
datatype (much like funs) for module instances, and support throughout
the ecosystem for tracing and debugging. Apart from that, I think the
existing syntax and semantics of parameterized modules is not lacking
anything (beyond some simple additions like static-declared functions).
Could you be more exact with what you refer to by "functors", because
that's a quite fuzzy concept. ML functors, for example, are very static
in nature and are more akin to C++ templates in the way they are
expanded at compile time. I certainly don't think that is desirable in a
language like Erlang.
/Richard
Opaque, no discrepancies between function arities, etc. Fred listed
quite well the issues, although the two I just cited are my bigger concerns.
I believe we are on the same track there.
> Could you be more exact with what you refer to by "functors", because
> that's a quite fuzzy concept. ML functors, for example, are very static
> in nature and are more akin to C++ templates in the way they are
> expanded at compile time. I certainly don't think that is desirable in a
> language like Erlang.
I'll have to get back to you on that after I get re-explained how it
would work, it's been a while and things have gotten vague about it,
only the good feeling remained. :)
--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu
The underscore as a variable prefix is not _that_ special for the
compiler. Again, this is no new thing. Prolog did the same thing.
You could perfectly well have a policy that
    _<Latin 1 letter or digit>...
        compiler does not warn about singleton use
    _<extended letter>...
        compiler DOES warn about singleton use
    __<extended letter>...
        compiler does NOT warn about singleton use
In any case, if you adopt Unicode identifier syntax, a whole
bunch of extra characters become available:
U+203F ‿
U+FE34 ︴
U+FE4D ﹍
U+FE4E ﹎
U+FE4F ﹏
among them. We could perfectly well say that an identifier beginning
with a Pc character is a variable, which would generalise the current
"_" rule, and the compiler would _not_ be treating those specially.
>
> Anyway, I find here (http://www.unicode.org/reports/tr31/) that "Each
> programming language can define its identifier syntax as relative to
> the Unicode identifier syntax,
Right. Already noted, which is why we can continue to allow '@' and '.'
> The actual point of my note is this: there must be a way to make a
> difference between atoms and variables.
We have one right now:
A variable begins with a character that is in
(Lu | Lt | Pc) & Latin1.
All we do is remove the restriction to Latin1. Done!
> Some languages add a marker
> before atoms, some before variables, and Erlang uses the
> capitalization of the first letter. With full unicode, there is no
> clear way to use that rule anymore,
Yes there is. Keep the existing rule exactly as it stands except
for removing the restriction to Latin1. Using a script with case?
(*LOTS* of languages.) You can use an Lu or Lt character. Using
a script without? You can use any Pc character you like. "‿" is
kind of cute.
> so I observed that an alternative could be to do like other languages do.
We don't _need_ an alternative. Once we break the Latin1 boundary
we have more underscore-like characters to work with, and if you
want to think of them as funny prefixes, good luck to you and
‿日付 will work, but if you want to think of them as differently
shaped underscores, then ‿日付‿今日 will work too (if anyone _wants_ it
to, which is of course another matter).
Instead of removing features, why don't we let the OTP team focus on
adding features that will make the lives of everyday Erlang
programmers better? Like frames. :)
On Thu, Oct 18, 2012 at 6:40 PM, Michael Richter <ttmri...@gmail.com> wrote:
> Because this is how you get kitchen sink language ecosystems like C++, Perl,
> Java, C#, etc.
All wildly successful ecosystems and languages way more popular than Erlang.
-Devin
Forces:
(1) Support for Unicode continues to increase, with
minimal source code support about to arrive.
(2) Unicode variable names and unquoted atoms are not
here yet, so now is the time to settle on a design.
(3) They will need to come. There may be legal or
institutional reasons why unicode-capable languages
are required. Some people just want to use their
own language and script. Erlang's strength in
network applications means that being able to
represent Internationalized Domain Names as unquoted
atoms would be just as much of a convenience as
being able to represent ASCII domain names like
www.example.com (which needs no quotes in Erlang) is.
(4) There is a framework for Unicode identifiers in
Unicode standard annex 31 (UAX#31), which several
programming languages have adopted in some form,
including Ada, Java, C++, C, C#, Javascript, and Python (section 2.3 of
http://docs.python.org/release/3.1.5/reference/lexical_analysis.html
and see also http://www.python.org/dev/peps/pep-3131/).
(5) Existing Erlang identifiers should remain valid,
including ones containing "@" and ".".
(6) Existing Erlang support features, such as ignoring
names of the form [_][a-zA-Z0-9_]* when reporting
singleton variables, should not be broken.
(7) We should not "steal" any characters to use as "magic
markers" for variables because they might be needed for
other purposes. A good (bad) example of this is "?", which
could be used for several things if it were not used for macros.
Reference
Names of sets of characters, XID_Start, XID_Continue, Lu, Lt, Lo, Pc,
Other_Id_Start, are drawn from Unicode and UAX#31.
Lu = upper case letters
Lt = title case letters
Pc = connector punctuators, including the low line (_) and
a number of other characters like undertie (‿).
Other_Id_Start = script capital p, estimated symbol,
katakana-hiragana voiced sound mark, and
katakana-hiragana semi-voiced sound mark.
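The Pc set named above can be listed straight from a Unicode character database. This is a small sketch using Python's standard `unicodedata` module; the helper name `connector_punctuation` is made up here, and the exact result depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

def connector_punctuation(limit=0x10000):
    """Code points below `limit` (default: the BMP) in general category Pc."""
    return [cp for cp in range(limit)
            if unicodedata.category(chr(cp)) == "Pc"]

# Print each connector punctuation character with its code point and name.
for cp in connector_punctuation():
    print(f"U+{cp:04X} {chr(cp)} {unicodedata.name(chr(cp))}")
```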
Variables
variable ::= var_start var_continue*
var_start ::= XID_Start ∩ (Lu ∪ Lt ∪ Pc ∪ Other_Id_Start)
var_continue ::= XID_Continue ∪ "@"
The choice of XID here follows Python. It ensures that the normalisation
of a variable is still a variable. In fact Unicode variables should be
normalised. Unicode has enough look-alike characters that we cannot hope
for "look the same <=> are the same" to be true, but we should go _some_
way in that direction.
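As an illustration of why normalisation matters, the same visible name can arrive as two different code point sequences. The sketch below uses Python's `unicodedata.normalize` with NFC, the form UAX#31 is cited for above; `normalise_identifier` is a hypothetical helper name, and a real scanner might well choose a different form (Python itself uses NFKC for its identifiers).

```python
import unicodedata

def normalise_identifier(name):
    """Return the NFC form of an identifier (an assumed policy, not a spec)."""
    return unicodedata.normalize("NFC", name)

decomposed = "Ma\u0304ori"   # "a" followed by COMBINING MACRON
precomposed = "M\u0101ori"   # LATIN SMALL LETTER A WITH MACRON

# The two spellings look alike but compare unequal until normalised.
print(decomposed == precomposed)
print(normalise_identifier(decomposed) == normalise_identifier(precomposed))
```

Without this step the two spellings of Māori would be two distinct variables or atoms that most fonts render identically.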
Variables in scripts that do not distinguish letter case have to
begin with _some_ special character to ensure that they are not
mistaken for unquoted atoms. There are 10 Pc characters in the Basic
Multilingual Plane. The Erlang parser treats a variable beginning
with an underscore specially: there will be no complaint if it is a
singleton. There are 9 other Pc characters for which this special
treatment is not applied. Of course, someone might be using fonts
that do include say Arabic letters but not say the undertie. We can
deal with that by revising the underscore rule.
    Variable does not begin with a Pc character =>
        should not be a singleton.
    Variable is just a Pc character and nothing else =>
        is a wild card.
    Variable begins with a Pc character followed by a
    Latin-1 character =>
        may be a singleton.
    Variable begins with a Pc character followed by
    a character outside the Latin-1 range =>
        should not be a singleton.
Thus ‿ is a wild-card, 隠者 is an atom, _隠者 should not be
a singleton, but __隠者 _may_ be a singleton. This rule is a
consistent generalisation of the existing rule.
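The four cases of the revised underscore rule can be written down as a small decision function. This is a Python sketch for illustration only: `singleton_status` is a hypothetical name, the input is assumed to already be a valid variable, and "Latin-1" is taken to mean a code point no greater than U+00FF.

```python
import unicodedata

def is_pc(ch):
    """True if `ch` is a connector punctuation character (category Pc)."""
    return unicodedata.category(ch) == "Pc"

def singleton_status(variable):
    """Classify a variable name under the revised underscore rule."""
    if not variable:
        raise ValueError("empty variable name")
    if not is_pc(variable[0]):
        return "should not be a singleton"
    if len(variable) == 1:
        return "wild card"
    if ord(variable[1]) <= 0xFF:          # second character is Latin-1
        return "may be a singleton"
    return "should not be a singleton"

for name in ["X", "_", "‿", "_Unused", "_隠者", "__隠者"]:
    print(name, "=>", singleton_status(name))
```

Note that __隠者 lands in the "may be a singleton" case only because its second character is itself the Latin-1 low line.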
Unquoted atoms
unquoted_atom ::= atom_start atom_continue*
atom_start ::= XID_Start \ (Lu ∪ Lt ∪ Lo ∪ Pc)
| "." (Ll ∪ Lo)
atom_continue ::= XID_Continue ∪ "@"
| "." (Ll ∪ Lo)
Again the choice of XID follows Python, and ensures that the
normalisation of an unquoted atom is still an unquoted atom.
Unquoted atoms should be normalised.
The details of Erlang unquoted atoms are somewhat subtle; I have
checked my understanding experimentally.
Keywords
Keywords have the form of unquoted atoms. No new keywords are
introduced.
Specifics
- Any Python identifier or keyword is
an Erlang variable or unquoted atom or keyword.
- @ signs may occur freely in variables and unquoted atoms except as the
first character, as now.
- dots may not be followed by capital letters, digits, or underscores,
as now.
- I am not sure whether modifier letters should be allowed after a dot.
- I am not sure what to do with the Other_ID_Start characters.
Script capital p _looks_ like a capital p and even has "capital" in
its name. All other "* SCRIPT CAPITAL *" characters are upper case
letters. Surely it should be allowed to start a variable.
The estimated sign looks like an enlarged lower case e; other symbols
that look like letters are classified as letters. You'd expect this
to begin an atom. As for the Katakana-Hiragana voicing marks, I have
no intuition whatever. Assigning the whole group to atoms seems
safest.
- All existing variable names and unquoted atoms remain legal, and no
new variable or atom forms using only Latin-1 characters have been
introduced.
Trouble spot
If you don't remove unused/experimental features then your language will slowly accumulate crud and eventually become a right mess of "features". I haven't got quite as far as removing something every time you add something but close. I think the problem is that people seem to misunderstand the meaning of "experimental" and expect them to remain even if the experiment fails.
Robert
I see that there are two (at least) problems here:
1. People do not realise, or want to realise, that experiments are just experiments and they might be deemed to have failed and be removed. Even if I love them enough other people might not so they disappear. Going hard-line here I would say "tough, that's YOUR problem, we told you it was an experiment".
2. In this case I think they did the wrong thing. However you look at it pmods were the real "thing" being tested and the tuple modules were just a quick hack to allow you to test the idea; they weren't, or shouldn't have been considered to have been, a separate thing in their own right. So the choice should have been: pmods yes or no? If no then throw them AND their hack test implementation away. If yes, then keep them and do them properly by creating an opaque data type with a set of well defined ways of interacting with it, and throw away the hack test implementation.
This is how it was done with funs: they originated as a tuple with undefined "stuff" in them and later they were changed to an opaque data type. I don't know if anyone ever seriously tried to use the internal structure of the tuple but if they did they learned the hard way. IMAO this is what should have been done with pmods.
I will admit that I would probably not be popular if I did this. But I do believe that this is what you have to do in the long run. Some people on the bleeding edge will get burned, but that is the price you pay for being there. Otherwise you do accumulate crud in your language. And I do see tuple modules as crud while pmods done properly wouldn't have been.
Is there anyone I have managed NOT to rile?
Robert
http://www.erlang.org/eeps/eep-0040.html
It would be nice if Richard could check it and see if anything
got lost in translation. Especially the odd line "Trouble Spot"
is not supposed to be there, I guess, but I kept it.
/ Raimo Niskanen, as EEP editor
--
/ Raimo Niskanen, Erlang/OTP, Ericsson AB
Also non-latin users who don't know English should be able to use
atoms and variables they understand.
On Mon, Oct 22, 2012 at 7:08 AM, Yurii Rashkovskii <yra...@gmail.com> wrote:
> Richard,
>
> Please excuse my ignorance, but can you name a single good reason for
> non-latin atoms and variable names? From my personal point of view, this is
> a sure road to hell.
I mean, realistically, do we need unfamiliar characters to start
appearing in somebody's code who thinks it's a smart idea to do so?
What happened to the principle of the least astonishment?
Non-latin users who don't know English somehow manage to use English
function names and keywords. It's a slippery slope to suggest we need
to cater to everybody's wish to program in their native language.
Isn't English a lingua franca of software development that allows us
to communicate better? If it wasn't, think about how much emptier the space
would have been.
If it were still possible to submit EEPs in plain text,
this would be an EEP. If someone else would like to
package this up as an EEP and submit it (under their
name, mine, or both), feel free.
Forces:
(1) Support for Unicode continues to increase, with
minimal source code support about to arrive.
(2) Unicode variable names and unquoted atoms are not
here yet, so now is the time to settle on a design.
Richard,
Please excuse my ignorance, but can you name a single good reason for non-latin atoms and variable names? From my personal point of view, this is a sure road to hell.
On 22 October 2012 13:08, Yurii Rashkovskii <yra...@gmail.com> wrote:
> Please excuse my ignorance, but can you name a single good reason for non-latin atoms and variable names? From my personal point of view, this is a sure road to hell.
Because people frequently like to work in their own language instead of a foreign language they ill understand?
Jesus! How can so many smart people be so god-damned dumb over this issue?
So you're recommending them to use function names they don't understand but name variables in a way they will understand and nobody else will?
Could it be possible that I am able to communicate with you because of my passion for programming, which indirectly made me learn English so that I can understand it better?
On 22 October 2012 14:54, Yurii Rashkovskii <yra...@gmail.com> wrote:
> So you're recommending them to use function names they don't understand but name variables in a way they will understand and nobody else will?
I guess you could read that as my recommendation (if you can't read, that is).
> Could it be possible that I am able to communicate with you because of my passion for programming, which indirectly made me learn English so that I can understand it better?
If this is what you consider "communication" to be, Yurii, then I'd say that your 7-year-old's passion for programming didn't help you one iota.
>> So you're recommending them to use function names they don't understand but name variables in a way they will understand and nobody else will?
> I guess you could read that as my recommendation (if you can't read, that is).
This was a question. How are they supposed to be "using their own language" if 100% of Erlang itself is in English?
> If this is what you consider "communication" to be, Yurii, then I'd say that your 7-year-old's passion for programming didn't help you one iota.
This is rather inappropriate, although expectable from somebody who starts off calling his interlocutor stupid ;)
How can so many smart people be so god-damned dumb over this issue?
And regardless of where one falls on this issue, shouldn't it be
rather low on the priority list anyway? I'm thinking way below fixing
the stdlib, or fixing records, or perhaps improving text handling.
Can't stress this enough.
--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu