On Tue, Oct 23, 2012 at 9:45 AM, Ivan Uemlianin <i...@llaisdy.com> wrote:
> On 23/10/2012 06:38, Benoit Chesneau wrote:
>> Being private doesn't mean you won't need any help from external
>> programmers. And sometimes they are foreigners.
> So the team should write all module, function and variable names, and all
> comments and documentation in a language with which none of them are familiar,
> because one day they might need help from a foreigner?
Yes. After all, didn't you first learn maths syntax before using it?
>> imo programming in that context is like math. It should be
>> understandable by all. Now imagine if all maths were localized...
> Chinese (hanzi) is a lot like math, in that meaning and pronunciation are
> separated. We all understand "5", however differently we pronounce it. In
> exactly the same way, we can understand "列" to mean list* even though we all
> pronounce it differently (see en.wiktionary.org/wiki/列 for Mandarin,
> Cantonese, Japanese, Korean & Vietnamese pronunciations).
> imho, using hanzi for all module, function and variable names has a lot
> going for it. If you had a convention that module and function names should
> be two characters long, the results could be very aesthetically pleasing,
> e.g.:
> 词典:新生() %% dict:new()
> Best wishes
> Ivan
So hopefully I have the font installed on my machine to read it ... I think you have a choice anyway: living in your small world without any interaction with others on the code, or thinking more globally and using a common syntax.
And this isn't about English here, it's about a syntax. It may be unfortunate for some that it is English, but that is just history. Just like we are using number syntax from one culture and symbols from another in maths.
_______________________________________________
erlang-questions mailing list
erlang-questi...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions
> On Tue, Oct 23, 2012 at 10:22 AM, Michael Richter <ttmrich...@gmail.com>
> wrote:
> > On 23 October 2012 16:20, Michael Richter <ttmrich...@gmail.com> wrote:
> >> On 23 October 2012 13:38, Benoit Chesneau <bchesn...@gmail.com> wrote:
> >>> Now imagine if all maths were localized...
> >> They are. The notation used by my students in their maths books is
> >> completely different in many key areas from the notation I used in high
> >> school and university to the point that I can't read them.
> >> Oddly enough the world of maths hasn't exploded at the seams.
> > Note, too, that the words between the actual maths symbols which remain the
> > same aren't in English, so even if the maths notation was completely,
> > absolutely, 100% the same, the explanatory prose in between would still be
> > unreadable to an English reader. So, too, are the variables often in Hanzi
> > instead of the Latin or Greek alphabets.
> Maybe Chinese use hanzi, but how do they communicate with others around the
> world when it's about sharing their work or collaborating globally?
> They come back to the common syntax.
First, they don't publish everything they write in international journals.
You know, just like … well, everybody.
Second, they translate at need, often through professional translators or,
at the very least, they translate into (very rough) English themselves and
then pass that on for editing to a native speaker. (Evidence: that's my
side business. I edit translated scientific papers for journal
submissions.)
> I find it rather costly, to say the least, to do the work twice just because
> at first you are using a syntax only used in your part of the world,
> even if it's about some billions of people.
You know what's even more expensive? Having to do all of your work in a
language you don't have a firm grasp on. Having to slow down your thinking
to speak in L2 instead of going full-tilt in L1. Having to bend to the
will of a linguistic minority to do your work *even if that linguistic
minority will never see your work*.
I ask again: how can so many smart people be so fucking dumb on this point?
-- "Perhaps people don't believe this, but throughout all of the discussions
of entering China our focus has really been what's best for the Chinese
people. It's not been about our revenue or profit or whatnot."
--Sergey Brin, demonstrating the emptiness of the "don't be evil" mantra.
On 23 October 2012 16:26, Benoit Chesneau <bchesn...@gmail.com> wrote:
> And this isn't about english here it's about a syntax. It may be
> unfortunate for some that it is english but that is just history. Just
> like we are using number syntax from a culture and symbols from
> another in maths.
Straw man.
Nobody is saying Erlang's syntax should be changed to match locales.
They're saying that VARIABLE NAMES (lexis) and FUNCTION NAMES (lexis)
should permit people to work in languages they're familiar with.
> Maybe Chinese use hanzi, but how do they communicate with others around the
> world when it's about sharing their work or collaborating globally?
> They come back to the common syntax.
This is political. Just as the British, and then the Americans, stopped learning foreign languages and expected everyone else to use English, the Chinese will increasingly expect to be able to use Mandarin whenever abroad.
High-end retailers around the world are now hiring more Mandarin-speaking shop assistants. The same will happen with everything else.
> I find it rather costly, to say the least, to do the work twice just because
> at first you are using a syntax only used in your part of the world,
> even if it's about some billions of people.
I expect they'll be fine.
Best
Ivan
-- ============================================================
Ivan A. Uemlianin PhD
Llaisdy
Speech Technology Research and Development
"hilaritas excessum habere nequit"
(Spinoza, Ethica, IV, XLII)
============================================================
> ...
> And this isn't about english here it's about a syntax. It may be
> unfortunate for some that it is english but that is just history. Just
> like we are using number syntax from a culture and symbols from
> another in maths.
The syntax remains the same: erlang
词典:新生() %% erlang syntax
dict:new() %% erlang syntax
Note that erlang terms do not have the same meaning as similar English terms. The erlang term "list" does not have the same meaning as the English term "list". You might call them faux amis.
Best
Ivan
On Tue, Oct 23, 2012 at 5:33 PM, Ivan Uemlianin <i...@llaisdy.com> wrote:
> On 23/10/2012 09:25, Benoit Chesneau wrote:
>> Maybe Chinese use hanzi, but how do they communicate with others around the
>> world when it's about sharing their work or collaborating globally?
>> They come back to the common syntax.
> This is political. Just as the British, and then the Americans stopped
> learning foreign languages and expected everyone else to use English. The
> Chinese will increasingly expect to be able to use Mandarin whenever abroad.
Or not. One thing I've learned from living in a high-population
country with a language spoken nowhere else is that, for the most
part, it doesn't matter that most people can't use English to save
their lives. For the Dutch and the Finns, English might be mandatory.
For the Japanese? Not so much. I expect much the same to be true for
the Chinese. An economy's needs for interfacing linguistically with
the outside world do not grow linearly with its size. More like
according to the cube root of its size. This means there can be an
entire software industry for business apps that doesn't need
anything written in English at all.
On Oct 19, 2012, at 8:06 AM, Richard O'Keefe <o...@cs.otago.ac.nz> wrote:
> If it were still possible to submit EEPs in plain text,
> this would be an EEP. If someone else would like to
> package this up as an EEP and submit it (under their
> name, mine, or both), feel free.
[…] Snip the rest of the EEP proposal.
So now, I've taken the time to read through the proposal. In general I like it since it seems to be a conservative extension to what we already have. Yet, there are two points which I would like to have your opinion on:
Google Go takes two stances differently:
* There is *no* normalization. This means that you can write the same symbol using one codepoint or with two code points combining into the same representation. Of course this is the conservative stance where it is expected that people do not do silly things. But my guess is that it is much easier to handle. Is there a specific reason to pick normalization, apart from the obvious one? I see some similarities to tabs vs spaces for indentation here.
* In Go, identifiers are exported if they begin with a codepoint in class Lu. This is also a very conservative stance since now your programs must use an Lu codepoint for variable names if we just ported that solution to Erlang. But it is quite simple again, and very easy to handle from a parser perspective.
I am not saying that the proposal is bad, mind you. I am just trying to get an opinion on the above two stances. I do feel the Pc class is an elegant way of handling backwards compatibility while still allowing some slack going forward.
Jesper Louis Andersen
Erlang Solutions Ltd., Copenhagen
On Tue, Oct 23, 2012 at 11:20 AM, Jesper Louis Andersen
<jesper.louis.ander...@erlang-solutions.com> wrote:
> Google Go takes two stances differently:
> * There is *no* normalization. This means that you can write the same symbol using one codepoint or with two code points combining into the same representation. Of course this is the conservative stance where it is expected that people do not do silly things. But my guess is that it is much easier to handle. Is there a specific reason to pick normalization, apart from the obvious one? I see some similarities to tabs vs spaces for indentation here.
These are the obvious reasons I can think of:
- It may not be easy for people to choose which normalization, or lack
of normalization is used by their preferred editor, or by their input
method. A piece of code not written by me can be in a normalization
state different from the one used by my editor, and to check it I must
examine the text at byte level, or use a tool, and it may be
impossible to establish with certainty.
- It's just crazy to not normalize the source text of a program, in
any language.
- Better: it's crazy to have unicode text not normalized to a known
form, in any application which does more than pass around the text
untouched.
> On 22/10/12 1:08 AM, Yurii Rashkovskii wrote:
>> Please excuse my ignorance, but can you name a single good reason for
>> non-latin atoms and variable names?
> Not everyone uses the Roman alphabet.
And those of us who *do* use the Latin alphabet may need characters outside the range not only of ASCII but Latin 1.
>> Isn't it a blessing that we all are using a fairly short and commonly
>> known alphabet
I already said this, but just for the benefit of certain people who may not have been paying attention:
I live in a country where there are two official languages. One is New Zealand Sign Language. The other is New Zealand Māori. Māori is written using the Roman alphabet, *with* five long vowels written using macrons, lower and upper case. As for that matter is Latin itself in some textbooks.
With me so far? (a) You cannot write a Latin textbook using ISO Latin 1. (b) Quite by coincidence, an official language of my country needs the same non-Latin-1 long vowel letters that Latin does. Next step.
It is the explicit policy of the University that employs me that students *MUST* be allowed to submit written work in Māori. If I were to insist that students submit work in a programming language that cannot accept and correctly process these ten extra *latin* vowels,
- I would have the University authorities chasing me
- joined by the Human Rights Commissioner
- and a host of lawyers
and I'd probably be posting my résumé to this list shortly afterwards.
This hasn't happened to anyone here yet because Java and Python and Ada and SWI Prolog and a host of other programming languages *can* handle these characters (whew, just in time), and if anyone made an issue of it I'd whip up a transliteration program, or very possibly say to use Scala.
I mean, this is not a new issue. InterLisp was handling wide characters back in the early 1980s, and Quintus Prolog not long afterwards. Guess what? The sky has _not_ fallen!
>> From my personal point of view, this
>> is a sure road to hell.
> Now imagine positions reversed with someone else.
>> How would you read these pieces of code:
>> Довж1 = length(Сп1)
>> [Г|Х]
Reading the _letters_ is no great problem; reading the _words_ leaves me scratching my head, but we have the same problem *without* leaving Latin 1:
Tá_Ag_Gach_Duine = amadán
Sad to say, I can only read that with a dictionary. (Well, I know 'amadán', because it crossed over into English.)
> imo programming in that context is like math. It should be understandable by all. Now imagine if all maths were localized...
I find that deeply ironic coming from someone with a name that sounds like French.
Maths *is* localised.
English-speakers write one thousand two hundred and thirty-four and
fifty-six hundredths as 1,234.56; French-speakers as 1.234,56.
English-speakers write the half-open interval from x inclusive to y exclusive
as [x,y); French-speakers as [x,y[.
There is a statistical technique called correspondence analysis which I have
often found extremely valuable, but it was developed in France, and the
English-language books describing it use French mathematical conventions
which I find *extremely* hard to follow.
(Don't get me started on the ways different countries drew electronic circuits.)
On 23/10/2012, at 10:20 PM, Jesper Louis Andersen wrote:
> Google Go takes two stances differently:
> * There is *no* normalization. This means that you can write the same symbol using one codepoint or with two code points combining into the same representation. Of course this is the conservative stance where it is expected that people do not do silly things. But my guess is that it is much easier to handle. Is there a specific reason to pick normalization, apart from the obvious one? I see some similarities to tabs vs spaces for indentation here.
Normalisation is a pain in the πρωκτος. The only thing worse is _not_ doing it. (As it happens, I am planning to rewrite the tokeniser of my Smalltalk system to accept Unicode -- the run-time already does -- and this is one of the issues I've been thinking about.)
I can see four options:
(1) say that different encodings of the same text are different
(2) leave it undefined whether they are different
(3) say that it's someone else's problem (like XML 1.0, which says "Characters in names should be expressed using Normalization Form C" but leaves it to the author to make it so)
(4) require normalisation.
The issue is a severely practical one: can two people with different editors edit the same source file? As you sapiently observe, this _is_ very like tabs vs spaces: your editor may think tabs are every 3 columns, but mine thinks they are every 8, and you didn't tell _me_ otherwise. (Again, my Smalltalk system discerns method and class boundaries using indentation, and it has paid off to enforce no-tabs-in-source-files at check-in.) Of the options above, it is only option (4) that makes multiple editors safe to use.
As it happens, I _have_ had the experience of typing exactly what I saw and having it fail to match, so I do not want to see anyone else suffering the same fate.
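As an editorial illustration of the "typed exactly what I saw and it failed to match" problem, here is a minimal Python sketch (Python rather than Erlang only so it can be run anywhere; the helper name `same_identifier` is made up for this example). It compares a precomposed and a decomposed spelling of "Māori" before and after normalising to NFC, which is one possible reading of option (4).

```python
import unicodedata

# Two spellings that render identically as "Māori".
precomposed = "M\u0101ori"   # U+0101 LATIN SMALL LETTER A WITH MACRON
decomposed = "Ma\u0304ori"   # "a" followed by U+0304 COMBINING MACRON


def same_identifier(a, b):
    """Compare two names after normalising both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Without normalisation the two spellings are different strings.
raw_equal = precomposed == decomposed

# With normalisation they compare equal.
normalised_equal = same_identifier(precomposed, decomposed)
```

Under options (1) to (3) a compiler would see two distinct names here; under option (4) it would see one.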
> * In Go, identifiers are exported if they begin with a codepoint in class Lu. This is also a very conservative stance since now your programs must use an Lu codepoint for variable names if we just ported that solution to Erlang. But it is quite simple again, and very easy to handle from a parser perspective.
Restriction to Lu is not an option for Erlang. We *have* to continue to allow "_" as well, which is a Pc character, not an Lu character. And if we allow _that_ Pc character, why not the others? They aren't used for anything else in Erlang.
We really have to allow Lt as well. It would be surpassing strange if Ǉudevit was a variable but ǈudevit was not. There are 31 "Lt" letters in Unicode 6. Of those, 27 are Greek. The other 4 exist for the sake of Croatian (which has an alphabet of 30 letters). As it happens, my maternal grandfather came from a small town not far from Dubrovnik. Do I want to be the one to tell 4.4 million people who look rather like Granddad Covič they can't write a variable name in their own language using their own letters? No, not really.
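For readers who want to check the Lu/Lt distinction themselves, the following Python sketch looks up the general category of the three cased forms of the "lj" digraph and collects every titlecase (Lt) code point. The variable names are invented for this example, and the totals depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# The three cased forms of the "lj" digraph:
#   U+01C7 LATIN CAPITAL LETTER LJ
#   U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J
#   U+01C9 LATIN SMALL LETTER LJ
lj_forms = ["\u01C7", "\u01C8", "\u01C9"]
lj_categories = {ch: unicodedata.category(ch) for ch in lj_forms}

# Every code point whose general category is Lt (titlecase letter).
titlecase_letters = [
    chr(cp)
    for cp in range(0x110000)
    if unicodedata.category(chr(cp)) == "Lt"
]

# How many of them have a name starting with "GREEK".
greek_titlecase = [
    ch for ch in titlecase_letters
    if unicodedata.name(ch).startswith("GREEK")
]
```

A rule that accepted only Lu as a variable start would accept a name beginning with U+01C7 and reject the same name beginning with U+01C8.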
From a lexical analyser perspective, scanning variable names requires just two character sets: things that can begin a variable and things that can continue one. How those sets are derived really has no effect whatever on how complicated the parsing is. Scanning unquoted atoms is admittedly tricky, but that's entirely down to Erlang's _existing_ treatment of "." and "@"; without those two to worry about we'd just have atom starts and atom continuations and again the derivation of the sets would make no difference to the scanner's complexity.
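The "two character sets" point can be sketched in a few lines of Python. The category sets below are assumptions chosen for illustration, not the sets defined in the proposal (which is snipped from this thread), and `scan_variable` is a hypothetical name. The scanning loop is the same whatever the sets contain.

```python
import unicodedata

# ASSUMED sets of Unicode general categories, for illustration only.
VAR_START = {"Lu", "Lt", "Pc"}
VAR_CONTINUE = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl", "Mn", "Mc", "Nd", "Pc"}


def scan_variable(text, pos=0):
    """Return the variable name starting at pos, or None if none starts there."""
    if pos >= len(text) or unicodedata.category(text[pos]) not in VAR_START:
        return None
    end = pos + 1
    # Consume characters for as long as they belong to the continuation set.
    while end < len(text) and unicodedata.category(text[end]) in VAR_CONTINUE:
        end += 1
    return text[pos:end]


cyrillic = scan_variable("\u0414\u043e\u0432\u04361 = length(X)")
underscore = scan_variable("_Acc, Rest")
titlecase = scan_variable("\u01C8udevit = 1")
atom_like = scan_variable("foo(Bar)")
```

Changing how the sets are derived changes two constants and nothing else, which is the sense in which the derivation has no effect on the scanner's complexity.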
> As it happens, I _have_ had the experience of typing exactly what I saw and having
> it fail to match, so I do not want to see anyone else suffering the same fate.
I spent a good while last week trying to debug some copy-paste code in a test suite, only to eventually realise that what *rendered* like a space in my tty (and emacs) was in fact an extended character. Let's just say that wasn't the first possibility that came to mind. Of course, *manually retyping* the expression rather than copy-pasting it would have helped me solve it quicker. Come to think of it, that kind of problem has occurred to me a handful of times before.
> +1. Outwith very specific circumstances allowing non-English code is
> dumb if for no other reason than that it will drastically reduce the pool
> of programmers that can be hired to work on your system.
I am truly impressed by the number of omniscient people interested in Erlang. Are there no projects where _enough_ programmers are hired, so that nobody _cares_ whether you can hire more foreign programmers or not? Are there no projects where the programmers have to understand the requirements as well as the code, and the requirements are not in English?
Plenty of other programming languages already allow this (SWI Prolog, for one). The sky has _not_ fallen. People outside North America have proven to be adults capable of making their own decisions about when to use English and when not to.
It's only a couple of days since I was marking an exam in which a dismaying proportion of 3rd year students, the majority allegedly native speakers of English, answered "how did you determine whether" as if it meant "how did you ensure that", so if they don't understand a program because the variable names are in Russian, that may be better than them _thinking_ that they understand the program but don't because it's all in English.
Note that RIGHT NOW programmers can use accented letters in Erlang atoms and variables. RIGHT NOW programmers can write in lots of languages other than English. They can only use the Latin *script* (or rather, a subset of it), but they can use quite a lot of *languages*. The "allowing non-English code" genie is already out of the bottle; that horse has already bolted; you are crying about keeping the lid on milk that's already spilled all over the floor; <insert proverb about futility here>. Latin 1 covers at least the following languages:
French (fr), Spanish (es), Catalan (ca), Basque (eu), Portuguese (pt), Italian (it), Albanian (sq), Rhaeto-Romanic (rm), Dutch (nl), German (de), Danish (da), Swedish (sv), Norwegian (no), Finnish (fi), Faroese (fo), Icelandic (is), Irish (ga), Scottish (gd), and English (en), Afrikaans (af) and Swahili (sw). I'm pretty sure you can add Frisian (fy). You can definitely add Malay (ms), Indonesian (id), Tok Pisin (tpi), and Pijin (cpe).
By what feat of straining do people accept the possibility of Malay in Erlang atoms and variable names but reject the possibility of Māori? Accept Italian but reject Latin?
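The Latin 1 boundary is easy to probe. This small Python sketch (the function name `fits_latin1` is made up here) simply tries to encode a name as ISO 8859-1 and reports whether that succeeds.

```python
def fits_latin1(text):
    """Return True if every character of text exists in ISO 8859-1 (Latin 1)."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


irish = fits_latin1("amad\u00e1n")                  # a with acute: in Latin 1
maori = fits_latin1("M\u0101ori")                   # a with macron: not in Latin 1
ukrainian = fits_latin1("\u0414\u043e\u0432\u0436")  # Cyrillic: not in Latin 1
```

The Irish word passes and the Māori one does not, although both are written in the Latin script.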
> And regardless of where one falls on this issue, shouldn't it be
> rather low on the priority list anyway?
Who said it _wasn't_?
> I'm thinking way below fixing the stdlib,
proposed long ago
> or fixing records,
proposed long ago
> or perhaps improving text handling.
proposed long ago (and allowing Unicode in atoms is *part* of improving text handling).
On 22/10/2012, at 10:41 PM, Henning Diedrich wrote:
> There is something to that, seriously.
> On Oct 22, 2012, at 8:55 AM, Valentin Micic <valen...@pixie.co.za> wrote:
>> Why can't we use colors to express equations?
Because about 10% of men are colour-blind to some degree.
Two of my colleagues here, for example, are red-green
colour-blind. Whenever a student of mine is writing a report
or thesis, I find myself telling them "if you make the
lines in your graph red and green like that, Dr X and Dr Y
won't be able to tell them apart."
In addition to people who can discriminate fewer colours,
there are of course people who can distinguish more than
most people because they have four kinds of cones instead
of the usual three. Colours that we think are the same,
they might think are obviously different.
> On 25 Oct 2012, at 06:20, Richard O'Keefe wrote:
>> As it happens, I _have_ had the experience of typing exactly what I saw and
>> having it fail to match, so I do not want to see anyone else suffering the
>> same fate.
> I spent a good while last week trying to debug some copy-paste code in a test
> suite, only to eventually realise that what *rendered* like a space in my tty
> (and emacs) was in fact an extended character. Let's just say that wasn't the
> first possibility that came to mind. Of course, *manually retyping* the
> expression rather than copy-pasting it would have helped me solve it quicker.
> Come to think of it, that kind of problem has occurred to me a handful of
> times before.
That is exactly the sort of problem that I fear from the introduction of more and
more characters.
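As a hedged aside on the look-alike whitespace problem: a check of roughly the following shape could stand in for the hex viewer. This is a Python sketch with an invented helper name, and flagging every non-ASCII character is only one possible policy (a project that uses Unicode identifiers would want a narrower one).

```python
import unicodedata


def suspicious_characters(line):
    """List (column, code point, name) for every non-ASCII character in line."""
    return [
        (col, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for col, ch in enumerate(line, start=1)
        if ord(ch) > 0x7F
    ]


# A pasted line where the "space" after X is really U+00A0 NO-BREAK SPACE.
pasted = "X\u00A0= foo()"
findings = suspicious_characters(pasted)

# The retyped line, using an ordinary space, produces no findings.
retyped = suspicious_characters("X = foo()")
```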
On Oct 26, 2012, at 4:51 AM, "Richard O'Keefe" <o...@cs.otago.ac.nz> wrote:
> By what feat of straining do people accept the possibility of
> Malay in Erlang atoms and variable names but reject the possibility
> of Māori? Accept Italian but reject Latin?
I am happy to learn atoms could be using Umlauts and I will be working very hard to keep myself from using that /in code/.
The first reason being that it is likely to break something outside of Erlang. As mentioned.
The second, new one, that sadly it is too complicated for me to make sure that I'll never have a Latin-1-Unicode trip-up somewhere in my IDEs or tool chains and then all Umlauts suddenly will have been broken, transcoded into something else.
For that reason I have regretted using Umlauts /in comments/ too many times. Someone in a project always breaks it and then nobody cares, there's no time, and it creates pseudo-diffs.
As a third (true) horror I'll add Ulf's pseudo-whitespace experience to the list. I am in agony already over the days lost in the future due to someone inserting a Unicode look-alike into code that I cannot spot until I re-type the entire seemingly cursed code that-should-work-but-magically-doesn't. And have hex-view ready at my finger tips again to inspect awkward code. Thanks so much for the nightmare.
The possibility, thus, appears worthless to this non-English speaker. And for that reason, my thinking goes: why extend the possibility and make it more, yes, 'dangerous'?
I appreciate it's a done deal (horse, milk etc). And most of all that there are different circumstances, e.g. in academia obviously.
As an aside, I think I still don't believe what I understood there though: that a programming language could be banned on grounds of political incorrectness?
Is it possible that those rules are wrong and banning a programming language for being, what, culturally biased, is over the top?
On 22/10/2012, at 11:45 PM, Yurii Rashkovskii wrote:
> Also, consider this: there are characters that look the same but encoded differently.
You did read the part of the proposal that said to normalise?
> We don't need to go far: the colon character.
Which is not used in variable names or unquoted atoms, and is therefore outside the scope of the proposal.
> Let's suppose Erlang's "native" colon is U+003A. Now, there are at least three characters that look very similar: U+A789, U+2236, U+05C3. Now you can produce code that will confuse the hell out of you. Which colon is the right colon?
U+003A = COLON
Hard to believe that Erlang was invented in Sweden, isn't it? From the Wikipedia page on 'Colon_(punctuation)':
Word-medial separator
In Finnish and Swedish, the colon can appear inside words in a manner similar to the apostrophe in the English possessive case, connecting a grammatical suffix to an abbreviation or initialism, a special symbol, or a digit (e.g., Finnish USA:n and Swedish USA:s for the genitive case of "USA", Finnish %:ssa for the inessive case of "%", or Finnish 20:een for the illative case of "20").
Abbreviation
In Swedish, the colon is used in contractions, such as S:t for Sankt (Swedish for "Saint"), e.g., in the Stockholm metro station S:t Eriksplan. This can even occur in people's names, for example Antonia Ax:son Johnson (Ax:son for Axelson). The colon was also used to mark abbreviations in early modern English.
U+05C3 = HEBREW PUNCTUATION SOF PASUQ (end-of-"verse" cantillation mark)
If people start incorporating portions of the Torah in their Erlang code and notating the whole for chanting, we could be in real trouble. Until then, not.
It's in class 'Po', which the proposal before us doesn't use. It would remain *illegal* in Erlang.
U+2236 = RATIO (mathematics)
I've lived my whole educated life not distinguishing this in any way from plain old colon; it's not clear to me what if anything would stop Erlang *actively not caring* which one you use.
It's in class 'Sm', which the proposal before us doesn't use. It would remain *illegal* in Erlang.
U+A789 = MODIFIER LETTER COLON.
This one looks tricky. The Wikipedia lists about a dozen languages that use this to indicate tone. UAX#31 says
Modifier letters (General_Category=Lm) are also included in the definition of the syntax classes for identifiers. Modifier letters are often part of natural language orthographies and are useful for making word-like identifiers in formal languages. On the other hand, modifier symbols (General_Category=Sk), which are seldom a part of language orthographies, are excluded from identifiers.
So does the proposal before us require, for the sake of Budu, foo꞉bar as an identifier?
Actually, NO. It's *called* "MODIFIER LETTER COLON", but it is *classified* as a modifier symbol (Sk), and as such, explicitly excluded from Unicode identifiers. It's not currently used for anything in Erlang, so it would remain *illegal* in Erlang.
In short, of the four colon-like characters mentioned, there is one and only one which would be allowed in Erlang by the proposal before us.
This is just FUD.
The "colon problem" is NOT GOING TO HAPPEN. The sky is still not falling.
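The classification of the four colon-like characters can be checked directly against the Unicode character database. The Python sketch below does so; the set `IDENTIFIER_CATEGORIES` is an assumption made for this illustration (letter, mark, digit and connector categories) and is not quoted from the proposal.

```python
import unicodedata

COLON_LIKE = ["\u003A", "\u05C3", "\u2236", "\uA789"]

# ASSUMED identifier categories, for illustration only.
IDENTIFIER_CATEGORIES = {
    "Lu", "Ll", "Lt", "Lm", "Lo", "Nl", "Mn", "Mc", "Nd", "Pc",
}

# Map "U+XXXX" to (character name, general category).
report = {
    f"U+{ord(ch):04X}": (unicodedata.name(ch), unicodedata.category(ch))
    for ch in COLON_LIKE
}

# Colon look-alikes that could appear inside an identifier under the assumed sets.
allowed_in_identifiers = [
    code for code, (_, category) in report.items()
    if category in IDENTIFIER_CATEGORIES
]
```

None of the four falls into a letter, mark, digit or connector category, so none could turn up inside an unquoted atom or variable name under rules of this shape.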
On Fri, Oct 26, 2012 at 12:00 PM, Richard O'Keefe <o...@cs.otago.ac.nz> wrote:
> On 22/10/2012, at 10:41 PM, Henning Diedrich wrote:
>> There is something to that, seriously.
>> On Oct 22, 2012, at 8:55 AM, Valentin Micic <valen...@pixie.co.za> wrote:
>>> Why can't we use colors to express equations?
> Because about 10% of men are colour-blind to some degree.
[snip]
So easily solved: just develop an assisting interface that generates odors!
My work here is done.
Regards,
Michael Turner
Project Persephone
1-25-33 Takadanobaba
Shinjuku-ku Tokyo 169-0075
(+81) 90-5203-8682
tur...@projectpersephone.org
http://www.projectpersephone.org/
"Love does not consist in gazing at each other, but in looking outward
together in the same direction." -- Antoine de Saint-Exupéry
> This might be off-topic now, but rather than modifying the core of a
> language for something that is of no clear benefit 99.99% of the time,
> one would be better off configuring their editor to handle the case
> you've described.
The problem with Emacs support for symbols &c is that people using
Emacs see one thing and people using any other editor see another.
Using symbols for operators is nice, although in the Haskell café
mailing list and Haskell papers I've noted a frequent need for distfix
operators (like banana brackets).
However, it has little or nothing to do with the topic of this thread.
I am impressed with the precision of the 99.99% figure though.
Empirical measurements are *wonderful*: how was that one obtained?
> Just an idea: if you want to have Unicode in your code, there is always a working solution that doesn't require touching Erlang at all. You need to set up a pre-compile hook that runs a kind of parse-transform tool (PTT) that pre-processes the code and replaces the Unicode parts of it with the currently allowed set of chars. Of course in CRASH DUMPS you will see the replaced atoms, but the PTT can generate a replacement table so you can refer back to the original Unicode value.
> Yes, it is not very nice built-in Unicode support, but you can implement it right now! :-)
Indeed we can. But let us implement *the same thing*. And to do that,
we need a specification of what Unicode identifiers should actually look
like.
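To make the pre-processing idea concrete, here is a rough Python sketch of the renaming step alone. The escape scheme (`_uXXXX_`) and the names `ascii_alias` and `build_table` are invented for this illustration. A real tool would also have to keep Erlang's atom/variable distinction intact, guard against collisions with names that already contain such escapes, and agree on which identifiers are legal in the first place.

```python
def ascii_alias(name):
    """Spell a name with ASCII letters, digits and '_' only (hypothetical scheme)."""
    return "".join(
        ch if ch.isascii() and (ch.isalnum() or ch == "_")
        else f"_u{ord(ch):04x}_"
        for ch in name
    )


def build_table(names):
    """Map each generated alias back to the original name, e.g. for crash dumps."""
    return {ascii_alias(name): name for name in names}


alias = ascii_alias("M\u0101ori")
table = build_table(["M\u0101ori", "Count"])
original = table[alias]
```

Names that are already plain ASCII pass through unchanged, so only the Unicode identifiers need entries in the replacement table.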
On 22/10/2012, at 11:40 PM, Yurii Rashkovskii wrote:
> Imagine you don't know any English. Would writing code this way help you in any way: mwi34frrc:ehfeiurvhsdsdcd(VariableNameIUnderstand) ?
Except for the use of baStudlyCaps, yes, absolutely, certainly, sure,
but of course! Been there (a Simula program mostly in Swedish, a C program
with everything in an approximation of German; the French Pascal code I was
actually mostly able to understand), done that, outgrew the t-shirt.
*Something* I can understand is better than *nothing* I can understand.
On what planet would it not be?
> On Mon, Oct 22, 2012 at 2:29 PM, Fred Hebert <monon...@ferd.ca> wrote:
>> Telling these people "well just Learn English, that's what I did when I
>> needed to" isn't a valid way of doing things.
> This is very correct. However, I wonder how would anyone learn Erlang
> (for example) without knowing English,
From other programmers who have already learned it.
> I would also argue that today, as Open Source is extremely important,
> one should think twice about promoting coding styles that make code
> sharing much more difficult.
It is ALREADY the case that Erlang facilitates that.
Right now, nothing stops an Erlang programmer writing
comments, strings, atoms, and variables in Irish or Swahili.
Or Klingon, for that matter.
I'm sorry, but this is like a mother of six saying she
doesn't want to try sex.
Some degree of familiarity with the *letters* that a program is
written in does not mean that any part of it is intelligible.
On 23/10/2012, at 4:41 AM, Henning Diedrich wrote:
> But how many programmers do we really know who don't speak English?
For heaven's sake, even *COBOL* allows Unicode in identifiers these days!
(ISO/IEC 1989:20xx CD 1.2 (E) pp 61-62:
"Extended letters additional characters from the
repertoire of ISO/IEC 10646-1 used in formation of
user-defined words"
To be a bit clearer, that draft of the standard did not say that _all_
COBOL implementations _must_ support full Unicode identifiers but that
any COBOL implementation may admit additional Unicode 'letters'.)
The term "speak English" is, empirically, extremely vague.
I once needed to ask for directions in Paris.
The man I spoke to understood me.
Judging from his accent he was a native speaker of French.
I did not understand him.
Because I'd tried, he grabbed me, turned me around, and pointed.
If he'd written it down, I'd probably have got it.
Did I "speak French"?
As the majority of people throughout history have shown,
it is possible to speak a language, even fluently, without
being able to read or write it.
It's hard enough translating requirements into code without having
to translate languages as well!
On 26/10/2012, at 4:28 PM, Henning Diedrich wrote:
> As a third (true) horror I'll add Ulf's pseudo-whitespace experience to the list. I am in agony already over the days lost in the future due to someone inserting a Unicode look-alike into code that I cannot spot until I re-type the entire seemingly cursed code that-should-work-but-magically-doesn't. And have hex-view ready at my finger tips again to inspect awkward code. Thanks so much for the nightmare.
But (a) THERE IS NO PROPOSAL TO ALLOW EXTRA KINDS OF SPACE IN ERLANG! And (b) the problem is not with there _being_ extra kinds of space character, but with their not being _treated as_ space characters. This is why _partial_ support for Unicode is a bad thing.
> As an aside, I think I still don't believe what I understood there though: that a programming language could be banned on grounds of political incorrectness?
Not that it can be *banned*, but that it cannot be *required* for any assessed work. Nobody says I can't use whatever I like. But there may very well be limits on what I can ask students to use. I would rather let that sleeping dog lie while the potential problem goes away.
> Is it possible that those rules are wrong and banning a programming language for being, what, culturally biased, is over the top?
> I still hope I read that wrong.
Respect for the principles of Te Tiriti o Waitangi is part of the law of this country. Article the second reads (in a back translation of the Maori text into English):

    The Queen of England agrees to protect the Chiefs, the subtribes and all the people of New Zealand in the unqualified exercise of their chieftainship over their lands, villages and all their treasures

and Te Reo is these days regarded as a taonga of the tangata whenua (a treasure of the [native] people of the land). [Every word in this paragraph counts as 'English' here...]
I'm sufficiently distressed by the continuing replacement of New Zealand English by American that I have strong sympathy with people wanting to keep Māori alive and functioning in all modern contexts, so I _want_ to let students use Māori.
But in addition to that, the University has a clear policy.
Note in particular Principle 1:

    In recognition of the status of te reo Māori as a taonga protected under the Treaty of Waitangi, and within the spirit of the Māori Language Act 1987, the University of Otago will endorse the right of students and staff to use te reo Māori, *including for assessment*.
I really don't want to ask for an official decision lest the answer be "no".
I would expect any country with one or more minority languages whose speakers got a sufficient degree of legal protection to have similar policies.