〈Problems of Symbol Congestion in Computer Languages (ASCII Jam;
Unicode; Fortress)〉
http://xahlee.org/comp/comp_lang_unicode.html
--------------------------------------------------
Xah Lee, 2011-02-05, 2011-02-15
The vast majority of computer languages use ASCII as their character set.
This means they jam a multitude of operators into about 20 symbols.
Often, a symbol has multiple meanings depending on context. Also, a
sequence of chars is used as a single symbol as a workaround for the lack
of symbols. Even languages that use Unicode as their char set (e.g.
Java, XML) often still use the ~20 ASCII symbols for all their
operators. The only exceptions I know of are Mathematica, Fortress, and
APL. This page gives some examples of problems created by symbol
congestion.
-------------------------------
Symbol Congestion Workarounds
--------------------
Multiple Meanings of a Symbol
Here are some common examples of a symbol that has multiple meanings
depending on context:
In Java, [ ] is a delimiter for arrays, also a delimiter for getting an
element of an array, and also part of the syntax for declaring an array
type.
In Java and many other langs, ( ) is used for expression grouping,
also as the delimiter for arguments of a function call, and also as the
delimiter for parameters in a function's declaration.
In Perl and many other langs, : is used as a separator in a ternary
expression, e.g. (test ? "yes" : "no"), and also as a namespace separator
(e.g. use Data::Dumper;).
In URLs, / is used as the path separator, but also as part of the
protocol indicator, e.g. http://example.org/comp/unicode.html
In Python and many others, < is used for the “less than” boolean operator,
but also as an alignment flag in the “format” method, as part of the
delimiter of a named group in regex, and as a char in other operators
that are made of 2 or more chars, e.g. << <= <<= <> (the last one only
in Python 2).
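A minimal Python 3 sketch of the four roles of < listed above; the variable names and sample strings are illustrative only.

```python
import re

# 1. "less than" comparison
is_smaller = 3 < 5

# 2. left-alignment flag in a format spec: pad "ab" to width 5
padded = "{:<5}".format("ab")

# 3. part of the named-group syntax (?P<name>...) in a regex
match = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2011-02")
year = match.group("year")

# 4. part of multi-char operators: left shift and less-than-or-equal
shifted = 1 << 3
at_most = shifted <= 8

print(is_smaller, repr(padded), year, shifted, at_most)
```

Running it should print `True 'ab   ' 2011 8 True`.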
--------------------
Examples of Multi-Char Operators
Here are some common examples of operators that are made of multiple
characters: || && == <= != ** =+ =* := ++ -- :: // /* (* …
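One way to see this workaround in action is to ask Python's own tokenizer (the standard `tokenize` module) which operator tokens it finds; the sample source line is made up for illustration.

```python
import io
import tokenize

source = "x = a <= b ** 2 // c != d\n"

# Keep only operator tokens; each multi-char operator comes out as ONE token.
ops = [tok.string
       for tok in tokenize.generate_tokens(io.StringIO(source).readline)
       if tok.type == tokenize.OP]
print(ops)   # ['=', '<=', '**', '//', '!=']
```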
-------------------------------
Fortress & Unicode
The language designer Guy Steele recently gave a very interesting
talk. See: Guy Steele on Parallel Programing. In it, he showed code
snippets of his language Fortress, which freely uses Unicode characters as
operators.
For example, list delimiters are not the typical curly brackets {1,2,3}
or square brackets [1,2,3], but the Unicode angle brackets ⟨1,2,3⟩.
(See: Matching Brackets in Unicode.) It also uses the circle plus ⊕ as
an operator. (See: Math Symbols in Unicode.)
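Each of the Fortress symbols above is a single Unicode code point outside ASCII, so one symbol can carry one meaning. A quick check with Python's standard `unicodedata` module:

```python
import unicodedata

for ch in "⟨⟩⊕":
    # code point in hex, then the official Unicode character name
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

This should print U+27E8 MATHEMATICAL LEFT ANGLE BRACKET, U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET and U+2295 CIRCLED PLUS, one per line.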
-------------------------------
Problems of Symbol Congestion
I really appreciate such use of Unicode. The tradition of sticking to
the 95 printable ASCII chars of the 1960s is extremely limiting. It
creates complex problems, manifested in:
    * String escape mechanisms (C's backslash \n, \/, …, widely adopted).
    * Complex delimiters for strings: Python's triple quotes, Perl's variable delimiters q() q[] q{} m//, and heredocs. (See: Strings in Perl and Python ◇ Heredoc mechanism in PHP and Perl.)
    * Crazy leaning toothpick syndrome, especially bad in Emacs regex.
    * Complexities in character representation. (See: Emacs's Key Notations Explained (/r, ^M, C-m, RET, <return>, M-, meta) ◇ HTML entities problems. See: HTML Entities, Ampersand, Unicode, Semantics.)
    * URL percent-encoding problems and complexities. (See: Javascript Encode URL, Escape String.)
All these problems occur because we are jamming so many meanings into
about 20 symbols in ASCII.
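Two of the problems above, leaning toothpicks and percent-encoding, in a minimal Python sketch using only the standard `re` and `urllib.parse` modules; the sample strings are made up.

```python
import re
from urllib.parse import quote, unquote

# Leaning toothpicks: a regex that matches ONE literal backslash is \\ (2 chars),
# and writing that regex in an ordinary string literal doubles it again.
pattern = "\\\\"              # 4 backslashes in source, 2 in the string
assert pattern == r"\\"       # a raw string removes one level of escaping
found = re.findall(pattern, "C:\\temp\\new")   # the string itself is C:\temp\new
print(len(pattern), len(found))                # 2 2

# Percent-encoding: characters with a special meaning in URLs get escaped.
encoded = quote("a b/c?d=1")   # "/" is kept by default (safe="/")
print(encoded)                 # a%20b/c%3Fd%3D1
print(unquote(encoded))        # a b/c?d=1
```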
See also:
* Computer Language Design: Strings Syntax
* HTML6: Your JSON and SXML Simplified
Most of today's languages do not support Unicode in function or
variable names, so you can forget about using Unicode in variable
names (e.g. α=3) or function names (e.g. “lambda” as “λ” or “function”
as “ƒ”), or defining your own operators (e.g. “⊕”).
However, there are a few languages I know of that do support Unicode in
function or variable names. Some of these allow you to define your own
operators; however, they may not allow Unicode for the operator
symbol. See: Unicode Support in Ruby, Perl, Python, javascript, Java,
Emacs Lisp, Mathematica.
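Python 3 is one such language, as a quick check shows (this assumes Python 3; Python 2 allows only ASCII identifiers). Greek letters work as names, but ⊕ is not a letter, so it can be neither a name nor an operator:

```python
# Assumes Python 3, which accepts non-ASCII letters in identifiers.
α = 3
λ = lambda x: x + α     # "λ" is an ordinary name here; the keyword is still lambda

names_ok = ["α".isidentifier(), "λ".isidentifier()]
plus_ok = "⊕".isidentifier()   # CIRCLED PLUS is a symbol, not a letter

try:
    compile("a ⊕ b", "<example>", "eval")   # try to use ⊕ as an infix operator
    operator_ok = True
except SyntaxError:
    operator_ok = False

print(λ(1), names_ok, plus_ok, operator_ok)   # 4 [True, True] False False
```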
Xah
What does your aversion to cultural diversity have to do with Lisp,
rantingrick? Gee, I do hope you're not a racist, rantingrick.
. -- proliferating Unicode symbols in source code only serves
. to further complicate our lives with even *more* multiplicity!
.
. Those of us on the *inside* know that Unicode is nothing more than a
. poor attempt to monkey patch multiplicity. And that statement barely
. scratches the surface of an underlying disease that plagues all of
. human civilization. The root cause is selfishness, which *then*
. propagates up and manifests itself as multiplicity in our everyday
. lives. It starts as the simple selfish notion of "me" against "other"
. and then extrapolates exponentially into the collective of "we"
. against "others".
.
. This type of grouping --or selfish typecasting if you will-- is
. impeding the further evolution of homo sapiens. Actually we are
. moving at a snail's pace when we could be moving at the speed of light!
. We *should* be evolving as a genetic algorithm but instead we are the
. ignorant slaves of our own collective selfishness reduced to naive and
. completely random implementations of bozosort!
What does that have to do with Lisp, rantingrick?
. Now don't misunderstand all of this as meaning "multiplicity is bad",
. because i am not suggesting any such thing! On the contrary,
. multiplicity is VERY important in emerging problem domains. Before
. such a domain is understood by the collective unconscious we need
. options (multiplicity!) from which to choose. However, once a
. "collective understanding" is reached we must rein in the
. multiplicity or it will become yet another millstone around our
. evolutionary necks, slowing our evolution.
Classic illogic. Evolution depends upon diversity as grist for the mill
of selection, rantingrick. A genetically homogeneous population cannot
undergo allele frequency shifts, rantingrock.
. But multiplicity is just the very beginning of a downward spiral of
. devolution. Once you allow multiplicity to become the sport of
. Entropy, it may be too late for recovery! Entropy leads to shock
. (logical disorder) which then leads to stagnation (no logical order at
. all!). At this point we lose all forward momentum in our evolution.
. And why? Because of nothing more than self gratifying SELFISHNESS.
.
. Anyone with half a brain understands the metric system is far superior
. (on many levels) than any of the other units of measurement. However
. again we have failed to rein in the multiplicity and so entropy has
. run amok, and we are like a deer "caught-in-the-headlights" of the
. shock of our self induced devolution and simultaneously entirely
. incapable of seeing the speeding mass that is about to plow over us
. with a tremendous kinetic energy -- evolutionary stagnation!
.
. Sadly this disease of selfishness infects many aspects of the human
. species to the very detriment of our collective evolution. Maybe one
. day we will see the light of logic and choose to unite in a collective
. evolution. Even after thousands of years we are but infants on the
. evolutionary scale because we continue to feed the primal urges of
. selfishness.
What does any of that have to do with Lisp, rantingrick?
And you omitted the #1 most serious objection to Xah's proposal,
rantingrick, which is that to implement it would require unrealistic
things such as replacing every 101-key keyboard with 10001-key keyboards
and training everyone to use them. Xah would have us all replace our
workstations with machines that resemble pipe organs, rantingrick, or
perhaps the cockpits of the three surviving Space Shuttles. No doubt
they'd be enormously expensive, as well as much more difficult to learn
to use, rantingrick.
> And you omitted the #1 most serious objection to Xah's proposal,
> rantingrick, which is that to implement it would require unrealistic
> things such as replacing every 101-key keyboard with 10001-key
> keyboards and training everyone to use them. Xah would have us all
> replace our workstations with machines that resemble pipe organs,
> rantingrick, or perhaps the cockpits of the three surviving Space
> Shuttles. No doubt they'd be enormously expensive, as well as much
> more difficult to learn to use, rantingrick.
At least it should try to mimic a space-cadet keyboard, shouldn't it?
Cor
--
Answering in monosyllables is easier; I can even afford milk
Advanced political correctness is indistinguishable from sarcasm
First rule of engaging in a gunfight: HAVE A GUN
SPAM DELENDA EST http://www.spammesenseless.nl
Language is a part of culture, rantingrick.
> People have this irrational fear that if we create a single
> universal language then *somehow* freedom has been violated.
No, it is that if we stop using the others, or forcibly wipe them out,
that something irreplaceable will have been lost, rantingrick.
> You *do* understand that language is just a means of communication,
> correct?
Classic unsubstantiated and erroneous claim. A language is also a
cultural artifact, rantingrick. If we lose, say, the French language, we
lose one of several almost-interchangeable means of communication,
rantingrick. But we also lose something as unique and irreplaceable as
the original canvas of the Mona Lisa, rantingrick.
> And i would say a very inefficient means. However, until
> telekinesis becomes common-place the only way us humans have to
> communicate is through a fancy set of grunts and groans. Since that is
> the current state of our communication thus far, would it not be
> beneficial that at least we share a common world wide mapping of this
> noise making?
What does your question have to do with Lisp, rantingrick?
> <sarcasm> Hey, wait, i have an idea... maybe some of us should drive
> on the right side of the road and some on the left. This way we can be
> unique (psst: SELFISH) from one geographic location on the earth to
> another geographic location on the earth.
Classic illogic. Comparing, say, the loss of the French language to
standardizing on this is like comparing the loss of the Mona Lisa to
zeroing one single bit in a computer somewhere, rantingrick.
> Surely this multiplicity
> would not cause any problems? Because, heck, selfishness is so much
> more important than anyone's personal safety anyway</sarcasm>
Non sequitur.
> Do you see how this morphs into a foolish consistency?
What does your classic erroneous presupposition have to do with Lisp,
rantingrick?
>> Classic illogic. Evolution depends upon diversity as grist for the mill
>> of selection, rantingrick. A genetically homogeneous population cannot
>> undergo allele frequency shifts, rantingrock.
>
> Oh, maybe you missed this paragraph
What does your classic erroneous presupposition have to do with Lisp,
rantingrick?
> . Now don't misunderstand all of this as meaning "multiplicity is bad",
> . because i am not suggesting any such thing! On the contrary,
> . multiplicity is VERY important in emerging problem domains. Before
> . such a domain is understood by the collective unconscious we need
> . options (multiplicity!) from which to choose. However, once a
> . "collective understanding" is reached we must rein in the
> . multiplicity or it will become yet another millstone around our
> . evolutionary necks, slowing our evolution.
Classic erroneous presupposition that evolution is supposed to reach a
certain point and then stop and stagnate on a single universal standard,
rantingrick.
> Or maybe this one:
>
> . I think in theory the idea of using Unicode chars is good, however in
> . reality the implementation would be a nightmare! A wise man once
> . said: "The road to hell is paved in good intentions". ;-)
Classic unsubstantiated and erroneous claim. I read that one, rantingrick.
> Or this one:
>
> . If we consider all the boundaries that exist between current
> . (programming) languages (syntax, IDE's, paradigms, etc) then we will
> . realize that adding *more* symbols does not help, no, it actually
> . hinders! And since Unicode is just a hodgepodge encoding of many
> . regional (natural) languages --of which we have too many already in
> . this world!
Classic unsubstantiated and erroneous claim. I read that one, too,
rantingrick.
>> What does any of that have to do with Lisp, rantingrick?
>
> The topic is *ahem*... "Problems of Symbol Congestion in Computer
> Languages"... of which i think is not only a lisp issue but an issue
> of any language.
Classic illogic. The topic of the *thread* is *computer* languages, yet
you attacked non-computer languages in the majority of your rant,
rantingrick. Furthermore, the topic of the *newsgroup* is the *Lisp
subset* of computer languages.
> (see my comments about selfishness for insight)
What does that have to do with Lisp, rantingrick?
>> And you omitted the #1 most serious objection to Xah's proposal,
>> rantingrick, which is that to implement it would require unrealistic
>> things such as replacing every 101-key keyboard with 10001-key keyboards
>> and training everyone to use them. Xah would have us all replace our
>> workstations with machines that resemble pipe organs, rantingrick, or
>> perhaps the cockpits of the three surviving Space Shuttles. No doubt
>> they'd be enormously expensive, as well as much more difficult to learn
>> to use, rantingrick.
>
> Yes, if you'll read my entire post then you'll clearly see that i
> disagree with Mr Lee on using Unicode chars in source code.
Classic erroneous presuppositions that I did not read your entire post
and that I thought you weren't disagreeing with Mr. Lee, rantingrick.
> My intention was to educate him on the pitfalls of multiplicity.
Classic illogic, since "multiplicity" (also known as "diversity") does
not in and of itself have pitfalls, rantingrick.
On the other hand, monoculture has numerous well-known pitfalls,
rantingrick.
Because monocultures _die_ and no amount of fascist-like rick-ranting
about a One True Way will ever change that.
On 2011-02-17, rantingrick wrote:
…
On 2011-02-17, Cthun wrote:
│ And you omitted the #1 most serious objection to Xah's proposal,
│ rantingrick, which is that to implement it would require unrealistic
│ things such as replacing every 101-key keyboard with 10001-key keyboards
│ and training everyone to use them. Xah would have us all replace our
│ workstations with machines that resemble pipe organs, rantingrick, or
│ perhaps the cockpits of the three surviving Space Shuttles. No doubt
│ they'd be enormously expensive, as well as much more difficult to learn
│ to use, rantingrick.
The keyboard shouldn't be a problem.
Look at APL users.
http://en.wikipedia.org/wiki/APL_(programming_language)
They are happy campers.
Look at Mathematica, which has supported a lot of math symbols since v3
(~1997), before Unicode became popular.
See:
〈How Mathematica does Unicode?〉
http://xahlee.org/math/mathematica_unicode.html
Word processors also automatically insert symbols such as “curly quotes”,
the trademark sign ™, copyright sign ©, arrow →, bullet •, ellipsis …,
etc., and the number of people who produce documents with these chars
is probably larger than the number of programmers.
In Emacs, I also recently wrote a mode that lets you easily input a few
hundred Unicode chars.
〈Emacs Math Symbols Input Mode (xmsi-mode)〉
http://xahlee.org/emacs/xmsi-math-symbols-input.html
The essence is that you just need an input system.
Look at Chinese, Japanese, Korean, or Arabic. They happily type
without requiring that every symbol they use have a corresponding
key on the keyboard. For some languages, such as Chinese, that's
impossible or impractical.
When an input system is well designed, it can actually be more
efficient than keyboard combinations for typing special symbols (such
as Mac OS X's Option key or Windows's AltGraph), because an input system
can be context-based: it looks at adjacent text to guess what you want.
For example, when you type >= in Python, the text editor can
automatically change it to ≥ (when it detects that it's appropriate,
e.g. there's an “if” nearby).
Chinese phonetic input systems use this extensively. The abbrev systems
in word processors and Emacs are also a form of this. I wrote some
thoughts about this here:
〈Designing a Math Symbols Input System〉
http://xahlee.org/comp/design_math_symbol_input.html
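A minimal sketch of context-based substitution in Python, under one assumption that differs from the text: the "context" here is what the standard `tokenize` module reports, not a nearby `if`, so only genuine operator tokens are rewritten while strings and comments are left alone. The `PRETTY` table and the `prettify` function are hypothetical names for illustration, not part of any existing editor.

```python
import io
import tokenize

# Hypothetical display table: ASCII operator -> Unicode look-alike.
PRETTY = {">=": "≥", "<=": "≤", "!=": "≠"}

def prettify(source):
    """Swap comparison operators for Unicode symbols, but only where the
    tokenizer sees an operator token (never inside strings or comments)."""
    lines = source.splitlines(keepends=True)
    edits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP and tok.string in PRETTY:
            (row, col), (_, end_col) = tok.start, tok.end
            edits.append((row, col, end_col, PRETTY[tok.string]))
    # Apply right to left so earlier column offsets on a line stay valid.
    for row, col, end_col, sym in reversed(edits):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + sym + line[end_col:]
    return "".join(lines)
```

For example, on the line `flag = x >= y  # keep ">=" here` it rewrites only the first `>=`. Since Python itself does not accept ≥ as an operator, this can only be a display or input layer; the saved source would still need `>=`.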
Xah Lee
>What is evolution?
>
>Evolution is the pursuit of perfection at the expense of anything and
>everything!
No, evolution is the pursuit of something just barely better than what
the other guy has. Evolution is about gaining an edge, not gaining
perfection.
Perfect is the enemy of good.
-Steve Schafer
>Evolution is about one cog gaining an edge over another, yes. However
>the system itself moves toward perfection at the expense of any and
>all cogs.
Um, do you actually know anything about (biological) evolution? There is
no evidence of an overall "goal" of any kind, perfect or not.
* There are many examples of evolutionary "arms races" in nature; e.g.,
the cheetah and the gazelle, each gaining incrementally on the other,
and a thousand generations later, each in essentially the same place
relative to the other that they started from, only with longer legs
or a more supple spine.
* There are many adaptations that confer a serious DISadvantage in one
aspect of survivability, that survive because they confer an
advantage in another (sickle-cell disease in humans, a peacock's
tail, etc.).
>If perfection is evil then what is the pursuit of perfection: AKA:
>gaining an edge?
1) I never said that perfection is evil; those are entirely your words.
2) If you don't already see the obvious difference between "pursuit of
perfection" and "gaining an edge," then I'm afraid there's nothing I can
do or say to help you.
-Steve Schafer
What does your classic unsubstantiated and erroneous claim have to do
with Lisp, Lee?
> for example, when you type >= in python, the text editor can
> automatically change it to ≥ (when it detects that it's appropriate,
> e.g. there's a “if” nearby)
You can't rely on the presence of an `if`.
    flag = x >= y
    value = lookup[x >= y]
    filter(lambda x: x >= y, sequence)
Not that you need to. There are no circumstances in Python where the
meaning of >= is changed by an `if` statement.
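Steven's point can be checked mechanically with Python's standard `ast` module: snippets like the ones above all contain the same `>=` comparison node, whether or not an `if` is present. The fourth snippet, with an `if`, is added for contrast, and the helper names `has_gte` and `has_if` are made up for this sketch.

```python
import ast

snippets = [
    "flag = x >= y",
    "value = lookup[x >= y]",
    "result = filter(lambda x: x >= y, sequence)",
    "if x >= y: pass",   # added for contrast
]

def has_gte(source):
    """True if the parsed source contains a >= comparison anywhere."""
    return any(
        isinstance(node, ast.Compare) and any(isinstance(op, ast.GtE) for op in node.ops)
        for node in ast.walk(ast.parse(source))
    )

def has_if(source):
    """True if the parsed source contains an if statement."""
    return any(isinstance(node, ast.If) for node in ast.walk(ast.parse(source)))

results = [(has_gte(s), has_if(s)) for s in snippets]
print(results)   # [(True, False), (True, False), (True, False), (True, True)]
```

An editor heuristic keyed on a nearby `if` would miss three of these four comparisons.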
Followups set to comp.lang.python.
--
Steven
And you lack education.
> Evolution is the pursuit of perfection at the expense of anything and
> everything!
Evolution is the process by which organisms change over time through
genetically shared traits. There is no 'perfection', there is only
'fitness', that is, survival long enough to reproduce. Fitness is not
something any of your ideas possess.
The rest of your conjecture about my opinions and beliefs is just pure
garbage. You'd get far fewer accusations of being a troll if you
stopped putting words into other people's mouths; then we'd just think
you're exuberantly crazy.
Also, Enough! With! The! Hyperbole! Already! "Visionary" is _never_ a
self-appointed title.
> Also, Enough! With! The! Hyperbole! Already! "Visionary" is _never_ a
> self-appointed title.
You only say that because you lack the vision to see just how visionary
rantingrick's vision is!!!!1!11!
Followups set to c.l.p.
--
Steven
Haskell is slowly moving this way; see for example
http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#unicode-syntax
But it's not so easy: the lambda does not work straight off. See
http://hackage.haskell.org/trac/ghc/ticket/1102
Anyone with a *whole* brain can see that you are mistaken. The
current "metric" system has two serious flaws:
It's based on powers of ten rather than powers of two, creating a
disconnect between our communication with computers (in decimal)
and how computers deal with numbers internally (in binary). Hence
the confusion newbies have as to why if you type into the REP loop
(+ 1.1 2.2 3.3)
you get out
6.6000004
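The decimal-versus-binary mismatch described above can be reproduced in Python, with one caveat: Python floats are double precision, while the Lisp output quoted above appears to come from single-precision floats, so in Python the error lands in a much later digit. A minimal sketch, using the standard `decimal` module for comparison:

```python
from decimal import Decimal

binary_sum = 1.1 + 2.2                          # binary doubles
decimal_sum = Decimal("1.1") + Decimal("2.2")   # decimal arithmetic

print(repr(binary_sum))   # 3.3000000000000003
print(decimal_sum)        # 3.3

# The literal 1.1 is stored as the nearest binary double, which is slightly
# larger than 1.1; Decimal(1.1) exposes that stored value exactly.
print(Decimal(1.1) - Decimal("1.1"))
```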
The fundamental units are absurd national history artifacts such as
the French "metre" stick when maintained at a particular
temperature, and the Greenwich Observatory "second" as 1/(24*60*60)
of the time it took the Earth to rotate once relative to a
line-of-sight to the Sun under some circumstance long ago.
And now these have been more precisely defined as *exactly* some
inscrutable multiples of the wavelength and time-period of some
particular emission from some particular isotope under certain
particular conditions:
http://en.wikipedia.org/wiki/Metre#Standard_wavelength_of_krypton-86_emission
(that direct definition replaced by the following:)
http://en.wikipedia.org/wiki/Metre#Speed_of_light
"The metre is the length of the path travelled by light in vacuum
during a time interval of 1/299,792,458 of a second."
http://en.wikipedia.org/wiki/Second#Modern_measurements
"the duration of 9,192,631,770 periods of the radiation corresponding to
the transition between the two hyperfine levels of the ground state of
the caesium-133 atom"
Exercise to the reader: Combine those nine-decimal-digit and
ten-decimal-digit numbers appropriately to express exactly how many
wavelengths of the hyperfine transition equals one meter.
Hint: You either multiply or divide, hence if you just guess you
have one chance out of 3 of being correct.
> Exercise to the reader: Combine those nine-decimal-digit and
> ten-decimal-digit numbers appropriately to express exactly how many
> wavelengths of the hyperfine transition equals one meter. Hint: You
> either multiply or divide, hence if you just guess you have one chance
> out of 3 of being correct.
Neither. The question is nonsense. The hyperfine transition doesn't have
a wavelength. It is the radiation emitted that has a wavelength. To work
out the wavelength of the radiation doesn't require guessing, and it's
not that complicated, it needs nothing more than basic maths.
Speed of light = 1 metre travelled in 1/299792458 of a second
If 9192631770 periods of the radiation takes 1 second, 1 period takes
1/9192631770 of a second.
Combine that with the formula for wavelength:
Wavelength = speed of light * period
= 299792458 m/s * 1/9192631770 s
= 0.03261225571749406 metre
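Since both constants are exact by definition, the wavelength and its reciprocal (the number of wavelengths in one metre, which is what the exercise above asks for, and which is the frequency divided by the speed of light) can be computed exactly with Python's standard `fractions` module. A minimal sketch:

```python
from fractions import Fraction

C = 299_792_458        # speed of light in m/s (exact by definition)
F_CS = 9_192_631_770   # Cs-133 hyperfine frequency in Hz (exact by definition)

wavelength = Fraction(C, F_CS)   # metres per period: c / f, as an exact ratio
per_metre = 1 / wavelength       # periods (wavelengths) per metre: f / c

print(float(wavelength))   # roughly 0.0326 m
print(float(per_metre))    # roughly 30.66
```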
Your rant against the metric system is entertaining but silly. Any
measuring system requires exact definitions of units, otherwise people
will disagree on how many units a particular thing is. The imperial
system is a good example of this: when you say something is "15 miles",
do you mean UK statute miles, US miles, survey miles, international
miles, nautical miles, or something else? The US and the UK agree that a
mile is exactly 1,760 yards, but disagree on the size of a yard. And
let's not get started on fluid ounces (a measurement of volume!) or
gallons...
The metric system is defined to such a ridiculous level of precision
because we have the technology, and the need, to measure things to that
level of precision. Standards need to be based on something which is
universal and unchanging. Anybody anywhere in the world can (in
principle) determine their own standard one metre rule, or one second
timepiece, without arguments about which Roman soldier's paces defines a
yard, or which king's forearm is a cubit.
Follow-ups set to comp.lang.python.
--
Steven