Syntax Design: Use of Unicode Matching Brackets as Specialized Delimiters
From: Xah Lee <xah...@gmail.com>
Newsgroups: comp.lang.lisp,comp.emacs
Subject: Syntax Design: Use of Unicode Matching Brackets as Specialized Delimiters
Date: Tue, 10 May 2011 07:23:17 -0700 (PDT)
Message-ID: <f77cecf4-eb35-4631-ae05-f55d7402fe66@35g2000prp.googlegroups.com>
might be of interest to those interested in language design.
〈Syntax Design: Use of Unicode Matching Brackets as Specialized
Delimiters〉
http://xahlee.org/comp/unicode_brackets_use.html
text version follows.
──────────────────────────────
Syntax Design: Use of Unicode Matching Brackets as Specialized
Delimiters
Xah Lee, 2011-05-08
In my tech blogs, often i give instructions involving the graphical
menu. For example, i'd say: it's at the menu “File▸Open”. Today i
decided to use a special delimiter to indicate menu. The delimiter is
the unicode 〖WHITE LENTICULAR BRACKET〗. So, the menu would be written
as 〖File▸Open〗. I just spent a couple hours changing all mentions of
menu on my site to use the new delimiter.
Here's a summary of my usage of special unicode brackets:
ANGLE BRACKET. Article title. e.g. 〈Xah's Emacs Lisp Tutorial〉.
DOUBLE ANGLE BRACKET. Book title. e.g. 《Basic Economics》.
BLACK LENTICULAR BRACKET. Key combinations. e.g. 【Ctrl+c】.
WHITE LENTICULAR BRACKET. Menu. e.g. 〖File▸Open〗.
TORTOISE SHELL BRACKET. File names, path, url. e.g. 〔~/Documents/notes.txt〕.
CORNER BRACKET. Computer code, or math expression. e.g. 「x = 3;」.
ANGLE QUOTATION MARK. Indicator of semantic for a keyword/expression
in computer code. e.g. function ‹parameter name› = ‹expression›.
DOUBLE QUOTATION MARK. Generic delimiter. e.g. “something”.
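The list above can be written down as data, which also makes it easy to
look up the official Unicode name of each bracket. This is only a
sketch: the names DELIMITERS and describe are made up for illustration,
and the meanings are shortened from the list.

```python
import unicodedata

# (opening char, closing char, meaning assigned in the list above).
# DELIMITERS and describe() are hypothetical names for this sketch.
DELIMITERS = [
    ("〈", "〉", "article title"),
    ("《", "》", "book title"),
    ("【", "】", "key combination"),
    ("〖", "〗", "menu"),
    ("〔", "〕", "file name, path, url"),
    ("「", "」", "computer code or math expression"),
    ("‹", "›", "semantic placeholder in code"),
    ("“", "”", "generic delimiter"),
]


def describe(delimiters=DELIMITERS):
    """Return (codepoint, Unicode name, meaning) for each opening char."""
    rows = []
    for left, _right, meaning in delimiters:
        rows.append((f"U+{ord(left):04X}", unicodedata.name(left), meaning))
    return rows


if __name__ == "__main__":
    for row in describe():
        print(*row, sep="  ")
```

Keeping the pairs in one table means any later tool (extraction,
conversion to markup) can read the same definitions.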
──────────────────────────────
Why Are These Brackets Chosen?
There are many other brackets in unicode. (See: Matching Brackets in
Unicode.) I chose these brackets, and my uses of them, carefully. The
following are the reasons, in no particular order:
① It must be a fairly common character, so that most browsers,
editors, fonts, or other tools can display it.
② The meaning i assigned to it must be compatible with the
semantics given to the char in unicode.
All the brackets i've used are common ones. The “curly quote” and
‹angle quote› are widely used in western languages. The
〈〉《》【】〖〗「」〔〕
are used daily in Chinese and Japanese. (See: Intro to Chinese
Punctuation with Computer Language Syntax Perspectives.) These
languages are widely used in computing in China and Japan, and they
are also widely supported even in non-Asian countries.
If a font or tool has any support for unicode, these brackets are
probably among the top 100 or so chars supported.
──────────────────────────────
Is the Use of These Delimiters Necessary?
No, they are not necessary, but they provide meaningful info, as
visual enhancement and especially for computer processing.
For example, once you realize that the lenticular bracket 【Ctrl+x】 is
a marker for computer keyboard shortcut notation, you can easily
recognize all keys on the page at a glance. For a sample article with
these marks, see: How To Set Emacs's User Interface to Modern
Conventions.
For another example, with these markers, i can easily write a program
that extracts all book titles, computer key shortcuts mentioned,
program menus, or code snippets from my website articles (of a few
thousand files). Without these markers, the problem is non-trivial.
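A minimal sketch of such an extraction program, assuming each kind of
item is wrapped in its own bracket pair and the brackets are not nested.
The names KINDS and extract are hypothetical, chosen for this example.

```python
import re

# Bracket pair for each kind of item; names are illustrative only.
KINDS = {
    "book_title": ("《", "》"),
    "key": ("【", "】"),
    "menu": ("〖", "〗"),
    "code": ("「", "」"),
}


def extract(text, kind):
    """Return every string wrapped in the bracket pair for `kind`."""
    left, right = (re.escape(ch) for ch in KINDS[kind])
    # Match the shortest run that contains neither bracket of the pair.
    pattern = f"{left}([^{left}{right}]+){right}"
    return re.findall(pattern, text)


if __name__ == "__main__":
    sample = "Read 《Basic Economics》, then press 【Ctrl+c】 in 〖File▸Open〗."
    for kind in KINDS:
        print(kind, extract(sample, kind))
```

Running it over a site would just mean reading each file and calling
extract once per kind.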
Here's an example of the benefit of computer recognition: suppose in my
Emacs Tutorial, i want to add interactive annotation for all emacs key
shortcuts mentioned in the tutorial. (emacs has a few hundred key
shortcuts by default.) When the user hovers the mouse over an emacs key
shortcut in the article, it should show a pop-up box indicating the
associated name of the command. When keys are marked with a specific
delimiter for that purpose, such as 【Ctrl+x】, a program can trivially
identify all of them.
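The hover annotation could be sketched like this, using the HTML title
attribute so the browser shows the command name as a tooltip. The lookup
table KEY_COMMANDS and the function annotate_keys are assumptions for
illustration; a real table would need all the default bindings, and the
span class is borrowed from the markup sample later in this article.

```python
import html
import re

# Illustrative lookup table only; not a complete list of bindings.
KEY_COMMANDS = {
    "Ctrl+x Ctrl+f": "find-file",
    "Ctrl+x Ctrl+s": "save-buffer",
}

KEY_PATTERN = re.compile(r"【([^【】]+)】")


def annotate_keys(text, commands=KEY_COMMANDS):
    """Wrap each known 【key】 in a span whose title is the command name."""

    def wrap(match):
        command = commands.get(match.group(1))
        if command is None:
            # Unknown key: leave the text exactly as it was.
            return match.group(0)
        title = html.escape(command, quote=True)
        return (f'<span class="keyboard_shortcut" title="{title}">'
                f"{match.group(0)}</span>")

    return KEY_PATTERN.sub(wrap, text)


if __name__ == "__main__":
    print(annotate_keys("Press 【Ctrl+x Ctrl+f】 to open a file."))
```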
──────────────────────────────
What About Using HTML Markup Instead?
HTML markup is great. It serves the same purpose. I have dithered on
whether to use HTML markup, or special brackets in unicode, or a
mixture of both. I've experimented with that over the past 2
years. Right now, i use a mixture of both.
Here's a sample html markup snippet:
Computer code: <span class="computer_code">x = 3;</span>
Keys: <span class="keyboard_shortcut">Ctrl+c</span>
Book Title: <span class="book_title">Emacs Tutorial</span>
Here's a CSS definition that automatically makes a text colored, and
also inserts the brackets for display, for any text marked up with the
“code” tag:
code{color:red;font-family:"DejaVu Sans Mono",monospace}
code:before,code:after{color:black;background-color:white}
code:before{content:"「"}
code:after{content:"」"}
The advantage of HTML markup is that it's a more elaborate system. For
example, you can color the text, specify font, text size. You can add
brackets if you want. The markup is also more precise. For example,
<span class="book_title">…</span> is precise, while a bracket
《…》 could mean something else (just look at the page you are reading,
where the text inside that bracket is not necessarily a book title).
The disadvantage is that it's much more verbose, and makes the raw
source code much harder to read.
Right now, all my book titles, article titles, and computer code
snippets are marked up using HTML, with CSS adding the specialized
brackets as a visual cue.
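Moving from brackets in the text to HTML markup can be automated. Below
is a rough sketch that replaces bracketed spans with the span classes
from the sample above and drops the brackets, leaving CSS to add them
back. The names BRACKET_CLASSES and brackets_to_spans are hypothetical,
and it assumes plain-text input with no nested brackets.

```python
import html
import re

# Bracket pair -> class name used in the sample markup above.
BRACKET_CLASSES = {
    ("《", "》"): "book_title",
    ("【", "】"): "keyboard_shortcut",
    ("「", "」"): "computer_code",
}


def brackets_to_spans(text):
    """Turn bracket-delimited plain text into span-marked HTML."""
    # Escape &, <, > first so the plain text is safe to embed in HTML.
    out = html.escape(text, quote=False)
    for (left, right), css_class in BRACKET_CLASSES.items():
        pattern = re.compile(f"{left}([^{left}{right}]+){right}")
        out = pattern.sub(rf'<span class="{css_class}">\1</span>', out)
    return out


if __name__ == "__main__":
    print(brackets_to_spans("See 《Basic Economics》 and 「x = 3;」."))
```

The reverse direction is harder to get wrong, since the markup is
unambiguous, while a bracket in running text may not always mean what
the table says.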
──────────────────────────────
A Finer Point: Are Delimiter Brackets Semantically Meaningful or Just
for Visual Enhancement?
Suppose you use CSS. For example, a book title is wrapped up by html
tag like this:
<span class="book_title">The Story Of My Life</span>
and here's CSS code to add color:
span.book_title{color:red}
You can also add brackets:
span.book_title:before{content:"《"}
span.book_title:after{content:"》"}
So, if you want the text to be colored, you must use CSS. However, you
can add the bracket in the text without relying on CSS, like this:
《<span class="book_title">The Story Of My Life</span>》
The question for me was, should the bracket be part of the text or
added by CSS? Which format should i choose?
The answer depends on whether the bracket is considered just a visual
enhancement, or semantically meaningful. If it's just visual
enhancement, then it should be part of CSS (Cascading Style Sheets), as
implied by the word “style” in its name. When CSS is off, readers
won't see the bracket, and it doesn't matter. However, if the bracket
is considered semantically meaningful, then it should not be in CSS.
That way, it doesn't matter whether CSS is on or off, you still see the
bracket.
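The difference shows up as soon as a tool reads the page without
applying CSS. Here is a small sketch that pulls out only the text of a
snippet, roughly what a reader without CSS would see. The class TextOnly
and the function visible_text_without_css are made-up names for this
example.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data and ignore all tags and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text_without_css(markup):
    """Return the text a reader would see with no stylesheet applied."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    css_only = '<span class="book_title">The Story Of My Life</span>'
    in_text = '《<span class="book_title">The Story Of My Life</span>》'
    print(visible_text_without_css(css_only))
    print(visible_text_without_css(in_text))
```

With the bracket added by CSS, the extracted text has no bracket at all.
With the bracket in the text, it survives.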
There are opposing views on whether the bracket should be in text or
added by CSS.
① The brackets are semantically meaningful, thus should be part of
text. For example, in Chinese, book titles have angle brackets. They
are semantically meaningful, not just a decoration. In the same
way, western text involving matched pairs: “curly quotes”, «french
quote», or various brackets (paren), [square bracket], {braces}, are
almost always semantically meaningful. If you remove them, it affects
the text in major ways.
② A bracket in the text, when the text is already marked up, is
redundant. Therefore, in this view, one should not add the brackets in
the text. Even though CSS is meant for appearance, the fact is that
appearance, layout, and semantics are often intertwined to various
degrees. Positioning (layout) and sizes often add subtle but
non-trivial semantics to a page. In practice, probably a significant
percentage of web pages would become unreadable, or have their meaning
affected, if you turn off CSS, and in fact probably less than 0.01% of
pages are ever read without CSS. The bottom line of this reasoning is
that, if you use the HTML/CSS tech bundle, then you shouldn't add the
bracket in the text, because it's already precisely marked up. Just
let CSS add the bracket for you.
Right now i haven't decided which is “better”. More precisely, i think
one way might be better than the other once a more precise goal or
purpose is given. For now, for me, it doesn't matter much for the
purpose of online articles.
An example where it might matter is when defining a document
using XML, or when the article in HTML is the basis for a printed
publication that goes thru further processing. (For example, the finely
printed book A New Kind of Science is based on the Mathematica notebook
format. (See also: Notes on A New Kind of Science.) Some books are based
on HTML/CSS tech, for example Håkon Wium Lie's book. Some books are
based on unix's troff system (man pages). There are also QuarkXPress,
Adobe InDesign (PageMaker), DocBook, LaTeX, etc.)
✍
Matching Brackets in Unicode
HTML Entities, Ampersand, Unicode, Semantics
Problems of Symbol Congestion in Computer Languages (ASCII Jam; Unicode; Fortress)
Intro to Chinese Punctuation with Computer Language Syntax Perspectives
HTML6: Your JSON and SXML Simplified
The Writing Style on XahLee.org
The Moronicities of Typography
How to Create a APL or Math Symbols Keyboard Layout
The TeX Pestilence (or, the problems of TeX/LaTex)
Xah