
Syntax Design: Use of Unicode Matching Brackets as Specialized Delimiters


Xah Lee

May 10, 2011, 10:23:17 AM
might be of interest to those interested in language design.

〈Syntax Design: Use of Unicode Matching Brackets as Specialized
Delimiters〉
http://xahlee.org/comp/unicode_brackets_use.html

text version follows.

──────────────────────────────
Syntax Design: Use of Unicode Matching Brackets as Specialized
Delimiters

Xah Lee, 2011-05-08

In my tech blogs, often i give instructions involving the graphical
menu. For example, i'd say: it's at the menu “File▸Open”. Today i
decided to use a special delimiter to indicate menu. The delimiter is
the unicode 〖WHITE LENTICULAR BRACKET〗. So, the menu would be written
as 〖File▸Open〗. I just spent a couple of hours changing all mentions of
menus on my site to use the new delimiter.

Here's a summary of my usage of special unicode brackets:

ANGLE BRACKET. Article title. e.g. 〈Xah's Emacs Lisp Tutorial〉.
DOUBLE ANGLE BRACKET. Book title. e.g. 《Basic Economics》.
BLACK LENTICULAR BRACKET. Key combinations. e.g. 【Ctrl+c】.
WHITE LENTICULAR BRACKET. Menu. e.g. 〖File▸Open〗.
TORTOISE SHELL BRACKET. File names, path, url. e.g. 〔~/Documents/
notes.txt〕.
CORNER BRACKET. Computer code, or math expression. e.g. 「x = 3;」.
ANGLE QUOTATION MARK. Indicator of semantic for a keyword/
expression in computer code. e.g. function ‹parameter name› =
‹expression›.
DOUBLE QUOTATION MARK. Generic delimiter. e.g. “something”.
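
The convention above can be written down as a small lookup table, which is handy for any script that processes these articles. This is only a minimal Python sketch: the names DELIMITERS and role_of, and the wording of the role labels, are hypothetical and chosen here for illustration.

```python
# Hypothetical table: opening bracket -> (closing bracket, role).
# The role labels are made up for this sketch; the pairs follow the list above.
DELIMITERS = {
    "〈": ("〉", "article title"),
    "《": ("》", "book title"),
    "【": ("】", "key combination"),
    "〖": ("〗", "menu"),
    "〔": ("〕", "file name, path, or url"),
    "「": ("」", "code or math expression"),
    "‹": ("›", "semantic placeholder in code"),
    "“": ("”", "generic delimiter"),
}


def role_of(text):
    """Return the role of a fully bracketed string, or None if unrecognized."""
    if len(text) >= 2 and text[0] in DELIMITERS:
        closer, role = DELIMITERS[text[0]]
        if text[-1] == closer:
            return role
    return None


print(role_of("【Ctrl+c】"))
print(role_of("《Basic Economics》"))
```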


──────────────────────────────
Why Are These Brackets Chosen?

There are many other brackets in unicode. (See: Matching Brackets in
Unicode.) I chose these brackets, and my uses for them, carefully. The
following are the criteria, in no particular order:

① It must be a fairly common character, so that most browsers,
editors, fonts, or other tools can display them.
② The meaning i assigned to them must be compatible with the
semantics given to the char in unicode.

All the brackets i've used are common ones. The “curly quote” and
‹angle quote› are widely used in western languages. The 〈〉《》【】〖〗「」〔〕
are used daily in Chinese and Japanese. (See: Intro to Chinese
Punctuation with Computer Language Syntax Perspectives.) These
languages are widely used in computing in China and Japan, and they
are also widely supported even in non-Asian countries.

If a font or tool has any support for unicode, these brackets are
probably among the top 100 or so chars supported.

──────────────────────────────
Is the Use of These Delimiters Necessary?

No, but they provide meaningful info, both as visual enhancement and,
especially, for computer processing.

For example, once you realize that the lenticular bracket 【Ctrl+x】 is
a marker for keyboard shortcut notation, you can easily recognize all
keys on the page at a glance. For a sample article with these marks,
see: How To Set Emacs's User Interface to Modern Conventions.

For another example, with these markers, i can easily write a program
that extracts all book titles, keyboard shortcuts, program menus, or
code snippets mentioned in my website articles (a few thousand files).
Without these markers, the problem is non-trivial.
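
As a rough illustration of how little code such an extractor needs, here is a minimal Python sketch. The regex patterns, the dictionary keys, and the sample sentence are assumptions made for this example; it also assumes the brackets are never nested.

```python
import re

# Assumed patterns; each matches one non-nested bracketed span.
PATTERNS = {
    "book_title": re.compile(r"《([^《》]+)》"),
    "key": re.compile(r"【([^【】]+)】"),
    "menu": re.compile(r"〖([^〖〗]+)〗"),
    "code": re.compile(r"「([^「」]+)」"),
}


def extract(text):
    """Return {kind: [contents, ...]} for every marked span in text."""
    return {kind: pat.findall(text) for kind, pat in PATTERNS.items()}


sample = ("Read 《Basic Economics》. Press 【Ctrl+c】, "
          "or use 〖File▸Open〗. Then type 「x = 3;」.")
for kind, found in extract(sample).items():
    print(kind, found)
```

Running the same function over every file of a site would give an index of all books, keys, menus, and code snippets mentioned.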

Here's an example of the benefit of computer recognition: suppose in my
Emacs Tutorial, i want to add interactive annotation for all emacs key
shortcuts mentioned in the tutorial. (emacs has a few hundred key
shortcuts by default.) When the user hovers the mouse over an emacs key
shortcut in the article, a pop-up box should show the name of the
associated command. When keys are marked with a specific delimiter
for that purpose, such as 【Ctrl+x】, a program can trivially identify
all of them.
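
The annotation idea could be sketched in Python like this. The table KEY_TO_COMMAND is a tiny hypothetical stand-in for a full list of bindings, and using the HTML title attribute for the pop-up is an assumption of this sketch, not necessarily how the site would do it. The input is assumed to be HTML text already.

```python
import html
import re

# Hypothetical lookup table; a real one would cover a few hundred bindings.
KEY_TO_COMMAND = {
    "Ctrl+x Ctrl+f": "find-file",
    "Ctrl+x Ctrl+s": "save-buffer",
}

KEY_PATTERN = re.compile(r"【([^【】]+)】")


def annotate_keys(text):
    """Wrap each known 【key】 in a span whose title shows on mouse hover."""
    def wrap(match):
        command = KEY_TO_COMMAND.get(match.group(1))
        if command is None:
            return match.group(0)  # unknown key: leave it untouched
        return '<span title="{}">{}</span>'.format(
            html.escape(command, quote=True), match.group(0))
    return KEY_PATTERN.sub(wrap, text)


print(annotate_keys("Open a file with 【Ctrl+x Ctrl+f】."))
```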

──────────────────────────────
What About Using HTML Markup Instead?

HTML markup is great. It serves the same purpose. I have dithered on
whether to use HTML markup, special brackets in unicode, or a mixture
of both. I've experimented with that over the past 2 years. Right now,
i use a mixture of both.

Here's a sample html markup snippet:

Computer code: <span class="computer_code">x = 3;</span>
Keys: <span class="keyboard_shortcut">Ctrl+c</span>
Book Title: <span class="book_title">Emacs Tutorial</span>

Here's a CSS definition that automatically colors the text, and also
inserts the brackets for display, for any text marked up with the
“code” tag:

code{color:red;font-family:"DejaVu Sans Mono",monospace}
code:before,code:after{color:black;background-color:white}
code:before{content:"「"}
code:after{content:"」"}

The advantage of HTML markup is that it's a more elaborate system. For
example, you can color the text, or specify the font and text size. You
can add brackets if you want. The markup is also more precise. For
example, <span class="book_title">…</span> is unambiguous, while a
bracket 《…》 could mean something else (just look at this page you are
reading, where the text inside that bracket is not necessarily a book
title).

The disadvantage is that it's much more verbose, and makes the raw
source code much harder to read.

Right now, all my book titles, article titles, and computer code
snippets are marked up using HTML, with CSS adding the specialized
brackets as a visual cue.
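
Since the brackets and the span markup carry the same information, converting one to the other is mechanical. A minimal Python sketch, assuming non-nested brackets, input text that is already HTML-safe, and the class names from the sample snippet above (the function name brackets_to_spans is made up here):

```python
import re

# Assumed mapping from bracket pair to the class names in the sample markup.
BRACKET_TO_CLASS = [
    ("「", "」", "computer_code"),
    ("【", "】", "keyboard_shortcut"),
    ("《", "》", "book_title"),
]


def brackets_to_spans(text):
    """Replace each bracketed span with <span class="...">content</span>."""
    for opener, closer, css_class in BRACKET_TO_CLASS:
        # Match opener, then any run of chars that is neither bracket, then closer.
        pattern = re.compile("{o}([^{o}{c}]+){c}".format(o=opener, c=closer))
        replacement = '<span class="{}">\\1</span>'.format(css_class)
        text = pattern.sub(replacement, text)
    return text


print(brackets_to_spans("Computer code: 「x = 3;」"))
print(brackets_to_spans("Keys: 【Ctrl+c】"))
```

Going the other way (span markup to brackets) is what the CSS rules with “content” do at display time.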

──────────────────────────────
A Finer Point: Are Delimiter Brackets Semantically Meaningful or Just
for Visual Enhancement?

Suppose you use CSS. For example, a book title is wrapped up by html
tag like this:

<span class="book_title">The Story Of My Life</span>

and here's CSS code to add color:

span.book_title{color:red}

You can also add brackets:

span.book_title:before{content:"《"}
span.book_title:after{content:"》"}

So, if you want the text to be colored, you must use CSS. However, you
can add the bracket in the text without relying on CSS, like this:

《<span class="book_title">The Story Of My Life</span>》

The question for me was, should the bracket be part of the text or
added by CSS? Which format should i choose?

The answer depends on whether the bracket is considered just a visual
enhancement, or semantically meaningful. If it's just visual
enhancement, then it should be part of CSS (Cascading Style Sheets), as
implied by the word “style” in its name. When CSS is off, readers
won't see the bracket, and it doesn't matter. However, if the bracket
is considered semantically meaningful, then it should not be in CSS.
That way, it doesn't matter whether CSS is on or off: you still see the
bracket.
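
One way to see the difference concretely is to look at what is left of each variant when only the text content is kept, which is roughly what a reader without CSS, or a simple text-extracting program, would get. A minimal Python sketch using the standard html.parser module; the class TextOnly and the function visible_text are names invented for this example:

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data only, ignoring tags (and therefore any CSS)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(markup):
    """Return the text content of an HTML fragment, with all tags dropped."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


css_only = '<span class="book_title">The Story Of My Life</span>'
in_text = '《<span class="book_title">The Story Of My Life</span>》'
print(visible_text(css_only))  # bracket would come from CSS, so none here
print(visible_text(in_text))   # bracket is part of the text, so it survives
```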

There are opposing views on whether the bracket should be in text or
added by CSS.

① The brackets are semantically meaningful, thus should be part of
text. For example, in Chinese, book titles have angle brackets. They
are semantically meaningful. It is not just a decoration. In the same
way, western text involving matched pairs: “curly quotes”, «french
quote», or various brackets (paren), [square bracket], {braces}, are
almost always semantically meaningful. If you remove them, it affects
the text in major ways.

② A bracket in a text that is already marked up is redundant.
Therefore, in this view, one should not add the brackets in the text.
Even though CSS is meant for appearance, the fact is that appearance,
layout, and semantics are often intertwined to various degrees.
Positioning (layout) and sizes often add subtle but non-trivial
semantics to a page. In practice, probably a significant percentage of
web pages would become unreadable, or have their meaning affected, if
you turned off CSS, and probably less than 0.01% of pages are ever read
without CSS. The bottom line of this reasoning is that, if you use the
HTML/CSS tech bundle, then you shouldn't add the bracket in the text,
because it's already precisely marked up. Just let CSS add the bracket
for you.

Right now i haven't decided which is “better”. More precisely, i think
one way might be better than the other once a more precise goal or
purpose is given. For now, for my purpose of online articles, it
doesn't matter much.

An example where it might matter is when defining a document using
XML, or when the article in HTML is the basis for a printed publication
that goes thru further processing. (For example, the finely printed
book A New Kind of Science is based on the Mathematica notebook format.
See also: Notes on A New Kind of Science. Some books are based on
HTML/CSS tech, for example, Håkon Wium Lie's book. Some books are
based on unix's troff system (man pages). There are also QuarkXPress,
Adobe InDesign (PageMaker), DocBook, LaTeX, etc.)

Matching Brackets in Unicode
HTML Entities, Ampersand, Unicode, Semantics
Problems of Symbol Congestion in Computer Languages (ASCII Jam;
Unicode; Fortress)
Intro to Chinese Punctuation with Computer Language Syntax
Perspectives
HTML6: Your JSON and SXML Simplified
The Writing Style on XahLee.org
The Moronicities of Typography
How to Create a APL or Math Symbols Keyboard Layout
The TeX Pestilence (or, the problems of TeX/LaTex)

Xah

Julian Bradfield

May 10, 2011, 10:48:31 AM
On 2011-05-10, Xah Lee <xah...@gmail.com> wrote:
> decided to use a special delimiter to indicate menu. The delimiter is
> the unicode 〖WHITE LENTICULAR BRACKET〗. So, the menu would be written

...

> ① It must be a fairly common character, so that most browsers,
> editors, fonts, or other tools can display them.

...


> If a font or tool has any support for unicode, these brackets are
> probably among the top 100 or so chars supported.


Hmm. White lenticular bracket is in the font I'm using in this editor,
but it wasn't in the font I had in the xterm with which I first read
this, although that font has lots of other characters (critically, for
me, phonetic characters). So I'm not so sure the white lenticular
bracket would make it to the top 100 !
In Europe, the various Latin, Greek, Cyrillic etc. characters are
surely much more important than Chinese punctuation.

Xah Lee

May 10, 2011, 1:31:04 PM
2011-05-10

you are right. Maybe within 300 might be a better estimate. (but
actually that's not counting chinese chars...) I'd be interested to
know what's your setup. linux? what year is it? Though, on the other
hand, it's not surprising that in xterm it doesn't show, because
command line apps often have problems with unicode, even in mac or
windows.

Xah

Julian Bradfield

May 11, 2011, 6:13:13 AM
On 2011-05-10, Xah Lee <xah...@gmail.com> wrote:
>> > the unicode 〖WHITE LENTICULAR BRACKET〗. So, the menu would be written
...
>> > If a font or tool has any support for unicode, these brackets are
>> > probably among the top 100 or so chars supported.
>>
>> Hmm. White lenticular bracket is in the font I'm using in this editor,
>> but it wasn't in the font I had in the xterm with which I first read
...

> know what's your setup. linux? what year is it? Though, on the other
> hand, it's not surprising that in xterm it doesn't show, because

It turns out it's my problem. My xterm CJK font is a home-brewed
combination of the various Chinese and Japanese legacy fonts on my
system, and when I was mapping them to Unicode, I just used the Unihan
file to get the mainland mappings. But Unihan doesn't include the
punctuation and symbols, and the WHITE LENTICULAR BRACKET is a
mainland-only character.

Xah Lee

May 17, 2011, 6:35:26 PM
some further thoughts on this.

〈Syntax Semantics Design: Use of Unicode Ellipsis Character vs Dot Dot
Dot〉
http://xahlee.org/comp/unicode_ellipsis_use.html

--------------------------------------------------
Syntax Semantics Design: Use of Unicode Ellipsis Character vs Dot Dot
Dot

Xah Lee, 2011-05-16

I decided to use the unicode char HORIZONTAL ELLIPSIS “…” (U+2026)
instead of the common 3 dots “...” for all my online writings. So, i
spent the past couple of hours replacing all 3 dots with the ellipsis
glyph, starting with my Emacs Tutorial directory (~300 files; 421
replacements). (I have yet to do it site-wide, about 5k files.)

Note: the replacements are done on a case-by-case basis with human
eyeballing, and cannot be done blindly by a program, because some
occurrences of 3 consecutive dots are parts of computer code, error
messages, or other uses, and must remain as 3 dots, e.g. in regex, 3
dots is a pattern for 3 chars. This task is done using emacs's command
“dired-do-query-replace-regexp”. (See: Emacs: Interactively Find &
Replace String Patterns on Multiple Files and Find & Replace with
Emacs.)
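
The same review step can be approximated outside emacs. The Python sketch below is not the procedure used above; it only lists candidate occurrences for a human to decide on. It assumes code is marked with the 「」 corner brackets, which it skips, and the function name candidates is made up for this example.

```python
import re

# Text marked as code by the corner-bracket convention (assumed non-nested).
CODE_SPAN = re.compile(r"「[^「」]*」")
# Exactly three consecutive dots, not part of a longer run of dots.
THREE_DOTS = re.compile(r"(?<!\.)\.{3}(?!\.)")


def candidates(text):
    """Yield (offset, context) for each '...' that lies outside a code span."""
    code_ranges = [m.span() for m in CODE_SPAN.finditer(text)]
    for m in THREE_DOTS.finditer(text):
        if any(start <= m.start() < end for start, end in code_ranges):
            continue  # inside code: must stay as three dots
        context = text[max(0, m.start() - 15):m.end() + 15]
        yield m.start(), context


sample = "Wait... the regex 「a...b」 matches five chars."
for offset, context in candidates(sample):
    print(offset, repr(context))
```

Only the first occurrence is reported; the one inside 「a...b」 is left alone.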

------------------------------
Why Use Ellipsis Instead of Dot Dot Dot?

Why have i decided to use the ellipsis glyph instead of the much more
convenient 3 dots? Traditionally, it is usually done for esthetic
reasons in printing. However, for me, the reason is mostly syntax &
semantics design considerations in the context of computer science. I
favor the ellipsis character because it carries a distinct meaning.
That is, the char's sole purpose is to indicate omission (or other
similar purposes). Using 3 dots for the same purpose is in some sense
a hack, and creates a certain complexity and ambiguity.

Here's one way to see it. Let's say a program is to parse the text
(such as a web search engine bot). When the program comes to the
ellipsis char, it knows right there that char's meaning (assuming the
char is not being abused, such as used in ASCII art). But when it comes
to a period, it is not sure; it has to parse more, until it reaches 3
consecutive dots. But even with 3 dots, the meaning is still not as
precise as with the dedicated ellipsis char, because consecutive dots
could mean lots of things. (For example, in regex 3 dots match any 3
chars, and some langs use 2 dots as a sequence generating operator,
e.g. in perl: print 1..9;. In Mathematica, 3 dots is the syntax that
represents a repeating pattern in its pattern matching functions. See:
reference.wolfram.com.)
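
The point can be checked directly with Python's standard unicodedata module: the ellipsis is a single code point with its own name, while 3 dots are just 3 full stops. The helper name describe is made up for this small sketch.

```python
import unicodedata


def describe(text):
    """Return the Unicode name of each character in text."""
    return [unicodedata.name(ch) for ch in text]


print(describe("\u2026"))  # the dedicated ellipsis char, one code point
print(describe("..."))     # three separate chars, each a plain full stop
```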

From another perspective, the period character “.” (unicode name FULL
STOP, old name PERIOD) is used for multiple purposes. For example:
decimal separator “3.1415”, section number separator 1.2.3, numbered
items (1. this 2. that), domain name separator 〔www.example.com〕, as
multiplication sign, as vector dot product operator in math. The
meaning of the ellipsis symbol in comparison is far less context
dependent.

Here are the Wikipedia articles on them: Ellipsis, Full stop.

------------------------------
Is All This Important?

No, not really, but it's the sort of thing designers think about,
especially those into computer language syntaxes and mathematical
notations, like me.

I, for my website, am rather particular and idiosyncratic about just
about every aspect: the content, the style of writing, diction, design,
layout, down to the glyph choices in punctuation (but in general i'm
averse to being choosy about fonts and other typographical matters.
See: The Moronicities of Typography ◇ The TeX Pestilence).

------------------------------
The Naming of Ellipsis

It might be interesting to note that the word ellipsis shares its
etymology with the math curve ellipse: both are from the Ancient
Greek ἔλλειψις, “omission” or “falling short”. See this page: Conic
Sections, quote:

Apollonius was the first to base the theory of all three conics
on sections of one circular cone, right or oblique. He is also the one
to give the name ellipse, parabola, and hyperbola. A brief explanation
of the naming can be found in Howard Eves, An Introduction to the
History of Math. 6th ed. page 172. (also see J H Conway's newsgroup
message at conicsEtynomogy.txt.)

Syntax Design: Use of Unicode Matching Brackets as Specialized
Delimiters

HTML Entities, Ampersand, Unicode, Semantics
Problems of Symbol Congestion in Computer Languages (ASCII Jam;
Unicode; Fortress)
Intro to Chinese Punctuation with Computer Language Syntax
Perspectives
HTML6: Your JSON and SXML Simplified
The Writing Style on XahLee.org

How to Create a APL or Math Symbols Keyboard Layout

Xah
