
Getting Emacs to play nice with Hunspell and apostrophes


Nikolai Weibull

Jun 7, 2014, 11:39:25 AM6/7/14
to Emacs Users
Hi!

How do I get Emacs to play nice with Hunspell and apostrophes? I
thought I had it covered, but it seems that something has changed and
now M-x ispell won’t recognize “isn’t” as a word anymore.

First off, what English dictionary should I be using?

Second, how do I get Emacs to send words containing apostrophes to Hunspell?

Fiddling with WORDCHARS in en_US.aff seems so wrong, as Emacs will
then send stuff like 'isn't' as a word.

The final step is getting “isn’t” to work with Unicode apostrophes,
but let’s take it one step at a time.

It’s beyond me how this isn’t working, but I’m sure I’m doing something wrong.

Robert Thorpe

Jun 7, 2014, 1:43:38 PM6/7/14
to Nikolai Weibull, help-gn...@gnu.org
Nikolai Weibull <n...@disu.se> writes:
> “isn’t”

In Britain and Ireland we generally use "isn't", notice there's no angle
on the apostrophe. The one you're using "’" is the Unicode RIGHT
QUOTATION MARK. So, to Emacs you are closing quotes around "isn" and
putting a "t" straight after that.

As far as I know, if you want to use Unicode that would be "isnʼt" which
is MODIFIER LETTER APOSTROPHE. Have a look with C-u C-x =. I don't
know if using that will work though.

BR,
Robert Thorpe

Sharon Kimble

Jun 7, 2014, 1:53:47 PM6/7/14
to Nikolai Weibull, Emacs Users
This is from my "init.el" -
--8<---------------cut here---------------start------------->8---
;; Use hunspell instead of ispell
(if (file-exists-p "/usr/bin/hunspell")
    (progn
      (setq ispell-program-name "hunspell")
      (eval-after-load "ispell"
        '(progn (defun ispell-get-coding-system () 'utf-8)))))
(setq ispell-program-name "hunspell")
(require 'rw-hunspell)
(require 'rw-language-and-country-codes)
(require 'rw-ispell)
(setq ispell-dictionary "en_GB_hunspell")
--8<---------------cut here---------------end--------------->8---

Hopefully this will help you get going with hunspell, which I have
found is good! :) This is all I have set up for hunspell and it
works okay with "isn’t" as I've just corrected it in this email.

If you want copies of "rw-hunspell, rw-language-and-country-codes,
rw-ispell" I can priv mail them to you, but I think that I got them
from emacswiki.

Sharon.
--
A taste of linux = http://www.sharons.org.uk
my git repo = https://bitbucket.org/boudiccas/dots
TGmeds = http://www.tgmeds.org.uk
Debian testing, fluxbox 1.3.5, emacs 24.3.91.1

Yuri Khan

Jun 7, 2014, 1:59:21 PM6/7/14
to Robert Thorpe, help-gn...@gnu.org
The Unicode tables say:

0027 ' APOSTROPHE
[…]
• 2019 ’ is preferred for apostrophe

2019 ’ RIGHT SINGLE QUOTATION MARK
= single comma quotation mark
• this is the preferred character to use for apostrophe

02BC ʼ MODIFIER LETTER APOSTROPHE
= apostrophe
• glottal stop, glottalization, ejective
• many languages use this as a letter of their alphabets
• used as a tone marker in Bodo, Dogri, and Maithili
• 2019 ’ is the preferred character for a punctuation apostrophe


In English, the apostrophe is neither a glottal stop mark, nor a
letter, nor a tone marker, so 02BC does not apply. 2019 is the correct
code, although it is unfortunate that it is overloaded with a closing
single quote.
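The code points under discussion are easy to inspect programmatically; a quick sketch with Python's unicodedata module:

```python
import unicodedata

# Print the official Unicode name of each candidate apostrophe character.
for ch in ("'", "\u2019", "\u02bc"):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# → U+0027  APOSTROPHE
# → U+2019  RIGHT SINGLE QUOTATION MARK
# → U+02BC  MODIFIER LETTER APOSTROPHE
```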

Eli Zaretskii

Jun 7, 2014, 2:17:26 PM6/7/14
to help-gn...@gnu.org
> From: Sharon Kimble <boud...@skimble.plus.com>
> Date: Sat, 07 Jun 2014 18:53:47 +0100
> Cc: Emacs Users <help-gn...@gnu.org>
>
> ;; Use hunspell instead of ispell
> (if (file-exists-p "/usr/bin/hunspell")
>     (progn
>       (setq ispell-program-name "hunspell")
>       (eval-after-load "ispell"
>         '(progn (defun ispell-get-coding-system () 'utf-8)))))
> (setq ispell-program-name "hunspell")
> (require 'rw-hunspell)
> (require 'rw-language-and-country-codes)
> (require 'rw-ispell)
> (setq ispell-dictionary "en_GB_hunspell")
> --8<---------------cut here---------------end--------------->8---
>
> Hopefully this will help you get going with hunspell, which I have
> found is good! :) This is all I have set up for hunspell and it
> works okay with "isn’t" as I've just corrected it in this email.

I don't see how that would work, unless you are using a version of an
English dictionary that knows about the ’ character. I don't think
the rw-* packages have anything to do with that, or could have.

So I think a more interesting question for the OP will be where you
got that "en_GB_hunspell" dictionary you are using, because that
dictionary might actually hold the answer to his questions.

P.S. Most, if not all, of what the rw-* packages do is already handled
by the ispell.el that comes with the latest versions of Emacs, which
includes full support for Hunspell. You might as well try using
Emacs without those add-ons; you will probably find they are not
needed anymore.
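As a concrete sketch of that suggestion (a guess at a minimal add-on-free setup; "en_GB" is an assumed dictionary name, so check what `hunspell -D` lists on your system):

```elisp
;; Minimal Hunspell setup for a recent Emacs, with no rw-* add-ons.
;; NOTE: "en_GB" is an assumption -- substitute a dictionary that
;; `hunspell -D` actually reports as installed.
(setq ispell-program-name "hunspell")
(setq ispell-dictionary "en_GB")
```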


Nikolai Weibull

Jun 7, 2014, 2:18:49 PM6/7/14
to Robert Thorpe, Emacs Users
On Sat, Jun 7, 2014 at 7:43 PM, Robert Thorpe
<r...@robertthorpeconsulting.com> wrote:
> Nikolai Weibull <n...@disu.se> writes:
>> “isn’t”
>
> In Britain and Ireland we generally use "isn't", notice there's no angle
> on the apostrophe.

It’s generally used (in Britain and in other places) instead of the
more correct “typographical” apostrophe/right single quotation mark
because it’s more easily accessible on the standard computer keyboard,
not because it’s preferred.

> The one you're using "’" is the Unicode RIGHT
> QUOTATION MARK.

No, it’s the U+2019 RIGHT SINGLE QUOTATION MARK.

> So, to Emacs you are closing quotes around "isn" and
> putting a "t" straight after that.

I realize that ‘’’ is seen as punctuation by Emacs, which is true in
some cases, for example, when quoting in British English (‘this is a
quote’) or when nesting quotes in American English (“this quote quotes
‘another quote’”), but it’s also sometimes a character that should be
seen as part of a word.

> As far as I know, if you want to use Unicode that would be "isnʼt" which
> is MODIFIER LETTER APOSTROPHE. Have a look with C-u C-x =. I don't
> know if using that will work though.

No, that’s incorrect, please see, for example,

http://en.wikipedia.org/wiki/Apostrophe#Unicode

for a description about the use of apostrophes and Unicode.

Nikolai Weibull

Jun 7, 2014, 2:28:08 PM6/7/14
to Emacs Users
On Sat, Jun 7, 2014 at 5:39 PM, Nikolai Weibull <n...@disu.se> wrote:

> It’s beyond me how this isn’t working, but I’m sure I’m doing something wrong.

I should perhaps also note that the only word in the sentence above
that is seen as an error is “isn’t”, as “isn” isn’t a word. I guess
either Emacs or hunspell is ignoring the single-character words “s” and
“m” after each of the other instances of ‘’’, and “It” and “I” are of
course seen as correctly spelled words…

(…so the simple solution is to add “isn” as a word in my personal dictionary.)

Eli Zaretskii

Jun 7, 2014, 2:40:24 PM6/7/14
to help-gn...@gnu.org
> Date: Sat, 7 Jun 2014 20:28:08 +0200
> From: Nikolai Weibull <n...@disu.se>
>
> On Sat, Jun 7, 2014 at 5:39 PM, Nikolai Weibull <n...@disu.se> wrote:
>
> > It’s beyond me how this isn’t working, but I’m sure I’m doing something wrong.
>
> I should perhaps also note that the only word in the sentence above
> that is seen as an error is “isn’t”, as “isn” isn’t a word. I guess
> either Emacs or hunspell is ignoring single-character words “s” and
> “m” after each of the other instances of ‘’’ and “It” and “I” are of
> course seen as correctly spelled words…

Emacs just goes with whatever the .aff file of the dictionary you use
says. And it cannot do anything else, because the speller uses that
dictionary, and decides by its rules what can and what cannot be in a
word.

Look in the .aff file you use, and you will see that it knows about '
and about n't and about 's, that's why these work. There's no magic
here.

So I think you must get a hold of a Hunspell-compliant dictionary that
knows about the ’ apostrophe.
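For illustration, a hypothetical .aff excerpt showing the two mechanisms Eli mentions (the flag name T is invented; real dictionaries choose their own flags and may not cover ’ at all):

```
# Hypothetical en.aff excerpt -- flag names invented for illustration.
# Treat both apostrophes as characters that can occur inside words:
WORDCHARS '’
# A suffix rule deriving contractions, e.g. is -> isn’t, for words
# carrying the (made-up) flag T in the .dic file:
SFX T Y 1
SFX T 0 n’t .
```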


Nikolai Weibull

Jun 7, 2014, 3:59:47 PM6/7/14
to Eli Zaretskii, Emacs Users
On Sat, Jun 7, 2014 at 8:40 PM, Eli Zaretskii <el...@gnu.org> wrote:
>> Date: Sat, 7 Jun 2014 20:28:08 +0200
>> From: Nikolai Weibull <n...@disu.se>
>>
>> On Sat, Jun 7, 2014 at 5:39 PM, Nikolai Weibull <n...@disu.se> wrote:
>>
>> > It’s beyond me how this isn’t working, but I’m sure I’m doing something wrong.
>>
>> I should perhaps also note that the only word in the sentence above
>> that is seen as an error is “isn’t”, as “isn” isn’t a word. I guess
>> either Emacs or hunspell is ignoring single-character words “s” and
>> “m” after each of the other instances of ‘’’ and “It” and “I” are of
>> course seen as correctly spelled words…

> Emacs just goes with whatever the .aff file of the dictionary you use
> says. And it cannot do anything else, because the speller uses that
> dictionary, and decides by its rules what can and what cannot be in a
> word.

Yes, I realize that, but that raises the question of how ‘isn’t’ will
be parsed if I straight up add ’ to WORDCHARS, but I guess that only
matters for the curses interface that I don’t use.

> Look in the .aff file you use, and you will see that it knows about '
> and about n't and about 's, that's why these work. There's no magic
> here.

OK, so having read hunspell(5), it seems that my .aff that comes from
OpenOffice doesn’t include “n't” as a possible SFX.

The .dic does list the word “isn't”, however, so I’m not sure what to
make of this.

The one from SCOWL, version 7.1.0, looks about the same as the OpenOffice one.

The one from Mozilla is also about the same.

> So I think you must get a hold of a Hunspell-compliant dictionary that
> knows about the ’ apostrophe.

Yes, I suppose so.

One solution that seems to work is to add ‘’’ (or ‘'’) to WORDCHARS and
then change ispell-dictionary-alist to include ‘’’ in the OTHERCHARS
element. This works with hunspell 1.3.3 (which was released a couple
of days ago and still doesn’t include the patch for handling offsets
correctly).

Perhaps this should be handled automatically for OTHERCHARS in ispell.el?
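Spelled out, that workaround might look roughly like this (a sketch only; the field layout follows the `ispell-dictionary-alist` docstring, so verify it against your Emacs version before copying):

```elisp
;; Sketch: an en_US entry whose OTHERCHARS (4th field) accepts both the
;; ASCII and the U+2019 apostrophe.  Field order per the
;; `ispell-dictionary-alist' docstring -- verify on your Emacs.
(add-to-list 'ispell-dictionary-alist
             '("en_US" "[[:alpha:]]" "[^[:alpha:]]" "['’]" t
               ("-d" "en_US") nil utf-8))
```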

Emanuel Berg

Jun 10, 2014, 8:04:10 PM6/10/14
to
Nikolai Weibull <n...@disu.se> writes:

> How do I get Emacs to play nice with Hunspell and
> apostrophes. I thought I had it covered, but it
> seems that something has changed and now M-x ispell
> won’t recognize “isn’t” as a word anymore.

“isn’t” isn't an error according to my spellchecker,
aspell.

But before I go on about that, I agree with everyone
else saying don't use those silly chars - what's the
benefit? They look stupid and they bring along problems
like this (though not for aspell, it would seem, but
for Hunspell in your case, and in other situations as
well). Again, what's the gain using them?

(setq ispell-program-name "aspell")
(setq ispell-dictionary "english")
(setq ispell-silently-savep t)

Ignore within special delimiters:

(add-to-list 'ispell-skip-region-alist '("`" . "`"))
(add-to-list 'ispell-skip-region-alist '("`" . "'"))

> First off, what English dictionary should I be using?

With aspell, get it from the repositories - likewise
dictionaries, which are called aspell-en, aspell-sv,
etc.

More aspell:

http://user.it.uu.se/~embe8573/conf/emacs-init/spell.el

To really get the blood pumping behind your ears, you
need shortcuts for the different dictionaries, as well.

--
underground experts united:
http://user.it.uu.se/~embe8573

Nikolai Weibull

Jun 11, 2014, 1:23:59 AM6/11/14
to Emanuel Berg, Emacs Users
On Wed, Jun 11, 2014 at 2:04 AM, Emanuel Berg <embe...@student.uu.se> wrote:
> Nikolai Weibull <n...@disu.se> writes:

> But before I go on about that, I agree with everyone
> else saying don't use those silly chars - what's the
> benefit?

No one said that.

Emanuel Berg

Jun 11, 2014, 10:24:49 AM6/11/14
to
Nikolai Weibull <n...@disu.se> writes:

>> But before I go on about that, I agree with everyone
>> else saying don't use those silly chars - what's the
>> benefit?
>
> No one said that.

OK, so I did. That is beside the point. Still, why use
it? It looks stupid (people are used to the other way)
and it isn't practical, as your post shows. Also, you
use other chars that aren't practical - the three dots
as one char, for example. Why? If you don't have real
problems, I guess you can always be a snob and get a
bunch of artificial problems, that you can then pretend
to solve.

Nikolai Weibull

Jun 11, 2014, 11:03:21 AM6/11/14
to Emanuel Berg, Emacs Users
I don’t have any interest in creating problems, real or otherwise, but
you sure seem to want to, which is why I won’t discuss this further
with you. (And now I can at least pretend to have solved this
thread’s troll problem.)

Emanuel Berg

Jun 11, 2014, 11:20:31 AM6/11/14
to
Nikolai Weibull <n...@disu.se> writes:

> I don’t have any interest in creating problems, real
> or otherwise, but you sure seem to want to, which is
> why I won’t discuss this further with you.

You still haven't said one word why anyone would
benefit from using those chars instead of the standard
" and ' (and ...) that works everywhere and that
everyone is familiar with (having trained their eyes
for them year-in, year-out).

If you can't motivate why something is a problem, it is
not a problem.

Nonetheless I suggested another solution (using aspell,
the first suggestion being to stop using those chars).

> (And now I can at least pretend to have solved this
> thread’s troll problem.)

(No comments.)

Teemu Likonen

Jun 11, 2014, 12:57:04 PM6/11/14
to help-gn...@gnu.org
Emanuel Berg [2014-06-11 17:20:31 +02:00] wrote:

> You still haven't said one word why anyone would benefit from using
> those chars instead of the standard " and ' (and ...) that works
> everywhere and that everyone is familiar with (having trained their
> eyes for them year-in, year-out).

For instance, every book uses real quotation marks and apostrophes. They
are standard in the publishing world. Many people use Emacs to write
text that will be published (web, printed material).

Emanuel Berg

Jun 11, 2014, 5:32:38 PM6/11/14
to
Teemu Likonen <tlik...@iki.fi> writes:

> For instance, every book uses real quotation marks
> and apostrophes. They are standard in the publishing
> world. Many people use Emacs to write text that will
> be published (web, printed material).

Well, OK, sort of...

On the other hand, that would typically be produced
with LaTeX. (In latex-mode, there are some annoying ``
and '' automatically when you do " - I never learned
the reason for that.) But in short forms like "isn't",
at least I simply write ' in the LaTeX source - I
haven't noticed how they turn out - probably that can
be tweaked (I'll get back to you on that, as I happen
to work on such a document right now).

For web material I think '/" is preferable still,
because people like to yank it into mails and the like,
and it would just be extra work having them change to
'/" whenever that happens.

Some examples from the computer world:

A quote from the ls man page:

-G, --no-group
in a long listing, don't print group names

From the emacs ditto:

-Q, --quick
Similar to "-q --no-site-file --no-splash".

From RFC 3676:

If the line is flowed and DelSp is "yes", the
trailing space immediately prior to the line's CRLF
is logically deleted. If the DelSp parameter is
"no" (or not specified, or set to an unrecognized
value), the trailing space is not deleted.

And speaking of mails - we are using mails/posts right
now - so why use it in mails and Usenet posts?

On a more general/human scale: you are Finnish (I take
it), I am Swedish. (I'm not reeling you or anyone else
in to my side, just stating facts.) We have acquired
English and use it because we accept that it is very
practical and it is simply how it works. Not to mention
the Russians and Chinese who had to get fluent with a
whole new alphabet and language system! Or this quote
from this thread: "In Britain and Ireland we generally
use "isn't", notice there's no angle on the
apostrophe." Besides, virtually all US computer people
use '/" from what I can tell!

So yes, I feel it is close to arrogance that the OP
cannot in one word tell me why this would benefit
anyone, and even more so as I actually tried to help
him in my first post!

Anyway, I'm not angry or anything. Peace in the Middle
East. Feel free to carry on this discussion though (of
course).

Yuri Khan

Jun 12, 2014, 1:43:24 AM6/12/14
to Emanuel Berg, help-gn...@gnu.org
On Wed, Jun 11, 2014 at 10:20 PM, Emanuel Berg <embe...@student.uu.se> wrote:

> You still haven't said one word why anyone would
> benefit from using those chars instead of the standard
> " and ' (and ...) that works everywhere and that
> everyone is familiar with (having trained their eyes
> for them year-in, year-out).

The fact that everybody uses " and ' and ` is a historical artifact, a
workaround of sorts, due to the limitations of the mechanical
typewriter. We need not be affected by it any more.

There was no possibility of including all the required typographical
characters or accented letters into the printing ball, so both quotes
(“ and ”) and the diaeresis got conflated into a straight quote ",
both single quotes (‘ and ’) into a straight single quote/apostrophe
', and the backtick ` and tilde ~ were there to facilitate typing
accented letters.

This limitation then crept into computers, because this way the
character set could be encoded in 7 bits. The computer keyboard was
just modeled after the typewriter keyboard, with a few extensions.

Then the inevitable struck: computers expanded from the US and UK into
Germany, Sweden, Finland, France, Canada, and then countries with
non-Latin scripts (Greek, Cyrillic, and CJK). And all of them wanted
to have dedicated code points for their characters, e.g. type a single
ä instead of [a, backspace-no-delete, "].

For a good while, we lived in a nightmare of ten thousand code pages.
In Russia, you could receive an email and see a jumble of utterly
meaningless words because the message could be re-encoded (or the
Content-Type charset= stripped or re-labeled) on any of the
intermediate servers; there existed programs which were able to
heuristically detect the chain of re-encodings applied on the way and
decode your message for you. You could order a book in an Internet
shop, have them completely b0rk up the encoding of the shipping
address:
http://cdn.imagepush.to/in/625x2090/i/3/30/301/24.jpg
Then somebody at the postal system might decode the characters and the
package would still be delivered at the intended address.

Now that every widely used operating system supports Unicode, we don’t
have an excuse for clinging to those workarounds of the past century.
We are not limited by the 7-bit ASCII encoding and can store texts in
their true form. We also are not constrained by the typewriter
keyboard, having input methods based on Compose or Level3 allowing us
to conveniently enter all the necessary diverse characters. On
X11/GNU/Linux in particular it comes bundled with the system; on
Windows, one has to install a third-party package.

Much of the software has already evolved to support Unicode. That
which hasn’t, has to catch up. From a spell checker, in particular, I
expect that it should (perhaps with an optional switch) be able to
flag as error any spelling of “isn’t” where the character between n
and t is not the preferred apostrophe character U+2019.
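As a toy illustration of such a check (not a real spell checker, just a regex that flags the ASCII apostrophe between letters):

```python
import re

def flag_ascii_apostrophes(text):
    """Return contractions written with U+0027 between letters,
    where U+2019 would be the typographically preferred apostrophe."""
    return [m.group() for m in re.finditer(r"[^\W\d_]+'[^\W\d_]+", text)]

print(flag_ascii_apostrophes("isn't the same as isn’t"))  # → ["isn't"]
```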

Stefan Monnier

Jun 12, 2014, 8:51:44 AM6/12/14
to help-gn...@gnu.org
> The fact that everybody uses " and ' and ` is a historical artifact, a
> workaround of sorts, due to the limitations of the mechanical
> typewriter. We need not be affected by it any more.

The limitation of "number of keys on a keyboard" along with the
limitations of the human brain mean that it's still very convenient for
users to be able to just use " and ' rather than having umpteen
subvariants and needing to remember which to use where and how to type
them in.


Stefan


Nikolai Weibull

Jun 12, 2014, 9:36:31 AM6/12/14
to Stefan Monnier, Emacs Users
Remembering when to use which of four symbols is hardly taxing (and –
even when considering additional “variants” such as ‘′’, ‘″’ for prime
and double prime – not close to the definition of umpteen, I’d say),
though the “how to type them in” argument deserves a bit more
consideration, such as the automatic replacement that many editors
perform. Personally, keyboard bindings such as \C-k ' 9 (from Vim and
now Evil) are wired deep into my fingers, so much so that I still
haven’t been able to move over to using the more convenient & ' 9 from
the rfc1345 input method.

Stefan Monnier

Jun 12, 2014, 10:48:08 AM6/12/14
to Nikolai Weibull, Emacs Users
> Remembering when to use which of four symbols is hardly taxing (and –

I'd expect that most people would first have to *learn* before they
could have a chance at remembering. After all, these are typographical
conventions that aren't taught at school.

And then seeing how people mix up "your/you're" and friends, I think
you're being overly optimistic.


Stefan

Eli Zaretskii

Jun 12, 2014, 12:58:21 PM6/12/14
to help-gn...@gnu.org
> Date: Thu, 12 Jun 2014 12:43:24 +0700
> From: Yuri Khan <yuri....@gmail.com>
> Cc: "help-gn...@gnu.org" <help-gn...@gnu.org>
>
> From a spell checker, in particular, I expect that it should
> (perhaps with an optional switch) be able to flag as error any
> spelling of “isn’t” where the character between n and t is not the
> preferred apostrophe character U+2019.

You cannot expect that from a speller. You should expect that from
people who produce the dictionaries for the speller, because it's the
dictionary files that tell the speller which characters can and cannot
appear in a word, and which suffixes can and cannot be appended to a
word for it to remain correctly spelled.

Hunspell already supports all that, it's just your dictionary that
doesn't. Look at the *.aff files to understand how the ' apostrophe
works, and you will see why the speller is not the issue here.



Emanuel Berg

Jun 13, 2014, 9:35:05 PM6/13/14
to
First, let me tell you I very much appreciated this
post!

We agree that ', ", and the rest of the non-Unicode
chars that may (not) be used in more or less the same
context - we agree that those are there (not there) for
techno-historical reasons.

Where we *don't* agree is that you think that, if I'm
allowed to pseudo-quote you:

- Today, now that there aren't any technical
limitations, we should go for the more advanced
chars.

Here is where I say:

Just because it is possible, doesn't mean it is desired
if there is no gain. It is possible to change all the
software in the world to be able to use those
chars. But why? For the reasons you stated, in the
Internet and Usenet and otherwise computer culture,
many, many people have come to use English, and the 7-
(or 8) bits chars have spread and became a de facto
standard. So people's eyes and brains and fingers are
trained to use those. We have all came together from
different starting points. The UK and US people had to
go the shortest way (as the pioneers, perhaps they
earned it). The Swedes had to learn English. The
Russians had to go somewhat further because Russian is
farther from English than Swedish. And so on. So when
we finally have something in common - why break it just
because it is possible? With some computer languages
like Java it is possible for me to program in Swedish,
using the ä, å, and ö. But why would I want to do that?
It would bring havoc to my brain as the rest of the
language would still be English. But more importantly,
it would isolate my program from the rest of the
world. I couldn't communicate about it (ask questions,
tell people about it with the support of code snippets,
etc.) and it couldn't be configured/extended by a
non-Swedish speaking person. So I'll just stick to C,
in English. Just as I will stick to ' as that is the
correct way (as I see it) to write in "Computer
English".

Emanuel Berg

Jun 13, 2014, 9:49:59 PM6/13/14
to
Nikolai Weibull <n...@disu.se> writes:

> Remembering when to use which of four symbols is
> hardly taxing (and – even when considering additional
> “variants” such as ‘′’, ‘″’ for prime and double
> prime – not close to the definition of umpteen, I’d
> say), though the “how to type them in” arguments
> deserves a bit more consideration, such as the
> automatic replacement that many editors perform.

The “ and ’ just look silly and they are
disruptive. The two chars after the words "such as" I
cannot see (they are shown as diamonds).

As for remembering/typing, it is again not a question
of - "is it possible to do?" - not with respect to
humans nor to technology - of course it is possible! -
the question is - and what I can see you still haven't
answered it with one word - the question is *why* -
what is the gain? who would benefit from it, and how
so?

This entire thread is an example why not to do it
(though I agree a spellchecker should be fixed to cope,
anyway, as some people have the poor taste to use those
chars and those have to be accounted for) - and I just
raised additional problems, on top of the fact that so
much software around is just not up to it - so why this
is (and can be) a problem (annoyance) is clear - the
only thing that is a mystery is why anyone would want
it to begin with.

> Personally, keyboard bindings such as \C-k ' 9 (from
> Vim and now Evil) are wired deep into my fingers, so
> much so that I still haven’t been able to move over
> to using the more convenient & ' 9 from the rfc1345
> input method.

OK, let me tell you how I do ' and ". ' I do by moving
my right little finger one step (key) to the right. The
" I do by moving the right little finger to the right
shift, at the same time as the ring finger slides along
to the ' key.

So can you find one single area in which anyone (human
or technology) benefits from those goofy chars?

It is just snobbish, not reality. Don't do it!

Emanuel Berg

Jun 13, 2014, 10:38:55 PM6/13/14
to
Yuri Khan <yuri....@gmail.com> writes:

> You could order a book in an Internet shop, have them
> completely b0rk up the encoding of the shipping
> address:
>
> http://cdn.imagepush.to/in/625x2090/i/3/30/301/24.jpg
>
> Then somebody at the postal system might decode the
> characters and the package would still be delivered
> at the intended address.

Ha-ha, unbelievable! How did that happen? First you
wrote in Russian at the Internet shop's web page - then
it got like that because of them translating Unicode
(?) to ISO-8859-1 (which is 8-bit, with the ASCII as
its lower half) - ? Why didn't the Internet shop do it?
Did they actually think that was a language or some
transcription of Russian? How was it translated to
Russian at the postal office? I can only make out the
first line: Russia, Moscow.

Emanuel Berg

Jun 14, 2014, 1:19:25 AM6/14/14
to
Emanuel Berg <embe...@student.uu.se> writes:

>> Yuri Khan writes:
>>
>> You could order a book in an Internet shop, have
>> them completely b0rk up the encoding of the shipping
>> address:
>>
>> http://cdn.imagepush.to/in/625x2090/i/3/30/301/24.jpg
>>
>> Then somebody at the postal system might decode the
>> characters and the package would still be delivered
>> at the intended address.
>
> Ha-ha, unbelievable! How did that happen? First you
> wrote in Russian at the Internet shop's web page -
> then it got like that because of them translating
> Unicode (?) to ISO-8859-1 (which is 8-bit, with the
> ASCII as its lower half) - ? Why didn't the Internet
> shop do it? Did they actually think that was a
> language or some transcription of Russian? How was it
> translated to Russian at the postal office? I can
> only make out the first line: Russia, Moscow.

I read an article on this:

Pre-1990s: the 7-bit period. US-ASCII, with ISO 646 in
Scandinavia and Finland (with 0x5B-D and 0x7B-D
replaced with national chars: those were [ \ ]
and { | } respectively in the US-ASCII).

The 90s: 8-bits. ISO 8859-1 with the ASCII as its lower
half. Russian: KOI8-R, ISO 8859-5, and CP1251.

2000s: the multi-byte era. EUC and ISO 2022-JP for
CJK. Linux moves from 8859 to UTF-8, an
ASCII-compatible implementation of the likely future
standard Unicode/ISO 10646.

Yuri Khan

Jun 14, 2014, 1:45:50 AM6/14/14
to Emanuel Berg, help-gn...@gnu.org
On Sat, Jun 14, 2014 at 8:49 AM, Emanuel Berg <embe...@student.uu.se> wrote:

> The “ and ’ just look silly and they are
> disruptive. The two chars after the words "such as" I
> cannot see (they are shown as diamonds).

This is where I disagree. Curly quotes (and, in Russian print
tradition, double angle quotes) are what I am used to seeing in print
and consider to be the correct way to write, independent of the
medium. Straight quotes I recognize in both print and on screen as a
no longer necessary homage to the old clunky typewriter, and perceive
as silly.

As for your problems seeing curly quotes, that’s because of your
display engine. Text mode Linux console is limited to at most 512
character shapes; this limitation dates back to the original VGA card
and is another one that should no longer affect us. Nowadays, you
should be able to use a graphical-based text renderer — be it X11 or
framebuffer. Myself, I haven’t bothered to set up a framebuffer
console on any of my computers — I prefer working in an X11
environment with Freetype-rendered, subpixel-antialiased Unicode fonts
and rich xkb customizability.

> the question is *why* -
> what is the gain? who would benefit from it, and how
> so?

By encoding more precise character semantics into our texts, we make
them better suited for any kind of automated processing. Conflating
similarly shaped characters, on the other hand, makes it more
complicated.

For example, the task of producing nice printouts from an
ASCII-encoded source requires a complex piece of software like
[La]TeX, or the mechanism of entity references in HTML (&ldquo;). On
the other hand, with UTF-8, we can directly encode the desired
characters in a text document and print it out with any text editor or
web browser.

(You can, of course, argue that a printout of an ASCII document with
straight quotes is not too ugly; or that TeX is not exceedingly
complex; or that entity references are not very disrupting.)

> OK, let me tell you how I do ' and ". ' I do by moving
> my right little finger one step (key) to the right. The
> " I do by moving the right little finger to the right
> shift, at the same time as the ring finger slides along
> to the ' key.

Now let me tell you how I do curly quotes.

First, with my right thumb, I hold the AltGr modifier. Then, I press k
and l in sequence to get a balanced pair of double curly quotes, or ;
and ' for single quotes (I customized my xkb configuration files to
get this but it works similarly with the out-of-the-box config). This
works for me in both Latin/English and Cyrillic/Russian layouts. On
the other hand, the straight quote is only available in the Latin
layout; in Russian, I would have to first switch to Latin, then type
the single quote, and finally switch back to Russian.
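An xkb sketch of what such a customization can look like (key and keysym names follow xkb conventions; the exact keys are an assumption, since the original config isn't shown):

```
// Hypothetical xkb symbols fragment: AltGr+k/l give “ ”,
// AltGr+;/' give ‘ ’.
partial alphanumeric_keys
xkb_symbols "curly_quotes" {
    key <AC08> { [ k, K, leftdoublequotemark  ] };
    key <AC09> { [ l, L, rightdoublequotemark ] };
    key <AC10> { [ semicolon,  colon,    leftsinglequotemark  ] };
    key <AC11> { [ apostrophe, quotedbl, rightsinglequotemark ] };
    include "level3(ralt_switch)"
};
```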

Yuri Khan

Jun 14, 2014, 3:11:51 AM6/14/14
to Emanuel Berg, help-gn...@gnu.org
On Sat, Jun 14, 2014 at 9:38 AM, Emanuel Berg <embe...@student.uu.se> wrote:
> Yuri Khan <yuri....@gmail.com> writes:
>
>> You could order a book in an Internet shop, have them
>> completely b0rk up the encoding of the shipping
>> address:
>>
>> http://cdn.imagepush.to/in/625x2090/i/3/30/301/24.jpg
>>
>> Then somebody at the postal system might decode the
>> characters and the package would still be delivered
>> at the intended address.
>
> Ha-ha, unbelievable! How did that happen? First you
> wrote in Russian at the Internet shop's web page - then
> it got like that because of them translating Unicode
> (?) to ISO-8859-1 (which is 8-bit, with the ASCII as
> its lower half) - ? Why didn't the Internet shop do it?

First I must say it’s not mine, and likely not a common occurrence for
the Russian Post, which is nowadays notorious for its lack of customer
orientedness.

In technical terms, I can think of the following sequence of events:

* The user comes to a website containing an order form. (The form
contains a free input <textarea> for the street address and possibly
an <input> for the recipient name, and a <select> for the country. The
latter ensures that the word RUSSIE is printed in its legible form.)
* The user enters her address and name into the web form, in Russian;
also selects Russian Federation from the country dropdown.
* The browser encodes the address in KOI8-R, one of the three code
pages used in Russia. In this encoding, the string Москва (Moscow) has
the following byte representation: ED CF D3 CB D7 C1. (The KOI8-R
encoding was designed in such a way that it remains readable if the
high bit is stripped: mOSKWA. Too bad the links were already
8-bit-clean at the time Harry Potter was published.)
* The browser sends the form data to the web server, labeled as
Content-Type: application/x-www-form-urlencoded; encoding=KOI8-r. (At
that time, Unicode was not as ubiquitous as it is now; browsers
operated in an encoding that best matched the user’s input.)
* The web server passes the form data to the backend script (Perl CGI
or possibly PHP running as a module).
* The backend script disregards the encoding= parameter, reinterprets
the string as if it were encoded in ISO-8859-1 (or possibly
windows-1252, which is an extension of ISO-8859-1). The byte
representation ED CF D3 CB D7 C1 decodes into íÏÓË×Á (small i with
acute, capital I with diaeresis, capital O with acute, capital E with
diaeresis, multiplication sign, capital A with acute). This string
then gets stored in the database (which is likely configured to
operate in ISO-8859-1 or windows-1252) and lives happily ever after.
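The two transformations in this story, the KOI8-R bytes misread as ISO-8859-1 and KOI8-R's survive-the-high-bit-stripping property, can both be reproduced in a few lines (a sketch; Python is used here only as a convenient encoding calculator):

```python
# "Москва" (Moscow) encoded in KOI8-R yields the bytes quoted above.
koi8 = "Москва".encode("koi8_r")
print(koi8.hex(" ").upper())  # ed cf d3 cb d7 c1

# A backend that disregards the declared encoding and assumes
# ISO-8859-1 produces the mojibake seen on the parcel.
print(koi8.decode("latin-1"))  # íÏÓË×Á

# KOI8-R's design property: stripping the high bit still leaves
# a readable Latin transliteration.
print(bytes(b & 0x7F for b in koi8).decode("ascii"))  # mOSKWA
```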

> Did they actually think that was a language or some
> transcription of Russian?

Most probably, at the time a human being at the merchant side got
involved, the address was already mangled. They did not have the
knowledge of Russian code pages, and decided to make a best reasonable
effort — “send it as is and let those crazy Russians sort it out”.

> How was it translated to
> Russian at the postal office? I can only make out the
> first line: Russia, Moscow.

The package contains two pieces of information — the country name in
French (RUSSIE) and the postal code 119415 — which get the package to
the postal office 119415 at 14 Udaltsov street in Moscow, near the
customer’s place of residence. (Postal codes are unique within Russia,
the first three digits unambiguously identifying the city.)

https://goo.gl/maps/4TK3D (pin at the post office building).

The worker at the post office might be familiar with both the KOI8-R
and Windows-1251 encoding tables, but that is highly unlikely.

Alternatively, the worker might regard the mysteriously labeled
package as a peculiar form of a substitution cypher puzzle. [Challenge
Accepted] He takes a red pen and starts scribbling right on the
package.

* First, he notices that the two middle letters in the first word are
identical, and guesses that this word must be Rossi[ya] (Russia).
* This allows him to decode two letters of the next word, which can
then be guessed as Moskva (Moscow) — what else could it be?
* Substituting the known letters into the customer’s first name gives
“Св***а**” (Sv***a**), which the postal worker recognizes as Svetlana
(a fairly common Russian feminine name, and the most common of those
starting with Sv). (The last letter does not match because of
grammatical case declension.)
* This now gives enough information to decode and guess the street as
pr. Vernadskogo and deliver the package to the Moscow State University
dormitory at Vernadskogo, 37 (the other marker on the map linked
above), room 1817-1. He probably also lectures Svetlana that, until
all web sites embrace Unicode, it’s safer to write her address in
transliteration.


Now, while this all makes for great war stories, it Should Not be
necessary. Unicode should be used in all stages of Internet shop order
processing, and addresses written in any local language should be
deliverable without post office workers having to solve a challenge.

Yuri Khan

unread,
Jun 14, 2014, 3:37:02 AM6/14/14
to Emanuel Berg, help-gn...@gnu.org
On Sat, Jun 14, 2014 at 12:19 PM, Emanuel Berg <embe...@student.uu.se> wrote:
> Emanuel Berg <embe...@student.uu.se> writes:
>
> I read an article on this:
>
> The 90s: 8-bits. ISO 8859-1 with the ASCII as its lower
> half. Russian: KOI8-R, ISO 8859-5, and CP1251.

http://bash.im/quote/423

Translated: “Over there they think that our streets are roamed by
bears with balalaikas who speak in iso-8859-5.”

To my knowledge, Russia has never spoken iso-8859-5. It is regarded as
a design-by-committee encoding bearing no connection to the needs of
real-world users.

DOS used the IBM code page 866. Windows used Windows-1251 and users
had to convert to and from DOS 866. Linux old-faith users used (and
some still use) KOI8-R. Internet users had to be able to convert at
least between Windows-1251 and KOI8-R.

Eli Zaretskii

unread,
Jun 14, 2014, 4:28:30 AM6/14/14
to help-gn...@gnu.org
> Date: Sat, 14 Jun 2014 14:37:02 +0700
> [iso-8859-5] is regarded as a design-by-committee encoding bearing
> no connection to the needs of real-world users.

I guess you mean non-Russian committee, because who do you think
invented KOI8-R? some private Russian citizen? KOI8-R was defined by
a Soviet State Standard GOST-19768-74, which already means at least
one committee was involved.

(Interested readers should see http://czyborra.com/charsets/cyrillic.html
for more details.)

No one really thinks Russian cities are full of bears with balalaikas,
but please don't pretend non-Russian cities are full of fools who
cannot design a character set on a good day. Next we will probably
hear that no one except Russians can design railways, because the only
correct standard of railway track width is the Russian one. Sheesh...

There should be no place for such bigotry on this forum.

Yuri Khan

unread,
Jun 14, 2014, 6:46:34 AM6/14/14
to Eli Zaretskii, help-gn...@gnu.org
On Sat, Jun 14, 2014 at 3:28 PM, Eli Zaretskii <el...@gnu.org> wrote:

>> [iso-8859-5] is regarded as a design-by-committee encoding bearing
>> no connection to the needs of real-world users.
>
> I guess you mean non-Russian committee, because who do you think
> invented KOI8-R? some private Russian citizen? KOI8-R was defined by
> a Soviet State Standard GOST-19768-74, which already means at least
> one committee was involved.
>
> (Interested readers should see http://czyborra.com/charsets/cyrillic.html
> for more details.)

Sure. KOI-8 is from GOST-19768-74, and ISO-8859-5 was derived from
GOST-19768-87. The former had the advantage of being already
established and widely used, so the latter never caught on. The next
incompatible standard, CP866, had the advantage of preserving
pseudographic characters from CP437 used on PCs, so became a de facto
standard for DOS even before Microsoft started officially supporting
it.

> No one really thinks Russian cities are full of bears with balalaikas,
> but please don't pretend non-Russian cities are full of fools who
> cannot design a character set on a good day. Next we will probably
> hear that no one except Russians can design railways, because the only
> correct standard of railway track width is the Russian one. Sheesh...
>
> There should be no place for such bigotry on this forum.

Of course not. My point is that it is a frequent misconception among
Europeans that ISO-8859-5 was in any position to be a standard
encoding for Russia.


The other point is that, from the current standpoint, all of these
encodings are horrible and must die in favor of UTF-8 as soon as
possible.

Emanuel Berg

unread,
Jun 14, 2014, 7:14:45 AM6/14/14
to
Yuri Khan <yuri....@gmail.com> writes:

>> The “ and ’ just looks silly and they are
>> disruptive. The two chars after the words "such as"
>> I cannot see (they are shown as diamonds).
>
> This is where I disagree. Curly quotes (and, in
> Russian print tradition, double angle quotes) are
> what I am used to seeing in print and consider to be
> the correct way to write

OK, I believe you. However, the point I made with all
people coming from different cultures is that it
doesn't matter where we are from individually. When I
went to school, I suppose I was most comfortable with
Swedish. But I'm not supposing we all switch to
Swedish! OK, that's a ridiculous example as it is
extreme, while what we discuss now is perhaps trivial
(' or ’) - but in principle it is the same. The
computer language is English, and as I showed - the man
pages for ls and emacs, as well as the RFC excerpt, as
well as all experience with mails and Usenet and
programming culture - all show that in "Computer
English", ' (not ’) is correct. In a sense, this
language is something that even the US, UK, etc. people
have to acquire, though in another way altogether, of
course. You see, kernel, allocation, dynamic, data
structure, heap, process, deadlock, etc. are all
English words. But put together a sentence and show it
to a surfer in Southern California. You know what I'm
saying? (By the way, do you know what they call a guy
in Southern California who is interested in cars?
Well, a "sensitive intellectual" :)) - now, the Scots
and Irish are of course not calling their variables
McDigit or O'String, but do they write <centre>,
DialogueBox, background-colour, and so on? No - in
Computer English it is <center>, DialogBox, and
background-color, just as it is ', not ’.

> independent of the medium

There is no such independence. There are computers.

> Straight quotes I recognize in both print and on
> screen as a no longer necessary homage to the old
> clunky typewriter, and perceive as silly.

They are not homages to anything - they exist. It is of
course interesting to know why they are there, but as
far as this discussion is concerned it doesn't matter.
What matters is that they are there; they exist.

> As for your problems seeing curly quotes, that’s
> because of your display engine.

Yes, another reason why not to use them.

> Text mode Linux console is limited to at most 512
> character shapes; this limitation dates back to the
> original VGA card and is another one that should no
> longer affect us. Nowadays, you should be able to use
> a graphical-based text renderer — be it X11 or
> framebuffer. Myself, I haven’t bothered to set up a
> framebuffer console on any of my computers — I prefer
> working in an X11 environment with Freetype-rendered,
> subpixel-antialiased Unicode fonts and rich xkb
> customizability.

The Linux console is faster with text than Emacs
running in for example xterm. I could get a faster
computer hypothetically but then I'd also have to spend
hours getting the keyboard and fonts and everything as
I want them. But I already have that, so why do it? But
I don't think the console is that much "better" than
X/xterm in general - just in my case with all the
configuration, I'm very happy with that and see no
reason to do it again in X. And certainly not for this
reason...

> By encoding more precise character semantics into our
> texts, we make them better suited for any kind of
> automated processing. Conflating similarly shaped
> characters, on the other hand, makes it more
> complicated.
>
> For example, the task of producing nice printouts
> from an ASCII-encoded source requires a complex piece
> of software like [La]TeX, or the mechanism of entity
> references in HTML (&ldquo;). On the other hand, with
> UTF-8, we can directly encode the desired characters
> in a text document and print it out with any text
> editor or web browser.
>
> (You can, of course, argue that a printout of an
> ASCII document with straight quotes is not too ugly;
> or that TeX is not exceedingly complex; or that
> entity references are not very disrupting.)

ASCII doesn't look ugly printed; it looks the same as
it does on computers. But the main purpose of ASCII of
course isn't to be printed but to be processed and
crunched... and read (on computers).

I can't say I have that much respect for HTML as a
technical system but yes, I think ' should be used,
both when typing and in presentation - where the
material will be read in a browser (i.e., a computer
program) and sometimes yanked to a mail or post or
configuration file.

LaTeX is indeed complex but it is for a good reason -
so there won't be any limitations creating complex
documents. When you print LaTeX I don't really care
what the chars look like because with LaTeX you
typically print ambitious documents of several pages so
then you get into the flow when reading, so you stop
thinking about the chars really fast. However, every
code/configuration file snippet, man page quote and so
on should use '. Also, when you write LaTeX, only '
(and the like) should be used just as is the case for
programming, HTML, and all other computer writing and
programming. But after that, when a PDF has been
created, that is sort of beyond the dynamic world of
computers and more into the book world - there, I don't
see any real benefits of using either ' or ’. However,
since it doesn't really matter, why not stick to ' as
it is the de facto standard?

>> OK, let me tell you how I do ' and ". ' I do by
>> moving my right little finger one step (key) to the
>> right. The " I do by moving the right little finger
>> to the right shift, at the same time as the ring
>> finger slides along to the ' key.
>
> Now let me tell you how I do curly quotes.
>
> First, with my right thumb, I hold the AltGr
> modifier. Then, I press k and l in sequence to get a
> balanced pair of double curly quotes, or ; and ' for
> single quotes (I customized my xkb configuration
> files to get this but it works similarly with the
> out-of-the-box config). This works for me in both
> Latin/English and Cyrillic/Russian layouts. On the
> other hand, the straight quote is only available in
> the Latin layout; in Russian, I would have to first
> switch to Latin, then type the single quote, and
> finally switch back to Russian.

Yes, but when you program and write in English (like
now), don't you use the US keyboard layout? That's what
I do to get the brackets and the semicolon and all that
with no fuss - it is not that I use the Swedish chars
that much, anyway! (Which is again the whole point.)
And with the US layout, ' (and so on) are easier to
type than the chars you suggest.

Emanuel Berg

unread,
Jun 14, 2014, 7:20:10 AM6/14/14
to
Yuri Khan <yuri....@gmail.com> writes:

> First I must say it’s not mine and likely not a
> common occurrence for the Russian Post which is
> nowadays notorious for its lack of customer
> orientedness.

That was my first thought, what cool guys you have
working the mails!

> Now, while this all makes for great war stories, it
> Should Not be necessary. Unicode should be used in
> all stages of Internet shop order processing, and
> addresses written in any local language should be
> deliverable without post office workers having to
> solve a challenge.

Of course I agree, however sending your MGU address to
some company is sort of beyond this discussion. Of
course there should be ways to communicate in Russian
with computers. Yes, a very cool story!

Emanuel Berg

unread,
Jun 14, 2014, 7:21:07 AM6/14/14
to
Yuri Khan <yuri....@gmail.com> writes:

> To my knowledge, Russia has never spoken iso-8859-5.

OK - I got that from the Wikipedia article on IRC.

Yuri Khan

unread,
Jun 14, 2014, 10:51:43 AM6/14/14
to Emanuel Berg, help-gn...@gnu.org
On Sat, Jun 14, 2014 at 6:14 PM, Emanuel Berg <embe...@student.uu.se> wrote:
> Yuri Khan <yuri....@gmail.com> writes:
>
>> Curly quotes (and, in
>> Russian print tradition, double angle quotes) are
>> what I am used to seeing in print and consider to be
>> the correct way to write
>
> OK, I believe you. However, the point I made with all
> people coming from different cultures is that it
> doesn't matter where we are from individually. When I
> went to school, I suppose I was most comfortable with
> Swedish. But I'm not supposing we all switch to
> Swedish!

OK, so what? I expect that people of all cultures who were exposed to
books printed before the advent of the computer and the word processor
are used to typographic characters.

> OK, that's a ridiculous example as it is
> extreme, while what we discuss now is perhaps trivial
> (' or ’) - but in principle it is the same. The
> computer language is English, and as I showed - the man
> pages for ls and emacs, as well as the RFC excerpt, as
> well as all experience with mails and Usenet and
> programming culture - all show that in "Computer
> English", ' (not ’) is correct.

They are that way because they were written in the dark age of ten
thousand code pages and never updated to Unicode.

The GCC error messages in the en_US.utf8 locale, on the other hand, do
use curly quotes.

>> Straight quotes I recognize in both print and on
>> screen as a no longer necessary homage to the old
>> clunky typewriter, and perceive as silly.
>
> They are not homages to anything - they exist. It is of
> course interesting to know why they are there but as
> for as for this discussion it doesn't matter. What
> matters is that they are there, they exist.

They exist *because* there was a certain technical limitation in the
last fifty years or so. Since this limitation has been removed, there
is no reason for them.

OK, I do not suggest that Perl should drop its backtick operator or
that computer languages universally start using curly quotes for
character and string literals (although that would make many languages
more elegant by simplifying parsing). But how about we reserve all
these artificial characters for computer languages, one of which
English is not.

>> As for your problems seeing curly quotes, that’s
>> because of your display engine.
>
> Yes, another reason why not to use them.

I believe users of the VGA text console are intelligent beings and
respect their decision to suffer.

> I can't say I have that much respect for HTML as a
> technical system but yes, I think ' should be used,
> both when typing and in presentation - where the
> material will be read in a browser (i.e., a computer
> program) and sometimes yanked to a mail or post or
> configuration file.

For configuration files, by all means, the character which is proper
for that particular file format must be used.

Otherwise, primarily, the material will be read by a human being, and
only secondarily in a computer program. I wish for a future where the
Web replaces the printed book, therefore, the Web must do all things
books do, and then some.

> LaTeX is indeed complex but it is for a good reason -
> so there won't be any limitations creating complex
> documents. When you print LaTeX I don't really care
> what the chars look like because with LaTeX you
> typically print ambitious documents of several pages so
> then you get into the flow when reading, so you stop
> thinking about the chars really fast.

No. If I have to read a printed document, every straight quote, every
hyphen used in place of a dash, every uneven space, pulls me out of
the flow. The only way for me to stop thinking about the characters is
if they are exactly as in a book typeset by a skilled typesetter on a
pre-computer-era press.

Yes, LaTeX does a lot to produce a beautifully typeset printout from
an ASCII source. This is not enough; I want that same beautiful
typesetting on screen, in browser, in any page width I happen to have,
in my favorite typeface and font size, without having to recompile the
document. And at the same time, it does too much. It has to maintain,
and document authors have to utilize, a multitude of workarounds that
are caused by TeX not using Unicode internally.

> when you program and write in English (like
> now), don't you use the US keyboard layout? That's what
> I do to get the brackets and the semicolon and all that
> with no fuss - it is not that I use the Swedish chars
> that much, anyway! (Which is again the whole point.)
> And with the US layout, ' (and so on) are easier to
> type than the chars you suggest.

The difference between ' and AltGr+' is almost negligible for me.
Additionally, when I use an apostrophe in a string constant in a
language where strings are delimited by single quotes, or double curly
quotes where delimited by double quotes, I don’t have to
backslash-quote them.
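The escaping point can be illustrated in one line each (a Python sketch; the same applies to any language with single-quoted string literals):

```python
# A straight apostrophe collides with the string delimiter
# and must be backslash-escaped:
escaped = 'isn\'t'

# A typographic apostrophe is a different character from the
# delimiter, so no escaping is needed:
curly = 'isn’t'

print(escaped)  # isn't
print(curly)    # isn’t
```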


I do understand we have engaged in a holy war not directly related to
the original poster’s problem. Let’s agree to disagree.

Teemu Likonen

unread,
Jun 14, 2014, 11:26:46 AM6/14/14
to Yuri Khan, help-gn...@gnu.org, Emanuel Berg
Yuri Khan [2014-06-14 21:51:43 +07:00] wrote:

> Yes, LaTeX does a lot to produce a beautifully typeset printout from
> an ASCII source. This is not enough; I want that same beautiful
> typesetting on screen, in browser, in any page width I happen to have,
> in my favorite typeface and font size, without having to recompile the
> document. And at the same time, it does too much. It has to maintain,
> and document authors have to utilize, a multitude of workarounds that
> are caused by TeX not using Unicode internally.

Yes, but you know, XeLaTeX with the fontspec/mathspec packages takes
UTF-8 input files and uses TrueType and OpenType fonts. That’s today’s
LaTeX, actually. I don’t use ``quotes'' and other character-level
markup anymore. Just plain “ ” will do (or ” ”, as we do in Finnish).
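A minimal document along these lines might look like the following (a sketch; the font name is only an example and must be a TrueType/OpenType font installed on your system):

```latex
% Compile with xelatex. The source file is plain UTF-8.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Linux Libertine O} % any installed OpenType font
\begin{document}
Just plain “curly quotes” and ’apostrophes’ typed directly
into the source, with no character-level markup required.
\end{document}
```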

Emanuel Berg

unread,
Jun 14, 2014, 12:13:27 PM6/14/14
to
Yuri Khan <yuri....@gmail.com> writes:

>>> Curly quotes (and, in Russian print tradition,
>>> double angle quotes) are what I am used to seeing in
>>> print and consider to be the correct way to write
>> OK, I believe you. However, the point I made with all
>> people coming from different cultures is that it
>> doesn't matter where we are from individually. When I
>> went to school, I suppose I was most comfortable with
>> Swedish. But I'm not supposing we all switch to
>> Swedish!
>
> OK, so what? I expect that people of all cultures who
> were exposed to books printed before the advent of
> the computer and the word processor are used to
> typographic characters.

I'm OK disagreeing but I want you to understand me. The
point is: the cultures are in this discussion
irrelevant. If the cultures were what decided things
you should be speaking Russian and I Swedish. We don't,
because we have travelled to a common point so that
when we interact in the computer world, we are using
the "Computer English" language, which I have described
several times now. This is the English in the man
pages, in the RFCs, in the C code, in the HTML, and all
that. In this language you don't write <mitten> if you
are Swedish, <centre> if you are British, etc., *all*
write <center>, otherwise it doesn't work! Likewise, to
quote in Usenet posts we use >, to double quote, >>, and
so on; to mark where the signature starts we use --,
because otherwise highlighting/hiding of the
quotes/signature doesn't work, because the clients are
looking for those specific chars! In "Computer
English", the de facto standard is ' and ", and it
doesn't matter what books anyone read as kids. Because
we are not doing that *now*! All of us have moved to a
common culture which is common for practical reasons -
it is not aesthetics or snobbism, it is reality - and
there is no reason whatsoever to fight it. It only
creates exactly the problems as was the very reason the
OP had to write to this list.

>> OK, that's a ridiculous example as it is extreme,
>> while what we discuss now is perhaps trivial (' or
>> ) - but in principle it is the same. The computer
>> language is English, and as I showed - the man pages
>> for ls and emacs, as well as the RFC excerpt, as
>> well as all experience with mails and Usenet and
>> programming culture - all show that in "Computer
>> English", ' (not ) is correct.
>
> They are that way because they were written in the
> dark age of ten thousand code pages and never updated
> to Unicode.

It doesn't matter. That's the way it is. Like the
sentence I just wrote. I don't care why the English
word for "way" is "way". It just is, and it is very,
very impractical and extremely arrogant for anyone to
say, I don't like it to be "way", for no reason
whatsoever save for aesthetics (which isn't a consensus
by the way) I like it to be "yaw" - and the argument
for changing, is that there are (of course!) historical
roots for the word "way" being "way" - if someone had
thought about it really hard (and exactly like me,
today) he or she would have decided the word for "way"
should be "yaw" --- it doesn't make any sense!

> They exist *because* there was a certain technical
> limitation in the last fifty years or so. Since this
> limitation has been removed, there is no reason for
> them.

They do not exist because there was a technical
limitation fifty years ago. They exist, today, because
they are useful, today!

> I believe users of the VGA text console are
> intelligent beings and respect their decision to
> suffer.

Forget it. I have Gnus configured to transparently
replace your goofy chars with the correct ones.

> Otherwise, primarily, the material will be read by a
> human being, and only secondarily in a computer
> program. I wish for a future where the Web replaces
> the printed book

Lunacy.

> therefore, the Web must do all things books do, and
> then some.

The web can already do that in principle but that
doesn't mean books, papers, libraries, and so on will
disappear. That's a horrible thought but luckily it
won't happen.

> If I have to read a printed document, every straight
> quote, every hyphen used in place of a dash, every
> uneven space, pulls me out of the flow. The only way
> for me to stop thinking about the characters is if
> they are exactly as in a book typeset by a skilled
> typesetter on a pre-computer-era press.

Yes, this is only snobbism and aesthetics for their own
sake. This is what I have expected from day one. Yes,
LaTeX can produce very good-looking documents and I
have spent countless hours in that department - but
that you aren't able to read a book without it is just -
I don't know. It is not reality. In reality you read
what you have to read.

>> when you program and write in English (like now),
>> don't you use the US keyboard layout? That's what I
>> do to get the brackets and the semicolon and all
>> that with no fuss - it is not that I use the Swedish
>> chars that much, anyway! (Which is again the whole
>> point.) And with the US layout, ' (and so on) are
>> easier to type than the chars you suggest.
>
> The difference between ' and AltGr+' is almost
> negligible for me.

We don't have to "almost" that: ' is one key, AltGr+'
is two.

> I do understand we have engaged in a holy war not
> directly related to the original posters
> problem. Lets agree to disagree.

The OP had a problem because he used the incorrect
chars. While the spellchecker still should cope, I
still haven't heard one argument that makes sense why
anyone should benefit from those goofy chars.

Sergio Pokrovskij

unread,
Jun 14, 2014, 10:48:05 PM6/14/14
to
I can't tell you how much I dislike the ugly quotes in Emacs Info,
e.g.

╭────
│ `C-c C-a (`org-attach')'
│ The dispatcher for commands related to the attachment system.
╰────

I always use paired quotes, but normally I use the ASCII
apostrophe. I admit this causes problems for the speller with
e.g. the Wikipedia convention about its representation of
''italics'', '''bold face''' etc.

>>>>> "Yuri" == Yuri Khan skribis:

[...]

Yuri> Now let me tell you how I do curly quotes.

Yuri> First, with my right thumb, I hold the AltGr
Yuri> modifier. Then, I press k and l in sequence to get a
Yuri> balanced pair of double curly quotes, or ; and ' for
Yuri> single quotes (I customized my xkb configuration files to
Yuri> get this but it works similarly with the out-of-the-box
Yuri> config).

In Emacs I use (on both Linux and MS Windows):

C-c 6 to produce the English 66-99 pair “_”
C-c 9 to produce the German 99-66 pair „_“
C-c " to produce the French angular pair «_»

The point gets positioned in between:

--8<---------------cut here---------------start------------->8---
(defun insert-66-99 ()
  "Make a pair of 66-99 quotes and position point to type inside."
  (interactive)
  (insert "“”")
  (backward-char))

(global-set-key "\C-c6" 'insert-66-99)
--8<---------------cut here---------------end--------------->8---

--
Sergio

Joost Kremers

unread,
Jun 16, 2014, 11:35:45 AM6/16/14
to
Emanuel Berg wrote:
> The OP had a problem because he used the incorrect
> chars. While the spellchecker still should cope, I
> still haven't heard one argument that makes sense why
> anyone should benefit from those goofy chars.

That is because you define the word "benefit" in your own way and refuse
to accept that where you only see needless "goofiness", others actually
see benefit.

As for the topic of the discussion, Unicode is gradually replacing
ASCII, and that will mean more and more people using typographic quotes.
And more and more hoops to jump through for people that prefer to stick
to older software that doesn't properly implement Unicode.


--
Joost Kremers joostk...@fastmail.fm
Selbst in die Unterwelt dringt durch Spalten Licht
EN:SiS(9)

Garreau, Alexandre

unread,
Jun 16, 2014, 9:09:18 PM6/16/14
to Emanuel Berg, help-gn...@gnu.org, Yuri Khan
On 2014-06-14 at 07:45, Yuri Khan wrote:
> On Sat, Jun 14, 2014 at 8:49 AM, Emanuel Berg <embe...@student.uu.se> wrote:
>> OK, let me tell you how I do ' and ". ' I do by moving my right
>> little finger one step (key) to the right. The " I do by moving the
>> right little finger to the right shift, at the same time as the ring
>> finger slides along to the ' key.
>
> Now let me tell you how I do curly quotes.
>
> First, with my right thumb, I hold the AltGr modifier. Then, I press k
> and l in sequence to get a balanced pair of double curly quotes, or ;
> and ' for single quotes (I customized my xkb configuration files to
> get this but it works similarly with the out-of-the-box config). This
> works for me in both Latin/English and Cyrillic/Russian layouts. On
> the other hand, the straight quote is only available in the Latin
> layout; in Russian, I would have to first switch to Latin, then type
> the single quote, and finally switch back to Russian.

Now let me tell you how *and why* I do curly apostrophe and
quotes.

First I hold the AltGr modifier with my right thumb, then I press the
“,” key (which is at the center of my keyboard, so the “g” key on
QWERTY/AZERTY keyboards, or the “I” key on the Dvorak layout, which I
haven’t learned well enough yet), and I fluidly obtain a “’” without
hurting my fingers.

For curly quotes I hold AltGr and Shift with my right thumb and little
finger, and press the “2” or “3” key (“7” and “5” on the Dvorak
layout) to get either the left or right one. That doesn’t hurt my
fingers, it’s quick, and it makes really efficiently accessible the
main quote symbols of one of my main languages, the one I learnt
before age 6 (and I can no longer choose another, since my brain has
now definitely lost its plasticity, for ever, and that’s true for
every other human being too: let’s adopt Ido/Lojban/Esperanto/whatever
and stop torturing ourselves with insane languages such as English,
French or Italian).

I would remind you that it’s the *machine* that should serve you, not
the opposite. Your keyboard layout should be made to let *you* write
text that *you*, and the people *you* would like to speak with, will
read; so the more readable the text is, the better. Curly quotes make
text more understandable and more readable, and that’s the way it
should *always* be, except on old, limited, English-oriented Western
typewriters.

Never let tradition and cowardice enslave you. These are the worst (and
maybe the only) curses of mankind.

Garreau, Alexandre

Jun 16, 2014, 9:30:40 PM
to help-gn...@gnu.org
On 2014-06-15 at 04:48, Sergio Pokrovskij wrote:
> I can't tell you how much I dislike the ugly quotes in Emacs Info,

It would be great to have Emacs Info do like TeX and replace `' with
“”.
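
A display-only treatment along those lines can be sketched in Elisp.
This is a hypothetical sketch, not an existing Emacs feature: the
regexp, the function name `my-info-curly-quotes` and the use of display
properties are my assumptions; only `Info-selection-hook` is a real
hook:

```elisp
;; Hypothetical sketch: show `quoted text' in Info buffers as
;; “quoted text”, via display properties, without touching the file.
(defun my-info-curly-quotes ()
  (with-silent-modifications        ; Info buffers are read-only
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward "`\\([^`'\n]+\\)'" nil t)
        (put-text-property (match-beginning 0) (1+ (match-beginning 0))
                           'display "“")
        (put-text-property (1- (match-end 0)) (match-end 0)
                           'display "”")))))
(add-hook 'Info-selection-hook #'my-info-curly-quotes)
```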

> skribis [Esperanto for “wrote”]

Let’s recall that some things work better for the universality of human
language :) (actually I prefer Lojban (even if I can still criticize it
too), but Esperanto is still better than any other non-constructed
language).

> C-c " to produce the French angular pair «_»

You forgot the thin non-breaking spaces that are used in clean French
typography inside French angular quotes, just like this: « _ » (though
most of the time normal non-breaking spaces are used, like that: « _ »).
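
For the record, the two spaces in question are distinct Unicode
characters, which can be checked from Emacs itself (a quick
demonstration, not from the thread):

```elisp
;; The two non-breaking spaces used inside French guillemets:
(get-char-code-property #x202F 'name) ; ⇒ "NARROW NO-BREAK SPACE"
(get-char-code-property #x00A0 'name) ; ⇒ "NO-BREAK SPACE"
```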

Garreau, Alexandre

Jun 16, 2014, 9:42:11 PM
to Yuri Khan, help-gn...@gnu.org
On 2014-06-14 at 16:51, Yuri Khan wrote:
> The GCC error messages in the en_US.utf8 locale, on the other hand, do
> use curly quotes.

Indeed, that’s just because “computer English” is made for computers,
not human beings, who prefer to have readable text, just as it was
before typewriters.

> OK, I do not suggest that Perl should drop its backtick operator or
> that computer languages universally start using curly quotes for
> character and string literals (although that would make many languages
> more elegant by simplifying parsing). But how about we reserve all
> these artificial characters for computer languages, one of which
> English is not.

Having more language-neutral programming languages would be cool; even
better, languages based on semantic interpretation of binary data,
which would move the complexity of the syntactic representation of
content from the data toward the editor, would be really more useful,
clean, simple, egalitarian, etc.

> Otherwise, primarily, the material will be read by a human being, and
> only secondarily in a computer program. I wish for a future where the
> Web replaces the printed book, therefore, the Web must do all things
> books do, and then some.

I hope that by “the Web” you mean “the concept of the ensemble of
linked, interpreted documents shared through computer networks and read
on computer interfaces”, not the poor current implementation of it,
which still uses the obsolete and despotic client–server model
(<http://thewebmustdie.com/>, <http://secushare.org/>).

> Yes, LaTeX does a lot to produce a beautifully typeset printout from
> an ASCII source. This is not enough; I want that same beautiful
> typesetting on screen, in browser, in any page width I happen to have,
> in my favorite typeface and font size, without having to recompile the
> document. And at the same time, it does too much. It has to maintain,
> and document authors have to utilize, a multitude of workarounds that
> are caused by TeX not using Unicode internally.

Having something technically and typographically good like LaTeX,
semantic and interpreted like HTML, and language-neutral like Markdown
or any interpreted binary format would be great.

Garreau, Alexandre

Jun 16, 2014, 9:46:41 PM
to Emanuel Berg, help-gn...@gnu.org
On 2014-06-14 at 13:14, Emanuel Berg wrote:
> Yuri Khan <yuri....@gmail.com> writes:
> Yes, but when you program and write in English (like
> now), don't you use the US keyboard layout?

Using the US Dvorak keyboard layout is more efficient anyway, and easier
to learn than the horrible QWERTY.

> And with the US layout, ' (and so on) are easier to type than the
> chars you suggest.

The keyboard (layout) should adapt to you, not the opposite. You
shouldn’t be the slave of your keyboard layout.

And anyway, if you used the US Dvorak layout, you could just use the
Programmer Dvorak layout for programming, where “computer English”
symbols are really accessible, while keeping clean English symbols such
as “”‘’… more accessible in the US Dvorak layout.

Rusi

Jun 16, 2014, 10:12:09 PM
On Tuesday, June 17, 2014 7:12:11 AM UTC+5:30, Garreau, Alexandre wrote:
> On 2014-06-14 at 16:51, Yuri Khan wrote:
> > The GCC error messages in the en_US.utf8 locale, on the other hand, do
> > use curly quotes.

> Indeed, just because “computer English” is made for computers, not human
> beings, who prefer to have readable text, just like it was before
> typewriters.

> > OK, I do not suggest that Perl should drop its backtick operator or
> > that computer languages universally start using curly quotes for
> > character and string literals (although that would make many languages
> > more elegant by simplifying parsing). But how about we reserve all
> > these artificial characters for computer languages, one of which
> > English is not.

> Having more language neutral programming languages would be cool, even
> languages based on semantic interpretation of binary data that would
> move the complexity of syntactic representation of its content from data
> toward editor would be really more useful, clean, simple, egalitarian,
> etc.


Interesting thread that I missed…

As a noob member of the «enthusiastically embrace unicode» camp

Ironically, I was introduced to the possibility of using Unicode by
Gmail tantalizingly showing me an अ [Devanagari letter A]. Later on,
however, I’ve found Gmail too clever in how it transliterates e.g. a
into अ; Emacs is more predictable. So now I type into Emacs and paste
into Gmail if necessary.

So I'd like to express my thanks that Emacs is doing Unicode very well.

And now that programming languages — the original forte of Emacs —
are beginning to get out of ASCII-hell, here are two of my blog posts.

I started by writing
http://blog.languager.org/2014/04/unicoded-python.html
to express my wishlist (for Python) for getting out of the ASCII-prison
and into what you call a more 'neutral' frame¹

I discovered later that Haskell is already doing some of this:
http://blog.languager.org/2014/05/unicode-in-haskell-source.html
[And a good deal more]

And finally APL is making a resurgence: http://baruchel.hd.free.fr/apps/apl/

> > Otherwise, primarily, the material will be read by a human being, and
> > only secondarily in a computer program. I wish for a future where the
> > Web replaces the printed book, therefore, the Web must do all things
> > books do, and then some.

> I hope that by “the Web” you mean “the concept of the ensemble of linked
> interpreted documents to read shared by the medium of computer networks
> and read on computers interfaces”, not the poor current implementation
> of it, which is still using obsolete and despotic client–server model
> (<http://thewebmustdie.com/>, <http://secushare.org/>).

> > Yes, LaTeX does a lot to produce a beautifully typeset printout from
> > an ASCII source. This is not enough; I want that same beautiful
> > typesetting on screen, in browser, in any page width I happen to have,
> > in my favorite typeface and font size, without having to recompile the
> > document. And at the same time, it does too much. It has to maintain,
> > and document authors have to utilize, a multitude of workarounds that
> > are caused by TeX not using Unicode internally.

> Having something technically and typographically good like LaTeX,
> semantic and interpreted like HTML and language-neutral like
> markdown/any-binary-interpreted-format would be great.

Yes, it’s important that we start moving to XeTeX (LuaTeX), where I can
directly write α etc. rather than \alpha. Just multiply this one
character by the hundreds that occur in proofs and we should see why
the latter is clunky, ugly, unreadable and bug-spreading compared to
the former.

PS [Travelling for a few days so may not respond to responses]

¹ Dare I say 'universal'? As math is the only language
approaching universality known to humanity.

Garreau, Alexandre

Jun 16, 2014, 10:21:07 PM
to Emanuel Berg, help-gn...@gnu.org
On 2014-06-14 at 18:13, Emanuel Berg wrote:
> Yuri Khan <yuri....@gmail.com> writes:
>>>> Curly quotes (and, in Russian print tradition, double angle quotes)
>>>> are what I am used to seeing in print and consider to be the
>>>> correct way to write
>>> OK, I believe you. However, the point I made with all people coming
>>> from different cultures is that it doesn't matter where we are from
>>> individually. When I went to school, I suppose I was most
>>> comfortable with Swedish. But I'm not supposing we all switch to
>>> Swedish!
>>
>> OK, so what? I expect that people of all cultures who were exposed to
>> books printed before the advent of the computer and the word
>> processor are used to typographic characters.
>
> I'm OK disagreeing but I want you to understand me. The point is: the
> cultures are in this discussion irrelevant. If the cultures were what
> decided things you should be speaking Russian and I Swedish.

As we still do, most of the time.

> We don't, because we have travelled to a common point so that when we
> interact in the computer world, we are using the “Computer English”
> language, which I have described several times now.

No, we travelled to a common point where colonialism and oppression
imposed English as a poor (for that purpose) international language
that we all spend decades of more or less hard work to learn poorly
(that’s less visible when we write, but it’s really visible when we
speak). Whereas we could instead turn to constructed languages that
take only weeks or months to speak perfectly (and anyway, it has been
demonstrated that it takes less time to learn Esperanto and *then*
English than to learn English alone).

> This is the English in the man pages, in the RFCs,

Oh, good point. But let’s agree to disagree on this standard of
standards.

> in the C code,

Let’s bet on how much longer C will stay around… before we move to
something more powerful (some interesting ideas:
<https://www.gnu.org/software/epsilon>) that we could make more
neutral, or even where we could keep the syntactic representation of
content separate from the content itself (like MVC) and move complexity
from the compiler toward the editor (leaving the compiler to do things
like JIT, native-code caching or on-the-fly optimization).

> in the HTML, and all that. In this language you don't write <mitten>
> if you are Swedish, <centre> if you are British, etc. *all* write
> <center>, otherwise it doesn't work!

Because HTML is not language-neutral. But if you think HTML (and more
generally XML, and even more generally things based on XML like XMPP) is
well made and really efficient, you have some problems.

> Likewise, to quote in Usenet post we use >, to double quote, >>, and
> so on; to mark where the signature starts we use --, because otherwise
> highlighting/hiding of the quotes/signature doesn't work, because the
> clients are looking for those specific chars!

Yes, standards. But standards don’t have to be language-bound. TCP/IP,
for instance, *is* a worldwide standard and *is* language-neutral
(since it’s binary, for performance and simplicity reasons).

> In “Computer English”, the de facto standard is ' and ", and it
> doesn't matter what books anyone read as kids.

Yes it matters, because here we speak English, not a programming
language that’s based on English.

> Because we are not doing that *now*! All of us have moved to a common
> culture which is common for practical reasons

For causes, but not for reasons. Otherwise we would be speaking a Lojban
with a more logical alphabet and base 12.

> — it is not aesthetics or snobbism, it is reality —

It is tradition. But when tradition stays reality for too long, we have
a problem.

> and there is no reason whatsoever to fight it.

Efficiency, readability, etc.: all these things help to increase our
everyday freedom.

>>> OK, that's a ridiculous example as it is extreme, while what we
>>> discuss now is perhaps trivial (' or ) — but in principle it is the
>>> same. The computer language is English, and as I showed — the man
>>> pages for ls and emacs, as well as the RFC excerpt, as well as all
>>> experience with mails and Usenet and programming culture — all show
>>> that in “Computer English”, ' (not ) is correct.
>>
>> They are that way because they were written in the dark age of ten
>> thousand code pages and never updated to Unicode.
>
> It doesn't matter. That's the way it is. Like the sentence I just
> wrote. I don't care why the English word for “way” is “way”.

Just as people don’t care about what an operating system, a CLI, etc. are.

> It just is,

Yeah, it is magical. You consider language the way most people consider
computers. Except language is really a more general and important thing
than just “computing”, because the notion of “language” includes many
concepts of “computing”.

> and it is very, very unpractical and extremely arrogant for anyone to
> say, I don't like it to be “way”, for no reason whatsoever save for
> aesthetics (which isn't a consensus by the way) I like it to be “yaw”

Esperantists, and Lojbanists, and all people working on language are
doing that “arrogant” thing, and they proved it is a lot more practical
than what people do by default — that is to say: almost nothing.

> - and the argument for changing, is that there are (of course!)
> historical roots for the word “way” being “way” — if someone had
> thought about it really hard (and exactly like me, today) he or she
> would have decided the word for “way” should be “yaw” — it doesn't
> make any sense!

Indeed it doesn’t, and that’s a reason for changing. We do a lot of
impractical things every day, and changing, “progressing”, allows us to
gain more freedom.

>> They exist *because* there was a certain technical
>> limitation in the last fifty years or so. Since this
>> limitation has been removed, there is no reason for
>> them.
>
> They do not exist because there was a technical
> limitation fifty years ago. They exist, today, because
> they are useful, today!

No, they’re useless and impractical; they always were, and they always
will be.

>> I believe users of the VGA text console are
>> intelligent beings and respect their decision to
>> suffer.
>
> Forget it. I have Gnus configured to transparently
> replace your goofy chars with the correct ones.

Thanks for the idea, I’m going to do the opposite. How did you do it?
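
For what it’s worth, the opposite direction can be sketched like this.
Hypothetical code, not Emanuel’s actual setup: `gnus-article-prepare-hook`
is a real Gnus hook, but the function `my-article-curlify` is my
assumption:

```elisp
;; Hypothetical: replace straight apostrophes with curly ones in the
;; rendered article buffer only; the stored message is untouched.
(defun my-article-curlify ()
  (with-silent-modifications        ; the article buffer is read-only
    (save-excursion
      (goto-char (point-min))
      (while (search-forward "'" nil t)
        (replace-match "’" t t)))))
(add-hook 'gnus-article-prepare-hook #'my-article-curlify)
```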

>> therefore, the Web must do all things books do, and
>> then some.
>
> The web can already do that in principle but that
> doesn't mean books, papers, libraries, and so on will
> disappear. That's a horrible thought but luckily it
> won't happen.

Just as *calligraphy* didn’t disappear with the invention of the
printing press. But since you need *one lifetime* to calligraph a big
book (let’s say, some documentation), and since there are a *looooooot*
of more interesting things to do (like reading all sorts of the really
interesting things human beings write all across the globe), we just
*all* read printed books.

For the same reasons, we will *almost* stop printing books (though, as
with calligraphy, some will continue for the sake of the art, and for
“snobbism and aesthetics”, as you like to say) as soon as printers stop
being obsessed with money, publishers with proprietary coercion, and
computer makers with non-pluggable OLED screens (planned obsolescence
and profit optimization) and eInk patents.

>> If I have to read a printed document, every straight
>> quote, every hyphen used in place of a dash, every
>> uneven space, pulls me out of the flow. The only way
>> for me to stop thinking about the characters is if
>> they are exactly as in a book typeset by a skilled
>> typesetter on a pre-computer-era press.
>
> Yes, this is only snobbism and aesthetics for the sake of it.

All this is just designed for readability: to read better, to read
quicker, to read more. That’s pragmatism.

>>> when you program and write in English (like now),
>>> don't you use the US keyboard layout? That's what I
>>> do to get the brackets and the semicolon and all
>>> that with no fuss - it is not that I use the Swedish
>>> chars that much, anyway! (Which is again the whole
>>> point.) And with the US layout, ' (and so on) are
>>> easier to type than the chars you suggest.
>>
>> The difference between ' and AltGr+' is almost
>> negligible for me.
>
> We don't have to "almost" that: ' is one key, AltGr+'
> is two.

But you press AltGr with the thumb, and the thumb is made to move
without disturbing the rest of the hand (you know, to *grasp* objects,
that thing monkeys and primates can do and other mammals can’t), so
when you use your thumb for a modifier it is biomechanically equivalent
to pressing one key, not two. That’s the reason why more modifiers
should be near the thumb.

>> I do understand we have engaged in a holy war not
>> directly related to the original posters
>> problem. Lets agree to disagree.
>
> The OP had a problem because he used the incorrect
> chars. While the spellchecker still should cope, I
> still haven't heard one argument that makes sense why
> anyone should benefit from those goofy chars.

Because they make text more readable and understandable. You can
disagree and refuse to see the importance of details; that is your
right. But it is our right to have our software work well for the rest
of us, the way we want.

Garreau, Alexandre

Jun 16, 2014, 10:33:48 PM
to Rusi, help-gn...@gnu.org
On 2014-06-17 at 04:12, Rusi wrote:
> On Tuesday, June 17, 2014 7:12:11 AM UTC+5:30, Garreau, Alexandre wrote:
>> On 2014-06-14 at 16:51, Yuri Khan wrote:
>>> Yes, LaTeX does a lot to produce a beautifully typeset printout from
>>> an ASCII source. This is not enough; I want that same beautiful
>>> typesetting on screen, in browser, in any page width I happen to have,
>>> in my favorite typeface and font size, without having to recompile the
>>> document. And at the same time, it does too much. It has to maintain,
>>> and document authors have to utilize, a multitude of workarounds that
>>> are caused by TeX not using Unicode internally.
>
>> Having something technically and typographically good like LaTeX,
>> semantic and interpreted like HTML and language-neutral like
>> markdown/any-binary-interpreted-format would be great.
>
> Yes its important that we start moving to XeteX (luatex) where I can
> directly write α etc than \alpha.

I know XeTeX, but I wasn’t thinking of it… And yet LaTeX is not fully
language-neutral, because of its command names (\emph, \textbf, \title,
\section, etc.), and it isn’t interpreted, and it’s not reaaaaally
semantic (since it’s only made to be compiled into a graphical thing).

> ¹ Dare I say “universal”? As math is the only language approaching
> universality known to humanity.

Well, nothing is really universal (everything needs shared knowledge,
thus a culture). Even math, when it isn’t based on the Latin or Greek
languages, stays based on Western/Arabic/Indo-European culture and
symbols.

But we can artificially make universal things, just as we more or less
did with Lojban, or TCP/IP, etc.

So what we can do is just invent new pieces of culture based on the
most universal things we can find, avoiding
linguistic/geographic/gender/class cultural biases.

Rusi

Jun 16, 2014, 10:41:37 PM
On Tuesday, June 17, 2014 7:51:07 AM UTC+5:30, Garreau, Alexandre wrote:
> On 2014-06-14 at 18:13, Emanuel Berg wrote:
> > in the C code,

> Let's bet how much time again C will stay around... before we move to
> something more powerful (some interesting ideas:
> <https://www.gnu.org/software/epsilon>), that we could make more

Epsilon looks interesting but too theoretical, so it probably ends up
making the opposite case to the one you want to make, Alexandre.

Agda and Julia are two recent languages with strongly increasing popularity
that would make your case better.

Agda is the <<type-hackery lab>> of haskell
Julia aims to combine the ease of use of scripting languages with the
efficiency of C/Fortran, specifically for modern parallel hardware.

Agda http://wiki.portal.chalmers.se/agda/pmwiki.php?n=Docs.UnicodeInput
Julia http://iaindunning.com/2014/julia-unicode.html

Rusi

Jun 16, 2014, 11:05:15 PM
Some immediate evidence that Unicode handling is not exactly stable yet…

I wrote:
As a noob member of the «enthusiastically embrace unicode» camp

The LEFT-POINTING DOUBLE ANGLE QUOTATION MARK (guillemet) probably
stayed because of this:

Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable


However, when I wrote the guillemets here, GG messed them up:

Agda is the <<type-hackery lab>> of haskell

It didn't stay, I guess, because of:

Content-Type: text/plain; charset=ISO-8859-1

So evidently Google Groups is doing exactly the opposite of what it
should be doing. It's trying very hard NOT to use UTF-8.
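
The byte-level difference is easy to demonstrate from Emacs (a quick
sketch, not from the thread). Note that guillemets do exist in
ISO-8859-1 as well, so even that charset did not force the
<<-substitution:

```elisp
;; Guillemets encoded in both charsets; Latin-1 has them too (0xAB/0xBB).
(encode-coding-string "«»" 'utf-8)       ; ⇒ "\302\253\302\273"
(encode-coding-string "«»" 'iso-8859-1)  ; ⇒ "\253\273"
```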

[And now to force UTF-8 in this message, let me sign in Devanagari:

रुसि मोदि — Rusi Mody
]