Case-insensitivity considered harmful

John Mashey

Sep 26, 2006, 1:52:15 AM

Bill Todd wrote:

>Out of curiosity, does anyone know of a good reason why file names
>should *ever* be case-sensitive (aside from the fact that Unix users and
>applications have become used to this)?

Of course, I had nothing to do with the original decision, but in 1977,
I wrote a letter-to-editor of SIGPLAN NOTICES replying to a complaint
about this, primarily for programming languages, but the same rationale
applies to file names.

1) At the time, the following had been confirmed by numerous studies:
a) For legibility of text, mixed case > lower-case only > upper-case
only.
b) Mixed case is about 10-20% better than upper-case only.

In the usual fonts, this happens because:
a) lower case letters differ visually more than do UPPERCASE LETTERS,
and so are more quickly recognized - some are small, some are tall,
some have descenders, whereas UPPERCASE IS ALL THE SAME SIZE AND THERE
ARE NO DESCENDERS.
b) MixedCase is even better than either monocase, because there is yet
more visual variety, without having to write mixed_case or MIXED_CASE.
Obviously, there is plenty of room for bad taste, but reasonable style
standards have often taken advantage of the flexibility.

2) Also, at the time, there were millions of lines of code of languages
like PL/I, BAL, FORTRAN, COBOL, which originated as upper-case only,
but were managed by people in lower or mixed case (on PWB/UNIX
systems). Why? Because it was more readable, even though the code was
translated to upper-case on the way over to an attached IBM mainframe.

3) I claimed that on modern systems that naturally supported both upper
and lower case, compilers should treat them as the different
characters they were (internal representation). I wrote:

"This permits a wide choice of conventions that must forever remain
unavailable if the language itself requires such symbols to be
identical."

Of course, case-insensitivity also requires extra cycles to do
case-insensitive string comparisons. As Dennis notes, it isn't a lot
of code, but it costs cycles.

4) So, when you start with a language (like C), where UPPER and upper
are distinct
- and where everybody was doing lots of text-processing [where they are
also distinct]
- and where commands often had both lowercase and uppercase flags
- and where it is easy enough to translate anything to monocase when
one wanted to
- and where a line or two of shell script could give a list of
identifiers differing only in case

It is quite consistent that filenames be case-sensitive as well.
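
As an illustration of the last bullet above, here is a minimal sketch of that check, written in C rather than shell to match the other code in this thread. The function names and the O(n^2) pairwise scan are assumptions made for the example, and it only understands ASCII case.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if a and b are equal once ASCII case is ignored. */
int same_ignoring_case(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb) return 0;
        if (ca == '\0') return 1;
    }
}

/* Count pairs of identifiers that are spelled differently but become
   equal when case is ignored. Quadratic, which is fine for a sketch. */
size_t count_case_only_pairs(const char *const ids[], size_t n)
{
    size_t i, j, pairs = 0;
    assert(ids != NULL || n == 0);
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (strcmp(ids[i], ids[j]) != 0 &&
                same_ignoring_case(ids[i], ids[j]))
                pairs++;
    return pairs;
}
```

A result of zero means the identifier list would survive a move to a case-insensitive system unchanged.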

Terje Mathisen

Sep 26, 2006, 3:32:22 AM

John Mashey wrote:
> "This permits a wide choice of conventions that must forever remain
> unavailable if the language itself requires such symbols to be
> identical."

I disagree:

Writing Pascal, I used the MixedCase convention for almost all my code,
and the compiler would handle it if I accidentally mis-cased an
identifier somewhere.

I.e. programming languages that insist that two long variable names,
differing only in the case of a single letter, have to be distinct, invite
a big class of potential errors, particularly when said language also
allows undeclared variables.

Yes, I'm specifically targeting Perl here!

>
> Of course, case-insensitivity also requires extra cycles to do
> case-insensitive string comparisons. As Dennis notes, it isn't a lot
> of code, but it costs cycles.

Using Unicode, it is actually quite a bit of code, to the point where
the only fast way to do case-insensitive compares/searches for a set of
(more or less fixed) identifiers, is to store each identifier in both
UPPER and lower case, then compare a potential candidate against both
simultaneously:

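#include <stdbool.h>
#include <wchar.h>

/* Returns true if candidate equals the identifier, ignoring case.
   UPPER and lower hold the same identifier in each case. */
int match(const wchar_t *candidate, const wchar_t *UPPER, const wchar_t *lower)
{
    size_t i;
    wchar_t c;
    for (i = 0; ; i++) { /* Do forever! */
        c = candidate[i];
        /* Fail as soon as a character matches neither stored form. */
        if ((c != UPPER[i]) && (c != lower[i])) return false;
        /* The terminator matched too, so the whole string matched. */
        if (c == L'\0') return true;
    }
}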

The code above disregards the possibility that the UPPER[] and lower[]
target strings can have different lengths!
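
A minimal sketch of how the two stored forms might be built, assuming the C library's one-to-one `towupper`/`towlower` mappings; `make_case_pair` is a name made up for this example. Because each character maps to exactly one character, both targets end up the same length, which sidesteps the caveat above but also means that full Unicode case mappings which change length (sharp s upper-casing to SS, for instance) are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Fill upper[] and lower[] with the per-character case mappings of id.
   Each buffer must hold at least wcslen(id) + 1 wide characters. */
void make_case_pair(const wchar_t *id, wchar_t *upper, wchar_t *lower)
{
    size_t i;
    assert(id != NULL && upper != NULL && lower != NULL);
    for (i = 0; id[i] != L'\0'; i++) {
        upper[i] = (wchar_t)towupper((wint_t)id[i]);
        lower[i] = (wchar_t)towlower((wint_t)id[i]);
    }
    upper[i] = L'\0';
    lower[i] = L'\0';
}

/* Compare candidate against both stored forms in a single pass. */
bool match(const wchar_t *candidate, const wchar_t *upper, const wchar_t *lower)
{
    size_t i;
    for (i = 0; ; i++) {
        wchar_t c = candidate[i];
        if (c != upper[i] && c != lower[i]) return false;
        if (c == L'\0') return true;
    }
}
```

The pair is computed once per identifier, so the per-lookup cost is two compares per character and no case conversion at all.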

Terje
--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Jan Vorbrüggen

Sep 26, 2006, 3:52:04 AM

> 4) So, when you start with a language (like C), where UPPER and upper
> are distinct
> - and where everybody was doing lots of text-processing [where they are
> also distinct]
> - and where commands often had both lowercase and uppercase flags
> - and where it is easy enough to translate anything to monocase when
> one wanted to
> - and where a line or two of shell script could give a list of
> identifiers differing only in case
>
> It is quite consistent that filenames be case-sensitive as well.

I must disagree.

Your basic premise, supported by empirical data, is that the _display_ of
textual data should be in mixed case. I strongly agree with that - having
been brought up in a language that uses upper-case initials for nouns
(German), I do see the advantage that brings in ease-of-reading (even if
it's a drag learning the orthography).

However, in no way does it follow from this that a difference in case should
also convey a difference in semantics. _That_ is what's at the heart of the
issue.

> "This permits a wide choice of conventions that must forever remain
> unavailable if the language itself requires such symbols to be
> identical."

In my considered opinion, all such conventions are spawn of the devil and
massively decrease the maintainability of such code.

> - and where commands often had both lowercase and uppercase flags

As I wrote elsewhere, one might make an exception for such cases, although
having been brought up on VMS instead of Unix, I'm quite the opposite of a
fan of single-letter flags.

> - and where a line or two of shell script could give a list of
> identifiers differing only in case

That is just not good enough. The really bad things happen because there is
no automated interlock or warning.

There is a fascinating experiment you can do with a computer display. YoU
sHOw A PErSOn a TEXt THaT iS DIsPlaYEd IN a RaNDoM jUMbLE of UppER AnD lOwER
CaSe. While the person is reading the text, a device is watching out for
saccades (the movements of the eyes when you go from one chunk of text to
the next). Once in a while, the display of all letters is changed with regard
to case _during_a_saccade_. The result is...nothing - the reader does not
notice a thing, although _every_single_letter_ on the display changed, and
this is very perceptible to any onlooker where the change was not synchronized
with her eye movement. Thus, you actually discard memory of a letter's case
at the next saccade!

No, I don't think you will be able to convince me that a difference in case
should convey a difference in meaning.

Jan

Nick Maclaren

Sep 26, 2006, 4:06:58 AM

In article <1159249935.0...@i42g2000cwa.googlegroups.com>,

"John Mashey" <old_sys...@yahoo.com> writes:
|> Bill Todd wrote:
|>
|> >Out of curiosity, does anyone know of a good reason why file names
|> >should *ever* be case-sensitive (aside from the fact that Unix users and
|> >applications have become used to this)?
|>
|> Of course, I had nothing to do with the original decision, but in 1977,
|> I wrote a letter-to-editor of SIGPLAN NOTICES replying to a complaint
|> about this, primarily for programming languages, but the same rationale
|> applies to file names.

Well, I am afraid that it is mistaken. There is a lot of truth in it,
but it omits the converse reasons. As Oscar Wilde said, the truth is
rarely pure and never simple.

|> 1) At the time, the following had been confirmed by numerous studies:
|> a) For legibility of text, mixed case > lower-case only > upper-case
|> only.

The order of the latter two depends almost entirely on the font. There
are lots where it is very hard to distinguish many lower-case letters.
nmm is frequently misread as nnm.

And lower-case in many scripts is a nightmare - perhaps the extreme is
old German black letter, but many English ones are pretty bad. That is
one of the main reasons that people use capitals when writing information
that must be conveyed precisely. Yes, people DO write file names, URLs,
passwords etc. for other people - and this is a BIG problem.

|> b) Mixed case is about 10-20% better than upper-case only.

Well, yes and no. Once you include digits (you ARE allowing them, aren't
you?), mixed case maximises the number of ambiguities. Most computer
fonts use a non-traditional '0' for that reason, but 'Il1' is a very
common similarity that is VERY hard to spot.

|> In the usual fonts, this happens because:
|> a) lower case letters differ visually more than do UPPERCASE LETTERS,
|> and so are more quickly recognized - some are small, some are tall,
|> some have descenders, whereas UPPERCASE IS ALL THE SAME SIZE AND THERE
|> ARE NO DESCENDERS.

Again, entirely font dependent.

|> b) MixedCase is even better than either monocase, because there is yet
|> more visual variety, without having to write mixed_case or MIXED_CASE.
|> Obviously, there is plenty of room for bad taste, but reasonable style
|> standards have often taken advantage of the flexibility.

Yes, it maximises the dispersion, but it also maximises the number of
serious ambiguities. 'Il1' and 'O0o' are perhaps the worst, but 'ce'
and 'Cc', 'Xx' and 'Zz' in the middle of digit strings (hence no scale)
are also tricky.

|> 2) Also, at the time, there were millions of lines of code of languages
|> like PL/I, BAL, FORTRAN, COBOL, which originated as upper-case only,
|> but were managed by people in lower or mixed case (on PWB/UNIX
|> systems). Why? Because it was more readable, even though the code was
|> translated to upper-case on the way over to an attached IBM mainframe.

Because the Unix people typed their programs while holding a hamburger
in one hand and so didn't like the shift key? :-)

More seriously, that predated Unix by a long way, and the vastly most
common convention was commentary in mixed case and code in upper. That
was generally (but not universally) agreed to lead to the least confusion
among Autocode/Fortran/Algol 60 etc. programmers on systems that supported
both cases. There was a period when upper case was favoured for keywords
and lower for identifiers, and that also works well - but experience is
that using mixed case for syntactic clarification is a mixed blessing.

Stu Feldman also explained to me why he had made the original f2c take
lower-case only, and it was nothing to do with its desirability.

|> 3) I claimed that on modern systems that naturally supported both upper
|> and lower case, compilers should treat them as the different
|> characters they were (internal representation). I wrote:
|>
|> "This permits a wide choice of conventions that must forever remain
|> unavailable if the language itself requires such symbols to be
|> identical."

Hmm. I recommend getting hold of the SEAS / SHARE Europe White Papers
on this issue. Adding case is good enough for USA and modern UK English,
but doesn't really help for even the other modern Latin-based languages.
And it gets nowhere for ones that are not Latin-based. Case is not as
simple a concept as you and most of our compatriots assume.

|> Of course, case-insensitivity also requires extra cycles to do
|> case-insensitive string comparisons. As Dennis notes, it isn't a lot
|> of code, but it costs cycles.

ALL correct Unicode comparisons have that cost, even if monocase.
Seriously.

|> 4) So, when you start with a language (like C), where UPPER and upper
|> are distinct ...
|>
|> It is quite consistent that filenames be case-sensitive as well.

True. And so is the converse.

Regards,
Nick Maclaren.

Tim Bradshaw

Sep 26, 2006, 4:36:53 AM

On 2006-09-26 06:52:15 +0100, "John Mashey" <old_sys...@yahoo.com> said:

> 3) I claimed that on modern systems that naturally supported both upper
> and lower case, compilers should treat them as the different
> characters they were (internal representation). I wrote:
>
> "This permits a wide choice of conventions that must forever remain
> unavailable if the language itself requires such symbols to be
> identical."

The fact that they are different characters is really an artifact of
the representation system. If you'd asked someone in 1900 whether
"fish" and "Fish" were the same word what would they have answered?
Imagine going to the fishmonger and asking for some Fish: "oh no sir,
we only sell fish here, you need to go to the Fishmonger down the
road". Indeed you can write things like: "Please stop hogging the
phone ... I said PLEASE stop hogging the PHONE!" where here "phone" and
"PHONE" clearly refer to the same thing, you're just saying one more
loudly.

Of course, you could (rightly) argue that programming languages (and
file names) are inherently written things, so it's reasonable to make
case distinctions matter. Against that is millions of years of
evolution of the human brain which is optimised for understanding
spoken language where case matters not at all.

I also suspect that the `mixed-case is more readable' claim (you probably
mean readable rather than legible by the way, there's a subtle but important
distinction, at least in typographical circles) is a bit of a myth.
For instance, in written English there has been a fairly dramatic move
over the last 200 years away from using lots of mixed-case to really
rather minimal use of capitals: if you look at things written in the
mid 19th century they often have really a lot of capitalisation (but
there was a lot of variability as well). Has that shift reduced
readability? I suspect not. Written German is, I think, moving the
same way (from a standard where all nouns are capitalised).

There probably is an effect of mixed case (I'm sure there is), but I
suspect it's far lower than other effects: spacing, for one. In
"yellowbrickroad", "yellowBrickRoad", "yellow brick road" I think the
difference between the third and the other two dwarfs the difference
between the first two. Of course computer languages make using spaces
hard, but perhaps "yellow-brick-road" is OK. Unfortunately almost all
computer languages make even that hard, so you might have to resort to
"yellow_brick_road" which takes me about twice as long to type on a
normal keyboard, and doesn't read so well because I'm used to
hyphenated words in written English.

But again: computer languages aren't English, and studies based on
English probably do not apply very well: people have high-performance
parsers for English & can read perhaps a thousand words a minute:
reading ten words a minute of a programming language would probably be
rather good going.

None of this is meant to imply that I'm against case-sensitivity, just
that I think that the arguments are a lot more subtle than they often
seem. (I think, actually, I'm in favour of case sensitivity, but I
hate the studlyCaps style it seems to have led to.)

--tim

Bill Todd

Sep 26, 2006, 5:19:29 AM

Terje Mathisen wrote:
> John Mashey wrote:

...

>> Of course, case-insensitivity also requires extra cycles to do
>> case-insensitive string comparisons. As Dennis notes, it isn't a lot
>> of code, but it costs cycles.
>
> Using Unicode, it is actually quite a bit of code, to the point where
> the only fast way to do case-insensitive compares/searches for a set of
> (more or less fixed) identifiers, is to store each identifier in both
> UPPER and lower case,

IIRC Unicode has at least one additional case as well, for at least some
characters.

> then compare a potential candidate against both
> simultaneously:
>
> #include <stdbool.h>
> #include <wchar.h>
>
> int match(const wchar_t *candidate, const wchar_t *UPPER, const wchar_t *lower)
> {
>     size_t i;
>     wchar_t c;
>     for (i = 0; ; i++) { /* Do forever! */
>         c = candidate[i];
>         if ((c != UPPER[i]) && (c != lower[i])) return false;
>         if (c == L'\0') return true;
>     }
> }
>
> The code above disregards the possibility that the UPPER[] and lower[]
> target strings can have different lengths!

When you're trying to match an input string against a large array of
candidates (as in scanning a directory or b+ tree page sequentially), it
often makes more sense to convert all strings to a common case and then
just perform a byte-wise comparison. IIRC Unicode specifically provides
for such conversions (including normalization of multiple characters in
a single case that stand for the same thing and instances when there may
be no other-case representation for a character).

If you want case-preserving behavior, you then add one or two bits per
stored character to indicate what its presentation case should be (or
store the original string redundantly if space is not a problem).
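
A rough sketch of that scheme, assuming ASCII-only names and one case bit per character. The struct layout, the 64-character limit, and the function names are all invented for the example; a real file system would use proper Unicode case folding and normalization instead of `tolower`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NAME_MAX_LEN 64  /* hypothetical limit for this sketch */

struct stored_name {
    char key[NAME_MAX_LEN + 1];                     /* folded form used for lookups */
    unsigned char upper_bits[(NAME_MAX_LEN + 7) / 8]; /* 1 bit per char: was upper case */
    size_t len;
};

/* Fold name into s. Returns 0 on success, -1 if the name is too long. */
int store_name(struct stored_name *s, const char *name)
{
    size_t i, n;
    assert(s != NULL && name != NULL);
    n = strlen(name);
    if (n > NAME_MAX_LEN) return -1;
    memset(s, 0, sizeof *s);   /* also NUL-terminates key */
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)name[i];
        if (isupper(c))
            s->upper_bits[i / 8] |= (unsigned char)(1u << (i % 8));
        s->key[i] = (char)tolower(c);
    }
    s->len = n;
    return 0;
}

/* Rebuild the original spelling for display; out needs len + 1 bytes. */
void display_name(const struct stored_name *s, char *out)
{
    size_t i;
    for (i = 0; i < s->len; i++) {
        int up = (s->upper_bits[i / 8] >> (i % 8)) & 1;
        out[i] = up ? (char)toupper((unsigned char)s->key[i]) : s->key[i];
    }
    out[s->len] = '\0';
}
```

A lookup folds the probe string once and then compares it against each stored `key` with a plain `strcmp` or `memcmp`, so the scan itself never has to think about case.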

- bill

Greg Lindahl

Sep 26, 2006, 5:28:45 AM

In article <7l1nu3...@osl016lin.hda.hydro.com>,
Terje Mathisen <terje.m...@hda.hydro.com> wrote:

>I.e. programming languages that insist that two long variable names,
>differing only in the case of a single letter, have to be distinct, invite
>a big class of potential errors, particularly when said language also
>allows undeclared variables.
>
>Yes, I'm specifically targeting Perl here!

use strict;

If you choose not to use suspenders, don't be surprised when your
pants fall down.

The reverse complaint is made about classic Fortran, which also has
its own version of suspenders.

-- greg

Andrew Reilly

Sep 26, 2006, 5:36:26 AM

On Tue, 26 Sep 2006 09:32:22 +0200, Terje Mathisen wrote:

> Yes, I'm specifically targeting Perl here!

I've discovered a similar problem since I started doing prototypes in
Python. Not case issues, usually: I'm pretty strict with myself on symbol
case styles, but different spellings of contractions (how many "f"s in
"coef"?) or plurals. When variables and even object members can pop into
existence by being set, then that becomes an issue... I dare say that
tools (of the predictive, or pointy-clicky variety) will eventually make
this less of a problem, in the same way that file browsers do for file
names.

Cheers,

--
Andrew

Terje Mathisen

Sep 26, 2006, 6:08:23 AM

Greg Lindahl wrote:
> In article <7l1nu3...@osl016lin.hda.hydro.com>,
> Terje Mathisen <terje.m...@hda.hydro.com> wrote:
>
>> I.e. programming languages that insist that two long variable names,
>> differing only in the case of a single letter, have to be distinct, invite
>> a big class of potential errors, particularly when said language also
>> allows undeclared variables.
>>
>> Yes, I'm specifically targeting Perl here!
>
> use strict;

Oh, I do!

Now. :-(

I also run with -w.

>
> If you choose not to use suspenders, don't be surprised when your
> pants fall down.

Sure. It is still a trap that will hurt each new user until she learns
to avoid it.

> The reverse complaint is made about classic Fortran, which also has
> its own version of suspenders.

IMPLICIT LOGICAL afair?

PH

Sep 26, 2006, 8:02:01 AM

Jan Vorbrüggen wrote:
> Your basic premise, supported by empirical data, is that the _display_ of
> textual data should be in mixed case. I strongly agree with that - having
> been brought up in a language that uses upper-case initials for nouns
> (German), I do see the advantage that brings in ease-of-reading (even if
> it's a drag learning the orthography).

Having enjoyed four years of German language education, I still fail
to see how noun capitalization improves readability.

Peter

FredK

Sep 26, 2006, 8:54:20 AM

"Terje Mathisen" <terje.m...@hda.hydro.com> wrote in message
news:7l1nu3...@osl016lin.hda.hydro.com...

>
> I.e. programming languages that insist that two long variable names,
> differing only in the case of a single letter, have to be distinct, invite
> a big class of potential errors, particularly when said language also
> allows undeclared variables.
>

When porting a widely used industry standard graphics test suite to VMS
(which is generally speaking case-insensitive) I discovered that someone had
created an external routine "Delete()" and (of course) nothing had
prototypes. Want to know how many times the code called the *wrong* routine
with the *wrong* arguments? More than once. The code tended to be in error
paths.

I only found it because the VMS linker complained that there were multiple
definitions of the routine (since it was seeing the names case
insensitive)... and when I looked at the places it was used...

Elcaro Nosille

Sep 26, 2006, 3:07:25 PM

Nothing real but another discussion about IT-geeks-compulsivites ...

Peter Grandi

Sep 26, 2006, 5:40:49 PM

Tim> Indeed you can write things like: "Please stop hogging the
Tim> phone ... I said PLEASE stop hogging the PHONE!" where here
Tim> "phone" and "PHONE" clearly refer to the same thing, you're
Tim> just saying one more loudly.

Tim> Of course, you could (rightly) argue that programming
Tim> languages (and file names) are inherently written things,
Tim> so it's reasonable to make case distinctions matter.

That's the argument for COBOL: it reads like English, so it must
be easy to use :-).

Tim> But again: computer languages aren't English, and studies
Tim> based on English probably do not apply very well:

Computer languages anyhow are notations, just a bit more verbose
than mathematical or musical notations. Calling them "languages"
is a bit misleading.

Tim> (I think, actually, I'm in favour of case sensitivity, but I
Tim> hate the studlyCaps style it seems to have led to.)

Well, that or underscores (or dashes in Lisp) for part-separator.
The single advantage of the CMU convention of using case switching
is simply brevity; I got used to it, and it reads as well as
underscores or better.

Jan> Your basic premise, supported by empirical data, is that
Jan> the _display_ of textual data should be in mixed case. I
Jan> strongly agree with that - having been brought up in a
Jan> language that uses upper-case initials for nouns (German),
Jan> I do see the advantage that brings in ease-of-reading (even
Jan> if it's a drag learning the orthography).

ph> Having enjoyed four years of German language education, I
ph> still fail to see how noun capitalization improves
ph> readability.

Because German overdoes it. It generally helps to have easily
identified ''parts of speech'', and in sensible languages
capitalization distinguishes proper names from
generic names, as in "I asked John for the bill, and in the
meantime Bill went to the john". That is, capitalization is a form
of semantic markup, as in:

Mark Name's name is "Mark".
The identifier bound to variable 'x' is "x".
There is a certain ''je ne sais quoi'' in semantics.

In less sensible languages regrettably capitalization is a tone,
and indicates respect or importance, which is not so useful in
computer science, but perhaps in sycophantic speech.

Since there are a few ''parts of speech'' worth distinguishing to
avoid ambiguities, case is well worth using...

ken...@cix.compulink.co.uk

Sep 26, 2006, 8:36:38 PM

In article <7l1nu3...@osl016lin.hda.hydro.com>,
terje.m...@hda.hydro.com (Terje Mathisen) wrote:

> only in the case of a single letter, have to be distinct, invite
> a big class of potential errors, particularly when said
> language also allows undeclared variables.

What was also interesting was languages or systems that had no
limit on variable name length but only considered the first "n"
characters as significant.
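
To make the trap concrete, a minimal sketch of how such a translator effectively compares names. The function name and the example identifiers are made up; the value of "n" varied from system to system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if a translator that looks only at the first `significant`
   characters would treat the two names as the same symbol. */
int names_collide(const char *a, const char *b, size_t significant)
{
    assert(a != NULL && b != NULL);
    return strncmp(a, b, significant) == 0;
}
```

With eight significant characters, `buffer_read` and `buffer_reset` both reduce to `buffer_r` and silently become one symbol; with ten they stay distinct.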

Ken Young

jsa...@ecn.ab.ca

Sep 27, 2006, 1:00:30 AM

Jan Vorbrüggen wrote:

>No, I don't think you will be able to convince me that a difference in case
>should convey a difference in meaning.

How about this situation:

A language allows Greek, Cyrillic, and Latin characters in variable
names, and the terminal equipment available allows them to easily be
entered.

If the variable names are case-insensitive, then a and A are the same
letter. Unfortunately, an upper-case Greek alpha looks exactly like a
Latin capital A... but a lowercase alpha looks completely different.

It would be quite possible to write a malicious program where someone
not familiar with foreign alphabets might not be able to tell that two
variables are really the *same* variable.

To avoid that, I would think one would have to do this:

Have the rule be that the variable names are *case-sensitive* but
*language-insensitive*, so that only letters that look virtually
identical are equivalent.

Capital A and capital alpha are equivalent, but small a and small alpha
are different from them and from each other.

Neither capital upsilon nor the capital Russian letter that stands for
U and looks like Y would be equivalent to capital Y in the Latin
alphabet - but the small Russian letter and lowercase y would be
equivalent.

And so on.
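
A minimal sketch of what such a rule could look like, assuming identifiers are held as zero-terminated arrays of Unicode code points. The three-entry lookalike table is purely illustrative and covers only the letters mentioned above; the type and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct lookalike { uint32_t from, to; };

/* Map letters that look the same onto one representative.
   Case is deliberately NOT folded. */
static const struct lookalike table[] = {
    { 0x0391, 0x0041 },  /* GREEK CAPITAL LETTER ALPHA -> LATIN CAPITAL A */
    { 0x0410, 0x0041 },  /* CYRILLIC CAPITAL LETTER A  -> LATIN CAPITAL A */
    { 0x0443, 0x0079 },  /* CYRILLIC SMALL LETTER U    -> LATIN SMALL y   */
};

static uint32_t canonical(uint32_t cp)
{
    size_t i;
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].from == cp) return table[i].to;
    return cp;
}

/* Nonzero if the two identifiers are equal after lookalike mapping. */
int same_identifier(const uint32_t *a, const uint32_t *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        if (canonical(*a) != canonical(*b)) return 0;
        if (*a == 0) return 1;
    }
}
```

Under this table Latin A and Greek capital alpha name the same variable, while Latin a (U+0061) and Greek small alpha (U+03B1) remain distinct from them and from each other.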

John Savard

Anton Ertl

Sep 27, 2006, 2:36:59 AM

jsa...@ecn.ab.ca writes:
> Unfortunately, an upper-case Greek alpha looks exactly like a
>Latin capital A... but a lowercase alpha looks completely different.

Actually, the way lower-case alphas are usually written in Greece, you
could use them instead of a Latin a, and most people would not notice
that it is an alpha and not a Latin a; it has the same form as
the Latin a has in italic fonts (except that it is not slanted). You
can see this nicely in

http://www.complang.tuwien.ac.at/anton/tmp/xanthi.jpg

My guess is that we only see a different style of writing alphas in
our mathematics books in order to avoid confusion between alphas and
as.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

Ketil Malde

Sep 27, 2006, 4:22:38 AM

Jan Vorbrüggen <jvorbr...@not-mediasec.de> writes:

> However, in no way does it follow from this that a difference in case should
> also convey a difference in semantics. _That_ is what's at the heart of the
> issue.

Oh, come on, this is a cultural issue - de gustibus, and all that.

> No, I don't think you will be able to convince me that a difference in case
> should convey a difference in meaning.

Difference in case does convey a difference in meaning in many contexts
- John is not to be confused with the john, the Big Apple is not a big
apple, and so on. Some programming languages require different case
for different kinds of entities (types and variables, for instance).
Somebody brought up physical and chemical units, we could also mention
the metric prefixes. And of course, there's Unix.

I think it boils down to "worse is better": storing file names as a
bunch of bytes is simple to implement, and unambiguous on the
programming level. Being 'case-blind' means you need to make a lot of
decisions, including how to deal with locale. If you want to prevent
the user from creating files that could be confused, you can do it in
application space.

In a way, this is similar to compression - and it too belongs in
application space. The application is in a much better position to
make the right decision, and any bugs are likely to have a lot less
impact (you don't really want your file system to know how and where
to find images in power point files in order to compress them, do
you?)

Same goes for disk drive redundancy - more redundancy management in
the firmware increases its complexity (and the chance for bugs), and
slows it down. Putting it in the disk driver, fs, or volume manager,
it is easier to fix, and possible to avoid (what if I need the speed,
but don't care too much for data integrity?)

-k
--
If I haven't seen further, it is by standing in the footprints of giants

Ketil Malde

Sep 27, 2006, 4:27:52 AM

jsa...@ecn.ab.ca writes:

> Have the rule be that the variable names are *case-sensitive* but
> *language-insensitive*, so that only letters that look virtually
> identical are equivalent.

Right. Now we just need to decide which font the file system uses.

:-)

Rob Warnock

Sep 27, 2006, 4:32:28 AM

<ken...@cix.compulink.co.uk> wrote:
+---------------
| What was also interesting was languages or systems that had no
| limit on variable name length but only considered the first "n"
| characters as significant.
+---------------

The FOCAL language <http://en.wikipedia.org/wiki/Focal> definitely
wins the "prize"(?) here: Arbitrary command name length but *ONLY*
the *first* character was significant. So SET [as in "SET X = 12.34"]
could be replaced by S, or SAVE, or SEND, or SUBSTITUTE, or SELECT...

-Rob

-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

Torben Ægidius Mogensen

Sep 27, 2006, 4:40:07 AM

ken...@cix.compulink.co.uk writes:

> What was also interesting was languages or systems that had no
> limit on variable name length but only considered the first "n"
> characters as significant.

As early versions of C did. I remember having a function and an array
whose names coincided on the significant characters. The function
worked fine until I started writing values to the array, which caused
the code of the function to be overwritten.

This was on a PDP-11, before write-protection of the code area became
common.

Torben

Nick Maclaren

Sep 27, 2006, 4:58:42 AM

In article <pOudnTjWZLSBqofY...@speakeasy.net>,

rp...@rpw3.org (Rob Warnock) writes:
|>
|> The FOCAL language <http://en.wikipedia.org/wiki/Focal> definitely
|> wins the "prize"(?) here: Arbitrary command name length but *ONLY*
|> the *first* character was significant. So SET [as in "SET X = 12.34"]
|> could be replaced by S, or SAVE, or SEND, or SUBSTITUTE, or SELECT...

I used one like that (but it wasn't FOCAL), and got annoyed, so I used
mainly obscenities. I junked the program before running it :-)

This approach is another lunacy. Despite being generally accepted to
be a Bad Idea by the end of the 1960s, it kept being reinvented during
the 1970s and 1980s. I was involved in several designs where the
authors said that the users DEMAND abbreviations, but I couldn't
persuade them either that using the minimal unambiguous string was an
obstacle to later extensions or that, if extra characters were present,
they should be correct.
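
Both points can be shown with a small sketch of abbreviation lookup. The command tables and return codes here are hypothetical (SET is taken from the FOCAL example above, and SAVE stands in for a command added in a later release), and every character the user typed has to match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CMD_NONE      (-1)
#define CMD_AMBIGUOUS (-2)

/* Return the index of the command that `word` abbreviates, CMD_NONE if
   nothing matches, or CMD_AMBIGUOUS if several commands start with it. */
int lookup_command(const char *word, const char *const cmds[], size_t n)
{
    size_t i, len;
    int found = CMD_NONE, ambiguous = 0;
    assert(word != NULL && cmds != NULL);
    len = strlen(word);
    if (len == 0) return CMD_NONE;
    for (i = 0; i < n; i++) {
        if (strncmp(word, cmds[i], len) != 0)
            continue;              /* some typed character is wrong */
        if (cmds[i][len] == '\0')
            return (int)i;         /* exact match always wins */
        if (found != CMD_NONE)
            ambiguous = 1;         /* second command with this prefix */
        else
            found = (int)i;
    }
    return ambiguous ? CMD_AMBIGUOUS : found;
}
```

With the table {SET, TYPE, GOTO}, "S" resolves to SET and "SEND" is rejected because the extra characters are wrong. Add SAVE to the table and "S" becomes ambiguous, so every script that relied on the shortest abbreviation stops working.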

People really don't seem to like the idea of compilers checking for
programmer error :-(

Regards,
Nick Maclaren.

Stephen Fuld

Sep 27, 2006, 1:07:05 PM

"Ketil Malde" <ketil...@ii.uib.no> wrote in message
news:egr6xxw...@polarvier.ii.uib.no...

snip

> Same goes for disk drive redundancy - more redundancy management in
> the firmware increases its complexity (and the chance for bugs), and
> slows it down. Putting it in the disk driver, fs, or volume manager,
> it is easier to fix, and possible to avoid (what if I need the speed,
> but don't care too much for data integrity?)

There are arguments on both sides of this. Some of them are reminiscent of
the wheel of reincarnation in the graphics world. One of the big advantages
of putting say RAID function (I think that is what you are talking about) in
an external controller is that you can offload all of the work (extra I/O
requests, memory bandwidth from the extra data transfers and any needed XOR
operations) from the main CPUs and memory system. Also, since one can
usually manage to keep the memory in the controller non-volatile but the
main system memory is usually not, writes go much faster from the
application view when they are considered complete when the data is in the
controller's non-volatile memory rather than waiting for the disk to seek,
rotate etc.
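
For readers who have not seen it, the XOR work being offloaded is roughly the following. This is a minimal sketch of parity over equal-sized blocks, with a made-up function name and no claim about how any particular controller implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* XOR n equal-sized blocks of len bytes together into out.
   Over the data blocks this yields the parity block; over the parity
   block plus the surviving data blocks it yields the missing block. */
void xor_blocks(unsigned char *out, const unsigned char *const blocks[],
                size_t n, size_t len)
{
    size_t i, j;
    assert(out != NULL && blocks != NULL);
    memset(out, 0, len);
    for (i = 0; i < n; i++)
        for (j = 0; j < len; j++)
            out[j] ^= blocks[i][j];
}
```

Every write to a parity-protected stripe needs a pass like this over the affected blocks, which is the memory bandwidth and CPU time an external controller takes off the host.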

As for answering your last question, RAID 0 or striping seems to fit your
requirement.

--
- Stephen Fuld
e-mail address disguised to prevent spam

Peter Grandi

Sep 27, 2006, 1:33:23 PM

[ ... lots on case sensitivity ... ]

> Difference in case does convey a difference in meaning in many
> context - John is not to be confused with the john, the Big
> Apple is not a big apple, and so on. Some programming
> languages require different case for different kinds of
> entities (types and variables, for instance). Somebody brought
> up physical and chemical units, we could also mention the
> metric prefixes. And of course, there's Unix.

What is this "Unix" thing mentioned here? A unit of measure or
a metric prefix? If so, isn't it curious that it was named
similarly to UNIX, the operating system? :-)

http://WWW.UNIX.org/trademark.html

[ ... ]

Nick Maclaren

Sep 27, 2006, 1:39:52 PM

In article <yf3psdh...@base.gp.example.com>,

Please don't be silly. The name Unix was used long before The Open
Group was started, even including its predecessor organisations.
I am pretty sure that it was used outside AT&T before AT&T trademarked
it. It is unclear whether UNIX is a valid trademark at all and, even
if it is, its owners have no powers to stop people using it or any
derivative form of it for non-commercial purposes.

Regards,
Nick Maclaren.

Bill Todd

Sep 27, 2006, 2:07:50 PM

Stephen Fuld wrote:
> "Ketil Malde" <ketil...@ii.uib.no> wrote in message
> news:egr6xxw...@polarvier.ii.uib.no...
>
> snip
>
>> Same goes for disk drive redundancy - more redundancy management in
>> the firmware increases its complexity (and the chance for bugs), and
>> slows it down. Putting it in the disk driver, fs, or volume manager,
>> it is easier to fix, and possible to avoid (what if I need the speed,
>> but don't care too much for data integrity?)
>
> There are arguments on both sides of this. Some of them are reminiscent of
> the wheel of reincarnation in the graphics world. One of the big advantages
> of putting say RAID function (I think that is what you are talking about) in
> an external controller is that you can offload all of the work (extra I/O
> requests, memory bandwidth from the extra data transfers and any needed XOR
> operations) from the main CPUs and memory system.

Another advantage is that you can easily share the RAID device among
multiple concurrent accessors: while the current balance between
directly-attached storage cost and the added costs of such outboard
hardware plus the availability of cheap processing power and memory
bandwidth make me lean toward host-software-based RAID as being more
cost-effective (though you'll pay a RAID-4/5/6 performance cost if you
attempt to plug the infamous 'write hole' there), *if* you have actual
need to scale storage capacity and performance independently of
processing performance (and aren't able to use another level of
indirection, such that concurrently-shared servers can be used as
intermediaries) such shared-storage configurations are important.

Also, since one can
> usually manage to keep the memory in the controller non-volatile but the
> main system memory is usually not, writes go much faster from the
> application view when they are considered complete when the data is in the
> controller's non-volatile memory rather than waiting for the disk to seek,
> rotate etc.

That's a biggie for small-write-intensive environments - though you
really need a level of redundancy in the write-back cache at least equal
to that on the platters (e.g., either independent per-disk NV write-back
caches or mirrored NV write-back cache in the controller).

- bill

John L

Sep 27, 2006, 4:07:44 PM

>The FOCAL language <http://en.wikipedia.org/wiki/Focal> definitely
>wins the "prize"(?) here: Arbitrary command name length but *ONLY*
>the *first* character was significant. So SET [as in "SET X = 12.34"]
>could be replaced by S, or SAVE, or SEND, or SUBSTITUTE, or SELECT...

Considering that FOCAL had to fit the language interpreter, the
software floating point package, and your program into 4K 12-bit words
on a PDP-8, it's remarkable it was as good as it was. It was a cut
down version of JOSS and its clones such as PDP-6 AID which had a much
more reasonable syntax, and mixed case.

R's,
John

Alex Colvin

Sep 27, 2006, 6:08:11 PM

>The FOCAL language <http://en.wikipedia.org/wiki/Focal> definitely
>wins the "prize"(?) here: Arbitrary command name length but *ONLY*
>the *first* character was significant. So SET [as in "SET X = 12.34"]
>could be replaced by S, or SAVE, or SEND, or SUBSTITUTE, or SELECT...

I think it was the first 2 characters. I remember it took me quite a
while to realize! TELCOMP was stricter.

On the other hand, CDC Fortran allowed 8 letter names, but the linker only
used 7. I spent a while figuring out why my array was being executed.
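
The kind of clash described above, where a tool silently keeps fewer characters than the language allows, can be sketched like this. The symbol names are hypothetical; the lengths 8 and 7 are the ones quoted in the post.

```python
from collections import defaultdict


def truncation_collisions(names, significant):
    """Group names that become identical when cut to `significant` characters."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:significant]].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical 8-letter symbols: one array, one subroutine.
symbols = ["RESULTS1", "RESULTS2", "COMPUTE"]

seen_by_compiler = truncation_collisions(symbols, 8)   # no clash at 8 characters
seen_by_linker = truncation_collisions(symbols, 7)     # two names collapse to one
```

A check of this sort, run before linking, would turn the silent aliasing into a reported error.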

Leaving "." in your search path can sometimes act the same way.

--
mac the naïf

jsa...@ecn.ab.ca

Sep 27, 2006, 10:30:22 PM

Ketil Malde wrote:
> Right. Now we just need to decide which font the file system uses.

> :-)

No, you don't have to do that. All you have to do is determine which
fonts are used on the display terminals, the printing terminals, and
the line printers attached to the system.

Of course, I *do* agree that a rule such as I suggested for
case-dependence but language-independence, based on corresponding
letterforms in conventional fonts...

would be a *very bad* rule for a *file system*. No, here we need strict
case-dependence; a file name is a sequence of characters. If you can't
figure out which ones to type, fire up the GUI and click on the file
name from a list.

But for a *language processor*, while I don't think my idea would ever
be put into practice - multi-language keyboard setups are too awkward
to be worth bothering with - I don't think it is too unreasonable. It
actually *would* make code maintainable that passed through English,
Russian, and Greek hands...
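
A rough sketch of such a case-dependent but language-independent comparison, in Python. The look-alike table is deliberately tiny and is an assumption made for this example; a real rule would need a complete and carefully reviewed table.

```python
# Illustrative, incomplete table: Cyrillic and Greek letters that look like
# Latin ones are mapped to the Latin letter of the SAME case.
LOOKALIKES = {
    "\u0410": "A", "\u0391": "A",   # Cyrillic A, Greek Alpha
    "\u0412": "B", "\u0392": "B",   # Cyrillic Ve, Greek Beta
    "\u0415": "E", "\u0395": "E",   # Cyrillic Ie, Greek Epsilon
    "\u041E": "O", "\u039F": "O",   # Cyrillic O, Greek Omicron
    "\u0420": "P", "\u03A1": "P",   # Cyrillic Er, Greek Rho
    "\u0430": "a",                  # Cyrillic small a
    "\u0435": "e",                  # Cyrillic small ie
    "\u043E": "o", "\u03BF": "o",   # Cyrillic small o, Greek small omicron
    "\u0440": "p",                  # Cyrillic small er
}
_TABLE = str.maketrans(LOOKALIKES)


def fold(identifier):
    """Map look-alike letters to one representative; case is left untouched."""
    return identifier.translate(_TABLE)


latin = "Pea"
cyrillic = "\u0420\u0435\u0430"     # looks like "Pea", typed on a Russian layout

same_name = fold(latin) == fold(cyrillic)        # look-alikes are equivalent
still_case_sensitive = fold("pea") != fold("Pea")
```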

John Savard

NOTE to Google Groups users: If you put "Re: " the original thread
title in the subject (and copy anything you want to quote by hand) even
with the current problems in this group, the post will go in the right
spot in the right thread. Despite the message saying it wasn't posted,
it _will_ be posted successfully on the first try, so you don't need to
repost.

Terje Mathisen

Sep 28, 2006, 12:09:43 AM

John L wrote:
>> The FOCAL language <http://en.wikipedia.org/wiki/Focal> definitely
>> wins the "prize"(?) here: Arbitrary command name length but *ONLY*
>> the *first* character was significant. So SET [as in "SET X = 12.34"]
>> could be replaced by S, or SAVE, or SEND, or SUBSTITUTE, or SELECT...
>
> Considering that FOCAL had to fit the language interpreter, the
> software floating point package, and your program into 4K 12-bit words
> on a PDP-8, it's remarkable it was as good as it was. It was a cut

OTOH you have the counterexample of Turbo Pascal V1.0 to show that they
could have done significantly better:

This was about 6 x larger, at 35-37 K 8-bit bytes, but it did contain
quite a lot more:

A full Pascal compiler, with no real limits (actually 255 bytes) on
identifier lengths, nor any other caveats like that.

A full-screen editor.

A linker, loader & debugger.

The entire RTL.

Oh, BTW: It was written in a few months by a single Danish teenager.

In my book, this is one of the all-time greatest SW hacks.

Torben Ægidius Mogensen

Sep 28, 2006, 3:58:07 AM

Terje Mathisen <terje.m...@hda.hydro.com> writes:

> John L wrote:
>> Considering that FOCAL had to fit the language interpreter, the
>> software floating point package, and your program into 4K 12-bit words
>> on a PDP-8, it's remarkable it was as good as it was. It was a cut
>
> OTOH you have the counterexample of Turbo Pascal V1.0 to show that
> they could have done significantly better:
>
> This was about 6 x larger, at 35-37 K 8-bit bytes, but it did contain
> quite a lot more:
>
> A full Pascal compiler, with no real limits (actually 255 bytes) on
> identifier lengths, nor any other caveats like that.

It was not really a full Pascal compiler: It did not (IIRC) support
procedures as parameters, nor did it implement reference parameters
correctly, the point being that it used shallow binding which made it
impossible to pass a reference to a local variable to a recursive call
(since the same location would be reused in the new function
invocation).

> A full-screen editor.
>
> A linker, loader & debugger.
>
> The entire RTL.
>
> Oh, BTW: It was written in a few months by a single Danish teenager.

Anders Hejlsberg, now better known for C#. But it was not really
written in a few months. It started out as Blue Label Pascal for the
Nascom-2, then Compas Pascal for CP/M and MS-DOS, then Poly Pascal and
finally Turbo Pascal when it was bought by Borland. The first Blue
Label Pascal may have been written in a few months, but it didn't have
all the features of Turbo Pascal. Most other compilers produced
P-code, but Turbo Pascal generated native code, which made programs
run a lot faster. It was also itself quite fast at compiling.

> In my book, this is one of the all-time greatest SW hacks.

Yes, it was indeed very impressive, and I believe Turbo Pascal was one
of the main factors in making Pascal wide-spread outside academia.

Torben

Nick Maclaren

Sep 28, 2006, 5:29:18 AM

In article <efelmg$qfm$1...@xuxa.iecc.com>, jo...@iecc.com (John L) writes:
|>
|> Considering that FOCAL had to fit the language interpreter, the
|> software floating point package, and your program into 4K 12-bit words
|> on a PDP-8, it's remarkable it was as good as it was. It was a cut
|> down version of JOSS and its clones such as PDP-6 AID which had a much
|> more reasonable syntax, and mixed case.

Well, yes, but that is completely orthogonal to the choice of whether
to check in just the first letter (or two) or whether to reject words
with more than two. The amount of code needed is usually less for the
latter!
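
The two choices can be put side by side in a short Python sketch. The command table is an illustrative subset, not FOCAL's actual table, and the function names are invented for the example.

```python
# Illustrative subset of one-letter commands; not FOCAL's full table.
COMMANDS = {"S": "SET", "T": "TYPE", "A": "ASK"}


def lenient(line):
    """First letter decides; the rest of the word is skipped unchecked."""
    word, _, rest = line.partition(" ")
    return COMMANDS[word[0]], rest


def strict(line):
    """Accept only the exact short form and reject anything longer."""
    word, _, rest = line.partition(" ")
    if word not in COMMANDS:
        raise ValueError(f"unknown command {word!r}")
    return COMMANDS[word], rest


lenient_result = lenient("SUBSTITUTE X = 12.34")   # silently treated as SET
strict_result = strict("S X = 12.34")              # the only accepted spelling
```

In Python the two functions are about the same size. In a hand-coded interpreter the lenient form presumably also has to scan past the ignored letters to find the operands, which would be the extra code.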

I agree that using long keywords would have taken too much space.
Been there - been caught by that :-)

Regards,
Nick Maclaren.

Terje Mathisen

Sep 28, 2006, 5:25:26 AM

Torben Ægidius Mogensen wrote:
> Terje Mathisen <terje.m...@hda.hydro.com> writes:
>
>> John L wrote:
>>> Considering that FOCAL had to fit the language interpreter, the
>>> software floating point package, and your program into 4K 12-bit words
>>> on a PDP-8, it's remarkable it was as good as it was. It was a cut
>> OTOH you have the counterexample of Turbo Pascal V1.0 to show that
>> they could have done significantly better:
>>
>> This was about 6 x larger, at 35-37 K 8-bit bytes, but it did contain
>> quite a lot more:
>>
>> A full Pascal compiler, with no real limits (actually 255 bytes) on
>> identifier lengths, nor any other caveats like that.
>
> It was not really a full Pascal compiler: It did not (IIRC) support
> procedures as parameters, nor did it implement reference parameters

Afair, you could indeed pass a function as a parameter to a procedure,
but I might be mistaken.

I do remember using that functionality at a somewhat later date, but
that would have been a later version of the compiler as well.

> correctly, the point being that it used shallow binding which made it
> impossible to pass a reference to a local variable to a recursive call
> (since the same location would be reused in the new function
> invocation).

Huh?

I'm quite sure I used something like the following to implement
recursive directory scanning:

function scan(path: string; var total: integer; var files: integer): integer;
var
  t, f : integer;
begin
  t := 0; f := 0;
  ... for each file do ...
    t := t + size; f := f + 1;
  ... for each subdir do ...
    if (scan(path+'\'+dirname, t, f) <> 0) then begin
      scan := -1;
      exit;
    end;

  total := total + t; files := files + f;
  scan := 0;
end;

>> In my book, this is one of the all-time greatest SW hacks.
>
> Yes, it was indeed very impressive, and I believe Turbo Pascal was one
> of the main factors in making Pascal wide-spread outside academia.

And inside it: My alma mater used TP for all first-year programming
assignments for several years around 1982-1985. Before that, i.e. when I
started there, the same classes used a horrible Fortran-2 compiler on an
1100 machine.

4 hours turnaround time from submitting your stack of hand-punched cards
until you got the lineprinter output, with a syntax error on one of the
initial @@ command lines. :-(

This is the kind of environment which forces anyone interested in
computers to become a night owl. :-(

Ketil Malde

Sep 28, 2006, 6:50:30 AM

pg...@0610.exp.sabi.co.UK (Peter Grandi) writes:

> What is this "Unix" thing mentioned here? A unit of measure or
> a metric prefix? If so, isn't it curious that it was named
> similarly to UNIX, the operating system? :-)

Unix the family of operating systems, and in this context, rather the
cultural baggage that comes with it. Which is distinct from UNIX the
trademark, so there.

English is rather typical of natural languages in that it allows for
ambiguity. I remain unconvinced that the resolution of linguistic (or
graphical) ambiguities is a job for a file system.

Ketil Malde

Sep 28, 2006, 6:59:21 AM

"Stephen Fuld" <s.f...@PleaseRemove.att.net> writes:

> There are arguments on both sides of this.

I was thinking of in-drive redundancy, as discussed in the thread
starting with:
Message-ID: <u0pku3-...@osl016lin.hda.hydro.com>

Sorry if that wasn't clear from my post.

Jonathan Thornburg -- remove -animal to reply

Sep 28, 2006, 8:27:41 AM

Ketil Malde <ketil...@ii.uib.no> wrote:
> Difference in case does convey a difference in meaning in many context
> - John is not to be confused with the john, the Big Apple is not a big
> apple, and so on.

Another (in)famous example is that of "polish" (what you might do to
a dirty sink) vs "Polish" (refers to a country in North-Eastern Europe).

ciao,

--
-- "Jonathan Thornburg -- remove -animal to reply" <jth...@aei.mpg-zebra.de>
Max-Planck-Institut fuer Gravitationsphysik (Albert-Einstein-Institut),
Golm, Germany, "Old Europe" http://www.aei.mpg.de/~jthorn/home.html
"Washing one's hands of the conflict between the powerful and the
powerless means to side with the powerful, not to be neutral."
-- quote by Freire / poster by Oxfam

Torben Ægidius Mogensen

Sep 28, 2006, 10:39:35 AM

Terje Mathisen <terje.m...@hda.hydro.com> writes:

[example deleted]

That may also be a difference between versions. My memories may be
for Compas Pascal, the predecessor to Turbo Pascal.

>> I believe Turbo Pascal was one of the main factors in making Pascal
>> wide-spread outside academia.
>
> And inside it: My alma mater used TP for all first-year programming
> assignments for several years around 1982-1985. Before that, i.e. when
> I started there, the same classes used a horrible Fortran-2 compiler
> on an 1100 machine.
>
> 4 hours turnaround time from submitting your stack of hand-punched
> cards until you got the lineprinter output, with a syntax error on one
> of the initial @@ command lines. :-(
>
> This is the kind of environment which forces anyone interested in
> computers to become a night owl. :-(

I had a similar experience. When I started CS, we also used an 1100
with punched cards which were collected and run once every other hour
(so slightly better than your experience, I agree). We later bought
some "Piccolo"s (Z80-based PCs) from Regnecentralen and used Compas
Pascal on these. These were later replaced by Wintel PCs with Turbo
Pascal. So it may very well be true that the limitations I mentioned
were for Compas Pascal only.

Torben

Stephen Fuld

Sep 28, 2006, 12:28:43 PM

"Ketil Malde" <ketil...@ii.uib.no> wrote in message

news:egbqp0u...@polarvier.ii.uib.no...

>
> "Stephen Fuld" <s.f...@PleaseRemove.att.net> writes:
>
>> There are arguments on both sides of this.
>
> I was thinking of in-drive redundancy, as discussed in the thread
> starting with:
> Message-ID: <u0pku3-...@osl016lin.hda.hydro.com>
>
> Sorry if that wasn't clear from my post.

No problem. I also apologize for any confusion.

ken...@cix.compulink.co.uk

Sep 28, 2006, 1:45:15 PM

In article <81hsu3-...@osl016lin.hda.hydro.com>,
terje.m...@hda.hydro.com (Terje Mathisen) wrote:

> 4 hours turnaround time from submitting your stack of
> hand-punched cards until you got the lineprinter output, with a
> syntax error on one of the initial @@ command lines. :-(

You were lucky, we did not even get to punch our own cards and
turn round was about a week. That was of course basic not
fortran.

Ken Young

ken...@cix.compulink.co.uk

Sep 28, 2006, 1:45:16 PM

In article <efelmg$qfm$1...@xuxa.iecc.com>, jo...@iecc.com (John L)

wrote:

> Considering that FOCAL had to fit the language interpreter,

The point is that in the systems I used (Basic not Focal) the
limit on significant variable name use was not enforced. It's not
so much that only a few letters (one) were used it is that using
longer names than were significant was possible and not flagged
as an error by the system. Come to that with the Basic I used on
my first home computer the limits on variable names, forbidden
characters and variable name length were in the small print.

Ken Young

Jan Vorbrüggen

Sep 29, 2006, 7:19:10 AM

>>However, in no way does it follow from this that a difference in case should
>>also convey a difference in semantics. _That_ is what's at the heart of the
>>issue.
> Oh, come on, this is a cultural issue - de gustibus, and all that.

No, taste or culture has nothing to do with it. It is an ergonomics issue,
and has to do with making a device safe and easy to use.

> If you want to prevent the user from creating files that could be confused,
> you can do it in application space.

That doesn't work because there isn't a unified "application space", and such
basic rules need to be enforced properly. Thus, only the OS is able to do this
as a central agency.

> Same goes for disk drive redundancy - more redundancy management in
> the firmware increases its complexity (and the chance for bugs), and
> slows it down. Putting it in the disk driver, fs, or volume manager,
> it is easier to fix, and possible to avoid (what if I need the speed,
> but don't care too much for data integrity?)

That sort of attitude has got us Winwoes with all the security holes because
of backwards compatibility and "let's make sure the user experience is great!"
Bah pfui.

Jan

Richard

Sep 29, 2006, 12:32:30 PM

[Please do not mail me a copy of your followup]

Jan Vorbrüggen <jvorbr...@not-mediasec.de> spake the secret code
<4o4do1F...@individual.net> thusly:

>That sort of attitude has got us Winwoes with all the security holes because
>of backwards compatibility and "let's make sure the user experience is great!"
>Bah pfui.

Yeah, let's have a crappy user experience.
--
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
<http://www.xmission.com/~legalize/book/download/index.html>

John Mashey

Sep 29, 2006, 12:55:48 PM

John Mashey wrote:
> Bill Todd wrote:
>
> >Out of curiosity, does anyone know of a good reason why file names
> >should *ever* be case-sensitive (aside from the fact that Unix users and
> >applications have become used to this)?

> 4) So, when you start with a language (like C), where UPPER and upper
> are distinct
> - and where everybody was doing lots of text-processing [where they are
> also distinct]
> - and where commands often had both lowercase and uppercase flags
> - and where it is easy enough to translate anything to monocase when
> one wanted to
> - and where a line or two of shell script could give a list of
> identifiers differing only in case
>
> It is quite consistent that filenames be case-sensitive as well.
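
The "line or two of shell script" is not reproduced in the post. As a rough stand-in, the same check can be sketched in Python; the sample source text and the helper name `case_clashes` are invented for the example.

```python
import re
from collections import defaultdict


def case_clashes(source):
    """Return groups of identifiers that differ only in letter case."""
    groups = defaultdict(set)
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        groups[ident.lower()].add(ident)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Hypothetical fragment of C-like source text.
code = "int count; int Count; const x; CONST y; total = count + Count;"
clashes = case_clashes(code)
```

Here `clashes` holds the two groups `CONST`/`const` and `Count`/`count`. The scan treats keywords and identifiers alike, which is enough for a style check.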

Well, now that I'm posting again, one last comment before getting back to
work.

This straightforward answer to a question elicited a lot of comments,
of which some were of the form "but my taste is...", and some of which
were weird analogies, and some of which were wrong.
So, let me summarize:

1) Some historical facts
a) ASCII in its 1967 revision (the original 1963 version lacked
lowercase), like EBCDIC, had standard encodings of a-zA-Z.
It did *not* have standard representations of:

- font families, sizes, Bold, Italic, etc, etc.
- colors

b) In many environments of the 1960s, monocase, specifically uppercase,
was what most people could use in practice, given:
- keypunches and punch cards
- TTY33 terminals, used widely
- line printers [in the classic IBM 1403, the normal print chain was
only uppercase; you could get a lowercase print chain mounted, but you
only did this if needed, as it was slower, since there were fewer copies
of each letter, back when this mattered.]
- early CRTs often supported uppercase only, or lowercase was an extra
cost option, as most were not general bitmapped displays.

c) By the early 1970s, we at least had Selectric-typewriter-derived
terminals (2741), daisy-wheel terminals, and CRTs, dot-matrix printers,
and (a little later) a few of us got typesetters. Some of us used APL
typeballs.

d) However, keyboards didn't change a lot (and haven't generally),
remaining akin to the standard Selectric, with, of course, minor
geographical variations for national-language character sets.

As far as I know, on such keyboards:
- it is fastest to type lowercase
- UPPERCASE SEQUENCES, POSSIBLY_WITH_UNDERSCORES are not too bad
- CamelCase, either UpperCamelCase, or lowerCamelCase, are not too bad,
as they only need occasional shifts.
- Human typing of RAndoM CAPitalIzaTIOn is painful, and people simply
don't do that. Computers may well generate such things for passwords,
and occasionally filenames, but humans don't, even on systems that
allow that.

Likewise, normal keyboards don't have "Bold-shift" keys or "red-shift"
keys.

2) In programming languages (1960s)
- Some were UPPERCASE and always shown that way.
- Some (Algol in particular) wrestled with system limitations by
adopting 3 levels of language:
Reference Language, Publication Language, Hardware Representations.
See http://www.masswerk.at/algol60/report.htm if you're not familiar
with this approach: we don't do this any more, thank goodness.

3) There is rarely anything new, but one observes that two of the
strongest influences on modern software came from the early 1970s:

- Bell Labs UNIX & C

- Xerox PARC software, including Smalltalk

CamelCase and its variants are often attributed to the latter.

In C, of course, lowercase is dominant, with all reserved words being
lowercase, but with UPPER_CASE usually used for preprocessor items, but
with choices up to the taste of the developers. There was a time when
linker limitations (short length of identifiers) influenced practices
in C at least (i.e., wish to make C code portable across UNIX systems,
IBM S/370, GECOS, etc, using vendor-supplied linkers). Although the
earliest UNIX shell variables were only lowercase, UPPERCASE
ENVIRONMENT VARIABLES became the stylistic standard, somewhat akin to
C.

Of course, it is actually quite common to find C code where people use
UPPERCASE names identical to lowercase reserved words, i.e., for things
like CONST, VOLATILE, etc.

A useful summary is in:
http://merd.sourceforge.net/pixel/language-study/syntax-across-languages/Vrs.html

From that:
Case-sensitive languages include:
"Awk, B, C, C#, C++, Haskell, Java, JavaScript, Lua, Maple, Matlab,
merd, Modula-3, OCaml, Perl, Perl6, Pike, Pliant, Prolog, Python, Ruby,
sh, Smalltalk, Tcl, XML, YAML

Case-insensitive:
Ada, Assembler, Classic REXX, Common Lisp, Eiffel, Forth, HTML, Logo,
Pascal, PL/I, Rebol, Scheme, SGML, SQL92, Visual Basic"

(Some important omissions include Postscript in first group, and
FORTRAN & COBOL in second).

While there are some exceptions (notably HTML), older languages tend to
be case-insensitive, newer languages case-sensitive, i.e., people have
voted with their feet...

4) People have different tastes, even when using the same facilities.
The question, when designing a system, is:
a) Do you have a very clear idea that you *know* the right way that
everyone should use it, and you will make restrictions to make sure
that that's the only way to use it
OR
b) Do you allow the straightforward capabilities of terminals and encodings
to be used, set good stylistic examples, and assume that if someone
sensible does something different, they either have a good reason to do
it, or if they are idiots [and for example, type lots of filenames with
random capitalization], they'll get what they deserve.

[I know which one I prefer, but perhaps that's taste. ]

I've probably seen many thousands (maybe millions) of UNIX filenames,
and almost all of them were lowercase.
I've occasionally seen projects where some set of commands was
UPPERCASE (akin to the CONST, etc. usage in C), and occasionally
people would even have UPPERCASE commands that were purposefully chosen
to be identical to standard lowercase ones, but where different
defaults or environment variables were set. I wouldn't be surprised if
CamelCase fans had related filenames.

5) I'd argued (in 1977) for lowercase/mixed case for legibility,
knowing that readability is the real end goal, because while it's
easy to be legible but unreadable, it's hard to be illegible and still
readable. Even a legible font family can be used to write unreadable
code.

If someone wants to claim that people shouldn't be allowed to write
obfuscated code by writing hoRRIBLY_MiXed_CaSe and hOrribly_MIXed_CaSe
together, that's an opinion, but I observe that people don't do that,
or they don't last long as programmers.

However, I wasn't relying on natural-language comparisons in general,
because computer languages (including file-naming) are not natural
languages.

Natural languages typically started with speech, but most computer
languages are not optimized to be spoken, just as mathematical notation
is rarely spoken.

a) They should be reasonable to enter.
b) They must be even better to be read, since typical code is read far
more often than written.

c) But spoken? The fact that people, when speaking, don't distinguish
between "Fish" and "fish" is irrelevant here, because people don't
speak code very often, and rarely speak long filenames.

When was the last time you spoke:
if ( a < b) { a++; b++;}

i.e.,
if left-parenthesis a less-than b right-parenthesis left-curly a
plus-plus semicolon b plus-plus semicolon right curly

Other than automated readers for sight-impaired folks, people just
don't do that much.

A typical English reading rate is about 300 WPM. If a programmer were
lucky to read 10 WPM of code ... they wouldn't be a successful
programmer very long. Any given human can only process so many tokens,
and for code, you want to do what you can to avoid hindering that rate.

6) Everyone is entitled to their own tastes, some of which are
rational, and some of which are more from habit and existing experience
than anything else. On the other hand, this whole discussion got
started as a straightforward answer to a simple question, whose answer
was basically that:

- given mixed-case keyboards, I/O devices, and ASCII, it was easy and
consistent to support case-sensitivity

- this preserved flexibility

But of course, if people wanted to be foolish, that allowed them to be.

It's just like unmoderated newsgroups: people are free to post what
they like, even if it's silly.

Terje Mathisen

Sep 30, 2006, 4:32:27 AM

John Mashey wrote:
> When was the last time you spoke:
> if ( a < b) { a++; b++;}
>
> i.e.,
> if left-parenthesis a less-than b right-parenthesis left-curly a
> plus-plus semicolon b plus-plus semicolon right curly

You (or at least I :-) don't pronounce code that way, but rather like this:

"If a (is) less than b (then) increment a and b", with the (is) and
(then) being optional.

I.e. I speak code in more or less the way I'd explain it, in a sort of
pseudocode.

Tarjei T. Jensen

Sep 30, 2006, 7:32:05 AM

Terje Mathisen wrote:
> Afair, you could indeed pass a function as a parameter to a procedure, but
> I might be mistaken.

There were several versions of Turbo Pascal. From version 4 or 5 it looked
quite like UCSD-Pascal.

I believe that the OO stuff was lifted from Object Pascal. Not entirely sure
about that.

greetings,

jsa...@ecn.ab.ca

Sep 30, 2006, 9:14:15 AM

Ketil Malde wrote:
> jsa...@ecn.ab.ca writes:

> > Have the rule be that the variable names are *case-sensitive* but
> > *language-insensitive*, so that only letters that look virtually
> > identical are equivalent.

Jan Vorbrüggen

Oct 2, 2006, 3:36:32 AM

I agree with most of your points, John...but disagree with some.

> As far as I know, on such keyboards:
> - it is fastest to type lowercase
> - UPPERCASE SEQUENCES, POSSIBLY_WITH_UNDERSCORES are not too bad
> - CamelCase, either UpperCamelCase, or lowerCamelCase, are not too bad,
> as they only need occasional shifts.
> -Human typing of RAndoM CAPitalIzaTIOn is painful, and people simply
> don't do that. Computers may well generate such things for passwords,
> and occasionally filenames, but humans don't, even on systems that
> allow that.

Agreed. I described that experiment with random case only to show that
your (the collective) cognitive system discards case information very
soon in the processing chain. Thus, if case carries semantics - instead
of being used as a visual segmentation aid, as in CamelCase - those
semantics have to be extracted at a very early stage, which won't happen
easily, if at all, without "looking back" - which will decrease readability.

> I've probably seen many thousands (maybe millions) of UNIX filenames,
> and almost all of them were lowercase.

Really? I have seen too many cases along the lines of makefile, Makefile
and MAKEFILE meaning different things to believe that. It still gets on
my nerves that on one Unix system using one flavour of mail UA, I need to
look in .mail for mail files, in the other in .Maildir - just entering
".mai" and asking for filename expansion won't do for both. I don't need
to reserve synapses just to memorize that sort of useless distinctions,
thank you.
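
The completion complaint can be made concrete with a small Python sketch. The directory listings are hypothetical, and `complete` is an invented helper, not any shell's actual completion routine.

```python
def complete(prefix, names, ignore_case=False):
    """Return the names that the typed prefix could expand to."""
    if ignore_case:
        prefix = prefix.lower()
        return [n for n in names if n.lower().startswith(prefix)]
    return [n for n in names if n.startswith(prefix)]


# Hypothetical home directories on the two systems.
system_a = [".mail", ".profile"]
system_b = [".Maildir", ".profile"]

exact_a = complete(".mai", system_a)
exact_b = complete(".mai", system_b)                      # finds nothing
folded_b = complete(".mai", system_b, ignore_case=True)   # finds .Maildir
```

Note that the folding here lives in the completion layer while the stored names stay case-sensitive, so the sketch does not settle whether the file system itself should fold case.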

> I've occasionally seen projects where some set of commands was
> UPPERCASE, and (akin to the CONST, etc usage in C), and occasionally,
> people would even have UPPERCASE commands that were purposefully chosen
> to be identical to standard lowercase ones, but where different
> defaults or environment variables were set.

I have seen those, and I have yet to see a case where using such a scheme
created advantages that are more important than the disadvantages caused
by confusion et al. in the long run. People just aren't good at remembering
such subtle distinctions.

> If someone wants to claim that people shouldn't be allowed to write
> obfuscated code by writing hoRRIBLY_MiXed_CaSe and hOrribly_MIXed_CaSe
> together, that's an opinion, but I observe that people don't do that,
> or they don't last long as programmers.

Oh but they do - and even where they don't, other people are then saddled
with maintaining such code; which of course will not be documented at all,
and in particular the strange choices in case will need to be re-engineered
from context by the maintainer. Thank you, but no thanks - BTDT.

> a) They should be reasonable to enter.
> b) They must be even better to be read, since typical code is read far
> more often than written.

We strongly agree on that.

> - given mixed-case keyboards, I/O devices, and ASCII, it was easy and
> consistent to support case-sensitivity
>
> - this preserved flexibility
>
> But of course, if people wanted to be foolish, that allowed them to be.

That latter premise I don't agree with. In many cases of wide-spread use,
people are _not_ allowed to be foolish: In most (all?) of Europe, you need
to get your car checked for continued safety (e.g., the braking system) at
reasonable intervals (from 6 months to 3 years in Germany, depending on age
and type of use) by a trusted third party. If you don't, your car will be
pulled from circulation until you have it checked, and you will be fined.
Programming is already ubiquitous enough that we don't have to make the tools
of our trade unnecessarily dangerous to all involved.

Jan

Torben Ægidius Mogensen

unread,

Oct 2, 2006, 4:55:56 AM10/2/06

to

"John Mashey" <old_sys...@yahoo.com> writes:

> 2) In programming languages (1960s)
> - Some were UPPERCASE and always shown that way.
> - Some (Algol in particular, ) wrestled with system limitations by
> adopting 3 levels of language:
> Reference Language, Publication Language, Hardware Representations.
> See http://www.masswerk.at/algol60/report.htm if you're not familiar
> with this approach: we don't do this any more, thank goodness.

Have you seen Fortress
(http://research.sun.com/projects/plrg/JapanLecture2006public.pdf)?

> While there are some exceptions (notably HTML), older languages tend to
> be case-insensitive, newer languages case-sensitive,

I think the main reason so many older languages are case-insensitive
is that they started out as uppercase-only and when lower case became
common, it was decided to allow keywords in both cases.

> Natural languages typically started with speech, but most computer
> languages are not optimized to be spoken, just as mathematical notation
> is rarely spoken.
>
> a) They should be reasonable to enter.
> b) They must be even better to be read, since typical code is read far
> more often than written.

A very good point. I think b should take precedence over a (to a
degree), exactly for that reason.

> c) But spoken? The fact that people, when speaking, don't distinguish
> between "Fish" and "fish" is irrelevant here, because people don't
> speak code very often, and rarely speak long filenames.
>
> When was the last time you spoke:
> if ( a < b) { a++; b++;}
>
> i.e.,
> if left-parenthesis a less-than b right-parenthesis left-curly a
> plus-plus semicolon b plus-plus semicolon right curly

When I do read something like the above, I read the semantics, not the
text, i.e., "If a less than b, then increment a and b".

> A typical English reading rate is about 300 WPM. If a programmer were
> lucky to read 10 WPM of code ... they wouldn't be a successful
> programmer very long.

Reading is the easy part. Understanding may take longer, especially
if the code is using obscure library routines or complex algorithms.

> 6) Everyone is entitled to their own tastes, some of which are
> rational, and some of which are more from habit and existing experience
> than anything else.

If the only people who will ever read the code are the programmers
themselves, I agree. But there is a limit to how much individuality
is acceptable in an environment where other people are expected to
work with your code. For example, I wouldn't accept variable names
in, say, Welsh unless the code was intended for Welsh-speakers only.

Torben

Nick Maclaren

unread,

Oct 2, 2006, 6:48:59 AM10/2/06

to

In article <1159548948.6...@m73g2000cwd.googlegroups.com>,

It wasn't the modern version, either. There were three "national
characters", which varied, and control characters were definitely NOT
data characters (yes, I mean TAB!)

|> b) In many environments of the 1960s, monocase, specifically uppercase,
|> was what most people could use in practice, given:
|> - keypunches and punch cards
|> - TTY33 terminals, used widely
|> - line printers [in the classic IBM 1403, the normal print chain was
|> only uppercase; you could get a lowercase print chain mounted, but you
|> only did this if needed, as it was slower, since there were less copies
|> of each letter, back when this mattered.]
|> - early CRTs often supported uppercase only, or lowercase was an extra
|> cost option, as most were not general bitmapped displays.

Er, no, not in the UK. Paper tape devices were derived from electric
typewriters, and usually had both cases - and they date from the dawn
of modern computing and dominated here up to the 1960s. Probably for
related reasons, early CRTs in the UK often had both cases.

|> As far as I know, on such keyboards:
|> - it is fastest to type lowercase
|> - UPPERCASE SEQUENCES, POSSIBLY_WITH_UNDERSCORES are not too bad
|> - CamelCase, either UpperCamelCase, or lowerCamelCase, are not too bad,
|> as they only need occasional shifts.
|> -Human typing of RAndoM CAPitalIzaTIOn is painful, and people simply
|> don't do that. Computers may well generate such things for passwords,
|> and occasionally filenames, but humans don't, even on systems that
|> allow that.

Generally true, even for paper tape, as shifting was often mechanical.

|> Likewise, normal keyboards don't have "Bold-shift" keys or "red-shift"
|> keys.

Red-shift was rare, but not all that rare on paper tape devices, and
was used by at least one programming language.

|> 2) In programming languages (1960s)
|> - Some were UPPERCASE and always shown that way.
|> - Some (Algol in particular, ) wrestled with system limitations by
|> adopting 3 levels of language:
|> Reference Language, Publication Language, Hardware Representations.
|> See http://www.masswerk.at/algol60/report.htm if you're not familiar
|> with this approach: we don't do this any more, thank goodness.

Indeed. Even given the constraints of the time, it was a Bad Idea.

I never saw CamelCase in early BCPL programs, but it may have been
used by some people. Upper case for code and lower case for strings
and comments was common.

|> While there are some exceptions (notably HTML), older languages tend to
|> be case-insensitive, newer languages case-sensitive, i.e., people have
|> voted with their feet...

No, not at all. You should distinguish monocase from case-insensitive.
Before Fortran 90, Fortran was monocase (and the same was true for
most languages before about 1980).

And, no, they have NOT "voted with their feet"! Most programmers have
not been given the chance to choose. The reason for the change is that
CS students were taught case-sensitive languages (often C), have
merely copied what they were familiar with, and it is they who have
designed languages and written compilers.

|> 4) People have different tastes, even when using the same facilities.

True.

|> The question is, when designing a system is:
|> a) Do you have a very clear idea that you *know* the right way that
|> everyone should use it, and you will make restrictions to make sure
|> that that's the only way to use it
|> OR
|> b) Do allow the straightforward capabilities of terminals and encodings
|> to be used, set good stylistic examples, and assume that if someone
|> sensible does something different, they either have a good reason to do
|> it, or if they are idiots [and for example, type lots of filenames with
|> random capitalization], they'll get what they deserve.
|>
|> [I know which one I prefer, but perhaps that's taste. ]

That may be the question but, phrased like that, it is propaganda.
I am agnostic on this issue, but let's phrase it the other way:

The question is, when designing a system is:

a) Do you attempt to make its interface fail-safe, so that it detects
many of the most common human errors, and diagnoses them so the user
has a chance to fix the bug before it does the wrong thing?
OR
b) Do you allow anything that allows the compiler to proceed and
say that, if users make mistakes, fails to notice them and the code
does the wrong thing, they are idiots and get what they deserve?

|> I've probably seen many thousands (maybe millions) of UNIX filenames,
|> and almost all of them were lowercase.

Perhaps because Unix users find it difficult to use the shift key
while holding a hamburger in one hand :-)

|> If someone wants to claim that people shouldn't be allowed to write
|> obfuscated code by writing hoRRIBLY_MiXed_CaSe and hOrribly_MIXed_CaSe
|> together, that's an opinion, but I observe that people don't do that,
|> or they don't last long as programmers.

Unfortunately, user-interface designers tend to design interfaces for
themselves, and not for their customers - who are often not the same
type of programmer, even when they are programmers.

Regards,
Nick Maclaren.

Jan Vorbrüggen

unread,

Oct 2, 2006, 6:56:47 AM10/2/06

to

> The question is, when designing a system is:
> a) Do you attempt to make its interface fail-safe, so that it detects
> many of the most common human errors, and diagnoses them so the user
> has a chance to fix the bug before it does the wrong thing?
> OR
> b) Do you allow anything that allows the compiler to proceed and
> say that, if users make mistakes, fails to notice them and the code
> does the wrong thing, they are idiots and get what they deserve?

Thanks for summarizing this so nicely, Nick - this is precisely the point
I was trying to make.

Jan

dg...@barnowl.research.intel-research.net

unread,

Oct 2, 2006, 12:12:31 PM10/2/06

to

nm...@cus.cam.ac.uk (Nick Maclaren) writes:

> In article <1159548948.6...@m73g2000cwd.googlegroups.com>,
> "John Mashey" <old_sys...@yahoo.com> writes:
> |> The question is, when designing a system is:
> |> a) Do you have a very clear idea that you *know* the right way that
> |> everyone should use it, and you will make restrictions to make sure
> |> that that's the only way to use it
> |> OR
> |> b) Do allow the straightforward capabilities of terminals and encodings
> |> to be used, set good stylistic examples, and assume that if someone
> |> sensible does something different, they either have a good reason to do
> |> it, or if they are idiots [and for example, type lots of filenames with
> |> random capitalization], they'll get what they deserve.
> |>
> |> [I know which one I prefer, but perhaps that's taste. ]
>
> That may be the question but, phrased like that, it is propaganda.
> I am agnostic on this issue, but let's phrase it the other way:
>
> The question is, when designing a system is:
> a) Do you attempt to make its interface fail-safe, so that it detects
> many of the most common human errors, and diagnoses them so the user
> has a chance to fix the bug before it does the wrong thing?
> OR
> b) Do you allow anything that allows the compiler to proceed and
> say that, if users make mistakes, fails to notice them and the code
> does the wrong thing, they are idiots and get what they deserve?

I think this second phrasing is just about as agnostic as the first,
which makes me doubt your agnosticism. What's more, when I create
apple.txt to describe my favourite apple recipes, and Apple.txt to
describe some of my investments in the stock market, the two categories
above get reversed (the case insensitive file system happily does the
wrong thing, and the idiots get what they deserve - and no, you don't
get to claim that a good UI will fix this, or that users should've
expected those files to be the same).
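The apple.txt / Apple.txt scenario above can be sketched as a toy model. This is only an illustration, assuming a case-preserving, case-insensitive directory that folds names on lookup; the class `CaseInsensitiveDir` and its methods are hypothetical and do not model any real file system.

```python
class CaseInsensitiveDir:
    """Toy case-preserving, case-insensitive directory (hypothetical)."""

    def __init__(self):
        # folded name -> (name as first stored, contents)
        self._entries = {}

    def write(self, name, contents):
        key = name.casefold()
        # Keep the spelling used when the entry was first created.
        stored_name = self._entries.get(key, (name, None))[0]
        self._entries[key] = (stored_name, contents)

    def read(self, name):
        return self._entries[name.casefold()][1]

    def listing(self):
        return sorted(stored for stored, _ in self._entries.values())


# Case-insensitive: the second write silently replaces the first file.
folded = CaseInsensitiveDir()
folded.write("apple.txt", "recipes")
folded.write("Apple.txt", "investments")

# Case-sensitive: a plain dict keeps the two names apart.
exact = {}
exact["apple.txt"] = "recipes"
exact["Apple.txt"] = "investments"

print(folded.listing(), folded.read("apple.txt"))
print(sorted(exact))
```

In the folded model only one entry survives and the recipes are gone; in the exact model both files exist, at the price of two names that differ only in case.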

--
David Gay
dg...@acm.org

Nick Maclaren

unread,

Oct 2, 2006, 12:21:40 PM10/2/06

to

In article <79mz8ek...@barnowl.research.intel-research.net>,

|> which makes me doubt your agnosticism. ...

Your observation is entirely correct, but you have misunderstood.
Yes, it is propaganda, and what the expression "let's phrase it the
other way" meant was "let's phrase it with a similar bias towards the
other side."

And that is what I did :-)

No, I wouldn't phrase it like that if I were asking the question
fairly. There are good arguments both ways, and no right answer
that applies to all requirements.

Regards,
Nick Maclaren.

Robert Mabee

unread,

Oct 2, 2006, 3:19:44 PM10/2/06

to

Nick Maclaren wrote:
> In article <1159548948.6...@m73g2000cwd.googlegroups.com>,
> "John Mashey" <old_sys...@yahoo.com> writes:
> |> 3) There is rarely anything new, but one observes that two of the
> |> strongest influences on modern software came from the early 1970s:
> |>
> |> - Bell Labs UNIX & C
> |>
> |> - Xerox PARC software, including Smalltalk
> |>
> |> CamelCase and its variants are often attributed to the latter.
>
> I never saw CamelCase in early BCPL programs, but it may have been
> used by some people. Upper case for code and lower case for strings
> and comments was common.

BCPL's standard library was CamelCase, since all-caps is too ugly and
all-lower-case was reserved for system words. [If C (++) had copied
that distinction then it wouldn't be stuck now for extensions that don't
break existing programs.] The port to the 360 (monocase EBCDIC on
punched cards) foolishly gave this up. We should have used some other
distinguishing mark, such as underscore, required in user-generated
names over a certain length.

Tom Gardner

unread,

Oct 2, 2006, 3:27:26 PM10/2/06

to

nm...@cus.cam.ac.uk (Nick Maclaren) wrote in
news:efqqqr$fl3$1...@gemini.csx.cam.ac.uk:

> Er, no, not in the UK. Paper tape devices were derived from electric
> typewriters, and usually had both cases - and they date from the dawn
> of modern computing and dominated here up to the 1960s. Probably for
> related reasons, early CRTs in the UK often had both cases.

The ones I used were more likely to have been
derived from telex machines. Editing 5 channel
paper tape at 5cps was not fun. But it is fun to try
to get youngsters to think about how 26 uppercase
letters, plus digits, plus assorted other symbols
could be encoded in 2**5 combinations.
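One way to answer the 2**5 puzzle is with shift codes, sketched below. This is a minimal illustration and not the real telex code table: the assignment of codes to characters, the names `LTRS` and `FIGS`, and the choice to start in letters mode are all assumptions made for the example.

```python
import string

# Two of the 32 five-bit codes are reserved as shift codes (toy assignment).
LTRS, FIGS = 31, 30

# Each remaining code means one thing in letters mode, another in figures mode.
LETTERS = string.ascii_uppercase + " \n"
FIGURES = string.digits + "+-*/=().,:;'?!" + " \n"
TABLES = {LTRS: LETTERS, FIGS: FIGURES}


def encode(text):
    """Encode text as 5-bit codes, inserting a shift code when the mode changes."""
    mode = LTRS  # assumed starting mode
    codes = []
    for ch in text:
        if ch not in TABLES[mode]:
            mode = FIGS if mode == LTRS else LTRS
            if ch not in TABLES[mode]:
                raise ValueError(f"cannot encode {ch!r}")
            codes.append(mode)
        codes.append(TABLES[mode].index(ch))
    return codes


def decode(codes):
    """Decode 5-bit codes, tracking the current shift mode."""
    mode = LTRS
    out = []
    for code in codes:
        if code in TABLES:
            mode = code
        else:
            out.append(TABLES[mode][code])
    return "".join(out)


message = "RUN 42 * (A+B)"
tape = encode(message)
print(tape)
print(decode(tape))
```

With two codes spent on shifting, the other 30 codes can carry up to 60 distinct symbols, which is enough for uppercase letters, digits and punctuation but leaves no room for lower case.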

Niels Jørgen Kruse

unread,

Oct 2, 2006, 3:51:54 PM10/2/06

to

<dg...@barnowl.research.intel-research.net> wrote:

Well, if the idiots clicked yes to overwriting an existing file, then
they ought to have wondered what that file was.

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

Nick Maclaren

unread,

Oct 2, 2006, 4:16:18 PM10/2/06

to

In article <4Y2dnToUPdOU-7zY...@comcast.com>,

Robert Mabee <rma...@comcast.net> writes:
|>
|> BCPL's standard library was CamelCase, since all-caps is too ugly and
|> all-lower-case was reserved for system words. [If C (++) had copied
|> that distinction then it wouldn't be stuck now for extensions that don't
|> break existing programs.] The port to the 360 (monocase EBCDIC on
|> punched cards) foolishly gave this up. We should have used some other
|> distinguishing mark, such as underscore, required in user-generated
|> names over a certain length.

Interesting. I didn't use BCPL on the Titan, though I saw it, but my
recollection of it on the System/370 was that it was case-insensitive.
Unless I am losing my marbles (which is possible), it accepted both
cases by 1972.

Certainly, we never supported only upper-case on Phoenix, and I input
quite a lot of my programs from paper tape (to get mixed case).

Regards,
Nick Maclaren.

Eugene Miya

unread,

Oct 2, 2006, 4:36:00 PM10/2/06

to

In article <efqqqr$fl3$1...@gemini.csx.cam.ac.uk>,

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>In article <1159548948.6...@m73g2000cwd.googlegroups.com>,
>"John Mashey" <old_sys...@yahoo.com> writes:
>|> 1) Some historical facts

..

>That may be the question but, phrased like that, it is propaganda.
>I am agnostic on this issue, but let's phrase it the other way:

No. You are not agnostic.

>The question is, when designing a system is:

>a) Do you attempt to make its interface fail-safe,...
>OR
>b) Do you allow anything that allows the compiler ..
>get what they deserve?

This is the computer equivalent to Bill O'Reilly and his "No spin" Zone.

>Perhaps because Unix users find it difficult to use the shift key
>while holding a hamburger in one hand :-)

Possibly one of the best programmers I know only uses one hand all the time.

>Unfortunately, user-interface designers tend to design interfaces for
>themselves, and not for their customers - who are often not the same
>type of programmer, even when they are programmers.

Consider staying with COBOL Nick. Or better: improving it. 8^)

--

dg...@barnowl.research.intel-research.net

unread,

Oct 3, 2006, 1:50:10 PM10/3/06

to

You ignored the bit about not getting to claim that a good UI will fix
this (if you get to play UI fixes, you can do lots of things in
case-sensitive land too).

--
David Gay
dg...@acm.org

Hugh McIntyre

unread,

Oct 8, 2006, 2:08:20 AM10/8/06

to

In article <4obtqiF...@individual.net>,

Jan Vorbrüggen <jvorbr...@not-mediasec.de> writes:
|> Really? I have seen too many cases along the lines of makefile, Makefile
|> and MAKEFILE meaning different things to believe that.

Wasn't part of the justification of this so that Makefile, README, and other
such files appear first in directory listings, not mixed in with program
files beginning with lower-case a-z?

Of course you can use "00README" and the like on VMS, but I seem to remember
this being the justification for "Makefile".

|> It still gets on
|> my nerves that on one Unix system using one flavour of mail UA, I need to
|> look in .mail for mail files, in the other in .Maildir - just entering
|> ".mai" and asking for filename expansion won't do for both. I don't need
|> to reserve synapses just to memorize that sort of useless distinctions,
|> thank you.

Part of the answer here is that this is an issue with the shell, i.e.
the application-not-OS, on Unix at least. For example on MacOS X with
a case-allowed-but-ignored-on-lookups filesystem, you still get the problem
you see with the command-line shell.

Someone could presumably enhance the shells to do better completions though.
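A minimal sketch of what such a completion could look like, assuming the shell already has the candidate names in hand; the function `complete` and its `ignore_case` parameter are hypothetical and not taken from any particular shell.

```python
def complete(prefix, names, ignore_case=True):
    """Return the names that start with prefix, optionally ignoring case."""
    if ignore_case:
        folded = prefix.casefold()
        return [n for n in names if n.casefold().startswith(folded)]
    return [n for n in names if n.startswith(prefix)]


names = [".mail", ".Maildir", ".profile"]

print(complete(".mai", names))                     # matches both mail directories
print(complete(".mai", names, ignore_case=False))  # matches only the exact-case one
```

The point is that the folding happens in the completion code, so it works the same whether or not the file system underneath distinguishes case.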

For mail you may be stuck though, since (if I remember), IMAP at least
specifies case-sensitive folder names, so if you use folder names as
filenames then you need to be able to store "name" and "Name" distinctly.

Not that such names are necessarily a good idea.

Hugh.

Jan Vorbrüggen

unread,

Oct 8, 2006, 12:24:21 PM10/8/06

to

>>Well, if the idiots clicked yes to overwriting an existing file, then
>>they ought to have wondered what that file was.
> You ignored the bit about not getting to claim that a good UI will fix
> this (if you get to play UI fixes, you can do lots of things in
> case-sensitive land too).

What in hell does "a good UI fix" have to do with that in any case? Any
system which overwrites an existing file without expressly being told to
do so is broken to begin with.

Jan

toby

unread,

Oct 8, 2006, 12:23:40 PM10/8/06

to

Hugh McIntyre wrote:
> In article <4obtqiF...@individual.net>,
> Jan Vorbrüggen <jvorbr...@not-mediasec.de> writes:
> |> Really? I have seen too many cases along the lines of makefile, Makefile
> |> and MAKEFILE meaning different things to believe that.
>
> Wasn't part of the justification of this so that Makefile, README, and other
> such files appear first in directory listings, not mixed in with program
> files beginning with lower-case a-z?
>
> Of course you can use "00README" and the like on VMS, but I seem to remember
> this being the justification for "Makefile".

It is.

> ...
>
> Hugh.

toby

unread,

Oct 8, 2006, 12:34:43 PM10/8/06

to

Anton Ertl wrote:
> jsa...@ecn.ab.ca writes:
> > Unfortunately, an upper-case Greek alpha looks exactly like a
> >Latin capital A... but a lowercase alpha looks completely different.
>
> Actually, the way lower-case alphas are usually written in Greece, you
> could use them instead of a Latin a, and most people would not notice
> that it is an alpha and not a Latin a; it has the same form as
> the Latin a has in italic fonts (except that it is not slanted). You
> can see this nicely in
>
> http://www.complang.tuwien.ac.at/anton/tmp/xanthi.jpg

In that font, it has a similar form. Not always.

>
> My guess is that we only see a different style of writing alphas in
> our mathematics books in order to avoid confusion between alphas and
> as.

No, they were historically different. Using the Latin cursive-like form
is a rather modern thing, and likely you'd only see it in sans-serif
like that road sign (which is a highly specialised branch of typography
anyway).

>
> - anton
> --
> M. Anton Ertl Some things have to be seen to be believed
> an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
> http://www.complang.tuwien.ac.at/anton/home.html

toby

unread,

Oct 8, 2006, 1:12:46 PM10/8/06

to

PH wrote:
> Jan Vorbrüggen wrote:
> > Your basic premise, supported by empirical data, is that the _display_ of
> > textual data should be in mixed case. I strongly agree with that - having
> > been brought up in a language that uses upper-case initials for nouns
> > (German), I do see the advantage that brings in ease-of-reading (even if
> > it's a drag learning the orthography).
>
> Having enjoyed four years of German language education, I still fail
> to see how noun capitalization improves readability.

this is not hard to show with an analogy to proper nouns in english.
imagine english text where proper nouns and sentence beginnings were
never distinguished by capital letters.

>
> Peter

toby

unread,

Oct 8, 2006, 1:28:13 PM10/8/06

to

Nick Maclaren wrote:
> In article <1159249935.0...@i42g2000cwa.googlegroups.com>,

> |> Of course, I had nothing to do with the original decision, but in 1977,
> |> I wrote a letter-to-editor of SIGPLAN NOTICES replying to a complaint
> |> about this, primarily for programming languages, but the same rationale
> |> applies to file names.
>
> Well, I am afraid that it is mistaken. There is a lot of truth in it,
> but it omits the converse reasons. As Oscar Wilde said, the truth is
> rarely pure and never simple.
>
> |> 1) At the time, the following had been confirmed by numerous studies:
> |> a) For legibility of text, mixed case > lower-case only > upper-case
> |> only.
>
> The order of the latter two depends almost entirely on the font. There
> are lots where it is very hard to distinguish many lower-case letters.
> nmm is frequently misread as nnm.

Unskilled typographic design.

>
> And lower-case in many scripts is a nightmare - perhaps the extreme is
> old German black letter, but many English ones are pretty bad. That is
> one of the main reasons that people use capitals when writing information
> that must be conveyed precisely. Yes, people DO write file names, URLs,
> passwords etc. for other people - and this is a BIG problem.

Many capital letters must be abandoned if you want to eliminate
ambiguity. E.g. airlines often do not label a B seat. One presumes the
disadvantage of breaking sequence is outweighed by the problems of
confusing the letter and digit (among the *myriad* complexities of
travel document handling).

>
> |> b) Mixed case is about 10-20% better than upper-case only.
>
> Well, yes and no. Once you include digits (you ARE allowing them, aren't
> you?), mixed case maximises the number of ambiguities. Most computer
> fonts use a non-traditional '0' for that reason, but 'Il1' is a very
> common similarity that is VERY hard to spot.
>
> |> In the usual fonts, this happens because:
> |> a) lower case letters differ visually more than do UPPERCASE LETTERS,
> |> and so are more quickly recognized - some are small, some are tall,
> |> some have descenders, whereas UPPERCASE IS ALL THE SAME SIZE AND THERE
> |> ARE NO DESCENDERS.
>
> Again, entirely font dependent.

Hardly. As stated earlier, this is a well known and well studied
principle in typography, ignored at your peril.

>
> |> b) MixedCase is even better than either monocase, because there is yet
> |> more visual variety, without having to write mixed_case or MIXED_CASE.
> |> Obviously, there is plenty of room for bad taste, but reasonable style
> |> standards have often taken advantage of the flexibility.
>
> Yes, it maximises the dispersion, but it also maximises the number of
> serious ambiguities. 'Il1' and 'O0o' are perhaps the worst, but 'ce'
> and 'Cc', 'Xx' and 'Zz' in the middle of digit strings (hence no scale)
> are also tricky.

Yes. But this is rarely a problem in English text (which includes rich
context redundancy). In computer languages, it's a very familiar issue.
But many good minds have applied themselves to it, and a well designed
code typeface minimises ambiguity.

>
> |> 2) Also, at the time, there were millions of lines of code of languages
> |> like PL/I, BAL, FORTRAN, COBOL, which originated as upper-case only,
> |> but were managed by people in lower or mixed case (on PWB/UNIX
> |> systems). Why? Because it was more readable, even though the code was
> |> translated to upper-case on the way over to an attached IBM mainframe.
>
> Because the Unix people typed their programs while holding a hamburger
> in one hand and so didn't like the shift key? :-)

Laziness is a virtue, according to Larry Wall - hence the short
commands too.

>
> More seriously, that predated Unix by a long way, and the vastly most
> common convention was commentary in mixed case and code in upper. That
> was generally (but not universally) agreed to lead to the least confusion
> among Autocode/Fortran/Algol 60 etc. programmers on systems that supported
> both cases. There was a period when upper case was favoured for keywords
> and lower for identifiers, and that also works well - but experience is
> that using mixed case for syntactic clarification is a mixed blessing.
>
> Stu Feldman also explained to me why he had made the original f2c take
> lower-case only, and it was nothing to do with its desirability.
>
> |> 3) I claimed that on modern systems that naturally supported both upper
> |> and lower case, compilers should treat them as the different
> |> characters they were (internal representation). I wrote:
> |>
> |> "This permits a wide choice of conventions that must forever remain
> |> unavailable if the language itself requires such symbols to be
> |> identical."
>
> Hmm. I recommend getting hold of the SEAS / SHARE Europe White Papers
> on this issue. Adding case is good enough for USA and modern UK English,
> but doesn't really help for even the other modern Latin-based languages.
> And it gets nowhere for ones that are not Latin-based. Case is not as
> simple a concept as you and most of our compatriots assume.

Further, alphabets are not. Luckily computer languages don't (yet?)
need to deal with their diversity in the way that general purpose digital
text processing must. Programmers in non-English speaking countries
typically code in English and comment natively without difficulty. That
technology has been largely incubated in English (McLuhan might argue a
phonetic alphabet is required) may be an indelible legacy.

>
> |> Of course, case-insensitivity also requires extra cycles to do
> |> case-insensitive string comparisons. As Dennis notes, it isn't a lot
> |> of code, but it costs cycles.
>
> ALL correct Unicode comparisons have that cost, even if monocase.
> Seriously.

>
> |> 4) So, when you start with a language (like C), where UPPER and upper
> |> are distinct ...
> |>
> |> It is quite consistent that filenames be case-sensitive as well.
>
> True. And so is the converse.
>
>
> Regards,
> Nick Maclaren.

toby

unread,

Oct 8, 2006, 1:41:26 PM10/8/06

to

Tim Bradshaw wrote:

> On 2006-09-26 06:52:15 +0100, "John Mashey" <old_sys...@yahoo.com> said:
>
> > 3) I claimed that on modern systems that naturally supported both upper
> > and lower case, compilers should treat them as the different
> > characters they were (internal representation). I wrote:
> >
> > "This permits a wide choice of conventions that must forever remain
> > unavailable if the language itself requires such symbols to be
> > identical."
>

> The fact that they are different characters is really an artifact of
> the representation system. If you'd asked someone in 1900 whether
> "fish" and "Fish" were the same word what would they have answered?
> Imagine going to the fishmonger and asking for some Fish: "oh no sir,
> we only sell fish here, you need to go to the Fishmonger down the
> road". Indeed you can write things like: "Please stop hogging the
> phone ... I said PLEASE stop hogging the PHONE!" where here "phone" and
> "PHONE" clearly refer to the same thing, you're just saying one more
> loudly.

BTW that is an entirely modern convention, in fact, a post-typewriter
convention or even post-word-processor. Capitals did not, in English at
least, have a traditional connotation of emphasis: They were
essentially titling, derived from their inscriptional basis (carved in
stone) versus the manuscript (pen/brush drawn) basis for cursive lower
case. (Emphasis of first resort in English is of course italic, and
these days, bold. In German, as far as I know, wide letter spacing
serves this purpose in body text, a convention never seen in English
typography.)

>
> Of course, you could (rightly) argue that programming languages (and
> file names) are inherently written things, so it's reasonable to make
> case distinctions matter. Against that is millions of years of
> evolution of the human brain which is optimised for understanding
> spoken language where case matters not at all.
>
> I also suspect that the `mixed-case is more readable (you probably mean
> readable rather than legible by the way, there's a subtle but important
> distinction, at least in typographical circles) is a bit of a myth.
> For instance, in written English there has been a fairly dramatic move
> over the last 200 years away from using lots of mixed-case to really
> rather minimal use of capitals: if you look at things written in the

This has happened in stages. As usual, much more changed between 1920
and 1970 than between 1700 and 1900 :-)

> mid 19th century they often have really a lot of capitalisation (but
> there was a lot of variability as well). Has that shift reduced
> readability? I suspect not. Written German is, I think, moving the
> same way (from a standard where all nouns are capitalised).
>
> There probably is an effect of mixed case (I'm sure there is), but I
> suspect it's far lower than other effects: spacing, for one. In
> "yellowbrickroad", "yellowBrickRoad" "yellow brick road" I think the
> difference between the third and the other two dwarfs the difference
> between the first two.

If one subscribes to the common "word shape" principle of reading
(which seems to have good basis) then there are significant
distinctions to be made between these. The first fails the word-shape
method of recognition. The second is recognised as a single shape
(perfect for an identifier; the three separate components mean nothing
in that context); and the third, three separate words, correct for
English, wrong for code. The second form - studlyCaps - is arguably
the most functional on this basis.

> Of course computer languages make using spaces
> hard, but perhaps "yellow-brick-road" is OK.

Slightly better than the first, but worse in English than spaced words,
of course...

> Unfortunately almost all
> computer languages make even that hard, so you might have to resort to
> "yellow_brick_road" which takes me about twice as long to type on a
> normal keyboard, and doesn't read so well because I'm used to
> hyphenated words in written English.
>
> But again: computer languages aren't English, and studies based on
> English probably do not apply very well: people have high-performance
> parsers for English & can read perhaps a thousand words a minute:
> reading ten words a minute of a programming language would probably be
> rather good going.
>
> None of this is meant to imply that I'm against case-sensitivity, just
> that I think that the arguments are a lot more subtle than they often
> seem. (I think, actually, I'm in favour of case sensitivity, but I
> hate the studlyCaps style it seems to have led to.)
>
> --tim

Benny Amorsen

unread,

Oct 8, 2006, 3:48:41 PM10/8/06

to

>>>>> "toby" == toby <to...@telegraphics.com.au> writes:

toby> Many capital letters must be abandoned if you want to eliminate
toby> ambiguity. E.g. airlines often do not label a B seat. One
toby> presumes the disadvantage of breaking sequence is outweighed by
toby> the problems of confusing the letter and digit (among the
toby> *myriad* complexities of travel document handling).

I always thought the missing B seat was because some standard said
that window seats on the right are A, middle seats B, and aisle seats
C. Then in the rather common seat configuration with only 5 seats to a
row, the B seat is the one missing.

/Benny

toby

Oct 8, 2006, 4:46:56 PM

You could be right. My theory is only guesswork (this is Usenet after
all), although I think the fact that the letter is *always* juxtaposed
with a digit may lend weight to it: think of B8 (or is that 8B?) There
is no control over the dozens of printers and display terminals in
unpredictable locations in the world which may critically print or
display a seat number for various purposes relating to just one trip...
mistakes are extremely costly in that business!

>
>
> /Benny

Bernd Paysan

Oct 8, 2006, 3:47:09 PM

toby wrote:

> In German, as far as I know, wide letter spacing
> serves this purpose in body text, a convention never seen in English
> typography.)

I've seen it only in very old books and typewritten letters. Today,
with word processors, people just make text bold and underlined, under the
assumption that two (typographical) wrongs make a right ;-).

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Bernd Paysan

Oct 8, 2006, 3:40:32 PM

toby wrote:
>> Of course you can use "00README" and the like on VMS, but I seem to
>> remember this being the justification for "Makefile".
>
> It is.

But it's not a reason why a case sensitive file system is mandatory. As long
as the sort is case sensitive and the file system preserves case, README
and Makefile will show up first.
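
A toy model of what I mean, in Python. This is only a sketch: the class and method names are invented for illustration, and no real file system is implemented this way. Lookups ignore case, the stored spelling is preserved, and the listing is sorted case-sensitively.

```python
class CasePreservingDir:
    """Toy directory: case-insensitive lookup, case-preserving storage."""

    def __init__(self):
        # Maps the case-folded name to the spelling used at creation time.
        self._entries = {}

    def create(self, name):
        key = name.casefold()
        if key in self._entries:
            raise FileExistsError(
                f"{name!r} clashes with existing {self._entries[key]!r}"
            )
        self._entries[key] = name

    def lookup(self, name):
        # Any capitalisation finds the file; the stored spelling is returned.
        return self._entries[name.casefold()]

    def listing(self):
        # Plain code-point sort, i.e. case-sensitive: capitals come first.
        return sorted(self._entries.values())


if __name__ == "__main__":
    d = CasePreservingDir()
    for name in ["main.c", "README", "util.c", "Makefile"]:
        d.create(name)
    print(d.lookup("readme"))
    print(d.listing())
```

The point is that the "capitals first" listing depends only on the sort and on the preserved spelling, not on whether lookups distinguish case.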

But with GNU ls and bash, this is all moot: They sort case insensitive on a
case sensitive file system. And with a modern fast scrollback terminal, you
want the important files last, not first ;-).

Richard

Oct 8, 2006, 11:15:05 PM

[Please do not mail me a copy of your followup]

Bernd Paysan <bernd....@gmx.de> spake the secret code
<gq0ov3-...@vimes.paysan.nom> thusly:

>But with GNU ls and bash, this is all moot: They sort case insensitive on a
>case sensitive file system.

Since when? I just did a directory listing with ls (coreutils) 5.2.1
and it most certainly sorted case sensitive by default.
--
"The Direct3D Graphics Pipeline" -- DirectX 9 draft available for download
<http://www.xmission.com/~legalize/book/download/index.html>

Nicholas Miell

Oct 9, 2006, 12:35:06 AM

On Mon, 09 Oct 2006 03:15:05 +0000, Richard wrote:

> [Please do not mail me a copy of your followup]
>
> Bernd Paysan <bernd....@gmx.de> spake the secret code
> <gq0ov3-...@vimes.paysan.nom> thusly:
>
>>But with GNU ls and bash, this is all moot: They sort case insensitive on a
>>case sensitive file system.
>
> Since when? I just did a directory listing with ls (coreutils) 5.2.1
> and it most certainly sorted case sensitive by default.

Depends on locale. C (aka POSIX) is case sensitive, others (like
en_US.UTF-8) are case insensitive.

Hugh McIntyre

Oct 9, 2006, 4:29:27 AM

In article <gq0ov3-...@vimes.paysan.nom>,

Granted. Just a reason to capitalize these names given that the filesystem
supports case for other reasons.

|> But with GNU ls and bash, this is all moot: They sort case insensitive on a
|> case sensitive file system. And with a modern fast scrollback terminal, you
|> want the important files last, not first ;-).

This may depend on the settings of some of the LC_* environment variables.
Certainly, several people on Solaris have found out that if you set LANG
and LC_* to "C" then you get case-sensitive listings from "ls", but if
you pick a real language such as English then you get a case-blind sort order.

Differences in behaviour of strcoll() are apparently the underlying cause.
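
A small Python sketch of the effect, for what it's worth. It uses the standard locale module for the "C" ordering; the case-blind ordering is only approximated with str.casefold(), because a real en_US collation needs that locale installed and is more subtle than simple case folding. The file names are made up.

```python
import locale

names = ["main.c", "README", "zebra.txt", "Makefile", "a.c", "b.c"]

# In the "C" (POSIX) locale the collation functions compare raw character
# codes, so every upper-case ASCII letter sorts before every lower-case one.
locale.setlocale(locale.LC_COLLATE, "C")
c_order = sorted(names, key=locale.strxfrm)

# Assumption: a stand-in for a case-blind collation. Names are compared
# with case folded away; the original spelling only breaks ties.
caseblind_order = sorted(names, key=lambda s: (s.casefold(), s))

if __name__ == "__main__":
    print("C locale:  ", c_order)
    print("case-blind:", caseblind_order)
```

With the "C" ordering README and Makefile lead the listing; with the case-blind ordering they end up in the middle of it.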

Hugh.

Anton Ertl

Oct 9, 2006, 4:34:22 AM

Bernd Paysan <bernd....@gmx.de> writes:
>But with GNU ls and bash, this is all moot: They sort case insensitive on a
>case sensitive file system.

That's because of your locale (which is set to funny default values by
modern Linux distributions). Try

LC_COLLATE=C ls

> And with a modern fast scrollback terminal, you
>want the important files last, not first ;-).

LC_COLLATE=C ls -r

Terje Mathisen

Oct 9, 2006, 7:42:30 AM

toby wrote:

> Benny Amorsen wrote:
>> I always thought the missing B seat was because some standard said
>> that window seats on the right are A, middle seats B, and aisle seats
>> C. Then in the rather common seat configuration with only 5 seats to a
>> row, the B seat is the one missing.
>
> You could be right. My theory is only guesswork (this is Usenet after
> all), although I think the fact that the letter is *always* juxtaposed
> with a digit may lend weight to it: think of B8 (or is that 8B?) There
> is no control over the dozens of printers and display terminals in
> unpredictable locations in the world which may critically print or
> display a seat number for various purposes relating to just one trip...
> mistakes are extremely costly in that business!

At least within SAS and Braathens, the Norwegian carriers, the B seat
was the one that went away when a given plane type went from 3+3 to 2+3
seats.

I.e. A & F are always window seats, C & D aisle.

I have also seen the same pattern on other carriers.

Afaik, this has nothing to do with font legibility.

Terje
--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Nick Maclaren

Oct 9, 2006, 8:19:08 AM

In article <1160328493....@h48g2000cwc.googlegroups.com>,

"toby" <to...@telegraphics.com.au> writes:
|> >
|> > |> 1) At the time, the following had been confirmed by numerous studies:
|> > |> a) For legibility of text, mixed case > lower-case only > upper-case
|> > |> only.
|> >
|> > The order of the latter two depends almost entirely on the font. There
|> > are lots where it is very hard to distinguish many lower-case letters.
|> > nmm is frequently misread as nnm.
|>
|> Unskilled typographic design.

Well, sometimes. Most fonts are designed for displaying natural languages,
and that ambiguity is extremely rare in English. Even when it is, my
general point stands.

|> > |> In the usual fonts, this happens because:
|> > |> a) lower case letters differ visually more than do UPPERCASE LETTERS,
|> > |> and so are more quickly recognized - some are small, some are tall,
|> > |> some have descenders, whereas UPPERCASE IS ALL THE SAME SIZE AND THERE
|> > |> ARE NO DESCENDERS.
|> >
|> > Again, entirely font dependent.
|>
|> Hardly. As stated earlier, this is a well known and well studied
|> principle in typography, ignored at your peril.

Well, I have seen dozens of fonts where lower-case has few or even no
descenders. Indeed, this is being displayed in one now.

And I have seen a fair number of fonts with the upper-case having
descenders, too, though mostly fairly old fonts.

|> > I also suspect that the `mixed-case is more readable (you probably mean
|> > readable rather than legible by the way, there's a subtle but important
|> > distinction, at least in typographical circles) is a bit of a myth.
|> > For instance, in written English there has been a fairly dramatic move
|> > over the last 200 years away from using lots of mixed-case to really
|> > rather minimal use of capitals: if you look at things written in the
|>
|> This has happened in stages. As usual, much more changed between 1920
|> and 1970 than between 1700 and 1900 :-)

Er, no. It is debatable whether that is true in this case, but it
assuredly isn't in the matter of English spelling and syntax.

Regards,
Nick Maclaren.

Torbjorn Lindgren

Oct 9, 2006, 8:37:54 AM

Terje Mathisen <terje.m...@hda.hydro.com> wrote:
>toby wrote:
>> Benny Amorsen wrote:
>>> I always thought the missing B seat was because some standard said
>>> that window seats on the right are A, middle seats B, and aisle seats
>>> C. Then in the rather common seat configuration with only 5 seats to a
>>> row, the B seat is the one missing.
>>
>> You could be right. My theory is only guesswork (this is Usenet after
>> all), although I think the fact that the letter is *always* juxtaposed
>> with a digit may lend weight to it: think of B8 (or is that 8B?) There
>> is no control over the dozens of printers and display terminals in
>> unpredictable locations in the world which may critically print or
>> display a seat number for various purposes relating to just one trip...
>> mistakes are extremely costly in that business!
>
>At least within SAS and Braathens, the Norwegian carriers, the B seat
>was the one that went away when a given plane type went from 3+3 to 2+3
>seats.
>
>I.e. A & F are always window seats, C & D aisle.
>I have also seen the same pattern on other carriers.

That seems to be the pattern for all single aisle aircraft I've seen
so far; you can have missing letters in other places depending on
seating configuration (i.e. a 2+2 with A+C/D+F seats).

I suspect this is caused by a desire for commonality and 3+3 seats was
selected as an "unlikely to be exceeded" value... (since they have to
be able to use it and get people in and out).

On wide-bodies (with two aisles) I've seen "missing letters" in all
three "sides", one example I flew semi-recently used 3+4+3 based
labeling, but had a 2+2+2/2+3+2/2+4+2 configuration (so all but
Economy had "missing" seats in both the sides and the middle part, but
I don't remember WHICH seats were missing in the middle).

Again, going above 3+4+3 starts to present problems, so it would kind
of make sense to standardize on that. Wikipedia says A380 economy is 3+4+3
and 2+4+2 (lower/upper), so even that one works.

>Afaik, this has nothing to do with font legibility.

That's my read on it too, commonality makes much more sense here.

If my suspicion is correct (that everyone does it the same way) it
also makes the travel agents' work MUCH simpler since they don't have
to look up that specific flight or even airline; seat letter plus single
vs dual aisle is the only data you need. This could very well be the
most important factor...

Torben Ægidius Mogensen

Oct 9, 2006, 10:34:08 AM

nm...@cus.cam.ac.uk (Nick Maclaren) writes:

> In article <1160328493....@h48g2000cwc.googlegroups.com>,
> "toby" <to...@telegraphics.com.au> writes:
> |> >
> |> > |> 1) At the time, the following had been confirmed by numerous studies:
> |> > |> a) For legibility of text, mixed case > lower-case only > upper-case
> |> > |> only.
> |> >
> |> > The order of the latter two depends almost entirely on the font. There
> |> > are lots where it is very hard to distinguish many lower-case letters.
> |> > nmm is frequently misread as nnm.
> |>
> |> Unskilled typographic design.
>
> Well, sometimes. Most fonts are designed for displaying natural languages,
> and that ambiguity is extremely rare in English. Even when it is, my
> general point stands.

While nmm and nnm are unusual in natural languages, the combination rn
is fairly common and is often misread as m. I would not call it
unskilled typographic design to make this confusion possible, as
typographical tradition makes this the case, so I would blame
tradition instead. It is even worse in gothic typefaces, where f and
s look almost identical. It is certainly possible to design legible
typefaces that reduce such ambiguity, but that will go against
typographical traditions.

> |> > |> In the usual fonts, this happens because:
> |> > |> a) lower case letters differ visually more than do UPPERCASE LETTERS,
> |> > |> and so are more quickly recognized - some are small, some are tall,
> |> > |> some have descenders, whereas UPPERCASE IS ALL THE SAME SIZE AND THERE
> |> > |> ARE NO DESCENDERS.
> |> >
> |> > Again, entirely font dependent.
> |>
> |> Hardly. As stated earlier, this is a well known and well studied
> |> principle in typography, ignored at your peril.
>
> Well, I have seen dozens of fonts where lower-case has few or even no
> descenders. Indeed, this is being displayed in one now.

While I can't see what typeface you use for reading news, most fonts I
know of have descenders for g, j, p, q and y. Most italic fonts
additionally have descenders for f. Gothic fonts often have
descenders for x and y.

> And I have seen a fair number of fonts with the upper-case having
> descenders, too, though mostly fairly old fonts.

Indeed. F, J, P and Y being the most common that I recall.

> |> > I also suspect that the `mixed-case is more readable (you probably mean
> |> > readable rather than legible by the way, there's a subtle but important
> |> > distinction, at least in typographical circles) is a bit of a myth.
> |> > For instance, in written English there has been a fairly dramatic move
> |> > over the last 200 years away from using lots of mixed-case to really
> |> > rather minimal use of capitals: if you look at things written in the
> |>
> |> This has happened in stages. As usual, much more changed between 1920
> |> and 1970 than between 1700 and 1900 :-)
>
> Er, no. It is debatable whether that is true in this case, but it
> assuredly isn't in the matter of English spelling and syntax.

My own mother tongue, Danish, has changed quite a bit, though: Books
were almost exclusively printed in Gothic/Fraktur until 1875 and it
was not until after 1900 that the Roman typefaces became universal.
The last newspaper to change was Berlingske Tidende, which didn't
change until 1903. Even after the change, nouns were capitalized (as
in German), and Å/å was written as Aa/aa, though there had been a
movement since around 1870 to change this. The change wasn't
officially made until 1948, when the wish to distance the language
from German probably was influential. We have had several
modifications of comma rules since then, the most recent being in
2004, and there are continuous changes to what is deemed correct
spelling.

Torben

Nick Maclaren

Oct 9, 2006, 10:46:38 AM

In article <7z7iz9i...@app-6.diku.dk>,

tor...@app-6.diku.dk (=?iso-8859-1?q?Torben_=C6gidius_Mogensen?=) writes:
|> > Well, I have seen dozens of fonts where lower-case has few or even no
|> > descenders. Indeed, this is being displayed in one now.
|>
|> While I can't see what typeface you use for reading news, most fonts I
|> know of have descenders for g, j, p, q and y. Most italic fonts
|> additionally have descenders for f. Gothic fonts often have
|> descenders for x and y.

That's only 5 descenders in most; traditionally, f and z had, too,
and once upon a time s did in some positions. I have seen others
(l, if I recall) as well. I am not sure exactly what ambiguities the
descenders solve, anyway.

|> > And I have seen a fair number of fonts with the upper-case having
|> > descenders, too, though mostly fairly old fonts.
|>
|> Indeed. F, J, P and Y being the most common that I recall.

Yup.

|> My own mother tongue, Danish, has changed quite a bit, though: Books
|> were almost exclusively printed in Gothic/Fraktur until 1875 and it
|> was not until after 1900 that the Roman typefaces became universal.

As you say, a real pain to disambiguate. Did you use it in writing,
too? Well, not you personally, unless you are implausibly old :-)

|> and there are continuous changes to what is deemed correct
|> spelling.

English, of course, has never worked out what that would mean :-)

Regards,
Nick Maclaren.

Andrew Reilly

Oct 10, 2006, 12:12:43 AM

Telling the system to write a file with a specific name *is* telling it
expressly. Continually, and annoyingly, being second-guessed by the OS is
the main reason that I can't stand to use Windows, myself. Obviously,
others' mileage varies...

Yes, I run the home partition on my Mac HFS+ file system in case sensitive
mode, even though case-preserving case insensitive is the default. Keeps
me sane. (I have discovered that I can't run the system partition case
sensitive because the several Microsoft products that I use break if I try...)

Cheers,

--
Andrew

Torben Ægidius Mogensen

Oct 10, 2006, 4:11:54 AM

nm...@cus.cam.ac.uk (Nick Maclaren) writes:

> In article <7z7iz9i...@app-6.diku.dk>,
> tor...@app-6.diku.dk (=?iso-8859-1?q?Torben_=C6gidius_Mogensen?=) writes:

> |> My own mother tongue, Danish, has changed quite a bit, though: Books
> |> were almost exclusively printed in Gothic/Fraktur until 1875 and it
> |> was not until after 1900 that the Roman typefaces became universal.
>
> As you say, a real pain to disambiguate. Did you use it in writing,
> too? Well, not you personally, unless you are implausibly old :-)

Gothic handwriting (which looks nothing like Gothic print, see
http://www.sa.dk/sa/brugearkiver/gotisk/1800haand.htm) was taught in
schools as the default handwriting until 1875, when it was replaced by
a more modern cursive writing. Gothic lower case has many
letters that are easily confused, such as c/i and f/h/s.

When I went to school in the 1970s, cursive was replaced by
"formskrift", a writing of connected non-cursive letters. You can see
a sample of Gothic, cursive and formskrift as used in Denmark on
http://www.sa.dk/sa/boern/skrift/3slags.htm .

I don't use either today, but use (mostly) separated letters when
writing by hand (which I do only for personal notes and postcards).

> |> and there are continuous changes to what is deemed correct
> |> spelling.
>
> English, of course, has never worked out what that would mean :-)

If you are thinking of the lack of correspondence between spelling and
pronunciation, Danish is as bad as English.

Torben

Nick Maclaren

Oct 10, 2006, 4:34:19 AM

In article <7zac44m...@app-0.diku.dk>,
tor...@app-0.diku.dk (=?iso-8859-1?q?Torben_=C6gidius_Mogensen?=) writes:

|> nm...@cus.cam.ac.uk (Nick Maclaren) writes:
|>
|> Gothic handwriting (which looks nothing like Gothic print, see
|> http://www.sa.dk/sa/brugearkiver/gotisk/1800haand.htm) was taught in
|> schools as the default handwriting until 1875, when it was replaced by
|> a more modern cursive writing. Gothic lower case has many
|> letters that are easily confused, such as c/i and f/h/s.

It's better than the script I was shown (which, I believe, was German
from about that era)! If I recall, a good half dozen letters were
made up of identical vertical strokes, which were run together, and
n and u were actually identical.

|> > |> and there are continuous changes to what is deemed correct
|> > |> spelling.
|> >
|> > English, of course, has never worked out what that would mean :-)
|>
|> If you are thinking of the lack of correspondence between spelling and
|> pronunciation, Danish is as bad as English.

Nope. The lack of any agreement on what the word 'correct' means.

Regards,
Nick Maclaren.

toby

Oct 10, 2006, 5:18:11 AM

Terje Mathisen wrote:
> toby wrote:
> > Benny Amorsen wrote:
> >> I always thought the missing B seat was because some standard said
> >> that window seats on the right are A, middle seats B, and aisle seats
> >> C. Then in the rather common seat configuration with only 5 seats to a
> >> row, the B seat is the one missing.
> >
> > You could be right. My theory is only guesswork (this is Usenet after
> > all), although I think the fact that the letter is *always* juxtaposed
> > with a digit may lend weight to it: think of B8 (or is that 8B?) There
> > is no control over the dozens of printers and display terminals in
> > unpredictable locations in the world which may critically print or
> > display a seat number for various purposes relating to just one trip...
> > mistakes are extremely costly in that business!
>
> At least within SAS and Braathens, the Norwegian carriers, the B seat
> was the one that went away when a given plane type went from 3+3 to 2+3
> seats.
>
> I.e. A & F are always window seats, C & D aisle.

Well, yesterday my two Lufthansa flights were A/B-C/D and (A340)
A/C-D/E/F/G-H/K, which seems to be avoiding I and J for no other
obvious reason, but maybe I'm seeing through typographic glasses.
Clearly the situation is rather complicated.

>
> I have also seen the same pattern on other carriers.
>
> Afaik, this has nothing to do with font legibility.

I wouldn't have put it that way either. I'd have said that certain
Roman capitals are inherently confusable (with each other or with
digits) and this wouldn't be the only system with special rules to
prevent misreading. I've seen other code sequences that drop letters
for this reason. To get back OT, it's established good style to avoid
these in code too.
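
As a rough Python illustration of that kind of rule: the sketch below builds an alphabet with some confusable capitals removed and uses it to generate short codes. Which letters to drop is my assumption (B/8, I/1, O/0, S/5, Z/2 here); real schemes differ, as this thread shows, and the names are invented for the example.

```python
import string

# Assumption: capitals that are easily misread as digits in poor fonts.
CONFUSABLE = set("BIOSZ")

# The remaining 21 letters, in alphabetical order.
SAFE_ALPHABET = "".join(
    c for c in string.ascii_uppercase if c not in CONFUSABLE
)


def encode(n):
    """Write a non-negative integer using only the unambiguous letters."""
    if n < 0:
        raise ValueError("n must be non-negative")
    base = len(SAFE_ALPHABET)
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(SAFE_ALPHABET[remainder])
        if n == 0:
            break
    return "".join(reversed(digits))


if __name__ == "__main__":
    print(SAFE_ALPHABET)
    print([encode(i) for i in (0, 1, 20, 21, 1000)])
```

The cost is the one mentioned earlier in the thread: the sequence is no longer the plain alphabet, so people cannot count along it in their heads.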

ken...@cix.compulink.co.uk

Oct 10, 2006, 5:51:30 AM

In article <1160328493....@h48g2000cwc.googlegroups.com>,
to...@telegraphics.com.au (toby) wrote:

> Many capital letters must be abandoned if you want to eliminate
> ambiguity. E.g. airlines often do not label a B seat.

Car number plates in Britain do not use B, I, O or Z.

Ken Young

toby

Oct 10, 2006, 7:00:55 AM

Torbjorn Lindgren wrote:
> Terje Mathisen <terje.m...@hda.hydro.com> wrote:
> >toby wrote:
> >> Benny Amorsen wrote:
> >>> I always thought the missing B seat was because some standard said
> >>> that window seats on the right are A, middle seats B, and aisle seats
> >>> C. Then in the rather common seat configuration with only 5 seats to a
> >>> row, the B seat is the one missing.
> >>

> >> You could be right. My theory is only guesswork ...

> >
> >At least within SAS and Braathens, the Norwegian carriers, the B seat
> >was the one that went away when a given plane type went from 3+3 to 2+3
> >seats.
> >
> >I.e. A & F are always window seats, C & D aisle.
> >I have also seen the same pattern on other carriers.
>
> That seems to be the pattern for all single aisle aircraft I've seen
> so far; you can have missing letters in other places depending on
> seating configuration (i.e. a 2+2 with A+C/D+F seats).
>
> I suspect this is caused by a desire for commonality and 3+3 seats was
> selected as an "unlikely to be exceeded" value... (since they have to
> be able to use it and get people in and out).
>
> On wide-bodies (with two aisles) I've seen "missing letters" in all
> three "sides", one example I flew semi-recently used 3+4+3 based
> labeling, but had a 2+2+2/2+3+2/2+4+2 configuration (so all but
> Economy had "missing" seats in both the sides and the middle part, but
> I don't remember WHICH seats were missing in the middle).
>
> Again, going above 3+4+3 starts to present problems, so it would kind
> of make sense to standardize on that. Wikipedia says A380 economy is 3+4+3
> and 2+4+2 (lower/upper), so even that one works.

At http://www.seatguru.com , I looked at the Lufthansa A340 and LH
747-400. The Boeing already supports a 3-4-3 configuration, which looks
like ABC-DEFG-HJK. In Business this shrinks to AC-DEG-HK, so the aisle
seats remain consistent. (All compatible with their A340 numbering.)
But Qantas uses the B designation on 747-400 in a 2-4-2 layout, and
British Airways labels theirs ABC-DEFG-HJK.

However Swiss labels their A340 AB-DEFG-JK - different again.

Out of interest, an Air Canada 767-300 is incompatibly AB-DEF-HK but
their A340 is labelled exactly as Lufthansa.

Alitalia's 777 stretches to ABC-DEG-JKL, so K's not a window any more.

>
> >Afaik, this has nothing to do with font legibility.

Probably you're right, and I jumped to the wrong conclusion, though I
still wonder if it's sometimes a factor for B and I.

>
> That's my read on it too, commonality makes much more sense here.
> If my suspicion is correct (that everyone does it the same way)
> it also makes the travel agents' work MUCH simpler since they don't have
> to look up that specific flight or even airline; seat letter plus single
> vs dual aisle is the only data you need.

You can't rely on much between different planes or airlines except that
A and L are window. B is never window. C is always aisle. But D can be
either aisle or window in 1-aisle layouts (2-2 versus 3-3) although
perhaps always aisle in 2-aisle. Letters are inconsistently omitted,
although it might be safe to say that G, H (but not J) are always
aisle. And so on. You'd have to use seatguru.com to be sure :)
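
To show why a letter alone is not enough, here is a small Python sketch that classifies each seat from a layout string written the way I wrote them above (blocks separated by "-"). The function name and the window/aisle/middle rule are my own simplification, and the layouts are just the ones quoted in this post.

```python
def classify(layout):
    """Map each seat letter in e.g. 'ABC-DEFG-HJK' to window/aisle/middle."""
    blocks = layout.split("-")
    roles = {}
    for b, block in enumerate(blocks):
        for i, letter in enumerate(block):
            left_edge = i == 0
            right_edge = i == len(block) - 1
            outer_left = b == 0 and left_edge
            outer_right = b == len(blocks) - 1 and right_edge
            if outer_left or outer_right:
                roles[letter] = "window"   # against the cabin wall
            elif left_edge or right_edge:
                roles[letter] = "aisle"    # next to a gap between blocks
            else:
                roles[letter] = "middle"
    return roles


if __name__ == "__main__":
    for layout in ("ABC-DEFG-HJK", "AC-DEG-HK", "ABC-DEG-JKL"):
        print(layout, "-> K is", classify(layout)["K"])
```

In the first two layouts K is a window seat, while in the ABC-DEG-JKL layout it is a middle seat, so the letter has to be read together with the layout.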

R.A.Omond

Oct 10, 2006, 7:10:09 AM

Don't know where you get that idea, but it's not true.

toby

Oct 10, 2006, 9:05:32 AM

I can't speak for Britain, but I know elsewhere letters (particularly O
and I) are sometimes omitted.

Bernd Paysan

Oct 10, 2006, 9:20:00 AM

Nick Maclaren wrote:

>
> In article <7zac44m...@app-0.diku.dk>,
> tor...@app-0.diku.dk (=?iso-8859-1?q?Torben_=C6gidius_Mogensen?=) writes:
> |> nm...@cus.cam.ac.uk (Nick Maclaren) writes:
> |>
> |> Gothic handwriting (which looks nothing like Gothic print, see
> |> http://www.sa.dk/sa/brugearkiver/gotisk/1800haand.htm) was taught in
> |> schools as the default handwriting until 1875, when it was replaced by
> |> a more modern cursive writing. Gothic lower case has many
> |> letters that are easily confused, such as c/i and f/h/s.
>
> It's better than the script I was shown (which, I believe, was German
> from about that era)! If I recall, a good half dozen letters were
> made up of identical vertical strokes, which were run together, and
> n and u were actually identical.

You probably mean Sütterlin (see
http://www.suetterlinschrift.de/Lese/Sutterlin0.htm). It looks remarkably
close to the "Gothic" handwriting. I don't know why it's called "gothic",
because the real handwriting of that time was much easier to read (ok, but
that's compensated by the difficulty to *understand* what you read ;-).

http://www.phil-gesch.uni-hamburg.de/edition/Palaeographie/38diegotischeminuskel.html

dcw

Oct 10, 2006, 9:40:02 AM

In article <egfv2i$je4$1$8300...@news.demon.co.uk>,
R.A.Omond <Roy....@BlueBubble.UK.Com> wrote:
>ken...@cix.compulink.co.uk wrote:

>> Car number plates in Britain do not use B, I, O or Z.
>
>Don't know where you get that idea, but it's not true.

Certain letters were omitted as date letters in UK number
plates: I, O, U, and Z. Q was used for special purposes.
It looks as if confusability with digits was a consideration,
but I don't understand U and not B or S.

David

Torben Ægidius Mogensen

Oct 10, 2006, 10:14:07 AM

ken...@cix.compulink.co.uk writes:

When in Russia I noticed that the letters used on license plates are
all in the intersection (of shapes, not sounds) between Cyrillic and
Latin, i.e., A, B (=V), C (=S), E, H (=N), K, M, O, P (=R), T, V, X
(=CH), Y (=U). This was not the case for the old Soviet license
plates.

Torben

dg...@barnowl.research.intel-research.net

Oct 10, 2006, 1:35:58 PM

The same change happened in Bulgaria - as a result they had to change
some of the location codes (C=Sofia, etc) used on the number
plates. Presumably makes the police/customs in the Latin-alphabet part
of Europe happier ;-)

--
David Gay
dg...@acm.org