case-sensitivity and identifiers (was Re: Wide character implementation)
Dorai Sitaram  
Newsgroups: comp.lang.lisp
From: d...@goldshoe.gte.com (Dorai Sitaram)
Date: 25 Mar 2002 22:13:50 GMT
Local: Mon, Mar 25 2002 5:13 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
In article <m3zo0wuzsm....@elgin.eder.de>,
Andreas Eder  <Andreas.E...@t-online.de> wrote:

To me, the view that case is indeed ornamental is supported by
the fact that it appears to be permissible to
upper-case a German sentence in its entirety
without construing it as a loss of information.

BITTE EIN BIT
ICH BIN EIN BERLINER
DIE MAUER MUSS WEG!

usw.

I.e., things like titles, slogans, and billboards; but
also consider the GPL or other license text in
German, where large globs of the prose are in all caps.
Legal prose, it seems to me, would especially not court
information loss in this manner if it was felt there
really was a risk.

I'm curious: Is there an example, however
frivolous, where WEG in an all-caps sentence
could be ambiguous?

BTW, the {Weg, weg} pair seems very like the {produce
(noun), produce (verb)} pair in English.  Like Weg/weg,
produce/produce are pronounced differently.
However, they don't rely on capitalization, even
though the grammatical context used to disambiguate
between them has fewer cues than the German.

--d


 
Pierre R. Mai
Newsgroups: comp.lang.lisp
From: "Pierre R. Mai" <p...@acm.org>
Date: 25 Mar 2002 22:44:54 +0100
Local: Mon, Mar 25 2002 4:44 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Matthias Blume <matth...@shimizu-blume.com> writes:
> By the way, here is an example in a case-sensitive natural language
> where the distinction between uppercase and lowercase gets
> *pronounced*: "mit" vs. "MIT" in German.  The first means "with" and is
> pronounced like "mitt", the second is the Massachusetts Institute of
> Technology and is pronounced like speakers of English would pronounce
> it: em-ay-tee.  I think that there are enough examples of this around

This is "supremely silly", if there is such a thing, even ignoring for
the moment that MIT is neither a German word nor a German abbreviation,
and that probably a large number of German speakers will not recognize
MIT as standing for "the" MIT, nor pronounce it as speakers of English
would.  The different pronunciation of mit vs. MIT doesn't result
from the difference in case, at all.  If you receive a telex that
informs you of an invitation to "the mit", you will pronounce "mit"
just as you would "MIT".  qed.

Of course that doesn't mean that case should be completely ignored; it
only means that case is just another attribute of text, like fonts,
and that there is little reason to encode it in the character.

It also means that you want to distinguish between mit (with) and MIT
(the institute) not based on case, but based on packages, i.e.

(and (not (eq 'german-words:mit 'universities:mit))
     ;; And now an example where case will not help in disambiguation
     ;; namely the sequence "tub", standing for both the English word
     ;; tub and the common abbreviation for the Technische Universität
     ;; Berlin
     (not (eq 'english-words:tub 'universities:tub)))

Regs, Pierre.

--
Pierre R. Mai <p...@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein
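Pierre's expression only reads and evaluates once the three packages exist. A minimal sketch that makes it runnable, assuming the hypothetical package names from his post (none of them is a real library): because the standard reader folds `mit` and `MIT` to the same name, only the package prefix keeps the symbols apart.

```lisp
;; Hypothetical packages from the post.  (:use) with no arguments means
;; the package inherits nothing, so each exported symbol is its own.
(defpackage :german-words  (:use) (:export #:mit))
(defpackage :universities  (:use) (:export #:mit #:tub))
(defpackage :english-words (:use) (:export #:tub))

(defun distinct-by-package-p ()
  "True when same-named symbols from different packages are not EQ."
  (and (not (eq 'german-words:mit 'universities:mit))
       ;; Case cannot help here: both are spelled "tub".
       (not (eq 'english-words:tub 'universities:tub))))
```

Both `(symbol-name 'german-words:mit)` and `(symbol-name 'universities:mit)` are "MIT", yet the symbols differ: the distinction lives in the package, not in the case of the characters.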


 
Matthias Blume
Newsgroups: comp.lang.lisp
From: Matthias Blume <matth...@shimizu-blume.com>
Date: 25 Mar 2002 18:00:57 -0500
Local: Mon, Mar 25 2002 6:00 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

d...@goldshoe.gte.com (Dorai Sitaram) writes:
> I'm curious: Is there an example, however
> frivolous, where WEG in an all-caps sentence
> could be ambiguous?

Yes, there is a joke about a stupid person who tries to figure out
which street he is in and comes up with

   "We are on the trail with the nukes."

because he misread the slogan

   "WEG MIT DEN ATOMWAFFEN"   (meaning "GET RID OF THE NUKES")

as a streetsign.

> BTW, the {Weg, weg} pair seems very like the {produce
> (noun), produce (verb)} pair in English.  Like Weg/weg,
> produce/produce are pronounced differently.

In this case, there is at best a very remote semantic relationship (if
any).  It is definitely nowhere near a noun/verb sort of thing.

Matthias


 
Kent M Pitman
Newsgroups: comp.lang.lisp
From: Kent M Pitman <pit...@world.std.com>
Date: Mon, 25 Mar 2002 23:46:19 GMT
Local: Mon, Mar 25 2002 6:46 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Yes, but this kind of confusion can happen whether case is involved or not,
and I think it's not fair to ascribe it to case as the principal cause.
We have signs on our highways that say "FINE FOR LITTERING".  Writing
them in lowercase won't help. ;-)

> > BTW, the {Weg, weg} pair seems very like the {produce
> > (noun), produce (verb)} pair in English.  Like Weg/weg,
> > produce/produce are pronounced differently.

> In this case, there is at best a very remote semantic relationship (if
> any).  It is definitely nowhere near a noun/verb sort of thing.

There is a phenomenon in English speech wherein stress matters, too,
and we sometimes italicize not just to control emphasis but to
actively disambiguate.  A prime example of this is an effect called
anaphoric de-stressing (that is, lessening stress in order to turn a
reference into an anaphoric reference, i.e., a reference to a previously
known entity, instead of a non-anaphoric reference, i.e., a reference to a
newly introduced entity).  The example I've seen is a story of a newsreader
misreading an account of how a man, upon hearing his wife had had an affair
with another man, had said he wanted to shoot the bastard.  (Note how the
sentence changes meaning depending on whether you put stress on _shoot_
or on _bastard_.)  Written English doesn't mark this distinction,
even though it's present and arguably important in spoken English.
People figure it out.

 
Erik Naggum
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Tue, 26 Mar 2002 00:35:09 GMT
Local: Mon, Mar 25 2002 7:35 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
* Matthias Blume
| I was under the impression that you thought you already did. :-)

  Wipe that moronic grin off your face, dimwit.  What your retarded
  impression of other people might be should not concern anybody else.
  Such despicably stupid behavior should have been punished by people who
  cared about you.  Why have they not?

| To be frank, I do not care *one bit* about what this discussion was
| originally about.

  Of course not.  Moronic grins are a pretty strong indicator of impaired
  mental capacity, starting with the sheer inability to take other people
  seriously.

| I was merely commenting on your claim about capitalization being
| "incidental".  The debate of whether or not case-sensitive identifiers in
| programming languages are Good or Evil, or which character set designs use
| up more bits than others, etc., bore me.

  I tried to suggest _strongly_ that you should go back to daytime TV, but
  did you get it?  No.  How amazingly dense you must be.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
Discussion subject changed to "Back to character set implementation thinking" by Erik Naggum
Erik Naggum
Newsgroups: comp.lang.lisp, comp.lang.scheme
From: Erik Naggum <e...@naggum.net>
Date: Tue, 26 Mar 2002 01:34:19 GMT
Local: Mon, Mar 25 2002 8:34 pm
Subject: Re: Back to character set implementation thinking
* Thomas Bushnell, BSG
| The GNU/Linux world is rapidly converging on using UTF-8 to hold 31-bit
| Unicode values.  Part of the reason it does this is so that existing byte
| streams of Latin-1 characters can (pretty much) be used without
| modification, and it allows "soft conversion" of existing code, which is
| quite easy and thus helps everybody switch.

  UTF-8 is in fact extremely hostile to applications that would otherwise
  have dealt with ISO 8859-1.  The addition of a prefix byte has some very
  serious implications.  UTF-8 is an inefficient and stupid format that
  should never have been proposed.  However, it has computational elegance
  in that it is a stateless encoding.  I maintain that encoding is stateful
  regardless of whether it is made explicit or not.  I therefore strongly
  suggest that serious users of Unicode employ the compression scheme that
  has been described in Unicode Technical Report #6.  I recommend reading
  this technical report.
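To see the prefix byte at issue, here is a minimal UTF-8 encoder over integer code points (the names `utf-8-encode` and `utf-8-length` are ours, not standard CL operators). Only the ASCII subset passes through unchanged; every ISO 8859-1 character from #x80 to #xFF becomes a two-byte sequence whose lead byte is #xC2 or #xC3. The sketch assumes the modern four-byte limit (#x10FFFF), not the original 31-bit form behind the "31-bit Unicode values" quoted above.

```lisp
(defun utf-8-encode (code-point)
  "Return the UTF-8 bytes for CODE-POINT as a list of integers."
  (check-type code-point (integer 0 #x10FFFF))
  (flet ((cont (shift)                  ; continuation byte 10xxxxxx
           (logior #x80 (logand (ash code-point (- shift)) #x3F))))
    (cond ((< code-point #x80)          ; 0xxxxxxx: ASCII, unchanged
           (list code-point))
          ((< code-point #x800)         ; 110xxxxx + 1 continuation
           (list (logior #xC0 (ash code-point -6)) (cont 0)))
          ((< code-point #x10000)       ; 1110xxxx + 2 continuations
           (list (logior #xE0 (ash code-point -12)) (cont 6) (cont 0)))
          (t                            ; 11110xxx + 3 continuations
           (list (logior #xF0 (ash code-point -18))
                 (cont 12) (cont 6) (cont 0))))))

(defun utf-8-length (code-points)
  "Number of bytes needed to encode the list CODE-POINTS in UTF-8."
  (reduce #'+ code-points :key (lambda (cp) (length (utf-8-encode cp)))))
```

For example, `(utf-8-encode #xF8)` (the ø in "bønder") returns `(195 184)`, i.e. #xC3 #xB8, so that six-character Latin-1 word occupies seven bytes in UTF-8.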

  Incidentally, if I could design things all over again, I would most
  probably have used a pure 16-bit character set from the get-go.  None of
  this annoying 7- or 8-bit stuff.  Well, actually, I would have opted for
  more than 16-bit units -- it is way too small.  I think I would have
  wanted the smallest storage unit of a computer to be 20 bits wide.  That
  would have allowed addressing of 4G of today's bytes with only 20 bits.
  But I digress...

| So even if strings are "compressed" this way, they are not UTF-8.  That's
| Right Out.  They are just direct UCS values.  Procedures like string-set!
| therefore might have to inflate (and thus copy) the entire string if a
| value outside the range is stored.  But that's ok with me; I don't think
| it's a serious lose.
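The "inflate on store" behaviour described in this quoted paragraph can be sketched as a vector that starts with 8-bit elements and is copied into 32-bit storage the first time a larger code point is set. This is a minimal illustration with hypothetical names (`flex-string`, `flex-ref`, `flex-set`), working on integer code points rather than CL characters; it is not Bushnell's implementation.

```lisp
(defstruct (flex-string (:constructor %make-flex-string (bits data)))
  (bits 8 :type (member 8 32))  ; current element width in bits
  data)                         ; vector of code points

(defun make-flex-string (length)
  "A string of LENGTH NUL code points, stored compactly as 8-bit elements."
  (%make-flex-string 8 (make-array length :element-type '(unsigned-byte 8)
                                          :initial-element 0)))

(defun flex-ref (s index)
  (aref (flex-string-data s) index))

(defun flex-set (s index code-point)
  "Store CODE-POINT at INDEX.  The first time a code point above #xFF is
stored, the whole string is copied into 32-bit storage."
  (check-type code-point (integer 0 #x7FFFFFFF))  ; 31-bit UCS range
  (when (and (= (flex-string-bits s) 8) (> code-point #xFF))
    (let ((wide (make-array (length (flex-string-data s))
                            :element-type '(unsigned-byte 32))))
      (replace wide (flex-string-data s))          ; O(n) copy, done once
      (setf (flex-string-data s) wide
            (flex-string-bits s) 32)))
  (setf (aref (flex-string-data s) index) code-point))
```

The copy costs O(n) but happens at most once per string, which matches the trade-off accepted in the quoted text.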

  There is some value to the C/Unix concept of a string as a small stream.
  Most parsing of strings proceeds from start to end, so there is
  no point in optimizing them for direct access.  However, a string would
  then be different from a vector of characters.  It would, conceptually,
  be more like a list of characters, but with a more compact encoding, of
  course.  Emacs MULE, with all its horrible faults, has taken a stream
  approach to character sequences and then added direct access into it,
  which has become amazingly expensive.
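The cost of bolting direct access onto a stream-style encoding can be shown with UTF-8 as a stand-in (Emacs MULE uses its own internal encoding, so this is an analogy, not its code). To find the Nth character you must walk the bytes from the start and skip continuation bytes, so indexing is O(n) rather than O(1). A minimal sketch; `utf-8-nth-offset` is our own hypothetical helper.

```lisp
(defun utf-8-nth-offset (bytes n)
  "Byte offset of the Nth character (0-based) in the UTF-8 octet vector
BYTES, or NIL if there are not that many characters.  There is no way
to jump straight to the answer: every preceding byte must be examined."
  (let ((count -1))
    (dotimes (i (length bytes) nil)
      ;; Continuation bytes look like 10xxxxxx; any other byte starts
      ;; a new character.
      (unless (= (logand (aref bytes i) #xC0) #x80)
        (incf count)
        (when (= count n)
          (return i))))))
```

With the bytes of "bønder", `#(98 195 184 110 100 101 114)`, character 2 (the n) starts at byte offset 3, not 2.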

  I believe that trying to make "string" both a stream and a vector at the
  same time is futile and only leads to very serious problems.  The default
  representation of a string should be stream, not a vector, and accessors
  should use the stream, such as with make-string-{input,output}-stream,
  with new operators like dostring, instead of trying to use the string as
  a vector when it clearly is not.  The character concept needs to be able
  to accommodate this, too.  Such pervasive changes are of course not free.
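The proposed `dostring` does not exist in Common Lisp. A minimal sketch of what it might look like, assuming DOLIST-style syntax: it reads the string sequentially through a string input stream instead of indexing it as a vector.

```lisp
(defmacro dostring ((var string &optional result) &body body)
  "Hypothetical DOSTRING, modeled on DOLIST: evaluate BODY with VAR bound
to each character of STRING in turn, reading the string through a string
input stream rather than by index, then return RESULT."
  (let ((stream (gensym "STREAM")))
    `(with-input-from-string (,stream ,string)
       (loop for ,var = (read-char ,stream nil nil)
             while ,var
             do (progn ,@body)
             finally (return ,result)))))
```

Paired with an output stream, it gives a purely sequential transformation: `(with-output-to-string (out) (dostring (c "weg mit den atomwaffen") (write-char (char-upcase c) out)))` returns "WEG MIT DEN ATOMWAFFEN".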

| Ok, then the second question is about combining characters.  Level 1
| support is really not appropriate here.  It would be nice to support
| Level 3.  But perhaps Level 2 with Hangul Jamo characters [are those
| required for Level 2?] would be good enough.

  Level 2 requires every other combining character except Hangul Jamo.

| It seems to me that it's most appropriate to use Normalization Form D.

  I agree for the streams approach.  I think it is important to make sure
  that there is a single code for all character sequences in the stream
  when it is converted to a vector.  The private use space should be used
  for these things, and a mapping to and from character sequences should be
  maintained such that if a private use character is queried for its
  properties, those of the character sequence would be returned.
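The private-use mapping described here can be sketched as a pair of hash tables: each base-plus-combining sequence (assumed already in Normalization Form D) is assigned a code in the BMP Private Use Area the first time it appears, and the reverse table answers what a private-use code stands for. All names are hypothetical, and the sketch ignores a real complication: genuine private-use characters in the input would collide with the assigned codes.

```lisp
;; Start and end of the BMP Private Use Area (U+E000..U+F8FF).
(defconstant +pua-start+ #xE000)
(defconstant +pua-end+ #xF8FF)

(defvar *sequence->code* (make-hash-table :test #'equal)
  "List of code points (base + combining marks) -> private-use code.")
(defvar *code->sequence* (make-hash-table)
  "Reverse table, consulted when a private-use code is asked for its properties.")
(defvar *next-code* +pua-start+
  "Next unassigned private-use code point.")

(defun intern-sequence (code-points)
  "Return one code point standing for the list CODE-POINTS.  A lone code
point is returned as-is; a longer sequence gets a private-use code the
first time it is seen and the same code thereafter."
  (cond ((null (rest code-points)) (first code-points))
        ((gethash code-points *sequence->code*))
        ((> *next-code* +pua-end+)
         (error "Private Use Area exhausted"))
        (t (let ((code *next-code*)
                 (key (copy-list code-points)))
             (incf *next-code*)
             (setf (gethash code *code->sequence*) key
                   (gethash key *sequence->code*) code)))))

(defun expand-code (code)
  "Inverse of INTERN-SEQUENCE: the code points that CODE stands for."
  (or (gethash code *code->sequence*) (list code)))
```

A property lookup on a private-use code would then consult `(expand-code code)` and report the properties of the underlying sequence.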

| Or is that crazy?  It has the advantage of holding all the Level 3 values
| in a consistent way.  (Since precombined characters do not exist for all
| possibilities, Normalization Form C results in some characters
| precombined and some not, right?)

  Correct.

| And finally, should the Lisp/Scheme "character" data type refer to a
| single UCS code point, or should it refer to a base character together
| with all the combining characters that are attached to it?

  Primarily the code point, but both, effectively, by using the private use
  space as outlined above.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
Discussion subject changed to "case-sensitivity and identifiers (was Re: Wide character implementation)" by Erik Naggum
Erik Naggum
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Tue, 26 Mar 2002 01:53:11 GMT
Local: Mon, Mar 25 2002 8:53 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
* Michael Parker
| OTOH, if terminals had gotten color and typefaces earlier, maybe
| programming languages would have evolved to use them.

  Only if we had also had a stateless coding for them, statefulness being
  so frightening to the kinds of programmers who are likely to invent new
  syntaxes.

| Maybe give each namespace its own color, so you would specify the value
| of a name by putting it in blue, the function by using red, keywords in
| italics, macros in green.  The mind boggles at the possibilities.

  Especially if they also used XML to write it all, and then we can use
  cascading style sheets to control both background and foreground color.
  And programmers would have to be selected from those who are not color
  blind.  This is unlikely to succeed, since the current selection from
  those who can spell has not been successful, either, and that is at least
  something you can learn.

  Thanks for the URL, though.  My mind boggles at statements like these:
  "With the huge RAM of modern computers, an operating system is no longer
  necessary."

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
Erik Naggum
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Tue, 26 Mar 2002 02:21:25 GMT
Local: Mon, Mar 25 2002 9:21 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
* Matthias Blume
| The two words are pronounced very differently.

  But so is house and house, distinguished by a voiced and unvoiced s.
  Some languages also have tonemes, not just phonemes.  Norwegian is among
  them.  The phonemes of the Norwegian words for "farmers", "prayers" and
  "beans" are the same, but the tonemes differ.  Immigrants often have
  farmers for dinner and purchase produce directly from beans as a result.
  The word for "farmers" is spelled "bønder" but "beans" and "prayers" are
  both spelled "bønner".  Note that this is not a question of stress.  All
  three stress the first syllable exactly the same, and do not stress the
  final syllable.

| Anyway, this whole debate is supremely silly, IMHO.

  Then you are supremely silly who continue to post your drivel to it.

| Fortunately neither you nor Erik get to dictate the rules, at least not
| for those languages that I speak or program in...

  Of course, you are a Scheme freak and a tourist in comp.lang.lisp, the
  very canonicalization of the irresponsible trouble-maker who thinks he is
  an outsider to the community he torments with "you are silly who do it
  differently from me" attitudes.  Thank you for contributing to the
  _impression_ that Scheme is the language of choice of deranged lunatics.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
Thomas Bushnell, BSG
Newsgroups: comp.lang.lisp
From: tb+use...@becket.net (Thomas Bushnell, BSG)
Date: 25 Mar 2002 18:25:27 -0800
Local: Mon, Mar 25 2002 9:25 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Erik Naggum <e...@naggum.net> writes:
>   Some languages also have tonemes, not just phonemes.  Norwegian is among
>   them.  The phonemes of the Norwegian words for "farmers", "prayers" and
>   "beans" are the same, but the tonemes differ.  Immigrants often have
>   farmers for dinner and purchase produce directly from beans as a result.
>   The word for "farmers" is spelled "bønder" but "beans" and "prayers" are
>   both spelled "bønner".  Note that this is not a question of stress.  All
>   three stress the first syllable exactly the same, and do not stress the
>   final syllable.

Huh?  If they are different words, then *by the definition of a
phoneme* the sound which distinguishes them is a phoneme.  What is a
"toneme"?

 
Erik Naggum
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Tue, 26 Mar 2002 02:43:54 GMT
Local: Mon, Mar 25 2002 9:43 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
* Matthias Blume
| Sorry, I was unreasonably harsh on you, Kent.

  You are a clever little asshole, aren't you?

| By the way, here is an example in a case-sensitive natural language where
| the distinction between uppercase and lowercase gets *pronounced*: "mit"
| vs. "MIT" in German.  The first means "with" and is pronounced like
| "mitt", the second is the Massachusetts Institute of Technology and is
| pronounced like speakers of English would pronounce it: em-ay-tee.

  Geez, dude, you are _so_ full of yourself.  No wonder you think this is
  supremely silly -- your own contributions are ludicrous and stupid.

  Whether the M, I, and T of the words that make up "MIT" are capitalized
  or not is incidental.  That one chooses to uppercase initials of words
  is precisely what I am talking about.  Sheesh, some people.

| I think that there are enough examples of this around so that making a
| distinction between uppercase and lowercase is warranted in the natural
| language case.

  Hello?  Of course there is a _distinction_ you incredibly retarded jerk!
  Have you been arguing for a _distinction_?  Man, how can you survive
  being so goddamn _stupid_?  Nobody has argued against a distinction, you
  insufferably arrogant moron.  The point is how it should be REPRESENTED!
  (Incidental capitalization added purely for effect.)  Is it even possible
  to be so unintelligent that this is not something you could have avoided
  by _thinking_ a little?  Of course, you are in this "you guys are silly"
  mode, so thinking on your own is out of the question, but the whole point
  is that you are so unconscious and so unwilling to engage your brain to
  understand what somebody else argues that you effectively reduce the
  discussion to your pathetically ignorant level.  Of _course_ there is a
  distinction!  Geez, you are such an idiot.  The question is: should that
  visible distinction have been coded to represent the incidental quality
  apart from the intrinsic quality, and the answer is so "advanced" that
  your puny little brain will in all likelihood not grasp its simplicity.

  Let me give your severely reduced mental capacity a simple enough example
  that you might actually be inspired to think about the ramifications.
  The symbol for Ångström in Unicode is exactly the same as the glyph for
  the letter A with ring above, because the guy's name was spelled with
  that letter, just like Celsius and Fahrenheit, but all these three
  letters should never be lowercased even though they are upper-case
  letters.  This is an intrinsic quality.  For this reason, Unicode has
  chosen to represent them as _symbols_, not letters.  The same applies to
  Greek omega, pi, rho, and sigma, which are different symbols in each
  case.  Can you wrap your exceptionally pitiful brain around these few and
  simple examples to perhaps grasp that incidental qualities and intrinsic
  qualities are important?  Or are you so unphilosophical and such a
  leering idiot with a moronic grin permanently attached to his skull that
  being able to grasp what other people have thought about before you has
  become impossible for you?

  No wonder you think those who think are _gods_ in their own mind: If you
  had been able to think at all, you would probably experience _several_
  revelations of such magnitude that one "god" would not be enough.

| Again, I do not think that this needs to be in any way correlated with
| the PL case.

  Is the stuff you are smoking legal?  Go back to your Scheme community,
  where being supremely silly is not considered rude to your compatriots.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
Matthias Blume
Newsgroups: comp.lang.lisp
From: Matthias Blume <matth...@shimizu-blume.com>
Date: Tue, 26 Mar 2002 03:33:47 GMT
Local: Mon, Mar 25 2002 10:33 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Erik Naggum <e...@naggum.net> writes:
>   Some languages also have tonemes, not just phonemes.  Norwegian is among
>   them.  The phonemes of the Norwegian words for "farmers", "prayers" and
>   "beans" are the same, but the tonemes differ.  Immigrants often have
>   farmers for dinner and purchase produce directly from beans as a result.
>   The word for "farmers" is spelled "bønder" but "beans" and "prayers" are
>   both spelled "bønner".  Note that this is not a question of stress.  All
>   three stress the first syllable exactly the same, and do not stress the
>   final syllable.

So what?  What does this have to do with anything?  I have already
pointed out examples (albeit not from Norwegian, which I don't know at
all) for this phenomenon.  Pronunciation and spelling are often at
odds.  Therefore, one cannot argue on the basis of phonetics which
visual distinctions in the written language matter and which ones
don't.  As far as I am concerned, uppercase and lowercase are not the
same.  In German, this is simply a fact of how the written language is
defined.  Getting the capitalization wrong is a spelling error just
like using the wrong vowel, missing an 'h' somewhere, using 'ss' where
'ß' should be used, joining words where they ought to be separated and
vice versa, and so on and so forth.  Of course, many of these
distinctions are redundant to some degree.  Case distinctions are not
the only redundancies.  Should we abolish all whitespace just because
with some practice one can infer where word boundaries are?  I haven't
seen anyone suggesting this.  (And again, there are precedents for
such things, for example in some far eastern languages where words
are not visibly separated in the written language.)

>   Of course, you are a Scheme freak and a tourist in comp.lang.lisp, the
>   very canonicalization of the irresponsible trouble-maker who thinks he is
>   an outsider to the community he torments with "you are silly who do it
>   differently from me" attitudes.  Thank you for contributing to the
>   _impression_ that Scheme is the language of choice of deranged lunatics.

Quite funny that you think I am a Scheme person...
(Especially considering that Scheme, like CL, uses case-insensitive identifiers.)

Matthias
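Blume's parenthetical can be made precise for CL: symbols themselves are case-sensitive (`|Foo|` and `FOO` are different symbols), but the standard reader folds unescaped names to upper case, which is what makes identifiers appear case-insensitive. The folding is controlled by `readtable-case`. A small sketch; the helper name `read-with-case` is ours.

```lisp
(defun read-with-case (string mode)
  "Read one object from STRING using a fresh copy of the standard
readtable whose READTABLE-CASE is MODE (:upcase, :downcase, :preserve
or :invert)."
  (let ((*readtable* (copy-readtable nil)))  ; NIL = the standard readtable
    (setf (readtable-case *readtable*) mode)
    (values (read-from-string string))))
```

Under the default `:upcase`, "Foo" and "FOO" read as the same symbol; under `:preserve` they are distinct; `:invert` leaves mixed-case names alone and inverts single-case ones, so "foo" still reads as `FOO`.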


 
Discussion subject changed to "Back to character set implementation thinking" by Christopher Browne
Christopher Browne
Newsgroups: comp.lang.lisp, comp.lang.scheme
From: Christopher Browne <cbbro...@acm.org>
Date: Mon, 25 Mar 2002 22:30:56 -0500
Local: Mon, Mar 25 2002 10:30 pm
Subject: Re: Back to character set implementation thinking
The world rejoiced as Erik Naggum <e...@naggum.net> wrote:

You should have a chat with Charles Moore, of Forth fame.  He
designed, using a CAD system he wrote in Forth, called OK, a 20 bit
microprocessor that (surprise, surprise...  NOT!) has an instruction
set designed specifically for Forth.

Something that is unfortunate is that the 36-bit processors basically
died off in favor of 32-bit ones, which means we have great gobs of
algorithms that assume 32-bit word sizes, with the only leap anyone
can conceive of being to 64 bits; and it means that if you need a tag
bit or two for this or that, 32-bit operations wind up Sucking Bad.

But I digress, too...
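The tag-bit cost mentioned above is easy to quantify. A minimal sketch, assuming two low-order tag bits with tag 00 marking a fixnum (a common Lisp runtime choice, but not a universal one): a 32-bit word then holds only 30-bit integers, whereas a 36-bit word would have kept 34. All names are hypothetical.

```lisp
(defconstant +word-bits+ 32)  ; assumed machine word size
(defconstant +tag-bits+ 2)    ; assumed: two low-order tag bits
(defconstant +fixnum-tag+ 0)  ; assumed: tag 00 marks a fixnum

(defun fixnum-payload-bits ()
  (- +word-bits+ +tag-bits+))  ; bits left for the integer itself

(defun box-fixnum (n)
  "Encode signed integer N as a tagged, unsigned +WORD-BITS+-wide word."
  (let ((limit (ash 1 (1- (fixnum-payload-bits)))))
    (unless (and (<= (- limit) n) (< n limit))
      (error "~D does not fit in a ~D-bit fixnum" n (fixnum-payload-bits)))
    (logand (logior (ash n +tag-bits+) +fixnum-tag+)
            (1- (ash 1 +word-bits+)))))  ; keep two's-complement word bits

(defun fixnum-word-p (word)
  "True if WORD carries the fixnum tag."
  (= (logand word (1- (ash 1 +tag-bits+))) +fixnum-tag+))

(defun unbox-fixnum (word)
  "Decode a word produced by BOX-FIXNUM back into a signed integer."
  (let ((payload (ash word (- +tag-bits+))))
    (if (logbitp (1- (fixnum-payload-bits)) payload)
        (- payload (ash 1 (fixnum-payload-bits)))  ; sign-extend negatives
        payload)))
```

With these settings `(box-fixnum -1)` is #xFFFFFFFC, and anything at or beyond 2^29 in magnitude no longer fits; setting `+word-bits+` to 36 would raise that bound to 2^33.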
--
(concatenate 'string "cbbrowne" "@ntlug.org")
http://www.ntlug.org/~cbbrowne/oses.html
Rules of the  Evil Overlord #230. "I will  not procrastinate regarding
any ritual granting immortality."  <http://www.eviloverlord.com/>


 
Discussion subject changed to "case-sensitivity and identifiers (was Re: Wide character implementation)" by Christopher Browne
Christopher Browne
Newsgroups: comp.lang.lisp
From: Christopher Browne <cbbro...@acm.org>
Date: Mon, 25 Mar 2002 22:43:15 -0500
Local: Mon, Mar 25 2002 10:43 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
In the last exciting episode, Erik Naggum <e...@naggum.net> wrote:

Yes, that seems rather a strange comment.

Note that one of Moore's more-publicized quasi-recent projects
involved building a CAD system for designing microprocessors.

His approach was to basically write the application-cum-operating
system based on a tiny kernel of Forth instructions which basically
meant he started with 80486 assembler, and built on top of that.

Apparently it offered vast opportunities to avoid all kinds of cruft
that tends to get built into CAD systems, but what it really amounted
to was that he built his system as an embedded system on top of bare
Intel metal.

I think a lot of his argument is that people keep building cruft on
top of cruft, when they might be better off with a _good_ embedded
system.  

Consider the horrors of MS Office: We might be better off if, instead
of continually being mandated by the latest bloatware upgrade to
upgrade their system to the latest "Pentium IV with more memory than
anyone could _conceive_ of ten years ago," people bought cheap
electronic typewriters with bare bits of computing power.  

If people spent their time _typing_, instead of trying to figure out
which menu allows them to change some bit of formatting, they might
get more work done.  Consider that back in the old days, Unix used to
run in 128K words of memory, and CP/M machines could handle word
processing, spreadsheets, and databases in 56K of RAM.  The notion
that you need 256MB of RAM to realistically run Windows XP should be
offensive.

In any case, Moore is a fascinating character. He is perhaps not
always to be taken seriously, but he's had more inspired ideas than
most people ever learn about...
--
(reverse (concatenate 'string "gro.mca@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/wp.html
"Cars  move  huge  weights  at  high  speeds  by  controlling  violent
explosions many times a  second. ...car analogies are always fatal..."
-- <westp...@my-dejanews.com>


 
Erik Naggum
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Tue, 26 Mar 2002 04:11:32 GMT
Local: Mon, Mar 25 2002 11:11 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
* Thomas Bushnell, BSG
| Huh?  If they are different words, then *by the definition of a phoneme*
| the sound which distinguishes them is a phoneme.  What is a "toneme"?

  Stress is generally not considered to be a difference in phoneme.

  The sound is exactly the same, but whether you have entering, departing,
  rising, falling, high, low, up-down, down-up, or level tone can and does
  change the meaning of the word.  Thai, for instance, has explicit tone
  markers.  Chinese has different ideographs for words that are pronounced
  with the same phonemes and different tonemes.

  Consider the phonemes of the word "really".  The toneme is the difference
  in pronunciation between "Really?" and "Really." and "Really!".

  French, for instance, has no stress, but tends to use marginally shorter
  and longer vowels.  They also have no tonemes, so the French have very
  _serious_ problems dealing with other languages and sound ridiculous in
  almost every language other than their own.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
Erik Naggum  
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Tue, 26 Mar 2002 04:20:45 GMT
Local: Mon, Mar 25 2002 11:20 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
* Matthias Blume
| So what?  What does this have to do with anything?

  Why are you still talking?  This is "supremely silly" and you keep
  blabbering?  What for?

| As far as I am concerned, uppercase and lowercase are not the same.

  Nobody has said they are.  Please just grasp this, OK?  That some
  distinction is incidental does not mean that it is not there.  I wonder what
  your limited brainpower has concluded that this discussion is all about
  when you are so devoid of understanding.  Geez, you are _so_ stupid.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
Thomas Bushnell, BSG  
Newsgroups: comp.lang.lisp
From: tb+use...@becket.net (Thomas Bushnell, BSG)
Date: 25 Mar 2002 20:32:20 -0800
Local: Mon, Mar 25 2002 11:32 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Erik Naggum <e...@naggum.net> writes:
> * Thomas Bushnell, BSG
> | Huh?  If they are different words, then *by the definition of a phoneme*
> | the sound which distinguishes them is a phoneme.  What is a "toneme"?

>   Stress is generally not considered to be a difference in phoneme.

Oh, ok.  That's a good point; the term "phoneme" is ambiguous I think.
Tonal differences are sometimes phonemic and sometimes not, but I now
understand what you mean.  Whether a tonal or length difference should
be officially phonemic is a matter of style and not any real linguistics,
as far as I can tell.

>   Consider the phonemes of the word "really".  The toneme is the difference
>   in pronunciation between "Really?" and "Really." and "Really!".

Yeah, but there it's a matter of marking, which is different than
tone.  A better example in English is between homographs like
"conduct" (a noun, stress on the first syllable) and "conduct" (a
verb, stress on the second syllable).  

Because stress is contextual, it's not normally counted as a phoneme.
Tone and length are not contextual, so I think those are usually
counted as phonemes.  But (as I said above) I think this is a pretty
gray area.

>   French, for instance, has no stress, but tends to use marginally shorter
>   and longer vowels.  They also have no tonemes, so the French have very
>   _serious_ problems dealing with other languages and sound ridiculous in
>   almost every language other than their own.

Actually French does have stress as a word marker; the last syllable
of each word gets a stress.  (Obviously, stress is therefore not
phonemic in French.)

Thomas


 
Matthias Blume  
Newsgroups: comp.lang.lisp
From: Matthias Blume <matth...@shimizu-blume.com>
Date: Tue, 26 Mar 2002 04:46:34 GMT
Local: Mon, Mar 25 2002 11:46 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Erik Naggum <e...@naggum.net> writes:
> | As far as I am concerned, uppercase and lowercase are not the same.

>   Nobody has said they are.  Please just grasp this, OK?  That some
>   distinction is incidental does not mean that it is not there.

I meant: they are intrinsically not the same.

 
Discussion subject changed to "Back to character set implementation thinking" by cr88192
cr88192  
Newsgroups: comp.lang.lisp, comp.lang.scheme
Followup-To: comp.lang.lisp
From: cr88192 <cr88...@hotmail.com>
Date: Mon, 25 Mar 2002 21:17:00 -0500
Subject: Re: Back to character set implementation thinking

> Something that is unfortunate is that the 36 bit processors basically
> died off in favor of 32 bit ones.  Which means we have great gobs of
> algorithms that assume 32 bit word sizes, with the only leap anyone
> can conceive of being to 64 bits, and meaning that if you need a tag
> bit or two for this or that, 32 bit operations wind up Sucking Bad.

hello, personally I don't really know what the big difference is...
I would have imagined that in any case a slightly larger word size would
have been useful, but it is not...
sometimes for some of my code I use 48 bit ints (when 32 bits is too small
and 64 is overkill). I would think that with 36 bits the next size up would
be 72, and 36 is not evenly divisible by 8 so you would need a different
byte size as well (i.e., 9 or 12).
sorry, I don't really know of byte sizes other than 8...
am I missing something?
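The arithmetic behind the question can be sketched in a few lines of Python. This is an illustration only: `bytes_per_word` is a hypothetical helper, and it assumes bytes are packed whole into a word and never straddle a word boundary.

```python
from typing import Tuple


def bytes_per_word(word_bits: int, byte_bits: int) -> Tuple[int, int]:
    """Hypothetical helper: return (whole bytes per word, unused bits),
    assuming bytes never cross a word boundary."""
    return divmod(word_bits, byte_bits)


if __name__ == "__main__":
    # Compare a few byte sizes against a 36-bit word.
    for size in (6, 7, 8, 9, 12):
        count, spare = bytes_per_word(36, size)
        print(f"{size:2d}-bit bytes: {count} per 36-bit word, {spare} bits spare")
```

With a 36-bit word, 6-, 9- and 12-bit bytes divide it evenly, while 8-bit bytes leave 4 bits unused in every word.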

(little has changed in my life since before, except that I am working on an
os now... again...).


 
Erik Naggum  
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Tue, 26 Mar 2002 06:06:55 GMT
Local: Tues, Mar 26 2002 1:06 am
Subject: Re: Back to character set implementation thinking
* cr88192 <cr88...@hotmail.com>
| sorry, I don't really know of byte sizes other than 8...
| am I missing something?

  Yes.  A "byte" is only a contiguous sequence of bits in a machine word,
  and has been used that way by most vendors, most notably, for us, DEC,
  which contributed the machine instructions we know as LDB and DPB and the
  notion of a byte specifier, which gives the bit position in the word and
  the length in bits.  Failure to support LDB and DPB in hardware is very
  costly for a large number of useful operations, but in a byte-addressable
  world with 8-bit bytes, using anything smaller than bytes that might cross
  byte boundaries has serious penalties.  In a word-addressable world, this
  saves a lot of memory, even relative to the byte-addressable machines.  C
  has bit fields because it was intended to run on the Honeywell 6000, which
  had 36-bit words, so its "char" was 9 bits wide.  (See page 34 of
  Kernighan & Ritchie, 1st ed.)
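  Common Lisp inherited DEC's operators: (ldb (byte size position) integer)
  extracts a field, and (dpb new-byte (byte size position) integer) replaces
  one.  The Python sketch below imitates that behaviour with shifts and
  masks to show what a byte specifier does; `ByteSpec`, `ldb` and `dpb` here
  are hypothetical stand-ins written for this illustration, not a library
  API.  The demo packs four 9-bit characters into a 36-bit word, and putting
  the first character in the most significant field is a choice made only
  for the example.

```python
from typing import NamedTuple


class ByteSpec(NamedTuple):
    """Modelled on Common Lisp's (byte size position): a field of `size`
    bits whose lowest bit is `position` bits above the least significant bit."""
    size: int
    position: int


def _mask(spec: ByteSpec) -> int:
    # A run of `size` one-bits, not yet shifted into place.
    return (1 << spec.size) - 1


def ldb(spec: ByteSpec, word: int) -> int:
    """Load byte: extract the field described by `spec` from `word`."""
    return (word >> spec.position) & _mask(spec)


def dpb(new_byte: int, spec: ByteSpec, word: int) -> int:
    """Deposit byte: return `word` with the field described by `spec`
    replaced by the low `spec.size` bits of `new_byte`."""
    field = _mask(spec) << spec.position
    return (word & ~field) | ((new_byte << spec.position) & field)


if __name__ == "__main__":
    # Pack four 9-bit characters into one 36-bit word, first character
    # in the most significant field, then read them back out.
    word = 0
    for i, ch in enumerate("LISP"):
        word = dpb(ord(ch), ByteSpec(9, 27 - 9 * i), word)
    chars = [chr(ldb(ByteSpec(9, 27 - 9 * i), word)) for i in range(4)]
    print(oct(word), "".join(chars))
```

  Because Python integers, like Lisp integers, have no fixed width, nothing
  stops a field from lying above bit 35; staying inside a 36-bit word is up
  to the caller in this sketch.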

  IBM chose a more specific terminology: 4-bit nybbles (the same spelling
  deviation as "byte" from "bite"), 8-bit bytes, 16-bit half-words, 32-bit
  words, and 64-bit double-words.  On the PDP-10, we had 36-bit words,
  18-bit half-words (and halfword instructions), but bytes were all over
  the place.  I know several people who think this is a much better design
  than the stupid 8-bit design we have today.  Sadly, only several, not
  millions and millions who think Intel's designs are better just because
  they can buy them.
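  With 36-bit words and 18-bit half-words, taking a word apart is a single
  shift and mask.  A minimal Python sketch, assuming the left half is the
  more significant one; the names `halves` and `join` are hypothetical and
  do not correspond to any PDP-10 instruction.  Octal is used because an
  18-bit half-word is exactly six octal digits.

```python
from typing import Tuple

WORD_BITS = 36
HALF_BITS = 18
HALF_MASK = (1 << HALF_BITS) - 1  # 0o777777, six octal digits of ones


def halves(word: int) -> Tuple[int, int]:
    """Split a 36-bit word into (left, right) 18-bit half-words."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("not a 36-bit word")
    return word >> HALF_BITS, word & HALF_MASK


def join(left: int, right: int) -> int:
    """Build a 36-bit word from two 18-bit half-words."""
    if not (0 <= left <= HALF_MASK and 0 <= right <= HALF_MASK):
        raise ValueError("half-word out of range")
    return (left << HALF_BITS) | right


if __name__ == "__main__":
    left, right = halves(0o123456_654321)
    print(oct(left), oct(right))
```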

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
Discussion subject changed to "case-sensitivity and identifiers (was Re: Wide character implementation)" by Erik Naggum
Erik Naggum  
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Tue, 26 Mar 2002 06:21:08 GMT
Local: Tues, Mar 26 2002 1:21 am
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
* Thomas Bushnell, BSG
| Oh, ok.  That's a good point; the term "phoneme" is ambiguous I think.
| Tonal differences are sometimes phonemic and sometimes not, but I now
| understand what you mean.  Whether a tonal or length difference should be
| officially phonemic is a matter of style and not any real linguistics, as
| far as I can tell.

  *sigh*  My native language has tonemes.  Yours does not.  Trust me on
  this, OK?  Go look it up if you doubt me.

  Tone is the musical tone with which you pronounce a phoneme, or, more
  precisely, the relative direction of the change in tone throughout the
  word.

> Consider the phonemes of the word "really".  The toneme is the difference
> in pronunciation between "Really?" and "Really." and "Really!".

| Yeah, but there it's a matter of marking, which is different than tone.

  *sigh*  No, this is a tone difference.  The rising tone at the end of a
  question is precisely this -- tone.  One does not usually talk about
  tonemes when dealing with the changing meaning of a sentence, but it is
  the same idea.

| A better example in English is between homographs like "conduct" (a noun,
| stress on the first syllable) and "conduct" (a verb, stress on the second
| syllable).

  No, that would be stress, not tone.  I was trying to give you an example
  of what tone is, not how the same sequence of phonemes can have different
  meaning in differing ways.

| Because stress is contextual, it's not normally counted as a phoneme.
| Tone and length are not contextual, so I think those are usually counted
| as phonemes.  But (as I said above) I think this is a pretty gray area.

  No, it is not a grey area.  It just does not apply to English.  Study
  Norwegian or Thai.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
Erik Naggum  
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Tue, 26 Mar 2002 06:22:00 GMT
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
* Matthias Blume
| I meant: they are intrinsically not the same.

  Then your position is not only misguided, but utterly false, you
  supremely silly man.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
Florian Hars  
Newsgroups: comp.lang.lisp
From: Florian Hars <flor...@hars.de>
Date: 26 Mar 2002 07:47:10 GMT
Local: Tues, Mar 26 2002 2:47 am
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
Erik Naggum wrote in article <3226112482576...@naggum.net>:

> * Thomas Bushnell, BSG
>| Tonal differences are sometimes phonemic and sometimes not

>   *sigh*  My native language has tonemes.  Yours does not.  Trust me on
>   this, OK?  Go look it up if you doubt me.

Some data points on "toneme" from the web:
The American Heritage® Dictionary:
  A type of phoneme
The Concise Oxford Dictionary of Linguistics:
  A unit of pitch, especially in tone languages, treated as or
  analogously to a phoneme.
http://www.factmonster.com:
  a phoneme consisting of a contrastive feature of tone in a tone
  language

Yours, Florian.


 
Thomas Bushnell, BSG  
Newsgroups: comp.lang.lisp
From: tb+use...@becket.net (Thomas Bushnell, BSG)
Date: 26 Mar 2002 00:13:37 -0800
Local: Tues, Mar 26 2002 3:13 am
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Erik Naggum <e...@naggum.net> writes:
> * Thomas Bushnell, BSG
> | Oh, ok.  That's a good point; the term "phoneme" is ambiguous I think.
> | Tonal differences are sometimes phonemic and sometimes not, but I now
> | understand what you mean.  Whether a tonal or length difference should be
> | officially phonemic is a matter of style and not any real linguistics, as
> | far as I can tell.

>   *sigh*  My native language has tonemes.  Yours does not.  Trust me on
>   this, OK?  Go look it up if you doubt me.

I'm trusting you about the way Norwegian works, and I'm trying to
understand it in the terminology used in English to speak about
linguistics.

I do understand perfectly well what tone is.

> | Because stress is contextual, it's not normally counted as a phoneme.
> | Tone and length are not contextual, so I think those are usually counted
> | as phonemes.  But (as I said above) I think this is a pretty gray area.

>   No, it is not a grey area.  It just does not apply to English.  Study
>   Norwegian or Thai.

I know perfectly well what tone is.

The question is whether tonal difference is a phonemic difference.

Since a phoneme is a minimal unit distinguishing two words, if there
are two words that differ only in tone, the difference must therefore
be phonemic.

I mentioned stress (in English, with the "conduct" example), because
stress is also sometimes thought not to distinguish phonemes, but
really it does.

What is a gray area is how rigid one wants to be about the
definition of "phoneme".

Thomas


 
Alain Picard  
Newsgroups: comp.lang.lisp
From: Alain Picard <apic...@optushome.com.au>
Date: 26 Mar 2002 19:29:38 +1100
Local: Tues, Mar 26 2002 3:29 am
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Erik Naggum <e...@naggum.net> writes:

>   French, for instance, has no stress, but tends to use marginally shorter
>   and longer vowels.  They also have no tonemes, so the French have very
>   _serious_ problems dealing with other languages and sound ridiculous in
>   almost every language other than their own.

What makes you think they don't sound equally ridiculous in French?  ;-)

In high school, I never did understand what the English teacher was
going on about, with his "iambic pentameter" stuff.  If you come from
a monotonic language, the whole thing doesn't make a lot of sense.
Oh well, _our_ rhymes are a lot more exact.

*Years* later, having married an anglophone and lived in English
society for a few years, it was finally explained to me that English
has this "stress" thing...  my accent improved markedly after that.

--
It would be difficult to construe        Larry Wall, in  article
this as a feature.                       <1995May29.062427.3...@netlabs.com>


 
Erik Naggum  
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Tue, 26 Mar 2002 09:18:22 GMT
Local: Tues, Mar 26 2002 4:18 am
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
* Thomas Bushnell, BSG
| Since a phoneme is a minimal unit distinguishing two words, if there are
| two words that differ only in tone, the difference must therefore be
| phonemic.

  Apparently, this is how some people see it -- I have not seen a
  difference in tone referred to as "phonemic".  However, phonemes are
  supposed to be discrete elements of speech.  A toneme is not -- the change
  in tone usually spans several phonemes.  Therefore, it is either a
  phoneme of its own, which seems odd, or an additional speech element.
  If a "phoneme" is the _only_ smallest unit of sound, it no longer appears
  to be possible to enumerate the phonemes of a language.

| I mentioned stress (in English, with the "conduct" example), because
| stress is also sometimes thought not to distinguish phonemes, but
| really it does.

  So when something, anything distinguishes phonemes, they become two?
  That does not appear to be useful.  It seems rather to multiply them
  without bounds.

| What is a gray area is how rigid one wants to be about the
| definition of "phoneme".

  It seems that if you can put whatever you want into it, it is rendered
  useless.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 