case-sensitivity and identifiers (was Re: Wide character implementation)
Messages 101 - 125 of 160
Erik Naggum  
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Wed, 27 Mar 2002 17:28:56 GMT
Local: Wed, Mar 27 2002 12:28 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
* Thomas Bushnell, BSG
| Does a toneme in Norwegian extend past a single syllable, however?  I
| don't know the answer to that question.

  Basically, the whole word is either rising, falling, or rising-falling,
  and in combining words, the intonation of both words changes.  For
  instance, English as a Second Language means something different from
  English as the Second Language.  In Norwegian we have "Norsk som andre
  sprog" and "Norsk som andresprog", where the former means either "like
  other languages" or "as the second language" depending on tone, and is
  also distinguished from the latter by tone, not by stress.  This is
  particularly funny when those furriners try to find the section in the
  bookstore that would help them get just this point and doubly funny when
  the bookstore cannot even spell it correctly, which their all too young
  information desk attendant could not pick up from the tone difference
  even though several bystanders could, and laughed, when I tried in vain
  to point out the funny mistake to her.

| The tones actually extend beyond just the vowel, and affect timing and
| intonation of the whole word, however.  But they are assigned to the
| stressed vowel only, and are counted as various phonemic variants of that
| vowel.
|
| The situation might work out similarly in Norwegian, dunno.

  I do not know Classical Attic Greek so I cannot say for certain, but your
  brief description makes me believe there is a good chance of a similarity.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
Ed L Cashin
Newsgroups: comp.lang.lisp
From: Ed L Cashin <ecas...@uga.edu>
Date: 27 Mar 2002 20:02:13 -0500
Local: Wed, Mar 27 2002 8:02 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Kenny Tilton <ktil...@nyc.rr.com> writes:
> Ed L Cashin wrote:

> >  I'd be happy to hear
> > a good case for case-insensitive identifiers.

> I've done a ton of case-sensitive C and I've done a ton of code in
> case-insensitive languages. I like case-insensitive much, much more.
> Does that count?

Does for me.  Also, I think Erik Naggum provided a good argument
undermining my original assumption that 'a' and 'A' are different
characters in reality (even though they have distinct encodings in
ASCII, as opposed to his ideal), and Kent Pitman pointed out that
psychologically, we think of strings of letters first, remembering
case less frequently.

> A deeper reason is that it seems weird to use case to differentiate
> two things. If I looked down and saw an app with two functions, say,
> ABLE-P and able-p, meaning different things which the case was meant
> to convey, I would have regrettably ungenerous thoughts regarding the
> author.

Yes.  The responses I've read beg the question, "Isn't it the code
author's fault if case-sensitivity is abused?"  I mean, if the
language is case sensitive and people write poor-quality code, is that
the language designer's fault?

As for case-folding issues in the face of different languages with
different notions of upper and lower case, it seems like many hairy
issues are associated with it, and I'm looking forward to the day when
I can appreciate them all!
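
A few of those hairy issues can be seen directly. The sketch below relies on Python's built-in string methods, which apply the Unicode default (language-independent) case mappings; the helper name `round_trips` is made up for illustration.

```python
def round_trips(text: str) -> bool:
    """True if upper-casing and then lower-casing gives the text back."""
    return text.upper().lower() == text

# German sharp s has no single-character upper case in the default mapping,
# so upper-casing changes the length of the string.
assert "\u00df".upper() == "SS"
assert not round_trips("stra\u00dfe")     # "STRASSE".lower() is "strasse"

# Turkish distinguishes dotted and dotless i; the default mapping does not
# know which language it is looking at.
assert "I".lower() == "i"                 # a Turkish reader would expect dotless i
assert "\u0130".lower() == "i\u0307"      # capital dotted I becomes i + combining dot
assert len("\u0130".lower()) == 2

# Plain ASCII identifiers are the easy case.
assert round_trips("able-p")
```

None of this settles the question for programming-language identifiers; it only shows that "fold to one case" is not a single well-defined operation once the alphabet grows beyond ASCII.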

--
--Ed L Cashin            |   PGP public key:
  ecas...@uga.edu        |   http://noserose.net/e/pgp/


 
Janis Dzerins
Newsgroups: comp.lang.lisp
From: Janis Dzerins <jo...@latnet.lv>
Date: 28 Mar 2002 10:01:06 +0200
Local: Thurs, Mar 28 2002 3:01 am
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
Ed L Cashin <ecas...@uga.edu> writes:

> Yes.  The responses I've read beg the question, "Isn't it the code
> author's fault if case-sensitivity is abused?"  I mean, if the
> language is case sensitive and people write poor-quality code, is that
> the language designer's fault?

Yes, it is the language designer's fault.  At least to some degree.

Languages make some errors easier to make, some others harder to
make, some concepts easier to express, some harder.  And that is in
the language designer's power to shape.

> As for case-folding issues in the face of different languages with
> different notions of upper and lower case, it seems like many hairy
> issues are associated with it, and I'm looking forward to the day when
> I can appreciate them all!

Just curious -- which languages have different notions of upper and
lower case characters (if you are talking about programming languages,
of course)?

--
Janis Dzerins

  Eat shit -- billions of flies can't be wrong.


 
Kenny Tilton
Newsgroups: comp.lang.lisp
From: Kenny Tilton <ktil...@nyc.rr.com>
Date: Thu, 28 Mar 2002 16:51:58 GMT
Local: Thurs, Mar 28 2002 11:51 am
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Ed L Cashin wrote:

> Kenny Tilton <ktil...@nyc.rr.com> writes:

> >  If I looked down and saw an app with two functions, say,
> > ABLE-P and able-p, meaning different things which the case was meant
> > to convey,...

> Yes.  The responses I've read beg the question, "Isn't it the code
> author's fault if case-sensitivity is abused?"  

Oh, I jumped in on the middle of this (just can't keep up with c.l.l.
anymore!) and maybe I missed something. Are you saying that ABLE-P vs
able-p is poor quality code, but SomeThingElse vs somethingelse is not?
If so, what would SomeThingElse be? If not...never mind. :)
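
The ABLE-P versus able-p situation can be made concrete with a toy symbol table. This is only a sketch: the class `SymbolTable` and its methods are hypothetical, and folding to upper case is chosen because that is what the standard Common Lisp reader does by default.

```python
class SymbolTable:
    """Maps identifier names to definitions, optionally folding case."""

    def __init__(self, fold_case: bool):
        self.fold_case = fold_case
        self._entries = {}

    def _key(self, name: str) -> str:
        # With folding on, ABLE-P and able-p become the same key.
        return name.upper() if self.fold_case else name

    def define(self, name: str, value) -> None:
        key = self._key(name)
        if key in self._entries:
            raise KeyError(f"{name!r} collides with an existing definition")
        self._entries[key] = value

    def lookup(self, name: str):
        return self._entries[self._key(name)]


# Case-sensitive: both definitions are accepted and mean different things.
sensitive = SymbolTable(fold_case=False)
sensitive.define("ABLE-P", "first meaning")
sensitive.define("able-p", "second meaning")
assert sensitive.lookup("ABLE-P") != sensitive.lookup("able-p")

# Case-folding: there is only one name, however it is typed.
folding = SymbolTable(fold_case=True)
folding.define("ABLE-P", "only meaning")
assert folding.lookup("able-p") == "only meaning"
```

In the folding table a second `define("able-p", ...)` raises `KeyError`, so the language itself rules out the pair of names rather than leaving it to the author's taste.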

--

 kenny tilton
 clinisys, inc
 ---------------------------------------------------------------
"Harvey has overcome not only time and space but any objections."
                                                        Elwood P. Dowd


 
IPmonger
Newsgroups: comp.lang.lisp
From: IPmonger <ipmon...@delamancha.org>
Date: Thu, 28 Mar 2002 12:17:08 -0500
Local: Thurs, Mar 28 2002 12:17 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

    I believe what he's saying is that whichever of those are poor quality
  code - I would suggest that they all are - it isn't the fault of the
  language designer(s) but of the programmer.

  Which brings up the question:

  Is it at all possible to use case-sensitivity in an appropriate manner?

-jon
--
------------------
IPmonger
ipmon...@delamancha.org


 
Discussion subject changed to "Wide character implementation" by ozan s. yigit
ozan s. yigit
Newsgroups: comp.lang.lisp, comp.lang.scheme
From: o...@cs.yorku.ca (ozan s. yigit)
Date: 28 Mar 2002 10:00:38 -0800
Local: Thurs, Mar 28 2002 1:00 pm
Subject: Re: Wide character implementation
Erik Naggum:

>    ... It does an excellent job of explaining the
>   distinction between glyph and character.  I think you need it much more
>   than trying to defend yourself by insulting me with your ignorance.

imagine how much time you would have saved yourself and everyone else
had you just posted a useful part of the actual unicode standard, for
example p. 13, "Characters, Not Glyphs" [1]

        The Unicode standard draws a distinction between /characters/ which
        are the smallest components of written language that have semantic
        value, and /glyphs/, which represent the shapes that characters can
        have when they are rendered or displayed. Various relationships may
        exist between character and glyph; a single glyph may correspond to
        a single character, or to a number of characters, or multiple glyphs
        may result from a single character.

        [etc]

but it is more fun to lecture, and madly scribble on the board, isn't it? :-]

oz
---
[1] The Unicode Standard Version 3.0, Addison-Wesley, 2000.
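
The character/glyph distinction in that passage can be checked with Python's standard `unicodedata` module. This is a small illustration, not part of the standard's text; the variable names are arbitrary.

```python
import unicodedata

composed = "\u00e9"       # LATIN SMALL LETTER E WITH ACUTE: one character
decomposed = "e\u0301"    # "e" + COMBINING ACUTE ACCENT: two characters

# Both normally render as the same glyph, yet they are different sequences.
assert composed != decomposed
assert (len(composed), len(decomposed)) == (1, 2)

# Canonical normalization maps between the two spellings.
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed

# A ligature that Unicode encodes as one compatibility character expands
# to two ordinary characters under compatibility normalization.
ligature = "\ufb01"       # LATIN SMALL LIGATURE FI
assert unicodedata.normalize("NFKC", ligature) == "fi"
```

Anything that compares or counts "characters" therefore has to say whether it means code points or what the reader sees on the screen.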


 
Erik Naggum
Newsgroups: comp.lang.lisp, comp.lang.scheme
From: Erik Naggum <e...@naggum.net>
Date: Thu, 28 Mar 2002 18:30:03 GMT
Local: Thurs, Mar 28 2002 1:30 pm
Subject: Re: Wide character implementation
* o...@cs.yorku.ca (ozan s. yigit)
| imagine how much time you would have saved yourself and everyone else
| had you just posted a useful part of the actual unicode standard, for
| example pp. 13, "Characters, Not Glyphs" [1]

  Imagine how much time people would have saved _everybody_ if they cared
  to study something before they thought they had the right to produce
  "opinions".  "When did ignorance become a point of view?"  Then imagine
  how much time it would take to find out what some ignorant fuck needs to
  hear in order to become unconfused.  It is not my task to educate people
  who voice opinions on what they do not have the intellectual honesty and
  wherewithal to realize that they do not know sufficiently well.  People
  who cannot keep track of what they know and what they do not know, should
  shut the fuck up, but they never will, precisely because they are unaware
  of what they know and do not know.  Wade Humeniuk gave us a good analogy
  to his yoga classes and the mat-abusers.  Non-thinking cretins who post
  ignorant opinions to newsgroups are just the same kind of inconsiderate
  bastards.  But you choose to _defend_ them.  What does that make you?
  Those who have the intellectual honesty to separate what they know from
  what they just assume, also know where they heard something and can rate
  its probability and credibility.  Those are worth helping, because they
  are likely to learn from it.  Those who are unlikely to learn from what
  you tell them, are a waste of time.

| but it is more fun to lecture, and madly scribble on the board, isn't it? :-]

  Your life experiences apparently differ quite significantly from mine,
  but if you feel happy about exposing yourself like this, please do.  More
  idiotic drivel that lets the world know how you think is probably going
  to be the result of your obvious desire to inflame rather than inform, so
  go ahead, make a spectacle of yourself.  This newsgroup is quite used to
  your kind by now.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
Discussion subject changed to "case-sensitivity and identifiers (was Re: Wide character implementation)" by Doug Quale
Doug Quale
Newsgroups: comp.lang.lisp
From: Doug Quale <qua...@charter.net>
Date: 28 Mar 2002 13:44:08 -0600
Local: Thurs, Mar 28 2002 2:44 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

IPmonger <ipmon...@delamancha.org> writes:
>   Which brings up the question:

>   Is it at all possible to use case-sensitivity in an appropriate manner?

Sure, at least in non-lispy languages.  A lot of C code uses the
convention that identifiers in all caps are macros.  Prolog (Edinburgh
syntax) requires leading capitalization to distinguish variables from
constant symbols (eliminating the need to quote constants in Prolog).
In Haskell, capitalized symbols indicate data types and constructors.
This makes the Haskell pattern matching syntax nicer.  Some languages
guarantee that all built-in identifiers will be of one case so that
the user can use identifiers with the other case without fear of
colliding with a current or future language-defined id.

As was noted a long time ago in this thread, in all these cases it's
harder to read the code aloud since capitalization doesn't change the
pronunciation.  In practice, I think users of these languages have
found that case distinctions work well all the same.
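
As a rough sketch of how these conventions let a reader (or a tool) infer an identifier's role from capitalization alone, consider the classifier below. The function name `classify` and its labels are invented for the example, and real languages have extra rules that it ignores (Prolog variables may also start with an underscore, for instance).

```python
def classify(identifier: str) -> str:
    """Guess the role of an identifier from its capitalization alone."""
    letters = [ch for ch in identifier if ch.isalpha()]
    if not letters:
        return "no letters"
    if all(ch.isupper() for ch in letters):
        return "all caps"             # C convention: a macro
    if identifier[0].isupper():
        return "leading capital"      # Prolog variable; Haskell type or constructor
    return "leading lower case"       # C function, Prolog atom, Haskell variable

assert classify("MAX_BUFFER") == "all caps"
assert classify("Just") == "leading capital"
assert classify("foldr") == "leading lower case"
```

A case-folding reader would map all three examples into the same space of names, which is why these conventions only make sense in case-sensitive languages.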

--
Doug Quale


 
Nils Goesche
Newsgroups: comp.lang.lisp
From: Nils Goesche <n...@cartan.de>
Date: 28 Mar 2002 20:46:25 +0100
Local: Thurs, Mar 28 2002 2:46 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Torsten <vi...@fraqz.archeron.dk> writes:
> Nils Goesche <n...@cartan.de> wrote:

> > But I have /read/ texts that didn't use capitalization in
> > German, and it was very annoying in that it just makes it harder
> > to guess how a sentence is likely to end, something that is
> > very important in German (the verb at the end...). [...] This
> > has been measured, BTW.

> I hope you can see the obvious flaw in such measurements. There
> is no large German speaking group not trained to capitalize
> nouns.

Well, duh.  Giving up the extra effort of capitalization isn't
exactly something you have to train anybody for.  I have read
several *books* in German that didn't use capitalization at all.
It was horrible.  I don't have much time in the morning, but
still manage to read large parts of the Frankfurter Allgemeine
every morning, in very little time.  I sometimes ``observe''
myself how I read that fast, and I found out that when looking at
a whole block of text at a time, the visual structure of
sentences indicated by capitalized words is a very useful help
for the reader.

That's the whole point of capitalization.  It makes reading
easier, in German, anyway.  The /only/ argument I /ever/ heard
against capitalization was that it supposedly makes /writing/
easier for some retarded children.  Well, even if that were true,
which it isn't, who would want to read anything written by
retarded children, anyway?  Why do you write something down in
the first place, if it wasn't for somebody to read?  It's the
reader that counts, not the writer.

I don't know Danish.  Maybe it doesn't make a difference there.
Maybe Danes only used to capitalize nouns because the Germans
did, and as the Germans weren't exactly very popular in 1948,
that might have been a good opportunity to give up on it.

Or maybe it wasn't.  Who is supposed to know anymore?  You said
there was a controversy about it; maybe the people who were
against it were right?  Who would remember?  How could you tell?
When you lose a piece of culture like that, later generations
don't remember or miss any of it anymore, but that doesn't mean
it was right to drop it.

Suppose all the governments in the world suddenly decide to put
an end to this babylonian mess of programming languages and make
it a law that from now on, you are only allowed to program in,
say, Java.  We'd hate that and complain, but would be arrested
and put into concentration camps until we either learn and
publicly announce how great Java actually is, and how sorry we
are for not recognizing that earlier, or are given the coup de
grace if too stubborn.

What would happen after a few decades or so?  I tell you what:
People would be happy.  They'd laugh about us crazy freaks who
were too stupid to recognize the merits of their progressive,
modern ideas.  Nobody would remember any of the old languages,
and how would young people know that they were worth anything, if
every history book tells them that they were stupid and
anti-modern?  Everybody can see that they can write everything in
Java, so why should they miss anything?

Regards,
--
Nils Goesche
Ask not for whom the <CONTROL-G> tolls.

PGP key ID 0xC66D6E6F


 
Discussion subject changed to "Back to character set implementation thinking" by Pekka P. Pirinen
Pekka P. Pirinen
Newsgroups: comp.lang.lisp, comp.lang.scheme
From: Pekka.P.Piri...@globalgraphics.com (Pekka P. Pirinen)
Date: 28 Mar 2002 19:39:40 +0000
Local: Thurs, Mar 28 2002 2:39 pm
Subject: Re: Back to character set implementation thinking
tb+use...@becket.net (Thomas Bushnell, BSG) writes:

> So, getting back to my original question about charset implementations
> in Lisp/Scheme (though actually Smalltalk or any such
> [much snippage]
> So that means it's pretty easy to make sure the whole space of
> UCS values fits in an immediate representation.  That's fine for
> working with actively used data.

Even for actively used data, compactness of representation pays off in
better cache efficiency.  In fact, particularly for actively used data
should we be mindful of this.  Since you seem to be thinking of a
32-bit immediate representation, an improvement to 16-bit strings or
even 8-bit strings is nothing to be sneezed at.
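
To put numbers on that, the sketch below computes how many bytes a string needs at each fixed width. It uses Python's standard codecs as stand-ins for 8-, 16- and 32-bit string types; the function name `footprint` is hypothetical.

```python
def footprint(text: str) -> dict:
    """Bytes needed to store text at each fixed width, or None if it does not fit."""
    sizes = {}
    for bits, codec in ((8, "latin-1"), (16, "utf-16-le"), (32, "utf-32-le")):
        try:
            encoded = text.encode(codec)
        except UnicodeEncodeError:
            sizes[bits] = None
            continue
        # UTF-16 is only fixed-width while every character is in the BMP.
        if bits == 16 and len(encoded) != 2 * len(text):
            sizes[bits] = None
        else:
            sizes[bits] = len(encoded)
    return sizes

assert footprint("cache") == {8: 5, 16: 10, 32: 20}
assert footprint("\u03bb-calculus") == {8: None, 16: 20, 32: 40}
```

For text that fits in Latin-1, the 32-bit form takes four times the memory and therefore four times the cache footprint of the 8-bit form.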

> However, strings that are going to be kept around a long time should,
> it seems to me, be stored more compactly.  Essentially all strings
> will be in the Basic Multilingual Plane, so they can fit in 16 bits.
> That means there would be two underlying string datatypes.  I don't
> think this is a serious problem.

As an implementor, I can tell you that actually the step from one
string type to two is the hardest bit.  Once you've figured out how
you want to implement that, having more is not such a big deal.  From
a programmer's point of view, the efficiency gains from more string
types outweigh the costs (unless you think you could do without the
larger ones), even if you have to deal with it explicitly.

> Is it worth having a third (for 8-bit characters) so that Latin-1
> files don't have to be inflated by a factor of two?  It seems to me
> that this would be important too.

Files and strings don't really have much to do with each other.  Files
are an externalization issue.  Of course you can store files in UCS,
and sometimes that's the right thing to do, but in the real world, you
have to deal with all kinds of encodings, so you need the machinery,
anyway, to read and write Shift-JIS, Big5, Latin-1, UTF-8, etc.

Like I said above, it _is_ important to have an 8-bit string type.
People in the West, who rarely even realize they could easily support
16-bit users, will get great benefits.  And between files and
"actively used data", there are those people who want to load their
entire database in main memory and compute with that; they'll get
their size limit extended as well.

> Basically then we would have strings which are UCS-4, UCS-2 and
> Latin-1 restricted (internally, not visibly to users). [...]
> Procedures like string-set! therefore might have to inflate (and
> thus copy) the entire string if a value outside the range is stored.
> But that's ok with me; I don't think it's a serious lose.

I suppose that is a viable implementation strategy, but I don't think
it's the right option.  The language should expose the range of string
data types to the programmer, and let them choose, because the range
of memory usage is just too great to sweep under the mat.  Also,
having strings automatically reallocated means an extra indirection
for access which cannot always be optimized away.
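
A minimal sketch of the inflate-on-store strategy under discussion, assuming three internal widths of 1, 2 and 4 bytes per character. The class `AdaptiveString` and its methods are hypothetical and not taken from any Lisp or Scheme implementation.

```python
class AdaptiveString:
    """Mutable string stored at 1, 2 or 4 bytes per character, widened on demand."""

    def __init__(self, text: str):
        self._width = max((self._width_for(ord(ch)) for ch in text), default=1)
        self._data = bytearray()
        for ch in text:
            self._data += ord(ch).to_bytes(self._width, "little")

    @staticmethod
    def _width_for(code: int) -> int:
        if code <= 0xFF:
            return 1          # Latin-1 range
        if code <= 0xFFFF:
            return 2          # Basic Multilingual Plane
        return 4

    @property
    def width(self) -> int:
        return self._width

    def __len__(self) -> int:
        return len(self._data) // self._width

    def _offset(self, index: int) -> int:
        if not 0 <= index < len(self):
            raise IndexError("string index out of range")
        return index * self._width

    def __getitem__(self, index: int) -> str:
        start = self._offset(index)
        return chr(int.from_bytes(self._data[start:start + self._width], "little"))

    def __setitem__(self, index: int, ch: str) -> None:
        self._offset(index)                      # bounds check before any inflation
        needed = self._width_for(ord(ch))
        if needed > self._width:
            self._inflate(needed)
        start = self._offset(index)
        self._data[start:start + self._width] = ord(ch).to_bytes(self._width, "little")

    def _inflate(self, new_width: int) -> None:
        # Copy every character into a wider buffer: an O(n) step.
        codes = [ord(self[i]) for i in range(len(self))]
        self._width = new_width
        self._data = bytearray()
        for code in codes:
            self._data += code.to_bytes(new_width, "little")

    def __str__(self) -> str:
        return "".join(self[i] for i in range(len(self)))


s = AdaptiveString("abc")
assert s.width == 1
s[1] = "\u03bb"            # Greek lambda no longer fits in 8 bits
assert s.width == 2
s[2] = "\U0001F600"        # outside the BMP
assert s.width == 4 and str(s) == "a\u03bb\U0001F600"
```

Note that `_inflate` rebinds `self._data`, so every access goes through the object to reach the current buffer. That is the extra indirection mentioned in the paragraph this sketch follows.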

I note that offering multiple string types is exactly what all the CL
implementations seem to have done.  This doesn't preclude having
features that automatically select the smallest feasible type, e.g.,
for "" read syntax or a STRING-APPEND function.
--
Pekka P. Pirinen
The gap between theory and practice is bigger in practice than in theory.


 
Thomas Bushnell, BSG
Newsgroups: comp.lang.lisp, comp.lang.scheme
From: tb+use...@becket.net (Thomas Bushnell, BSG)
Date: 28 Mar 2002 12:08:19 -0800
Local: Thurs, Mar 28 2002 3:08 pm
Subject: Re: Back to character set implementation thinking
Pekka.P.Piri...@globalgraphics.com (Pekka P. Pirinen) writes:

> > Is it worth having a third (for 8-bit characters) so that Latin-1
> > files don't have to be inflated by a factor of two?  It seems to me
> > that this would be important too.

> Files and strings don't really have much to do with each other.  Files
> are an externalization issue.  Of course you can store files in UCS,
> and sometimes that's the right thing to do, but in the real world, you
> have to deal with all kinds of encodings, so you need the machinery,
> anyway, to read and write Shift-JIS, Big5, Latin-1, UTF-8, etc.

In the system I'm contemplating, there are no files in the normal
sense of the term; all user data lives as strings, more or less (there
might be something more clever, but whatever).  Whatever strategies are
done for strings (and similar structures) will be important for all
files.

So such data has to be efficiently stored...

> I note that offering multiple string types is exactly what all the CL
> implementations seem to have done.  This doesn't preclude having
> features that automatically select the smallest feasible type, e.g.,
> for "" read syntax or a STRING-APPEND function.

But this is, it seems to me, unclean.

I think of it as being similar to the way numbers work.  Yes, I can
find out whether a given number is a fixnum or a bignum, and I might
well care in some special case.  But normally I just use numbers and
expect the system to automagically do the right thing.

Similarly, I want the string type to simply encode Unicode strings,
and the user should not be forced to deal with more.  The user should
not need to guess at the time the string is created whether or not it
will later need to hold a bigger character code, for example.

Thomas


 
Discussion subject changed to "case-sensitivity and identifiers (was Re: Wide character implementation)" by Julian Stecklina
Julian Stecklina
Newsgroups: comp.lang.lisp
From: Julian Stecklina <der_jul...@web.de>
Date: 28 Mar 2002 22:58:49 +0100
Local: Thurs, Mar 28 2002 4:58 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Nils Goesche <n...@cartan.de> writes:
> Torsten <vi...@fraqz.archeron.dk> writes:

> > Nils Goesche <n...@cartan.de> wrote:
> > > But I have /read/ texts that didn't use capitalization in
> > > German, and it was very annoying in that it just makes it harder
> > > to guess how a sentence is likely to end, something that is
> > > very important in German (the verb at the end...). [...] This
> > > has been measured, BTW.

[...]

hmm... I have been speaking and writing in German quite a while now
and my knowledge of grammar says that the predicate is in second
position in the main clause. Only its non-finite part goes to the end.

> What would happen after a few decades or so?  I tell you what:
> People would be happy.  They'd laugh about us crazy freaks who
> were too stupid to recognize the merits of their progressive,
> modern ideas.  Nobody would remember any of the old languages,
> and how would young people know that they were worth anything, if
> every history book tells them that they were stupid and
> anti-modern?  Everybody can see that they can write everything in
> Java, so why should they miss anything?

That reminds me of Paul Graham saying that when he was using BASIC,
which at the time did not support recursion, he never needed
recursion, as he did not know that it existed.
And writing a long sentence in English reminds me how I love commas in
German to make reading easier. ;)

Regards,
Julian
--
My homepage: http://julian.re6.de

I am looking for a PCMCIA v1.x type I/II/III network card.
In exchange I offer a v2 100 Mbit card in its original packaging.


 
Nils Goesche
Newsgroups: comp.lang.lisp
From: Nils Goesche <n...@cartan.de>
Date: 28 Mar 2002 23:11:27 +0100
Local: Thurs, Mar 28 2002 5:11 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

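Anscheinend will er mich einfach nicht verstehen.  Hat er
wirklich kein Beispiel, kein einziges Beispiel, nicht einmal nach
langem Sinnen ueber mein wundervolles Posting, dafuer gefunden?
[In English: "Apparently he simply does not want to understand me.
Has he really not found an example of it, not a single example, not
even after long pondering over my wonderful posting?"  In the German,
the participle "gefunden" (found) comes last.]
(Yes, that's what I meant)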

> > What would happen after a few decades or so?  I tell you what:
> > People would be happy.  They'd laugh about us crazy freaks who
> > were too stupid to recognize the merits of their progressive,
> > modern ideas.  Nobody would remember any of the old languages,
> > and how would young people know that they were worth anything, if
> > every history book tells them that they were stupid and
> > anti-modern?  Everybody can see that they can write everything in
> > Java, so why should they miss anything?

> That reminds me of Paul Graham saying that when he was using BASIC
> which at his time did not support recursion, he never needed
> recursion, as he did not know that it existed.

Like that.

> And writing a long sentence in English reminds me how I love commas in
> German to make reading easier. ;)

Me too, hehe :-)

Regards,
--
Nils Goesche
Ask not for whom the <CONTROL-G> tolls.

PGP key ID 0xC66D6E6F


 
Discussion subject changed to "Wide character implementation" by ozan s yigit
ozan s yigit
Newsgroups: comp.lang.lisp, comp.lang.scheme
From: ozan s yigit <o...@blue.cs.yorku.ca>
Date: 28 Mar 2002 20:04:18 -0500
Local: Thurs, Mar 28 2002 8:04 pm
Subject: Re: Wide character implementation
[erik's bombastic drivel elided]

heh heh heh, nice try erik, but you are no mikhail zeleny, alas. :]

oz


 
Erik Naggum
Newsgroups: comp.lang.lisp, comp.lang.scheme
From: Erik Naggum <e...@naggum.net>
Date: Fri, 29 Mar 2002 03:25:27 GMT
Local: Thurs, Mar 28 2002 10:25 pm
Subject: Re: Wide character implementation
* ozan s yigit <o...@blue.cs.yorku.ca>
| [erik's bombastic drivel elided]
|
| heh heh heh, nice try erik, but you are no mikhail zeleny, alas. :]

  Oh, great, another nutjob at large.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.


 
ozan s. yigit
Newsgroups: comp.lang.lisp, comp.lang.scheme
From: o...@cs.yorku.ca (ozan s. yigit)
Date: 28 Mar 2002 22:32:12 -0800
Local: Fri, Mar 29 2002 1:32 am
Subject: Re: Wide character implementation
Erik Naggum:

> | heh heh heh, nice try erik, but you are no mikhail zeleny, alas. :]

>   Oh, great, another nutjob at large.

read your previous post. it speaks volumes.

oz
---
dreams already are. -- mark v. shaney


 
Brian Spilsbury
Newsgroups: comp.lang.lisp, comp.lang.scheme
From: br...@designix.com.au (Brian Spilsbury)
Date: 28 Mar 2002 23:07:30 -0800
Local: Fri, Mar 29 2002 2:07 am
Subject: Re: Wide character implementation

Ray Dillinger <b...@sonic.net> wrote in message <news:3C990D44.78917567@sonic.net>...

> I wouldn't want to muck about internally with a format that had
> characters of various different widths: too much pain to implement,
> too many chances to introduce bugs, not enough space savings.
> Besides, when people read whole files as strings, do you really
> want to run through the whole string counting multi-byte characters
> and single-byte characters to find the value of an expression like

> (string-ref FOO charcount)  ;; lookups in a 32 million character string!

> where charcount is large?  I don't.  Constant width means O(1) lookup
> time.

Well, there are several mitigating factors and some issues with CL
which cause difficulties here.

If you consider your string as a sequence, then you can see that the
issues with variable width encodings produce a data-type which has the
access characteristics of a list.

The arguments for and against lists apply directly to variable-width
strings.

If we look at the use of strings it falls into two fairly distinct
categories;

(a) Iteration:
    Printing, writing, reading, appending, scanning, copying, etc.
(b) Random-Access:
    Randomly accessing characters.

In fact almost everything we do with strings is iterative (which makes
sense when you remember why strings are called strings).

The problem is that CL has rather poor support for iterating over
sequences.

If we considered a sequence to be addressed through two spaces, one
being Index-Space, and the other Point-Space, we could avoid a lot of
these issues, and make lists more efficiently usable as sequences.

(elt seq index) would access the sequence through index space (which
might involve walking down a list N steps).
(elt-p seq point) would access the sequence through a point (which
would involve no traversal).

The trick to efficiently exploiting this then would be to get a point
from an index.

(dosequence (element point sequence)
  (when (char= element #\!)
    (setf (elt-p sequence point) #\$)))

for a fairly lame example.

With things like (subseq sequence :start-point a :end-point b) it
starts to become more flexible.

Or the ability to say (dosequence (element point sequence :start-point
point) ...) to allow the continuation of an iteration.

I'm not suggesting that this is an ideal solution, but it should at
least point out some inadequacies in the current model.

With appropriate primitives the wide-spread use of list-like strings
should not even be considered problematic, imho.

And in answer to the example above, I don't think that anyone would
suggest forcing someone to use a variable-width string representation
at all times. If random access to a particular string is important to
you, then a vector-like string is obviously the way to go.

Regards,

Brian
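
The index-space versus point-space distinction in the post above can be
sketched for lists, where a point is simply the cons cell that holds the
element. This is only a minimal illustration under assumptions, not
Brian's implementation: elt-p follows the (elt-p seq point) shape from
the post, dolist-points and replace-bangs are hypothetical names, and
none of them exist in standard CL.

```lisp
;; Hypothetical operators, not part of standard CL.
;; For a list, a "point" is the cons cell holding the element.

(defun elt-p (sequence point)
  "Return the element at POINT; no traversal of SEQUENCE is needed."
  (declare (ignore sequence))
  (car point))

(defun (setf elt-p) (new-value sequence point)
  "Store NEW-VALUE at POINT in place."
  (declare (ignore sequence))
  (setf (car point) new-value))

(defmacro dolist-points ((element point list) &body body)
  "Run BODY with ELEMENT bound to each item of LIST and POINT to its cons cell."
  `(do ((,point ,list (cdr ,point)))
       ((endp ,point))
     (let ((,element (car ,point)))
       (declare (ignorable ,element))
       ,@body)))

(defun replace-bangs (chars)
  "Destructively replace every #\\! in the list CHARS with #\\$."
  (dolist-points (element point chars)
    (when (char= element #\!)
      (setf (elt-p chars point) #\$)))
  chars)
```

Calling (replace-bangs (list #\a #\! #\b)) returns (#\a #\$ #\b). Each
update goes through the point, so the list is walked once rather than
once per (setf (elt seq index) ...).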


 
Discussion subject changed to "Back to character set implementation thinking" by Brian Spilsbury
Brian Spilsbury  
Newsgroups: comp.lang.lisp, comp.lang.scheme
From: br...@designix.com.au (Brian Spilsbury)
Date: 29 Mar 2002 01:19:09 -0800
Local: Fri, Mar 29 2002 4:19 am
Subject: Re: Back to character set implementation thinking
tb+use...@becket.net (Thomas Bushnell, BSG) wrote in message

> Similarly, I want the string type to simply encode Unicode strings,
> and the user should not be forced to deal with more.  The user should
> not need to guess at the time the string is created whether or not it
> will later need to hold a bigger character code, for example.

I think you need to differentiate between mutable and immutable
strings.

A mutable string which is not explicitly restricted (such as
simple-base-string) needs to be able to hold any character, so it
needs to be conservative.

An immutable string cannot be modified, so you are free to encode it
however you like, as long as the encoding can represent whatever the
string holds.

The remainder of the problem is the idea of strings as vectors rather
than sequences. Treated as sequences, O(1) access is no longer an issue
(although you'd want better iteration support than CL currently
provides).

Beyond this it should be trivial to have an immutable string type
which knows what encoding it is using, and can tell the system what
accessor to use.

As a side-note, string literals and the names of symbols are immutable
in CL.

In addition you would need an operator to encode a mutable string as
an immutable string (using a given encoding). Options for immutable
construction for subseq, concatenate, string-output-stream, etc. would
also be useful.

Regards,

Brian
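
One way to picture an immutable string that knows its own encoding is
sketched below. Everything here is hypothetical (immutable-string,
freeze-string, immutable-char and thaw-string are invented names), and
it assumes an implementation whose char-code values are Unicode code
points. Only two fixed-width encodings are shown, and immutability is by
convention: nothing stops a caller from mutating the code vector.

```lisp
;; Hypothetical sketch; assumes char-code yields Unicode code points.

(defstruct (immutable-string
            (:constructor %make-immutable-string (encoding codes)))
  (encoding nil :read-only t)   ; :latin-1 or :ucs-4
  (codes nil :read-only t))     ; specialized vector of character codes

(defun encoding-element-type (encoding)
  "Return the array element type used to store ENCODING."
  (ecase encoding
    (:latin-1 '(unsigned-byte 8))
    (:ucs-4 '(unsigned-byte 32))))

(defun freeze-string (string encoding)
  "Copy the mutable STRING into an immutable string stored in ENCODING."
  (let ((limit (ecase encoding (:latin-1 #x100) (:ucs-4 #x110000))))
    ;; Refuse encodings that cannot represent every character.
    (when (find-if (lambda (ch) (>= (char-code ch) limit)) string)
      (error "~S cannot be represented in ~S" string encoding))
    (%make-immutable-string
     encoding
     (map `(vector ,(encoding-element-type encoding)) #'char-code string))))

(defun immutable-char (istring index)
  "Return the character at INDEX of ISTRING."
  (code-char (aref (immutable-string-codes istring) index)))

(defun thaw-string (istring)
  "Return a fresh mutable string with the contents of ISTRING."
  (map 'string #'code-char (immutable-string-codes istring)))
```

For example, (freeze-string "abc" :latin-1) stores three octets, and
(immutable-char * 1) on that result gives #\b. A variable-width encoding
would need a different accessor, selected from the encoding slot.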


 
Discussion subject changed to "case-sensitivity and identifiers (was Re: Wide character implementation)" by Torsten
Torsten  
Newsgroups: comp.lang.lisp
From: Torsten <vi...@fraqz.archeron.dk>
Date: Fri, 29 Mar 2002 09:44:19 +0000 (UTC)
Local: Fri, Mar 29 2002 4:44 am
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
Nils Goesche <n...@cartan.de> skrev:

> Well, duh.  Giving up the extra effort of capitalization isn't
> exactly something you have to train anybody for.  I have read
> several *books* in German that didn't use capitalization at all.
> It was horrible.

I am talking about the measurement results. You claimed that the
capitalization makes reading easier. But what was measured? The
ability to read something that differs from the conventional way
of writing German, compared with the way everybody and his dog
learned in school. Somehow the result isn't all that surprising.
Where was the control group? The people who had grown up using
only non-capitalized German. How do you think they would fare in
such a test if they existed? They would have had no preconceived
ideas about non-capitalization looking weird.

> I don't have much time in the morning, but still manage to read
> large parts of the Frankfurter Allgemeine every morning, in
> very little time. I sometimes ``observe'' myself how I read
> that fast, and I found out that when looking at a whole block
> of text at a time, the visual structure of sentences indicated
> by capitalized words is a very useful help for the reader.

Most likely because that's what you are used to.

> I don't know Danish. [...] You said there was a controversy
> about it; maybe the people who were against it were right?

The most vocal opponents were the kind of people who always think
the world is coming to an end if anything changes. They are now
spending their energy on how to, or not to, place commas. In
general, they are surprisingly clueless about the subjects they
make sarcastic remarks about.

--
Torsten


 
Discussion subject changed to "Back to character set implementation thinking" by Erik Naggum
Erik Naggum  
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.net>
Date: Fri, 29 Mar 2002 14:52:04 GMT
Local: Fri, Mar 29 2002 9:52 am
Subject: Re: Back to character set implementation thinking
* Brian Spilsbury
| I think you need to differentiate between mutable and immutable
| strings.

  I have suggested that strings need to be separated into two more basic
  types: a stream which you read one element at a time, and a vector which
  provides random access.  The former maps directly to files and is
  suitable for parsing and formatting, while a vector of characters is more
  useful for repeated access to the same characters.

  We have the system class string-stream today, which offers stream access
  to a string, but I think we need a subclass of string like stream-string,
  which may contain such things as the octets from another stream such as
  directly from an input file, and be processed sequentially, and therefore
  should also be able to use stateful encodings such that reading through
  them with the string-stream functions would maintain that state.

///
--
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
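
The two access patterns in the post above can be seen with operators
standard CL already has: with-input-from-string gives sequential,
one-character-at-a-time access over a string, while char gives random
access. This is only an illustration of the existing split. The proposed
stream-string type does not exist, and the function names below are made
up for the example.

```lisp
;; Sequential access: the string is consumed through a string stream.
(defun count-char-sequentially (target string)
  "Count occurrences of TARGET by reading STRING one character at a time."
  (with-input-from-string (in string)
    (loop for ch = (read-char in nil nil)
          while ch
          count (char= ch target))))

;; Random access: the string is treated as a vector of characters.
(defun char-at (string index)
  "Fetch the character at INDEX directly."
  (char string index))
```

Here (count-char-sequentially #\! "a!b!") returns 2 and
(char-at "hello" 4) returns #\o. Only the first style would carry over
unchanged to a stateful encoding, since it never asks for a character by
position.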


 
Discussion subject changed to "case-sensitivity and identifiers (was Re: Wide character implementation)" by Matthias Blume
Matthias Blume  
Newsgroups: comp.lang.lisp
From: Matthias Blume <matth...@shimizu-blume.com>
Date: 29 Mar 2002 10:06:28 -0500
Local: Fri, Mar 29 2002 10:06 am
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

I can give you an example from a learner of Japanese: The use of Kanji
(Chinese characters) in place of their hiragana (phonetic)
counterparts is just as redundant as capitalization is in German.  A
literate speaker of Japanese can easily read text that is written
entirely in hiragana (although it might bother him or her a bit).  In
fact, historically, there was a time when women were not allowed to
write Kanji, so this had to be true more or less by definition.

There are on the order of 50 hiragana, but there are several thousands
of Kanji -- which means that learning just hiragana is immensely easier
than learning both.  According to the above, one would expect that
someone without prior exposure to either system would have an easier
time reading pure hiragana text.

I, having not been raised in Japan, fall into this category of having
no prior exposure.  But what can I tell you?  The moment I managed to
memorize even just a tiny number of Kanji, sentences that actually
used them (in place of their hiragana spellings) became *vastly*
easier to read for me.  I am not a psychologist or linguist, so I
won't speculate on why that is.

So if it were true that either way would be equally easy to read for
someone without prior training, why would an utterly untrained person
such as I (and pretty much all of my fellow students as well, BTW) see
this effect?  In other words, there is certainly more going on than
just a "trained dog effect".

> The most vocal opponents were the kind of people who always think
> the world is coming to an end if anything changes. They are now
> spending their energy on how to, or not to, place commas. In
> general, they are surprisingly clueless about the subjects they
> make sarcastic remarks about.

[ ... which brings us back to the topic of "ad hominems".
  Just because idiots or bigots defend something, that something does
  not have to be wrong.  (It, of course, does not mean that it is
  right either.) ]

Matthias


 
Dorai Sitaram  
Newsgroups: comp.lang.lisp
From: d...@goldshoe.gte.com (Dorai Sitaram)
Date: 29 Mar 2002 15:48:03 GMT
Local: Fri, Mar 29 2002 10:48 am
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
In article <fo3cyjfm6j....@trex10.cs.bell-labs.com>,
Matthias Blume  <matth...@shimizu-blume.com> wrote:

Do kanji perhaps serve as some sort of abbreviation,
or, I should rather say, syntactic abstraction?  If so,
their appeal may have the same reason as why
no-longer-newbie users of a programming language prefer
to extend the language with (their own or
others') procedural and textual abstractions rather
than sticking to core procedures and core syntax.  I'm
speculating only.  

(BTW, abbreviations I should note are not just for
saving space or "typing".  They actually aid
comprehension by reducing the time taken for cliches
that don't deserve that time, and correspondingly
letting the non-cliche part of the communication be
highlighted more.  Even electronic communication,
where space is not expensive the same way as on paper,
and where "completion" aids abound, profits from
abbreviations.)

--d


 
Nils Goesche  
Newsgroups: comp.lang.lisp
From: Nils Goesche <n...@cartan.de>
Date: 29 Mar 2002 17:17:33 +0100
Local: Fri, Mar 29 2002 11:17 am
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

Kanji often directly denote a certain meaning of a word,
they're like images.  When I ask my wife about the meaning of a
Japanese word I'd heard or read (in Latin characters) somewhere,
she is usually helpless: She can't tell until she sees the kanji
sign.  Hiragana only describes the sound of a word, like our
Latin characters.  She told me that sometimes Japanese would
actually draw a kanji sign in the air when talking, to indicate
what meaning of a word they're saying is intended.  For instance,
there is a kanji sign that has one and only one meaning: Kant's
notion of ``category'' :-)

Regards,
--
Nils Goesche
Ask not for whom the <CONTROL-G> tolls.

PGP key ID 0xC66D6E6F


 
Takehiko Abe  
Newsgroups: comp.lang.lisp
From: k...@ma.ccom (Takehiko Abe)
Date: Fri, 29 Mar 2002 17:17:28 GMT
Local: Fri, Mar 29 2002 12:17 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)
In article <fo3cyjfm6j....@trex10.cs.bell-labs.com>, Matthias Blume <matth...@shimizu-blume.com> wrote:

> [...] The moment I managed to
> memorize even just a tiny number of Kanji, sentences that actually
> used them (in place of their hiragana spellings) became *vastly*
> easier to read for me.  I am not a psychologist or linguist, so I
> won't speculate on why that is.

_One_ reason is that Japanese does not use white spaces to delimit
the words. So all-hiragana text feels like reading

   MakeLoadFormSavingSlots

instead of

   make-load-form-saving-slots

--
<keke at mac com>
Are you sure that sound might want to have an idiot?


 
Matthias Blume  
Newsgroups: comp.lang.lisp
From: Matthias Blume <matth...@shimizu-blume.com>
Date: 29 Mar 2002 13:21:09 -0500
Local: Fri, Mar 29 2002 1:21 pm
Subject: Re: case-sensitivity and identifiers (was Re: Wide character implementation)

k...@ma.ccom (Takehiko Abe) writes:
> In article <fo3cyjfm6j....@trex10.cs.bell-labs.com>, Matthias Blume <matth...@shimizu-blume.com> wrote:

> > [...] The moment I managed to
> > memorize even just a tiny number of Kanji, sentences that actually
> > used them (in place of their hiragana spellings) became *vastly*
> > easier to read for me.  I am not a psychologist or linguist, so I
> > won't speculate on why that is.

> _One_ reason is that Japanese does not use white spaces to delimit
> the words.

Right.

> So, all hiragana text will be felt like reading

>    MakeLoadFormSavingSlots

Wouldn't it be more like

     makeloadformsavingslots

?

Matthias


 