Name for the set of characters legal in identifiers

Russell Wallace
Newsgroups: comp.lang.lisp
From: wallacethinmi...@eircom.net (Russell Wallace)
Date: Wed, 14 Jan 2004 04:36:14 GMT
Local: Tues, Jan 13 2004 11:36 pm
Subject: Name for the set of characters legal in identifiers
A trivial little question, but one that's been bugging me: Is there a
name for that set of characters legal in Lisp identifiers? For most
languages this would be "alphanumeric" (perhaps with a footnote that _
is regarded as a letter in this context), but Lisp includes characters
like + and - that most languages regard as punctuation.

Thanks,

--
"Sore wa himitsu desu."
To reply by email, remove
the small snack from address.
http://www.esatclear.ie/~rwallace


rydis_(martin_rydstr|m)_
Newsgroups: comp.lang.lisp
From: rydis (Martin Rydstr|m) @CD.Chalmers.SE
Date: 14 Jan 2004 05:57:06 +0100
Local: Tues, Jan 13 2004 11:57 pm
Subject: Re: Name for the set of characters legal in identifiers

wallacethinmi...@eircom.net (Russell Wallace) writes:
> A trivial little question, but one that's been bugging me: Is there a
> name for that set of characters legal in Lisp identifiers? For most
> languages this would be "alphanumeric" (perhaps with a footnote that _
> is regarded as a letter in this context), but Lisp includes characters
> like + and - that most languages regard as punctuation.

I think "constituent character" is quite close, if not "it".

Regards,

'mr

--
[Emacs] is written in Lisp, which is the only computer language that is
beautiful.  -- Neal Stephenson, _In the Beginning was the Command Line_


Wade Humeniuk
Newsgroups: comp.lang.lisp
From: Wade Humeniuk <whume...@delete-this-antispam-device.telus.net>
Date: Wed, 14 Jan 2004 05:01:11 GMT
Local: Wed, Jan 14 2004 12:01 am
Subject: Re: Name for the set of characters legal in identifiers

Russell Wallace wrote:
> A trivial little question, but one that's been bugging me: Is there a
> name for that set of characters legal in Lisp identifiers?

In CL that would be _all_.

Wade


Erik Naggum
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.no>
Date: 14 Jan 2004 05:39:34 +0000
Local: Wed, Jan 14 2004 12:39 am
Subject: Re: Name for the set of characters legal in identifiers
* Russell Wallace
| A trivial little question, but one that's been bugging me: Is there
| a name for that set of characters legal in Lisp identifiers?  For
| most languages this would be "alphanumeric" (perhaps with a footnote
| that _ is regarded as a letter in this context), but Lisp includes
| characters like + and - that most languages regard as punctuation.

  The type STANDARD-CHAR covers the set of characters from which all
  symbols in the standard packages are made.  This simple fact may
  give rise to the invalid assumption that there must be a particular
  character set from which all symbols must be made.

  However, the functions INTERN and MAKE-SYMBOL take a STRING as the
  name of the symbol to be created, and there is no restriction on
  this /string/ to be of type BASE-STRING.  Likewise, the value of
  SYMBOL-NAME is only specified to be of type STRING, with no mention
  of the common observation that it may be a SIMPLE-STRING regardless
  of whether the corresponding argument to INTERN or MAKE-SYMBOL was.
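A minimal sketch of the two points above, assuming any conforming Common Lisp in which the semi-standard character #\Tab exists: every external symbol of the COMMON-LISP package has a name made only of standard characters, yet INTERN and MAKE-SYMBOL accept strings containing anything. The helpers STANDARD-NAME-P and ALL-CL-NAMES-STANDARD-P and the three variables are hypothetical names chosen for illustration.

```lisp
;; Sketch: a symbol's name is just a string; INTERN and MAKE-SYMBOL
;; place no restriction on which characters it contains.

(defun standard-name-p (symbol)
  "Hypothetical helper: true if every character in SYMBOL's name is of
type STANDARD-CHAR."
  (every (lambda (ch) (typep ch 'standard-char))
         (symbol-name symbol)))

(defun all-cl-names-standard-p ()
  "True if every external symbol of COMMON-LISP has a standard-char name."
  (do-external-symbols (s "COMMON-LISP" t)
    (unless (standard-name-p s)
      (return nil))))

;; MAKE-SYMBOL accepts characters the reader treats specially ...
(defparameter *odd-symbol*
  (make-symbol "a (very) odd name; with \"quotes\""))

;; ... and characters outside STANDARD-CHAR, such as #\Tab.
(defparameter *tab-symbol*
  (make-symbol (format nil "TAB~CHERE" #\Tab)))

;; INTERN likewise takes any string and returns the interned symbol.
(defparameter *interned* (intern "Hello, world!"))
```

Note that parentheses, semicolons and double quotes are themselves standard characters; it is the tab that falls outside STANDARD-CHAR.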

  Since the symbols are normally created by the Common Lisp reader,
  your question is therefore really which characters the reader is
  able to build into a string that it will pass to INTERN.  There is
  no upper bound on this character set in the standard, but an actual
  implementation will necessarily place restrictions on this set.  In
  the worst case, the Common Lisp reader does not understand which
  character it has just read the encoding of, and may produce symbols
  with garbage bytes that nevertheless reproduce the character in your
  editor or other character display equipment.

  Pessimistically, therefore, your question is whether you will find
  any mention in the standard of any invalid characters in symbols,
  but you find quite the opposite: After a single-escape character,
  normally \, any following character will be a constituent character
  in the symbol name being read, and between the multiple-escape
  characters, normally |, all characters will be constituent.  The
  best you can hope for is thus that whatever reads the byte stream
  that is your source file will reject unacceptable encodings.  As
  long as you use an encoded character set that includes the standard
  characters, there is no restriction on what you can do, and if you
  use an encoding that does not confuse standard characters and one of
  your other characters even in the least capable decoders, you will
  find that there is not even any useful restriction on the /length/
  of Common Lisp symbol names.
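A short sketch of the two escape mechanisms just described, assuming standard syntax as established by WITH-STANDARD-IO-SYNTAX (whose readtable case is :UPCASE). READ-SYMBOL-NAME is a hypothetical helper.

```lisp
(defun read-symbol-name (string)
  "Hypothetical helper: read one symbol from STRING using standard
syntax, with *READ-EVAL* disabled, and return its name."
  (with-standard-io-syntax
    (let ((*read-eval* nil))
      (symbol-name (read-from-string string)))))

;; Single escape: the character after \ is taken literally.
(read-symbol-name "a\\(b")            ; => "A(B"
(read-symbol-name "a\\bc")            ; => "AbC"  (escaped b keeps its case)
;; Multiple escape: everything between the | characters is taken
;; literally, including case, whitespace and parentheses.
(read-symbol-name "|hello (world)|")  ; => "hello (world)"
;; Escaped and unescaped parts can be mixed within one token.
(read-symbol-name "ab|c d|e")         ; => "ABc dE"
```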

  Optimistically, however, the answer to your question is that the set
  of characters that are legal in identifiers is the system class
  CHARACTER, but you may not be able to produce all of them in any
  given source file.

  I am particularly fond of using the non-breaking space in symbol
  names, just as I use it in filenames under operating systems that
  believe that ordinary spaces are separators regardless of how much
  effort one puts into convincing its various programs otherwise.  I
  know people who think there ought to be laws against this practice,
  but sadly, the Common Lisp standard does not come to their aid.

--
Erik Naggum | Oslo, Norway                      Yes, I survived 2003.

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.


Duane Rettig
Newsgroups: comp.lang.lisp
From: Duane Rettig <du...@franz.com>
Date: 13 Jan 2004 22:37:58 -0800
Local: Wed, Jan 14 2004 1:37 am
Subject: Re: Name for the set of characters legal in identifiers

> Erik Naggum | Oslo, Norway                      Yes, I survived 2003.

Welcome back, Erik!

--
Duane Rettig    du...@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182  


Russell Wallace
Newsgroups: comp.lang.lisp
From: wallacethinmi...@eircom.net (Russell Wallace)
Date: Wed, 14 Jan 2004 07:46:55 GMT
Local: Wed, Jan 14 2004 2:46 am
Subject: Re: Name for the set of characters legal in identifiers
On 14 Jan 2004 05:39:34 +0000, Erik Naggum <e...@naggum.no> wrote:

>  However, the functions INTERN and MAKE-SYMBOL take a STRING as the
>  name of the symbol to be created, and there is no restriction on
>  this /string/ to be of type BASE-STRING.  Likewise, the value of
>  SYMBOL-NAME is only specified to be of type STRING, with no mention
>  of the common observation that it may be a SIMPLE-STRING regardless
>  of whether the corresponding argument to INTERN or MAKE-SYMBOL was.

Welcome back, Erik!

Thanks for the explanation - okay, so basically any character _can_ be
part of a symbol... fair enough... my question is really about the
English terminology, though. That is, say you write...

 (defun +-?-+ ...)

...that's fine, you can use the characters +, - and ? in a function
name, they're... "constituent characters", one poster said? Whereas if
you write...

 (defun )(')( ...)

That won't work; (, ) and ' are "punctuation" (?) and normally
recognized by the reader as special characters. (I'm talking about the
normal case, not what you can persuade the reader, interner or
whatever to do if you try hard enough :)) So there's "whitespace",
"punctuation" and... what's the third category called? Not
"alphanumeric"... "constituent characters"?

--
"Sore wa himitsu desu."
To reply by email, remove
the small snack from address.
http://www.esatclear.ie/~rwallace


Erik Naggum
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.no>
Date: 14 Jan 2004 08:22:42 +0000
Local: Wed, Jan 14 2004 3:22 am
Subject: Re: Name for the set of characters legal in identifiers
* Russell Wallace
| Thanks for the explanation - okay, so basically any character _can_
| be part of a symbol... fair enough... my question is really about
| the English terminology, though.

  The terminology is really pretty simple, but you have to look at it
  from the right angle.  In languages that require identifiers to be
  made up of particular characters, there is obviously a name for the
  character set, but in a language that goes out of its way to make it
  possible to use absolutely any character you want, there are only
  names for those characters that need special treatment to become
  part of a symbol name because their "normal" function is not to.

| Whereas if you write...
|
|  (defun )(')( ...)
|
| That won't work; (, ) and ' are "punctuation" (?) and normally
| recognized by the reader as special characters.

  Well, they are known as "macro characters".  The important thing is
  that the set of macro characters is not defined by the language, but
  by the readtable in effect when the Common Lisp reader processes
  your source.  There is a standard readtable, however, and one would
  have to say "unescaped terminating macro characters in the standard
  readtable" or another phrasing that tries to hide the obvious anal
  retentiveness to really speak about the characters that will not be
  part of a symbol name unless you have changed the rules.  There is
  nothing particularly special about any of these macro characters.
  There are some restrictions on what the readtable can do and how the
  reader collects characters into symbol names.  If you really insist,
  calling them "constituent characters" will help, but realize that
  this property is a result of falling through every other test --
  unless it is escaped, in which case it wins its constituency right
  away.  (There's an awful pun waiting to happen here, about Iowa, but
  I'll ignore the temptation.)
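To see that the readtable, not the language, decides which characters end a symbol name, here is a sketch that copies the standard readtable and turns #\! into a terminating macro character. The #\! macro, its :BANG result, and the helper READ-NAME-WITH are all hypothetical.

```lisp
(defun read-name-with (readtable string)
  "Hypothetical helper: read from STRING using READTABLE and return the
symbol's name and the index at which reading stopped."
  (let ((*readtable* readtable)
        (*read-eval* nil))
    (multiple-value-bind (object position) (read-from-string string)
      (values (symbol-name object) position))))

(defparameter *bang-readtable*
  (let ((rt (copy-readtable nil)))      ; a fresh copy of the standard readtable
    ;; Make #\! a terminating macro character that reads as :BANG.
    (set-macro-character #\!
                         (lambda (stream char)
                           (declare (ignore stream char))
                           :bang)
                         nil            ; NIL means terminating
                         rt)
    rt))

;; Standard readtable: #\! is a constituent, so "a!b" is one symbol.
(read-name-with (copy-readtable nil) "a!b")   ; => "A!B", 3
;; Modified readtable: #\! now ends the token after "a".
(read-name-with *bang-readtable* "a!b")       ; => "A", 1
```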

| (I'm talking about the normal case, not what you can persuade the
| reader, interner or whatever to do if you try hard enough :))

  While this may seem reasonable from the angle you chose to look at
  this problem, it is the a priori reasonability of the position that
  has produced your problem.  It is in fact unreasonable to approach
  Common Lisp from this angle.  The problem does not exist.  This

  (defun |)(')(| ...)

  is in fact fully valid Common Lisp code.  You cannot define away the
  solution to the problem and insist that you still have a problem in
  need of an answer.
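A runnable version of the example above. The post elides the definition, so the parameter list and the doubling body here are hypothetical placeholders.

```lisp
;; Inside |...| every character, including ( ) and ', is a constituent.
(defun |)(')(| (x)
  (* 2 x))                        ; hypothetical body

(|)(')(| 21)                      ; => 42
(symbol-name '|)(')(|)            ; => ")(')("
```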

| So there's "whitespace", "punctuation" and... what's the third
| category called? Not "alphanumeric"... "constituent characters"?

  I have to zoom out and ask you what you would do with the elusive
  name for this category.  If I guess correctly at your intentions, I
  would perhaps have said that "any character can be part of a symbol
  name, but most macro characters need to be escaped to prevent them
  from having their macro function".  (The important exception is #,
  the only non-terminating macro character in the standard readtable,
  meaning that #xF will be interpreted as a hexadecimal number, but F#x
  is a three-character-long symbol name with a # in it.)
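A sketch of the #\# exception, assuming standard syntax and *READ-BASE* 10. The second value returned by GET-MACRO-CHARACTER reports whether a macro character is non-terminating.

```lisp
;; At the start of a token, # dispatches: #x introduces a hexadecimal number.
(with-standard-io-syntax (read-from-string "#xF"))                ; => 15
;; Inside a token, # is an ordinary constituent.
(with-standard-io-syntax (symbol-name (read-from-string "F#x")))  ; => "F#X"
;; Second value of GET-MACRO-CHARACTER: true for non-terminating macro
;; characters, false for terminating ones.
(nth-value 1 (get-macro-character #\# (copy-readtable nil)))      ; => true
(nth-value 1 (get-macro-character #\( (copy-readtable nil)))      ; => NIL
```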

  Unless you have a simple need that can be resolved by a nice, vague
  explanation that only informs your reader that Common Lisp is a lot
  different from languages that require particular characters in the
  names of identifiers/symbols, I think Chapter 23 in the standard, on
  the Common Lisp Reader, would be a really good suggestion right now.

  Yeah, I'm back all right, with undesirably high levels of precision,
  scaring away frail newbies from day one.  Maybe I'll go hibernate.

--
Erik Naggum | Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.


james anderson
Newsgroups: comp.lang.lisp
From: james anderson <james.ander...@setf.de>
Date: Wed, 14 Jan 2004 11:56:12 +0100
Local: Wed, Jan 14 2004 5:56 am
Subject: Re: Name for the set of characters legal in identifiers

i would have thought that a useful characterization would be "constituent
character in the current readtable, with the constituent traits 'alphabetic'
or 'alphadigit'", as that describes the set of characters which could be read,
without escaping, as part of a symbol name, by means of readtable adjustments
with set-syntax-from-char.

upon experimentation, however, i observe that

? (defun test-constituent-character (code)
    ;; Read "a<char>b"; return the object read, or the error signaled.
    (handler-case
        (read-from-string (concatenate 'string "a" (string (code-char code)) "b"))
      (error (e) e)))
TEST-CONSTITUENT-CHARACTER
? (let ((rt (copy-readtable)))
    ;; Give every character the syntax type of #\a in the copy.
    ;; CODE-CHAR may return NIL for some codes; those are skipped.
    (dotimes (i char-code-limit)
      (when (code-char i)
        (set-syntax-from-char (code-char i) #\a rt)))
    (let ((result nil)
          (*readtable* rt))
      (dotimes (i char-code-limit)
        (when (code-char i)
          (typecase (setf result (test-constituent-character i))
            (symbol)
            (t (format *trace-output* "~%~6,'0d (~c) : *** : ~a"
                       i (code-char i) result)))))))

000058 (:) : *** : There is no package named "A" .
NIL
?

i would have expected the token parser to have signaled errors when reading
from strings which contained those characters for which 2.1.4.2 specifies the
constituent trait 'invalid'.

is this an implementation bug, or have i misunderstood 2.1.4.2?

...


Russell Wallace
Newsgroups: comp.lang.lisp
From: wallacethinmi...@eircom.net (Russell Wallace)
Date: Wed, 14 Jan 2004 11:34:28 GMT
Local: Wed, Jan 14 2004 6:34 am
Subject: Re: Name for the set of characters legal in identifiers
On 14 Jan 2004 08:22:42 +0000, Erik Naggum <e...@naggum.no> wrote:

>  Well, they are known as "macro characters".  The important thing is
>  that the set of macro characters is not defined by the language, but
>  by the readtable in effect when the Common Lisp reader processes
>  your source.  There is a standard readtable, however, and one would
>  have to say "unescaped terminating macro characters in the standard
>  readtable" or another phrasing that tries to hide the obvious anal
>  retentiveness to really speak about the characters that will not be
>  part of a symbol name unless you have changed the rules.

Right, so another way of phrasing my question would be: is there a
shorter term for the noun phrase "unescaped..." above :)

>  While this may seem reasonable from the angle you chose to look at
>  this problem, it is the a priori reasonability of the position that
>  has produced your problem.  It is in fact unreasonable to approach
>  Common Lisp from this angle.  The problem does not exist.

You're right, of course, and if my objective was to understand Common
Lisp, I wouldn't give this issue any more thought - it isn't a problem
in that language.

>  I have to zoom out and ask you what you would do with the elusive
>  name for this category.

What I'm actually doing is designing a new language that's intended to
share Lisp's property of allowing characters like + and - in symbols
(though not the feature of also allowing things like brackets in
symbols if you ask nicely), and I found when thinking about the syntax
I was making heavy use of a concept I didn't have a name for, which
rather bugged me; Lisp is one of the very few languages which allow
non-alphanumeric characters in symbols, so I was wondering if it had a
name for the concept.

It seems the answer is that it doesn't have a name because it doesn't
particularly need the concept... hmm. I think I'll call them "ordinary
characters".
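A sketch of how such a class of "ordinary characters" might drive a tokenizer for a language of the kind described. The character sets and every function name below are assumptions made for illustration; none of them come from Arete or from the Common Lisp reader, which works quite differently.

```lisp
;; Hypothetical character classes for a Lisp-like language where + - ? *
;; and similar characters may appear in symbols, but brackets and
;; quotes may not, even when escaped.
(defparameter *punctuation* '(#\( #\) #\[ #\] #\' #\" #\;)
  "Assumed set of characters that always form one-character tokens.")

(defun blank-char-p (ch)
  "Assumed whitespace: separates tokens and is otherwise ignored."
  (member ch '(#\Space #\Tab #\Newline)))

(defun punctuation-char-p (ch)
  (member ch *punctuation*))

(defun ordinary-char-p (ch)
  "An ordinary character is anything that is neither blank nor punctuation."
  (not (or (blank-char-p ch) (punctuation-char-p ch))))

(defun tokenize (string)
  "Split STRING into token strings: each punctuation character is a token
by itself, and each maximal run of ordinary characters is one token."
  (let ((tokens '())
        (i 0)
        (n (length string)))
    (loop while (< i n)
          do (let ((ch (char string i)))
               (cond ((blank-char-p ch)
                      (incf i))
                     ((punctuation-char-p ch)
                      (push (string ch) tokens)
                      (incf i))
                     (t
                      (let ((end (or (position-if (complement #'ordinary-char-p)
                                                  string :start i)
                                     n)))
                        (push (subseq string i end) tokens)
                        (setf i end))))))
    (nreverse tokens)))

(tokenize "(defun +-?-+ (x) x)")
;; => ("(" "defun" "+-?-+" "(" "x" ")" "x" ")")
```

The design choice here is to name only the two small excluded sets and define "ordinary" as their complement, so characters like +, - and ? fall into symbols without being listed anywhere.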

>  Yeah, I'm back all right, with undesirably high levels of precision,
>  scaring away frail newbies from day one.  Maybe I'll go hibernate.

*grin* No, stick around. The newsgroup's more fun with you around.

--
"Sore wa himitsu desu."
To reply by email, remove
the small snack from address.
http://www.esatclear.ie/~rwallace


Lars Brinkhoff
Newsgroups: comp.lang.lisp
From: Lars Brinkhoff <lars.s...@nocrew.org>
Date: 14 Jan 2004 13:45:54 +0100
Local: Wed, Jan 14 2004 7:45 am
Subject: Re: Name for the set of characters legal in identifiers

wallacethinmi...@eircom.net (Russell Wallace) writes:
> I was making heavy use of a concept I didn't have a name for, which
> rather bugged me; Lisp is one of the very few languages which allow
> non-alphanumeric characters in symbols

So does Forth, so perhaps programmers using that language have a name
for it.

--
Lars Brinkhoff,         Services for Unix, Linux, GCC, HTTP
Brinkhoff Consulting    http://www.brinkhoff.se/


Erik Naggum
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.no>
Date: 14 Jan 2004 17:47:50 +0000
Local: Wed, Jan 14 2004 12:47 pm
Subject: Re: Name for the set of characters legal in identifiers
* james anderson
| upon experimentation, however, i observe that

  Your experiment has only uncovered that it is impossible to override
  the package marker status of colon.  Other than that, you have only
  clobbered the constituent traits of all characters, forcing them the
  same as for #\a.  It is unclear which hypotheses your experiment has
  actually tested.

  This goes to show that : must always be escaped if it is to be part
  of a symbol name, however, further complicating the "name" for the
  set of allowable characters in a symbol.
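A sketch of the colon's special status, assuming standard syntax. The package name NO-SUCH-PACKAGE-XYZ and the helper READS-WITHOUT-ERROR-P are made up for the example.

```lisp
;; An escaped colon is an ordinary name character.
(with-standard-io-syntax (symbol-name (read-from-string "a\\:b")))  ; => "A:B"
(with-standard-io-syntax (symbol-name (read-from-string "|a:b|")))  ; => "a:b"

(defun reads-without-error-p (string)
  "Hypothetical helper: true if STRING can be read without an error."
  (let ((*read-eval* nil))
    (handler-case (progn (read-from-string string) t)
      (error () nil))))

;; Unescaped, the colon is a package marker; this signals an error
;; unless a package named "NO-SUCH-PACKAGE-XYZ" happens to exist.
(reads-without-error-p "no-such-package-xyz:b")   ; => NIL

;; PRIN1 escapes the colon, so the printed form reads back as the same symbol.
(let ((sym (intern "A:B")))
  (eq sym (read-from-string (prin1-to-string sym))))   ; => T
```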

--
Erik Naggum | Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.


james anderson
Newsgroups: comp.lang.lisp
From: james anderson <james.ander...@setf.de>
Date: Wed, 14 Jan 2004 19:25:09 +0100
Local: Wed, Jan 14 2004 1:25 pm
Subject: Re: Name for the set of characters legal in identifiers

Erik Naggum wrote:

> * james anderson
> | upon experimentation, however, i observe that

>   Your experiment has only uncovered that it is impossible to override
>   the package marker status of colon.  Other than that, you have only
>   clobbered the constituent traits of all characters, forcing them the
>   same as for #\a.  It is unclear which hypotheses your experiment has
>   actually tested.

the hypothesis was that the constituent traits as set out in the table on
standard and semi-standard characters, which traits are not supposed to be
clobbered by set-syntax-from-char, would be useful to characterise the set of
characters which could be used in symbol names without explicit escaping.

>   This goes to show that : must always be escaped if it is to be part
>   of a symbol name, however, further complicating the "name" for the
>   set of allowable characters in a symbol.

i would have expected the same status as that for #\: to apply to whitespace
characters and to rubout.

...


Russell Wallace
Newsgroups: comp.lang.lisp
From: wallacethinmi...@eircom.net (Russell Wallace)
Date: Wed, 14 Jan 2004 18:52:33 GMT
Local: Wed, Jan 14 2004 1:52 pm
Subject: Re: Name for the set of characters legal in identifiers
On 14 Jan 2004 13:45:54 +0100, Lars Brinkhoff <lars.s...@nocrew.org>
wrote:

>wallacethinmi...@eircom.net (Russell Wallace) writes:
>> I was making heavy use of a concept I didn't have a name for, which
>> rather bugged me; Lisp is one of the very few languages which allow
>> non-alphanumeric characters in symbols

>So does Forth, so perhaps programmers using that language have a name
>for it.

So it does; good idea. I'll try asking there, thanks.

--
"Sore wa himitsu desu."
To reply by email, remove
the small snack from address.
http://www.esatclear.ie/~rwallace


Joe Marshall
Newsgroups: comp.lang.lisp
From: Joe Marshall <j...@ccs.neu.edu>
Date: Wed, 14 Jan 2004 14:03:40 -0500
Local: Wed, Jan 14 2004 2:03 pm
Subject: Re: Name for the set of characters legal in identifiers

Erik Naggum <e...@naggum.no> writes:

[snip]

Welcome back!


Thomas F. Burdick
Newsgroups: comp.lang.lisp
From: t...@famine.OCF.Berkeley.EDU (Thomas F. Burdick)
Date: 14 Jan 2004 11:37:18 -0800
Local: Wed, Jan 14 2004 2:37 pm
Subject: Re: Name for the set of characters legal in identifiers

wallacethinmi...@eircom.net (Russell Wallace) writes:
> What I'm actually doing is designing a new language that's intended to
> share Lisp's property of allowing characters like + and - in symbols
> (though not the feature of also allowing things like brackets in
> symbols if you ask nicely)

So you won't be having first-class symbols?  I'd be pretty appalled if
I couldn't give make-symbol any arbitrary string.

--
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                              
   |     ) |                              
  (`-.  '--.)                              
   `. )----'                              


Erik Naggum
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.no>
Date: 14 Jan 2004 21:36:55 +0000
Local: Wed, Jan 14 2004 4:36 pm
Subject: Re: Name for the set of characters legal in identifiers
* james anderson
| the hypothesis was that the constituent traits as set out in the
| table on standard and semi-standard characters, which traits are not
| supposed to be clobbered by set-syntax-from-char, would be useful to
| characterise the set of characters which could be used in symbol
| names without explicit escaping.

  That does not appear to be an unreasonable hypothesis, but it was
  not the hypothesis you tested.  You tested whether a string of three
  characters, varying the middle one, would be read as a symbol or
  would signal an error.  Any number of middle characters that cause a
  termination of the reader algorithm will produce a symbol read from
  the first character, a letter.

| i would have expected the same status as that for #\: to apply to
| whitespace characters and to rubout.

  But (read-from-string "a b") will return a symbol, namely A, when
  the constituent trait of the space is /invalid/.  You did not test
  the length or any other property of the symbol-name of the returned
  symbol, only that it did not error.  The secondary value returned
  from READ-FROM-STRING should be educational.
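A sketch of the check suggested above, written as a variant of james anderson's probe: instead of asking only whether reading signals an error, it also uses the secondary value of READ-FROM-STRING to ask whether the whole string was consumed as a single symbol. READS-AS-ONE-SYMBOL-P is a hypothetical name, and the examples assume the standard readtable.

```lisp
(defun reads-as-one-symbol-p (string)
  "Hypothetical helper: true if reading STRING yields a symbol and
consumes every character of STRING."
  (handler-case
      (let ((*read-eval* nil))
        (multiple-value-bind (object position) (read-from-string string)
          (and (symbolp object)
               (= position (length string)))))
    (error () nil)))

;; The first value alone is misleading: "a b" reads as the symbol A ...
(symbol-name (read-from-string "a b"))   ; => "A"
;; ... but the secondary value shows that reading stopped early.
(reads-as-one-symbol-p "a b")            ; => NIL
(reads-as-one-symbol-p "a(b")            ; => NIL
(reads-as-one-symbol-p "+-?-+")          ; => T
```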

--
Erik Naggum | Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.


Don Geddis
Newsgroups: comp.lang.lisp
From: Don Geddis <d...@geddis.org>
Date: 14 Jan 2004 11:12:12 -0800
Local: Wed, Jan 14 2004 2:12 pm
Subject: Re: Name for the set of characters legal in identifiers

wallacethinmi...@eircom.net (Russell Wallace) writes:
> What I'm actually doing is designing a new language that's intended to
> share Lisp's property of allowing characters like + and - in symbols
> (though not the feature of also allowing things like brackets in
> symbols if you ask nicely)

I think you're still missing the point.  As Erik explained, _all_ characters
are valid in a Lisp symbol name.

You seem to be trying to find the set of characters that don't require
escaping in order to use them in symbol names.  This is really a question about
the Lisp reader.  Basically, things will get turned into symbols if they don't
parse as some other kind of thing.

I think you're mistaken to assume there is some subset of characters in CL that
does what you want.  Otherwise, what do you think of this:

        Lisp> (type-of '123)
        FIXNUM
        Lisp> (type-of '123d0)
        DOUBLE-FLOAT
        Lisp> (type-of 'd1230)
        SYMBOL
        Lisp> (type-of '123j0)
        SYMBOL

If your concern is what you can type to the reader, to result in a symbol,
the answer is not simply a subset of characters.  The syntax of those
characters matters a lot as well.  Are numerals in your set?  By themselves,
without escaping, the reader will turn them into numbers, not symbols.
How about the letter "d", along with some numerals?  Depends where in the
sequence it appears.

All of the sequences above, if escaped, can be the names of symbols.  If not
escaped, then whether they become symbols or not when passed through the
reader is _not_ a simple matter of character subsets; it's a matter of
fallthrough in a series of parse attempts.
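A sketch of the fallthrough described above, assuming standard syntax (*READ-BASE* 10). READ-TYPE is a hypothetical helper that returns a coarse classification. The post's '123j0 case is left out: it is a potential number without number syntax, which CLHS section 2.3.1.1 makes a reserved token with implementation-dependent interpretation, so SYMBOL there is what one implementation returned rather than a guaranteed result.

```lisp
(defun read-type (string)
  "Hypothetical helper: read STRING with standard syntax and return a
coarse classification of the object read."
  (let ((object (with-standard-io-syntax
                  (let ((*read-eval* nil))
                    (read-from-string string)))))
    (typecase object
      (integer :integer)
      (float   :float)
      (symbol  :symbol)
      (t       :other))))

;; The same few characters become numbers or symbols depending on
;; where they appear in the token.
(mapcar #'read-type '("123" "123d0" "d1230" "+1" "1+" "1."))
;; => (:INTEGER :FLOAT :SYMBOL :INTEGER :SYMBOL :INTEGER)
```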

(And yes, I'm sure you can find a sufficiently small subset of characters, such
that any sequence from the subset will parse only as a symbol.  But that set
is much _smaller_ than alphanumeric, whereas you were clearly looking for
a subset of characters larger than that, e.g. including punctuation.)

        -- Don
___________________________________________________________________________ ____
Don Geddis                  http://don.geddis.org/               d...@geddis.org
Underachievement:  The tallest blade of grass is the first to be cut by the
lawnmower.  -- Despair.com


Russell Wallace
Newsgroups: comp.lang.lisp
From: wallacethinmi...@eircom.net (Russell Wallace)
Date: Wed, 14 Jan 2004 22:18:47 GMT
Local: Wed, Jan 14 2004 5:18 pm
Subject: Re: Name for the set of characters legal in identifiers
On 14 Jan 2004 11:37:18 -0800, t...@famine.OCF.Berkeley.EDU (Thomas F.
Burdick) wrote:
>wallacethinmi...@eircom.net (Russell Wallace) writes:

>> What I'm actually doing is designing a new language that's intended to
>> share Lisp's property of allowing characters like + and - in symbols
>> (though not the feature of also allowing things like brackets in
>> symbols if you ask nicely)

>So you won't be having first-class symbols?

Right.

>I'd be pretty appalled if
>I couldn't give make-symbol any arbitrary string.

Well, in Common Lisp you'd probably be right. Arete (provisional name
for my new language) is designed differently - symbols are only used
for lexically scoped name-value mappings; strings do most of the other
things you use symbols for in Lisp. (For example, 'FOO is just
syntactic sugar for "FOO", it's not a symbol.)

--
"Sore wa himitsu desu."
To reply by email, remove
the small snack from address.
http://www.esatclear.ie/~rwallace


Russell Wallace
Newsgroups: comp.lang.lisp
From: wallacethinmi...@eircom.net (Russell Wallace)
Date: Wed, 14 Jan 2004 22:21:12 GMT
Local: Wed, Jan 14 2004 5:21 pm
Subject: Re: Name for the set of characters legal in identifiers
On 14 Jan 2004 11:12:12 -0800, Don Geddis <d...@geddis.org> wrote:

>I think you're still missing the point.  As Erik explained, _all_ characters
>are valid in a Lisp symbol name.

No, that's fine, I understand that - my question wasn't about Lisp,
but about English terminology. I gather from Erik's explanation that
the answer is "Lisp doesn't regard any such set as special enough to
merit a short name", though, so I'll just make up one myself,
something like "ordinary characters".

--
"Sore wa himitsu desu."
To reply by email, remove
the small snack from address.
http://www.esatclear.ie/~rwallace


 
Marc Spitzer  
Newsgroups: comp.lang.lisp
From: Marc Spitzer <mspit...@optonline.net>
Date: Wed, 14 Jan 2004 22:27:37 GMT
Local: Wed, Jan 14 2004 5:27 pm
Subject: Re: Name for the set of characters legal in identifiers

Erik Naggum <e...@naggum.no> writes:

Glad to see you here again,

marc


 
Pascal Costanza  
Newsgroups: comp.lang.lisp
From: Pascal Costanza <costa...@web.de>
Date: Wed, 14 Jan 2004 23:49:02 +0100
Local: Wed, Jan 14 2004 5:49 pm
Subject: Re: Name for the set of characters legal in identifiers

Russell Wallace wrote:
> What I'm actually doing is designing a new language that's intended to
> share Lisp's property of allowing characters like + and - in symbols
> (though not the feature of also allowing things like brackets in
> symbols if you ask nicely), and I found when thinking about the syntax
> I was making heavy use of a concept I didn't have a name for, which
> rather bugged me; Lisp is one of the very few languages which allow
> non-alphanumeric characters in symbols, so I was wondering if it had a
> name for the concept.

I don't know any language that has a name for this concept. Instead, you
will find grammars for most languages, in BNF notation or something
along these lines, that define what characters are accepted as part of
identifiers. Chapter 2.2 in the HyperSpec is pretty close to what other
languages do in this regard, for example.

When defining a new language, it's probably a good idea to define such a
grammar at a certain stage anyway, and try to convince yourself that
it's an LL(1) grammar. Minimizing the lookahead that's needed for
parsing a program source is likely to improve the programmer's
understanding of the language.

As a result you will get a single definitive point to refer to when
someone wants to know what characters are accepted. That's probably
better than inventing a term for this concept. Later on you can just use
terms like "identifier" or "symbol", and it's clear from the grammar
what is meant.
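The advice to convince yourself that the grammar is LL(1) can be partly mechanized. The following is a minimal sketch in Python. It assumes a grammar with no empty (epsilon) productions. Under that assumption, LL(1) reduces to one condition: the FIRST sets of each nonterminal's alternatives must be pairwise disjoint. The toy grammars GOOD and BAD and the function names are hypothetical illustrations.

```python
from itertools import combinations

# Hypothetical toy grammars: nonterminal -> list of alternatives, each a list
# of symbols. Any symbol that is not a key is treated as a terminal token.
GOOD = {
    "stmt": [["let", "IDENT", "=", "expr"], ["print", "expr"]],
    "expr": [["IDENT"], ["NUMBER"], ["(", "expr", "OP", "expr", ")"]],
}
BAD = {
    "expr": [["IDENT"], ["IDENT", "(", "expr", ")"]],
}

def first_sets(grammar):
    """Compute FIRST(A) for every nonterminal by fixed-point iteration.
    Assumes no epsilon productions, so FIRST of an alternative is just
    FIRST of its leading symbol."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                head = alt[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def ll1_conflicts(grammar):
    """Return (nonterminal, alt1, alt2, shared_tokens) for every pair of
    alternatives whose FIRST sets overlap. An empty list means the grammar
    is LL(1), given the no-epsilon assumption."""
    first = first_sets(grammar)

    def first_of(alt):
        head = alt[0]
        return first[head] if head in grammar else {head}

    conflicts = []
    for nt, alternatives in grammar.items():
        for a, b in combinations(alternatives, 2):
            shared = first_of(a) & first_of(b)
            if shared:
                conflicts.append((nt, a, b, shared))
    return conflicts
```

Here ll1_conflicts(GOOD) returns an empty list. For BAD, it reports a clash on IDENT, because both alternatives of expr begin with an identifier. Left-factoring is the usual fix, but it typically introduces an epsilon production. Handling those also requires FOLLOW sets, which this sketch does not compute.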

Further note that the idea to include characters like + and - in
identifiers is IMHO only a good idea in prefix and probably postfix
languages. In infix languages, it's very likely to be confusing when a+b
and a + b mean different things. (If your language is not an infix
language, then just forget this remark. ;)
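A small tokenizer makes the a+b problem concrete. The following is a minimal sketch in Python under hypothetical rules:

- An identifier starts with a letter and may then contain letters, digits, + and -.
- Whitespace separates tokens.
- The longest match wins.

The token kinds and the `tokenize` function are illustrative only, not taken from any language discussed here.

```python
import re

# Hypothetical lexical rules for an infix language that, like Lisp, allows
# '+' and '-' inside identifiers (but not as the first character).
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9+\-]*")
NUMBER = re.compile(r"[0-9]+")
OPERATOR = re.compile(r"[+\-*/]")
SPACE = re.compile(r"\s+")

RULES = (("space", SPACE), ("ident", IDENT), ("number", NUMBER), ("op", OPERATOR))

def tokenize(src):
    """Split src into (kind, text) pairs, dropping whitespace.
    Each regex match is greedy, so an identifier swallows any
    following '+' or '-' that is not separated by whitespace."""
    tokens = []
    pos = 0
    while pos < len(src):
        for kind, pattern in RULES:
            m = pattern.match(src, pos)
            if m:
                if kind != "space":
                    tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
    return tokens
```

With these rules, tokenize("a+b") yields the single identifier a+b, while tokenize("a + b") yields three tokens. Likewise, x-1 is one identifier, whereas line-count - 1 is a subtraction. This is exactly the kind of confusion described above, and it is why such a language more or less has to require whitespace around infix + and -.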

Pascal

--
Tyler: "How's that working out for you?"
Jack: "Great."
Tyler: "Keep it up, then."


 
Russell Wallace  
Newsgroups: comp.lang.lisp
From: wallacethinmi...@eircom.net (Russell Wallace)
Date: Wed, 14 Jan 2004 22:59:15 GMT
Local: Wed, Jan 14 2004 5:59 pm
Subject: Re: Name for the set of characters legal in identifiers
On Wed, 14 Jan 2004 23:49:02 +0100, Pascal Costanza <costa...@web.de>
wrote:

>When defining a new language, it's probably a good idea to define such a
>grammar at a certain stage anyway, and try to convince yourself that
>it's an LL(1) grammar. Minimizing the lookahead that's needed for
>parsing a program source is likely to improve the programmer's
>understanding of the language.

*nod-nod* I agree completely. I've the outline of a BNF grammar
sketched in my head, and I'm pretty sure it's LL(1). Simple grammar is
good ^.^

>Further note that the idea to include characters like + and - in
>identifiers is IMHO only a good idea in prefix and probably postfix
>languages. In infix languages, it's very likely to be confusing when a+b
>and a + b mean different things. (If your language is not an infix
>language, then just forget this remark. ;)

It is an infix language, and I agree that's a downside. I just think
it's very heavily outweighed by the ability to write multiword
identifiers with dashes instead of mixed case.

--
"Sore wa himitsu desu."
To reply by email, remove
the small snack from address.
http://www.esatclear.ie/~rwallace


 
james anderson  
Newsgroups: comp.lang.lisp
From: james anderson <james.ander...@setf.de>
Date: Thu, 15 Jan 2004 00:04:39 +0100
Local: Wed, Jan 14 2004 6:04 pm
Subject: Re: Name for the set of characters legal in identifiers

i had thought that circumstance was specified to signal an error. there was a
different version, which printed a bit too much to post, which noted and
printed everything - exactly because the result was a surprise, which neither
signalled an error, nor did it demonstrate the length-1-symbol-name behaviour.

>       [The posted code] did not test
>   the length or any other property of the symbol-name of the returned
>   symbol, only that it did not error.  The secondary value returned
>   from READ-FROM-STRING should be educational.

it was always 3.

...


 
Erik Naggum  
Newsgroups: comp.lang.lisp
From: Erik Naggum <e...@naggum.no>
Date: 15 Jan 2004 04:00:22 +0000
Local: Wed, Jan 14 2004 11:00 pm
Subject: Re: Name for the set of characters legal in identifiers
* Erik Naggum

> But (read-from-string "a b") will return a symbol, namely A, when
> the constituent trait of the space is /invalid/.

* james anderson
| i had thought that circumstance was specified to signal an error.

  Hm.  This appears to be unexplored territory.  You deserve credit
  for pointing to the map and the real world and urging me to take a
  closer look at both.

  We have the following situation: A character whose syntax type is
  /constituent/ is used to set the syntax type of a character whose
  previous syntax type was /whitespace/, but this means that the
  constituent trait of that character remains /invalid/, which makes
  the syntax type /invalid/.  According to the specification, such a
  character can never occur in the input except under the control of a
  single escape character, so (read-from-string "a b") should indeed
  signal an error, as per 2.1.4.3.  (In case anyone else wonders, the
  multiple escape mechanism already forces all characters to have the
  alphabetic trait.)

  I thought I caught an obvious oversight in your test, but it would
  have been strong enough to test the hypothesis, were it not for the
  sorry fact that none of the Common Lisp environments I have access
  to signal an error when encountering invalid characters in the input
  stream.

| it was always 3.

  OK, then this is definitely surprising and in clear violation of the
  standard.  You're right that SET-SYNTAX-FROM-CHAR should not clobber
  the constituent trait for any character, not just the package marker.

  Where is that annoying conformance test guy who stresses the useless
  corners and boundary conditions of the standard when you need him?

--
Erik Naggum | Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.


 
Christophe Rhodes  
Newsgroups: comp.lang.lisp
From: Christophe Rhodes <cs...@cam.ac.uk>
Date: Thu, 15 Jan 2004 08:28:30 +0000
Local: Thurs, Jan 15 2004 3:28 am
Subject: Re: Name for the set of characters legal in identifiers

Erik Naggum <e...@naggum.no> writes:
>   Where is that annoying conformance test guy who stresses the useless
>   corners and boundary conditions of the standard when you need him?

Since he may not respond to that description, I'll just say that
Paul's tests are currently in progress up to chapter 21 (Streams), so
it shouldn't be too long before chapter 23 (Reader) is breached.

Christophe
--
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)


 