Definition of ident?


Gregory Woodhouse

Mar 28, 2011, 2:44:12 PM
to hardhats
Okay, I have a question about the standard. What is the definition of the ident character class? More importantly, what is the intent? If you were to expand the supported character sets to include ISO 8859-* or Unicode, how would you define it?


David Whitten

Mar 28, 2011, 3:03:49 PM
to hard...@googlegroups.com
According to the 1995 Standard online:
  ident::= The ASCII/M codes 65–90 and 97–122 ('A'-'Z' and 'a'-'z') are ident characters, all other characters in the range 0–127 are not ident characters. Additional characters, with codes greater than 127, may be defined as ident through the algorithm specified in ^$Character(charsetexpr,"IDENT")

(this definition is found at:
   http://71.174.62.16/Demo/AnnoStd?Frame=Main&Edition=1995&Page=a106002#Def_0007
)

There is also a page that says:

Valid name characters:

^$Character( charsetexpr , expr V "IDENT" ) = expr V algoref

This node specifies the identification algorithm used to determine which characters in a charset are valid for use in names (i.e. is a character in the set ident).

The ident truth-value truth, of a character char using an identification algorithm ident, may be evaluated by executing the expression: ("S truth="_ident_"($ASCII(char))"). When truth is "true", char is an ident; when truth is "false", char is not an ident. Note that for $ASCII(char) values less than 128, 65–90 and 97–122 are required to be "true" and all other values less than 128 are required to be "false". If the identification algorithm node is undefined, or is the empty string, then it will return "false" for all $ASCII(char) greater than 127; values less than 128 will be returned as indicated.
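
To make the rule above concrete, here is a minimal sketch in Python (not M) of the truth-value evaluation as I read it. The names is_ident and latin1_letter_example are hypothetical and not part of the standard; the second is only an illustration of what an algorithm for an ISO 8859-1 style charset might look like, not something any implementation is known to ship.

```python
from typing import Callable, Optional


def latin1_letter_example(code: int) -> bool:
    """Hypothetical identification algorithm for an ISO 8859-1 style charset.

    Treats codes 192-255 as letters, except 215 (multiplication sign)
    and 247 (division sign). This is an assumption for illustration only.
    """
    return 192 <= code <= 255 and code not in (215, 247)


def is_ident(code: int, algorithm: Optional[Callable[[int], bool]] = None) -> bool:
    """Return the ident truth-value for a character code.

    Codes below 128 are fixed: only 65-90 and 97-122 are ident,
    and the algorithm is never consulted for them.
    Codes above 127 are ident only if an algorithm is defined and says so.
    """
    if code < 0:
        raise ValueError("character code must be non-negative")
    if code < 128:
        return 65 <= code <= 90 or 97 <= code <= 122
    if algorithm is None:
        # Undefined (or empty) algorithm node: everything above 127 is "false".
        return False
    return bool(algorithm(code))
```

With no algorithm, is_ident(233) is false; with latin1_letter_example passed in, the same code is true, while is_ident(ord("0"), lambda c: True) stays false because the range below 128 cannot be overridden.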


As to the expansion to other characters, it would depend upon the Character Set Profile (the charsetexpr metalanguage expression above).

The 1995 standard says:

7.1.3.1 ^$Character

^$C[HARACTER] ( charsetexpr )

    charsetexpr::= expr V charset

^$Character provides information regarding the available Character Set Profiles on a system, such as collation order and pattern code definitions.

When and only when a Character Set Profile identified by charset exists, ^$Character(charset) has a value; all nonempty string values are reserved for future extension of the standard.

Data manipulation and the execution of commands within a process are performed in the context of the process charset. (See 7.1.3.4 ^$Job)

Gregory Woodhouse

Mar 28, 2011, 3:42:00 PM
to hard...@googlegroups.com

On Mar 28, 2011, at 12:03 PM, David Whitten wrote:

According to the standard 1995 Standard online :
  ident::= The ASCII/M codes 65–90 and 97–122 ('A'-'Z' and 'a'-'z') are ident characters, all other characters in the range 0–127 are not ident characters. Additional characters, with codes greater than 127, may be defined as ident through the algorithm specified in ^$Character(charsetexpr,"IDENT")

(this definition is found at:
   http://71.174.62.16/Demo/AnnoStd?Frame=Main&Edition=1995&Page=a106002#Def_0007
)

I saw that, and in my toy lexer I define ident to be equivalent to alphabetic. Reading ^$C is a problem for [f]lex because if the character set definition changes, you have to recompile the lexer.
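
For contrast, here is a rough sketch in Python (not flex) of a scanner that takes the ident set as a table built at run time, so a change in the character set definition means rebuilding the table rather than recompiling. All names here (ascii_ident, extended_ident_example, build_ident_table, scan_ident_runs) are hypothetical, the extended predicate is an assumed ISO 8859-1 style example, and the scanner only finds runs of ident characters; it does not implement full M name syntax.

```python
from typing import Callable, Iterator, List


def ascii_ident(code: int) -> bool:
    """The fixed part of the definition: 'A'-'Z' and 'a'-'z'."""
    return 65 <= code <= 90 or 97 <= code <= 122


def extended_ident_example(code: int) -> bool:
    """Assumed example: ASCII letters plus ISO 8859-1 letters above 127."""
    return ascii_ident(code) or (192 <= code <= 255 and code not in (215, 247))


def build_ident_table(is_ident: Callable[[int], bool], size: int = 256) -> List[bool]:
    """Evaluate the predicate once per code and store the results in a table."""
    return [bool(is_ident(code)) for code in range(size)]


def scan_ident_runs(text: str, table: List[bool]) -> Iterator[str]:
    """Yield each maximal run of ident characters in text.

    Characters whose code falls outside the table are treated as non-ident.
    """
    run: List[str] = []
    for ch in text:
        code = ord(ch)
        if code < len(table) and table[code]:
            run.append(ch)
        elif run:
            yield "".join(run)
            run = []
    if run:
        yield "".join(run)
```

Swapping build_ident_table(ascii_ident) for build_ident_table(extended_ident_example) changes what the scanner accepts without touching the scanning code, which is the property a generated flex lexer with fixed character classes does not have.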




"Those who are enamored of practice
without theory are like a pilot who goes
into a ship without rudder or compass."
--Leonardo da Vinci (1452-1519)

gregwo...@me.com
http://www.grgwoodhousephoto.com



