returns false, but
string is wordchar "_"
and
string is punct "-"
return true.
The Tcl string man page says wordchar is "Any Unicode word character. That is any
alphanumeric character, and any Unicode connector punctuation characters
(e.g. underscore)."
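To see the classification being asked about, the character classes can be queried directly. This is a minimal sketch that assumes Tcl 8.5 or later (for `dict`). The helper name `classify` is hypothetical and not part of Tcl.

```tcl
# Hypothetical helper: report how [string is] classifies a single character.
# -strict makes the empty string return 0 instead of 1.
proc classify {ch} {
    return [dict create \
        wordchar [string is wordchar -strict $ch] \
        punct    [string is punct -strict $ch] \
        alnum    [string is alnum -strict $ch]]
}

# Underscore vs. hyphen-minus, the two characters from the question.
foreach ch {_ -} {
    puts "$ch => [classify $ch]"
}
```

Consistent with the results quoted above, the underscore should report `wordchar 1`, and the dash should report `wordchar 0` and `punct 1`.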
The Unicode standard indicates all forms of the dash (aka minus, hyphen,
en-dash ...) are in the Unicode punctuation group. There is no mention of
"connectors" as a special class.
Why isn't "dash" a wordchar?
Bob
There is, for some definition of "special class". See
<http://www.unicode.org/reports/tr18/> and search for "connector"
(there are 3 matches, all of which are significant). Also see
<http://www.fileformat.info/info/unicode/category/Pc/list.htm> for the
list of 10 characters that fall in the "Punctuation, Connector" category.
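The Pc characters from that list can be passed to `string is wordchar` to check that Tcl's "word character" follows the Unicode category rather than a hand-picked set. This is a sketch. The code points below are assumed to be the ten on the cited page (U+005F, U+203F, U+2040, U+2054, U+FE33, U+FE34, U+FE4D, U+FE4E, U+FE4F, U+FF3F). The outcome also depends on the Unicode tables compiled into your Tcl build, so an older build may disagree about more recently added characters.

```tcl
# The ten "Punctuation, Connector" (Pc) code points, written as \u escapes
# so that the parser substitutes them while building the list.
set connectors [list \u005F \u203F \u2040 \u2054 \uFE33 \
                     \uFE34 \uFE4D \uFE4E \uFE4F \uFF3F]

# Collect every Pc character that this Tcl build does NOT call a wordchar.
set notWord {}
foreach ch $connectors {
    if {![string is wordchar -strict $ch]} {
        lappend notWord [format U+%04X [scan $ch %c]]
    }
}

if {[llength $notWord] == 0} {
    puts "all Pc characters are wordchars"
} else {
    puts "Pc characters not treated as wordchar: $notWord"
}
```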
> Why isn't "dash" a wordchar?
Ummmm, because Unicode says it's not? 8-)
One explanation is given by the third match for "connector" in
<http://www.unicode.org/reports/tr18/>, i.e., that underscore and related
"connector punctuation" characters are generally used as connectors for
identifiers in many (most?) programming languages. It just so happens
that Tcl is *not* one of them, but we can't all be conformists. 8-)
While we're on this subject, can the Tcl Core Unicode deity
(Jeff?) confirm whether "Unicode word character" includes all characters
under the Mark category umbrella, as listed against the "word" property
in Annex C of <http://www.unicode.org/reports/tr18/>? Not that I'm in
a position to use any of those particular characters, just curious. 8-)
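Whatever the intended definition, a particular build can be probed directly. The sketch below checks one character from each Mark subcategory. The code points are illustrative choices (U+0301 COMBINING ACUTE ACCENT for Mn, U+0903 DEVANAGARI SIGN VISARGA for Mc, U+20DD COMBINING ENCLOSING CIRCLE for Me). No particular result is claimed here, since it may vary between Tcl versions.

```tcl
# One sample code point per Unicode Mark subcategory.
set marks [dict create \
    Mn \u0301 \
    Mc \u0903 \
    Me \u20DD]

# Record and print whether this interpreter treats each mark as a word character.
set results {}
dict for {category ch} $marks {
    set isWord [string is wordchar -strict $ch]
    dict set results $category $isWord
    puts [format "%s U+%04X wordchar=%d" $category [scan $ch %c] $isWord]
}
```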
- Adrian
> On 2006-03-25, Bob Binder <nos...@domain.com> wrote:
> "connector punctuation" characters are generally used as connectors for
> identifiers in many (most?) programming languages. It just so happens
> that Tcl is *not* one of them, but we can't all be conformists. 8-)
On the contrary! While any character may be used as part of a
variable or command name, different characters interact with the
syntax in different ways. The underscore is unusual in that it
behaves just like an alphanumeric.
% set a Yes
Yes
% puts $a.foo
Yes.foo
% puts $a,foo
Yes,foo
% puts $a*foo
Yes*foo
% puts $a@foo
Yes@foo
% puts $a_foo
can't read "a_foo": no such variable
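The rule behind this session is the one in the Tcl(n) man page: a bare `$name` takes the longest run of letters, digits, underscores and `::` namespace separators (two or more colons). The sketch below models that scan with a regular expression. `dollarName` is a hypothetical helper. It ignores the `$name(index)` array form and assumes ASCII letters and digits only. How the real parser treats non-ASCII letters is not modelled.

```tcl
# Hypothetical helper: return the prefix of $text that a bare "$" would
# use as a variable name (letters, digits, "_" and runs of 2+ colons).
# Array references such as $a(x) are not handled.
proc dollarName {text} {
    if {[regexp {^(?:[A-Za-z0-9_]|::+)+} $text name]} {
        return $name
    }
    return ""   ;# no name characters: Tcl leaves the "$" as a literal
}

foreach sample {a.foo a,foo a*foo a@foo a_foo foo-bar -bat ns::x a:b} {
    puts [format "%-8s -> {%s}" $sample [dollarName $sample]]
}
```

Only `a_foo` and `ns::x` are taken whole. Every other sample stops at its first punctuation character (or, for `-bat`, yields no name at all).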
--
Donald Arseneau as...@triumf.ca
Just so it's clear, I wrote the following nonsense, not Bob. 8-)
>> "connector punctuation" characters are generally used as connectors for
>> identifiers in many (most?) programming languages. It just so happens
>> that Tcl is *not* one of them, but we can't all be conformists. 8-)
>
> On the contrary! While any character may be used as part of a
> variable or command name, different characters interact with the
> syntax in different ways. The underscore is unusual in that it
> behaves just like an alphanumeric.
Sorry. You're right that it's completely nonsensical because I screwed
up my edit. The above originally read:
"connector punctuation" characters are generally used as connectors for
identifiers in many (most?) programming languages. It just so happens
that Tcl is *not* one of them (in that it's not so restrictive, but
allows just about any character in an identifier with careful bracing),
but we can't all be conformists. 8-)
but in a fit of rapid keyboard-pounding, I accidentally deleted the
entire parenthetical comment. Guess which paren-heavy language I've
been dabbling with lately. (It rhymes with "beam". 8-)
- Adrian
PL/1? ;-)
Donal.
Now, having spent some time in the UK recently, I am
trying to remember which regional dialect would have
those rhyming. Maybe Welsh ;)
Bruce
Did you mean "PL/I" ?
Bob
Although this behavior is apparently consistent with the cited Unicode
standard, it is inconvenient. I'd like a built-in string class that
recognizes all valid Tcl names, including option identifiers with their
leading dash, the dollar sign, etc.
Bob
"Adrian Ho" <t...@03s.net> wrote in message
news:7bnif3-...@rover.03s.net...
If that is what you want, then it IS convenient, since the dash is not valid
for $ substitution. Hence, for your usage, the dash is not a valid Tcl
variable name*.
% set foo-bar "test"
% puts $foo-bar
can't read "foo": no such variable
% set -bat "test2"
% puts $-bat
$-bat
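A dash in a name only rules out the bare `$` form. As a sketch, the braced `${...}` substitution and the one-argument `[set]` both read such variables without complaint. The variable names are the ones from the session above.

```tcl
set foo-bar "test"
set -bat "test2"

# ${...}: everything up to the closing brace is the variable name.
puts ${foo-bar}     ;# test
puts ${-bat}        ;# test2

# One-argument [set] returns the value; the name is just an ordinary word.
puts [set foo-bar]  ;# test
puts [set -bat]     ;# test2
```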
* Note: If you consider the dash a valid variable name character in Tcl (which
technically it is), then the function to check for valid Tcl variable
names is:
proc validVarname {name} {return 1}
since the [set] command does not restrict which symbols can be used
in a variable name. You can even use the null character \0 or the symbol
255 (0xff) in variable names:
% set \000\017 "test3"
% set \000\017
test3
My favourite is of course the empty variable name, which is exactly zero
bytes long:
% set {} "test4"
% set {}
test4
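The same point can be checked in a script, including the 255 (0xff) case mentioned above, which the session itself does not show. A minimal sketch; the value `test3b` is just a placeholder.

```tcl
set \000\017 "test3"   ;# name is NUL followed by the control character \017
set \xff     "test3b"  ;# name is the single character 255 (0xff)
set {}       "test4"   ;# name is the empty string

puts [set \000\017]    ;# test3
puts [set \xff]        ;# test3b
puts [set {}]          ;# test4
puts [info exists {}]  ;# 1
```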
Similarly, there is actually no restriction on which symbols can be used
as characters in proc names.
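Here is a short sketch of the same freedom for commands. The proc names are made up purely for illustration. Names containing spaces need braces (or quotes) both when defined and when called. Punctuation such as a leading dash needs nothing special.

```tcl
proc {hello world!} {} { return "spaces are fine" }
proc -dash {} { return "a leading dash is fine" }
proc a.b,c@d {} { return "punctuation is fine" }

# The command name is simply the first word of the command.
puts [{hello world!}]
puts [-dash]
puts [a.b,c@d]
```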
If by "valid Tcl names" you mean "valid Tcl identifiers", then pretty
much *all* characters (including whitespace) are legal in one (some form
of escaping may of course be necessary in certain cases).
Perhaps if you could describe the problem you're actually trying to
solve, we may be able to suggest a different approach.
- Adrian
Them pesky Romans!
"IV, stat!"
"Four what?"
- Adrian
> If by "valid Tcl names" you mean "valid Tcl identifiers", then pretty
> much *all* characters (including whitespace) are legal in one (some form
> of escaping may of course be necessary in certain cases).
Exceptions: a pair of parentheses "(...)" whose closing
one is right at the end of the "varname",
and two consecutive colons "::",
may carry a special meaning.
These *can* be contained in valid varnames, but not arbitrarily.
e.g.
"abc)))(((())))(((" is a valid (non-array) varname
"abc)))(((())))((()" denotes an array element "((())))(((" of
an array named "abc)))"
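Both cases from the example, plus the `::` case, can be checked with a short script. This is a sketch. The values `scalarValue` and `elementValue` and the name `a::b` are illustrative, and the `::` check assumes that no namespace `a` exists yet.

```tcl
# Does not end in ")": an ordinary scalar whose name happens to contain parens.
set "abc)))(((())))(((" scalarValue
puts [info exists  "abc)))(((())))((("]   ;# 1
puts [array exists "abc)))(((())))((("]   ;# 0

# Ends in ")": split at the FIRST "(" into array "abc)))"
# and element "((())))(((".
set "abc)))(((())))((()" elementValue
puts [array exists "abc)))"]              ;# 1
puts [array names  "abc)))"]              ;# ((())))(((

# "::" is taken as a namespace separator, so setting a::b fails
# when there is no namespace a.
set rc [catch {set a::b 1} msg]
puts "rc=$rc msg=$msg"
```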