Re: Invalid identifier claimed to be valid by docs (methinks)


Ian Kelly

Sep 23, 2012, 6:57:53 PM
to Python
On Sun, Sep 23, 2012 at 4:24 PM, Joshua Landau
<joshua.l...@gmail.com> wrote:
> The docs describe identifiers to have this grammar:
>
> identifier ::= xid_start xid_continue*
> id_start ::= <all characters in general categories Lu, Ll, Lt, Lm, Lo,
> Nl, the underscore, and characters with the Other_ID_Start property>
> id_continue ::= <all characters in id_start, plus characters in the
> categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
> xid_start ::= <all characters in id_start whose NFKC normalization is in
> "id_start xid_continue*">
> xid_continue ::= <all characters in id_continue whose NFKC normalization is
> in "id_continue*">
>
> So I would assume that
> exec("a{} = None".format(char))
> would be valid if
> unicodedata.normalize("NFKC", char) == "1"
> as
> exec("a1 = None")
> is valid.
>
> BUT "a¹ = None" is not valid*.
>
> *a<superscript 1>, accessible through <ALT-GR>+1 if your keyboard's set up
> to do that stuff.
>
> Thank you for your time.

Or if you don't have a keyboard for that, you can do the same thing via:

exec("x\u00b9 = None") # U+00B9 is superscript 1

On the other hand, this does work:

exec("x\u2071 = None") # U+2071 is superscript i

So it seems to be only an issue with superscript and subscript digits.
Looks like a compiler bug to me.

Terry Reedy

Sep 23, 2012, 10:42:35 PM
to pytho...@python.org
On 9/23/2012 6:57 PM, Ian Kelly wrote:
> On Sun, Sep 23, 2012 at 4:24 PM, Joshua Landau
> <joshua.l...@gmail.com> wrote:
>> The docs describe identifiers to have this grammar:
>>
>> identifier ::= xid_start xid_continue*
>> id_start ::= <all characters in general categories Lu, Ll, Lt, Lm, Lo,
>> Nl, the underscore, and characters with the Other_ID_Start property>
>> id_continue ::= <all characters in id_start, plus characters in the
>> categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
>> xid_start ::= <all characters in id_start whose NFKC normalization is in
>> "id_start xid_continue*">

xid_start is a subset of id_start

>> xid_continue ::= <all characters in id_continue whose NFKC normalization is
>> in "id_continue*">

xid_continue is a subset of id_continue.

>> So I would assume that
>> exec("a{} = None".format(char))
>> would be valid if
>> unicodedata.normalize("NFKC", char) == "1"

Read the definition of xid_continue more carefully. The un-normalized
character must itself be in id_continue; having an NFKC normalization
that is in id_continue is not enough.

>> as
>> exec("a1 = None")
>> is valid.
>>
>> BUT "a¹ = None" is not valid*.

>>> import unicodedata as ud
>>> ud.category("\u00b9")
'No'

Category No is *not* in id_continue, and therefore not in xid_continue.
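
The category check above can be repeated for the other characters in this thread. The following is only a rough sketch: the helper name category_allows_continue is made up for illustration, and the category sets copy the lists quoted from the docs while leaving out the Other_ID_Start and Other_ID_Continue properties, so it approximates id_continue rather than reproducing the tokenizer's actual test.

```python
import unicodedata

# General categories quoted from the docs for id_start and id_continue.
# Assumption: Other_ID_Start / Other_ID_Continue characters are ignored here.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_CATEGORIES = ID_START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def category_allows_continue(char):
    """Hypothetical helper: is the raw character's category in id_continue?"""
    return char == "_" or unicodedata.category(char) in ID_CONTINUE_CATEGORIES


# Digit one, superscript one, superscript i.
for char in ("1", "\u00b9", "\u2071"):
    print(
        "U+{:04X}".format(ord(char)),
        unicodedata.category(char),
        repr(unicodedata.normalize("NFKC", char)),
        category_allows_continue(char),
        ("x" + char).isidentifier(),
    )
```

str.isidentifier() is the built-in way to ask the same question of a whole name, so the last column should agree with what exec() accepts.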

> exec("x\u00b9 = None") # U+00B9 is superscript 1
>
> On the other hand, this does work:
>
> exec("x\u2071 = None") # U+2071 is superscript i
>
> So it seems to be only an issue with superscript and subscript digits.
> Looks like a compiler bug to me.

The problem, if there were one, would be in the tokenizer that finds
identifiers. However,

>>> exec("x\u00b9 = None")
...
x¹ = None
^
SyntaxError: invalid character in identifier

this is correct.
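
A small sketch for trying the cases from this thread without reading tracebacks. The helper name compiles is hypothetical; it only reports whether compile() raises SyntaxError. The last lines rely on identifiers being converted to NFKC during parsing, as PEP 3131 specifies.

```python
def compiles(source):
    """Hypothetical helper: True if the source compiles without a SyntaxError."""
    try:
        compile(source, "<string>", "exec")
    except SyntaxError:
        return False
    return True


print(compiles("a1 = None"))       # plain ASCII digit
print(compiles("x\u00b9 = None"))  # U+00B9, superscript 1
print(compiles("x\u2071 = None"))  # U+2071, superscript i

# An accepted identifier is stored under its NFKC-normalized form.
namespace = {}
exec("x\u2071 = None", namespace)
print("xi" in namespace)
```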

--
Terry Jan Reedy

