Unclear definition of identifiers/numbers


Daphne Preston-Kendal

Dec 15, 2022, 12:37:01 PM
Hello,

Ondřej Majerech and I discovered together in #scheme that sections 2.1 and 7.1.1 of the R7RS small report are inconsistent with each other in what they define to be identifiers vs. numbers.

Section 2.1 says:

> An identifier is any sequence of letters, digits, and “extended identifier characters” provided that it does not have a prefix which is a valid number.

Section 7.1.1 defines this as a formal grammar, but says:

> Note that +i, -i and ⟨infnan⟩ below are exceptions to the ⟨peculiar identifier⟩ rule; they are parsed as numbers, not identifiers.

(⟨peculiar identifier⟩s are identifiers which start with a +, -, or ., which are not allowed to be followed by a digit if they are going to be interpreted as identifiers rather than numbers; ⟨infnan⟩ stands for the constants +inf.0 | -inf.0 | +nan.0 | -nan.0.)

Section 7.1.1’s rule as stated (that only the exact tokens +i, -i, +inf.0, -inf.0, +nan.0, -nan.0 are numbers) means that, for example, +ia, +ib, +ic etc. are identifiers; according to section 2.1, they’re not, because they start with the valid number +i. Section 7.1.1 also implies this again by exceptio probat regulam: it wouldn’t need to explicitly state that +inf.0 and -inf.0 are not identifiers if anything that starts with +i or -i is not an identifier.

Section 2.1’s rule also prevents use of the common Lisp (pun intended) convention of stropping the names of constants with + signs whenever the constant’s name starts with an i. (Thanks to John Cowan for pointing this out.)

However, only Section 2.1’s rule can explain the (probably correct) parsing of +inf.0i as the imaginary infinity. Under section 7.1.1’s rule, this would have to be written 0+inf.0i.
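To make the disagreement concrete, here is a minimal Python sketch of the two readings. It is only an illustration under stated assumptions: the number grammar is a simplified, decimal-only subset of ⟨number⟩ (no radix or exactness prefixes, no polar form, lowercase only), |...| identifiers are omitted, and the function names `is_identifier_2_1` and `is_identifier_7_1_1` are hypothetical.

```python
import re

# ASSUMPTION: simplified decimal-only subset of the R7RS <number> grammar.
UREAL = r"(?:\d+/\d+|\d+\.\d*(?:e[+-]?\d+)?|\.\d+(?:e[+-]?\d+)?|\d+(?:e[+-]?\d+)?)"
INFNAN = r"[+-](?:inf|nan)\.0"
REAL = rf"(?:[+-]?{UREAL}|{INFNAN})"
IMAG = rf"(?:[+-]{UREAL}|[+-]|{INFNAN})i"   # +2i, +i, +inf.0i
NUMBER = re.compile(rf"(?:{REAL})?{IMAG}|{REAL}")

# Section 2.1: letters, digits, and "extended identifier characters".
IDENT_CHARS = re.compile(r"[A-Za-z0-9!$%&*+\-./:<=>?@^_~]+")

# Section 7.1.1: <initial> <subsequent>* or a <peculiar identifier>.
INITIAL = r"[A-Za-z!$%&*/:<=>?^_~]"
SUBSEQ = r"[A-Za-z0-9!$%&*/:<=>?^_~+\-.@]"
SIGN_SUB = r"[A-Za-z!$%&*/:<=>?^_~+\-@]"
DOT_SUB = r"[A-Za-z!$%&*/:<=>?^_~+\-@.]"
IDENT = re.compile(
    rf"{INITIAL}{SUBSEQ}*"
    rf"|[+-](?:{SIGN_SUB}{SUBSEQ}*|\.{DOT_SUB}{SUBSEQ}*)?"
    rf"|\.{DOT_SUB}{SUBSEQ}*"
)

# The six exact tokens named by the note in section 7.1.1.
EXCEPTIONS = {"+i", "-i", "+inf.0", "-inf.0", "+nan.0", "-nan.0"}


def is_identifier_2_1(token):
    """Section 2.1 reading: no prefix of the token may be a valid number."""
    if not IDENT_CHARS.fullmatch(token):
        return False
    return not any(NUMBER.fullmatch(token[:k]) for k in range(1, len(token) + 1))


def is_identifier_7_1_1(token):
    """Section 7.1.1 reading: grammar match, minus the six exact exceptions."""
    return bool(IDENT.fullmatch(token)) and token not in EXCEPTIONS


for tok in ["+i", "+ia", "+inf.0", "+inf.0a", "+inf.0i",
            "+incredibly-contrived-constant-name+"]:
    print(tok, is_identifier_2_1(tok), is_identifier_7_1_1(tok))
```

Under these assumptions, +ia, +inf.0a, +inf.0i and +incredibly-contrived-constant-name+ come out as identifiers under the 7.1.1 reading but not under the 2.1 reading, while +i and +inf.0 are rejected by both.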

Possible resolutions are:

• amend section 7.1.1 to reflect section 2.1’s rule: ‘Note that tokens that begin with +i, -i, +nan.0, and -nan.0 are exceptions to the ⟨peculiar identifier⟩ rule; they are parsed as numbers, not identifiers.’ – in my proposed wording, the reference to ⟨infnan⟩ is eliminated to avoid the confusion where the same exception would cover two cases

• amend section 2.1 to reflect section 7.1.1’s rule, e.g.: ‘An identifier is any sequence of letters, digits, and “extended identifier characters” provided that it is not a valid number.’ (This change also causes problems because it makes e.g. a token which starts with a digit but cannot be parsed as a number an identifier, against the clear intention of the original versions of both sections.)

• amend both in some way such that +inf.0i and +incredibly-contrived-constant-name+ work as expected

Daphne

Daphne Preston-Kendal

Feb 12, 2024, 6:03:26 AM
Hello again,

In WG2 we are approaching the time (some time in the next few months) when we will decide on the lexical syntax for identifiers and numbers, and WG1 has still not resolved this issue.

If WG1 does not address this inconsistency by an erratum, WG2 will discuss and resolve the issue itself, with an appropriate compatibility note in the specification.

As an informal and very incomplete test, I evaluated '+inf.0a in the R7RS small implementations I know of (explicitly activating R7RS mode where necessary).
• '+inf.0a is a parse error in Chibi, Mosh
• '+inf.0a returns a symbol in Gambit (and by extension Gerbil), Gauche, Guile, Loko, MIT/GNU Scheme, Picrin, STklos

I was unable to get Sagittarius and Ypsilon to start (this is probably a problem with qemu — I didn’t spend very long on this).

If WG1 would like a more extensive survey of how implementations treat the affected lexical syntax, I would be happy to do one.

Daphne
> --
> You received this message because you are subscribed to the Google Groups "scheme-reports-wg1" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to scheme-reports-...@googlegroups.com.

Daphne Preston-Kendal

Feb 12, 2024, 12:08:29 PM
On 12 Feb 2024, at 12:03, Daphne Preston-Kendal <d...@nonceword.org> wrote:

> As an informal and very incomplete test, I evaluated '+inf.0a in the R7RS small implementations I know of (explicitly activating R7RS mode where necessary).
> • '+inf.0a is a parse error in Chibi, Mosh
> • '+inf.0a returns a symbol in Gambit (and by extension Gerbil), Gauche, Guile, Loko, MIT/GNU Scheme, Picrin, STklos
>
> I was unable to get Sagittarius and Ypsilon to start (this is probably a problem with qemu — I didn’t spend very long on this).

Update: Jason Hemann installed Sagittarius for me and confirmed that it reads +inf.0a as an identifier. Thank you, Jason!

Daphne

Arthur A. Gleckler

Feb 12, 2024, 3:07:08 PM
On Mon, Feb 12, 2024 at 3:03 AM Daphne Preston-Kendal <d...@nonceword.org> wrote:

> I was unable to get Sagittarius and Ypsilon to start (this is probably a problem with qemu — I didn’t spend very long on this).

There are Docker images for many Scheme implementations here.  That makes it much easier to try experiments like this.

Daphne Preston-Kendal

Feb 12, 2024, 3:40:10 PM
It is precisely these which did not work: since I am on an Aarch64 box, these AMD64 images have to run under emulation, which doesn’t work with all implementations.

Daphne

Arthur A. Gleckler

Feb 12, 2024, 3:58:01 PM
On Thu, Dec 15, 2022 at 9:37 AM Daphne Preston-Kendal <d...@nonceword.org> wrote:

> • amend section 2.1 to reflect section 7.1.1’s rule, e.g.: ‘An identifier is any sequence of letters, digits, and “extended identifier characters” provided that it is not a valid number.’ (This change also causes problems because it makes e.g. a token which starts with a digit but cannot be parsed as a number an identifier, against the clear intention of the original versions of both sections.)

I'm inclined to support this proposal. I've always liked MIT Scheme's `1+` and `-1+`, which are procedures which increment and decrement by one, respectively.  There are other ways to make them legal, but this rule is the simplest to state.  We could then simplify the grammar for identifier, dropping the whole peculiar identifier subtree.
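A minimal Python sketch of the quoted rule ("any sequence of identifier characters that is not a valid number") may help show what it admits. It assumes a simplified, decimal-only subset of the number grammar (no radix or exactness prefixes, no polar form, lowercase only); the function name `is_identifier` is hypothetical and not taken from any implementation.

```python
import re

# ASSUMPTION: simplified decimal-only subset of the R7RS <number> grammar.
UREAL = r"(?:\d+/\d+|\d+\.\d*(?:e[+-]?\d+)?|\.\d+(?:e[+-]?\d+)?|\d+(?:e[+-]?\d+)?)"
INFNAN = r"[+-](?:inf|nan)\.0"
REAL = rf"(?:[+-]?{UREAL}|{INFNAN})"
IMAG = rf"(?:[+-]{UREAL}|[+-]|{INFNAN})i"
NUMBER = re.compile(rf"(?:{REAL})?{IMAG}|{REAL}")

# Letters, digits, and "extended identifier characters" (section 2.1).
IDENT_CHARS = re.compile(r"[A-Za-z0-9!$%&*+\-./:<=>?@^_~]+")


def is_identifier(token):
    """Identifier iff made of identifier characters and not itself a number."""
    return bool(IDENT_CHARS.fullmatch(token)) and not NUMBER.fullmatch(token)


for tok in ["1+", "-1+", "1", "-1", "+i", "12abc"]:
    print(tok, is_identifier(tok))
```

With this rule 1+ and -1+ are accepted while 1, -1 and +i remain numbers. It also accepts tokens such as 12abc, the side effect noted in the quoted proposal.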

Alex Shinn

Feb 12, 2024, 5:04:57 PM
That's a rather large change and loses the spirit of the original rule, which is to leave room for implementation-specific extensions to the number syntax (e.g. octonions, algebraic numbers, Conway arrow notation, unit suffixes, etc).

--
Alex

Arthur A. Gleckler

Feb 12, 2024, 6:25:42 PM
to scheme-reports-wg1
On Mon, Feb 12, 2024, 2:04 PM Alex Shinn <alex...@gmail.com> wrote:

> That's a rather large change and loses the spirit of the original rule, which is to leave room for implementation-specific extensions to the number syntax (e.g. octonions, algebraic numbers, Conway arrow notation, unit suffixes, etc).

I welcome an alternative that admits 1+ and -1+ and similar identifiers without preventing such extensions to numeric constants.

Aaron Hsu

Feb 12, 2024, 7:39:04 PM
to WG1 Scheme Reports

I think this is going to be quite difficult. I don't even know if it would be possible to have a situation in which we can ensure that all valid identifiers don't suddenly become numbers in some implementation where numbers are extended. Is this okay? Maybe it's fine to say that identifiers may suddenly become numbers in some extended implementations? If we do that, maybe there is something that can be done to ensure that at least some subset of identifiers can never become numbers?

Alex Shinn

Feb 12, 2024, 7:45:33 PM
On Tue, Feb 13, 2024 at 9:39, Aaron Hsu <arc...@sacrideo.us> wrote:
> On Mon, Feb 12, 2024, at 6:25 PM, Arthur A. Gleckler wrote:
>> On Mon, Feb 12, 2024, 2:04 PM Alex Shinn <alex...@gmail.com> wrote:
>>
>>> That's a rather large change and loses the spirit of the original rule, which is to leave room for implementation-specific extensions to the number syntax (e.g. octonions, algebraic numbers, Conway arrow notation, unit suffixes, etc).
>>
>> I welcome an alternative that admits 1+ and -1+ and similar identifiers without preventing such extensions to numeric constants.
>
> I think this is going to be quite difficult. I don't even know if it would be possible to have a situation in which we can ensure that all valid identifiers don't suddenly become numbers in some implementation where numbers are extended. Is this okay?

The existing rule already did this.  Extensions to numbers must begin with an existing number, which the rule prevents from having been an identifier.

> Maybe it's fine to say that identifiers may suddenly become numbers in some extended implementations? If we do that, maybe there is something that can be done to ensure that at least some subset of identifiers can never become numbers?

Daphne Preston-Kendal

Feb 13, 2024, 6:08:48 AM
Neither of the current conflicting rules allows identifiers such as 1+ and -1+. An implementation is always free to extend the numeric syntax (most commonly afaik by applying the rule that anything that isn’t a valid number is an identifier), but the standard, I think, must stay relatively conservative. Especially in this case I think WG1 should, on principle, at most try to find a compromise between the two existing mutually inconsistent specifications, rather than rework its decision from ten years ago entirely.

As a non-WG1 member I would say my own preference is for the 7.1.1 rule.

Daphne

Alex Shinn

Feb 22, 2024, 9:36:28 PM
Regarding +inf.0i, there is already a specific case handling this:

<complex R> -> <infnan> i

Preserving the spirit of the change as voted on (the grammar was written later), I think +ia etc should be disallowed.
An extension allowing unit suffixes could interpret this as one imaginary Ampere.
Therefore I would prefer to update the 7.1.1 note to say:

Note that tokens that begin with +i, -i, +nan.0, and -nan.0 are exceptions to the ⟨peculiar identifier⟩ rule and are not identifiers.

leaving out Daphne's phrasing "they are parsed as numbers" because they aren't necessarily numbers either.
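As a rough sketch of how the proposed note would behave, the following Python fragment applies the section 7.1.1 identifier grammar and then rejects any token beginning with one of the four listed prefixes. It assumes ASCII letters only and omits |...| identifiers; the names `NON_IDENTIFIER_PREFIXES` and `is_identifier` are hypothetical.

```python
import re

# Section 7.1.1: <initial> <subsequent>* or a <peculiar identifier>.
# ASSUMPTION: ASCII letters only, no |...| identifiers.
INITIAL = r"[A-Za-z!$%&*/:<=>?^_~]"
SUBSEQ = r"[A-Za-z0-9!$%&*/:<=>?^_~+\-.@]"
SIGN_SUB = r"[A-Za-z!$%&*/:<=>?^_~+\-@]"
DOT_SUB = r"[A-Za-z!$%&*/:<=>?^_~+\-@.]"
IDENT = re.compile(
    rf"{INITIAL}{SUBSEQ}*"
    rf"|[+-](?:{SIGN_SUB}{SUBSEQ}*|\.{DOT_SUB}{SUBSEQ}*)?"
    rf"|\.{DOT_SUB}{SUBSEQ}*"
)

# Prefixes named in the proposed note; +inf.0 and -inf.0 are already
# covered because they begin with +i and -i.
NON_IDENTIFIER_PREFIXES = ("+i", "-i", "+nan.0", "-nan.0")


def is_identifier(token):
    """Grammar match, excluding tokens that begin with a listed prefix."""
    return bool(IDENT.fullmatch(token)) and not token.startswith(NON_IDENTIFIER_PREFIXES)


for tok in ["+ia", "+inf.0a", "+inf.0i", "+nan.0", "+foo+", "-"]:
    print(tok, is_identifier(tok))
```

Note that the check only says what is not an identifier. Whether a rejected token such as +ia is a number or an error is left to the implementation, in line with the wording of the proposed note.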

I think this is worth an erratum.  Any objections from the WG1 members?

--
Alex
