Probable bug in R7RS syntax-rules

20 views
Skip to first unread message

Chris Hanson

unread,
Dec 4, 2022, 1:23:21 AM12/4/22
to scheme-reports-wg1
I’ve been doing a fresh implementation of syntax-rules in MIT/GNU Scheme and
noticed a problem that was likely unintended.

Specifically, a pattern of the form

(<pat> ... <pat> <ellipsis> <pat> ... . <pat>)

is likely more general than was intended.

Unstated, but I believe implicit in the report, are two properties of syntax-
rules patterns:

1. No backtracking is required in order to match the pattern; and
2. there is at most one valid match.

However, a pattern like

(<pat> . <id>)

is effectively the same as

(<pat> <id> <ellipsis>)

which means that

(<pat> <ellipsis> . <id>)

is effectively the same as

(<pat> <ellipsis> <id> <ellipsis>)

the latter being disallowed by the report. This latter form can have multiple
possible matches, and could require backtracking during the match.

I am considering what to do in our implementation since I want to preserve the
above properties. So far the best thing I’ve come up with is: if the RHS of
the dotted list is an identifier, then no ellipsis is allowed in the LHS of
the pattern. Anything else in the RHS doesn’t cause this problem, i.e. a
literal or a vector, so the LHS ellipsis is allowed in that case.

I am not sure how to fix the text. Assuming I’m correct about the intended
properties, it seems to me that something should be said about this
possibility. Also, perhaps making those properties explicit might be good.

Anyway, I leave the details up to you. I think I’ll implement it as I
suggested above; but if you figure out a better solution I’ll update it later
to conform.


Alex Shinn

unread,
Dec 4, 2022, 8:51:31 AM12/4/22
to scheme-re...@googlegroups.com
The motivation behind

   (<pat> <ellipsis> . <id>)

was to allow an ellipsis in improper list patterns.  The current implementation
in chibi doesn't require the list to be improper, but does match greedily without
backtracking.

Either requiring the matching list for these patterns to be improper (in contrast
to the non-ellipsis case where proper lists must be allowed), or explicitly specifying
the greedy matching should be a sufficient clarification.

--
Alex

--
You received this message because you are subscribed to the Google Groups "scheme-reports-wg1" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scheme-reports-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scheme-reports-wg1/20431524.qVr9pxDxrh%40kleph.

Steven Ganz

unread,
Dec 5, 2022, 1:19:57 AM12/5/22
to scheme-reports-wg1
So it seems we have three proposals:

1) Only allow (<pat> <ellipsis> . <id>) to match an improper list.
2) Match (<pat> <ellipsis> . <id>) greedily.  So presumably <id> will bind to whatever remains after the last successful match of <pat>.
3) Prohibit <ellipsis> on the LHS if the RHS of an improper list pattern is <id>.

Consider each proposal under a couple of examples.

Matching (7 ... . x) against '(7 7 7 5) "should" succeed, binding x = (5)
(1) match fails, as it does against '(7 7 7 . (5)), although matching against '(7 7 7 . 5) succeeds, binding x = 5
(2) match succeeds, binding x = (5).  
(3) error

Matching ((7 ... . x) x) against '((7 7 7 5) (7 5)) "should" succeed, binding x = (7 5)
(1) match fails
(2) match fails
(3) error

(1) is at least consistent in failing in both cases, but I'd consider failing to be unexpected behavior.  And it seems odd at best for the behavior to be so different for those two similar expressions.
(2) succeeds for the simple example but fails for the more complex one, due to refraining from backtracking.
Between these three, other things being equal, I prefer Chris's solution of calling them both errors.  Otherwise, I'd prefer (1) over (2).

To what extent any of these fall under what's solvable through errata is another matter.  Adding an error, as in (3) would break existing instances of the first example under implementations that follow (2).  At this point, perhaps best to make the behavior explicitly undefined for (<pat> <ellipsis> . <id>)?  That would allow a fourth option as well, of doing the backtracking, which could make sense with other extensions to patterns.


From: "Alex Shinn" <alex...@gmail.com>
To: "scheme-reports-wg1" <scheme-re...@googlegroups.com>
Sent: Sunday, December 4, 2022 8:51:16 AM
Subject: Re: [scheme-reports-wg1] Probable bug in R7RS syntax-rules

Aaron Hsu

unread,
Dec 5, 2022, 2:43:16 AM12/5/22
to WG1 Scheme Reports
On Sun, Dec 4, 2022, at 1:18 AM, Chris Hanson wrote:
> However, a pattern like
>
> (<pat> . <id>)
>
> is effectively the same as
>
> (<pat> <id> <ellipsis>)

I'm not sure we can say that. At least in the spec, they can match the same inputs, but their resulting bindings are different, obviously. But the difference in how they bind is sufficient for me to feel uncomfortable with this equivalence.

> which means that
>
> (<pat> <ellipsis> . <id>)
>
> is effectively the same as
>
> (<pat> <ellipsis> <id> <ellipsis>)
>
> the latter being disallowed by the report. This latter form can have multiple
> possible matches, and could require backtracking during the match.

The way that I read the report, and which seems consistent with what I generally would see from various implementations is encapsulated in this verbage:

"P is of the form (P1 . . . Pk Pe ⟨ellipsis⟩ Pm+1 . . .
Pn . Px) where E is a list or improper list of n elements, the first k of which match P1 through Pk,
whose next m − k elements each match Pe, whose remaining n−m elements match Pm+1 through Pn, and
whose nth and final cdr matches Px; ..."

The key phrase, I think, here is the "nth and final" phrase. I think this is meant to say that this must really be the final CDR of E, not some sub-list of E.

--
Aaron W. Hsu | arc...@sacrideo.us | http://www.sacrideo.us

Aaron Hsu

unread,
Dec 5, 2022, 2:48:01 AM12/5/22
to WG1 Scheme Reports
On Mon, Dec 5, 2022, at 1:19 AM, Steven Ganz wrote:
> 1) Only allow (<pat> <ellipsis> . <id>) to match an improper list.

This clearly deviates from standard practice, and the report, IMO.

> 2) Match (<pat> <ellipsis> . <id>) greedily. So presumably <id> will
> bind to whatever remains after the last successful match of <pat>.

I think this is the way that (most?) implementations implement it (at least, in various implementations I've seen), and I think it matches the language of the report.

> 3) Prohibit <ellipsis> on the LHS if the RHS of an improper list
> pattern is <id>.

Clearly not the intention, IMO, of the report's design.

> Matching ((7 ... . x) x) against '((7 7 7 5) (7 5)) "should" succeed,
> binding x = (7 5)

I believe this is illegal regardless of the proposals, as you are not supposed to have duplicate pattern variables inside of a pattern. This is an error in the report. The way that I read the report, this shouldn't succeed, but error out, or if you want to implement some sort of backtracking over repeated pattern variables, then x should match the "final" CDR of the pattern, which would either be () or the improper list tail. In either case, the pattern above would fail against the input '((7 7 7 5) (7 5)).

Steven Ganz

unread,
Dec 5, 2022, 3:59:56 AM12/5/22
to scheme-reports-wg1
Ignore the quotes I put on the expressions, as the matching should of course be defined on expressions pre-evaluation.  That makes the distinction with dotted pairs under (1) that much more odd.

And yes, you are right about the duplicate pattern variable being considered an error in the report.  But in that case, why is there concern with (<pat> <ellipsis> . <id>) causing backtracking?  Shouldn't the greedy match then always be correct (although not necessarily uniquely correct)?



From: "Aaron W. Hsu" <arc...@sacrideo.us>
To: "WG1 Scheme Reports" <scheme-re...@googlegroups.com>
Sent: Monday, December 5, 2022 2:47:40 AM

Subject: Re: [scheme-reports-wg1] Probable bug in R7RS syntax-rules
--
You received this message because you are subscribed to the Google Groups "scheme-reports-wg1" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scheme-reports-...@googlegroups.com.


Alex Shinn

unread,
Dec 5, 2022, 4:26:20 AM12/5/22
to scheme-re...@googlegroups.com
On Mon, Dec 5, 2022 at 4:43 PM Aaron Hsu <arc...@sacrideo.us> wrote:
On Sun, Dec 4, 2022, at 1:18 AM, Chris Hanson wrote:
> However, a pattern like
>
>    (<pat> . <id>)
>
> is effectively the same as
>
>    (<pat> <id> <ellipsis>)

I'm not sure we can say that. At least in the spec, they can match the same inputs, but their resulting bindings are different, obviously. But the difference in how they bind is sufficient for me to feel uncomfortable with this equivalence.

Technically they don't match the same inputs either.  Only the former can match an improper list.

--
Alex

Aaron Hsu

unread,
Dec 6, 2022, 12:30:43 AM12/6/22
to WG1 Scheme Reports
On Mon, Dec 5, 2022, at 3:59 AM, Steven Ganz wrote:
> And yes, you are right about the duplicate pattern variable being
> considered an error in the report. But in that case, why is there
> concern with (<pat> <ellipsis> . <id>) causing backtracking? Shouldn't
> the greedy match then always be correct (although not necessarily
> uniquely correct)?

I think the question arises when you draw an equivalency from the following chain:

(<pat> . <id>) ←→ (<pat> <id> <ellipsis>)
=>
(<pat> <ellipsis> . <id>) ←→ (<pat> <ellipsis> <id> <ellipsis>)

But I think the wording of the report indicates that these are not in fact to be treated the same, and that the equivalences above are not necessarily kosher.

I agree that the greedy interpretation seems to be the correct interpretation, but I'd go even further to suggest that the report seems to prescribe the greedy interpretation. The report specifically indicates for an input E of n elements, that the final <id> matches the nth and "final" CDR. This requires a greedy interpretation, I think.

Steven Ganz

unread,
Dec 7, 2022, 9:50:57 PM12/7/22
to scheme-reports-wg1
On Tuesday, December 6, 2022, at 12:30:19 AM, Aaron W. Hsu wrote:
> On Mon, Dec 5, 2022, at 3:59 AM, Steven Ganz wrote:
> > And yes, you are right about the duplicate pattern variable being
> > considered an error in the report.  But in that case, why is there
> > concern with (<pat> <ellipsis> . <id>) causing backtracking?  Shouldn't
> > the greedy match then always be correct (although not necessarily
> > uniquely correct)?

> I think the question arises when you draw an equivalency from the following chain:

> (<pat> . <id>) ←→ (<pat> <id> <ellipsis>)
> =>
> (<pat> <ellipsis> . <id>) ←→ (<pat> <ellipsis> <id> <ellipsis>)

> But I think the wording of the report indicates that these are not in fact to be treated the same, and that the equivalences above are not necessarily kosher.
The last pattern is invalid because it has two ellipses as elements of the same list, right?  But I think the main point is that in any of the four examples, once we've successfully matched <pat> or <pat> <ellipsis>, it seems that nothing that could be fixed by backtracking can go wrong to prevent a match of . <id> or <id> <ellipsis> with the remainder, or for that matter (given no duplicate pattern variables) to prevent matching any pattern context.

> I agree that the greedy interpretation seems to be the correct interpretation, but I'd go even further to suggest that the report seems to prescribe the greedy
> interpretation. The report specifically indicates for an input E of n elements, that the final <id> matches the nth and "final" CDR. This requires a greedy
> interpretation, I think.
Looking at it closely now. the report seems both confusing and wrong there, in using the same variable n for both the list expression length and the index of the last pattern element before the dot, and similarly reusing m.
Fixing this, it could read:
P is of the form (P_1 ... P_{k_1} P_{k_1+1} <ellipsis> P_{k_1+2} ... P_{k_2} . P_{k_2+1}) where E is a list or improper list of n elements, the first k_1 of which match P_1 through P_{k_1}, whose next m-k_1 elements each match P_{k_1+1}, whose remaining n-m elements match P_{k_1+2} through P_{k_2}, and whose nth and final cdr matches P_{k_2+1}; or
It's also possible to maintain P_e and P_x (at the cost of potential ambiguity) with:
P is of the form (P_1 ... P_k P_e <ellipsis> P_{k+1} ... P_r . P_x) where E is a list or improper list of n elements, the first k of which match P_1 through P_k, whose next m-k elements each match P_e, whose remaining n-m elements match P_{k+1} through P_r, and whose nth and final cdr matches P_x; or
Two of the other clauses have similar issues.  BTW, an earlier clause uses "tail" in place of "cdr", 

Anyway, back on topic, I think the "and final" was unintentional and should be removed.  It is the final cdr of the pattern, but not necessarily of the list.
Otherwise, we'd have to say that (x y ... . z) can't match (8 7 8), because the final cdr of the latter is ().
I think we want the final cdr of the pattern to be able to reach into at least a proper list, if not an improper one.
The former only can be allowed with (using the second format):
P is of the form (P_1 ... P_k P_e <ellipsis> P_{k+1} ... P_r . P_x) where E is a list of n or more elements or an improper list of n elements, the first k of which match P_1 through P_k, whose next m-k elements each match P_e, whose remaining n-m elements match P_{k+1} through P_r, and whose nth cdr matches P_x; or
Both can be allowed with:
P is of the form (P_1 ... P_k P_e <ellipsis> P_{k+1} ... P_r . P_x) where E is a list or an improper list of n or more elements, the first k of which match P_1 through P_k, whose next m-k elements each match P_e, whose next l-m elements match P_{k+1} through P_q, and whose lth cdr matches P_x, if l=n and q=r, or (P_{q+1} ... P_r . P_x)), if l < n; or
Alex's comments have been suggesting that this (the final cdr of the pattern reaching into an improper list) isn't allowed, but I don't see any reason to exclude it. 

If there's no "correct" example that can only be matched with backtracking, then there's no reason to introduce an error for (<pat> <ellipsis> . <id>).

That would still leave the question of whether the specification should enforce any of the multiple correct matchings (affecting the resulting bindings).  I've been assuming a particular "greedy" solution of <pat> <ellipsis> matching as much as possible, but "possible" here would need to be carefully defined to allow for saving list elements that could otherwise match for necessary subsequent pattern elements, and we should probably avoid the "greedy" terminology in distinguishing this from any other non-backtracking solutions. 



Aaron Hsu

unread,
Dec 7, 2022, 11:34:34 PM12/7/22
to WG1 Scheme Reports
On Wed, Dec 7, 2022, at 9:50 PM, Steven Ganz wrote:
> On Tuesday, December 6, 2022, at 12:30:19 AM, Aaron W. Hsu wrote:
>> I think the question arises when you draw an equivalency from the following chain:
>>
>> (<pat> . <id>) ←→ (<pat> <id> <ellipsis>)
>> =>
>> (<pat> <ellipsis> . <id>) ←→ (<pat> <ellipsis> <id> <ellipsis>)

> The last pattern is invalid because it has two ellipses as elements of
> the same list, right?

Yes, it is illegal, and that was the point that Chris was making in his original email. If such equivalencies hold, then it leads to an equivalence that is known to be illegal in the report. Chris' conclusion, as I understand it, is that this must imply that the patterns described are buggy. My alternative interpretation is that the equivalencies are in fact not true.

> Looking at it closely now. the report seems both confusing and wrong
> there, in using the same variable *n* for both the list expression
> length and the index of the last pattern element before the dot, and
> similarly reusing *m*.

My argument is that these are not in fact wrong. Assuming that a list has /n/ elements (not including the object in the final CDR), then there are /n/ CDRs, and the use of the same /n/ in both cases is intentional to bound them to the same value.

> Anyway, back on topic, I think the "and final" was unintentional and
> should be removed. It is the final cdr of the pattern, but not
> necessarily of the list.

I think it was intentional and that this is in fact meant to speak of the final CDR of the list, not of the pattern. The combination of the use of /n/ in reference to the length of /E/ as well as the use of the term final is a double emphasis on the position of /P_x/.

> Otherwise, we'd have to say that (x y ... . z) can't match (8 7 8),
> because the final cdr of the latter is ().

It does in fact match (8 7 8) and z is in fact (), as the final CDR. That's how I read it.

> I think we want the final cdr of the pattern to be able to reach into
> at least a proper list, if not an improper one.

Rather, I think the intention was that the final piece would either be () or the improper list's final CDR value. This allows you to match all of the elements of an improper or proper list using the same pattern and distinguishing one from the other reliably.

> That would still leave the question of whether the specification should
> enforce any of the multiple correct matchings (affecting the resulting
> bindings). I've been assuming a particular "greedy" solution of <pat>
> <ellipsis> matching as much as possible, but "possible" here would need
> to be carefully defined to allow for saving list elements that could
> otherwise match for necessary subsequent pattern elements, and we
> should probably avoid the "greedy" terminology in distinguishing this
> from any other non-backtracking solutions.

I think Chris' original assertions that a given pattern should have a single canonical and correct match without backtracking or ambiguity is correct. I don't think the pattern, as defined in the report, is intended to admit ambiguity as to the way in which the pattern is matched, and that the "greedy" match is in fact the one that was intended, with the final CDR matching the final /P_x/ pattern, which, with a proper list, will always be ().

Steven Ganz

unread,
Dec 8, 2022, 12:57:54 AM12/8/22
to scheme-reports-wg1
On Wednesday, December 7, 2022, at 11:34:10 PM< Aaron W. Hsu wrote:
>> Looking at it closely now. the report seems both confusing and wrong
>> there, in using the same variable *n* for both the list expression
>> length and the index of the last pattern element before the dot, and
>> similarly reusing *m*.
>My argument is that these are not in fact wrong. Assuming that a list has /n/ elements (not including the object in the final CDR), then there are /n/ CDRs, and
>the use of the same /n/ in both cases is intentional to bound them to the same value.
Oh, I see, the indeces of the non-repeating and non-tail pattern elements are referring to the list (or improper list) element that they are matching.  I guess I haven't been through that in a while and was thinking that they were referring to the position within the pattern.

>> I think we want the final cdr of the pattern to be able to reach into
>> at least a proper list, if not an improper one.
>Rather, I think the intention was that the final piece would either be () or the improper list's final CDR value. This allows you to match all of the elements of an
>improper or proper list using the same pattern and distinguishing one from the other reliably.
That does indeed seem to work as you describe and provide a non-ambiguous definition of matching.  It doesn't seem consistent with the (/P_1/ /P_2/ ... /P_n/ . /P_{n+1}/) case, though.  There, when matching against a proper list, the tail pattern need not bind an empty list.
So I think we have (1 2 3 . z) matching against (1 2 3 4), binding z to (4), but (1 y ... 3 . z) failing to match against (1 2 3 4), because the z can't bind to (4).  I was thinking that we wanted to allow that.  As you said, it comes down to whether there's a need to distinguish improper lists through patterns (other than by specifying, for example, literal or vector constant patterns to match the cdr).

>> That would still leave the question of whether the specification should
>> enforce any of the multiple correct matchings (affecting the resulting
>> bindings).  I've been assuming a particular "greedy" solution of <pat>
>> <ellipsis> matching as much as possible, but "possible" here would need
>> to be carefully defined to allow for saving list elements that could
>> otherwise match for necessary subsequent pattern elements, and we
>> should probably avoid the "greedy" terminology in distinguishing this
>> from any other non-backtracking solutions.
>I think Chris' original assertions that a given pattern should have a single canonical and correct match without backtracking or ambiguity is correct. I don't think the pattern, as defined in
>the report, is intended to admit ambiguity as to the way in which the pattern is matched, and that the "greedy" match is in fact the one that was intended, with the final CDR matching the
>final /P_x/ pattern, which, with a proper list, will always be ().
Agreed that those are attributes that we want to have.  I was looking to provide them while allowing a greater set of matches.  One could still remove ambiguity by imposing a greedy match, without constraining the cdr match in a way that seems inconsistent with other rules, which seem more intuitive to me.  But yes, I also agree (now) that what's already in the report does meet Chris's constraints.



Aaron Hsu

unread,
Dec 8, 2022, 1:56:24 AM12/8/22
to WG1 Scheme Reports
On Thu, Dec 8, 2022, at 12:57 AM, Steven Ganz wrote:
> So I think we have (1 2 3 . z) matching against (1 2 3 4), binding z to
> (4), but (1 y ... 3 . z) failing to match against (1 2 3 4), because
> the z can't bind to (4). I was thinking that we wanted to allow that.

Yes, I think that the examples you give reflect the current state of the report, and you couldn't get z bound to (4) in the second case. While I could see wanting to allow that, I think that doing so immediately gets you to the problem Chris pointed out, which is that such a possibility introduces ambiguity and a requirement for backtracking at that point; that removes it from serious consideration, I think, as the intended behavior of the report.

Steven Ganz

unread,
Dec 8, 2022, 2:42:15 AM12/8/22
to scheme-reports-wg1
Yes, I was just realizing that my comment about imposing greedy matching while allowing more general matching of the cdr pattern holds for the given (<pat> <ellipsis> . <id>), but not for more general patterns, and in particular not where the cdr pattern is a list pattern with its own ellipsis.  That could be disallowed, or it doesn't seem like much of an imposition to disallow list patterns following a dot altogether.  Do you see any other problematic cases?  The alternatives are to tighten the earlier rule (either change brings up potential compatibility issues) or to live with the inconsistency.



Steven Ganz

unread,
Dec 8, 2022, 10:10:06 AM12/8/22
to scheme-reports-wg1
If, as I suspect, that (and its recursive extension) are the only problematic cases, perhaps the least obtrusive way to safely allow both examples, compatibility-wise, would be to evaluate patterns prior to matching.  Evaluating a pattern would recursively splice list and improper list patterns in the cdr into the surrounding improper list pattern, creating a resulting list or improper list pattern.  If the resulting pattern has multiple ellipses as elements, it's invalid.  The canonical match counts the required pattern elements following the ellipsis in that resulting pattern and saves that many list elements in matching the repeating pattern.  This should satisfy the constraint, as I see it, which is that backtracking is unnecessary, but no match should fail if a successful match could have been found by backtracking (matching fewer list elements against the repeating pattern).  To maintain full consistency with the non-ellipsis rule, I would also suggest allowing an improper list pattern to match against an improper list with more elements.



Aaron Hsu

unread,
Dec 8, 2022, 5:19:15 PM12/8/22
to WG1 Scheme Reports
On Thu, Dec 8, 2022, at 10:10 AM, Steven Ganz wrote:
> If, as I suspect, that (and its recursive extension) are the only
> problematic cases, perhaps the least obtrusive way to safely allow both
> examples, compatibility-wise, would be to evaluate patterns prior to
> matching.

At this point, I've lost what we're talking about. I think the point was to figure out whether there is a bug in the R7RS report, and if so, how to address it. My current understanding is that there is not a bug in the way the report specifies its behavior, but there is perhaps some area for misunderstanding.

What you seem to be proposing is new behaviors that aren't described in the report, or am I missing something? A pattern like (P ... . (X ...)) is already illegal in the report, I believe.

Chris Hanson

unread,
Dec 8, 2022, 10:19:43 PM12/8/22
to scheme-reports-wg1
Following up after viewing the conversation on the web archive of the list.
If you wish me to be involved in the discussion please CC me as I’m not on the
mailing list.

I missed the text "and final cdr" in my reading. That resolves any ambiguity
I see here. It also disallows the case that started me thinking about this,
which was a pattern of the form (a b . c) where c was intended to match the
tail, much like a rest argument in a lambda list. Given the "final" text
that’s not a valid use of the pattern, and I’ll change my implementation.

I would have preferred that the pattern only match a dotted list, but that’s a
minor nit. The only advantage to that is it removes any redundancy between
proper-list patterns and dotted-list patterns.

Two other nits:

1. A wording ambiguity in "The <pattern> in a <syntax-rule> is a list
<pattern>" which doesn’t specify whether the pattern is a proper-list pattern
or a dotted-list pattern. I suppose it could be either, though requiring a
proper-list pattern makes more sense here since dotted lists aren’t allowed as
special forms. It might be nice to clarify what is meant though.

2. The requirement that “Pattern variables [...] followed by one or more
instances of the identifier <ellipsis> are allowed only [...] followed by as
many instances of <ellipsis>.” is insufficient. For example, the pattern

(foo ((a b) <ellipsis>) c <ellipsis>)

could, according to this text, be paired with a template containing

... ((a c) <ellipsis>) b <ellipsis> ...

but this isn’t a reasonable thing to do. The problem is that while a and b
are guaranteed to be the same length, a and c are not. So in a sense there is
a uniqueness to the ellipses that needs to be taken into account. This was
relatively easy to implement, but it’s not obvious to me how to word it.

(I feel like I’m an apprentice to Julie Sussman PPA with all this close
reading and nit picking.)

Best wishes,
Chris



Steven Ganz

unread,
Dec 9, 2022, 1:15:40 AM12/9/22
to scheme-reports-wg1

On Thu, Dec 8, 2022, at 12:57 AM, Steven Ganz wrote:
> So I think we have (1 2 3 . z) matching against (1 2 3 4), binding z to
> (4), but (1 y ... 3 . z) failing to match against (1 2 3 4), because
> the z can't bind to (4).  I was thinking that we wanted to allow that.
On Thu, December 8, 2022, at 5:18:53 PM, Aaron W. Hsu wrote:
>At this point, I've lost what we're talking about. I think the point was to figure out whether there is a bug in the R7RS report, and if so, how to address it. My
>current understanding is that there is not a bug in the way the report specifies its behavior, but there is perhaps some area for misunderstanding.
>
>What you seem to be proposing is new behaviors that aren't described in the report, or am I missing something? A pattern like (P ... . (X ...)) is already illegal in
>the report, I believe.

As far I as I know, an ellipsis as an element in an improper list pattern and also within the cdr pattern is permitted in the report.  If you know that it's prohibited, I'd be curious to see where, but as long as we have the "and final" text in there, it shouldn't matter, because the outer pattern can match independently of how the cdr pattern is matched (so the two ellipses are handled separately).

We're in agreement that we're not required to make any changes, because what's in the report doesn't violate the constraints that we expect to hold.

The reason we might want to is to relieve what I see as a blemish in what is specified by the report, the inconsistency in how the two examples above are handled.  I've found that confusing and I think others are likely to as well.  It can be fixed either by restricting matching in the first example or expanding it in the second.  Again, there should be some discussion of code compatibility before changing either.  Maintaining consistency by expanding matching in the second example has the advantage that the additional matching seems more intuitive and presumably useful (although some may argue otherwise).  That would involve both removing "and final", which would make any nested ellipses problematic, and imposing greediness in some other way, to avoid violating our constraints.  My recent comments were proposing a way of navigating this to expand matching in the second example, I believe without creating any of the problems that we've been confirming that we're currently steering clear of.

At least that's how I see it at the moment.



Steven Ganz

unread,
Dec 9, 2022, 2:37:53 AM12/9/22
to scheme-reports-wg1, c...@chris-hanson.org

On Thu, December 8, 2022, at 10:19:36 PM, Chris Hanson wrote:
> Following up after viewing the conversation on the web archive of the list.  
>If you wish me to be involved in the discussion please CC me as I’m not on the
>mailing list.
Just saw this, so didn't cc you on my last response.  You can find it in the archive.

>I missed the text "and final cdr" in my reading.  That resolves any ambiguity
>I see here.  It also disallows the case that started me thinking about this,
>which was a pattern of the form (a b . c) where c was intended to match the
>tail, much like a rest argument in a lambda list.  Given the "final" text
>that’s not a valid use of the pattern, and I’ll change my implementation.
I believe that IS a valid use of the pattern, as the report stands.  The "final" text is only under the rule for an improper list pattern with an ellipsis.  Without an ellipsis, as above, the report says "/E/ is a list or improper list of /n/ or more elements ...".  In other words it works like the lambda pattern, but more general, which I think is what most people would expect.   Without the ellipsis, there's no need for backtracking, so this shouldn't cause problems, but as I pointed out, the difference seems inconsistent and confusing. 


>1. A wording ambiguity in "The <pattern> in a <syntax-rule> is a list
><pattern>" which doesn’t specify whether the pattern is a proper-list pattern
>or a dotted-list pattern.  I suppose it could be either, though requiring a
>proper-list pattern makes more sense here since dotted lists aren’t allowed as
>special forms.  It might be nice to clarify what is meant though.
Given the comment above, I think an improper list pattern could make sense, but we usually use "list" for "proper list" and make it explicit if an improper list is allowed.  So not sure what that should be.

>2. The requirement that “Pattern variables [...] followed by one or more
>instances of the identifier <ellipsis> are allowed only [...] followed by as
>many instances of <ellipsis>.” is insufficient.
>So in a sense there is
>a uniqueness to the ellipses that needs to be taken into account.  This was
>relatively easy to implement, but it’s not obvious to me how to word it.
After playing with it for a while, it's not coming to me either.  I'd like to add something like "if two pattern variables appear under the same occurrences of <ellipsis> in the template, they must appear under the same occurrences of <ellipsis> in the pattern".  But that doesn't address pattern variables, one under a suffix of the sequence of occurrences of <ellipsis> of the other.

>(I feel like I’m an apprentice to Julie Sussman PPA with all this close
>reading and nit picking.)
Thanks for kicking off the discussion!



Aaron Hsu

unread,
Dec 9, 2022, 3:05:43 AM12/9/22
to WG1 Scheme Reports
On Fri, Dec 9, 2022, at 1:15 AM, Steven Ganz wrote:
> As far I as I know, an ellipsis as an element in an improper list
> pattern and also within the cdr pattern is permitted in the report. If
> you know that it's prohibited, I'd be curious to see where, but as long
> as we have the "and final" text in there, it shouldn't matter, because
> the outer pattern can match independently of how the cdr pattern is
> matched (so the two ellipses are handled separately).

I think because (A ... . (B ...)) is literally equivalent to (A ... B ...) that this would be considered illegal and not allowed. I think an implication is that if you have anything that is a list pattern in the CDR of your pattern, that's going to be treated the same as if you didn't use the dotted notation to represent it, so the restriction on what makes a valid pattern still applies there, and it wouldn't "match" the dotted pattern, but one of the list patterns.

> The reason we might want to is to relieve what I see as a blemish in
> what is specified by the report, the inconsistency in how the two
> examples above are handled. I've found that confusing and I think
> others are likely to as well.

I agree that it's subtle and easy to miss at the least. It might be worth it to make some kind of "out of band" commentary or note about this, but at the same time, I'm not sure it's worth actually changing the report to clarify, given that I don't think there's anything actually technically incorrect. However, that's the extent I'll push one way or the other, and I'll probably have to bow out of this now and let the rest of you all handle things! ;-)

Steven Ganz

unread,
Dec 9, 2022, 5:58:10 PM12/9/22
to scheme-reports-wg1

On Fri, Dec 9, 2022, at 3:05 AM, Aaron W. Hsu wrote:
>I think because (A ... . (B ...)) is literally equivalent to (A ... B ...) that this would be considered illegal and not allowed. I think an implication is that if you have
>anything that is a list pattern in the CDR of your pattern, that's going to be treated the same as if you didn't use the dotted notation to represent it, so the
>restriction on what makes a valid pattern still applies there, and it wouldn't "match" the dotted pattern, but one of the list patterns.
I can see why that equivalence would seem second-nature.  But as I see it, the report defines a transformation from markings on the page (such as "syntax-rules"), defined by a grammar, to other markings on the page, defined by another grammar, and must cover the full distance between the two grammars, whether the work in an actual implementation is to be performed, for example, by a reader, preprocessor, or pretty-printer, and within a compiler or interpreter, not between box-and-arrow diagrams generated by a reader and consumed by a pretty-printer.  If we want to define separate transformations for individual stages, we should do that explicitly, with intermediate grammars.
So I think when we make a statement or implicit assumption about a syntactic class, it's expected to hold over markings of that class, not over corresponding diagrams or using unstated equivalences, however second-nature.  But this may be getting a bit philosophical and for the particular issue at hand regarding the ellipses, it shouldn't matter, given the current state of the report including "and final".  

>I agree that it's subtle and easy to miss at the least. It might be worth it to make some kind of "out of band" commentary or note about this,
Such commentary (in or out of the report) may be helpful to implementers or to those defining subsequent versions of the language, but it's unlikely to help those currently using the language.

> but at the same time,
>I'm not sure it's worth actually changing the report to clarify, given that I don't think there's anything actually technically incorrect. However, that's the extent I'll >push one way or the other, and I'll probably have to bow out of this now and let the rest of you all handle things! ;-)
I've said my piece on this topic as well, so will also pipe down.  If others have an opinion to share, they should speak up.  



Chris Hanson

unread,
Dec 9, 2022, 9:14:59 PM12/9/22
to scheme-reports-wg1
On Friday, December 9, 2022 at 12:05:43 AM UTC-8 Aaron W. Hsu wrote:
On Fri, Dec 9, 2022, at 1:15 AM, Steven Ganz wrote:
> As far I as I know, an ellipsis as an element in an improper list
> pattern and also within the cdr pattern is permitted in the report. If
> you know that it's prohibited, I'd be curious to see where, but as long
> as we have the "and final" text in there, it shouldn't matter, because
> the outer pattern can match independently of how the cdr pattern is
> matched (so the two ellipses are handled separately).

I think because (A ... . (B ...)) is literally equivalent to (A ... B ...) that this would be considered illegal and not allowed. I think an implication is that if you have anything that is a list pattern in the CDR of your pattern, that's going to be treated the same as if you didn't use the dotted notation to represent it, so the restriction on what makes a valid pattern still applies there, and it wouldn't "match" the dotted pattern, but one of the list patterns.

Agreed. The presence of a dot is not what makes this a "dotted" pattern. After the reader is done with that syntax there's no dot.

The syntax is overly general in this case: the "dotted" part of the pattern can never be a list pattern, even though it is permitted by the BNF.

Chris Hanson

unread,
Dec 9, 2022, 9:19:23 PM12/9/22
to scheme-reports-wg1
On Thursday, December 8, 2022 at 11:37:53 PM UTC-8 Steve Ganz wrote:

On Thu, December 8, 2022, at 10:19:36 PM, Chris Hanson wrote:
> Following up after viewing the conversation on the web archive of the list.  
>If you wish me to be involved in the discussion please CC me as I’m not on the
>mailing list.
Just saw this, so didn't cc you on my last response.  You can find it in the archive.

>I missed the text "and final cdr" in my reading.  That resolves any ambiguity
>I see here.  It also disallows the case that started me thinking about this,
>which was a pattern of the form (a b . c) where c was intended to match the
>tail, much like a rest argument in a lambda list.  Given the "final" text
>that’s not a valid use of the pattern, and I’ll change my implementation.
I believe that IS a valid use of the pattern, as the report stands.  The "final" text is only under the rule for an improper list pattern with an ellipsis.  Without an ellipsis, as above, the report says "/E/ is a list or improper list of /n/ or more elements ...".  In other words it works like the lambda pattern, but more general, which I think is what most people would expect.   Without the ellipsis, there's no need for backtracking, so this shouldn't cause problems, but as I pointed out, the difference seems inconsistent and confusing. 

I don't have a problem with interpreting the dotted pattern tail differently depending on whether the LHS has an ellipsis. I agree that it's a little confusing and inconsistent, not to mention being unnecessary since one could always just use a non-dotted list pattern with an ellipsis at the end. But as long as the match is properly pinned down on the RHS I'm happy.

Aaron Hsu

unread,
Dec 10, 2022, 4:20:11 AM12/10/22
to Chris Hanson, WG1 Scheme Reports
On Fri, Dec 9, 2022, at 9:14 PM, Chris Hanson wrote:
> The syntax is overly general in this case: the "dotted" part of the
> pattern can never be a list pattern, even though it is permitted by the
> BNF.

That's how I see it. The BNF allows for any pattern, but the actual set of applicable patterns is smaller than this. I suspect that this was a "shortcut" to simplify the number of patterns, so that all of the valid patterns in a dotted pattern didn't have to be explicitly enumerated and formalized.

Steven Ganz

unread,
Dec 11, 2022, 6:37:05 PM12/11/22
to scheme-reports-wg1, Chris Hanson

On Fri, Dec 10, 2022, at 4:19 AM, Aaron W. Hsu wrote:
> On Fri, Dec 9, 2022, at 9:14 PM, Chris Hanson wrote:
>> The syntax is overly general in this case: the "dotted" part of the
>> pattern can never be a list pattern, even though it is permitted by the
>> BNF.
>
>That's how I see it. The BNF allows for any pattern, but the actual set of applicable patterns is smaller than this. I suspect that this was a "shortcut" to simplify
>the number of patterns, so that all of the valid patterns in a dotted pattern didn't have to be explicitly enumerated and formalized.

If the grammar were too general and just taking a shortcut from the actual grammar, so that it really should be replaced with the more specific grammar (whether we chose to do that or not), I believe we'd have to accept either:
A) that (<pat> . (<pat>)) is not really a pattern and (<expr> . (<expr>)) is not really an expression, so any conforming implementations should have the reader give errors for them.  I doubt that any of us believe that; or
B) that, as Chris described, the grammar that matters is for the language after the reader.  But I think that would commit us to both of the following:
1) that it's fine for a standards document to describe a language that is not what is entered by any programmer, and for that matter is not even necessarily directly generated by the reader; and
2) that because the reader is outside the scope of the language definition, it's OK for a conforming implementation to give errors for (<pat> . (<pat>)) or (<expr> . (<expr>)) using a non-standard reader; or  
C) that it's fine to treat the grammar as needing to be taken literally in some contexts but not in others, with us being under no obligation to say which is which (or attempting to distinguish them with some commentary).

I'm suspecting that option (C) above will be popular, but if we wanted to do something more formally correct and complete, there are two clear possibilities:
D)  Take the existing grammar literally and define the semantics to include the operation of the reader throughout (my earlier proposal fell under this approach); or
E)  Define both grammars, semantics for a reader working off of the former and generating the latter, and, for example, semantics for an evaluator working off of the latter, such that conforming applications need to work as if they were composing a conforming reader with a conforming evaluator (this would involve fewer modifications to the report, but more additions).  This is closer to an implementor's view, but IMO it would be better, if imposing that separation, to do so completely, creating separate specifications for the reader and evaluator, and fixing the intermediate representation.
All of this is separate from the "and final" issue, and I'd say less important, but related as described below. 

I do see benefit in using the terminology of "proper list" and "improper list", especially for values, and that is certainly common vocabulary.  But really, I think that we'd do well to avoid making the proper/improper distinction unnecessarily relevant for (<expr> . (<expr>)) or (<pat> . (<pat>)).  It would be nice if the semantics for (/E_1/ ... /E_n/ . (/E_{n+1}/)) were always the same as that for (/E_1/ ... /E_n/ /E_{n+1}/), and for that matter the semantics for (/E_1/ ... /E_k/ /E_e/ <ellipsis> /E_{m+1}/ ... /E_n/ . (/E_{n+1}/)) were always the same as that for (/E_1/ ... /E_k/ /E_e/ <ellipsis> /E_{m+1}/ ... /E_n/ /E_{n+1}/) (if only by reverting to the other clause of our definition), and similarly for patterns.  Then it wouldn't matter in those cases which clause were triggered (literally or otherwise).  And then we wouldn't have to care whether (<expr> . (<expr>)) or (<pat> . (<pat>)) is "proper" or not, if we had to use the terminology in that context.  If the action of the reader engenders an additional equivalence (Aaron calls them "literally equivalent"), which I agree that it should, then definitions over the larger grammar should respect that equivalence.  I believe that they currently do not, with (applying the definitions literally) (1 2 ... 3 . (4 5)) failing to match against (1 2 3 4 5), but (1 2 ... 3 4 5) matching (1 2 3 4 5).  So I think this sheds some additional light on that other discussion.




Steven Ganz

unread,
Dec 11, 2022, 8:26:48 PM12/11/22
to scheme-reports-wg1, Chris Hanson
Because there's been a lot of back-and-forth on this, I'd like to emphasize that all my comments are intended constructively and provided with great respect for everybody on the team that put this document together, many of whom put in a lot more time and effort than I did.


Alex Shinn

unread,
Dec 11, 2022, 8:33:05 PM12/11/22
to scheme-re...@googlegroups.com, Chris Hanson
I too apologize, I was trying to be disrespectful but didn't have time to do it justice.  Consider yourselves insulted.

--
Alex

On Mon, Dec 12, 2022 at 10:26 AM Steven Ganz <steve...@genetius.com> wrote:
Because there's been a lot of back-and-forth on this, I'd like to emphasize that all my comments are intended constructively and provided with great respect for everybody on the team that put this document together, many of whom put in a lot more time and effort than I did.


--
You received this message because you are subscribed to the Google Groups "scheme-reports-wg1" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scheme-reports-...@googlegroups.com.

Alex Shinn

unread,
Dec 11, 2022, 9:42:34 PM12/11/22
to scheme-re...@googlegroups.com, Chris Hanson
(forgot the smiley, obviously I'm not trying to offend anyone, my normally weird sense of humor has probably strayed into incomprehensibility due to sleep deprivation caused by a 3 week old baby)

Steven Ganz

unread,
Dec 11, 2022, 10:02:25 PM12/11/22
to scheme-reports-wg1, Chris Hanson
You had me going!


From: "Alex Shinn" <alex...@gmail.com>
To: "scheme-reports-wg1" <scheme-re...@googlegroups.com>, "Chris Hanson" <c...@chris-hanson.org>
Sent: Sunday, December 11, 2022 9:42:18 PM

Subject: Re: [scheme-reports-wg1] Probable bug in R7RS syntax-rules

Arthur A. Gleckler

unread,
May 4, 2023, 8:14:07 PM5/4/23
to scheme-re...@googlegroups.com, Chris Hanson
Sorry, but did we ever come to a resolution on this thread?  I don't see a consensus, and I want to make sure that the clarification or correction isn't lost.

Alex Shinn

unread,
May 9, 2023, 12:07:57 AM5/9/23
to scheme-re...@googlegroups.com, Chris Hanson
To recap:

We're discussing this syntax:

  (P1 . . . Pk Pe ⟨ellipsis⟩ Pm+1 . . . Pn . Px)

The motivation of this is to allow variadic improper list patterns, which could not otherwise be handled.

Chris' concern was that an ellipsis followed by a dotted tail is ambiguous in where the split occurs, requiring backtracking.

Aaron pointed out that there is no ambiguity, quoting the standard:

"P is of the form (P1 . . . Pk Pe ⟨ellipsis⟩ Pm+1 . . .
Pn . Px) where E is a list or improper list of n elements, the first k of which match P1 through Pk,
whose next m − k elements each match Pe, whose remaining n−m elements match Pm+1 through Pn, and
whose nth and final cdr matches Px; ..."

Here "nth and final cdr" means there is no ambiguity: Px must always be nil or a non-pair,
so the greedy approach in Chibi is correct and no backtracking is needed.

--
Alex

On Fri, May 5, 2023 at 9:14 AM Arthur A. Gleckler <sch...@speechcode.com> wrote:
Sorry, but did we ever come to a resolution on this thread?  I don't see a consensus, and I want to make sure that the clarification or correction isn't lost.

--
You received this message because you are subscribed to the Google Groups "scheme-reports-wg1" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scheme-reports-...@googlegroups.com.

Arthur A. Gleckler

unread,
May 9, 2023, 12:43:15 AM5/9/23
to scheme-reports-wg1, Chris Hanson
On Mon, May 8, 2023 at 9:07 PM Alex Shinn <alex...@gmail.com> wrote:
To recap:

Thanks.  It sounds like there's nothing to do.

(I keep track of all unresolved SRFI threads as messages come in, and accidentally marked this one as unresolved in the process.)
Reply all
Reply to author
Forward
0 new messages