
Numbered regexps throw invalid regex error


Tom

Jan 25, 2012, 5:17:26 AM
to help-gn...@gnu.org
According to the docs:

This construct allows you to force a particular
group number. There is no particular restriction on the numbering,

but when I try this:

(looking-at "\\(a\\(?1:b\\)\\)")

then it throws an error.

I know it's possibly a collision with the containing group's number
and I can overcome it by using a larger number, but the doc says
I can use it to force a group number, so why does it throw an error
then?

Is it a bug? I tried it with GNU Emacs 23.2.1




Stefan Monnier

Jan 25, 2012, 9:10:24 AM
> This construct allows you to force a particular
> group number. There is no particular restriction on the numbering,
> but when I try this:
> (looking-at "\\(a\\(?1:b\\)\\)")
> then it throws an error.
> I know it's possibly a collision with the containing group's number
> and I can overcome it by using a larger number,

Rather than a larger number, you can just stop the outer group from
getting a number: (looking-at "\\(?:a\\(?1:b\\)\\)") which happens to be
the same as (looking-at "\\(?:a\\(b\\)\\)").
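A minimal sketch of the difference between the two forms, using `string-match` on strings instead of `looking-at` on buffer text. The helper name `my-regexp-valid-p` is hypothetical and exists only for this illustration; the rejection of the nested form is the behavior reported in this thread and may differ on other Emacs versions.

```elisp
;; Hypothetical helper: non-nil if REGEXP compiles, nil if Emacs rejects it.
(defun my-regexp-valid-p (regexp)
  (condition-case nil
      (progn (string-match regexp "") t)
    (invalid-regexp nil)))

;; Group 1 is requested again while the implicit group 1 is still open.
;; This is the form reported in this thread as signaling an error.
(my-regexp-valid-p "\\(a\\(?1:b\\)\\)")

;; With a shy outer group the regexp is accepted, and group 1 is the
;; inner group only.
(my-regexp-valid-p "\\(?:a\\(?1:b\\)\\)")
(let ((s "ab"))
  (and (string-match "\\(?:a\\(?1:b\\)\\)" s)
       (match-string 1 s)))
```

The helper matches against an empty string because only compilation of the regexp matters here, not whether it matches.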

> but the doc says I can use it to force a group number, so why does it
> throw an error then?

Because having the same group nested in "itself" leads to tricky
questions about the expected semantics: when matching
"\\(a\\(?1:b\\)c\\)", should a successful match's group1 match "abc" or
"b" or "ab" or "bc"? You'd probably want either "abc" or "b" but the
most likely behavior, given the current implementation, would be "bc".

> Is it a bug? I tried it with GNU Emacs 23.2.1

I (as implementor of the feature, and maintainer of Emacs) don't
consider it a bug, no.
But I'd like to hear more context explaining why it's a problem for you.


Stefan

Andreas Röhler

Jan 25, 2012, 12:21:23 PM
to help-gnu-emacs@gnu.org List
On 25.01.2012 11:17, Tom wrote:
> (looking-at "\\(a\\(?1:b\\)\\)")
>
AFAIU there are several errors.

I think you can't refer to the first match inside itself.

here is a working example matching "aba"

(looking-at "\\(a\\)\\(b\\)\\(\\1\\)")aba

Tom

Jan 25, 2012, 12:51:43 PM
to help-gn...@gnu.org
Andreas Röhler <andreas.roehler <at> easy-emacs.de> writes:

>
> On 25.01.2012 11:17, Tom wrote:
> > (looking-at "\\(a\\(?1:b\\)\\)")
> >
> AFAIU there are several errors.
>
> I think you can't refer to the first match inside itself.
>

You misunderstand the feature.

It's not a backreference. It's an explicit numbering of
the group, so it doesn't change if you add more parens:

`\(?NUM: ... \)'
is the "explicitly numbered group" construct. Normal groups get
their number implicitly, based on their position, which can be
inconvenient. This construct allows you to force a particular
group number. There is no particular restriction on the numbering,
e.g. you can have several groups with the same number in which case
the last one to match (i.e. the rightmost match) will win.
Implicitly numbered groups always get the smallest integer larger
than the one of any previous group.
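As a small illustration of the two rules in the quoted documentation, the sketch below reuses a group number in separate alternatives and then shows how an implicit group is numbered after an explicit one. The helper name `my-first-group` is made up for this example.

```elisp
;; Hypothetical helper for this illustration.
(defun my-first-group (regexp string)
  "Return the text of group 1 after matching REGEXP against STRING, or nil."
  (and (string-match regexp string)
       (match-string 1 string)))

;; The same number in two alternatives: whichever branch matches
;; fills group 1.
(my-first-group "\\(?1:foo\\)\\|\\(?1:bar\\)" "bar")   ; "bar"
(my-first-group "\\(?1:foo\\)\\|\\(?1:bar\\)" "foo")   ; "foo"

;; An implicit group that follows an explicit group 3 gets number 4,
;; the smallest integer larger than any previous group number.
(let ((s "ab"))
  (and (string-match "\\(?3:a\\)\\(b\\)" s)
       (list (match-string 3 s) (match-string 4 s))))  ; ("a" "b")
```

In both cases the groups that share or follow a number are siblings, not nested inside each other, which is the situation the documentation describes.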



Andreas Röhler

Jan 25, 2012, 3:02:13 PM
to help-gn...@gnu.org
On 25.01.2012 18:51, Tom wrote:
> Andreas Röhler <andreas.roehler <at> easy-emacs.de> writes:
>
>>
>> On 25.01.2012 11:17, Tom wrote:
>>> (looking-at "\\(a\\(?1:b\\)\\)")
>>>
>> AFAIU there are several errors.
>>
>> I think you can't refer to the first match inside itself.
>>
>
> You misunderstand the feature.
>
> It's not a backreference. It's an explicit numbering of
> the group, so it doesn't change if you add more parens:
>
> `\(?NUM: ... \)'
> is the "explicitly numbered group" construct. Normal groups get
> their number implicitly, based on their position, which can be
> inconvenient. This construct allows you to force a particular
> group number. There is no particular restriction on the numbering,
> e.g. you can have several groups with the same number in which case
> the last one to match (i.e. the rightmost match) will win.
> Implicitly numbered groups always get the smallest integer larger
> than the one of any previous group.
>
>
>
>

Okay, next try :)

you can't assign a group number already assigned automatically

that would work:

(looking-at "\\(?2:a\\(?1:b\\)c\\)")
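A quick sketch of what each group holds with this numbering, using `string-match` on a string so that it can be evaluated anywhere. The variable name `s` is only for illustration.

```elisp
;; The outer group is explicitly number 2 and the inner one number 1,
;; so the two numbers do not collide.
(let ((s "abc"))
  (and (string-match "\\(?2:a\\(?1:b\\)c\\)" s)
       (list (match-string 1 s)     ; inner group
             (match-string 2 s))))  ; outer group
;; => ("b" "abc")
```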





Tom

Jan 25, 2012, 3:14:07 PM
to help-gn...@gnu.org
Andreas Röhler <andreas.roehler <at> easy-emacs.de> writes:
>
> you can't assign a group number already assigned automatically
>

The doc clearly says:

"There is no particular restriction on the numbering,
e.g. you can have several groups with the same number in which case
the last one to match (i.e. the rightmost match) will win."



There is no mention of an automatic assignment. It says the last one
will win, so assigning a number which is already used for
a previous group should work.




Andreas Röhler

Jan 25, 2012, 3:37:36 PM
to help-gn...@gnu.org
I would consider it a bug then, or at least a documentation bug.


Tom

Jan 26, 2012, 3:30:00 AM
to help-gn...@gnu.org
I don't think it's a doc bug. The whole point of the feature is that
you can assign constant numbers to groups, so they don't change if you
add/remove parens.

If it can suddenly throw a regexp error just because you've added
a paren around another paren, then it pretty much defeats the
purpose of the feature.



Andreas Röhler

Jan 26, 2012, 4:29:28 AM
to help-gn...@gnu.org
Probably. OTOH one may argue that the error message helps with counting.

I could see keeping the static variant as is, since it provides some safety.

OTOH one may think of dynamic environments which can't be handled that way.

Anyway, I'm in favor of a bug report.




Tim Landscheidt

Jan 26, 2012, 6:52:45 AM
to help-gn...@gnu.org
Tom <adatg...@gmail.com> wrote:

>> > There is no mention of an automatic assignment. It says the last one
>> > will win, so assigning a number which is already used for
>> > a previous group should work.

>> would consider it a bug then - at least a docu bug

> I don't think it's a doc bug. The whole point of the feature is that
> you can assign constant numbers to groups, so they don't change if you
> add/remove parens.

> If it can suddenly throw a regexp error just because you've added
> a paren around another paren, then it pretty much defeats the
> purpose of the feature.

I'd agree that the documentation should be more verbose, but
I don't think that your argument that it'd be a bug is valid.
If someone added another pair of parentheses around
"a\\(?1:b\\)", either they don't want to refer to it, thus
using "\\(?:", or if they want to refer to it, they have to
think about how to do that anyhow.

How would you rephrase the documentation so that the current
behaviour is more comprehensibly described?

Tim


Tom

Jan 26, 2012, 9:36:50 AM
to help-gn...@gnu.org
Tim Landscheidt <tim <at> tim-landscheidt.de> writes:

>
> I'd agree that the documentation should be more verbose, but
> I don't think that your argument that it'd be a bug is valid.
> If someone added another pair of parentheses around
> "a\\(?1:b\\)", either they don't want to refer to it, thus
> using "\\(?:"


Why should they use "\\(?:" explicitly if they don't care about it?

I personally never use the shy group construct, because
if I don't want to use a group's value then I simply ignore it.
It will capture some data, but who cares?

That's what explicit numbering of groups is about: I can
number the groups I'm interested in and simply ignore
the others, because I know that adding and removing groups should not
affect my explicitly numbered groups.

The proper solution is fixing the implementation, so it honors
the user's explicit choices, not forcing the user to change the
regexp.



Stefan Monnier

Jan 27, 2012, 8:52:50 AM
> The proper solution is fixing the implementation, so it honors
> the user's explicit choices, not forcing the user to change the
> regexp.

Using \(...\) is not an explicit statement that you don't care about
this group. Only \(?:...\) is.
So when you have \(a\(?1:b\)c\), the two uses of group number 1 have
equal importance. The doc does say "the rightmost match takes
precedence" but now you tell me which of the two is "rightmost": the
inner one because it starts to the right of the outer's beginning, or
the outer one because it ends to the right of the inner one?


Stefan

Barry Margolin

Jan 27, 2012, 12:18:31 PM
In article <jwvk44ds1a3.fsf-mon...@gnu.org>,
Since numeric counting of groups is based on the beginnings, it would be
consistent for "rightmost" to use the same convention.
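To make the "counting is based on the beginnings" convention concrete, here is a short sketch with implicitly numbered nested groups. It only demonstrates how implicit numbers are assigned; it does not show how a nested explicit group 1 would or should behave.

```elisp
;; Implicit groups are numbered by the position of their opening
;; paren: the outer group opens first and is 1, the inner group is 2.
(let ((s "abc"))
  (and (string-match "\\(a\\(b\\)c\\)" s)
       (list (match-string 1 s)     ; outer group
             (match-string 2 s))))  ; inner group
;; => ("abc" "b")
```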

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***