
Scoping rules for (?imsx-imsx)


Edi Weitz

Sep 11, 2002, 8:19:02 PM
man perlre isn't particularly verbose about the scope of the embedded
pattern-match modifiers (?imsx-imsx) except for the sentence "These
modifiers are localized inside an enclosing group (if any)." What I'm
missing is a clear explanation where, say, the case-insensitivity
introduced by (?i) starts and where exactly it ends again. I wrote a
small program to test different positions and groupings and I'm quite
confused now:

#!/usr/bin/perl

# note that the following assignment is one line
@a = qw{a.b (?s)a.b a.b(?s) (?:a.b(?s)) (?:(?s)a.b) (?s)a(?:.)b (?s)a(?-s:.)b (?s)a(?:(?-s).)b (?s)a(?-s:.(?s))b (?s)a(?-s:(?s).)b a(?s:.)b(?-s)};

$x = "a\nb";

# First pass: does the dot match the newline in "a\nb"?
for $r (@a) {
    printf "%2d %18s ", ++$counter, $r;
    if ($x =~ /$r/) {
        print ". matches newline.\n";
    } else {
        print ". doesn't match newline.\n";
    }
    # $r aliases the array element, so this rewrites @a in place for the
    # second pass: '.' becomes '$', 's' becomes 'm', and the 'b' is dropped.
    $r =~ tr/.s/$m/;
    $r =~ s/b//;
}

# Second pass: does $ match right after the 'a', before the embedded newline?
for $r (@a) {
    printf "%2d %18s ", ++$counter, $r;
    if ($x =~ /$r/) {
        print "\$ matches end of line.\n";
    } else {
        print "\$ doesn't match end of line.\n";
    }
}

This yields (with Perl 5.6.1):

edi@bird:~/regex > ./foo.pl
 1                a.b . doesn't match newline.
 2            (?s)a.b . matches newline.
 3            a.b(?s) . matches newline.
 4        (?:a.b(?s)) . doesn't match newline.
 5        (?:(?s)a.b) . matches newline.
 6        (?s)a(?:.)b . matches newline.
 7      (?s)a(?-s:.)b . doesn't match newline.
 8   (?s)a(?:(?-s).)b . doesn't match newline.
 9  (?s)a(?-s:.(?s))b . doesn't match newline.
10  (?s)a(?-s:(?s).)b . matches newline.
11      a(?s:.)b(?-s) . matches newline.
12                 a$ $ doesn't match end of line.
13             (?m)a$ $ matches end of line.
14             a$(?m) $ matches end of line.
15         (?:a$(?m)) $ doesn't match end of line.
16         (?:(?m)a$) $ matches end of line.
17         (?m)a(?:$) $ matches end of line.
18       (?m)a(?-m:$) $ matches end of line.
19    (?m)a(?:(?-m)$) $ matches end of line.
20   (?m)a(?-m:$(?m)) $ matches end of line.
21   (?m)a(?-m:(?m)$) $ matches end of line.
22       a(?m:$)(?-m) $ matches end of line.

Hmmm, let's see:

A. Lines 2 and 3 seem to imply that the position of (?s) relative to
the dot doesn't matter. The dot's behaviour is changed no matter
whether (?s) occurs before or after it.

B. But inside of a group it _does_ matter whether (?s) is in front of
the dot or not - see lines 4 and 5.

C. The inner group in line 6 doesn't shield the dot from the effect of
(?s), i.e. the dot doesn't fall back to its default behaviour of not
matching a newline.

D. However, the inner groups in 7 and 8 which have an explicit (?-s)
_do_ shield the dot from the outer (?s).

E. Moreover, (?m) doesn't seem to follow the same scoping rules as
(?s) - compare lines 7 to 9 with lines 18 to 20.

The more I look at these examples the less I understand them. Could
somebody please enlighten me?

Thanks in advance,
Edi.

PS: Yeah, I know, these are contrived examples. I came across them
when I built my own regex engine which I wanted to be as
Perl-compatible as possible.

Jeff 'japhy' Pinyan

Sep 11, 2002, 8:49:40 PM
to Edi Weitz
[posted & mailed]

On 12 Sep 2002, Edi Weitz wrote:

>man perlre isn't particularly verbose about the scope of the embedded
>pattern-match modifiers (?imsx-imsx) except for the sentence "These
>modifiers are localized inside an enclosing group (if any)." What I'm
>missing is a clear explanation where, say, the case-insensitivity
>introduced by (?i) starts and where exactly it ends again. I wrote a
>small program to test different positions and groupings and I'm quite
>confused now:

Up until 5.8, top-level (?ismx) modifiers bled to the rest of the
top-level no matter where they were, thus:

> 2 (?s)a.b . matches newline.
> 3 a.b(?s) . matches newline.

whereas

> 4 (?:a.b(?s)) . doesn't match newline.
> 5 (?:(?s)a.b) . matches newline.

In 5.8, I believe this problem was fixed.

--
Jeff "japhy" Pinyan RPI Acacia Brother #734 2002 Acacia Senior Dean
"And I vos head of Gestapo for ten | Michael Palin (as Heinrich Bimmler)
years. Ah! Five years! Nein! No! | in: The North Minehead Bye-Election
Oh. Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)
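
One way to see which behaviour a given perl has is to run the two
top-level cases directly. This is only a sketch: the helper name
dot_matches_newline is invented here and is not from the thread, and no
result is claimed for any version other than the 5.6.1 output quoted
above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: returns 1 if the dot in
# $pattern was allowed to match the newline in "a\nb", else 0.
sub dot_matches_newline {
    my ($pattern) = @_;
    return "a\nb" =~ /$pattern/ ? 1 : 0;
}

# $] holds the version number of the running interpreter.
print "perl ", $], "\n";

# Same top-level modifier, once before and once after the dot.
for my $pattern ('(?s)a.b', 'a.b(?s)') {
    printf "%-8s %s\n", $pattern,
        dot_matches_newline($pattern)
            ? 'matches newline'
            : "doesn't match newline";
}
```

If the trailing (?s) in the second pattern still changes the dot, the
interpreter has the old top-level behaviour described above.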

Ilya Zakharevich

Sep 12, 2002, 2:33:55 AM
[A complimentary Cc of this posting was sent to Edi Weitz
<e...@agharta.de>], who wrote in article <871y80d...@bird.agharta.de>:

> man perlre isn't particularly verbose about the scope of the embedded
> pattern-match modifiers (?imsx-imsx) except for the sentence "These
> modifiers are localized inside an enclosing group (if any)."

This is deliberate: the legacy implementation is not good enough to be
documented. To get predictable scoping, add an extra group so that the
modifier sits at the start of that group.

Ilya
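
A minimal sketch of the advice above, assuming modifiers are localized
to the enclosing group as the perlre sentence quoted in the thread says.
The helper name matches and the sample patterns and strings are made up
for illustration; they are not from the thread. The second pattern uses
the (?i:...) form, which attaches the modifier to the group itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: 1 if $string matches
# $pattern, else 0.
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ /$pattern/ ? 1 : 0;
}

# Modifier at the start of an extra group: (?i) covers "foo" only.
my $grouped = '(?:(?i)foo)bar';

# The same scope written with the modifier attached to the group.
my $inline = '(?i:foo)bar';

for my $pattern ($grouped, $inline) {
    for my $string ('FOObar', 'FOOBAR') {
        printf "%-16s %-8s %s\n", $pattern, $string,
            matches($pattern, $string) ? 'matches' : "doesn't match";
    }
}
```

Both patterns should accept FOObar and reject FOOBAR, since "bar" sits
outside the group and stays case-sensitive.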
