Message from discussion + in regular expression

From: Duncan Booth <duncan.bo...@invalid.invalid>
Newsgroups: comp.lang.python
Subject: Re: + in regular expression
Date: 5 Oct 2012 09:23:26 GMT

Cameron Simpson <c...@zip.com.au> wrote:

> On 03Oct2012 21:17, Ian Kelly <ian.g.ke...@gmail.com> wrote:
>| On Wed, Oct 3, 2012 at 9:01 PM, contro opinion
>| <contropin...@gmail.com> wrote: 
>| > why the  "\s{6}+"  is not a regular pattern?
>| 
>| Use a group: "(?:\s{6})+"
> 
> Yeah, it is probably a precedence issue in the grammar.
> "(\s{6})+" is also accepted.

It's about syntax, not precedence, but the documentation doesn't really 
spell it out in full. Like most regex documentation it talks in woolly 
terms about special characters rather than giving a formal syntax.

A regular expression element may be followed by a quantifier. 
Quantifiers are '*', '+', '?', '{n}', '{n,m}' (and the lazy quantifiers 
'*?', '+?', '??', '{n,m}?'). There's nothing in the regex language which 
says you can follow an element with two quantifiers. Parentheses 
(capturing or non-capturing) around a regex turn that regex into a single 
element, which is why you can then apply another quantifier to it.
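
A minimal sketch of the point above, using only the standard re module. 
The variable names are illustrative. The rejected case uses "a**" rather 
than "\s{6}+" because Python 3.11 and later read a trailing "+" after a 
quantifier as a possessive quantifier, so the original pattern is only an 
error on older versions.

```python
import re

# Grouping turns the quantified piece into a single element,
# so a second quantifier can be applied to the group as a whole.
pattern = re.compile(r"(?:\s{6})+")

# 12 spaces are two repetitions of "exactly six whitespace characters".
match = pattern.match(" " * 12 + "x")
matched_length = len(match.group(0))

# Stacking two quantifiers directly on one element is rejected
# with re.error when the pattern is compiled.
try:
    re.compile(r"a**")
    double_quantifier_ok = True
except re.error:
    double_quantifier_ok = False

print(matched_length, double_quantifier_ok)
```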

In BNF, I think Python's regexes would be something like:

re ::= union | simple-re
union ::= re "|" simple-re
simple-re ::= concatenation | basic-re
concatenation ::= simple-re basic-re
basic-re ::= element | element quantifier
element ::= group | nc-group | "." | "^" | "$" | char | charset
quantifier ::= "*" | "+" | "?" | "{" NUMBER "}" | "{" NUMBER "," NUMBER "}"
             | "*?" | "+?" | "??" | "{" NUMBER "," NUMBER "}?"
group ::= "(" re ")"
nc-group ::= "(?:" re ")"
char ::= <any non-special character> | "\" <any character>

... and so on. I didn't include charsets or all the (?...) extensions or 
special sequences.
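
To make the grammar concrete, here is a rough recursive-descent recognizer 
for it. This is only a sketch: the function name accepts() and its helpers 
are made up for illustration, it mirrors the simplified grammar rather than 
the real re module, it skips charsets and the (?...) extensions just as the 
grammar does, and it lets any quantifier take a lazy "?" suffix.

```python
import re

SPECIAL = set(".^$*+?{}[]()|\\")
# Quantifiers from the grammar, each optionally followed by a lazy "?".
QUANTIFIER = re.compile(r"(?:[*+?]|\{\d+(?:,\d+)?\})\??")


def accepts(pattern: str) -> bool:
    """Return True if `pattern` fits the simplified grammar (hypothetical helper)."""
    pos = 0

    def parse_re() -> bool:
        # re: one or more simple-re items separated by "|".
        nonlocal pos
        if not parse_simple_re():
            return False
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            if not parse_simple_re():
                return False
        return True

    def parse_simple_re() -> bool:
        # simple-re: one or more basic-re items in a row (concatenation).
        if not parse_basic_re():
            return False
        while pos < len(pattern) and pattern[pos] not in "|)":
            if not parse_basic_re():
                return False
        return True

    def parse_basic_re() -> bool:
        # basic-re: an element followed by at most one quantifier.
        nonlocal pos
        if not parse_element():
            return False
        quantifier = QUANTIFIER.match(pattern, pos)
        if quantifier:
            pos = quantifier.end()
        return True

    def parse_element() -> bool:
        nonlocal pos
        if pos >= len(pattern):
            return False
        ch = pattern[pos]
        if ch == "(":
            # group or nc-group: parse the inner re, then require ")".
            pos += 3 if pattern.startswith("(?:", pos) else 1
            if not parse_re() or pos >= len(pattern) or pattern[pos] != ")":
                return False
            pos += 1
            return True
        if ch == "\\":
            # Backslash followed by any character.
            if pos + 1 >= len(pattern):
                return False
            pos += 2
            return True
        if ch in ".^$" or ch not in SPECIAL:
            pos += 1
            return True
        return False  # a quantifier character where an element must start

    return parse_re() and pos == len(pattern)


print(accepts(r"\s{6}+"), accepts(r"(?:\s{6})+"))
```

After "\s{6}" has been read as an element plus its one quantifier, the 
"+" would have to start a new element, and no rule allows that. Inside 
parentheses the whole "\s{6}" becomes a group element, which may then take 
its own quantifier.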

-- 
Duncan Booth http://kupuguy.blogspot.com