+ in regular expression
From: Duncan Booth <duncan.bo...@invalid.invalid>
Newsgroups: comp.lang.python
Subject: Re: + in regular expression
Date: 5 Oct 2012 09:23:26 GMT
Message-ID: <XnsA0E3689B3693duncanbooth@127.0.0.1>
References: <CALwzidnH2T5vsYT=nMvBmO4V6fmK+aMfHpxQDWrwArJ6aKtVew@mail.gmail.com> <mailman.1838.1349414969.27098.python-list@python.org>
Reply-To: duncan.bo...@suttoncourtenay.org.uk
Cameron Simpson <c...@zip.com.au> wrote:
> On 03Oct2012 21:17, Ian Kelly <ian.g.ke...@gmail.com> wrote:
>| On Wed, Oct 3, 2012 at 9:01 PM, contro opinion
>| <contropin...@gmail.com> wrote:
>| > why the "\s{6}+" is not a regular pattern?
>|
>| Use a group: "(?:\s{6})+"
>
> Yeah, it is probably a precedence issue in the grammar.
> "(\s{6})+" is also accepted.

It's about syntax, not precedence, but the documentation doesn't really
spell it out in full. Like most regex documentation it talks in woolly
terms about special characters rather than giving a formal syntax.

A regular expression element may be followed by a quantifier.
Quantifiers are '*', '+', '?', '{n}', '{n,m}' (and the lazy quantifiers
'*?', '+?', '{n,m}?'). There's nothing in the regex language which says
you can follow an element with two quantifiers. Parentheses (capturing or
non-capturing) around a regex turn that regex into a single element, which
is why you can then follow it with another quantifier.
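
As a quick check of the rule above, here is a minimal sketch using the
standard re module. The helper name compiles() is made up for
illustration. One caveat: Python 3.11 and later read a '+' placed
directly after a quantifier as a possessive quantifier, so the original
"\s{6}+" may well compile there. The sketch therefore uses "a**" and
"\s{6}*", which should be rejected as a double quantifier on both older
and newer versions.

```python
import re


def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# An element followed directly by two quantifiers is rejected.
double_star = compiles(r"a**")
double_counted = compiles(r"\s{6}*")

# Wrapping the quantified element in parentheses turns it back into a
# single element, which may then take a quantifier of its own.
non_capturing = re.compile(r"(?:\s{6})+")
capturing = re.compile(r"(\s{6})+")

twelve_spaces = " " * 12
nc_span = non_capturing.fullmatch(twelve_spaces).span()
cap_span = capturing.fullmatch(twelve_spaces).span()

# Seven whitespace characters are not a multiple of six, so no full match.
seven_match = non_capturing.fullmatch(" " * 7)
```

Both grouped forms match runs of whitespace whose length is a multiple
of six; they differ only in whether the last repetition is captured.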

In BNF, I think Python's regexes would be something like:

re ::= union | simple-re
union ::= re "|" simple-re
simple-re ::= concatenation | basic-re
concatenation ::= simple-re basic-re
basic-re ::= element | element quantifier
element ::= group | nc-group | "." | "^" | "$" | char | charset
quantifier ::= "*" | "+" | "?" | "{" NUMBER "}" | "{" NUMBER "," NUMBER "}"
             | "*?" | "+?" | "{" NUMBER "," NUMBER "}?"
group ::= "(" re ")"
nc-group ::= "(?:" re ")"
char ::= <any non-special character> | "\" <any character>

... and so on. I didn't include charsets or all the (?...) extensions or
special sequences.
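
To make the "at most one quantifier per element" rule concrete, here is
a rough recursive-descent recognizer for the grammar sketched above. It
is an illustration only, not how the re module parses patterns. The
names Recognizer, GrammarError and fits_grammar are invented for this
example, the left-recursive rules are unrolled into loops, and charsets
and the other (?...) extensions are left out, just as in the grammar. It
also follows the grammar strictly, so forms the grammar omits (such as
"??") are rejected even though Python accepts them.

```python
SPECIAL = set(".^$*+?{}[]()|\\")


class GrammarError(ValueError):
    """Raised when the input does not fit the sketched grammar."""


class Recognizer:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def take(self, literal):
        """Consume literal if it comes next in the input."""
        if self.text.startswith(literal, self.pos):
            self.pos += len(literal)
            return True
        return False

    def parse_re(self):
        # re ::= simple-re ("|" simple-re)*  (left recursion unrolled)
        self.parse_simple_re()
        while self.take("|"):
            self.parse_simple_re()

    def parse_simple_re(self):
        # simple-re ::= basic-re basic-re*
        self.parse_basic_re()
        while self.peek() not in ("", "|", ")"):
            self.parse_basic_re()

    def parse_basic_re(self):
        # basic-re ::= element | element quantifier  (at most one quantifier)
        self.parse_element()
        self.parse_quantifier()

    def parse_element(self):
        ch = self.peek()
        if self.take("(?:") or self.take("("):
            self.parse_re()
            if not self.take(")"):
                raise GrammarError("expected ')' at %d" % self.pos)
        elif ch == "\\":
            if self.pos + 1 >= len(self.text):
                raise GrammarError("dangling backslash")
            self.pos += 2
        elif ch in (".", "^", "$") or (ch and ch not in SPECIAL):
            self.pos += 1
        else:
            raise GrammarError("expected an element at %d" % self.pos)

    def parse_number(self):
        start = self.pos
        while self.peek().isdigit():
            self.pos += 1
        if self.pos == start:
            raise GrammarError("expected a number at %d" % self.pos)

    def parse_quantifier(self):
        # The quantifier is optional; nothing is consumed if none is present.
        if self.take("*") or self.take("+"):
            self.take("?")  # lazy forms "*?" and "+?"
        elif self.take("?"):
            pass
        elif self.take("{"):
            self.parse_number()
            ranged = self.take(",")
            if ranged:
                self.parse_number()
            if not self.take("}"):
                raise GrammarError("expected '}' at %d" % self.pos)
            if ranged:
                self.take("?")  # lazy form "{n,m}?"


def fits_grammar(pattern):
    """Return True if the whole pattern is derivable from the grammar."""
    parser = Recognizer(pattern)
    try:
        parser.parse_re()
    except GrammarError:
        return False
    return parser.pos == len(pattern)
```

With this, fits_grammar(r"\s{6}+") is False because the "+" arrives where
an element is expected, while fits_grammar(r"(?:\s{6})+") is True because
the group counts as a fresh element.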
--
Duncan Booth http://kupuguy.blogspot.com