+ in regular expression
Ian Kelly  
Newsgroups: comp.lang.python
From: Ian Kelly <ian.g.ke...@gmail.com>
Date: Wed, 3 Oct 2012 21:17:19 -0600
Local: Wed, Oct 3 2012 11:17 pm
Subject: Re: + in regular expression

On Wed, Oct 3, 2012 at 9:01 PM, contro opinion <contropin...@gmail.com> wrote:
> why the  "\s{6}+"  is not a regular pattern?

Use a group: "(?:\s{6})+"
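
A minimal sketch of the grouped form, assuming Python's standard `re` module. The helper name and the sample strings are made up for illustration, and `fullmatch` is used so that only whole-string matches count:

```python
import re

# Non-capturing group repeated one or more times: runs of whitespace
# whose length is a multiple of six.
MULTIPLE_OF_SIX = re.compile(r"(?:\s{6})+")


def is_multiple_of_six_spaces(text):
    """Return True if text is all whitespace with length 6, 12, 18, ..."""
    return MULTIPLE_OF_SIX.fullmatch(text) is not None


for n in (0, 6, 7, 12):
    print(n, is_multiple_of_six_spaces(" " * n))
```

With plain `re.match` instead of `fullmatch`, a string of seven spaces would still match its first six characters, so the choice of matching function matters here.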

 
Mark Lawrence  
Newsgroups: comp.lang.python
From: Mark Lawrence <breamore...@yahoo.co.uk>
Date: Thu, 04 Oct 2012 22:59:27 +0100
Local: Thurs, Oct 4 2012 5:59 pm
Subject: Re: + in regular expression
On 04/10/2012 04:01, contro opinion wrote:

Why are you too lazy to do any research before posting a question?

--
Cheers.

Mark Lawrence.


 
Evan Driscoll  
Newsgroups: comp.lang.python
From: Evan Driscoll <drisc...@cs.wisc.edu>
Date: Thu, 04 Oct 2012 21:25:40 -0500
Local: Thurs, Oct 4 2012 10:25 pm
Subject: Re: Re: + in regular expression
On 10/04/2012 04:59 PM, Mark Lawrence wrote:
>> why the  "\s{6}+"  is not a regular pattern?

> Why are you too lazy to do any research before posting a question?

Errr... what?

I'm only somewhat familiar with the extra stuff that languages provide
in their regexes beyond true regular expressions and simple extensions,
but I was surprised to see the question because I too would have
expected that to work. (And match any sequence of whitespace characters
whose length is a multiple of six.) I reskimmed the documentation of the
re module and didn't see anything that would prohibit it. I looked at
several of the results of a Google search for the multiple repeat error,
and didn't really find any explanation beyond "because you can't do it"
or "here's a regex that works." (Well, OK, I did see a mention of +
being a possessive quantifier which Python doesn't support. But that
still doesn't explain why my expectation isn't what happened.)

In what way is that an unreasonable question?

Evan


 
Saroo Jain  
Newsgroups: comp.lang.python
From: Saroo Jain <Saroo_J...@infosys.com>
Date: Fri, 5 Oct 2012 03:44:25 +0000
Local: Thurs, Oct 4 2012 11:44 pm
Subject: RE: + in regular expression
x3=re.match("\s{6}+",str)

instead use
x3=re.match("\s{6,}",str)

This serves the purpose. It also gives some food for thought about why the first one throws an error.

Cheers,
Saroo


 
Ian Kelly  
Newsgroups: comp.lang.python
From: Ian Kelly <ian.g.ke...@gmail.com>
Date: Thu, 4 Oct 2012 23:14:13 -0600
Local: Fri, Oct 5 2012 1:14 am
Subject: Re: + in regular expression

On Thu, Oct 4, 2012 at 9:44 PM, Saroo Jain <Saroo_J...@infosys.com> wrote:
> x3=re.match("\s{6}+",str)

> instead use
> x3=re.match("\s{6,}",str)

> This serves the purpose. It also gives some food for thought about why the first one throws an error.

That matches six or more spaces, not multiples of six spaces.
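
The difference can be checked directly. This is a small sketch using the standard `re` module; the variable and helper names are only illustrative:

```python
import re

six_or_more = re.compile(r"\s{6,}")            # runs of 6, 7, 8, ... whitespace characters
multiples_of_six = re.compile(r"(?:\s{6})+")   # runs of 6, 12, 18, ... only


def whole_match(pattern, text):
    """Return True if the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None


seven = " " * 7
print(whole_match(six_or_more, seven))       # the {6,} form accepts seven spaces
print(whole_match(multiples_of_six, seven))  # the grouped form does not
```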

 
Cameron Simpson  
Newsgroups: comp.lang.python
From: Cameron Simpson <c...@zip.com.au>
Date: Fri, 5 Oct 2012 15:22:28 +1000
Local: Fri, Oct 5 2012 1:22 am
Subject: Re: + in regular expression
On 03Oct2012 21:17, Ian Kelly <ian.g.ke...@gmail.com> wrote:
| On Wed, Oct 3, 2012 at 9:01 PM, contro opinion <contropin...@gmail.com> wrote:
| > why the  "\s{6}+"  is not a regular pattern?
|
| Use a group: "(?:\s{6})+"

Yeah, it is probably a precedence issue in the grammar.
"(\s{6})+" is also accepted.
--
Cameron Simpson <c...@zip.com.au>

Disclaimer: ERIM wanted to share my opinions, but I wouldn't let them.
        - David Wiseman <dwise...@erim.org>


 
Duncan Booth  
Newsgroups: comp.lang.python
From: Duncan Booth <duncan.bo...@invalid.invalid>
Date: 5 Oct 2012 09:23:26 GMT
Local: Fri, Oct 5 2012 5:23 am
Subject: Re: + in regular expression

Cameron Simpson <c...@zip.com.au> wrote:
> On 03Oct2012 21:17, Ian Kelly <ian.g.ke...@gmail.com> wrote:
>| On Wed, Oct 3, 2012 at 9:01 PM, contro opinion
>| <contropin...@gmail.com> wrote:
>| > why the  "\s{6}+"  is not a regular pattern?
>|
>| Use a group: "(?:\s{6})+"

> Yeah, it is probably a precedence issue in the grammar.
> "(\s{6})+" is also accepted.

It's about syntax, not precedence, but the documentation doesn't really
spell it out in full. Like most regex documentation it talks in woolly
terms about special characters rather than giving a formal syntax.

A regular expression element may be followed by a quantifier.
Quantifiers are '*', '+', '?', '{n}', '{n,m}' (and lazy quantifiers
'*?', '+?', '{n,m}?'). There's nothing in the regex language which says
you can follow an element with two quantifiers. Parentheses (grouping or
non-grouping) around a regex turn that regex into a single element which
is why you can then use another quantifier.
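
A quick way to see this with the standard `re` module is sketched here. The helper name `compiles` is made up for illustration. The stacked example uses a trailing `*` rather than `+` because Python 3.11 and later read a `+` after a quantifier as a possessive modifier, so `\s{6}+` itself is no longer rejected there:

```python
import re


def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# A quantifier stacked directly on another quantifier is rejected
# ("multiple repeat")...
print(compiles(r"\s{6}*"))
# ...but wrapping the quantified part in a group turns it back into a
# single element that can take a quantifier of its own.
print(compiles(r"(?:\s{6})*"))
print(compiles(r"(\s{6})+"))
```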

In BNF, I think Python's regexes would be something like:

re ::= union | simple-re
union ::= re "|" simple-re
simple-re ::= concatenation | basic-re
concatenation ::= simple-re basic-re
basic-re ::= element | element quantifier
element ::= group | nc-group | "." | "^" | "$" | char | charset
quantifier ::= "*" | "+" | "?" | "{" NUMBER "}" | "{" NUMBER "," NUMBER "}"
             | "*?" | "+?" | "{" NUMBER "," NUMBER "}?"
group ::= "(" re ")"
nc-group ::= "(?:" re ")"
char ::= <any non-special character> | "\" <any character>

... and so on. I didn't include charsets or all the (?...) extensions or
special sequences.
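
To make the consequence of such a grammar concrete, here is a rough recursive-descent recognizer for a simplified version of it. This is only a sketch and not how the `re` module is implemented: all function names are hypothetical, charsets and the `(?...)` extensions are ignored, a literal `{` is not supported, and any quantifier may carry a lazy `?`. The key point is that `parse_basic` consumes at most one quantifier per element:

```python
import re

# quantifier ::= "*" | "+" | "?" | "{n}" | "{n,m}", each optionally lazy ("?")
QUANTIFIER = re.compile(r"(?:[*+?]|\{\d+(?:,\d+)?\})\??")


class ParseError(ValueError):
    """Raised when the pattern does not fit the simplified grammar."""


def parse_re(p, pos):
    # re ::= simple-re ( "|" simple-re )*
    pos = parse_simple(p, pos)
    while pos < len(p) and p[pos] == "|":
        pos = parse_simple(p, pos + 1)
    return pos


def parse_simple(p, pos):
    # simple-re ::= basic-re basic-re*
    pos = parse_basic(p, pos)
    while pos < len(p) and p[pos] not in "|)":
        pos = parse_basic(p, pos)
    return pos


def parse_basic(p, pos):
    # basic-re ::= element | element quantifier   (at most one quantifier)
    pos = parse_element(p, pos)
    m = QUANTIFIER.match(p, pos)
    return m.end() if m else pos


def parse_element(p, pos):
    if pos >= len(p):
        raise ParseError("expected an element at end of pattern")
    ch = p[pos]
    if ch == "(":
        # group ::= "(" re ")"    nc-group ::= "(?:" re ")"
        pos += 3 if p.startswith("(?:", pos) else 1
        pos = parse_re(p, pos)
        if pos >= len(p) or p[pos] != ")":
            raise ParseError("missing closing parenthesis")
        return pos + 1
    if ch == "\\":
        if pos + 1 >= len(p):
            raise ParseError("dangling backslash")
        return pos + 2
    if ch in "*+?{}|)":
        raise ParseError(f"unexpected {ch!r} at position {pos}")
    return pos + 1  # ".", "^", "$" or an ordinary character


def accepts(pattern):
    """Return True if the whole pattern fits the simplified grammar."""
    try:
        return parse_re(pattern, 0) == len(pattern)
    except ParseError:
        return False


print(accepts(r"\s{6}+"))      # second quantifier has no element to attach to
print(accepts(r"(?:\s{6})+"))  # the group is a fresh element
print(accepts(r"(\s{6})+"))
```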

--
Duncan Booth http://kupuguy.blogspot.com


 
Evan Driscoll  
Newsgroups: comp.lang.python
From: Evan Driscoll <drisc...@cs.wisc.edu>
Date: Fri, 05 Oct 2012 10:27:00 -0500
Subject: Re: Re: + in regular expression
On 10/05/2012 04:23 AM, Duncan Booth wrote:
> A regular expression element may be followed by a quantifier.
> Quantifiers are '*', '+', '?', '{n}', '{n,m}' (and lazy quantifiers
> '*?', '+?', '{n,m}?'). There's nothing in the regex language which says
> you can follow an element with two quantifiers.

In fact, *you* did -- the first sentence of that paragraph! :-)

\s is a regex, so you can follow it with a quantifier and get \s{6}.
That's also a regex, so you should be able to follow it with a quantifier.

I can understand that you can create a grammar that excludes it. I'm
actually really interested to know if anyone knows whether this was a
deliberate decision and, if so, what the reason is. (And if not --
should it be considered a (low priority) bug?)

Was it because such patterns often reveal a mistake? Because "\s{6}+"
has other meanings in different regex syntaxes and the designers didn't
want confusion? Because it was simpler to parse that way? Because the
"hey you recognize regular expressions by converting it to a finite
automaton" story is a lie in most real-world regex implementations (in
part because they're not actually regular expressions) and repeated
quantifiers cause problems with the parsing techniques that actually get
used?

Evan


 
Evan Driscoll  
Newsgroups: comp.lang.python
From: Evan Driscoll <drisc...@cs.wisc.edu>
Date: Fri, 05 Oct 2012 10:31:26 -0500
Local: Fri, Oct 5 2012 11:31 am
Subject: Re: + in regular expression
On 10/05/2012 10:27 AM, Evan Driscoll wrote:
> On 10/05/2012 04:23 AM, Duncan Booth wrote:
>> A regular expression element may be followed by a quantifier.
>> Quantifiers are '*', '+', '?', '{n}', '{n,m}' (and lazy quantifiers
>> '*?', '+?', '{n,m}?'). There's nothing in the regex language which says
>> you can follow an element with two quantifiers.
> In fact, *you* did -- the first sentence of that paragraph! :-)

> \s is a regex, so you can follow it with a quantifier and get \s{6}.
> That's also a regex, so you should be able to follow it with a
> quantifier.

OK, I guess this isn't true... you said a "regular expression *element*"
can be followed by a quantifier. I just took what I usually see as part
of a regular expression and read into your post something it didn't
quite say. Still, the rest of mine applies.

Evan


 
MRAB  
Newsgroups: comp.lang.python
From: MRAB <pyt...@mrabarnett.plus.com>
Date: Fri, 05 Oct 2012 17:07:47 +0100
Local: Fri, Oct 5 2012 12:07 pm
Subject: Re: + in regular expression
On 2012-10-05 16:27, Evan Driscoll wrote:

You rarely want to repeat a repeated element. It can also result in
catastrophic backtracking unless you're _very_ careful.

In many other regex implementations (including mine), "*+", "++" and
"?+" are possessive quantifiers, much as "*?", "+?" and "??" are lazy
quantifiers.
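
For reference, the standard `re` module gained possessive quantifiers in Python 3.11, so the outcome of compiling `\s{6}+` depends on the interpreter version. The sketch below guards on the version; the helper name `try_compile` is made up for illustration:

```python
import re
import sys


def try_compile(pattern):
    """Return the compiled pattern, or None if re rejects it."""
    try:
        return re.compile(pattern)
    except re.error:
        return None


possessive = try_compile(r"\s{6}+")

if sys.version_info >= (3, 11):
    # The trailing "+" modifies {6} instead of repeating it, so the
    # pattern still matches exactly six whitespace characters.
    print(possessive.fullmatch(" " * 6) is not None)
    print(possessive.fullmatch(" " * 12) is not None)
else:
    # Older versions reject the pattern as a multiple repeat.
    print(possessive is None)
```

Either way, the pattern never means "a multiple of six"; the grouped form is still needed for that.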

You could, of course, ask why adding "?" after a quantifier doesn't
make it optional, e.g. why r"\s{6}?" doesn't mean the same as
r"(?:\s{6})?", or why r"\s{0,6}?" doesn't mean the same as
r"(?:\s{0,6})?".
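
The two readings of a trailing "?" behave differently, which a short sketch with the standard `re` module can show (the variable names are illustrative only):

```python
import re

six = " " * 6

# "?" after a quantifier makes it lazy: match as few repetitions as possible.
lazy = re.match(r"\s{0,6}?", six)
# "?" after a group makes the whole group optional, but still greedy.
optional = re.match(r"(?:\s{0,6})?", six)

print(len(lazy.group()))      # the lazy form stops at zero characters
print(len(optional.group()))  # the optional greedy form takes all six

# With an exact count the lazy marker changes nothing, but the optional
# group can fall back to matching the empty string.
print(re.match(r"\s{6}?", "   "))               # three spaces: no match at all
print(re.match(r"(?:\s{6})?", "   ").group() == "")
```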


 
Cameron Simpson  
Newsgroups: comp.lang.python
From: Cameron Simpson <c...@zip.com.au>
Date: Sat, 6 Oct 2012 09:37:42 +1000
Local: Fri, Oct 5 2012 7:37 pm
Subject: Re: + in regular expression
On 05Oct2012 10:27, Evan Driscoll <drisc...@cs.wisc.edu> wrote:
| I can understand that you can create a grammar that excludes it. [...]
| Was it because such patterns often reveal a mistake?

For myself, I would consider that sufficient reason.

I've seen plenty of languages (C and shell, for example, though they
are not alone or egregious) where a compiler can emit a syntax complaint
many lines from the actual coding mistake (in shell, an unclosed quote
or control construct is a common example; Python has the same issue,
but mitigated by the indentation requirements, which cut the occurrence
down a lot).

Forbidding a common error by requiring a wordier workaround isn't
unreasonable.

| Because "\s{6}+"
| has other meanings in different regex syntaxes and the designers didn't
| want confusion?

I think Python REs are supposed to be Perl compatible; ISTR an opening
sentence to that effect...

| Because it was simpler to parse that way? Because the
| "hey you recognize regular expressions by converting it to a finite
| automaton" story is a lie in most real-world regex implementations (in
| part because they're not actually regular expressions) and repeated
| quantifiers cause problems with the parsing techniques that actually get
| used?

There are certainly constructs that can cause an exponential amount
of backtracking if misused. One could make a case for discouragement
(though not a case for forbidding them).

Just my 2c,
--
Cameron Simpson <c...@zip.com.au>

The most annoying thing about being without my files after our disc crash was
discovering once again how widespread BLINK was on the web.


 
Duncan Booth  
Newsgroups: comp.lang.python
From: Duncan Booth <duncan.bo...@invalid.invalid>
Date: 9 Oct 2012 11:29:16 GMT
Local: Tues, Oct 9 2012 7:29 am
Subject: Re: + in regular expression

Cameron Simpson <c...@zip.com.au> wrote:
>| Because "\s{6}+"
>| has other meanings in different regex syntaxes and the designers didn't
>| want confusion?

> I think Python REs are supposed to be Perl compatible; ISTR an opening
> sentence to that effect...

I don't know the full history of how regex engines evolved, but I suspect
at least part of the answer is that the decisions the Perl developers made
influenced the other implementations.

Perl's quantifiers allow both '?' and '+' as modifiers on the standard
quantifiers so clearly you cannot stack those particular quantifiers in
Perl, therefore quantifiers in general are unstackable.

The only grammars I can find online for regular expressions split out the
elements and quantifiers the way I did in my previous post. Python's regex
parser (and I would guess also most of the others in existence) tends more
toward spaghetti code than toward following a grammar (_parse is a 238 line
function). So I think it really is just trying to match existing regular
expression parsers and any possible grammar is an excuse for why it should
be the way it is rather than an explanation.

--
Duncan Booth http://kupuguy.blogspot.com


 