Possessive quantifiers in regexp

466 views
Skip to first unread message

amit....@mail.huji.ac.il

unread,
Apr 3, 2014, 10:45:08 AM4/3/14
to golan...@googlegroups.com
Hi,

I see that currently go's regexp doesn't support possessive quantifiers.
Is this temporary, or is this a defined feature (or misfeature)?

cheers

Ian Lance Taylor

unread,
Apr 3, 2014, 7:18:46 PM4/3/14
to amit....@mail.huji.ac.il, golang-nuts
On Thu, Apr 3, 2014 at 7:45 AM, <amit....@mail.huji.ac.il> wrote:
>
> I see that currently go's regexp doesn't support possessive quantifiers.
> Is this temporary, or is this a defined feature (or misfeature)?

The regexp package does not backtrack, so I can't think of any reason
why it would need possessive quantifiers. There is some background at
http://swtch.com/~rsc/regexp/regexp1.html .

Ian

amit....@mail.huji.ac.il

unread,
Apr 4, 2014, 3:09:13 AM4/4/14
to golan...@googlegroups.com, amit....@mail.huji.ac.il
Thanks Ian.
The article you gave shows an alternative implementation to the standard one, but still I can't see how it implies a different behavior on the matcher. The regular quantifiers in regexp are not possessive (they match "aaa" for regex "a+a").
Are possessive quantifiers impossible to implement in the NFA approach?

Amit

Ian Lance Taylor

unread,
Apr 4, 2014, 9:49:41 AM4/4/14
to amit....@mail.huji.ac.il, golang-nuts
On Fri, Apr 4, 2014 at 12:09 AM, <amit....@mail.huji.ac.il> wrote:
>
> The article you gave shows an alternative implementation to the standard
> one, but still I can't see how it implies a different behavior on the
> matcher. The regular quantifiers in regexp are not possessive (they match
> "aaa" for regex "a+a").
> Are possessive quantifiers impossible to implement in the NFA approach?

I'm not an expert on this. Before I answer that question, can you
explain why it would be useful to have possessive quantifiers in a
regexp matcher that does not backtrack?

Ian

amit....@mail.huji.ac.il

unread,
Apr 4, 2014, 11:43:32 AM4/4/14
to golan...@googlegroups.com, amit....@mail.huji.ac.il
I'm not an expert on this.  Before I answer that question, can you
explain why it would be useful to have possessive quantifiers in a
regexp matcher that does not backtrack?
 
I need to use regular expressions for file paths, and sometimes I need it to match as much as it can without backtracking.
Now, I understand that it may be using a different implementation behind the scenes, but it still gives the results of a backtracking matcher (as expected).

For example: ([^/]+/)++(file_pattern) for matching file names without their directories. I need to be sure that the file pattern is matched against the file name alone. With just a greedy quantifier, it will backtrack and match it against full paths too - which happens in go.

thanks,
Amit

Jan Mercl

unread,
Apr 4, 2014, 11:49:06 AM4/4/14
to amit....@mail.huji.ac.il, golang-nuts
On Fri, Apr 4, 2014 at 5:43 PM, <amit....@mail.huji.ac.il> wrote:
> For example: ([^/]+/)++(file_pattern) for matching file names without their
> directories. I need to be sure that the file pattern is matched against the
> file name alone. With just a greedy quantifier, it will backtrack and match
> it against full paths too - which happens in go.

Package regexp does not backtrack. Can you please show an example on
the Go Playground with some concrete output and stating what is the
desired output instead?

And BTW, perhaps better use path.Base[0]. Probably less costly both in
space and time for "matching file names without their directories".

[0]: http://golang.org/pkg/path/#Base

-j

amit....@mail.huji.ac.il

unread,
Apr 4, 2014, 12:16:44 PM4/4/14
to golan...@googlegroups.com, amit....@mail.huji.ac.il
Thanks for the replies!

Here is an example that proves that it does backtrack:
http://play.golang.org/p/T1Tyg7Uzzz

Explanation: the first parenthesized expression should have matched "ccc/bbb/aaa/", but it backtracked and matched "ccc/bbb/", so that the right side would match. This is the expected behavior from the quantifiers I've used.

My question was why does it not have possessive quantifiers. In the above example, a possessive quantifier would cause a mismatch.

Thanks for the advice on Base, but I need the regex for more complicated matching on files. I'm using both packages (regexp & path) for this.

Cheers,
Amit

Jan Mercl

unread,
Apr 4, 2014, 12:33:17 PM4/4/14
to amit....@mail.huji.ac.il, golang-nuts
On Fri, Apr 4, 2014 at 6:16 PM, <amit....@mail.huji.ac.il> wrote:
> Here is an example that proves that it does backtrack:
> http://play.golang.org/p/T1Tyg7Uzzz

Package regexp does not backtrack.

> Explanation: the first parenthesized expression should have matched
> "ccc/bbb/aaa/", but it backtracked and matched "ccc/bbb/", so that the right
> side would match. This is the expected behavior from the quantifiers I've
> used.

I'm not sure if I understand what is the desired outcome as you've
unfortunately didn't provided what the expected output should be - as
I asked for. So I don't know if this is a solution to your task:
http://play.golang.org/p/3oHbx0EigU

If not, please provide the expected output you want.

-j

amit....@mail.huji.ac.il

unread,
Apr 4, 2014, 1:24:44 PM4/4/14
to golan...@googlegroups.com, amit....@mail.huji.ac.il
I'm not sure if I understand what is the desired outcome as you've
unfortunately didn't provided what the expected output should be - as
I asked for. So I don't know if this is a solution to your task:
http://play.golang.org/p/3oHbx0EigU

If not, please provide the expected output you want.

Sorry, I'm afraid my point wasn't clear enough.

I only asked why there are no possessive quantifiers in go's regexp.
It does backtrack as it provides the expected functionality of greedy and lazy quantifiers.

The example I gave was just a simplified demonstration of the kind of tasks I am dealing with, so I'm not asking for a solution for it (but thanks for the effort :)).
In my example, a possessive quantifier ( (([^/]+/)*+)(a.*) ) would result in a mismatch, as I said before - which is what I need.
When creating a regex, sometimes I need to know that a pattern had caught all the repetitions of something. This is achieved via possessive quantifiers. Why were they not implemented?

I hope that clears things.

Thanks again
Amit

Robert Johnstone

unread,
Apr 4, 2014, 1:44:26 PM4/4/14
to golan...@googlegroups.com, amit....@mail.huji.ac.il
Hello,

I believe that your problem is actually that your final group, (a.*), also matches forward slashes.  What you should instead write is (a[^/]*). Additionally, if you want a complete match (i.e. the regex must match the entire string), you need to also add a $ character to the end of your regex.  Combining these two changes, I believe that this is the regex that you are looking for:

(([^/]+/)*)(a[^/]*)$

To be clear, NFA based regex libraries definitely do not backtrack.

Ian Lance Taylor

unread,
Apr 4, 2014, 1:45:37 PM4/4/14
to amit....@mail.huji.ac.il, golang-nuts
On Fri, Apr 4, 2014 at 9:16 AM, <amit....@mail.huji.ac.il> wrote:
>
> Here is an example that proves that it does backtrack:
> http://play.golang.org/p/T1Tyg7Uzzz
>
> Explanation: the first parenthesized expression should have matched
> "ccc/bbb/aaa/", but it backtracked and matched "ccc/bbb/", so that the right
> side would match. This is the expected behavior from the quantifiers I've
> used.

The regexp package does not do any backtracking. I promise. If you
read http://swtch.com/~rsc/regexp/regexp1.html carefully you will see
how it is done. Or look at Ken Thompson's 1968 paper
(http://dl.acm.org/citation.cfm?doid=363347.363387).


> My question was why does it not have possessive quantifiers. In the above
> example, a possessive quantifier would cause a mismatch.

The only definitions of possessive quantifiers that I can find are
written in terms of implementation details of a backtracking
implementation. I'm not an expert, but I don't know how to apply
those to something like Go's regexp package. If somebody can define
possessive quantifiers in terms of a regular expression--not in terms
of the implementation of the matcher--then perhaps they can be
implemented. I really don't know.

Ian

Tamás Gulácsi

unread,
Apr 4, 2014, 1:50:44 PM4/4/14
to golan...@googlegroups.com
All I found about possessive quantigiers is that they are greedy, but don't backtrack - for speed optimization.

Can you give a good example for its use? Your filename matching pattern is wrong: use [^/] instead of . after the "a".

Brad Fitzpatrick

unread,
Apr 4, 2014, 2:23:18 PM4/4/14
to Ian Lance Taylor, amit....@mail.huji.ac.il, golang-nuts
In other words (I think), what program would a possessive quantifier generate?

Here's a greedy and non-greedy quantifier and their program:



What program would `[ab]++a` generate, such that it didn't match the string "aa"?

Seems like you'd need new instructions in the prog language?

amit....@mail.huji.ac.il

unread,
Apr 4, 2014, 3:03:18 PM4/4/14
to golan...@googlegroups.com, Ian Lance Taylor, amit....@mail.huji.ac.il
Ok, first let's make clear what a possessive quantifier is:
a quantifier that matches all the repetitions of its quantified pattern.

The performance issue is only a result of that definition, but is not a part of it.

Let's see an example: http://play.golang.org/p/bv5gndXOUE
As you can see, the possessive quantifier matches all the repetitions, and it doesn't release gradually like the greedy one.

I know that I can add [^/], but my real expressions are more complicated than that, so this solution is impractical for me.
My point is (third time lucky) - I need possessive quantifiers. Why can't I have them?

Ian Lance Taylor

unread,
Apr 4, 2014, 3:42:32 PM4/4/14
to amit....@mail.huji.ac.il, golang-nuts
On Fri, Apr 4, 2014 at 12:03 PM, <amit....@mail.huji.ac.il> wrote:
>
> Ok, first let's make clear what a possessive quantifier is:
> a quantifier that matches all the repetitions of its quantified pattern.

That implies that the only possible use of a possessive quantifier in
a non-backtracking regexp implementation is to turn some matches into
failures-to-match. Is that correct?

> The performance issue is only a result of that definition, but is not a part
> of it.

To be clear, implementing this feature in Go's regexp package (if
possible) will have no performance implications.

Ian

amit....@mail.huji.ac.il

unread,
Apr 4, 2014, 3:59:14 PM4/4/14
to golan...@googlegroups.com, amit....@mail.huji.ac.il
That implies that the only possible use of a possessive quantifier in
a non-backtracking regexp implementation is to turn some matches into
failures-to-match.  Is that correct?

Yes, I think you can put it this way.
 

To be clear, implementing this feature in Go's regexp package (if
possible) will have no performance implications.

Yes of course. That was clear from your first comment. :)

Ian Lance Taylor

unread,
Apr 4, 2014, 5:34:32 PM4/4/14
to amit....@mail.huji.ac.il, golang-nuts
On Fri, Apr 4, 2014 at 12:59 PM, <amit....@mail.huji.ac.il> wrote:
>> That implies that the only possible use of a possessive quantifier in
>> a non-backtracking regexp implementation is to turn some matches into
>> failures-to-match. Is that correct?
>
>
> Yes, I think you can put it this way.

http://golang.org/issue/7713

Ian
Message has been deleted

amit....@mail.huji.ac.il

unread,
Apr 5, 2014, 8:14:25 AM4/5/14
to golan...@googlegroups.com, amit....@mail.huji.ac.il
Thank you.

Michael Jones

unread,
Apr 5, 2014, 12:50:52 PM4/5/14
to amit....@mail.huji.ac.il, golang-nuts

http://play.golang.org/p/C0C1L01TX8

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages