Non-greedy regex with the "regexp/syntax" package

1,380 views
Skip to first unread message

Steve Phillips

unread,
Jun 9, 2012, 3:28:22 AM6/9/12
to golang-nuts
I'm trying to do non-greedy regex. I think the problem is that I
don't understand the relationship between a *syntax.Regexp variable to
a *regexp.Regexp. Converting a *syntax.Regexp variable to type
*regexp.Regexp did not work.

If there is some way to do something like `flags := syntax.DotNL |
syntax.OneLine | syntax.NonGreedy` using the regexp package without
having to dive into regexp/syntax, that would be ideal, but that
doesn't seem to be possible.

Here's the error I get when running the code below:
"getComment.FindAllString undefined (type *syntax.Regexp has no field
or method FindAllString)". That's because *regexp.Regexp variables/
objects have a method `FindAllString` but *syntax.Regexp vars don't.

How should I proceed? Thanks!

****************************************************************

package main

import (
"fmt"
// "regexp"
"regexp/syntax"
)

func main() {
html := `<div><p>Some comment</p></div><div><p>Another one</p></div>`
pat := `<p>(.*?)</p>`
flags := syntax.DotNL | syntax.OneLine | syntax.NonGreedy
getComment, err := syntax.Parse(pat, flags)
if err != nil {
fmt.Printf("Error: %s\n", err)
return
}
comments := getComment.FindAllString(html, -1)
fmt.Printf("%T, %s\n", getComment, html)
}

Steve Phillips

unread,
Jun 9, 2012, 4:39:55 AM6/9/12
to golang-nuts
Nevermind -- got it :-). See code below, especially the `(?U)` at the
beginning of the regexp pattern string (which I called `pat`).

****************************************************************

package main

import (
"fmt"
"regexp"
)

func main() {
html := `<div><p>Some comment</p></div><div><p>Another one</p></div>`
pat := `(?U)<p>(.*)</p>`
getComment := regexp.MustCompile(pat)
comments := getComment.FindAllString(html, -1)
for _, c := range comments {
fmt.Printf("%s\n", c)

Steve Phillips

unread,
Jun 9, 2012, 4:43:08 AM6/9/12
to golang-nuts
In addition to

pat := `(?U)<p>(.*)</p>`

this also works:

pat := `<p>(.*?)</p>`

which fellow Python programmers will find more familiar.


--Steve Phillips / elimisteve
Reply all
Reply to author
Forward
0 new messages