Regexp Capture Groups

Derek Tracy

Jul 5, 2012, 1:52:56 PM

to golan...@googlegroups.com

I am finally sitting down to start writing utilities using Go. The first one on my list is a utility that parses my broadband usage and displays how much bandwidth utilization I have left (MB remaining, % remaining and refill time).

I am successfully pulling the status page using net/http, but what comes back is a horribly ugly web page that I need to parse. I started using the regexp package to parse the output and can pull out the string, but a lot of surrounding text gets pulled along with it (I just want the number for the percentage, not the HTML tags around it).

Here is the text I am pulling. In this example only 67 should be returned as my percentage.

<td style="border-width:0px;">Allowance Remaining (%)</td><td style="border-width:0px;">67</td>

My regexp snippet:

pct_remaining_regex, err := regexp.Compile("(?:<td style=\"border-width:0px;\">Allowance Remaining \\(%\\)</td><td style=\"border-width:0px;\">([0-9]+)</td>)")
if err != nil {
	log.Fatal(err)
}
pct_remaining := pct_remaining_regex.Find(allowance_req)

I tried to use capture groups, but I am unsure how to use them properly in Go. With my snippet, the result is the entire match rather than just the number:

pct_remaining = <td style="border-width:0px;">Allowance Remaining (%)</td><td style="border-width:0px;">67</td>

I also tried to use SubexpNames like below:

numSub := pct_remaining_regex.NumSubexp()

names := pct_remaining_regex.SubexpNames()

Here numSub is set to 1, as I would expect, but both names[0] and names[1] are blank.

Any advice would be greatly appreciated.

---------------------------------
Derek Tracy
tra...@gmail.com
---------------------------------

Kamil Kisiel

Jul 5, 2012, 2:34:04 PM

to golan...@googlegroups.com

In order for SubexpNames to be useful, you need named capturing groups in your regular expression. They are created with the (?P<capture_group_name>...) syntax. Otherwise you can refer to a capturing group by its numeric index, but you need to use the *Submatch() variants of the regexp methods, since plain Find returns only the whole match.

I've created an example to illustrate the usage:

http://play.golang.org/p/RnsOzFajc0
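
For readers who cannot open the link, here is a minimal sketch of the same idea. It is not necessarily what the linked playground contains. The group name pct and the helpers percentRemaining and namedGroups are made up for illustration; the pattern is the one from the original post, with the single capture group given a name.

```go
package main

import (
	"fmt"
	"regexp"
)

// A raw string literal avoids escaping the double quotes; \( and \) match
// literal parentheses. The group name "pct" is an assumption for this sketch.
var pctRemainingRe = regexp.MustCompile(`<td style="border-width:0px;">Allowance Remaining \(%\)</td><td style="border-width:0px;">(?P<pct>[0-9]+)</td>`)

// percentRemaining returns the text of the first capture group,
// or false if the page does not match.
func percentRemaining(page string) (string, bool) {
	m := pctRemainingRe.FindStringSubmatch(page)
	if m == nil {
		return "", false
	}
	// m[0] is the whole match; m[1] is the first parenthesized group.
	return m[1], true
}

// namedGroups maps every named group in re to the text it matched in s.
// It returns nil when s does not match.
func namedGroups(re *regexp.Regexp, s string) map[string]string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	// SubexpNames()[0] is always empty, as are the names of unnamed groups.
	for i, name := range re.SubexpNames() {
		if name != "" {
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	page := `<td style="border-width:0px;">Allowance Remaining (%)</td><td style="border-width:0px;">67</td>`
	if pct, ok := percentRemaining(page); ok {
		fmt.Println(pct) // by numeric index
	}
	fmt.Println(namedGroups(pctRemainingRe, page)["pct"]) // by name
}
```

This also accounts for the blank names in the original post: that pattern has one capture group but it is unnamed, so SubexpNames has nothing to report for it.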

Derek Tracy

Jul 5, 2012, 3:42:44 PM

to Kamil Kisiel, golan...@googlegroups.com

Thank you, this makes a lot more sense. Trying to find regexp examples for Go online was not working.

Kyle Lemons

Jul 5, 2012, 10:45:02 PM

to Derek Tracy, Kamil Kisiel, golan...@googlegroups.com

As another suggestion: you might also have luck naively stripping out HTML tags before doing regexp matches. Something simple like

regexp.MustCompile(`</?[^>]+>`).ReplaceAllString(page, " ") // obviously not as a one-liner

might do the trick.
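
Spelled out as more than a one-liner, that could look roughly like the sketch below. The tag-stripping pattern is the one above; the helper names stripTags and percentFromText, the whitespace collapsing, and the follow-up pattern are assumptions made for illustration. Like any regexp-based approach to HTML, it is only suitable for simple, predictable pages.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Matches opening and closing tags, as in the one-liner above.
	tagRe = regexp.MustCompile(`</?[^>]+>`)
	// Hypothetical pattern for the tag-free text: the label, whitespace, then digits.
	pctRe = regexp.MustCompile(`Allowance Remaining \(%\)\s+([0-9]+)`)
)

// stripTags replaces every tag with a space and then collapses runs of
// whitespace, so adjacent table cells end up separated by a single space.
func stripTags(page string) string {
	text := tagRe.ReplaceAllString(page, " ")
	return strings.Join(strings.Fields(text), " ")
}

// percentFromText pulls the number that follows the label out of tag-free text.
func percentFromText(text string) (string, bool) {
	m := pctRe.FindStringSubmatch(text)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	page := `<td style="border-width:0px;">Allowance Remaining (%)</td><td style="border-width:0px;">67</td>`
	text := stripTags(page)
	fmt.Println(text)
	if pct, ok := percentFromText(text); ok {
		fmt.Println(pct)
	}
}
```

The benefit is that the second pattern no longer depends on the exact style attributes in the markup, only on the visible label.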

senior7515

Jul 6, 2012, 12:32:35 AM

to golang-nuts

I didn't know about (?P<name>...). Thanks for giving an example.
