Regexp Capture Groups


Derek Tracy

Jul 5, 2012, 1:52:56 PM
to golan...@googlegroups.com
I am finally sitting down to start writing utilities in Go.  The first one on my list is a utility that parses my broadband usage page and displays how much bandwidth I have left (MB remaining, % remaining and refill time).

I am successfully pulling the status page using net/http, but what comes back is a horribly ugly web page that I need to parse.  I started using the regexp package to parse the output and can pull out the string, but a lot of surrounding text comes along with it (I just want the number for the percentage, not the HTML tags around it).

Here is the text I am pulling.  In this example only 67 should return as my percentage.
<td style="border-width:0px;">Allowance Remaining (%)</td><td style="border-width:0px;">67</td>
 
My regexp snippet:
pct_remaining_regex, err := regexp.Compile("(?:<td style=\"border-width:0px;\">Allowance Remaining \\(%\\)</td><td style=\"border-width:0px;\">([0-9]+)</td>)")
if err != nil {
        log.Fatal(err)
}

pct_remaining := pct_remaining_regex.Find(allowance_req)

I tried to use capture groups, but I am unsure how to use them properly in Go.  With my snippet, Find returns the whole match:
pct_remaining = <td style="border-width:0px;">Allowance Remaining (%)</td><td style="border-width:0px;">67</td>

I also tried to use SubexpNames like below:

numSub := pct_remaining_regex.NumSubexp()
names := pct_remaining_regex.SubexpNames()

Here numSub is set to 1, as I would expect, but both names[0] and names[1] are blank.

Any advice would be greatly appreciated.


---------------------------------
Derek Tracy
tra...@gmail.com
---------------------------------

Kamil Kisiel

Jul 5, 2012, 2:34:04 PM
to golan...@googlegroups.com
For SubexpNames to be useful, your regular expression needs named capturing groups, which are created with the (?P<capture_group_name>...) syntax. Otherwise you can index into a capturing group by number. Either way, you need one of the *Submatch() variants of the regexp functions (for example FindSubmatch or FindStringSubmatch), because plain Find returns only the whole match.

I've created an example to illustrate the usage:
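
(The example linked from this post is not included in this copy of the thread. As a stand-in, here is a minimal sketch, not the original code, of both approaches described above. The group name "pct", the helper names byIndex and byName, and the shortened pattern are assumptions made for illustration; the sample row is the one from the question.)

```go
package main

import (
	"fmt"
	"regexp"
)

// page is the sample table row from the original question.
const page = `<td style="border-width:0px;">Allowance Remaining (%)</td><td style="border-width:0px;">67</td>`

// pctRe captures the digits in a named group. The name "pct" is a
// hypothetical choice; any identifier would work.
var pctRe = regexp.MustCompile(
	`Allowance Remaining \(%\)</td><td style="border-width:0px;">(?P<pct>[0-9]+)</td>`)

// byIndex returns the first capture group using its numeric index.
// Index 0 of the submatch slice is always the whole match.
func byIndex(s string) (string, bool) {
	m := pctRe.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// byName looks up a capture group by name via SubexpNames, whose
// indices line up with the slice returned by FindStringSubmatch.
func byName(s, name string) (string, bool) {
	m := pctRe.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	for i, n := range pctRe.SubexpNames() {
		if i > 0 && n == name {
			return m[i], true
		}
	}
	return "", false
}

func main() {
	fmt.Println(byIndex(page))
	fmt.Println(byName(page, "pct"))
}
```

With an unnamed group such as ([0-9]+), SubexpNames still returns one entry per group, but each entry is the empty string, which matches what the question observed.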

Derek Tracy

Jul 5, 2012, 3:42:44 PM
to Kamil Kisiel, golan...@googlegroups.com
Thank you, this makes a lot more sense. Searching online for Go regexp examples was not getting me anywhere.

Kyle Lemons

Jul 5, 2012, 10:45:02 PM
to Derek Tracy, Kamil Kisiel, golan...@googlegroups.com
As another suggestion: you might also have luck naively stripping out HTML tags before doing regexp matches.  Something simple like
regexp.MustCompile(`</?[^>]+>`).ReplaceAllString(page, " ") // obviously not as a one-liner
might do the trick.
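
Spelled out a bit more, that idea could look like the sketch below. It only illustrates the suggestion under a few assumptions: the helper names stripTags and remainingPct, the whitespace collapsing, and the second pattern are all made up for this example, and the tag-stripping regexp is naive (it is not an HTML parser).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// samplePage is the table row from the original question.
const samplePage = `<td style="border-width:0px;">Allowance Remaining (%)</td><td style="border-width:0px;">67</td>`

var (
	// tagRe is the naive tag matcher suggested above.
	tagRe = regexp.MustCompile(`</?[^>]+>`)
	// pctTextRe runs against the stripped text, so it no longer needs
	// to spell out the surrounding markup.
	pctTextRe = regexp.MustCompile(`Allowance Remaining \(%\)\s+([0-9]+)`)
)

// stripTags replaces every tag with a space and then collapses runs
// of whitespace into single spaces.
func stripTags(page string) string {
	noTags := tagRe.ReplaceAllString(page, " ")
	return strings.Join(strings.Fields(noTags), " ")
}

// remainingPct extracts the percentage from the tag-free text.
func remainingPct(page string) (string, bool) {
	m := pctTextRe.FindStringSubmatch(stripTags(page))
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	fmt.Println(stripTags(samplePage))
	fmt.Println(remainingPct(samplePage))
}
```

The upside is that the second pattern stays readable and survives small changes to the page's styling; the downside is that anything the naive tag pattern mishandles (comments, scripts, a ">" inside an attribute value) will leak into the text.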

senior7515

Jul 6, 2012, 12:32:35 AM
to golang-nuts
I didn't know about (?P<name>...). Thanks for giving an example.