Regex without lookahead/lookback

799 views
Skip to first unread message

Donovan Hide

unread,
Jan 28, 2013, 12:37:11 PM1/28/13
to golang-nuts
Hi,

I've got this slightly painful regex:

        ^(:?(\\d+(-(\\d+))?))+$

which should match according to the following examples:

        {"", true},
{"1", true},
{"1:3", true},
{"1-2:3", true},
{"1-2:3-2", true},
{"1-2:3-2", true},
{"-", false},
{":", false},
{"1-", false},
{"-1", false},
{"1-2:", false},
{":1-2", false},
{"asas1-2asas", false},
{"1:2asas", false},

I have a problem with the ":1-2" case. I can't work out how to exclude the first occurrence of the colon without using lookahead/lookback which isn't part of RE2/regexp.

Anyone got any ideas? Thanks in advance!

Donovan. 

Jan Mercl

unread,
Jan 28, 2013, 12:51:29 PM1/28/13
to Donovan Hide, golang-nuts

Not tested at all: ^\d+(-\d+)?(:\d+(-\d+)?)?$

-j

Donovan Hide

unread,
Jan 28, 2013, 1:05:14 PM1/28/13
to Jan Mercl, golang-nuts
Not tested at all: ^\d+(-\d+)?(:\d+(-\d+)?)?$
-j

Many thanks! It works! Ended up with this slightly extended monstrosity:

"^(?P<range>\\d+(-\\d+)?(:(?P<range>\\d+(-\\d+)?))?)*$"

to deal with the blank case and named parameters. Is this the sort of thing that would be better with a lexer? If so, any suggestions on how to get started with what's in the standard library?

Cheers,
Donovan.

Kyle Lemons

unread,
Jan 28, 2013, 1:34:39 PM1/28/13
to Donovan Hide, Jan Mercl, golang-nuts
I would just go with what you mean :).  I'm not sure I know precisely what you're trying to parse, but it seems like the following might work:

type range struct {*lo, *hi int}
rangeStrs := strings.Split(input, ":")
var ranges []range
for _, s := range rangeStrs {
  pieces := strings.Split(s, "-")
  switch len(pieces) {
    case 1:
      // single
    case 2:
      // range
  }
}


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group, send email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Donovan Hide

unread,
Jan 28, 2013, 1:52:39 PM1/28/13
to Kyle Lemons, Jan Mercl, golang-nuts
Hi Kyle,

thanks for the reply! I did it in a way very similar to that in the first instance :-) The spec is to define multiple intervals that can be encoded into a URL, validated by a gorilla Mux router (hence the regex) and then transformed into a MongoDB query represented as  a bson.M map.

This was before I realised that named groups don't really work in the way I'm used to with Python's regex implementation :-)

An example is:

1-3:5-7:9 results in the interval set 1,2,3,5,7,9 and the Mongo query looks a bit like:

{ $or: [ { _id.doctype: { $lte: 3, $gte: 1 } }, { _id.doctype: { $gte: 5, $lte: 7 } }, { _id.doctype: 9 } ] }

I suppose I can just use the regex for validation only and not for the extraction of the ranges. Just seemed like it would be simpler to use regular expressions. I think I was mistaken!

I remember watching Rob's talk on lexers and thought this might be a simple way to try one out, but maybe it is too simple for that...

Cheers,
Donovan. 


Nate Finch

unread,
Jan 28, 2013, 2:01:42 PM1/28/13
to golan...@googlegroups.com, Kyle Lemons, Jan Mercl
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Donovan Hide

unread,
Jan 28, 2013, 2:06:48 PM1/28/13
to Nate Finch, golang-nuts, Kyle Lemons, Jan Mercl
I resemble that quote :-)

Trouble is that nearly every web framework implements it's URL routing via regular expressions, so it's not something that is going to go away any time soon! 

Jan Mercl

unread,
Jan 28, 2013, 2:09:03 PM1/28/13
to Nate Finch, Kyle Lemons, golang-nuts

... which in no way can be something to blame REs for.

Screwing with a hammer is as wise as hammering with a screwdriver.

-j

Donovan Hide

unread,
Jan 28, 2013, 2:16:53 PM1/28/13
to Jan Mercl, Nate Finch, Kyle Lemons, golang-nuts
Thread quickly descends into regular expression war of responsiblity :-) 

... which in no way can be something to blame REs for.

Screwing with a hammer is as wise as hammering with a screwdriver.

Does anyone have experience of using Go for extremely simple parsing/lexing of url fragments, such as multiple intervals/ranges? That's probably a more constructive direction :-)

bryanturley

unread,
Jan 28, 2013, 2:17:49 PM1/28/13
to golan...@googlegroups.com, Nate Finch, Kyle Lemons, Jan Mercl


On Monday, January 28, 2013 1:06:48 PM UTC-6, Donovan wrote:
I resemble that quote :-)

I don't know man he may have a point.

resemble: to be like or similar to

Using regex may have caused you to express yourself irregularly ;)

Jan Mercl

unread,
Jan 28, 2013, 2:19:45 PM1/28/13
to Donovan Hide, golang-nuts, Kyle Lemons, Nate Finch

No war. Wanted to point out that the tool is not the culprit. A wrong choice of a tool is ;-)

-j

Andy Balholm

unread,
Jan 28, 2013, 7:40:42 PM1/28/13
to golan...@googlegroups.com, Nate Finch, Kyle Lemons, Jan Mercl
On Monday, January 28, 2013 11:17:49 AM UTC-8, bryanturley wrote:
On Monday, January 28, 2013 1:06:48 PM UTC-6, Donovan wrote:
I resemble that quote :-)

I don't know man he may have a point.

resemble: to be like or similar to

Using regex may have caused you to express yourself irregularly ;)

I think he intended to say "resemble". "I resemble that remark" is a common quote (attributed variously to Groucho Marx and the Three Stooges).  

Donovan Hide

unread,
Jan 28, 2013, 8:24:18 PM1/28/13
to Andy Balholm, golang-nuts, Nate Finch, Kyle Lemons, Jan Mercl
I think he intended to say "resemble". "I resemble that remark" is a common quote (attributed variously to Groucho Marx and the Three Stooges).  

Correct :-)

Nate Finch

unread,
Jan 28, 2013, 8:40:47 PM1/28/13
to Donovan Hide, golan...@googlegroups.com, Kyle Lemons, Jan Mercl

My apologies for derailing the thread. In penance, I'll spend some time trying to grok your regex problem and see if I can help figure out where you're going astray.  Can't do it until the morning, though, kids and wives take precedence over penance.

DisposaBoy

unread,
Jan 29, 2013, 3:02:14 AM1/29/13
to golan...@googlegroups.com
the fact that you're having trouble solving the problem tells me that it really isn't as extremely simple as you'd like it to be but whatever... given a problem like that I wouldn't try to force a url parser if you will, to parse it . I'd try first to extract the entire string as one blob and then parse that blob with the proper tools

Nate Finch

unread,
Jan 29, 2013, 3:34:02 AM1/29/13
to golan...@googlegroups.com, Donovan Hide
Note mux's 

func (r *Router) MatcherFunc(f MatcherFunc) *Route
and
type MatcherFunc func(*http.Request, *RouteMatch) bool

That seems to indicate you can make a custom matching function that doesn't use a regex.

I'd do a simple split on : and then split on -


parts := strings.Split(request.URL, ":")
for _, s := range parts {
    if strings.Contains(s, "-") {
          ranges := strings.Split(s, "-")
          if len(ranges) != 2 {
              return false
          }
          start, err := strconv.Atoi(ranges[0])
          if err != nil {
              return false
          }
          end, err := strconv.Atoi(ranges[1])
          if err != nil {
              return false
          }
          // handle range from start to end
    }
    if single, err := stringconv.Atoi(s); err != nil {
        return false
    } else {
        // handle single number
    }
}

Sure, it's a lot more code than a regex... but you'll actually still know what it does in 6 months, and it's a hell of a lot easier to debug.

Rob Pike

unread,
Jan 29, 2013, 4:09:09 AM1/29/13
to Nate Finch, golan...@googlegroups.com, Donovan Hide

Donovan Hide

unread,
Jan 29, 2013, 4:43:50 AM1/29/13
to Rob Pike, Nate Finch, golang-nuts
Hi,

thank you to everyone for their cautionary tales on regular expressions. I understand fully the pitfalls of making the choice to go regex! I had a fully working parser using string splitting and strconv before I made this post. The issue I have is that will receive many different forms of URL containing one or more ranges. For example:

/association/5:1-2/
/association/1/78-100/
/document/6-9:100-200:999/

So I need to be able to validate these URLS and then pass each range set (the full text between the slashes) into a function that extracts the individual ranges. The point of the post was to see if it was possible to combine validation and parsing. I quickly realised this was not a good idea! However, I'm not going to be able to get rid of the regex validation without rewriting Gorilla's RouteMatch functionality. I don't want to do this because I'd then have to write non-regex validators for other complicated sections such as Mongo ObjectId's.

The regexes I have are:

const docRegex = "[0-9]+"
const queueRegex = "[0-9a-f]{24}"
const rangeRegex = "(\\d+(-\\d+)?(:\\d+(-\\d+)?)?)*"

and these get slotted into the routing table to give:

/document/
/document/{doctypes:(\d+(-\d+)?(:\d+(-\d+)?)?)*}/
/document/{doctype:[0-9]+}/{docid:[0-9]+}/
/association/
/association/{source:(\d+(-\d+)?(:\d+(-\d+)?)?)*}/
/association/{source:(\d+(-\d+)?(:\d+(-\d+)?)?)*}/{target:(\d+(-\d+)?(:\d+(-\d+)?)?)*}/
/queue/
/queue/{id:[0-9a-f]{24}}/
/index/
/search/

The rangeRegex is the ugly one. If anyone has a simplification of it, that would be really useful. If someone wants to write a non-regex URL router which can interpret something a bit like the Go's EBNF spec, that would be truly groundbreaking :-)

Cheers,
Donovan.

Greg Ward

unread,
Jan 31, 2013, 3:19:08 PM1/31/13
to Donovan Hide, Jan Mercl, golang-nuts
On 28 January 2013, Donovan Hide said:
> Many thanks! It works! Ended up with this slightly extended monstrosity:
>
> "^(?P<range>\\d+(-\\d+)?(:(?P<range>\\d+(-\\d+)?))?)*$"
>
> to deal with the blank case and named parameters. Is this the sort of thing
> that would be better with a lexer? If so, any suggestions on how to get
> started with what's in the standard library?

I believe the lexing tools in the standard library are just for lexing
Go programs, not for building general-purpose lexers. For that you
need golex: https://github.com/cznic/golex

Greg
--
Greg Ward http://www.gerg.ca
<gr...@gerg.ca> @gergdotca
Reply all
Reply to author
Forward
0 new messages