a minor regexp question (and a how-could-I-answer-it-myself? question)

Tim Hanson

May 1, 2020, 8:20:51 AM
to Racket Users
Hi, just now I'm debugging a regular expression and trying to understand why this:

> (regexp-match-positions #rx"[-+][0-9]+" "-0500")
'((0 . 5))

works as I expect, whereas this:

> (regexp-match-positions #rx"[-+][0-9]{4}" "-0500")
#f

doesn't. (My naive opinion is the second expression should return the same answer as the first.)

Wondering whether I could find out for myself whether this is a known issue, I tried this search:

https://github.com/racket/racket/search?q=regexp&type=Issues

or perhaps something like this:

https://github.com/racket/racket/search?q=regexp+type%3Aissue&unscoped_q=regexp+type%3Aissue

is better, but either way I quickly get lost trying to decide whether this relates to a known issue.


I'd appreciate advice on either question.

Jens Axel Søgaard

May 1, 2020, 8:34:30 AM
to Tim Hanson, Racket Users
On Fri, May 1, 2020 at 14:20, Tim Hanson <tbh...@gmail.com> wrote:
> Hi, just now I'm debugging a regular expression and trying to understand why this:
>
>   > (regexp-match-positions #rx"[-+][0-9]+" "-0500")
>   '((0 . 5))
>
> works as I expect, whereas this:
>
>   > (regexp-match-positions #rx"[-+][0-9]{4}" "-0500")
>   #f
>
> doesn't. (My naive opinion is the second expression should return the same answer as the first.)

The cause of the confusion is that there are several notations for regular expressions.
Racket supports two kinds: "egrep"-style (#rx) and "Perl"-style (#px).
It turns out that the repetition syntax with braces {} is only supported in Perl-style regular expressions.
Therefore you need to use #px rather than #rx.
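
A minimal sketch of the difference, assuming the behaviour described in the Racket reference: in #rx syntax the braces are ordinary characters, so the original pattern looks for one digit followed by the literal text "{4}". The second input string below is made up purely to show that literal match.

```racket
#lang racket/base

;; #rx: { and } are literal characters, so this pattern means
;; "a sign, one digit, then the text {4}".
(regexp-match-positions #rx"[-+][0-9]{4}" "-0500")   ; => #f
(regexp-match-positions #rx"[-+][0-9]{4}" "-5{4}")   ; => '((0 . 5))

;; #px: {4} is a repetition count, so four digits are required.
(regexp-match-positions #px"[-+][0-9]{4}" "-0500")   ; => '((0 . 5))
```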

The relevant section of the documentation:

[Image: image.png]

> Wondering whether I could find out for myself whether this is a known issue, I tried this search:
>
>   https://github.com/racket/racket/search?q=regexp&type=Issues
>
> or perhaps something like this:
>
>   https://github.com/racket/racket/search?q=regexp+type%3Aissue&unscoped_q=regexp+type%3Aissue
>
> is better, but either way I quickly get lost trying to decide whether this relates to a known issue.

You need to look at docs.racket-lang.org instead. Then search for "regexp" or "regular expression".
You will then end up here:

https://docs.racket-lang.org/reference/regexp.html?q=regular%20expression

Also ... if it is any consolation ... you are not the first to be confused by this.

/Jens Axel

Racket Stories
https://racket-stories.com

Tim Hanson

May 1, 2020, 9:50:25 AM
to Racket Users
Thanks, Jens, much appreciated. I suspect I even knew this once and had since forgotten it.

(I even glanced at the docs, saw the two kinds, but didn’t pause long enough to wonder if it mattered to me.)

David Storrs

May 1, 2020, 2:07:02 PM
to Tim Hanson, Racket Users
For the record, it's probably better to stick with #px in all cases. The vast majority of non-Racket regex code is based on the PCRE library, whose name stands for "Perl Compatible Regular Expressions", so if you stick with #px your regexes will be more familiar to more people. Plus, standardizing on one notation will eliminate bugs like the one you're seeing.
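
As a rough sketch of what standardizing on #px can look like: the helper name signed-four-digits? below is hypothetical, introduced only for illustration. It uses \d and {4}, both of which are #px-only syntax, and pregexp is the run-time constructor that corresponds to the #px literal.

```racket
#lang racket/base

;; Hypothetical helper: #t when s is exactly a sign followed by four digits.
;; ^ and $ anchor the match to the whole string.
(define (signed-four-digits? s)
  (regexp-match? #px"^[-+]\\d{4}$" s))

(signed-four-digits? "-0500")   ; => #t
(signed-four-digits? "+05")     ; => #f

;; pregexp builds a Perl-style pattern from an ordinary string.
(regexp-match-positions (pregexp "[-+][0-9]{4}") "-0500")   ; => '((0 . 5))
```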
