Regex in pattern matching - possibility for compile time checks on number of capture groups?


Asko Kauppi

Jan 15, 2015, 5:35:16 AM
to scala...@googlegroups.com
I spent much of yesterday and this morning trying to understand the subtleties of using regexes in pattern matching.

I ended up making a list of pitfalls for my colleagues, to keep our code fresh and working in the future as well. The problem seems to be that quite a lot is left to runtime, which runs counter to the overall goals of the Scala language. So I wonder whether something can be done about this in Scala language development.

Sample code:

    val Re = """.*cd""".r

    val tmp=
      "abcd" match {
        case Re() => 0      // gets here if no captures
        case Re(_) => 1     // gets here if there's one capture
        case Re(_,_) => 2   // gets here if there's two captures
        case _ => -1
      }
    tmp shouldBe 0      // ScalaTest matcher syntax

This compiles, and runs.

In any use case I can think of, a regex has a fixed number of capture groups (0 or more), determined by its pattern. If the compiler knew this number, it could report an error for cases that can never match.
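
Until the compiler can do this, the count can at least be checked once, where the regex is defined. A minimal sketch, assuming a hypothetical helper named `expecting` (not part of the standard library); it relies only on `Regex.pattern` and `java.util.regex.Matcher.groupCount`:

```scala
import scala.util.matching.Regex

object GroupCountCheck {
  // Number of capture groups declared by the pattern; it depends on the
  // pattern alone, not on the input being matched.
  def groupCount(re: Regex): Int = re.pattern.matcher("").groupCount()

  // Hypothetical helper: returns the regex unchanged, or throws
  // IllegalArgumentException if the declared group count differs from n.
  def expecting(n: Int)(re: Regex): Regex = {
    require(groupCount(re) == n,
      s"expected $n capture group(s), found ${groupCount(re)} in '$re'")
    re
  }

  val NoGroups: Regex = expecting(0)(""".*cd""".r)
  val OneGroup: Regex = expecting(1)(""".*(c)d""".r)
}
```

This is still a runtime check, but it fails when the regex is defined rather than silently skipping a case later.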

Let's modify the code a bit:

    val Re = """.*(c)d""".r

    val tmp=
      "abcd" match {
        case Re() => ...
      }

I've added a capture group but am only matching with the zero-argument pattern 'Re()'. This compiles without any warnings, but the case can never match (there's no 'Re(_)' case), so the match fails at runtime with a MatchError. If the compiler knew the number of captures in a 'Regex', it could reject such code.

Would such an improvement (more compile-time checks when using regexes in pattern matching) be possible in future Scala versions?
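
A minimal sketch of that failure mode, using only `scala.util.matching.Regex` and `scala.util.Try`; the object and method names are made up for illustration:

```scala
import scala.util.Try
import scala.util.matching.Regex

object ArityMismatch {
  val Re: Regex = """.*(c)d""".r   // exactly one capture group

  // Zero-argument pattern against a one-group regex: this compiles, but the
  // case never matches, so the match throws MatchError, captured here by Try.
  def zeroArgs(s: String): Try[Int] = Try(s match { case Re() => 0 })

  // One-argument pattern: the arity agrees with the regex, so it can match.
  def oneArg(s: String): Try[String] = Try(s match { case Re(c) => c })
}
```

With the input "abcd", `zeroArgs` yields a `Failure` while `oneArg` yields the captured group.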

- Asko Kauppi

Rodrigo Cano

Jan 15, 2015, 8:29:50 AM
to Asko Kauppi, scala-user
It's possible to do this now with macros and string interpolators, really.


Som Snytt

Jan 15, 2015, 11:18:00 AM
to Rodrigo Cano, Asko Kauppi, scala-user
I just started looking at regextractor again last night, to make it ready for prime time.

The macro produces an unapply with optional groups producing Option[String] instead of null.

The interpolator version checks that the number of holes equals the number of groups:

Som Snytt

Jan 15, 2015, 11:58:53 AM
to Rodrigo Cano, Asko Kauppi, scala-user
Just back from the morning walk to school, I had forgotten how it works.

scala> import regex._
import regex._

scala> gr"foo"
res0: regex.Gregex[Nothing] = foo

scala> gr"(abc)"
res1: regex.Gregex[String] = (abc)

scala> import PartialFunction.{cond=>when}
import PartialFunction.{cond=>when}

scala> when("abc") { case res1() => true }
<console>:13: error: not enough patterns for <$anon: AnyRef> offering String: expected 1, found 0
              when("abc") { case res1() => true }
                                 ^

scala> "abc" match { case res1(x) => x }
res5: String = abc

scala> "abc" match { case res1(xs @ _*) => xs.head }
<console>:13: error: Star pattern must correspond with varargs or unapplySeq
              "abc" match { case res1(xs @ _*) => xs.head }
                                 ^

scala> "abc" match { case gr"a$x(b).*" => x }
res9: String = b

scala> "abc" match { case gr"a$_(b).*" => true }
res10: Boolean = true

scala> "abc" match { case gr"a(b).*" => true }
<console>:12: error: 1 groups to extract 0 strings.
              "abc" match { case gr"a(b).*" => true }
                                 ^

scala> "ac" match { case gr"a$x(b)?.*" => x }
res12: Option[String] = None

I remember one important issue is to move pattern compilation out of the unapply.
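
A rough sketch of that idea without macros, assuming a hand-written extractor class (`OneGroup` is a hypothetical name, not the regextractor API): the pattern is compiled once in the constructor, and `unapply` only runs the matcher.

```scala
import java.util.regex.Pattern

// Hypothetical fixed-arity extractor for regexes with exactly one group.
final class OneGroup(regex: String) {
  // Compiled once, when the extractor is constructed, not on every unapply.
  private val pattern: Pattern = Pattern.compile(regex)
  require(pattern.matcher("").groupCount() == 1,
    s"expected exactly one capture group in '$regex'")

  // Returns Option[String] rather than a sequence, so patterns using this
  // extractor must bind exactly one value. Note that group(1) would be null
  // for an optional group that did not participate in the match.
  def unapply(s: String): Option[String] = {
    val m = pattern.matcher(s)
    if (m.matches()) Some(m.group(1)) else None
  }
}

object OneGroupDemo {
  val CapC = new OneGroup(""".*(c)d""")

  def extract(s: String): Option[String] = s match {
    case CapC(c) => Some(c)
    case _       => None
  }
}
```

Because `unapply` has a fixed result type, a zero-argument pattern such as `case CapC() =>` should be rejected at compile time, much like the "not enough patterns" error in the transcript above.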

Som Snytt

Jan 15, 2015, 12:08:46 PM
to Asko Kauppi, scala-user
Some named group support:

scala> val r = gr"a(?<res>b)c"
r: regex.Gregex[String] = a(b)c

scala> val all = r findAllMatchIn "abcdefabc"
all: Iterator[scala.util.matching.Regex.Match] = non-empty iterator

scala> all map (_ group "res")
res13: Iterator[String] = non-empty iterator

scala> .toList
res14: List[String] = List(b, b)

Simon Ochsenreither

Jan 15, 2015, 6:04:29 PM
to scala...@googlegroups.com, ioni...@gmail.com, aka...@gmail.com
That's fantastic to hear!
Did you see my recent proposal to add something like this in one of the next versions of Scala? What do you think? :-)