regex extractors crashing if input isn't right

rtm...@googlemail.com

Jun 20, 2016, 2:19:59 PM
to scala-user
Hi,
I've searched around but found nothing to say what should happen, and don't know if this is expected behaviour (certainly isn't desirable AFAICS).
As follows:

Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_65).
Type in expressions for evaluation. Or try :help.


scala> val r = "([aeiou]*)?([^aeiou]*)?".r
r: scala.util.matching.Regex = ([aeiou]*)?([^aeiou]*)?

scala> val r(vowels, consonants) = "aaabbb"
vowels: String = aaa
consonants: String = bbb


scala> val r(vowels, consonants) = "bbbaaa"
scala.MatchError: bbbaaa (of class java.lang.String)
  ... 32 elided


The last doesn't match so crashes outright. I'm surprised. Thoughts?

cheers

jan

Roland Kuhn

Jun 20, 2016, 2:52:33 PM
to rtm...@googlemail.com, scala-user
Using an extractor in a val definition means the application of pattern matching with a single case, hence the MatchError is exactly what should happen.

But I wonder: what else would you expect?

Regards,

Roland

--
You received this message because you are subscribed to the Google Groups "scala-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-user+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

rtm...@googlemail.com

Jun 20, 2016, 3:18:44 PM
to scala-user, rtm...@googlemail.com
Hi Roland,
if I do


val r = "([aeiou]*)?([^aeiou]*)?".r

then make it a var not a val for the extractor:

scala> var r(vowels, consonants) = "bbbaaa"

scala.MatchError: bbbaaa (of class java.lang.String)
  ... 32 elided


I get the same, so what's the issue with the "val definition" you mention? I don't understand.

What I was expecting was for vowels and consonants both to be None, not a crash. Otherwise you have to know beforehand that the data going into your extractor is in the right format, which is extra work that almost negates the point of the extractor: you might as well match on the regex with capture brackets and then pull out the captured contents if the match succeeded.

cheers

jan

Clint Gilbert

Jun 20, 2016, 4:04:24 PM
to scala...@googlegroups.com
The line

> val r(vowels, consonants) = "aaabbb"

creates two vals, 'vowels' and 'consonants', and gives them values by
doing a pattern match via the regex object 'r'. If the match doesn't
succeed (in your case because the string on the right isn't matched by
the regex), then it's impossible to give values to 'vowels' and 'consonants'.




rtm...@googlemail.com

Jun 20, 2016, 4:41:13 PM
to scala-user
> then it's impossible to give values to 'vowels' and 'consonants'.

quite so, hence my expectation they'd be None; no value.

cheers

jan

Clint Gilbert

Jun 20, 2016, 4:57:18 PM
to scala...@googlegroups.com
Aha, got it. None is one of the possibilities for Option[T]. Do you
mean null, or something like it?

Variable-creation in pattern-matches could be implemented such that the
created variables are Options, but then you'd have to unwrap those
options, map/flatMap them, etc, which would be a lot of overhead.

In general, the bias in Scala is away from null, and toward variables
having real values or not existing at all.

rtm...@googlemail.com

Jun 20, 2016, 5:56:11 PM
to scala-user
Hi, thanks for getting back.
No, I really did mean None, as in a value of type Option[String].
Oddly enough, the Odersky Scala book (2nd ed.) does show regexp matches returning true nulls; see pp. 611-612. On the same pages, regexp searches using findFirstIn/findPrefixOf show results of None and Some(...). Here goes:

scala> val Decimal = """(-)?(\d+)(\.\d*)?""".r
Decimal: scala.util.matching.Regex = (-)?(\d+)(\.\d*)?

scala> val input = "for -1.0 to 99 by 3"
input: String = for -1.0 to 99 by 3

scala> Decimal findFirstIn input
res3: Option[String] = Some(-1.0)    // note the option here

scala> Decimal findPrefixOf input
res4: Option[String] = None             // and here

scala>

scala> val Decimal(sign, intpart, decpart) = "1.0"
sign: String = null    // and here's a null!
intpart: String = 1
decpart: String = .0


I don't quite get it but the optional stuff is to do with capture brackets - no capture = no result (null or None) - but that doesn't help AFAICS for when the entire thing doesn't match. Maybe if I make the entire regexp optional that would work, as in prevent a crash? Like:

val Decimal = """((-)?(\d+)(\.\d*)?)?""".r

but that looks dodgy and I don't have time to experiment.
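
For what it's worth, a minimal sketch of that experiment (the object name WholeOptional and the helper groups are made up for illustration). As far as I can tell, wrapping everything in one more optional group only lets the empty string through; any other non-matching input still fails, because the extractor has to match the entire input. It also adds a fourth capture group, so the val pattern would need four variables instead of three.

```scala
object WholeOptional {
  // The same Decimal regex, wrapped in one extra optional group.
  val Decimal = """((-)?(\d+)(\.\d*)?)?""".r

  // unapplySeq is what the val pattern relies on; a None here is what
  // surfaces as a MatchError in `val Decimal(...) = input`.
  def groups(input: String): Option[List[String]] = Decimal.unapplySeq(input)
}
```

Here WholeOptional.groups("abc") is still None, so the val pattern would still throw, while WholeOptional.groups("") succeeds with all four groups null.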

cheers

jan

rtm...@googlemail.com

Jun 20, 2016, 5:58:25 PM
to scala-user
Sorry, I missed the bit "but then you'd have to unwrap those options, map/flatMap them, etc, which would be a lot of overhead".
Perhaps, but is it much different from having to pre-check your input to make sure it doesn't bomb your program? Some and its ilk do seem a bit clumsy - but maybe I'll change my mind with experience.

thanks

jan

Clint Gilbert

Jun 20, 2016, 6:27:59 PM
to scala...@googlegroups.com
Code like

val Decimal(sign, intpart, decpart) = "1.0"

desugars into (among other things) calls to Decimal.unapply(...) (or
maybe unapplySeq(...), I don't know offhand), the results of which are
further massaged by the compiler and runtime. That's different from
user code explicitly calling methods like findFirstIn.

In general,

val Foo(myVal) = bar

is very roughly equivalent to

val myVal = Foo.unapply(bar) match {
  case Some(result) => result
  case _ => throw new MatchError(bar)
}

Why it's this way is more than I know. I suspect it's to avoid
returning every match result as an Option that users need to unwrap.
The tradeoff is that non-exhaustive matches (like your val Decimal...
line) can fail at runtime. Personally, I don't write non-exhaustive
matches anywhere but tests, unless I can prove they will succeed.
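
For the regex case specifically, the extractor method is unapplySeq. The following is only a rough sketch of what the val pattern amounts to, not the compiler's actual output; the object name ValPatternSketch and the method extract are invented for the example.

```scala
object ValPatternSketch {
  val Decimal = """(-)?(\d+)(\.\d*)?""".r

  // Roughly what `val Decimal(sign, intpart, decpart) = input` does:
  // ask the regex for its groups and throw if it declines.
  def extract(input: String): (String, String, String) =
    Decimal.unapplySeq(input) match {
      case Some(List(sign, intpart, decpart)) => (sign, intpart, decpart)
      case _ => throw new MatchError(input)
    }
}
```

With this, ValPatternSketch.extract("1.0") gives (null, "1", ".0"), the same null for the absent sign group as in the book example, and any input the regex rejects ends in a MatchError.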



Roland Kuhn

Jun 21, 2016, 2:45:15 AM
to rtm...@googlemail.com, scala-user
Whether to extract String or Option[String] is a choice that is made by the extractor: you can easily experiment with your desired approach by writing an extractor that goes the Option route (but you need to manually create the Matcher so that you can also get the groupCount when the pattern does not match).
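
A minimal sketch of such an experiment, assuming a hand-written class (OptionalGroups is a hypothetical name, not part of the standard library) built directly on java.util.regex. It always succeeds and hands back one Option[String] per capture group. Note that the example below uses + instead of * inside the groups, since with * a group still participates by matching the empty string and would come back as Some("").

```scala
import java.util.regex.Pattern

// Hypothetical helper: an extractor that never fails and yields
// one Option[String] per capture group.
class OptionalGroups(regex: String) {
  private val pattern = Pattern.compile(regex)

  def unapplySeq(input: String): Option[List[Option[String]]] = {
    val m = pattern.matcher(input)
    val matched = m.matches() // whole-input match, like the anchored Regex
    // groupCount is available even when the match failed.
    val groups = (1 to m.groupCount).toList.map { i =>
      if (matched) Option(m.group(i)) else None // Option(null) is None
    }
    Some(groups)
  }
}
```

Given val og = new OptionalGroups("([aeiou]+)?([^aeiou]+)?"), the definition val og(vowels, consonants) = "bbbaaa" should bind both names to None rather than throw.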

Regards,

Roland


Rüdiger Klaehn

Jun 21, 2016, 3:05:20 AM
to Roland Kuhn, rtm...@googlemail.com, scala-user
What about this?

val pattern = "([aeiou]*)?([^aeiou]*)?".r
pattern.unapplySeq("aaabbb") match {
  case Some(Seq(a,b)) => println(s"$a $b")
  case _ => println("no match")
}

pattern.unapplySeq returns an Option[List[String]], which you can then map/flatMap or match on.
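
To illustrate the map/flatMap route, here is a small sketch; the names Splitter and split are invented for the example, and collect is just one way to turn the Option[List[String]] into an Option of a pair.

```scala
object Splitter {
  private val pattern = "([aeiou]*)?([^aeiou]*)?".r

  // Some((vowels, consonants)) when the whole input matches, None otherwise.
  def split(s: String): Option[(String, String)] =
    pattern.unapplySeq(s).collect { case List(v, c) => (v, c) }
}
```

A caller can then write something like Splitter.split(input).map(_._1.length).getOrElse(0) without any risk of a MatchError.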

Another alternative, which works better if you want to match multiple patterns:

"aaabbb" match {
  case pattern(a,b) => println(s"$a $b")
  case _ => println("no match")
}
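
As a sketch of the multiple-pattern case (the object Classify and both regexes are made up for illustration, and they use + so that each case needs at least one character per group), several extractors can be tried in order with a final fallback:

```scala
object Classify {
  private val VowelsThenConsonants = "([aeiou]+)([^aeiou]+)".r
  private val ConsonantsThenVowels = "([^aeiou]+)([aeiou]+)".r

  // Cases are tried top to bottom; the last one catches everything else,
  // so no MatchError can escape.
  def describe(s: String): String = s match {
    case VowelsThenConsonants(v, c) => s"vowels first: $v then $c"
    case ConsonantsThenVowels(c, v) => s"consonants first: $c then $v"
    case _                          => "no match"
  }
}
```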

Michal Politowski

Jun 21, 2016, 9:51:40 AM
to scala-user
This is the expected behaviour per Scala language specification.
http://www.scala-lang.org/files/archive/spec/2.11/04-basic-declarations-and-definitions.html#value-declarations-and-definitions
http://www.scala-lang.org/files/archive/spec/2.11/08-pattern-matching.html#pattern-matching-expressions

When you are using patterns in value definitions, they had better
be irrefutable (or otherwise provably matching).

In other cases the match expression is your friend:
https://ideone.com/fWKvgY

--
Michał Politowski

rtm...@googlemail.com

Jun 21, 2016, 10:27:32 AM
to scala-user, mp...@meep.pl
"In other cases the match expression is your friend: https://ideone.com/fWKvgY"

Oh fantastic, treat it as a partial function! So obvious (I am a n00b though).

cheers

jan

(I did look in the specs but couldn't find anything).

som-snytt

Jul 15, 2016, 4:08:36 PM
to scala-user, mp...@meep.pl

There are a couple of intersecting features that might be confusing in this example.

The first is that, by default, the pattern match must match all of the input. Make your regex "unanchored" to see the optional groups matched as empty strings. (An optional group matches anything, but x* always matches empty input.)

scala> val r = "([aeiou]*)?([^aeiou]*)?".r
r: scala.util.matching.Regex = ([aeiou]*)?([^aeiou]*)?

scala> val r(vowels, consonants) = "bbbaaa"
scala.MatchError: bbbaaa (of class java.lang.String)
  ... 28 elided

scala> val r = "([aeiou]*)?([^aeiou]*)?".r.unanchored
r: scala.util.matching.UnanchoredRegex = ([aeiou]*)?([^aeiou]*)?


scala> val r(vowels, consonants) = "bbbaaa"
vowels: String = ""
consonants: String = bbb

scala> val r(vowels, consonants) = ""
vowels: String = ""
consonants: String = ""
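
One caveat, sketched by calling unapplySeq directly (the object name AnchoringSketch and its two helpers are illustrative only): the unanchored form just needs to find a match somewhere in the input, so for "bbbaaa" the trailing "aaa" is silently left out rather than reported as a failure.

```scala
object AnchoringSketch {
  val anchored = "([aeiou]*)?([^aeiou]*)?".r
  val unanchored = anchored.unanchored

  // Must match the entire input.
  def strict(s: String): Option[List[String]] = anchored.unapplySeq(s)

  // Only needs to find a match somewhere in the input.
  def loose(s: String): Option[List[String]] = unanchored.unapplySeq(s)
}
```

So strict("bbbaaa") is None, while loose("bbbaaa") yields the groups "" and "bbb" and says nothing about the rest of the string.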



The behavior of regextractor is to supply None for optional groups. I think there's a ticket for that; Daniel Sobral wanted it.

scala> import regextractor._
import regextractor._

scala> val r = r"(a*)?(b*)?"
r: regextractor.Regex[(Option[String], Option[String])] = (a*)?(b*)?

scala> val r(as, bs) = "aaabbb"
as: Option[String] = Some(aaa)
bs: Option[String] = Some(bbb)

scala> val r(as, bs) = "bbb"
as: Option[String] = Some()
bs: Option[String] = Some(bbb)

scala> val r = r"(a)?(b*)"
r: regextractor.Regex[(Option[String], String)] = (a)?(b*)

scala> val r(a, bs) = "abb"
a: Option[String] = Some(a)
bs: String = bb

scala> val r(a, bs) = "bb"
a: Option[String] = None
bs: String = bb


Maybe I'll publish regextractor and then maybe SLIP it in. https://github.com/som-snytt/regextractor

som-snytt

Jul 15, 2016, 4:13:36 PM
to scala-user, mp...@meep.pl
The other syntax is:

scala> val r"${a}(a)?${bs}(b*)" = ""

a: Option[String] = None
bs: String = ""

scala> val r"${a}(a)?${bs}(b*)" = "abbb"

a: Option[String] = Some(a)
bs: String = bbb