new motivation for applyIfDefined

395 views
Skip to first unread message

Paul Phillips

unread,
Jan 26, 2011, 2:15:10 PM1/26/11
to martin odersky, scala-i...@googlegroups.com
[Resending with new internals address.]

Here is a failure from the scala test suite I witnessed this morning.

[partest] Log file '/scratch/trunk1/test/files/pos/t3866-pos.log':
[partest] scala.MatchError: /scratch/trunk1/test/files/pos/t1987-pos.obj (of class scala.tools.nsc.io.PlainFile)
[partest] at scala.tools.nsc.util.DirectoryClassPath$$anonfun$packages$2.apply(ClassPath.scala:318)
[partest] at scala.tools.nsc.util.DirectoryClassPath$$anonfun$packages$2.apply(ClassPath.scala:318)
[partest] at scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:306)
[partest] at scala.collection.Iterator$class.foreach(Iterator.scala:646)

The fact that it's a MatchError is both telling and anxiety producing,
because the line throwing it is this:

lazy val packages: List[DirectoryClassPath] = dir collect {
case f if f.isDirectory && validPackage(f.name) => new DirectoryClassPath(f, context)
} toList

At first I thought this was punishment for having a non-atomic
compare-and-then-act setup. But that line cannot produce a match error
if the PF logic is:

if (isDefinedAt(x)) f(x)

And indeed, disassembling the bytecode reveals the logic actually used
for anonymous partial functions. Witness it this way:

scala> def gd(x: Int) = { println("gd(" + x + ")") ; x < 3 }
gd: (x: Int)Boolean

scala> 0 to 5 collect { case x if gd(x) => x }
gd(0)
gd(0)
gd(1)
gd(1)
gd(2)
gd(2)
gd(3)
gd(4)
gd(5)
res0: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 1, 2)

Doesn't happen if I make my own partial function.

scala> val pf = new PartialFunction[Int, Int] { def isDefinedAt(x: Int) = { println("gd(" + x + ")") ; x < 3 } ; def apply(x: Int) = x }
pf: java.lang.Object with PartialFunction[Int,Int] = <function1>

scala> 0 to 5 collect pf
gd(0)
gd(1)
gd(2)
gd(3)
gd(4)
gd(5)
res0: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 1, 2)

Examining the specification shows why this is the case. The partial
function is created with the given cases taken as-is for the apply
method, and isDefinedAt has all the same cases, but the right hand sides
are changed to "true" and a default case is false.

So that means guards will always be double-evaluated on any match. I
would not mind the double evaluation if it were necessary -- the guard
itself should not have side effects. But this approach creates
matcherrors which should be impossible given the intended logic. And
they would be impossible if I hand rolled all my partial functions,
which obviously is not real appealing.

Given only isDefinedAt and apply it is not trivial to fix this. The
guards have to be run to find out if it isDefinedAt, but then on the
call to apply you don't know which case should match if you ignore the
guards. We've been through all this during the recent-ish discussion of
a faster alternative to partialfunction. But I don't remember this
issue coming up: that there is a critical loss of fidelity between the
logic being expressed and the logic being implemented in an anonymous
partial function. My interest perks way up when we're not just talking
about efficiency but correctness.

I can't see a way out that doesn't involve state in the partial function
or the addition of an applyIfDefined style method. I think we should
seriously consider adding such a method to partialfunction immediately,
with a default obvious implementation returning Option[T]. Then we
could have anonymous ones generate an override which only applies the
guard logic once before application.

And then we could at least fix collect. I am filled with dread at the
prospect of having to worry about collect throwing exceptions anytime
it's not working with eternally immutable data.

--
Paul Phillips | Atheists dig the best foxholes.
Everyman |
Empiricist |
slap pi uphill! |----------* http://www.improving.org/paulp/ *----------

Paul Phillips

unread,
Jan 26, 2011, 2:22:44 PM1/26/11
to martin odersky, scala-i...@googlegroups.com
It occurs to me a hackier and repellent way to fix collect, but which we
should definitely do if we can do nothing else, is

// current implementation
def collect[B, That](pf: PartialFunction[A, B])(implicit bf: CanBuildFrom[Repr, B, That]): That = {
val b = bf(repr)
for (x <- this) if (pf.isDefinedAt(x)) b += pf(x)
b.result
}

// paulp's "we're going to hell" implementation
def collect[B, That](pf: PartialFunction[A, B])(implicit bf: CanBuildFrom[Repr, B, That]): That = {
val b = bf(repr)
for (x <- this) if (pf.isDefinedAt(x)) {
try b += pf(x)
catch { case _: MatchError => () }
}

b.result
}

--
Paul Phillips | Giving every man a vote has no more made men wise
Caged Spirit | and free than Christianity has made them good.
Empiricist | -- H. L. Mencken
all hip pupils! |----------* http://www.improving.org/paulp/ *----------

Paul Phillips

unread,
Jan 26, 2011, 2:45:14 PM1/26/11
to martin odersky, scala-i...@googlegroups.com
On Wed, Jan 26, 2011 at 08:21:42PM +0100, martin odersky wrote:
> Hi Paul, yes, it's a good argument. But the spec also says that guards
> are supposed to be side-effect free, and that the compiler is free to
> apply optimizations assuming that. So we have striictly speaking an
> invalid test and not a compiler failure.

Right, strictly speaking. And it's safe to say that few want to speak
quite this strictly.

But that sword cuts both ways: we could memoize the guards. That would
lead to even more surprising (but still strictly speaking correct)
behavior for something like this because you might have memoized the
result of a "does this directory exist" test a week ago. But it would
prevent this problem!

> That said, I also think it would be much better to go the
> applyIfDefined route. I'd have to page the mailoing list discussion
> back into my memory to get a feeling what the tendency was then. I
> remember there was no lack of proposals!

Rex did a bunch of performance tests and one of the more obscure options
came out looking real good. I believe I am summarizing it correctly: an
index of Int => Case is created, and end up with something like:

def whereDefinedAt(x: T): Int = guards indexWhere (gd => gd(x))
def applyAtIndex(x: T, idx: Int) = cases(idx)(x)
def applyIfDefined(x: T) = {
val idx = whereDefinedAt(x)
if (idx < 0) None else Some(applyAtIndex(x, idx))
}

// compat
def isDefinedAt(x: T) = whereDefinedAt(x) >= 0
def apply(x: T) = cases(whereDefinedAt(x))(x)

--
Paul Phillips | Appreciation is a wonderful thing; it makes what is
Stickler | excellent in others belong to us as well.
Empiricist | -- Voltaire
pull his pi pal! |----------* http://www.improving.org/paulp/ *----------

martin odersky

unread,
Jan 26, 2011, 2:59:26 PM1/26/11
to Paul Phillips, scala-i...@googlegroups.com
Resending to googlegroup.


Hi Paul, yes, it's a good argument. But the spec also says that guards are supposed to be side-effect free, and that the compiler is free to apply optimizations assuming that. So we have striictly speaking an invalid test and not a compiler failure.

That said, I also think it would be much better to go the applyIfDefined route. I'd have to page the mailoing list discussion back into my memory to get a feeling what the tendency was then. I remember there was no lack of proposals!

Cheers

 -- Martin

Paul Phillips

unread,
Jan 26, 2011, 4:33:49 PM1/26/11
to martin odersky, scala-i...@googlegroups.com
There doesn't seem to be an exploitable boundary between testing for a
match and running the body.

object Extract {
def unapply(x: Directory) = {
val files = x.files.toList filter (_ hasExtension "scala")
if (files.size >= 2) Some((files.head, files.tail.head))
else None
}
}
def f(xs: List[Directory]) = xs collect {
case Extract(f1, f2) => f1.path + "$" f2.path
}

To avoid a possible match error this has to use the return value of the
unapply when running the body. But where is that going to come from? My
previous characterization of using Ints to select the case body does not
offer any way to recover the pattern bindings.

In other words it's not just the guard, it's the pattern too. I know
these things aren't supposed to be side effecting but even in a side
effecting world if you think you're writing this

if (p(x)) f(x)

and you're getting this

if (p(x)) { if (/* still? */p(x)) f(x) else throw new BREAKEVERYTHING }

you might say "hey..."

The only ways I can see to implement a sequence of cases for a function
like collect without risking match errors by virtue of unapply or guard
re-evaluation involve:

- memoizing pattern+guard results, or some other form of state
- try f(x) catch { case _: UniqueToThisMatchError => () }

Daniel Sobral

unread,
Jan 26, 2011, 4:43:36 PM1/26/11
to scala-i...@googlegroups.com, martin odersky
There was some discussion some months ago about alternatives to the present form of partial functions. Wouldn't one of them present a better solution here?
--
Daniel C. Sobral

I travel to the future all the time.

Paul Phillips

unread,
Jan 26, 2011, 4:54:16 PM1/26/11
to scala-i...@googlegroups.com
On Wed, Jan 26, 2011 at 07:43:36PM -0200, Daniel Sobral wrote:
> There was some discussion some months ago about alternatives to the
> present form of partial functions. Wouldn't one of them present a
> better solution here?

They present a better solution to a different problem. We have to deal
with what exists.

Or if you have a CONCRETE suggestion, I'm all ears.

--
Paul Phillips | Adultery is the application of democracy to love.
Protagonist | -- H. L. Mencken
Empiricist |
pp: i haul pills |----------* http://www.improving.org/paulp/ *----------

Daniel Sobral

unread,
Jan 26, 2011, 5:20:19 PM1/26/11
to scala-i...@googlegroups.com
Well, if the predicate is not idempotent, then all bets are off. But the situation you describe seems even worse. Consider:

def x() = scala.util.Random.nextInt(5)
def p(x: Int) => x != 0
def g(x: Int) => x / 2
x match { case y : (() => Int) if p(y()) => g(y())

If even p is true on both times it is called, there's still the possibility of an exception being thrown at g. So, if "f.isDirectory && validPackage(f.name)" failed the second time around, would "new DirectoryClassPath(f, context)" have been meaningful at all? After all, the condition no longer holds true.

If we used an applyOrElse, applyOption, or any of the similar alternatives, it would still be the case that you could get an exception or a meaningless result. You only eliminated the MatchError, which might not have been worse than silently producing an invalid result.

Ismael Juma

unread,
Jan 26, 2011, 5:29:42 PM1/26/11
to scala-i...@googlegroups.com
On Wed, Jan 26, 2011 at 10:20 PM, Daniel Sobral <dcso...@gmail.com> wrote:
> So, if "f.isDirectory &&
> validPackage(f.name)" failed the second time around, would "new
> DirectoryClassPath(f, context)" have been meaningful at all? After all, the
> condition no longer holds true.
> If we used an applyOrElse, applyOption, or any of the similar alternatives,
> it would still be the case that you could get an exception or a meaningless
> result. You only eliminated the MatchError, which might not have been worse
> than silently producing an invalid result.

Not necessarily. The check could have been an optimisation and
DirectoryClassPath would still be able to handle changes in the
filesystem. If a MatchError is thrown though, it messes everything up.
Not saying that this is the case here, but it's plausible.

Best,
Ismael

Daniel Sobral

unread,
Jan 26, 2011, 5:36:40 PM1/26/11
to scala-i...@googlegroups.com
My point is that it is possible for it to go the other way too. What's the point of being overly concerned about a MatchError if you just eliminate a (small?) subset of the problem?

Paul Phillips

unread,
Jan 26, 2011, 5:38:01 PM1/26/11
to scala-i...@googlegroups.com
On Wed, Jan 26, 2011 at 08:20:19PM -0200, Daniel Sobral wrote:
> If even p is true on both times it is called, there's still the
> possibility of an exception being thrown at g. So, if "f.isDirectory
> && validPackage( f.name)" failed the second time around, would "new
> DirectoryClassPath(f, context)" have been meaningful at all? After
> all, the condition no longer holds true.

This is all irrelevant. What you describe is user error. Nothing can
guarantee that the body of the match will or won't make sense. What CAN
be guaranteed, and what is presently not guaranteed and we know not even
met, is that exceptions will not be thrown which exist solely because of
inadequacies in the implementation.

Even if every call to p initiates a new nuclear war,

if (p(x)) f(x)

should either happen or not happen. There is no room in the land of
sane code for a third choice of "or we throw an exception if p changes
its mind when we double check."

--
Paul Phillips | The most dangerous man to any government is the man who
Analgesic | is able to think things out [...] Almost inevitably he
Empiricist | comes to the conclusion that the government he lives under
pp: i haul pills | is dishonest, insane, intolerable. -- H. L. Mencken

Paul Phillips

unread,
Jan 26, 2011, 5:44:25 PM1/26/11
to scala-i...@googlegroups.com
On Wed, Jan 26, 2011 at 08:36:40PM -0200, Daniel Sobral wrote:
> My point is that it is possible for it to go the other way too. What's
> the point of being overly concerned about a MatchError if you just
> eliminate a (small?) subset of the problem?

You are confusing exceptions which occur in user code with exceptions
which are caused by the implementation. The problem is not "exceptions
exist." The problem is "the implementation throws exceptions which need
not be thrown."

--
Paul Phillips | It's better to have gloved and tossed than never to
Moral Alien | have played baseball.
Empiricist |
i'll ship a pulp |----------* http://www.improving.org/paulp/ *----------

Daniel Sobral

unread,
Jan 26, 2011, 6:46:08 PM1/26/11
to scala-i...@googlegroups.com
On Wed, Jan 26, 2011 at 20:44, Paul Phillips <pa...@improving.org> wrote:
On Wed, Jan 26, 2011 at 08:36:40PM -0200, Daniel Sobral wrote:
> My point is that it is possible for it to go the other way too. What's
> the point of being overly concerned about a MatchError if you just
> eliminate a (small?) subset of the problem?

You are confusing exceptions which occur in user code with exceptions
which are caused by the implementation.  The problem is not "exceptions
exist." The problem is "the implementation throws exceptions which need
not be thrown."

Well, they won't be thrown if the code is not buggy (ie, if the predicate is idempotent). If a predicate returns two different results for the same value in two consecutive calls, then it is a bug in the program, and a MatchError is a valid result. It means: this code is buggy, fix me.


 

--
Paul Phillips      | It's better to have gloved and tossed than never to
Moral Alien        | have played baseball.
Empiricist         |
i'll ship a pulp   |----------* http://www.improving.org/paulp/ *----------

Nathan Bronson

unread,
Jan 26, 2011, 6:28:46 PM1/26/11
to scala-i...@googlegroups.com
Doesn't solve the problem with collect, but at least in this case
flatMap is almost as concise:

lazy val packages = dir.toList.flatMap { f =>
if (f.isDirectory && validPackage(f.name)) Some(new
DirectoryClassPath(f, context)) else None

Burak Emir

unread,
Jan 27, 2011, 5:23:03 PM1/27/11
to scala-i...@googlegroups.com
Hey all,

So the following program produces a PartialFunction...

object test {
  def foo(x: List[String]) = x collect {
    case x if x.startsWith("foo") => x
  }
}

why not simply make the isDefinedAt complete by adding "case _ => false"

The only problem I can see if the match already has a "case _", but in that case, it will never throw a MatchError, right?

best wishes

-- Burak

package <empty> {
  final class test extends java.lang.Object with ScalaObject {
    def this(): object test = {
      test.super.this();
      ()
    };
    def foo(x: List[java.lang.String]): List[java.lang.String] = x.collect[java.lang.String, List[java.lang.String]]({
      final <synthetic> class $anonfun extends java.lang.Object with PartialFunction[java.lang.String,java.lang.String] {
        def this(): anonymous class $anonfun = {
          $anonfun.super.this();
          ()
        };
        final def apply(x0$1: java.lang.String): java.lang.String = (x0$1: java.lang.String @unchecked) match {
          case (x @ _) if x.startsWith("foo") => x
        };
        final def isDefinedAt(x$1: java.lang.String): Boolean = (x$1: java.lang.String @unchecked) match {
          case (x @ _) if x.startsWith("foo") => true
          case _ => false
        }
      };
      (new anonymous class $anonfun(): PartialFunction[java.lang.String,java.lang.String])
    }, immutable.this.List.canBuildFrom[java.lang.String]())

Burak Emir

unread,
Jan 27, 2011, 5:30:40 PM1/27/11
to scala-i...@googlegroups.com
Ha funny, so actually, there is 'case _ => false' ... silly me.

but why do you get a MatchError then?

-- Burak

Ismael Juma

unread,
Jan 27, 2011, 5:41:29 PM1/27/11
to scala-i...@googlegroups.com
On Thu, Jan 27, 2011 at 10:30 PM, Burak Emir <burak...@gmail.com> wrote:
> Ha funny, so actually, there is 'case _ => false' ... silly me.
> but why do you get a MatchError then?

The MatchError happens in the apply call.

Best,
Ismael

Burak Emir

unread,
Jan 27, 2011, 5:48:26 PM1/27/11
to scala-i...@googlegroups.com
Yeah, I figured that much out meanwhile, but why do we even get to the apply?  

Why would doubly evaluating guards cause match errors?

Is this a discussion about allowing guards that will evaluate once to true and once to false?

Ismael Juma

unread,
Jan 27, 2011, 5:53:24 PM1/27/11
to scala-i...@googlegroups.com
On Thu, Jan 27, 2011 at 10:48 PM, Burak Emir <burak...@gmail.com> wrote:
> Yeah, I figured that much out meanwhile, but why do we even get to the
> apply?
> Why would doubly evaluating guards cause match errors?
> Is this a discussion about allowing guards that will evaluate once to true
> and once to false?

Yes. The example given was:

case f if f.isDirectory && validPackage(f.name)

This can change between isDefinedAt and apply if, for example, the
file is renamed or deleted.

Best,
Ismael

Paul Phillips

unread,
Jan 27, 2011, 6:21:01 PM1/27/11
to scala-i...@googlegroups.com
On Thu, Jan 27, 2011 at 11:48:26PM +0100, Burak Emir wrote:
> Why would doubly evaluating guards cause match errors?
>
> Is this a discussion about allowing guards that will evaluate once to
> true and once to false?

As I summarized it earlier, one attempts to write this:

if (p(x)) f(x)

but gets this:

if (p(x)) { if (/* still? */p(x)) f(x) else throw new BREAKEVERYTHING }

Importantly, the code is fine whether p(x) is true or false, so the fact
that the guard changes answers is an academic matter. Patterns are at
liberty to match or not match when guards are being flibbertigibbety,
but not to start hurling runtime errors.

--
Paul Phillips | You think you know when you learn, are more sure
Analgesic | when you can write, even more when you can teach,
Empiricist | but certain when you can program. -- Alan Perlis

Daniel Sobral

unread,
Jan 27, 2011, 6:38:49 PM1/27/11
to scala-i...@googlegroups.com
On Thu, Jan 27, 2011 at 21:21, Paul Phillips <pa...@improving.org> wrote:
As I summarized it earlier, one attempts to write this:

 if (p(x)) f(x)

but gets this:

 if (p(x)) { if (/* still? */p(x)) f(x) else throw new BREAKEVERYTHING }

Importantly, the code is fine whether p(x) is true or false, so the fact
that the guard changes answers is an academic matter.  Patterns are at
liberty to match or not match when guards are being flibbertigibbety,
but not to start hurling runtime errors.

This I disagree with. If the code if fine whether p(x) is true or false, then make it a function, not a match. 

Paul Phillips

unread,
Jan 27, 2011, 6:47:23 PM1/27/11
to scala-i...@googlegroups.com
On Thu, Jan 27, 2011 at 09:38:49PM -0200, Daniel Sobral wrote:
> This I disagree with. If the code if fine whether p(x) is true or
> false, then make it a function, not a match.

Please stop injecting irrelevancies. I know there is a breed of
programmer who thinks everyone who doesn't write code as they say it
ought to be written deserves every bug they get and we should be looking
for ways to hurt them. For those of us who don't take that point of
view, it makes absolutely no difference what someone "should" have done.
The only question that matters is whether this is acceptable behavior.
I say it isn't. If you say it is I'm glad you're only an observer.

--
Paul Phillips | Giving every man a vote has no more made men wise

Imperfectionist | and free than Christianity has made them good.

Empiricist | -- H. L. Mencken

Burak Emir

unread,
Jan 27, 2011, 6:57:40 PM1/27/11
to scala-i...@googlegroups.com
On Fri, Jan 28, 2011 at 12:47 AM, Paul Phillips <pa...@improving.org> wrote:
On Thu, Jan 27, 2011 at 09:38:49PM -0200, Daniel Sobral wrote:
> This I disagree with. If the code if fine whether p(x) is true or
> false, then make it a function, not a match.

Please stop injecting irrelevancies.  I know there is a breed of
programmer who thinks everyone who doesn't write code as they say it
ought to be written deserves every bug they get and we should be looking
for ways to hurt them.  For those of us who don't take that point of
view, it makes absolutely no difference what someone "should" have done.
The only question that matters is whether this is acceptable behavior.
I say it isn't.  If you say it is I'm glad you're only an observer.


I agree with Daniel. 

Like him, I don't think supporting flip-flopping guards is going to pay off very much.
 
Apparently, we are talking about a non-synchronized concurrent program here, and code like that would be considered a bug. Consider any program with a single match statement

 case _ if p(x) => throw new BreakEverythingException() 
 case _ => Thread.sleep(1000); ariane(5).launch()
}

Now, is the implementation "incorrect" if p(x) evaluates to true while we are sleeping?

Re: concurrency, I am somewhat reminded of the "while(condition) { wait() }" idiom [1], which is by no means academic, and it does not quite matter whether we think there should be spurious wakeups or not.

If you want to make an efficiency argument against evaluating guards twice, please go ahead. applyIfDefined could work out nicely and makes total sense for partial functions, IMHO.

The whole discussion about correctness in the presence of concurrent modifications to the world looks like a red herring to me. Please don't get me wrong, but the spec on how to translate pattern matching anonymous functions spells it out crystal clear (pp 122-123), so by your own argument, you, Paul, are the observer throwing irrelevant normative statements and the whole world has code deployed that expects that MatchError to be thrown.

best wishes,
-- Burak

 
--
Paul Phillips      | Giving every man a vote has no more made men wise
Imperfectionist    | and free than Christianity has made them good.
Empiricist         |     -- H. L. Mencken
slap pi uphill!    |----------* http://www.improving.org/paulp/ *----------

Paul Phillips

unread,
Jan 27, 2011, 7:11:03 PM1/27/11
to scala-i...@googlegroups.com
On Fri, Jan 28, 2011 at 12:57:40AM +0100, Burak Emir wrote:
> Apparently, we are talking about a non-synchronized concurrent program
> here

No, we aren't. I wish you would read the existing thread which outlines
the issue in great detail.

--
Paul Phillips | On two occasions, I have been asked, 'Mr. Babbage, if you
Moral Alien | put into the machine wrong figures, will the right answers
Empiricist | come out?' I am not able to rightly apprehend the kind of
pal, i pill push | confusion of ideas that could provoke such a question.

Paul Phillips

unread,
Jan 27, 2011, 7:15:52 PM1/27/11
to scala-i...@googlegroups.com
On Fri, Jan 28, 2011 at 12:57:40AM +0100, Burak Emir wrote:
> {

> case _ if p(x) => throw new BreakEverythingException()
> case _ => Thread.sleep(1000); ariane(5).launch()
> }
>
> Now, is the implementation "incorrect" if p(x) evaluates to true while
> we are sleeping?

Let me show you the actual situation under consideration. Not
concurrent. Sequential access.

{
case _ if p(x) => 1
case _ => 2
}

Is the implementation "incorrect" if this throws a
BreakEverythingException? You are arguing, I hope out of ignorance of
what we are talking about and not actual conviction, that it is not
incorrect. I say it is very incorrect.

--
Paul Phillips | Eschew mastication.
Analgesic |
Empiricist |
pal, i pill push |----------* http://www.improving.org/paulp/ *----------

Paul Phillips

unread,
Jan 27, 2011, 7:20:10 PM1/27/11
to scala-i...@googlegroups.com
The default case confuses the issue because it will be caught in apply.
Let us look more simply at this.

List(1, 2, 3) collect { case x if p(x) => x }

I will never come around to thinking this should throw exceptions.

--
Paul Phillips | The important thing here is that the music is not in
In Theory | the piano. And knowledge and edification is not in the
Empiricist | computer. The computer is simply an instrument whose
all hip pupils! | music is ideas. -- Alan Kay

Burak Emir

unread,
Jan 27, 2011, 8:06:51 PM1/27/11
to scala-i...@googlegroups.com
I don't say that getting a MatchError there is intuitive, even if I don't think it is outrageous. Patterns are still full of surprises : ) but here it is more the translation to PartialFunction that seems to be the problem.

One way to avoid the spec change and still have guards evaluated only once in order to improve efficiency is to introduce a default implementation of applyIfDefined (lift.apply) in PartialFunction, have the compiler generate an override def applyIfDefined in pattern matching anonymous functions, and use that in collect.

I would call this an optimization. It increases code size somewhat, and whether it is generally worth doing, that is another question. 

A high-level optimization of "expr collect { case ... => ... }" would be much much nicer.

Doing "override def applyIfDefined" or the high-level optimization are both approaches that would not protect other uses of if (f.isDefinedAt(x)) f(x) though, but maybe they are good enough? 

I would vote for the high-level optimization. That would have a nice impact on a large class of programs, in addition to preventing obscure errors in programs that deal with obscure situations.

cheers
-- Burak

Josh Suereth

unread,
Jan 27, 2011, 8:14:10 PM1/27/11
to scala-i...@googlegroups.com
I agree that collect should not throw strange exceptions, although I may see an argument for a ConcurrentModificationException or a "ZOMGWORLDEXPLODE" exception wrapper when this situation happens :)

In any case, I think the "paulp is going to hell variant" is a decent stop gap if we come to no other better alternative.  I'll run a performance analysis and see how expensive *not* making a another solution will be.


I will state that the file 'external mutability' is a somewhat known problem (among I/O library designers perhaps) that can wreak havoc on any method/library.  

However, the two-match nature of partial functions is really asking to be optimized.  I don't know of any reason to prefer partial function besides wanting the 'isDefined'/'apply' operation(s).

Nathan Bronson

unread,
Jan 27, 2011, 8:27:24 PM1/27/11
to scala-i...@googlegroups.com
Perhaps it is useful to separate problems that occur to the left and to
the right of the "if"?

The idea of returning the isDefined result as an int is just trying to
capture a continuation of a partially completed apply in a form that is
efficient on the JVM. That continuation is not correct if I write a
non-pure unapply. For example, I could write a non-pure P.unapply and
then

collect { case P(x) if x >= 0 => x }

might return a negative element.

To me, an error in this case seems much less odious than Paul's
original, since custom unapply is more rare than match guards. I don't
think that we should hesitate to fix Paul's case even if the solution
doesn't address this one.

- Nathan

Paul Phillips

unread,
Jan 27, 2011, 8:40:26 PM1/27/11
to scala-i...@googlegroups.com
On Fri, Jan 28, 2011 at 02:06:51AM +0100, Burak Emir wrote:
> I don't say that getting a MatchError there is intuitive, even if I
> don't think it is outrageous. Patterns are still full of surprises : )
> but here it is more the translation to PartialFunction that seems to
> be the problem.

It is without question the translation of anonymous partial functions
that is the problem. I'm sorry to suggest it one last time but I
explained exactly what the problem is in my first couple messages in
this thread.

> I would vote for the high-level optimization. That would have a nice
> impact on a large class of programs, in addition to preventing obscure
> errors in programs that deal with obscure situations.

I do not know if that is meant to imply that the situation under
discussion is obscure. It sounds like it. I don't know what to say
about that other than that you are mistaken. Predicates whose answers
might change: size > x, System.currentTimeMillis < y, etc. are not the
least bit obscure. The method "collect" is not the least bit obscure.
They will not become obscure in combination.

--
Paul Phillips | You think you know when you learn, are more sure

Apatheist | when you can write, even more when you can teach,


Empiricist | but certain when you can program. -- Alan Perlis

Daniel Sobral

unread,
Jan 27, 2011, 8:47:20 PM1/27/11
to scala-i...@googlegroups.com
On Thu, Jan 27, 2011 at 21:47, Paul Phillips <pa...@improving.org> wrote:
On Thu, Jan 27, 2011 at 09:38:49PM -0200, Daniel Sobral wrote:
> This I disagree with. If the code if fine whether p(x) is true or
> false, then make it a function, not a match.

Please stop injecting irrelevancies.  I know there is a breed of
programmer who thinks everyone who doesn't write code as they say it
ought to be written deserves every bug they get and we should be looking
for ways to hurt them.  For those of us who don't take that point of
view, it makes absolutely no difference what someone "should" have done.
The only question that matters is whether this is acceptable behavior.
I say it isn't.  If you say it is I'm glad you're only an observer.

This is not an irrelevancy, it is pretty much the crux of the matter. If I make a guard p(x) protecting an f(x), it is a problem that x changes after checking p(x). Whether p(x) is called again, *that* is irrelevant. It only changes the error.

However, you are putting forward an optimization for this case based solely on a case where p(x) need not stay true for f(x) to be called, *and* you want a partial function instead of a function. I can't see that as anything but a very small set of cases, which could be easily done in other ways. Again, back to your original example:

lazy val packages: List[DirectoryClassPath] = dir collect {
   case f if f.isDirectory && validPackage(f.name) => new DirectoryClassPath(f, context)
 } toList

1. If it doesn't matter that f is no longer a directory and f.name is no longer a valid package, then it isn't necessary to use collect -- you can just use map. Collect is an optimization, but not necessary.
2. It can still be done as dir flatMap { if (...) Some(new DirectoryClassPath(f, context)) else None }.
3. It can still be done as filter/map.

So, as an argument for applyIfDefined, this qualifies as "minor convenience": it is not necessary, and there are simple workarounds.

And, on the subject of laughable claims, this:

> who thinks everyone who doesn't write code as they say it
> ought to be written deserves every bug they get 

I'm not saying that at all. I'm saying THE CODE WAS BUGGY TO BEGIN WITH. If the collect is necessary, then the code is buggy. If the code is not buggy, the collect is not necessary. Partial function's behavior doesn't enter this equation.

Josh Suereth

unread,
Jan 27, 2011, 8:54:11 PM1/27/11
to scala-i...@googlegroups.com
But the contract of collect sort-of-implies that you won't get a MatchError, so it is unexpected.  Even if my p(x) changes, I would not expect a match error unless I also knew that the collect function goes through the match statements twice.   This is the crux of paulp's argument.

The collect function states:  "Builds a new collection by applying a partial function to all elements of this iterable collection on which the function is defined."

This could be interpreted several ways, but I agree with paul in that a match error is unexpected.  Dropping the element seems a more intuitive result.

Paul Phillips

unread,
Jan 27, 2011, 8:54:56 PM1/27/11
to scala-i...@googlegroups.com
On Thu, Jan 27, 2011 at 05:27:24PM -0800, Nathan Bronson wrote:
> Perhaps it is useful to separate problems that occur to the left and
> to the right of the "if"?

You're right that we should, although I'm not sure if I buy your
reasoning. Your example and similar ones are in a qualitatively
different category no matter how you slice it. I know it was just an
example, but an intentionally side effecting, result producing unapply
is way out of bounds in terms of what can be done.

The problem I was talking about yesterday is in something like:

case Foo(x, Bar(y)) if p(x, y) => x+y

But I agree with you, if we took the Int approach the Int can take me to
the matching case. I re-run the pattern match to recover the variables
and walk right past the sleeping guard. Problem solved(*).

(*) Check local regulations.

--
Paul Phillips | If this is raisin, make toast with it.
Future Perfect |
Empiricist |
up hill, pi pals! |----------* http://www.improving.org/paulp/ *----------

Daniel Sobral

unread,
Jan 27, 2011, 9:27:03 PM1/27/11
to scala-i...@googlegroups.com
On Thu, Jan 27, 2011 at 23:54, Josh Suereth <joshua....@gmail.com> wrote:
But the contract of collect sort-of-implies that you won't get a MatchError, so it is unexpected.  Even if my p(x) changes, I would not expect a match error unless I also knew that the collect function goes through the match statements twice.   This is the crux of paulp's argument.

The collect function states:  "Builds a new collection by applying a partial function to all elements of this iterable collection on which the function is defined."

This could be interpreted several ways, but I agree with paul in that a match error is unexpected.  Dropping the element seems a more intuitive result.


I get that. I just think that's the equivalent of doing nothing if I call a method on a null reference. To me, MatchError on collect is as much unexpected as a NullPointerException on a method call, and it is very annoying to get them on code that was supposed to be correct, but they both mean the code isn't correct after all. But having already spent 1 billion dollars on NullPointerException, we got used to it.

I'm all for applyIfDefined for performance reasons alone, but collect should just document it may throw MatchError is the predicate is not idempotent and get on with life.

Michael Bayne

unread,
Jan 28, 2011, 1:45:26 AM1/28/11
to scala-i...@googlegroups.com, Paul Phillips
On Wed, Jan 26, 2011 at 11:59 AM, martin odersky <martin....@epfl.ch> wrote:
> That said, I also think it would be much better to go the applyIfDefined
> route. I'd have to page the mailoing list discussion back into my memory to
> get a feeling what the tendency was then. I remember there was no lack of
> proposals!

The last word on that thread was a reiteration of a suggestion from
Martin, which involved enhancing Function1 thusly:

class Function1[-A, +B] {
def apply(x: A): B
def missingCase(x: A): B = throw new MatchError
...
}

with the suggestion that "This would let you get the effective
functionality of applyOrElse by subclassing Function1."

But I don't see how that's going to help in the case of functions
generated by the compiler from case expressions (i.e. { case x => ...
}). The compiler could synthesize a default case that called
missingCase, but it would also have to generate the implementation of
missingCase, so I don't see where someone can come in with a subclass
and make something useful happen. Probably I'm missing something.

Anyhow, what seems perhaps more fruitful from a "let's not evaluate
guards twice" perspective (both for correctness and performance
reasons), is a small "enhancement" to PartialFunction:

trait PartialFunction[-A, +B] extends (A => B) {
def isDefinedAt(x: A): Boolean

def apply(x :A) = applyOrElse(x, throw new MatchError)

def applyOrElse[A1 <: A, B1 >: B](x: A1, default: A1 => B1): B1 =
mapOrElse(x, identity[B1], default)

def mapOrElse[A1 <: A, B1 >: B, C](x :A1, f: B1 => C, default: A1 => C): C
}

The compiler would generate an implementation of isDefined as well as
mapOrElse. This would not avoid the code duplication, but it would
avoid double execution. Crucially, it would provide a way for nearly
all of the users of PartialFunction to get their job done without
risking mysterious MatchExceptions. For example:

def collect[B, That](pf: PartialFunction[A, B])(implicit bf:
CanBuildFrom[Repr, B, That]): That = {
val b = bf(repr)
for (x <- this) pf.mapOrElse(x, (v: B) => b += v, (_: A) => ())
b.result
}

This being a common use case, it could be tidied up with something
like the following (which is screaming out for a better name than
'mapIf'):

trait PartialFunction[-A, +B] extends (A => B) {
def mapIf[A1 <: A, B1 >: B] (x :A1, f: B1 => Unit) :Unit =
mapOrElse(x, f, (_: A1) => ())
}

which gives us:

def collect[B, That](pf: PartialFunction[A, B])(implicit bf:
CanBuildFrom[Repr, B, That]): That = {
val b = bf(repr)
for (x <- this) pf.mapOrElse(x, b.+=)
b.result
}

There is the unfortunate additional expense of apply calling
applyOrElse calling mapOrElse with identity, but I think the vast
majority of users of PartialFunction will find themselves going
directly to mapOrElse. The various PartialFunction combinators also
avoid this overhead because only the outermost PartialFunction will go
through two extra hops, and then all of the actual evaluation will
take place in a chain of mapOrElse calls. Thus this change might let
Martin have the performance cake he was looking for with
FunctionWithDefault, and eat it too.

See here for the new implementations of orElse, andThen, etc.:
https://gist.github.com/799916

One could then put a nice warning on isDefinedAt saying "Beware the
madness of `if(pf.isDefinedAt(x)) pf(x)`. Use `mapOrElse` or `mapIf`."
Only the rare circumstances like Lift's delayed initialization would
ever need to call isDefinedAt.

This does introduce additional complexity into the compiler, as it
will have to thread the success continuation through the
implementation when generating an implementation of mapOrElse (as
Martin mentioned offhandedly when considering other possibilities for
FunctionWithDefault). But trading a bit of compiler complexity for
increased performance and a decrease in scary edge cases with arguable
correctness properties sounds like a win.

-- m...@samskivert.com

Matthew Pocock

unread,
Jan 28, 2011, 4:14:18 AM1/28/11
to scala-i...@googlegroups.com
My 2c is that a naive user (one who isn't steeped in several years of scala arkainery) would find a match error on collect with a guard very strange, and may well be surprised at this double-evaluation. It seems like this MatchError is exposing implementation details about the underlying magic used to compile cases with guards.This is bad because a) code-bases may come to rely upon foibles of the implementation rather than on the API contract and, b) it adds unnecessarily to the magic that a scala programmer may need to know to be effective in diagnosing failures.

That this implementation also triggers a class of faults where the predicate isn't pure is, to me, no argument that the status quo is desirable. In the longer term, it would be better to have purity checking and to flash up a compile-time warning if the code can not be proven to be pure. For now, any solution that avoids the double-evaluation, so hides the implementation better, is acceptable.

Matthew

Ruediger Keller

unread,
Jan 28, 2011, 7:16:40 AM1/28/11
to scala-i...@googlegroups.com
I'm surprised that there is so much discussion on this topic, as I had
the impression that it was already decided by Martin what the way to
go is. I'm referring to his mail from 2010-11-02 where he suggests the
following:

************
Define Function1 like this:

class Function1[-A, +B] {
def apply(x: A): B
def missingCase(x: A): B = throw new MatchError
...
}

2. Change the rules so that _any_ subclass of Function can be
implemented by a closure { x => ... }, as long as the class contains a
single abstract method named `apply'.

3. Change the rules so that in a pattern matching expression

x match { case P1 => ... case Pn => ... }

that was the body of a function, a selector that was not matched by
any of the patterns invoked missingCase(x) instead of immediately
throwing a MatchError(x).
************

Although, regarding point 2, I like the idea that not only any
subclass of function, but any SAM-type can be implemented with a
closure. I think this was brought up by Mark Harrah.

Regards,
Ruediger


2011/1/28 Michael Bayne <m...@samskivert.com>:

martin odersky

unread,
Jan 28, 2011, 7:46:24 AM1/28/11
to scala-i...@googlegroups.com
On Fri, Jan 28, 2011 at 1:16 PM, Ruediger Keller <ruedige...@rk42.de> wrote:
I'm surprised that there is so much discussion on this topic, as I had
the impression that it was already decided by Martin what the way to
go is. I'm referring to his mail from 2010-11-02 where he suggests the
following:

************
Define Function1 like this:

class Function1[-A, +B] {
 def apply(x: A): B
 def missingCase(x: A): B = throw new MatchError
 ...
}

2. Change the rules so that _any_ subclass of Function can be
implemented by a closure { x => ... }, as long as the class contains a
single abstract method named `apply'.

3. Change the rules so that in a pattern matching expression

 x match { case P1 => ... case Pn => ... }

that was the body of a function, a selector that was not matched by
any of the patterns invoked missingCase(x) instead of immediately
throwing a MatchError(x).
************

Yes, I still think that's the way to go. Thanks for digging it up!

 -- Martin

Josh Suereth

unread,
Jan 28, 2011, 7:57:55 AM1/28/11
to scala-i...@googlegroups.com
I don't get how this fixes the collect case, as you would have to dynamically sub-class the partial function to write your own fixingCase that did something different.  I'm not seeing how this would be translated into a use-case like collect.

Seth Tisue

unread,
Jan 28, 2011, 8:05:54 AM1/28/11
to scala-i...@googlegroups.com
>>>>> "Matthew" == Matthew Pocock <turingate...@gmail.com> writes:

Matthew> For now, any solution that avoids the double-evaluation, so
Matthew> hides the implementation better, is acceptable.

Agree.

Also, I don't think anyone has mentioned this yet: what if the predicate
is expensive? Then the double evaluation is a performance bug -- even
if the predicate is pure. This isn't merely about predicates that
change their minds.

I find it rather bizarre that some are actually arguing that the double
evaluation here is just fine. Evaluating user code twice for no
defensible reason at all is a bug, plain and simple.

--
Seth Tisue @ Northwestern University | http://tisue.net
lead developer, NetLogo: http://ccl.northwestern.edu/netlogo/

martin odersky

unread,
Jan 28, 2011, 8:11:06 AM1/28/11
to scala-i...@googlegroups.com
On Fri, Jan 28, 2011 at 1:57 PM, Josh Suereth <joshua....@gmail.com> wrote:
I don't get how this fixes the collect case, as you would have to dynamically sub-class the partial function to write your own fixingCase that did something different.  I'm not seeing how this would be translated into a use-case like collect.

The collect method could take a subclass of Function1 that defines missingCase.

Cheers

 -- Martin

Josh Suereth

unread,
Jan 28, 2011, 8:17:06 AM1/28/11
to scala-i...@googlegroups.com
Yes, but you're pushing the issue onto users of partial functions.  I'm not sure that's intuitive, and we'd still get back to the same issue for those who do not define missing case (say, an anonymous partial function).

To allow library writers to write code that allows users to write anonymous partial functions:  I think we need some way to partially evaluate the partial function and continue if something is defined.

I think a simple applyOrNone(x : A) : Option[B] would be alright for this case, but again would lead to more code bloat for all anonymous partial functions.

martin odersky

unread,
Jan 28, 2011, 8:32:07 AM1/28/11
to scala-i...@googlegroups.com
On Fri, Jan 28, 2011 at 2:17 PM, Josh Suereth <joshua....@gmail.com> wrote:
Yes, but you're pushing the issue onto users of partial functions.  I'm not sure that's intuitive, and we'd still get back to the same issue for those who do not define missing case (say, an anonymous partial function).

Not at all. The collect method needs to take as argument a subclass of Function1 which defines missingCase. The user just writes a closure, as always.

Cheers

 -- Martin

martin odersky

unread,
Jan 28, 2011, 8:33:18 AM1/28/11
to scala-i...@googlegroups.com
On Fri, Jan 28, 2011 at 2:32 PM, martin odersky <martin....@epfl.ch> wrote:


On Fri, Jan 28, 2011 at 2:17 PM, Josh Suereth <joshua....@gmail.com> wrote:
Yes, but you're pushing the issue onto users of partial functions.  I'm not sure that's intuitive, and we'd still get back to the same issue for those who do not define missing case (say, an anonymous partial function).

Not at all. The collect method needs to take as argument a
 
I should have written "needs to take as formal parameter".

Josh Suereth

unread,
Jan 28, 2011, 8:47:12 AM1/28/11
to scala-i...@googlegroups.com
Yes... I see how you can extend any function class using a closure.

However, what type should the function be?

Currently collect takes a map A => B where the collection has an element of type A.

To utilize the hook correctly, collect would create it's own subclass of function that deifnes missingCase with.... what?

It seems the function would have to be A => Option[B], or collect would have missingCase throw a stored exception (for speed reasons) and continue on...

I'm not sure either of this solutions is a huge win.  However I do *love* the idea of closures being able to automatically create anonymous subclasses of any single-abstract-method class.  Having this just apply to subclass of function still seems a huge win, but I don't see a large gain in the problem we're looking at.

martin odersky

unread,
Jan 28, 2011, 9:00:25 AM1/28/11
to scala-i...@googlegroups.com
On Fri, Jan 28, 2011 at 2:47 PM, Josh Suereth <joshua....@gmail.com> wrote:
Yes... I see how you can extend any function class using a closure.

However, what type should the function be?

Currently collect takes a map A => B where the collection has an element of type A.

To utilize the hook correctly, collect would create it's own subclass of function that deifnes missingCase with.... what?

It seems the function would have to be A => Option[B], or collect would have missingCase throw a stored exception (for speed reasons) and continue on...

Yes, stored exception seems to be our only choice if we want to keep the same interface. If collect takes a map A => Option[B],
the whole issue is moot of course, then collect is just an alias for flatMap.

 -- Martin

Paul Phillips

unread,
Jan 28, 2011, 9:56:28 AM1/28/11
to scala-i...@googlegroups.com
On Fri, Jan 28, 2011 at 03:00:25PM +0100, martin odersky wrote:
> Yes, stored exception seems to be our only choice if we want to keep
> the same interface. If collect takes a map A => Option[B], the whole
> issue is moot of course, then collect is just an alias for flatMap.

It occurs to me we could synthesize an override for lift which amounted
to applyIfDefined. Then collect can take the same argument and call
lift in the implementation.

Couldn't sleep thinking about all these fascinating problems though, so
I'm not advocating anything with the above as I don't know if it's
crazier than usual. I DO know that I too am keen on this idea that one
could implement any SAM-style interface with case syntax.

Also, since the names aren't set: if we end up with a special
"missingCase" type method and also end up with a "missingMethod" method
(which is not called that) in the Dynamic trait, then timingwise we have
lucked into a chance to find consistent naming. I know they're not in
the same category but they have relevant similarities.

--
Paul Phillips | Adultery is the application of democracy to love.
Imperfectionist | -- H. L. Mencken
Empiricist |

Ismael Juma

unread,
Jan 28, 2011, 9:58:14 AM1/28/11
to scala-i...@googlegroups.com
On Fri, Jan 28, 2011 at 2:56 PM, Paul Phillips <pa...@improving.org> wrote:
> I DO know that I too am keen on this idea that one
> could implement any SAM-style interface with case syntax.

Yes, this could be very cool.

Best,
Ismael

Michael Bayne

unread,
Jan 31, 2011, 12:08:03 PM1/31/11
to scala-i...@googlegroups.com
> Yes, I still think that's the way to go. Thanks for digging it up!

I did some profiling of using a missingCase function to throw a
(preallocated) exception to implement PartialFunction-like behavior
with collect() as a client. I compared it against the existing
PartialFunction implementation, as well as an implementation which
uses an extra level of indirection but avoids the exception.

Unfortunately, the exception-based approach was rather unpredictable
with regard to whether Hotspot optimized it. When it failed to do so,
there was a 5x-10x slowdown. The explicit indirection approach was
faster than the existing PartialFunction approach (because it avoids
double evaluation of the guards), but only if one was very careful to
avoid creating a new closure on every function application (or if the
guards are very expensive).

The full analysis is here:

https://github.com/samskivert/scala-pf-rethink

Feel free to fork it and play around with it, and if there are glaring
errors in my method, I'd love to hear about them.

-- m...@samskivert.com

Daniel Sobral

unread,
Feb 3, 2011, 7:35:26 PM2/3/11
to scala-i...@googlegroups.com
I just noticed that this proposal will make implicit implicits possible at last! I wonder what impact it would have on the STM library/API?

杨博

unread,
Jul 9, 2015, 1:42:41 AM7/9/15
to scala-i...@googlegroups.com, martin....@epfl.ch, pa...@improving.org
I think we can add a conversion helper between PartialFunction and extractor, https://groups.google.com/d/msg/scala-language/g0-hbN5qerQ/iBcjSB3UruwJ

在 2011年1月27日星期四 UTC+8上午5:33:49,Paul Phillips写道:
There doesn't seem to be an exploitable boundary between testing for a
match and running the body.

  object Extract {
    def unapply(x: Directory) = {
      val files = x.files.toList filter (_ hasExtension "scala")
      if (files.size >= 2) Some((files.head, files.tail.head))
      else None
    }
  }
  def f(xs: List[Directory]) = xs collect {
    case Extract(f1, f2) => f1.path + "$" f2.path
  }

To avoid a possible match error this has to use the return value of the
unapply when running the body.  But where is that going to come from? My
previous characterization of using Ints to select the case body does not
offer any way to recover the pattern bindings.

In other words it's not just the guard, it's the pattern too.  I know
these things aren't supposed to be side effecting but even in a side
effecting world if you think you're writing this

  if (p(x)) f(x)

and you're getting this

  if (p(x)) { if (/* still? */p(x)) f(x) else throw new BREAKEVERYTHING }

you might say "hey..."

The only ways I can see to implement a sequence of cases for a function
like collect without risking match errors by virtue of unapply or guard
re-evaluation involve:

 - memoizing pattern+guard results, or some other form of state
 - try f(x) catch { case _: UniqueToThisMatchError => () }

--
Paul Phillips      | Giving every man a vote has no more made men wise
Caged Spirit       | and free than Christianity has made them good.
Empiricist         |     -- H. L. Mencken
all hip pupils!    |----------* http://www.improving.org/paulp/ *----------

Alec Zorab

unread,
Jul 9, 2015, 4:13:46 AM7/9/15
to scala-i...@googlegroups.com, martin....@epfl.ch, pa...@improving.org
As mentioned on scala-language, I think this might be covered by PartialFunction.applyOrElse

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-interna...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages