| AnyRefMap | martin | 30/01/14 05:32 | I was just trying out AnyRefMap. It's really nice to have a drop-in replacement for HashMap that avoids the "##" overhead that slows down the original HashMap. But I was frustrated that it's not a drop-in replacement. In particular I had several places where I inherited HashMap from an anonymous class or an object. Sometimes I needed to override the default value, at other times I wanted to add new behavior to the map. This works with HashMap (or any other map implementation), but not with AnyRefMap, because AnyRefMap is final.
I think it would be great if we could eliminate this incoherence before 2.11RC1. What do others think? Cheers - Martin |
| Re: [scala-internals] AnyRefMap | Jason Zaugg | 30/01/14 05:54 | On Thu, Jan 30, 2014 at 2:32 PM, martin odersky <martin...@epfl.ch> wrote: I've added my thoughts to the ticket you created: https://issues.scala-lang.org/browse/SI-8214.
Rex will know the tradeoffs best, though. -jason |
| Re: AnyRefMap | Simon Ochsenreither | 30/01/14 09:25 |
I'd consider any non-final collection class a severe bug. Concerning inheriting from HashMap to override the default value ... what's wrong with using withDefault/withDefaultValue? |
| Re: [scala-internals] AnyRefMap | Rex Kerr | 30/01/14 10:53 | I agree that it's nice to have perfect drop-in replacements for things, but it's much harder to fashion a replacement that is a drop-in replacement for inheritance as well as for usage (composition). The internals of HashMap and AnyRefMap are different, and although I have mimicked them as much as I could while retaining good performance, I chose not to allow inheritance from AnyRefMap in part _because_ I was worried that people would treat it as too much of a drop-in replacement for an altered HashMap. In particular, some speedups come from code duplication (so you need to override in multiple places in AnyRefMap to change certain behavior); in some cases the call chain has reversed (e.g. which method do you need to override in order to add custom behavior to lookups?). Overall, it seemed that there was no safe way to inherit broadly from AnyRefMap, and it was a lot easier to make the whole class final than figure out which methods were dangerous and which were safe. (And don't forget that any code that assumes inheritance from HashTable is going to break also.)I do agree, however, that consistency is a considerable virtue. "AnyRefMap's implementation makes inheritance risky. If you wish to set a default value, use the defaultEntry constructor parameter." -- |
| Re: [scala-internals] AnyRefMap | Simon Ochsenreither | 30/01/14 12:54 | I agree on this. We need to clearly restrict the warranty of the collection API to usage as much as possible. If we let people inherit, they will automatically depend on arbitrary implementation details, which were never meant to be exposed. The collection library still needs tons of work, fixes and improvements all over the place. If we can't guide people to the supported usage which allows us to further evolve the API and the implementation (by making specific collection classes final), the only thing that's left is declaring the collection API as "done, finished, immutable, deprecated and bugs-are-features". If this is supposed to be the case, then let's start thinking about a better design for scala.collection2 as soon as possible. |
| Re: [scala-internals] Re: AnyRefMap | martin | 30/01/14 14:03 | On Thu, Jan 30, 2014 at 6:25 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote: Why? Java's collections are non-final. What's wrong with that?
An extra indirection. When I use AnyRefMap I do it to get the best performance. withDefault does not give me that. - Martin |
| Re: [scala-internals] Re: AnyRefMap | Rex Kerr | 30/01/14 14:08 | In the case of AnyRefMap, this is irrelevant because you can supply a default mapping function in the constructor. (I agree with the general point that surplus wrappers are bad for performance.) --Rex |
| Re: [scala-internals] AnyRefMap | martin | 30/01/14 14:09 | On Thu, Jan 30, 2014 at 7:53 PM, Rex Kerr <ich...@gmail.com> wrote: Can you give outline which methods would not be safe to override? Do you expect that people would have a tendency to override anyone of these?
That's not a problem for typical use cases of mutable collections. You typically do not transform them, just use them in the standard CRUD way.
Well, I would say, that class then not be done in the Scala collection way. Overriding default is documented standard behavior. I do not see why we should break away now. And I particularly do not see why this should be for one, highly specialized class only. You just create an inconsistency without any gain that I can see.
Cheers - Martin
|
| Re: [scala-internals] Re: AnyRefMap | martin | 30/01/14 14:13 | On Thu, Jan 30, 2014 at 11:08 PM, Rex Kerr <ich...@gmail.com> wrote: Yes, but I am afraid that just makes everything worse. It introduces a new (and highly unintuitive!) way to achieve what is done with a different idiom elsewhere. That's precisely the antithesis of the uniformity of usage that we saw in Scala's collections so far.
Why unintuitive? Answer quickly: What is the first things that comes to mind as an argument for a collection constructor? I would wager it's definitely not the behavior in case of a key not found.
Cheers - Martin
|
| Re: [scala-internals] Re: AnyRefMap | Simon Ochsenreither | 30/01/14 14:31 |
I'm not sure if this is a joke. Can you clarify? Anyway, why should decisions made by people who built a different language 20 years ago have any impact on us?
Then it is probably the best to fix that indirection and move on. |
| Re: [scala-internals] Re: AnyRefMap | Simon Ochsenreither | 30/01/14 14:35 |
I don't think this is a big deal at all. Remove that constructor and use AnyRefMap.withDefault instead. If Rex agrees, I'll create a patch for that. |
| Re: [scala-internals] Re: AnyRefMap | martin | 30/01/14 14:35 | On Thu, Jan 30, 2014 at 11:31 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote: Java's collection design is pretty reasonable for its age and intended use case. It clearly was very successful in terms of usage. No need for 3 different collection libraries (analogous to Date) because the first one was irredeemably broken. I simply do not see a reason why the very common idiom of extending a mutable class should be no longer valid. The proof obligation is really on the people who want to remove the feature here.
Cheers - Martin
|
| Re: [scala-internals] Re: AnyRefMap | martin | 30/01/14 14:44 | We can go back and forth on the use of inheriting from collection classes and how best to treat defaults. We clearly have different opinions here. My main argument is that the biggest virtue of the Scala collections library is its consistency. That consistency took a lot of energy and determination to achieve. That's why I am so allergic to the proposition of throwing consistency out of the window for the sake of a single use case, in particular because I still have not heard a convincing argument why it needs to be so.
Cheers - Martin
|
| Re: [scala-internals] Re: AnyRefMap | Adriaan Moors | 30/01/14 14:50 | Making the leaf collection classes final will only make them more consistent, because we'll be fully in control of them, rather than having to anticipate all possible subclasses. |
| Re: [scala-internals] AnyRefMap | Rex Kerr | 30/01/14 14:53 | On Thu, Jan 30, 2014 at 2:09 PM, martin odersky <martin...@epfl.ch> wrote: Any method where the call-chain isn't obvious is unsafe to override. For example, HashMap has -= overridden 7 times and += 8 times in the library scan I did. Update was overridden 0 times. put was overridden 0 times. Now, does += call update, or does update call +=, or neither call the other, or what? In AnyRefMap, += calls update, but put has its own implementation. In HashMap, update calls put, but += has its own implementation. How is anyone supposed to keep this straight? So in the wild we have 8 implementations of maps that are probably inconsistent because people didn't override all the necessary methods (and how could they have known?). In my code, it's not safe to override much of anything that I implemented. There's a fair bit of code duplication because removing the redundant logic manually provides better results than hoping the JIT compiler can do it. In the Java HashMap code, it's safe to override things because there is only one way to do one thing in almost every case.
Maybe I'm atypical, but I do transform mutable collections. Isn't allowing that the reason why they're in the same hierarchy as immutable collections? I know (from bitter experience) that trying to extend anything you're going to transform is a losing battle, but sometimes that means I give up on the extension battle so I can transform. (Sometimes I give up transformations so I can extend.) If you use composition instead of inheritance for modifications, though, your extensions can survive reasonable transformations.
There is a gain, but it may well not be enough gain to warrant the inconsistency. I grant that maintaining consistency is very important when possible. For the record, I'm perfectly happy to go back to overriding and to mark every dangerous method as final (which is most of them). --Rex |
| Re: [scala-internals] Re: AnyRefMap | Simon Ochsenreither | 30/01/14 14:56 |
That's my point too. It would be terribly inconsistent if we made one collection non-final, while all the others are going to be final in the near future. |
| Re: [scala-internals] Re: AnyRefMap | rklaehn | 30/01/14 14:57 | There are a lot of methods in scala collections, and it is easy to introduce inconsistencies by overriding some methods and not others. That is why advice to java programmers these days is "Design and document for inheritance or else prohibit it". (Joshua Bloch, Effective Java). This is basically gospel in java land these days. One of the few things that C# got right was to not make virtual the default for methods. But how would you design and document for inheritance with scala collections? You could make every method final that is difficult to override because it has to be consistent with other methods. That would be basically all of them except for default in case of AnyRefMap. And then you have to make sure to also do final overrides in case new methods are introduced in one of the traits you inherit from. |
| Re: [scala-internals] Re: AnyRefMap | martin | 30/01/14 15:03 | Here is more breakage: new AnyRefMap[String, String] with SynchronizedMap[String, String] <console>:11: error: illegal inheritance from final class AnyRefMap
new AnyRefMap[String, String] with SynchronizedMap[String, String] So, we can't have a synchronized AnyRefMap? Regarding Adriaan's argument of consistency, I checked all subclasses of mutable.Map. All but one of them are non-final. The one which is final is TrieMap, another late addition to the collections, and if I believe its finality is a big mistake for the same reason I am speaking up now. If I had remarked this when it was introduced I would have raised my objections as well.
I know the arguments of composition over inheritance ad nauseam. But consistency trumps everything. - Martin Prof. Martin Odersky
LAMP/IC, EPFL Station 14, 1015 Lausanne, Switzerland Tel: +41 21 693 6863 |
| Re: [scala-internals] Re: AnyRefMap | Eiríkr Åsheim | 30/01/14 15:03 | I don't have a lot to add here, but I would like to echo Rex that letting users subclass and AnyRefMap is likely to introduce bugs and will probably also compromise performance by making monomorphic method calls bimorphic or megamorphic.
I am skeptical that inheriting the class and avoiding all the possible bugs will produce a net performance win. |
| Re: [scala-internals] Re: AnyRefMap | rklaehn | 30/01/14 15:11 | SynchronizedMap is a terrible idea. It creates the illusion of thread-safety while leaving the user open to difficult to find bugs. So I would count not being able to use SynchronizedMap as an advantage of making AnyRefMap final. There are dozens of methods in SynchronizedMap that are not overridden. So if you use one of them you will not be synchronized and will see the collection in an inconsistent state.
|
| Re: [scala-internals] Re: AnyRefMap | √iktor Klang | 30/01/14 15:11 | In my opinion, SynchronizedX should be deprecated, it's the wrong solution to the wrong problem. |
| Re: [scala-internals] AnyRefMap | martin | 30/01/14 15:13 | On Thu, Jan 30, 2014 at 11:53 PM, Rex Kerr <ich...@gmail.com> wrote: These are potential problems every Map implementation has. How many observed problems do we have in the JIRA database?
I don't believe that's 100% true. Does remove call get or put in the Java implementations? I do not want to argue that an inheritance based design is inevitable. Maybe a type-class based design or something else is better. We'd have to have one to experiment with to be able to tell for sure. What I do very strongly believe is that a design cannot and should not be gradually undermined and riddled with inconsistencies because we do not agree with its basic premises. Throw it out completely and replace it with a new one. Or keep to it. That should be our only two options.
Cheers - Martin
|
| Re: [scala-internals] Re: AnyRefMap | Simon Ochsenreither | 30/01/14 15:14 |
If a deprecated, bit-rotted trait is the best use-case for keeping classes non-final, I'm delighted about the current approach of making everything final. |
| Re: [scala-internals] AnyRefMap | Erik Osheim | 30/01/14 15:16 | On Fri, Jan 31, 2014 at 12:13:38AM +0100, martin odersky wrote:It makes sense that we should be giving users a consistent message about this stuff. So my vote goes to throwing it out (e.g. deprecating it). Show of hands? -- Erik |
| Re: [scala-internals] Re: AnyRefMap | Rex Kerr | 30/01/14 15:16 | SynchronizedMap is deprecated because adding synchronization via a trait is a hopeless endeavor. Invariably, something like this will happen without anyone noticing: class MyMap extends YourMap {} } ... I wish there was a good way around this. I can't think of one. If anyone can, I'm happy to implement it. |
| Re: [scala-internals] Re: AnyRefMap | Simon Ochsenreither | 30/01/14 15:17 |
It is already deprecated. |
| Re: [scala-internals] Re: AnyRefMap | martin | 30/01/14 15:28 | I was not aware that SynchronizedMap is deprecated in 2.11. I am not sure again we need to do this. This means that any code that mixes in a SynchronizedMap in a custom map is invalidated. Pretty tough, because I do not see a possible replacement for such code. Here's an example, from the compiler itself:
val unitOfFile = new LinkedHashMap[AbstractFile, RichCompilationUnit] with SynchronizedMap[AbstractFile, RichCompilationUnit] We _do_ need a LinkedHashMap there, but unfortunately ConcurrentHashMap is not ordered. What to do?
Can we not fix SynchronizedMap instead? Sure, it's painful to keep track of all the methods, but with a bit of inventiveness these things can be automated. Cheers - Martin On Fri, Jan 31, 2014 at 12:14 AM, Simon Ochsenreither <simon.och...@gmail.com> wrote:
|
| Re: [scala-internals] AnyRefMap | rklaehn | 30/01/14 15:28 | On Fri, Jan 31, 2014 at 12:13 AM, martin odersky <martin...@epfl.ch> wrote:
If a user overrides e.g. AnyRefMap in a wrong way and ends up with a Map with inconsistent behavior, it won't show up in the scala JIRA database. It will just be a puzzled user that can't figure out why his code doesn't work.
I think using inheritance within the scala collections is fine. People contributing to scala know the pitfalls. But when using the scala collections from the outside people should not inherit from "leaf" collections. Inheriting from traits like Map/MapLike that are designed for inheritance is fine with some guidance in the docs.
|
| Re: [scala-internals] AnyRefMap | Adriaan Moors | 30/01/14 15:29 | On Thu, Jan 30, 2014 at 3:13 PM, martin odersky <martin...@epfl.ch> wrote: riddled with inconsistencies Maybe we're talking about a different kind of consistency, but I don't see how making all leaf classes final is inconsistent. It also fosters another kind of consistency: the internal behavior of the methods defined in these classes. If you want to make your own collection, implement one of the interface traits, possibly with an underlying existing collection.
|
| Re: [scala-internals] AnyRefMap | Rex Kerr | 30/01/14 15:32 | On Thu, Jan 30, 2014 at 3:13 PM, martin odersky <martin...@epfl.ch> wrote:
This isn't the right metric. The question to ask is: how many observed problems _are there in the bug databases of people who extended maps_? I don't know. As I said above, 8 out of 8 implementations that changed += did not also change put. (They did mostly change -=.) So this argues that, first, people do like to override collections; but, second, they don't have adequate tools to do it correctly. Maybe _they_ know never to use "put" and always use "+=", but if I then go use that library, how am I to know? It's just a deep source of problematic bugs waiting to happen. I'm willing to allow some level of bugs in exchange for nice capabilities, or in exchange for nice features. But it's a tradeoff, and some implementations--and AnyRefMap is one of them--have too many risks for it to be worth it. I am happy to wall off all the danger areas that I have created, though. If we had some way to express to the compiler what kind of rules should apply to overriding, then I wouldn't be worried. But we don't presently.
No, but it's strange indeed to rely upon remove to do your lookups (since you then have to put them back). This is what I mean by "almost every case". There is an obvious use for remove: remove elements. And an obvious use for put: add elements. And an obvious use for get: look up elements. If you use the non-obvious features, you may end up in trouble. In Scala's case, it's the obvious use of the obvious features which are in danger of getting out of sync. Update or += or put--three ways to do the same thing--had better agree.
I agree with those as the options, and I also think that nothing has proven itself better than inheritance for forming the core hierarchy for code reuse. So I wouldn't suggest throwing it out; I haven't the confidence that a replacement would be superior. But I also think that finality is there for a reason--it's to protect you when implementations really aren't safe to muck with. And very often, by the time you get to the leaves of the collections hierarchy, there is a lot that requires deep knowledge to muddle with safely, and thus probably shouldn't be muddled with at all. Anyway, maybe the resolution should be that I un-finalize AnyRefMap but make a lot of the methods final. SynchronizedMap-type stuff won't work, but overriding default will. (Nobody relies upon filter preserving defaults, so losing that might actually be an advantage consistency-wise.) --Rex |
| Re: [scala-internals] AnyRefMap | Jason Zaugg | 30/01/14 15:37 | It would be better a wrapper like in java. Then it isn't susceptible to linearisation gotchas. I looked at those a while back and found the impl to be a stark reminder of how much our mixins leave unprotected, for example toString, equals, serialisation. But in your example I would tend to write a wrapper manually that encapsulates the mutable unsynchronised map and offers a smaller API. It could contain a method to return an immutable snapshot if clients need rich query operations
Jason
|
| Re: [scala-internals] Re: AnyRefMap | Adriaan Moors | 30/01/14 15:37 | It was based on this discussion: https://groups.google.com/d/msg/scala-internals/1gCbqTRCxAQ/7fJS6YsN50wJ Assuming we need a good synchronized collection, while also agreeing "synchronization via mixin override trait is unnecessary",
the logical conclusion (to me) is deprecating the approach we don't like so that we can replace it by a better one. We are making users go WTF all over the place with things that are broken like this (see also: delayedinit). I say we should at the very least warn them, even if we don't have a better alternative. (The Java concurrent collections come to mind as an alternative.)
|
| Re: [scala-internals] AnyRefMap | martin | 30/01/14 15:39 | On Fri, Jan 31, 2014 at 12:29 AM, Adriaan Moors <adr...@typesafe.com> wrote: But we are not making all the leaf collections final. That would be a different collection library. We should have a discussion whether/when we can do do that. My guess is that we should not attempt any of this in the 2.x timeframe because it would just undermine the design the underlies the current collections. The current collections are _designed_ so that you can override the default method (why else have it?), so that you can mix in traits like Synchronized (OK, maybe Synchronized is a bad idea but we have to come up with a valid (composition-based?) replacement) and so on. We can discuss about a big jump that undoes all this design and suffers the corresponding breakage in user code. But doing this gradually, class after class strikes me as death by a 1000 needles.
Cheers - Martin
Prof. Martin Odersky
LAMP/IC, EPFL Station 14, 1015 Lausanne, Switzerland Tel: +41 21 693 6863 |
| Re: [scala-internals] Re: AnyRefMap | Rex Kerr | 30/01/14 15:41 | As a replacement, I would suggest something like class Synchronized[A <: AnyRef](val repr: A) extends AnyVal { def apply[B](f: A => B): B = repr.synchronized{ f(repr) } } You do have to replace all uses of x.stuff with x{_.stuff}, but otherwise it's pretty painless and wraps everything without possibility for error. Well, without any more possibility for error than you normally have when synchronizing. --Rex |
| Re: [scala-internals] AnyRefMap | martin | 30/01/14 15:43 | On Fri, Jan 31, 2014 at 12:37 AM, Jason Zaugg <jza...@gmail.com> wrote: Yes, that's a good point.
I am not sure we can demand from users to write these wrappers that were so far created for them in the library. How about we throw together a SynchronizedWrapper for Map, then we can safely deprecate the old class?
Cheers - Martin
|
| Re: [scala-internals] AnyRefMap | Jason Zaugg | 30/01/14 15:45 | This misses the use case of passing the collection to existing code. Jason
|
| Re: [scala-internals] Re: AnyRefMap | martin | 30/01/14 15:45 | On Fri, Jan 31, 2014 at 12:37 AM, Adriaan Moors <adr...@typesafe.com> wrote: Agreed. I am just arguing we should do both things at the same time, that would avoid some unnecessary migration grief. Thanks all for the animated discussion! I have to drop out now and catch some sleep.
- Martin
|
| Re: [scala-internals] AnyRefMap | Rex Kerr | 30/01/14 15:50 | Just to be clear: the immutable collection leaf classes are already impossible to sensibly override. We don't do it, anyone who tries is liable to break everything, and in practice nobody does (from a survey of about a half gig of bytecode from various projects selected...well...I'm not exactly sure how Adriaan picked them). Overriding most of them them is deprecated now, if they weren't already final. I doubt anyone will notice. The mutable classes can be overridden, and people do, but there is constant tension with the builder capabilities, which throw away your overriding.This is the great problem with the builder approach. You can't just say and be okay. You certainly can't do it anonymously and be okay. If we had enough syntactic sugar to generate the entire framework for a new collections hierarchy member in a single line, then the argument that everything at least in the mutable hierarchy should be overridable would be much stronger. As it is, I think we should take an intermediate approach: make final or deprecate overriding from those things which are really hard to get right, while maintaining the flexibility to override mutable collections whose implementations make this less like walking in a minefield. --Rex |
| Re: [scala-internals] AnyRefMap | Rex Kerr | 30/01/14 15:52 | If it is important to pass the collection to existing code, agreed. It's not a complete solution. --RexFrom what I saw in the compiler, at least, that wasn't a dominant use case. But I didn't look around very much.
|
| Re: [scala-internals] Re: AnyRefMap | √iktor Klang | 30/01/14 16:25 | Great to see that it happened. On Fri, Jan 31, 2014 at 12:17 AM, Simon Ochsenreither <simon.och...@gmail.com> wrote:
|
| Re: [scala-internals] Re: AnyRefMap | Matthew Pocock | 30/01/14 16:40 |
This is a transform I hit often enough that I wonder if there is potential for sugar to rewrite x.foo to x.apply (foo) when foo is not a member and foo is in scope and it does not clash with an implicit and there is an apply with a single argument with a type matching the type of foo. Perhaps this is a terrible idea. Matthew |
| Re: [scala-internals] Re: AnyRefMap | Seth Tisue | 30/01/14 19:34 | Martin writes: I believe this puts too much faith in the contents of JIRA as a reflection of what everyday Scala programmers are doing. The people who take the trouble to isolate issues and turn them into new JIRA tickets are generally conscientious, already wise in the ways of Scala, and a little world-weary. These personality traits are highly negatively correlated with a tendency to even attempt to inherit from a Scala collection class! They are positively correlated with having figured out long ago that doing that is about as much fun as stuffing a hornet's nest in your pants. Thus: no tickets. But absence of evidence is not evidence of absence. Seth |
| Re: [scala-internals] Re: AnyRefMap | Haoyi Li | 31/01/14 00:46 | > No need for 3 different collection libraries I wouldn't put too much faith in the quality of the Java collections library. Apart from the Std Lib, there's org.apache.commons.collections and the Google Guava collections, both of which are pretty widespread and definitely not niche libraries. > Answer quickly: What is the first things that comes to mind as an argument for a collection constructor? I would wager it's definitely not the behavior in case of a key not found. Coming from Python-land, which is where I spend my salaried hours every day, the idiomatic way of creating a dictionary with a default is defaultdict(default_func), which you pass in a thunk which is used to generate defaults. It's pretty straightforward; in Scala maybe you'd have Map.withDefault(defaultFunc) instead, and maybe defaultFunc could take the key as an argument. In fact this is exactly what Ruby dictionaries do. This "inherit and override" strategy to pass in a simple function is basically a Java-ism, because without easy HoFs, that is the only way to "pass" a function to the constructor, and is in a similar vein to Java's inheritance-based collection-literals. Passing a function to the constructor/factory is standard fare in Ruby/Python and isn't unintuitive at all: I want a collection with a default, what more obvious thing do I need to provide than the way of calculating that default? |
| Re: [scala-internals] Re: AnyRefMap | Raul Bache | 31/01/14 00:51 | http://docs.scala-lang.org/overviews/macros/quasiquotes.html, the examples wraps the methods in future blocks. Couldnt we have a SynchronizedX-macro that overrides and wraps all methods of the class in synchronization blocks? |
| Re: [scala-internals] Re: AnyRefMap | Hanns Holger Rutz | 31/01/14 00:59 | On 30/01/2014 23:50, Adriaan Moors wrote:
> Making the leaf collection classes final will only make them more > consistent, > because we'll be fully in control of them, rather than having to > anticipate all possible subclasses. I agree with you. I also wonder... does it make sense at all to override a leaf collection class if it's immutable? As soon as you modify anything you would need also to make sure that you override `newBuilder` and all that stuff. I think the problem is rather that the mental effort to create your own custom `Map` class is high. I once did that for classes `osc.Bundle` and `osc.Message` (https://github.com/Sciss/ScalaOSC/blob/master/src/main/scala/de/sciss/osc/Packet.scala#L567), mixing in with `LinearSeq` and `LinearSeqLike`. Why both by the way? Not very intuitive. I had to adjust these classes between Scala 2.7 and 2.8 and 2.9, but I think it stabilised now (I don't remember having to change anything for Scala 2.10). So my thought is, do make the leaf classes final, use `withDefault` if the few microseconds of performance don't count, otherwise build your own with `MapLike`. For that last case, probably it would be good to have an "official" tutorial on how to achieve that. Here is an example for a `mutable.MapLike`: https://stackoverflow.com/questions/4224242/implement-a-scala-collection-so-that-map-filter-etc-produce-the-right-type#4225324 – I think that's the way to go if you really can't live with `withDefault`... 2c, .h.h. |
| Re: [scala-internals] AnyRefMap | Hanns Holger Rutz | 31/01/14 01:05 | +1
That was the whole point of things like `MapLike`—be able to put new collections together by mixing in these traits. |
| Re: [scala-internals] AnyRefMap | martin | 31/01/14 02:48 | Thanks for the feedback! There are too many mails to respond to them all, so let me try to summarize and explain my position. 1. Inheritance is a fundamental building block for Scala software but it has to be used carefully (and I readily admit that the current collection could do with less, and would in fact be keen to see alternative implementations). A good guideline is: "Every method should be abstract or final, or should be a documented extension point". In my mind that's a better guideline than "all leaf classes should be final". The latter has it backwards: Final classes are by necessity leaf-classes, but who is otherwise able to define what a leaf class is?
2. The default method is clearly documented as a "hook", i.e. a documented extension point. And in fact I believe there's a good reason for that. Mainly, performance is the concern here. If one overrides default, the chances are that this is done so that instead of `get` one can call `apply` directly. So `default` is likely called on the most common access path for the collection. We simply do not want another closure object and an indirection for something like that.
The alternatives from Python do not apply, because Python has a completely different performance profile, and was not OO from the start. 3. Some people have said that MapLike and friends could replace the concrete Map classes as classes that are inherited. I am afraid that will not work. The ...Like classes are there to add utility methods such as map, filter, take to a base collection. ...Like classes do not contain a collection implementation themselves, they play essentially the role of containers of extension methods. So saying clients should extend MapLike instead of HashMap amounts to saying that they should write their own HashMap implementation from scratch!
For all of these reasons I would say, let's try to work towards guideline (1), but do so in a way which is consistent over the whole collections. I do not see right now a better alternative for the `default` method. It does what it needs to do with minimum fuzz and performance overhead. So, I would suggest we make AnyRefMap non-final, but at the same time make as many of its methods final as we can. That will involve methods in its base classes, first, because a lot of the critical methods are probably inherited by AnyRefMap and second, because an isolated "improvement" for AnyRefMap would be a pessimization overall for breaking consistency.
What do people think about this? - Martin On Fri, Jan 31, 2014 at 10:05 AM, Hanns Holger Rutz <con...@sciss.de> wrote: +1 |
| Re: [scala-internals] AnyRefMap | Peter Empen | 31/01/14 02:56 | One general policy of the collection library has been to refuse extension requests with no obvious virtue for all users. This implicates that in edge cases users are left with the single option to extend a specific collection to be able to add functionality that you cannot implement without access to protected members. Copying sources is not a real option, right? Also, it's a bit degrading argument to say that users who need extensions are not capable of inheriting properly. I'd advokate for letting it be their own responsibility. |
| Re: [scala-internals] AnyRefMap | Jason Zaugg | 31/01/14 02:58 | On Fri, Jan 31, 2014 at 11:48 AM, martin odersky <martin...@epfl.ch> wrote: I would prefer to keep the leaves final, even if that means: /** Document general pattern of extension */ abstract class BaseAnyRefMap {
// not safe to override final def foo // document relationship between these methods (you must override both!) def foo1 = ... def foo2 = }
final class AnyRefMap extends BaseAnyRefMap { // we can exploit finality here in every which way override def getOrElse(...) = // avoid allocating Some } Cons: 1. its a bit harder to find which class to extend 2. inertia 3. designing for extension is still hard! Pros 1. See #1 from cons :). Extending collections is an advanced usage of collections, IMO, and I've seen too many beginners reach for it as their very first attempt ("how can I make my Family class extends List[Person]"?)
2. we have more the freedom to optimize in the leaves. 3. could be retrofitted to existing collections to offer people a migration path -jason |
| Re: [scala-internals] AnyRefMap | Rex Kerr | 31/01/14 03:08 |
I like that guideline. I think the existing classes are very very very far outside of that ideal, however. Very little is documented as an extension point.
I benchmarked it both ways, and there is a small difference. But emphasis on _small_. The reason is that one extra indirection doesn't cost very much in heavily-used code, and lookup is already a pretty expensive operation. I don't recall what the proportion was, but I'm pretty sure it was under 5%. But I agree that it is clearly documented as a hook, and that has value.
Agreed. MapLike does not give you a map.
I recommend leaving other classes alone, mostly because it's a ton of work to wade through everything in base classes and think about whether each method can be sensibly overridden. Usually the answer will be yes, after some deep thought, but yes-if-you-keep-these-things-in-mind. And so that knowledge will have to be imparted to scaladoc. I am quite sure I cannot get this done in anything like a comprehensive way before RC1, and right now AnyRefMap is an anomaly. So I suggest instead that I _don't_ do anything to base classes (actually, there aren't any, only traits), since AnyRefMap's core functionality is all within AnyRefMap. Of course it gets a ton of stuff from MapLike, but as you just said, MapLike doesn't implement a hash map. AnyRefMap does, and it's those internals that are hardest to get right. And then we can gradually work towards better documentation of how to extend methods. (Or, actually, I don't think that is ever going to happen in practice just through manual labor. Instead, we need the compiler to tell us dependencies so the information is available automatically, so that when one is compiling, the compiler will warn you when you override too little to have a chance of making sense.) --Rex |
| Re: [scala-internals] Re: AnyRefMap | Jason Zaugg | 31/01/14 03:22 | On Thu, Jan 30, 2014 at 11:08 PM, Rex Kerr <ich...@gmail.com> wrote:
BTW, *if* you end up keeping the default as a function passed to the constructor, we ought to consider making that constructor private to allay the objection that `new Collection(f)` ought to create a collection with `f` as an element. A named factory in the companion would be preferable. (But as usual, I'm not great at suggesting the name!)
Relatedly, we've only just recovered from this wrinkle: scala> new Vector(1, 2, 3) java.lang.NullPointerException -jason
|
| Re: [scala-internals] Re: AnyRefMap | Haoyi Li | 31/01/14 06:03 |
I don't think OO from the start means do-everything-with-inheritance. I mean, Ruby is "OO from the start" and does the exact same thing as Python. Do-everything-with-inheritance is Java-style OO to make up for their lack of lambdas, since inheriting is less clunky than passing in an anonymous-inner-class-instance. Maybe the performance gain from not passing in a lambda is worth sacrificing the ideal API, but Rex seems to indicate it's not a huge hit. I agree with everything else said =)
|
| Re: [scala-internals] Re: AnyRefMap | Simon Ochsenreither | 31/01/14 06:19 |
|
| Re: [scala-internals] AnyRefMap | martin | 31/01/14 06:35 | On Fri, Jan 31, 2014 at 12:08 PM, Rex Kerr <ich...@gmail.com> wrote: Right. It will be some work to get there, and we have to go carefully for risk of breakage. But that should be to goal, anyway,
I agree with all of it. Makes sense to me. Cheers - Martin
|
| Re: [scala-internals] Re: AnyRefMap | rklaehn | 31/01/14 07:06 | On Fri, Jan 31, 2014 at 9:59 AM, Hanns Holger Rutz <con...@sciss.de> wrote:
At least for the immutable collections I am familiar with (HashSet, HashMap), I can guarantee that it is absolutely impossible to override them and get anything but chaos. HashSet and HashMap should have been sealed from the beginning. The same is almost definitely true for the other immutable collections.
|
| Re: [scala-internals] Re: AnyRefMap | Simon Ochsenreither | 31/01/14 13:16 |
It's kind of mind-boggling that it's 2014 and we are discussing whether subclassing collection classes is the correct design approach. If someone wants to have a non-final collection class, I'd recommend that this person steps forward and writes the documentation on how exactly this I supposed to be done for that class. Everyone who touched the collection framework lately says it's impossible, so that guide would probably be very enlightening for all of us. |
| Re: [scala-internals] AnyRefMap | Jason Zaugg | 01/02/14 01:25 | On Fri, Jan 31, 2014 at 12:43 AM, martin odersky <martin...@epfl.ch> wrote: I don't think one can throw this sort of thing together. It has to be engineered. One point that occurred to me yesterday was that even synchronized wrappers are pretty broken as soon as the API contains higher-order methods. For example, synchronizing `getOrElseUpdate` means that we call back to user code under a lock, which is a classic recipe for deadlock. I was curious as to how JDK8 handles this. Looks like they way you must explicitly convert to Stream before getting using any HOM means that they sidestepped this thorny issue. (j.u.Collection doesn't contain, even as a default method, `forEach`)
I know it's not that palatable, but people can copy/paste the current broken trait into their own code base (it is 64 loc) if the can't abide by the deprecation. -jason
|
| Re: [scala-internals] AnyRefMap | rklaehn | 01/02/14 02:05 | I think the idea of using a standard mutable collection and then somehow wrapping/decorating it to be thread safe is fundamentally flawed regardless of whether you use a wrapper or a mixin. The only context where I feel comfortable using mutable collections that are not designed for concurrency in a situation that requires concurrency is within an actor.
There are several clean ways to deal with concurrency in the typesafe stack (Actors, Agents, ...). Why do we have to have ways like SynchronizedXXX that will very likely lead to trouble, at all? final class SynchronizedRef[T](initial: T) { private[this] var _current = initial def current = _current def update(f: T => T): Unit = { synchronized {
_current = f(_current) } } }
|
| Re: [scala-internals] AnyRefMap | rklaehn | 01/02/14 02:11 | Here is a more complete implementation of the approach described above:
https://www.bionicspirit.com/blog/2013/05/07/towards-better-atomicreference-scala.html |
| Re: [scala-internals] Re: AnyRefMap | Peter Empen | 01/02/14 03:02 | Please advice me on how to solve the following real-world requirement other than by inheritance: About two years ago a client asked me to provide O(1) selection of random direct successors to a given node of a graph to enable him to implement random walks in an efficient manner. Until then successors were stored in collection.mutable.HashSet instances. I was able to use strait-forward inheritance to satisfy his request by adding a draw method: https://github.com/scala-graph/scala-graph/blob/master/core/src/main/scala/scalax/collection/mutable/ExtHashSet.scala. To confirm, it is impossible to extend collection.imutable.HashSet mainly because the classes HashSet1, ..., HashTrieSet are placed in the companion object. What would be best practice to implement random selection in this case? For another reason, I recently also tried to extend java.util.IdentityHashMap but I stranded due to package visibility and lots of private members. So I had to "convert" it to Scala (EqHashMap in the same folder). I believe if you decide to seal collection classes we do need a means that allows users to reuse collection implementations in some way. For instance by abstract base classes as indicated by Jason and others. |
| Re: [scala-internals] Re: AnyRefMap | Simon Ochsenreither | 01/02/14 12:28 |
Absolutely! In my opinion, moving as much functionality to traits and abstract classes is the way to go, so that developers can easily reuse this functionality. Sub-classing concrete collection classes is dangerous in my opinion, because a) it gives people false impressions about the work involved b) it doesn't turn their collection class into a subtype of a Scala collection class, which might break reasonable assumptions everywhere it is used. |
| Re: [scala-internals] Re: AnyRefMap | Rex Kerr | 01/02/14 12:28 | On Sat, Feb 1, 2014 at 3:02 AM, Peter Empen <peter...@arcor.de> wrote: mutable.HashSet remains inheritable-from, so this wouldn't change. (Also, your implementation is not statistically sound. Elements near large empty gaps will be selected more often ((k+1) times more often for a gap of k). In a large set, gap sizes will be approximately Poisson-distributed.) Under most scenarios, you'd be better off maintaining an array with the same items. You get O(1) addition to the end, O(1) deletion (because order is unimportant, so you just take the last item off and stuff it in the slot of the removed item), and fast O(1) lookup. I understand that it would be nice if possible to solve these problems using inheritance, but the key to doing so is not that people have marked the collections as not final, but that the design promotes inheritance. Sometimes there's an inherent conflict between that goal and others (performance or correctness, for instance). Inheritance should not always win. --Rex |
| Re: [scala-internals] Re: AnyRefMap | rklaehn | 01/02/14 13:36 | On Sat, Feb 1, 2014 at 12:02 PM, Peter Empen <peter...@arcor.de> wrote:
The draw method just needs access to the internals of the HashSet, but does not override any behavior. So since the internals are not private but private[collection], you could just write an enhancement like this: package scala.collection.mutable import scala.util.Random object HashSetExt { implicit class HashSetDraw[A](val value:HashSet[A]) extends AnyVal { def draw(random:Random) : A = { val table = value.hashTableContents.table val len = table.length val drawn = random.nextInt(len) // IllegalArgumentException if len == 0 def search(from: Int, incr: Int, hold: Int => Boolean): Option[A] = { var i = from while (hold(i)) { val elem = table(i) if (elem ne null) return Some(elem.asInstanceOf[A]) else i += incr } None } // must always find an element either upwards or downwards search(drawn, 1, (i: Int) => i < len) getOrElse( search(drawn - 1, -1, (i: Int) => i > 0).get) } } } And the good thing is that this works with any s.c.m.HashSet, not just your personal version. So if you get a s.c.m.HashSet from somewhere else you can immediately use draw instead of having to copy the data into one of your HashSetExt. Here is how you would use it: scala> :paste // Entering paste mode (ctrl-D to finish) import scala.collection.mutable._ import HashSetExt._ val r = new scala.util.Random val hs = HashSet(0 until 10:_*) hs.draw(r) // Exiting paste mode, now interpreting. import scala.collection.mutable._ import HashSetExt._ r: scala.util.Random = scala.util.Random@23e0b48 hs: scala.collection.mutable.HashSet[Int] = Set(0, 9, 1, 5, 2, 6, 3, 7, 4, 8) res3: Int = 8 scala> But please note that as Rex has pointed out you will not get out all elements with the same probability. scala> val stats = Array.fill(1000000)(hs.draw(r)) stats: Array[Int] = Array(0, 5, 5, 7, 3, 1, 4, 9, 8, 6, 1, 8, 8, 6, 5, 5, 8, 8, 1, 9, 8, 5, 9, 8, 6, 9, 8, 8, 7, 7, 5, 7, 9, 8, 5, 7, 8, 9, 1, 1, 8, 7, 7, 9, 8, 4, 7, 5, 7, 9, 8, 4, 7, 9, 8, 1, 5, 8, 6, 9, 0, 2, 6, 6, 8, 0, 1, 1, 5, 5, 7, 8, 2, 1, 6, 7, 7, 5, 5, 5, 2, 6, 8, 4, 5, 8, 6, 5, 2, 8, 7, 8, 1, 1, 6, 1, 6, 8, 8, 5, 9, 8, 6, 8, 9, 8, 0, 1, 7, 1, 6, 8, 5, 1, 6, 7, 5, 8, 1, 8, 6, 7, 4, 6, 6, 8, 6, 1, 9, 1, 8, 7, 5, 9, 8, 7, 5, 8, 6, 7, 7, 5, 6, 0, 2, 5, 8, 1, 9, 5, 6, 0, 8, 7, 1, 2, 7, 8, 6, 6, 8, 6, 7, 8, 1, 7, 9, 5, 5, 8, 6, 6, 5, 9, 7, 8, 5, 5, 1, 7, 5, 6, 8, 7, 3, 5, 2, 7, 9, 1, 8, 7, 5, 8, 8, 7, 7, 6, 5, 9, 6, 7, 6, 0, 7, 7, 7, 5, 7, 8, 7, 1, 9, 5, 2, 8, 6, 8, 5, 5, 8, 3, 7, 8, 8, 6, 7, 7, 8, 7, 1, 8, 7, 6, 7, 4, 6, 6, 7, 5, 9, 6, 8, 7, 8, 1, 7, 1, 1, 5, 7, 1, 8, 8, 3, 9, 6, ... scala> stats.groupBy(x => x).map { case (k,vs) => (k,vs.size) }.mkString("\n") res18: String = 0 -> 31099 5 -> 156567 1 -> 93721 6 -> 156340 9 -> 93571 2 -> 31068 7 -> 155970 3 -> 31543 8 -> 218617 4 -> 31504
No, it is the other way round: HashSet1, HashSetCollision1 and HashTrieSet are in the companion object because it is impossible to inherit from them. There are basically three reasons for this: 1. Immutable objects need to be able to make copies of themselves. If you override an immutable object, you need to make sure that whenever you make a copy you make a copy of the right type. If you have an HashSetExt, the copy must also be of type HashSetExt and not just HashSet. So all places in HashSet where instances are being created with new HashTrieSet / new HashSet1 / new HashSetCollision1 would have to be replaced with calls to virtual factory methods (mkHashSet1, mkHashTrieSet, mkHashSetCollision1) that generate the correct type. 2. You would have to extend not just HashSet, but HashTrieSet, HashSet1 and HashSetCollision1. In all three cases you would have to override the virtual factory methods mentioned above to return the right types. 3. You would have to add a self type to make sure that the compile time type of operations is correct. Otherwise you would get something like this: val x = HashSetExt.empty[Int] // draw can be called, even though it does not make much sense on an empty set val y = x + 1 // type is HashSetExt, but compile time type HashSet, not HashSetExt. So the method .draw would not be available without a cast. You would probably end up with the following class hierarchy: // the base classes for inheritance HashSetLike[Repr,T] (with the three virtual methods mentioned above) HashSet1Like[Repr,T] extends HashSetLike[Repr,T] HashSetCollision1Like[Repr,T] extends HashSetLike[Repr,T]
HashTrieSetLike[Repr,T] extends HashSetLike[Repr,T] // the normal immutable HashSet HashSet[T] // overrides the mkXXX methods HashSet1[T] extends HashSet[T] with HashSetLike[HashSet[T],T] HashSetCollision1[T] extends HashSet[T] with HashSetCollision1Like[HashSetCollision1[T],T] HashTrieSet[T] extends HashSet[T] with HashTrieSetLike[HashTrieSet[T],T] // any custom immutable HashSet HashSetExt[T] // overrides the mkXXX methods HashSetExt1[T] extends HashSetExt[T] with HashSetLike[HashSetExt1[T],T] HashSetExtCollision1[T] extends HashSetExt[T] with HashSetCollision1Like[HashSetExtCollision1[T],T] HashTrieSetExt[T] extends HashSetExt[T] with HashTrieSetLike[HashTrieSetExt[T],T] As you can see, pure insanity.
I agree that it would be nice to be able to reuse data structures from scala collections. But I think the current situation regarding reuse of existing data structures is far from ideal even with non-final classes. For example, the Trie data structure underlying the immutable HashSet and HashMap could be a very efficient data structure for something like storing an ordered set of integers without any hashing. But the algorithm is so entangled with the various boilerplate to make it fit into the scala collection hierarchy as a Set or Map that there is no way to do it. One place where it is done better is RedBlackTree. The class RedBlackTree just implements the data structure itself without any additional boilerplate. It is wrapped in two different ways as TreeSet and TreeMap. But if you just want a RedBlackTree for your own data structure, you can just take it and wrap it in some other way. The downside is of course that the wrapping creates another indirection. But in case of TreeMap/TreeSet this indirection is necessary anyway because RedBlackTree does things internally (using null for empty for efficiency) which you have to hide from the outside. If only there was a way to do a wrapper without runtime overhead. Value classes do not work because they do not allow redefining equality and hashcode, and that is needed for various base traits of the scala collection hierarchy. You can make a value class that extends TraversableOnce, but anything more complex than that (Traversable, Iterable, Seq etc. ) requires a full blown object with runtime overhead. |
| Re: [scala-internals] Re: AnyRefMap | rklaehn | 01/02/14 14:11 | Here is a version of an immutable set with a draw method that should have good performance as long as you don't mix building the set and draw() calls too much.
import scala.collection.generic.ImmutableSetFactory import scala.collection.SetLike import scala.util.Random case class IndexableSet[T](s:Set[T]) extends Set[T] with SetLike[T, IndexableSet[T]] { def contains(elem: T): Boolean = s.contains(elem) def +(elem: T): IndexableSet[T] = copy(s + elem) def -(elem: T): IndexableSet[T] = copy(s - elem) def iterator: Iterator[T] = s.iterator override def empty = IndexableSet.empty private[this] lazy val elements = s.asInstanceOf[Set[AnyRef]].toArray def draw(r:Random) = elements(r.nextInt(elements.length)).asInstanceOf[T] } object IndexableSet extends ImmutableSetFactory[IndexableSet] { override def empty[A]: IndexableSet[A] = new IndexableSet(Set.empty[A]) } On Sat, Feb 1, 2014 at 12:02 PM, Peter Empen <peter...@arcor.de> wrote:
|
| Re: [scala-internals] Re: AnyRefMap | Peter Empen | 02/02/14 01:15 | Rüdiger, I like your suggestion for HashSetDraw. Admittedly, I wasn't aware of private[collection] hashTableContents. Partly because Scaladoc doesn't list members with widened private visibility, partly because if I see protected var table, I'm not yet used to look for other members exposing the very same. Maybe protected[collection] var table would do it as well with a direct clue? (I understand hashTableContents is conceptually clean but it allocates additional memory.) Yes, I did realize that the probability distribution is not even but I put up with it in the absence of a better alternative. |
| Re: [scala-internals] Re: AnyRefMap | rklaehn | 02/02/14 01:59 | I think reaching into the guts of a collection is not a good idea whether you use inheritance or enhancement. You rely on the implementation details of the original collection too much. I just wanted to show that you do not necessarily need inheritance. But I think this starts to become off-topic for scala-internals, so if you want to discuss this further PM me.What's wrong with the wrapping approach? It works with any wrapped set, draw has an equal probability for each element, and it is even faster. It would not be that hard to do a similar thing for a mutable set if you need that last bit of performance that gives you.
|