AnyRefMap

Showing 1-66 of 66 messages
AnyRefMap martin 30/01/14 05:32
I was just trying out AnyRefMap. It's really nice to have a drop-in replacement for HashMap that avoids the "##" overhead that slows down the original HashMap. But I was frustrated that it's not a drop-in replacement. In particular I had several places where I inherited HashMap from an anonymous class or an object. Sometimes I needed to override the default value, at other times I wanted to add new behavior to the map. This works with HashMap (or any other map implementation), but not with AnyRefMap, because AnyRefMap is final. 

I think it would be great if we could eliminate this incoherence before 2.11RC1. What do others think?

Cheers

 - Martin

Re: [scala-internals] AnyRefMap Jason Zaugg 30/01/14 05:54
On Thu, Jan 30, 2014 at 2:32 PM, martin odersky <martin...@epfl.ch> wrote:
I was just trying out AnyRefMap. It's really nice to have a drop-in replacement for HashMap that avoids the "##" overhead that slows down the original HashMap. But I was frustrated that it's not a drop-in replacement. In particular I had several places where I inherited HashMap from an anonymous class or an object. Sometimes I needed to override the default value, at other times I wanted to add new behavior to the map. This works with HashMap (or any other map implementation), but not with AnyRefMap, because AnyRefMap is final. 

I think it would be great if we could eliminate this incoherence before 2.11RC1. What do others think?

I've added my thoughts to the ticket you created: https://issues.scala-lang.org/browse/SI-8214.

Rex will know the tradeoffs best, though.

-jason

Re: AnyRefMap Simon Ochsenreither 30/01/14 09:25

I was just trying out AnyRefMap. It's really nice to have a drop-in replacement for HashMap that avoids the "##" overhead that slows down the original HashMap. But I was frustrated that it's not a drop-in replacement. In particular I had several places where I inherited HashMap from an anonymous class or an object. Sometimes I needed to override the default value, at other times I wanted to add new behavior to the map. This works with HashMap (or any other map implementation), but not with AnyRefMap, because AnyRefMap is final. 

I think it would be great if we could eliminate this incoherence before 2.11RC1. What do others think?

I'd consider any non-final collection class a severe bug.

Concerning inheriting from HashMap to override the default value ... what's wrong with using withDefault/withDefaultValue?
Re: [scala-internals] AnyRefMap Rex Kerr 30/01/14 10:53
 I agree that it's nice to have perfect drop-in replacements for things, but it's much harder to fashion a replacement that is a drop-in replacement for inheritance as well as for usage (composition).  The internals of HashMap and AnyRefMap are different, and although I have mimicked them as much as I could while retaining good performance, I chose not to allow inheritance from AnyRefMap in part _because_ I was worried that people would treat it as too much of a drop-in replacement for an altered HashMap.  In particular, some speedups come from code duplication (so you need to override in multiple places in AnyRefMap to change certain behavior); in some cases the call chain has reversed (e.g. which method do you need to override in order to add custom behavior to lookups?).

Overall, it seemed that there was no safe way to inherit broadly from AnyRefMap, and it was a lot easier to make the whole class final than figure out which methods were dangerous and which were safe.  (And don't forget that any code that assumes inheritance from HashTable is going to break also.)

With respect to default values, there are two problems with using default values that come from inheritance.  First, you can't propagate your defaults anywhere: you add one filter, and suddenly you lose your defaults.  Secondly, they require you to have a non-final class in order to use them.  Even if we decided to make AnyRefMap non-final, someone else might want to inherit from it and make it final--and then they would lose the ability to set a default without specifically saying that was the desired behavior.

I do agree, however, that consistency is a considerable virtue.

So I am happy to change AnyRefMap to non-final, and I can also alter how defaults work, but both changes were made for good reason.  I suppose for aiding switching from HashMap to AnyRefMap, I could make AnyRefMap non-final but deprecate overriding with a note explaining how to achieve HashMap functionality with AnyRefMap without inheriting.

"AnyRefMap's implementation makes inheritance risky.  If you wish to set a default value, use the defaultEntry constructor parameter."

  --Rex


--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [scala-internals] AnyRefMap Simon Ochsenreither 30/01/14 12:54
I agree on this. We need to clearly restrict the warranty of the collection API to usage as much as possible. If we let people inherit, they will automatically depend on arbitrary implementation details, which were never meant to be exposed.

The collection library still needs tons of work, fixes and improvements all over the place. If we can't guide people to the supported usage which allows us to further evolve the API and the implementation (by making specific collection classes final), the only thing that's left is declaring the collection API as "done, finished, immutable, deprecated and bugs-are-features".

If this is supposed to be the case, then let's start thinking about a better design for scala.collection2 as soon as possible.
Re: [scala-internals] Re: AnyRefMap martin 30/01/14 14:03



On Thu, Jan 30, 2014 at 6:25 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

I was just trying out AnyRefMap. It's really nice to have a drop-in replacement for HashMap that avoids the "##" overhead that slows down the original HashMap. But I was frustrated that it's not a drop-in replacement. In particular I had several places where I inherited HashMap from an anonymous class or an object. Sometimes I needed to override the default value, at other times I wanted to add new behavior to the map. This works with HashMap (or any other map implementation), but not with AnyRefMap, because AnyRefMap is final. 

I think it would be great if we could eliminate this incoherence before 2.11RC1. What do others think?

I'd consider any non-final collection class a severe bug.

Why? Java's collections are non-final. What's wrong with that?
 
Concerning inheriting from HashMap to override the default value ... what's wrong with using withDefault/withDefaultValue?

An extra indirection. When I use AnyRefMap I do it to get the best performance. withDefault does not give me that.

 - Martin



Re: [scala-internals] Re: AnyRefMap Rex Kerr 30/01/14 14:08
In the case of AnyRefMap, this is irrelevant because you can supply a default mapping function in the constructor.

(I agree with the general point that surplus wrappers are bad for performance.)

  --Rex
 

Re: [scala-internals] AnyRefMap martin 30/01/14 14:09



On Thu, Jan 30, 2014 at 7:53 PM, Rex Kerr <ich...@gmail.com> wrote:
 I agree that it's nice to have perfect drop-in replacements for things, but it's much harder to fashion a replacement that is a drop-in replacement for inheritance as well as for usage (composition).  The internals of HashMap and AnyRefMap are different, and although I have mimicked them as much as I could while retaining good performance, I chose not to allow inheritance from AnyRefMap in part _because_ I was worried that people would treat it as too much of a drop-in replacement for an altered HashMap.  In particular, some speedups come from code duplication (so you need to override in multiple places in AnyRefMap to change certain behavior); in some cases the call chain has reversed (e.g. which method do you need to override in order to add custom behavior to lookups?).

Can you give outline which methods would not be safe to override? Do you expect that people would have a tendency to override anyone of these?
 

Overall, it seemed that there was no safe way to inherit broadly from AnyRefMap, and it was a lot easier to make the whole class final than figure out which methods were dangerous and which were safe.  (And don't forget that any code that assumes inheritance from HashTable is going to break also.)

With respect to default values, there are two problems with using default values that come from inheritance.  First, you can't propagate your defaults anywhere: you add one filter, and suddenly you lose your defaults. 

That's not a problem for typical use cases of mutable collections. You typically do not transform them, just use them in the standard CRUD way.

Secondly, they require you to have a non-final class in order to use them.  Even if we decided to make AnyRefMap non-final, someone else might want to inherit from it and make it final--and then they would lose the ability to set a default without specifically saying that was the desired behavior.

Well, I would say, that class then not be done in the Scala collection way. Overriding default is documented standard behavior. I do not see why we should break away now. And I particularly do not see why this should be for one, highly specialized class only. You just create an inconsistency without any gain that I can see. 

Cheers

 - Martin


I do agree, however, that consistency is a considerable virtue.

So I am happy to change AnyRefMap to non-final, and I can also alter how defaults work, but both changes were made for good reason.  I suppose for aiding switching from HashMap to AnyRefMap, I could make AnyRefMap non-final but deprecate overriding with a note explaining how to achieve HashMap functionality with AnyRefMap without inheriting.

"AnyRefMap's implementation makes inheritance risky.  If you wish to set a default value, use the defaultEntry constructor parameter."

  --Rex


On Thu, Jan 30, 2014 at 5:32 AM, martin odersky <martin...@epfl.ch> wrote:
I was just trying out AnyRefMap. It's really nice to have a drop-in replacement for HashMap that avoids the "##" overhead that slows down the original HashMap. But I was frustrated that it's not a drop-in replacement. In particular I had several places where I inherited HashMap from an anonymous class or an object. Sometimes I needed to override the default value, at other times I wanted to add new behavior to the map. This works with HashMap (or any other map implementation), but not with AnyRefMap, because AnyRefMap is final. 

I think it would be great if we could eliminate this incoherence before 2.11RC1. What do others think?

Cheers

 - Martin

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [scala-internals] Re: AnyRefMap martin 30/01/14 14:13



On Thu, Jan 30, 2014 at 11:08 PM, Rex Kerr <ich...@gmail.com> wrote:
On Thu, Jan 30, 2014 at 2:03 PM, martin odersky <martin...@epfl.ch> wrote:



On Thu, Jan 30, 2014 at 6:25 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

 
Concerning inheriting from HashMap to override the default value ... what's wrong with using withDefault/withDefaultValue?

An extra indirection. When I use AnyRefMap I do it to get the best performance. withDefault does not give me that.

In the case of AnyRefMap, this is irrelevant because you can supply a default mapping function in the constructor.

Yes, but I am afraid that just makes everything worse. It introduces a new (and highly unintuitive!) way to achieve what is done with a different idiom elsewhere. That's precisely the antithesis of the uniformity of usage that we saw in Scala's collections so far.

Why unintuitive? Answer quickly: What is the first things that comes to mind as an argument for a collection constructor? I would wager it's definitely not the behavior in case of a key not found.

Cheers

 - Martin



(I agree with the general point that surplus wrappers are bad for performance.)

  --Rex
 

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [scala-internals] Re: AnyRefMap Simon Ochsenreither 30/01/14 14:31

Why? Java's collections are non-final. What's wrong with that?

I'm not sure if this is a joke. Can you clarify?

Anyway, why should decisions made by people who built a different language 20 years ago have any impact on us?


An extra indirection. When I use AnyRefMap I do it to get the best performance. withDefault does not give me that.

Then it is probably the best to fix that indirection and move on.
Re: [scala-internals] Re: AnyRefMap Simon Ochsenreither 30/01/14 14:35

Why unintuitive? Answer quickly: What is the first things that comes to mind as an argument for a collection constructor?

I don't think this is a big deal at all. Remove that constructor and use AnyRefMap.withDefault instead.

If Rex agrees, I'll create a patch for that.
Re: [scala-internals] Re: AnyRefMap martin 30/01/14 14:35



On Thu, Jan 30, 2014 at 11:31 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

Why? Java's collections are non-final. What's wrong with that?

I'm not sure if this is a joke. Can you clarify?

Java's collection design is pretty reasonable for its age and intended use case. It clearly was very successful in terms of usage. No need for 3 different collection libraries (analogous to Date) because the first one was irredeemably broken. I simply do not see a reason why the very common idiom of extending a mutable class should be no longer valid. The proof obligation is really on the people who want to remove the feature here.

Cheers

 - Martin
 
Anyway, why should decisions made by people who built a different language 20 years ago have any impact on us?


An extra indirection. When I use AnyRefMap I do it to get the best performance. withDefault does not give me that.

Then it is probably the best to fix that indirection and move on.

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [scala-internals] Re: AnyRefMap martin 30/01/14 14:44
We can go back and forth on the use of inheriting from collection classes and how best to treat defaults. We clearly have different opinions here. My main argument is that the biggest virtue of the Scala collections library is its consistency. That consistency took a lot of energy and determination to achieve. That's why I am so allergic to the proposition of throwing consistency out of the window for the sake of a single use case, in particular because I still have not heard a convincing argument why it needs to be so.

Cheers

 - Martin





--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



Re: [scala-internals] Re: AnyRefMap Adriaan Moors 30/01/14 14:50
Making the leaf collection classes final will only make them more consistent,
because we'll be fully in control of them, rather than having to anticipate all possible subclasses.
Re: [scala-internals] AnyRefMap Rex Kerr 30/01/14 14:53
On Thu, Jan 30, 2014 at 2:09 PM, martin odersky <martin...@epfl.ch> wrote:



On Thu, Jan 30, 2014 at 7:53 PM, Rex Kerr <ich...@gmail.com> wrote:
 I agree that it's nice to have perfect drop-in replacements for things, but it's much harder to fashion a replacement that is a drop-in replacement for inheritance as well as for usage (composition).  The internals of HashMap and AnyRefMap are different, and although I have mimicked them as much as I could while retaining good performance, I chose not to allow inheritance from AnyRefMap in part _because_ I was worried that people would treat it as too much of a drop-in replacement for an altered HashMap.  In particular, some speedups come from code duplication (so you need to override in multiple places in AnyRefMap to change certain behavior); in some cases the call chain has reversed (e.g. which method do you need to override in order to add custom behavior to lookups?).

Can you give outline which methods would not be safe to override? Do you expect that people would have a tendency to override anyone of these?

Any method where the call-chain isn't obvious is unsafe to override.  For example, HashMap has -= overridden 7 times and += 8 times in the library scan I did.  Update was overridden 0 times.  put was overridden 0 times.  Now, does += call update, or does update call +=, or neither call the other, or what?  In AnyRefMap, += calls update, but put has its own implementation.  In HashMap, update calls put, but += has its own implementation.  How is anyone supposed to keep this straight?

So in the wild we have 8 implementations of maps that are probably inconsistent because people didn't override all the necessary methods (and how could they have known?).

In my code, it's not safe to override much of anything that I implemented.  There's a fair bit of code duplication because removing the redundant logic manually provides better results than hoping the JIT compiler can do it.

In the Java HashMap code, it's safe to override things because there is only one way to do one thing in almost every case.
 
 

Overall, it seemed that there was no safe way to inherit broadly from AnyRefMap, and it was a lot easier to make the whole class final than figure out which methods were dangerous and which were safe.  (And don't forget that any code that assumes inheritance from HashTable is going to break also.)

With respect to default values, there are two problems with using default values that come from inheritance.  First, you can't propagate your defaults anywhere: you add one filter, and suddenly you lose your defaults. 

That's not a problem for typical use cases of mutable collections. You typically do not transform them, just use them in the standard CRUD way.

Maybe I'm atypical, but I do transform mutable collections.  Isn't allowing that the reason why they're in the same hierarchy as immutable collections?

I know (from bitter experience) that trying to extend anything you're going to transform is a losing battle, but sometimes that means I give up on the extension battle so I can transform.  (Sometimes I give up transformations so I can extend.)  If you use composition instead of inheritance for modifications, though, your extensions can survive reasonable transformations.
 

Secondly, they require you to have a non-final class in order to use them.  Even if we decided to make AnyRefMap non-final, someone else might want to inherit from it and make it final--and then they would lose the ability to set a default without specifically saying that was the desired behavior.

Well, I would say, that class then not be done in the Scala collection way. Overriding default is documented standard behavior. I do not see why we should break away now. And I particularly do not see why this should be for one, highly specialized class only. You just create an inconsistency without any gain that I can see. 

There is a gain, but it may well not be enough gain to warrant the inconsistency.  I grant that maintaining consistency is very important when possible.  For the record, I'm perfectly happy to go back to overriding and to mark every dangerous method as final (which is most of them).

  --Rex
 
Re: [scala-internals] Re: AnyRefMap Simon Ochsenreither 30/01/14 14:56

My main argument is that the biggest virtue of the Scala collections library is its consistency.

That's my point too. It would be terribly inconsistent if we made one collection non-final, while all the others are going to be final in the near future.
Re: [scala-internals] Re: AnyRefMap rklaehn 30/01/14 14:57
There are a lot of methods in scala collections, and it is easy to introduce inconsistencies by overriding some methods and not others. That is why advice to java programmers these days is "Design and document for inheritance or else prohibit it". (Joshua Bloch, Effective Java). This is basically gospel in java land these days. One of the few things that C# got right was to not make virtual the default for methods.

But how would you design and document for inheritance with scala collections? You could make every method final that is difficult to override because it has to be consistent with other methods. That would be basically all of them except for default in case of AnyRefMap. And then you have to make sure to also do final overrides in case new methods are introduced in one of the traits you inherit from.
Re: [scala-internals] Re: AnyRefMap martin 30/01/14 15:03
Here is more breakage:

new AnyRefMap[String, String] with SynchronizedMap[String, String]
<console>:11: error: illegal inheritance from final class AnyRefMap
              new AnyRefMap[String, String] with SynchronizedMap[String, String]

So, we can't have a synchronized AnyRefMap?

Regarding Adriaan's argument of consistency, I checked all subclasses of mutable.Map. All but one of them are non-final. The one which is final is TrieMap, another late addition to the collections, and if I believe its finality is a big mistake for the same reason I am speaking up now. If I had remarked this when it was introduced I would have raised my objections as well. 

I know the arguments of composition over inheritance ad nauseam. But consistency trumps everything.

 - Martin
Prof. Martin Odersky
LAMP/IC, EPFL
Station 14, 1015 Lausanne, Switzerland
Tel: +41 21 693 6863

Re: [scala-internals] Re: AnyRefMap Eiríkr Åsheim 30/01/14 15:03
I don't have a lot to add here, but I would like to echo Rex that letting users subclass and AnyRefMap is likely to introduce bugs and will probably also compromise performance by making monomorphic method calls bimorphic or megamorphic.

I am skeptical that inheriting the class and avoiding all the possible bugs will produce a net performance win.
Re: [scala-internals] Re: AnyRefMap rklaehn 30/01/14 15:11
SynchronizedMap is a terrible idea. It creates the illusion of thread-safety while leaving the user open to difficult to find bugs. So I would count not being able to use SynchronizedMap as an advantage of making AnyRefMap final.
There are dozens of methods in SynchronizedMap that are not overridden. So if you use one of them you will not be synchronized and will see the collection in an inconsistent state.
  // !!! todo: also add all other methods
Re: [scala-internals] Re: AnyRefMap √iktor Klang 30/01/14 15:11
In my opinion, SynchronizedX should be deprecated, it's the wrong solution to the wrong problem.
Cheers,

———————
Viktor Klang
Chief Architect - Typesafe

Twitter: @viktorklang
Re: [scala-internals] AnyRefMap martin 30/01/14 15:13



On Thu, Jan 30, 2014 at 11:53 PM, Rex Kerr <ich...@gmail.com> wrote:
On Thu, Jan 30, 2014 at 2:09 PM, martin odersky <martin...@epfl.ch> wrote:



On Thu, Jan 30, 2014 at 7:53 PM, Rex Kerr <ich...@gmail.com> wrote:
 I agree that it's nice to have perfect drop-in replacements for things, but it's much harder to fashion a replacement that is a drop-in replacement for inheritance as well as for usage (composition).  The internals of HashMap and AnyRefMap are different, and although I have mimicked them as much as I could while retaining good performance, I chose not to allow inheritance from AnyRefMap in part _because_ I was worried that people would treat it as too much of a drop-in replacement for an altered HashMap.  In particular, some speedups come from code duplication (so you need to override in multiple places in AnyRefMap to change certain behavior); in some cases the call chain has reversed (e.g. which method do you need to override in order to add custom behavior to lookups?).

Can you give outline which methods would not be safe to override? Do you expect that people would have a tendency to override anyone of these?

Any method where the call-chain isn't obvious is unsafe to override.  For example, HashMap has -= overridden 7 times and += 8 times in the library scan I did.  Update was overridden 0 times.  put was overridden 0 times.  Now, does += call update, or does update call +=, or neither call the other, or what?  In AnyRefMap, += calls update, but put has its own implementation.  In HashMap, update calls put, but += has its own implementation.  How is anyone supposed to keep this straight?
 
So in the wild we have 8 implementations of maps that are probably inconsistent because people didn't override all the necessary methods (and how could they have known?).

In my code, it's not safe to override much of anything that I implemented.  There's a fair bit of code duplication because removing the redundant logic manually provides better results than hoping the JIT compiler can do it.

These are potential problems every Map implementation has. How many observed problems do we have in the JIRA database? 
 
In the Java HashMap code, it's safe to override things because there is only one way to do one thing in almost every case.

I don't believe that's 100% true. Does remove call get or put in the Java implementations? 

I do not want to argue that an inheritance based design is inevitable. Maybe a type-class based design or something else is better. We'd have to have one to experiment with to be able to tell for sure. What I do very strongly believe is that a design cannot and should not be gradually undermined and riddled with inconsistencies because we do not agree with its basic premises. Throw it out completely and replace it with a new one.  Or keep to it. That should be our only two options.

Cheers

 - Martin


 
 

Overall, it seemed that there was no safe way to inherit broadly from AnyRefMap, and it was a lot easier to make the whole class final than figure out which methods were dangerous and which were safe.  (And don't forget that any code that assumes inheritance from HashTable is going to break also.)

With respect to default values, there are two problems with using default values that come from inheritance.  First, you can't propagate your defaults anywhere: you add one filter, and suddenly you lose your defaults. 

That's not a problem for typical use cases of mutable collections. You typically do not transform them, just use them in the standard CRUD way.

Maybe I'm atypical, but I do transform mutable collections.  Isn't allowing that the reason why they're in the same hierarchy as immutable collections?

I know (from bitter experience) that trying to extend anything you're going to transform is a losing battle, but sometimes that means I give up on the extension battle so I can transform.  (Sometimes I give up transformations so I can extend.)  If you use composition instead of inheritance for modifications, though, your extensions can survive reasonable transformations.
 

Secondly, they require you to have a non-final class in order to use them.  Even if we decided to make AnyRefMap non-final, someone else might want to inherit from it and make it final--and then they would lose the ability to set a default without specifically saying that was the desired behavior.

Well, I would say, that class then not be done in the Scala collection way. Overriding default is documented standard behavior. I do not see why we should break away now. And I particularly do not see why this should be for one, highly specialized class only. You just create an inconsistency without any gain that I can see. 

There is a gain, but it may well not be enough gain to warrant the inconsistency.  I grant that maintaining consistency is very important when possible.  For the record, I'm perfectly happy to go back to overriding and to mark every dangerous method as final (which is most of them).

  --Rex
 

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [scala-internals] Re: AnyRefMap Simon Ochsenreither 30/01/14 15:14

new AnyRefMap[String, String] with SynchronizedMap[String, String]
<console>:11: error: illegal inheritance from final class AnyRefMap
              new AnyRefMap[String, String] with SynchronizedMap[String, String]

So, we can't have a synchronized AnyRefMap?

If a deprecated, bit-rotted trait is the best use-case for keeping classes non-final, I'm delighted about the current approach of making everything final.
Re: [scala-internals] AnyRefMap Erik Osheim 30/01/14 15:16
On Fri, Jan 31, 2014 at 12:13:38AM +0100, martin odersky wrote:
> Throw it out completely and replace it with a new one.
> Or keep to it. That should be our only two options.

It makes sense that we should be giving users a consistent message
about this stuff. So my vote goes to throwing it out (e.g. deprecating
it). Show of hands?

-- Erik
Re: [scala-internals] Re: AnyRefMap Rex Kerr 30/01/14 15:16
SynchronizedMap is deprecated because adding synchronization via a trait is a hopeless endeavor.  Invariably, something like this will happen without anyone noticing:

  class MyMap extends YourMap {
    def putOrRemove(key: K, value: V): Option[V] = {
      // Non-atomic operation
    }
  }

  val threadsafeMap = new MyMap with SynchronizedMap  // Woot!  Isn't thread-safety easy!
  ...
  threadsafeMap.putOrRemove(foo, bar)   // BOOM!  Wha???

Unfortunately, synchronization is difficult.  If we want to have robust collections, we have to pay attention to the entire context, which you lose when you allow inheritance.  The advantage of traits is that you can avoid paying attention to implementation details.  Unfortunately, when you need to pay attention to implementation details, they're a liability.

I wish there was a good way around this.  I can't think of one.  If anyone can, I'm happy to implement it.

  --Rex


Re: [scala-internals] Re: AnyRefMap Simon Ochsenreither 30/01/14 15:17

In my opinion, SynchronizedX should be deprecated, it's the wrong solution to the wrong problem.

It is already deprecated.
Re: [scala-internals] Re: AnyRefMap martin 30/01/14 15:28
I was not aware that SynchronizedMap is deprecated in 2.11. I am not sure again we need to do this. This means that any code that mixes in a SynchronizedMap in a custom map is invalidated. Pretty tough, because I do not see a possible replacement for such code. Here's an example, from the compiler itself:

val unitOfFile = new LinkedHashMap[AbstractFile, RichCompilationUnit] with
                       SynchronizedMap[AbstractFile, RichCompilationUnit]

We _do_ need a LinkedHashMap there, but unfortunately ConcurrentHashMap is not ordered. What to do? 

Can we not fix SynchronizedMap instead? Sure, it's painful to keep track of all the methods, but with a bit of inventiveness these things can be automated.

Cheers

 - Martin





On Fri, Jan 31, 2014 at 12:14 AM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

new AnyRefMap[String, String] with SynchronizedMap[String, String]
<console>:11: error: illegal inheritance from final class AnyRefMap
              new AnyRefMap[String, String] with SynchronizedMap[String, String]

So, we can't have a synchronized AnyRefMap?

If a deprecated, bit-rotted trait is the best use-case for keeping classes non-final, I'm delighted about the current approach of making everything final.

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [scala-internals] AnyRefMap rklaehn 30/01/14 15:28
On Fri, Jan 31, 2014 at 12:13 AM, martin odersky <martin...@epfl.ch> wrote:



On Thu, Jan 30, 2014 at 11:53 PM, Rex Kerr <ich...@gmail.com> wrote:
On Thu, Jan 30, 2014 at 2:09 PM, martin odersky <martin...@epfl.ch> wrote:



On Thu, Jan 30, 2014 at 7:53 PM, Rex Kerr <ich...@gmail.com> wrote:
 I agree that it's nice to have perfect drop-in replacements for things, but it's much harder to fashion a replacement that is a drop-in replacement for inheritance as well as for usage (composition).  The internals of HashMap and AnyRefMap are different, and although I have mimicked them as much as I could while retaining good performance, I chose not to allow inheritance from AnyRefMap in part _because_ I was worried that people would treat it as too much of a drop-in replacement for an altered HashMap.  In particular, some speedups come from code duplication (so you need to override in multiple places in AnyRefMap to change certain behavior); in some cases the call chain has reversed (e.g. which method do you need to override in order to add custom behavior to lookups?).

Can you give outline which methods would not be safe to override? Do you expect that people would have a tendency to override anyone of these?

Any method where the call-chain isn't obvious is unsafe to override.  For example, HashMap has -= overridden 7 times and += 8 times in the library scan I did.  Update was overridden 0 times.  put was overridden 0 times.  Now, does += call update, or does update call +=, or neither call the other, or what?  In AnyRefMap, += calls update, but put has its own implementation.  In HashMap, update calls put, but += has its own implementation.  How is anyone supposed to keep this straight?
 
So in the wild we have 8 implementations of maps that are probably inconsistent because people didn't override all the necessary methods (and how could they have known?).

In my code, it's not safe to override much of anything that I implemented.  There's a fair bit of code duplication because removing the redundant logic manually provides better results than hoping the JIT compiler can do it.

These are potential problems every Map implementation has. How many observed problems do we have in the JIRA database? 
 
If a user overrides e.g. AnyRefMap in a wrong way and ends up with a Map with inconsistent behavior, it won't show up in the scala JIRA database. It will just be a puzzled user that can't figure out why his code doesn't work.
 
In the Java HashMap code, it's safe to override things because there is only one way to do one thing in almost every case.

I don't believe that's 100% true. Does remove call get or put in the Java implementations? 

I do not want to argue that an inheritance based design is inevitable. Maybe a type-class based design or something else is better. We'd have to have one to experiment with to be able to tell for sure. What I do very strongly believe is that a design cannot and should not be gradually undermined and riddled with inconsistencies because we do not agree with its basic premises. Throw it out completely and replace it with a new one.  Or keep to it. That should be our only two options.

I think using inheritance within the scala collections is fine. People contributing to scala know the pitfalls. But when using the scala collections from the outside people should not inherit from "leaf" collections. Inheriting from traits like Map/MapLike that are designed for inheritance is fine with some guidance in the docs.
 
Cheers

 - Martin


 
 

Overall, it seemed that there was no safe way to inherit broadly from AnyRefMap, and it was a lot easier to make the whole class final than figure out which methods were dangerous and which were safe.  (And don't forget that any code that assumes inheritance from HashTable is going to break also.)

With respect to default values, there are two problems with using default values that come from inheritance.  First, you can't propagate your defaults anywhere: you add one filter, and suddenly you lose your defaults. 

That's not a problem for typical use cases of mutable collections. You typically do not transform them, just use them in the standard CRUD way.

Maybe I'm atypical, but I do transform mutable collections.  Isn't allowing that the reason why they're in the same hierarchy as immutable collections?

I know (from bitter experience) that trying to extend anything you're going to transform is a losing battle, but sometimes that means I give up on the extension battle so I can transform.  (Sometimes I give up transformations so I can extend.)  If you use composition instead of inheritance for modifications, though, your extensions can survive reasonable transformations.
 

Secondly, they require you to have a non-final class in order to use them.  Even if we decided to make AnyRefMap non-final, someone else might want to inherit from it and make it final--and then they would lose the ability to set a default without specifically saying that was the desired behavior.

Well, I would say, that class then not be done in the Scala collection way. Overriding default is documented standard behavior. I do not see why we should break away now. And I particularly do not see why this should be for one, highly specialized class only. You just create an inconsistency without any gain that I can see. 

There is a gain, but it may well not be enough gain to warrant the inconsistency.  I grant that maintaining consistency is very important when possible.  For the record, I'm perfectly happy to go back to overriding and to mark every dangerous method as final (which is most of them).

  --Rex
 

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [scala-internals] AnyRefMap Adriaan Moors 30/01/14 15:29

On Thu, Jan 30, 2014 at 3:13 PM, martin odersky <martin...@epfl.ch> wrote:
riddled with inconsistencies

Maybe we're talking about a different kind of consistency, but I don't see how making all leaf classes final is inconsistent.
It also fosters another kind of consistency: the internal behavior of the methods defined in these classes.

If you want to make your own collection, implement one of the interface traits, possibly with an underlying existing collection.
Re: [scala-internals] AnyRefMap Rex Kerr 30/01/14 15:32



On Thu, Jan 30, 2014 at 3:13 PM, martin odersky <martin...@epfl.ch> wrote:


On Thu, Jan 30, 2014 at 11:53 PM, Rex Kerr <ich...@gmail.com> wrote:
On Thu, Jan 30, 2014 at 2:09 PM, martin odersky <martin...@epfl.ch> wrote:



On Thu, Jan 30, 2014 at 7:53 PM, Rex Kerr <ich...@gmail.com> wrote:
 I agree that it's nice to have perfect drop-in replacements for things, but it's much harder to fashion a replacement that is a drop-in replacement for inheritance as well as for usage (composition).  The internals of HashMap and AnyRefMap are different, and although I have mimicked them as much as I could while retaining good performance, I chose not to allow inheritance from AnyRefMap in part _because_ I was worried that people would treat it as too much of a drop-in replacement for an altered HashMap.  In particular, some speedups come from code duplication (so you need to override in multiple places in AnyRefMap to change certain behavior); in some cases the call chain has reversed (e.g. which method do you need to override in order to add custom behavior to lookups?).

Can you give outline which methods would not be safe to override? Do you expect that people would have a tendency to override anyone of these?

Any method where the call-chain isn't obvious is unsafe to override.  For example, HashMap has -= overridden 7 times and += 8 times in the library scan I did.  Update was overridden 0 times.  put was overridden 0 times.  Now, does += call update, or does update call +=, or neither call the other, or what?  In AnyRefMap, += calls update, but put has its own implementation.  In HashMap, update calls put, but += has its own implementation.  How is anyone supposed to keep this straight?
 
So in the wild we have 8 implementations of maps that are probably inconsistent because people didn't override all the necessary methods (and how could they have known?).

In my code, it's not safe to override much of anything that I implemented.  There's a fair bit of code duplication because removing the redundant logic manually provides better results than hoping the JIT compiler can do it.

These are potential problems every Map implementation has. How many observed problems do we have in the JIRA database?

This isn't the right metric.  The question to ask is: how many observed problems _are there in the bug databases of people who extended maps_?  I don't know.  As I said above, 8 out of 8 implementations that changed += did not also change put.  (They did mostly change -=.)  So this argues that, first, people do like to override collections; but, second, they don't have adequate tools to do it correctly.  Maybe _they_ know never to use "put" and always use "+=", but if I then go use that library, how am I to know?

It's just a deep source of problematic bugs waiting to happen.

I'm willing to allow some level of bugs in exchange for nice capabilities, or in exchange for nice features.  But it's a tradeoff, and some implementations--and AnyRefMap is one of them--have too many risks for it to be worth it.  I am happy to wall off all the danger areas that I have created, though.

If we had some way to express to the compiler what kind of rules should apply to overriding, then I wouldn't be worried.  But we don't presently.
 
 
In the Java HashMap code, it's safe to override things because there is only one way to do one thing in almost every case.

I don't believe that's 100% true. Does remove call get or put in the Java implementations?

No, but it's strange indeed to rely upon remove to do your lookups (since you then have to put them back).  This is what I mean by "almost every case".  There is an obvious use for remove: remove elements.  And an obvious use for put: add elements.  And an obvious use for get: look up elements.  If you use the non-obvious features, you may end up in trouble.

In Scala's case, it's the obvious use of the obvious features which are in danger of getting out of sync.  Update or += or put--three ways to do the same thing--had better agree.
 

I do not want to argue that an inheritance based design is inevitable. Maybe a type-class based design or something else is better. We'd have to have one to experiment with to be able to tell for sure. What I do very strongly believe is that a design cannot and should not be gradually undermined and riddled with inconsistencies because we do not agree with its basic premises. Throw it out completely and replace it with a new one.  Or keep to it. That should be our only two options.

I agree with those as the options, and I also think that nothing has proven itself better than inheritance for forming the core hierarchy for code reuse.  So I wouldn't suggest throwing it out; I haven't the confidence that a replacement would be superior.

But I also think that finality is there for a reason--it's to protect you when implementations really aren't safe to muck with.  And very often, by the time you get to the leaves of the collections hierarchy, there is a lot that requires deep knowledge to muddle with safely, and thus probably shouldn't be muddled with at all.

Anyway, maybe the resolution should be that I un-finalize AnyRefMap but make a lot of the methods final.  SynchronizedMap-type stuff won't work, but overriding default will.  (Nobody relies upon filter preserving defaults, so losing that might actually be an advantage consistency-wise.)

  --Rex
 
Re: [scala-internals] AnyRefMap Jason Zaugg 30/01/14 15:37
It would be better a wrapper like in java. Then it isn't susceptible to linearisation gotchas.  I looked at those a while back and found the impl to be a stark reminder of how much our mixins leave unprotected, for example toString, equals, serialisation. 

But in your example I would tend to write a wrapper manually that encapsulates the mutable unsynchronised map and offers a smaller API. It could contain a method to return an immutable snapshot if clients need rich query operations 

Jason 


To unsubscribe from this group and stop receiving emails from it, send an email to scala-internals+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.


--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-internals+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.
Re: [scala-internals] Re: AnyRefMap Adriaan Moors 30/01/14 15:37

On Thu, Jan 30, 2014 at 3:28 PM, martin odersky <martin...@epfl.ch> wrote:
I was not aware that SynchronizedMap is deprecated in 2.11.

It was based on this discussion: https://groups.google.com/d/msg/scala-internals/1gCbqTRCxAQ/7fJS6YsN50wJ
Assuming we need a good synchronized collection, while also agreeing "synchronization via mixin override trait is unnecessary",
the logical conclusion (to me) is deprecating the approach we don't like so that we can replace it by a better one.

We are making users go WTF all over the place with things that are broken like this (see also: delayedinit).
I say we should at the very least warn them, even if we don't have a better alternative. (The Java concurrent collections come to mind as an alternative.)

Re: [scala-internals] AnyRefMap martin 30/01/14 15:39



On Fri, Jan 31, 2014 at 12:29 AM, Adriaan Moors <adr...@typesafe.com> wrote:

On Thu, Jan 30, 2014 at 3:13 PM, martin odersky <martin...@epfl.ch> wrote:
riddled with inconsistencies

Maybe we're talking about a different kind of consistency, but I don't see how making all leaf classes final is inconsistent.
It also fosters another kind of consistency: the internal behavior of the methods defined in these classes.

But we are not making all the leaf collections final. That would be a different collection library. We should have a discussion whether/when we can do  do that. My guess is that we should not attempt any of this in the 2.x timeframe because it would just undermine the design the underlies the current collections. The current collections are _designed_ so that you can override the default method (why else have it?), so that you can mix in traits like Synchronized (OK, maybe Synchronized is a bad idea but we have to come up with a valid (composition-based?) replacement) and so on. We can discuss about a big jump that undoes all this design and suffers the corresponding breakage in user code. But doing this gradually, class after class strikes me as death by a 1000 needles. 

Cheers

 - Martin



 
If you want to make your own collection, implement one of the interface traits, possibly with an underlying existing collection.

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.



--
Prof. Martin Odersky
LAMP/IC, EPFL
Station 14, 1015 Lausanne, Switzerland
Tel: +41 21 693 6863

Re: [scala-internals] Re: AnyRefMap Rex Kerr 30/01/14 15:41
As a replacement, I would suggest something like

class Synchronized[A <: AnyRef](val repr: A) extends AnyVal {
  def apply[B](f: A => B): B = repr.synchronized{ f(repr) }
}

You do have to replace all uses of x.stuff with x{_.stuff}, but otherwise it's pretty painless and wraps everything without possibility for error.  Well, without any more possibility for error than you normally have when synchronizing.

 --Rex
Re: [scala-internals] AnyRefMap martin 30/01/14 15:43



On Fri, Jan 31, 2014 at 12:37 AM, Jason Zaugg <jza...@gmail.com> wrote:
It would be better a wrapper like in java. Then it isn't susceptible to linearisation gotchas.  I looked at those a while back and found the impl to be a stark reminder of how much our mixins leave unprotected, for example toString, equals, serialisation. 

Yes, that's a good point.  

But in your example I would tend to write a wrapper manually that encapsulates the mutable unsynchronised map and offers a smaller API. It could contain a method to return an immutable snapshot if clients need rich query operations 

I am not sure we can demand from users to write these wrappers that were so far created for them in the library. How about we throw together a SynchronizedWrapper for Map, then we can safely deprecate the old class?

Cheers

 - Martin

To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.



Re: [scala-internals] AnyRefMap Jason Zaugg 30/01/14 15:45
This misses the use case of passing the collection to existing code.

Jason


On Friday, January 31, 2014, Rex Kerr <ich...@gmail.com> wrote:
As a replacement, I would suggest something like

class Synchronized[A <: AnyRef](val repr: A) extends AnyVal {
  def apply[B](f: A => B): B = repr.synchronized{ f(repr) }
}

You do have to replace all uses of x.stuff with x{_.stuff}, but otherwise it's pretty painless and wraps everything without possibility for error.  Well, without any more possibility for error than you normally have when synchronizing.
Re: [scala-internals] Re: AnyRefMap martin 30/01/14 15:45



On Fri, Jan 31, 2014 at 12:37 AM, Adriaan Moors <adr...@typesafe.com> wrote:

On Thu, Jan 30, 2014 at 3:28 PM, martin odersky <martin...@epfl.ch> wrote:
I was not aware that SynchronizedMap is deprecated in 2.11.

It was based on this discussion: https://groups.google.com/d/msg/scala-internals/1gCbqTRCxAQ/7fJS6YsN50wJ
Assuming we need a good synchronized collection, while also agreeing "synchronization via mixin override trait is unnecessary",
the logical conclusion (to me) is deprecating the approach we don't like so that we can replace it by a better one.

Agreed. I am just arguing we should do both things at the same time, that would avoid some unnecessary migration grief.  

Thanks all for the animated discussion! I have to drop out now and catch some sleep.

 - Martin

We are making users go WTF all over the place with things that are broken like this (see also: delayedinit).
I say we should at the very least warn them, even if we don't have a better alternative. (The Java concurrent collections come to mind as an alternative.)

 

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
Prof. Martin Odersky
LAMP/IC, EPFL
Station 14, 1015 Lausanne, Switzerland
Tel: +41 21 693 6863

Re: [scala-internals] AnyRefMap Rex Kerr 30/01/14 15:50
Just to be clear: the immutable collection leaf classes are already impossible to sensibly override.  We don't do it, anyone who tries is liable to break everything, and in practice nobody does (from a survey of about a half gig of bytecode from various projects selected...well...I'm not exactly sure how Adriaan picked them).  Overriding most of them them is deprecated now, if they weren't already final.  I doubt anyone will notice.

The mutable classes can be overridden, and people do, but there is constant tension with the builder capabilities, which throw away your overriding.

This is the great problem with the builder approach.  You can't just say

  class MyNewColl extends SomeScalaColl with AllTheGoodies[MyNewColl] { ... }

and be okay.  You certainly can't do it anonymously and be okay.

If we had enough syntactic sugar to generate the entire framework for a new collections hierarchy member in a single line, then the argument that everything at least in the mutable hierarchy should be overridable would be much stronger.  As it is, I think we should take an intermediate approach: make final or deprecate overriding from those things which are really hard to get right, while maintaining the flexibility to override mutable collections whose implementations make this less like walking in a minefield.

  --Rex

Re: [scala-internals] AnyRefMap Rex Kerr 30/01/14 15:52
If it is important to pass the collection to existing code, agreed.  It's not a complete solution.

From what I saw in the compiler, at least, that wasn't a dominant use case.  But I didn't look around very much.

  --Rex


--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [scala-internals] Re: AnyRefMap √iktor Klang 30/01/14 16:25
Great to see that it happened.


On Fri, Jan 31, 2014 at 12:17 AM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

In my opinion, SynchronizedX should be deprecated, it's the wrong solution to the wrong problem.

It is already deprecated.

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
Cheers,

———————
Viktor Klang
Chief Architect - Typesafe

Twitter: @viktorklang
Re: [scala-internals] Re: AnyRefMap Matthew Pocock 30/01/14 16:40


> You do have to replace all uses of x.stuff with x{_.stuff},

This is a transform I hit often enough that I wonder if there is potential for sugar to rewrite x.foo to x.apply (foo) when foo is not a member and foo is in scope and it does not clash with an implicit and there is an apply with a single argument with a type matching the type of foo. Perhaps this is a terrible idea.

Matthew

Re: [scala-internals] Re: AnyRefMap Seth Tisue 30/01/14 19:34
Martin writes:
>These are potential problems every Map
>implementation has. How many observed
>problems do we have in the JIRA database?

I believe this puts too much faith in the contents of JIRA as a reflection of what everyday Scala programmers are doing.

The people who take the trouble to isolate issues and turn them into new JIRA tickets are generally conscientious, already wise in the ways of Scala, and a little world-weary.

These personality traits are highly negatively correlated with a tendency to even attempt to inherit from a Scala collection class!

They are positively correlated with having figured out long ago that doing that is about as much fun as stuffing a hornet's nest in your pants.

Thus: no tickets.

But absence of evidence is not evidence of absence.

Seth

Re: [scala-internals] Re: AnyRefMap Haoyi Li 31/01/14 00:46
> No need for 3 different collection libraries 

I wouldn't put too much faith in the quality of the Java collections library. Apart from the Std Lib, there's org.apache.commons.collections and the Google Guava collections, both of which are pretty widespread and definitely not niche libraries. 

> Answer quickly: What is the first things that comes to mind as an argument for a collection constructor? I would wager it's definitely not the behavior in case of a key not found.

Coming from Python-land, which is where I spend my salaried hours every day, the idiomatic way of creating a dictionary with a default is defaultdict(default_func), which you pass in a thunk which is used to generate defaults. It's pretty straightforward; in Scala maybe you'd have Map.withDefault(defaultFunc) instead, and maybe defaultFunc could take the key as an argument. In fact this is exactly what Ruby dictionaries do.

This "inherit and override" strategy to pass in a simple function is basically a Java-ism, because without easy HoFs, that is the only way to "pass" a function to the constructor, and is in a similar vein to Java's inheritance-based collection-literals. Passing a function to the constructor/factory is standard fare in Ruby/Python and isn't unintuitive at all: I want a collection with a default, what more obvious thing do I need to provide than the way of calculating that default?
Re: [scala-internals] Re: AnyRefMap Raul Bache 31/01/14 00:51
http://docs.scala-lang.org/overviews/macros/quasiquotes.html, the examples wraps the methods in future blocks. 
Couldnt we have a SynchronizedX-macro that overrides and wraps all methods of the class in synchronization blocks?
Re: [scala-internals] Re: AnyRefMap Hanns Holger Rutz 31/01/14 00:59
On 30/01/2014 23:50, Adriaan Moors wrote:
> Making the leaf collection classes final will only make them more
> consistent,
> because we'll be fully in control of them, rather than having to
> anticipate all possible subclasses.

I agree with you. I also wonder... does it make sense at all to override
a leaf collection class if it's immutable? As soon as you modify
anything you would need also to make sure that you override `newBuilder`
and all that stuff.

I think the problem is rather that the mental effort to create your own
custom `Map` class is high. I once did that for classes `osc.Bundle` and
`osc.Message`
(https://github.com/Sciss/ScalaOSC/blob/master/src/main/scala/de/sciss/osc/Packet.scala#L567),
mixing in with `LinearSeq` and `LinearSeqLike`. Why both by the way? Not
very intuitive.

I had to adjust these classes between Scala 2.7 and 2.8 and 2.9, but I
think it stabilised now (I don't remember having to change anything for
Scala 2.10).

So my thought is, do make the leaf classes final, use `withDefault` if
the few microseconds of performance don't count, otherwise build your
own with `MapLike`. For that last case, probably it would be good to
have an "official" tutorial on how to achieve that.

Here is an example for a `mutable.MapLike`:
https://stackoverflow.com/questions/4224242/implement-a-scala-collection-so-that-map-filter-etc-produce-the-right-type#4225324
– I think that's the way to go if you really can't live with
`withDefault`...

2c, .h.h.
Re: [scala-internals] AnyRefMap Hanns Holger Rutz 31/01/14 01:05
+1

That was the whole point of things like `MapLike`—be able to put new
collections together by mixing in these traits.
Re: [scala-internals] AnyRefMap martin 31/01/14 02:48
Thanks for the feedback! There are too many mails to respond to them all, so let me try to summarize and explain my position.

1. Inheritance is a fundamental building block for Scala software  but it has to be used carefully (and I readily admit that the current collection could do with less, and would in fact be keen to see alternative implementations).

A good guideline is: "Every method should be abstract or final, or should be a documented extension point". In my mind that's a better guideline than "all leaf classes should be final". The latter has it backwards: Final classes are by necessity leaf-classes, but who is otherwise able to define what a leaf class is?

2. The default method is clearly documented as a "hook", i.e. a documented extension point. And in fact I believe there's a good reason for that. Mainly, performance is the concern here. If one overrides default, the chances are that this is done so that instead of `get` one can call `apply` directly. So `default` is likely called on the most common access path for the collection. We simply do not want another closure object and an indirection for something like that.

The alternatives from Python do not apply, because Python has a completely different performance profile, and was not OO from the start.

3. Some people have said that MapLike and friends could replace the concrete Map classes as classes that are inherited. I am afraid that will not work. The ...Like classes are there to add utility methods such as map, filter, take to a base collection. ...Like classes do not contain a collection implementation themselves, they play essentially the role of containers of extension methods. So saying clients should extend MapLike instead of HashMap amounts to saying that they should write their own HashMap implementation from scratch!

For all of these reasons I would say, let's try to work towards guideline (1), but do so in a way which is consistent over the whole collections. I do not see right now a better alternative for the `default` method. It does what it needs to do with minimum fuzz and performance overhead. So, I would suggest we make AnyRefMap non-final, but at the same time make as many of its methods final as we can. That will involve methods in its base classes, first, because a lot of the critical methods are probably inherited by AnyRefMap and second, because an isolated "improvement" for AnyRefMap would be a pessimization overall for breaking consistency.

What do people think about this?

 - Martin





On Fri, Jan 31, 2014 at 10:05 AM, Hanns Holger Rutz <con...@sciss.de> wrote:
+1

That was the whole point of things like `MapLike`--be able to put new

collections together by mixing in these traits.

On 31/01/2014 00:28, Rüdiger Klaehn wrote:
> I think using inheritance within the scala collections is fine. People
> contributing to scala know the pitfalls. But when using the scala
> collections from the outside people should not inherit from "leaf"
> collections. Inheriting from traits like Map/MapLike that are designed
> for inheritance is fine with some guidance in the docs.

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [scala-internals] AnyRefMap Peter Empen 31/01/14 02:56
One general policy of the collection library has been to refuse extension requests with no obvious virtue for all users. This implicates that in edge cases users are left with the single option to extend a specific collection to be able to add functionality that you cannot implement without access to protected members. Copying sources is not a real option, right?

Also, it's a bit degrading argument to say that users who need extensions are not capable of inheriting properly. I'd advokate for letting it be their own responsibility.
Re: [scala-internals] AnyRefMap Jason Zaugg 31/01/14 02:58
On Fri, Jan 31, 2014 at 11:48 AM, martin odersky <martin...@epfl.ch> wrote:
For all of these reasons I would say, let's try to work towards guideline (1), but do so in a way which is consistent over the whole collections. I do not see right now a better alternative for the `default` method. It does what it needs to do with minimum fuzz and performance overhead. So, I would suggest we make AnyRefMap non-final, but at the same time make as many of its methods final as we can. That will involve methods in its base classes, first, because a lot of the critical methods are probably inherited by AnyRefMap and second, because an isolated "improvement" for AnyRefMap would be a pessimization overall for breaking consistency.

I would prefer to keep the leaves final, even if that means:

/** Document general pattern of extension */
abstract class BaseAnyRefMap {
   // not safe to override
   final def foo

   // document relationship between these methods (you must override both!)
   def foo1 = ...
   def foo2 = 
}

final class AnyRefMap extends BaseAnyRefMap {
   // we can exploit finality here in every which way
   override def getOrElse(...) = // avoid allocating Some
}

Cons:
  1. its a bit harder to find which class to extend
  2. inertia
  3. designing for extension is still hard!
Pros
  1. See #1 from cons :). Extending collections is an advanced usage of collections, IMO, and I've seen too many beginners reach for it as their very first attempt ("how can I make my Family class extends List[Person]"?)
  2. we have more the freedom to optimize in the leaves.
  3. could be retrofitted to existing collections to offer people a migration path

-jason
Re: [scala-internals] AnyRefMap Rex Kerr 31/01/14 03:08
On Fri, Jan 31, 2014 at 2:48 AM, martin odersky <martin...@epfl.ch> wrote:
Thanks for the feedback! There are too many mails to respond to them all, so let me try to summarize and explain my position.

1. Inheritance is a fundamental building block for Scala software  but it has to be used carefully (and I readily admit that the current collection could do with less, and would in fact be keen to see alternative implementations).

A good guideline is: "Every method should be abstract or final, or should be a documented extension point". In my mind that's a better guideline than "all leaf classes should be final". The latter has it backwards: Final classes are by necessity leaf-classes, but who is otherwise able to define what a leaf class is?

I like that guideline.  I think the existing classes are very very very far outside of that ideal, however.  Very little is documented as an extension point.
 

2. The default method is clearly documented as a "hook", i.e. a documented extension point. And in fact I believe there's a good reason for that. Mainly, performance is the concern here. If one overrides default, the chances are that this is done so that instead of `get` one can call `apply` directly. So `default` is likely called on the most common access path for the collection. We simply do not want another closure object and an indirection for something like that.

The alternatives from Python do not apply, because Python has a completely different performance profile, and was not OO from the start.

I benchmarked it both ways, and there is a small difference.  But emphasis on _small_.  The reason is that one extra indirection doesn't cost very much in heavily-used code, and lookup is already a pretty expensive operation.  I don't recall what the proportion was, but I'm pretty sure it was under 5%.

But I agree that it is clearly documented as a hook, and that has value.
 

3. Some people have said that MapLike and friends could replace the concrete Map classes as classes that are inherited. I am afraid that will not work. The ...Like classes are there to add utility methods such as map, filter, take to a base collection. ...Like classes do not contain a collection implementation themselves, they play essentially the role of containers of extension methods. So saying clients should extend MapLike instead of HashMap amounts to saying that they should write their own HashMap implementation from scratch!

Agreed.  MapLike does not give you a map.
 

For all of these reasons I would say, let's try to work towards guideline (1), but do so in a way which is consistent over the whole collections. I do not see right now a better alternative for the `default` method. It does what it needs to do with minimum fuzz and performance overhead. So, I would suggest we make AnyRefMap non-final, but at the same time make as many of its methods final as we can. That will involve methods in its base classes, first, because a lot of the critical methods are probably inherited by AnyRefMap and second, because an isolated "improvement" for AnyRefMap would be a pessimization overall for breaking consistency.

I recommend leaving other classes alone, mostly because it's a ton of work to wade through everything in base classes and think about whether each method can be sensibly overridden.  Usually the answer will be yes, after some deep thought, but yes-if-you-keep-these-things-in-mind.  And so that knowledge will have to be imparted to scaladoc.  I am quite sure I cannot get this done in anything like a comprehensive way before RC1, and right now AnyRefMap is an anomaly.

So I suggest instead that I _don't_ do anything to base classes (actually, there aren't any, only traits), since AnyRefMap's core functionality is all within AnyRefMap.  Of course it gets a ton of stuff from MapLike, but as you just said, MapLike doesn't implement a hash map. AnyRefMap does, and it's those internals that are hardest to get right.

And then we can gradually work towards better documentation of how to extend methods.

(Or, actually, I don't think that is ever going to happen in practice just through manual labor.  Instead, we need the compiler to tell us dependencies so the information is available automatically, so that when one is compiling, the compiler will warn you when you override too little to have a chance of making sense.)

  --Rex

Re: [scala-internals] Re: AnyRefMap Jason Zaugg 31/01/14 03:22
On Thu, Jan 30, 2014 at 11:08 PM, Rex Kerr <ich...@gmail.com> wrote:
On Thu, Jan 30, 2014 at 2:03 PM, martin odersky <martin...@epfl.ch> wrote:



On Thu, Jan 30, 2014 at 6:25 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

 
Concerning inheriting from HashMap to override the default value ... what's wrong with using withDefault/withDefaultValue?

An extra indirection. When I use AnyRefMap I do it to get the best performance. withDefault does not give me that.

In the case of AnyRefMap, this is irrelevant because you can supply a default mapping function in the constructor.

BTW, *if* you end up keeping the default as a function passed to the constructor, we ought to consider making that constructor private to allay the objection that `new Collection(f)` ought to create a collection with `f` as an element. A named factory in the companion would be preferable. (But as usual, I'm not great at suggesting the name!)

Relatedly, we've only just recovered from this wrinkle:

  scala> new Vector(1, 2, 3)
  java.lang.NullPointerException

-jason

Re: [scala-internals] Re: AnyRefMap Haoyi Li 31/01/14 06:03
> 2. The default method is clearly documented as a "hook", i.e. a documented extension point. And in fact I believe there's a good reason for that. Mainly, performance is the concern here. If one overrides default, the chances are that this is done so that instead of `get` one can call `apply` directly. So `default` is likely called on the most common access path for the collection. We simply do not want another closure object and an indirection for something like that.
The alternatives from Python do not apply, because Python has a completely different performance profile, and was not OO from the start.

I don't think OO from the start means do-everything-with-inheritance. I mean, Ruby is "OO from the start" and does the exact same thing as Python. Do-everything-with-inheritance is Java-style OO to make up for their lack of lambdas, since inheriting is less clunky than passing in an anonymous-inner-class-instance. Maybe the performance gain from not passing in a lambda is worth sacrificing the ideal API, but Rex seems to indicate it's not a huge hit.

I agree with everything else said =)

On Friday, January 31, 2014 7:22:06 PM UTC+8, Jason Zaugg wrote:
On Thu, Jan 30, 2014 at 11:08 PM, Rex Kerr <ich...@gmail.com> wrote:
On Thu, Jan 30, 2014 at 2:03 PM, martin odersky <martin....@epfl.ch> wrote:



On Thu, Jan 30, 2014 at 6:25 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

 
Concerning inheriting from HashMap to override the default value ... what's wrong with using withDefault/withDefaultValue?

An extra indirection. When I use AnyRefMap I do it to get the best performance. withDefault does not give me that.

In the case of AnyRefMap, this is irrelevant because you can supply a default mapping function in the constructor.

BTW, *if* you end up keeping the default as a function passed to the constructor, we ought to consider making that constructor private to allay the objection that `new Collection(f)` ought to create a collection with `f` as an element. A named factory in the companion would be preferable. (But as usual, I'm not great at suggesting the name!)

Relatedly, we've only just recovered from this wrinkle:

  scala> new Vector(1, 2, 3)
  java.lang.NullPointerException

-jason

Re: [scala-internals] Re: AnyRefMap Simon Ochsenreither 31/01/14 06:19

we ought to consider making that constructor private to allay the objection that `new Collection(f)` ought to create a collection with `f` as an element. A named factory in the companion would be preferable. (But as usual, I'm not great at suggesting the name!)
Re: [scala-internals] AnyRefMap martin 31/01/14 06:35



On Fri, Jan 31, 2014 at 12:08 PM, Rex Kerr <ich...@gmail.com> wrote:
On Fri, Jan 31, 2014 at 2:48 AM, martin odersky <martin...@epfl.ch> wrote:
Thanks for the feedback! There are too many mails to respond to them all, so let me try to summarize and explain my position.

1. Inheritance is a fundamental building block for Scala software  but it has to be used carefully (and I readily admit that the current collection could do with less, and would in fact be keen to see alternative implementations).

A good guideline is: "Every method should be abstract or final, or should be a documented extension point". In my mind that's a better guideline than "all leaf classes should be final". The latter has it backwards: Final classes are by necessity leaf-classes, but who is otherwise able to define what a leaf class is?

I like that guideline.  I think the existing classes are very very very far outside of that ideal, however.  Very little is documented as an extension point.

Right. It will be some work to get there, and we have to go carefully for risk of breakage. But that should be to goal, anyway,  
 

2. The default method is clearly documented as a "hook", i.e. a documented extension point. And in fact I believe there's a good reason for that. Mainly, performance is the concern here. If one overrides default, the chances are that this is done so that instead of `get` one can call `apply` directly. So `default` is likely called on the most common access path for the collection. We simply do not want another closure object and an indirection for something like that.

The alternatives from Python do not apply, because Python has a completely different performance profile, and was not OO from the start.

I benchmarked it both ways, and there is a small difference.  But emphasis on _small_.  The reason is that one extra indirection doesn't cost very much in heavily-used code, and lookup is already a pretty expensive operation.  I don't recall what the proportion was, but I'm pretty sure it was under 5%.

But I agree that it is clearly documented as a hook, and that has value.
 

3. Some people have said that MapLike and friends could replace the concrete Map classes as classes that are inherited. I am afraid that will not work. The ...Like classes are there to add utility methods such as map, filter, take to a base collection. ...Like classes do not contain a collection implementation themselves, they play essentially the role of containers of extension methods. So saying clients should extend MapLike instead of HashMap amounts to saying that they should write their own HashMap implementation from scratch!

Agreed.  MapLike does not give you a map.
 

For all of these reasons I would say, let's try to work towards guideline (1), but do so in a way which is consistent over the whole collections. I do not see right now a better alternative for the `default` method. It does what it needs to do with minimum fuzz and performance overhead. So, I would suggest we make AnyRefMap non-final, but at the same time make as many of its methods final as we can. That will involve methods in its base classes, first, because a lot of the critical methods are probably inherited by AnyRefMap and second, because an isolated "improvement" for AnyRefMap would be a pessimization overall for breaking consistency.

I recommend leaving other classes alone, mostly because it's a ton of work to wade through everything in base classes and think about whether each method can be sensibly overridden.  Usually the answer will be yes, after some deep thought, but yes-if-you-keep-these-things-in-mind.  And so that knowledge will have to be imparted to scaladoc.  I am quite sure I cannot get this done in anything like a comprehensive way before RC1, and right now AnyRefMap is an anomaly.

So I suggest instead that I _don't_ do anything to base classes (actually, there aren't any, only traits), since AnyRefMap's core functionality is all within AnyRefMap.  Of course it gets a ton of stuff from MapLike, but as you just said, MapLike doesn't implement a hash map. AnyRefMap does, and it's those internals that are hardest to get right.

And then we can gradually work towards better documentation of how to extend methods.

(Or, actually, I don't think that is ever going to happen in practice just through manual labor.  Instead, we need the compiler to tell us dependencies so the information is available automatically, so that when one is compiling, the compiler will warn you when you override too little to have a chance of making sense.)

I agree with all of it. Makes sense to me.

Cheers

 - Martin 


  --Rex

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



Re: [scala-internals] Re: AnyRefMap rklaehn 31/01/14 07:06
On Fri, Jan 31, 2014 at 9:59 AM, Hanns Holger Rutz <con...@sciss.de> wrote:
On 30/01/2014 23:50, Adriaan Moors wrote:
> Making the leaf collection classes final will only make them more
> consistent,
> because we'll be fully in control of them, rather than having to
> anticipate all possible subclasses.

I agree with you. I also wonder... does it make sense at all to override
a leaf collection class if it's immutable? As soon as you modify
anything you would need also to make sure that you override `newBuilder`
and all that stuff.

At least for the immutable collections I am familiar with (HashSet, HashMap), I can guarantee that it is absolutely impossible to override them and get anything but chaos. HashSet and HashMap should have been sealed from the beginning. The same is almost definitely true for the other immutable collections.
Re: [scala-internals] Re: AnyRefMap Simon Ochsenreither 31/01/14 13:16

At least for the immutable collections I am familiar with (HashSet, HashMap), I can guarantee that it is absolutely impossible to override them and get anything but chaos. HashSet and HashMap should have been sealed from the beginning. The same is almost definitely true for the other immutable collections.

It's kind of mind-boggling that it's 2014 and we are discussing whether subclassing collection classes is the correct design approach.
If someone wants to have a non-final collection class, I'd recommend that this person steps forward and writes the documentation on how exactly this I supposed to be done for that class.

Everyone who touched the collection framework lately says it's impossible, so that guide would probably be very enlightening for all of us.
Re: [scala-internals] AnyRefMap Jason Zaugg 01/02/14 01:25
On Fri, Jan 31, 2014 at 12:43 AM, martin odersky <martin...@epfl.ch> wrote:

On Fri, Jan 31, 2014 at 12:37 AM, Jason Zaugg <jza...@gmail.com> wrote:
It would be better a wrapper like in java. Then it isn't susceptible to linearisation gotchas.  I looked at those a while back and found the impl to be a stark reminder of how much our mixins leave unprotected, for example toString, equals, serialisation. 

Yes, that's a good point.  

But in your example I would tend to write a wrapper manually that encapsulates the mutable unsynchronised map and offers a smaller API. It could contain a method to return an immutable snapshot if clients need rich query operations 

I am not sure we can demand from users to write these wrappers that were so far created for them in the library. How about we throw together a SynchronizedWrapper for Map, then we can safely deprecate the old class? 

I don't think one can throw this sort of thing together. It has to be engineered.

One point that occurred to me yesterday was that even synchronized wrappers are pretty broken as soon as the API contains higher-order methods. For example, synchronizing `getOrElseUpdate` means that we call back to user code under a lock, which is a classic recipe for deadlock.

I was curious as to how JDK8 handles this. Looks like they way you must explicitly convert to Stream before getting using any HOM means that they sidestepped this thorny issue. (j.u.Collection doesn't contain, even as a default method, `forEach`)

I know it's not that palatable, but people can copy/paste the current broken trait into their own code base (it is 64 loc) if the can't abide by the deprecation.

-jason

Re: [scala-internals] AnyRefMap rklaehn 01/02/14 02:05
I think the idea of using a standard mutable collection and then somehow wrapping/decorating it to be thread safe is fundamentally flawed regardless of whether you use a wrapper or a mixin. The only context where I feel comfortable using mutable collections that are not designed for concurrency in a situation that requires concurrency is within an actor.

When I need something that is guaranteed to be threadsafe and can't put it in an actor, I usually use an immutable collection wrapped in something like a synchronized AtomicReference. See below for the basic approach. That gives you instant snapshots and thread-safety. Of course you still get deadlocks by calling back into user code under a lock.

There are several clean ways to deal with concurrency in the typesafe stack (Actors, Agents, ...). Why do we have to have ways like SynchronizedXXX that will very likely lead to trouble, at all?

final class SynchronizedRef[T](initial: T) {
  @volatile
  private[this] var _current = initial

  def current = _current

  def update(f: T => T): Unit = {
    synchronized {
      _current = f(_current)
    }
  }
}


--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [scala-internals] AnyRefMap rklaehn 01/02/14 02:11
Here is a more complete implementation of the approach described above:

https://www.bionicspirit.com/blog/2013/05/07/towards-better-atomicreference-scala.html
Re: [scala-internals] Re: AnyRefMap Peter Empen 01/02/14 03:02
Please advice me on how to solve the following real-world requirement other than by inheritance:

About two years ago a client asked me to provide O(1) selection of random direct successors to a given node of a graph to enable him to implement random walks in an efficient manner. Until then successors were stored in collection.mutable.HashSet instances. I was able to use strait-forward inheritance to satisfy his request by adding a draw method: https://github.com/scala-graph/scala-graph/blob/master/core/src/main/scala/scalax/collection/mutable/ExtHashSet.scala.

To confirm, it is impossible to extend collection.imutable.HashSet mainly because the classes HashSet1, ..., HashTrieSet are placed in the companion object. What would be best practice to implement random selection in this case?

For another reason, I recently also tried to extend java.util.IdentityHashMap but I stranded due to package visibility and lots of private members. So I had to "convert" it to Scala (EqHashMap in the same folder).

I believe if you decide to seal collection classes we do need a means that allows users to reuse collection implementations in some way. For instance by abstract base classes as indicated by Jason and others.
Re: [scala-internals] Re: AnyRefMap Simon Ochsenreither 01/02/14 12:28

I believe if you decide to seal collection classes we do need a means that allows users to reuse collection implementations in some way. For instance by abstract base classes as indicated by Jason and others.

Absolutely! In my opinion, moving as much functionality to traits and abstract classes is the way to go, so that developers can easily reuse this functionality.
Sub-classing concrete collection classes is dangerous in my opinion, because a) it gives people false impressions about the work involved b) it doesn't turn their collection class into a subtype of a Scala collection class, which might break reasonable assumptions everywhere it is used.
Re: [scala-internals] Re: AnyRefMap Rex Kerr 01/02/14 12:28



On Sat, Feb 1, 2014 at 3:02 AM, Peter Empen <peter...@arcor.de> wrote:
Please advice me on how to solve the following real-world requirement other than by inheritance:

About two years ago a client asked me to provide O(1) selection of random direct successors to a given node of a graph to enable him to implement random walks in an efficient manner. Until then successors were stored in collection.mutable.HashSet instances. I was able to use strait-forward inheritance to satisfy his request by adding a draw method: https://github.com/scala-graph/scala-graph/blob/master/core/src/main/scala/scalax/collection/mutable/ExtHashSet.scala.


mutable.HashSet remains inheritable-from, so this wouldn't change.

(Also, your implementation is not statistically sound.  Elements near large empty gaps will be selected more often ((k+1) times more often for a gap of k).  In a large set, gap sizes will be approximately Poisson-distributed.)
 
To confirm, it is impossible to extend collection.imutable.HashSet mainly because the classes HashSet1, ..., HashTrieSet are placed in the companion object. What would be best practice to implement random selection in this case?

Under most scenarios, you'd be better off maintaining an array with the same items.  You get O(1) addition to the end, O(1) deletion (because order is unimportant, so you just take the last item off and stuff it in the slot of the removed item), and fast O(1) lookup.

I understand that it would be nice if possible to solve these problems using inheritance, but the key to doing so is not that people have marked the collections as not final, but that the design promotes inheritance.  Sometimes there's an inherent conflict between that goal and others (performance or correctness, for instance).  Inheritance should not always win.

  --Rex
 
Re: [scala-internals] Re: AnyRefMap rklaehn 01/02/14 13:36
On Sat, Feb 1, 2014 at 12:02 PM, Peter Empen <peter...@arcor.de> wrote:
Please advice me on how to solve the following real-world requirement other than by inheritance:

About two years ago a client asked me to provide O(1) selection of random direct successors to a given node of a graph to enable him to implement random walks in an efficient manner. Until then successors were stored in collection.mutable.HashSet instances. I was able to use strait-forward inheritance to satisfy his request by adding a draw method: https://github.com/scala-graph/scala-graph/blob/master/core/src/main/scala/scalax/collection/mutable/ExtHashSet.scala.

The draw method just needs access to the internals of the HashSet, but does not override any behavior. So since the internals are not private but private[collection], you could just write an enhancement like this:

package scala.collection.mutable

import scala.util.Random

object HashSetExt {

  implicit class HashSetDraw[A](val value:HashSet[A]) extends AnyVal {

    def draw(random:Random) : A = {
      val table = value.hashTableContents.table
      val len = table.length
      val drawn = random.nextInt(len) // IllegalArgumentException if len == 0
      def search(from: Int, incr: Int, hold: Int => Boolean): Option[A] = {
        var i = from
        while (hold(i)) {
          val elem = table(i)
          if (elem ne null) return Some(elem.asInstanceOf[A])
          else i += incr
        }
        None
      }
      // must always find an element either upwards or downwards
      search(drawn, 1, (i: Int) => i < len) getOrElse(
        search(drawn - 1, -1, (i: Int) => i > 0).get)
    }

  }
}

And the good thing is that this works with any s.c.m.HashSet, not just your personal version. So if you get a s.c.m.HashSet from somewhere else you can immediately use draw instead of having to copy the data into one of your HashSetExt. Here is how you would use it:

scala> :paste
// Entering paste mode (ctrl-D to finish)

import scala.collection.mutable._
import HashSetExt._
val r = new scala.util.Random
val hs = HashSet(0 until 10:_*)
hs.draw(r)

// Exiting paste mode, now interpreting.

import scala.collection.mutable._
import HashSetExt._
r: scala.util.Random = scala.util.Random@23e0b48
hs: scala.collection.mutable.HashSet[Int] = Set(0, 9, 1, 5, 2, 6, 3, 7, 4, 8)
res3: Int = 8

scala>

But please note that as Rex has pointed out you will not get out all elements with the same probability.

scala> val stats = Array.fill(1000000)(hs.draw(r))
stats: Array[Int] = Array(0, 5, 5, 7, 3, 1, 4, 9, 8, 6, 1, 8, 8, 6, 5, 5, 8, 8, 1, 9, 8, 5, 9, 8, 6, 9, 8, 8, 7, 7, 5, 7, 9, 8, 5, 7, 8, 9, 1, 1, 8, 7, 7, 9, 8, 4, 7, 5, 7, 9, 8, 4, 7, 9, 8, 1, 5, 8, 6, 9, 0, 2, 6, 6, 8, 0, 1, 1, 5, 5, 7, 8, 2, 1, 6, 7, 7, 5, 5, 5, 2, 6, 8, 4, 5, 8, 6, 5, 2, 8, 7, 8, 1, 1, 6, 1, 6, 8, 8, 5, 9, 8, 6, 8, 9, 8, 0, 1, 7, 1, 6, 8, 5, 1, 6, 7, 5, 8, 1, 8, 6, 7, 4, 6, 6, 8, 6, 1, 9, 1, 8, 7, 5, 9, 8, 7, 5, 8, 6, 7, 7, 5, 6, 0, 2, 5, 8, 1, 9, 5, 6, 0, 8, 7, 1, 2, 7, 8, 6, 6, 8, 6, 7, 8, 1, 7, 9, 5, 5, 8, 6, 6, 5, 9, 7, 8, 5, 5, 1, 7, 5, 6, 8, 7, 3, 5, 2, 7, 9, 1, 8, 7, 5, 8, 8, 7, 7, 6, 5, 9, 6, 7, 6, 0, 7, 7, 7, 5, 7, 8, 7, 1, 9, 5, 2, 8, 6, 8, 5, 5, 8, 3, 7, 8, 8, 6, 7, 7, 8, 7, 1, 8, 7, 6, 7, 4, 6, 6, 7, 5, 9, 6, 8, 7, 8, 1, 7, 1, 1, 5, 7, 1, 8, 8, 3, 9, 6, ...
scala> stats.groupBy(x => x).map { case (k,vs) => (k,vs.size) }.mkString("\n")
res18: String =
0 -> 31099
5 -> 156567
1 -> 93721
6 -> 156340
9 -> 93571
2 -> 31068
7 -> 155970
3 -> 31543
8 -> 218617
4 -> 31504
 
To confirm, it is impossible to extend collection.imutable.HashSet mainly because the classes HashSet1, ..., HashTrieSet are placed in the companion object. What would be best practice to implement random selection in this case?

No, it is the other way round: HashSet1, HashSetCollision1 and HashTrieSet are in the companion object because it is impossible to inherit from them.

There are basically three reasons for this:

1. Immutable objects need to be able to make copies of themselves. If you override an immutable object, you need to make sure that whenever you make a copy you make a copy of the right type. If you have an HashSetExt, the copy must also be of type HashSetExt and not just HashSet. So all places in HashSet where instances are being created with new HashTrieSet / new HashSet1 / new HashSetCollision1 would have to be replaced with calls to virtual factory methods (mkHashSet1, mkHashTrieSet, mkHashSetCollision1) that generate the correct type.

2. You would have to extend not just HashSet, but HashTrieSet, HashSet1 and HashSetCollision1. In all three cases you would have to override the virtual factory methods mentioned above to return the right types.

3. You would have to add a self type to make sure that the compile time type of operations is correct. Otherwise you would get something like this:

val x = HashSetExt.empty[Int] // draw can be called, even though it does not make much sense on an empty set
val y = x + 1 // type is HashSetExt, but compile time type HashSet, not HashSetExt. So the method .draw would not be available without a cast.

You would probably end up with the following class hierarchy:

// the base classes for inheritance
HashSetLike[Repr,T] (with the three virtual methods mentioned above)
HashSet1Like[Repr,T] extends HashSetLike[Repr,T]
HashSetCollision1Like[Repr,T] extends HashSetLike[Repr,T]
HashTrieSetLike[Repr,T] extends HashSetLike[Repr,T]

// the normal immutable HashSet
HashSet[T] // overrides the mkXXX methods
HashSet1[T] extends HashSet[T] with HashSetLike[HashSet[T],T]
HashSetCollision1[T] extends HashSet[T] with HashSetCollision1Like[HashSetCollision1[T],T]
HashTrieSet[T] extends HashSet[T] with HashTrieSetLike[HashTrieSet[T],T]
 
// any custom immutable HashSet
HashSetExt[T] // overrides the mkXXX methods
HashSetExt1[T] extends HashSetExt[T] with HashSetLike[HashSetExt1[T],T]
HashSetExtCollision1[T] extends HashSetExt[T] with HashSetCollision1Like[HashSetExtCollision1[T],T]
HashTrieSetExt[T] extends HashSetExt[T] with HashTrieSetLike[HashTrieSetExt[T],T]

As you can see, pure insanity.

For another reason, I recently also tried to extend java.util.IdentityHashMap but I stranded due to package visibility and lots of private members. So I had to "convert" it to Scala (EqHashMap in the same folder).

I believe if you decide to seal collection classes we do need a means that allows users to reuse collection implementations in some way. For instance by abstract base classes as indicated by Jason and others.

I agree that it would be nice to be able to reuse data structures from scala collections.

But I think the current situation regarding reuse of existing data structures is far from ideal even with non-final classes. For example, the Trie data structure underlying the immutable HashSet and HashMap could be a very efficient data structure for something like storing an ordered set of integers without any hashing. But the algorithm is so entangled with the various boilerplate to make it fit into the scala collection hierarchy as a Set or Map that there is no way to do it.

One place where it is done better is RedBlackTree. The class RedBlackTree just implements the data structure itself without any additional boilerplate. It is wrapped in two different ways as TreeSet and TreeMap. But if you just want a RedBlackTree for your own data structure, you can just take it and wrap it in some other way. The downside is of course that the wrapping creates another indirection. But in case of TreeMap/TreeSet this indirection is necessary anyway because RedBlackTree does things internally (using null for empty for efficiency) which you have to hide from the outside.

If only there was a way to do a wrapper without runtime overhead. Value classes do not work because they do not allow redefining equality and hashcode, and that is needed for various base traits of the scala collection hierarchy. You can make a value class that extends TraversableOnce, but anything more complex than that (Traversable, Iterable, Seq etc. ) requires a full blown object with runtime overhead.
Re: [scala-internals] Re: AnyRefMap rklaehn 01/02/14 14:11
Here is a version of an immutable set with a draw method that should have good performance as long as you don't mix building the set and draw() calls too much.

import scala.collection.generic.ImmutableSetFactory
import scala.collection.SetLike
import scala.util.Random

case class IndexableSet[T](s:Set[T]) extends Set[T] with SetLike[T, IndexableSet[T]] {
  def contains(elem: T): Boolean = s.contains(elem)

  def +(elem: T): IndexableSet[T] = copy(s + elem)

  def -(elem: T): IndexableSet[T] = copy(s - elem)

  def iterator: Iterator[T] = s.iterator

  override def empty = IndexableSet.empty

  private[this] lazy val elements = s.asInstanceOf[Set[AnyRef]].toArray

  def draw(r:Random) = elements(r.nextInt(elements.length)).asInstanceOf[T]
}

object IndexableSet extends ImmutableSetFactory[IndexableSet] {

  override def empty[A]: IndexableSet[A] = new IndexableSet(Set.empty[A])
}

On Sat, Feb 1, 2014 at 12:02 PM, Peter Empen <peter...@arcor.de> wrote:

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [scala-internals] Re: AnyRefMap Peter Empen 02/02/14 01:15
Rüdiger, I like your suggestion for HashSetDraw.

Admittedly, I wasn't aware of private[collection] hashTableContents. Partly because Scaladoc doesn't list members with widened private visibility, partly because if I see protected var table, I'm not yet used to look for other members exposing the very same. Maybe protected[collection] var table would do it as well with a direct clue? (I understand hashTableContents is conceptually clean but it allocates additional memory.)

Yes, I did realize that the probability distribution is not even but I put up with it in the absence of a better alternative.
Re: [scala-internals] Re: AnyRefMap rklaehn 02/02/14 01:59
I think reaching into the guts of a collection is not a good idea whether you use inheritance or enhancement. You rely on the implementation details of the original collection too much. I just wanted to show that you do not necessarily need inheritance.

What's wrong with the wrapping approach? It works with any wrapped set, draw has an equal probability for each element, and it is even faster. It would not be that hard to do a similar thing for a mutable set if you need that last bit of performance that gives you.

But I think this starts to become off-topic for scala-internals, so if you want to discuss this further PM me.

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-i...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

More topics »