--
You received this message because you are subscribed to the Google Groups "scala-debate" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-debate...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
> But, what about ResizableArray. It looks like an array, so why deny it
the same operations?
Good question. Do people use it? I count 220 occurrences on GitHub in Scala files, vs 370,895 occurrences for "Seq" and 48,532 occurrences for "ArrayBuffer", so I'd argue the added weirdness in that case won't affect too many people.

> There's the other valid use case where you build up a mutable collection and once it is fully constructed want to use it as if it was immutable. Copying to make it immutable is not permitted for performance reasons. How would we support that?
Isn't that what the XXXBuilder classes like VectorBuilder are for? If you want a builder you use a builder collection, not a mutable collection. Unless we're getting rid of builders?
> To play the devil's advocate with what Martin said: we don't need to copy a mutable collection to make it immutable, we could just freeze it, i.e. disallow further updates.

Re-sending what I sent to Martin to the list, I must've forgotten to reply-all.

Would an explicit wrapper work in that case? e.g.

def freeze[T](myArray: Array[T]): IndexedSeq[T]

This "want to turn a mutable collection into an immutable one with zero copy" use case happens, but I'd guess it happens a lot less than the use case of "want to use a mutable collection as a mutable collection and not accidentally call expensive immutable methods". So we can optimize the common case to make it impossible to accidentally call immutable methods, while providing a short one-function helper to make the other use case convenient too.
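To make the suggestion concrete, here is a minimal sketch of what such a zero-copy wrapper could look like (the name freeze and the whole helper are hypothetical, not part of any existing API):

```scala
// Hypothetical zero-copy "freeze": wrap the array in a read-only
// IndexedSeq facade. O(1), no copying -- but only safe if the caller
// discards its own reference to the underlying array afterwards,
// since writes to the array would still show through the wrapper.
def freeze[T](myArray: Array[T]): IndexedSeq[T] = new IndexedSeq[T] {
  def length: Int = myArray.length
  def apply(i: Int): T = myArray(i)
}

val frozen = freeze(Array(1, 2, 3))
```

The caveat in the comment is the price of zero-copy: this is a view, not a snapshot, so "freezing" is only as good as the caller's discipline about the original array.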
> Another possibility would be to go the Rust way, and use ownership types to ensure that iteration and mutation don't happen at the same time.

I'd love it if this were the case, but I doubt it will happen in the next 2 years =P Could you clarify?
> I'm still not sure what is the root cause why we would want to remove operations on mutable collections (and from the sentence "Buffers append/remove is what you generally care about, but they're lost in the huge sea of immutable methods you basically never want", it's not clear to me which ones you want to see removed).

Basically, everything that returns a new collection should not be part of the mutable API by default.
The whole point of using mutable collections is to mutate them in place and not re-allocate all the time. If I wanted to re-allocate all over the place I'd just use an immutable collection. The number of times I've actually wanted to call map, filter, etc. on a mutable collection in the last 3 years has been zero.
As an exercise, compare mutable.ArrayBuffer with java.util.ArrayList. Get a stopwatch and answer the following questions:

- How do I remove every element satisfying some predicate from a mutable.ArrayBuffer, in place?
- How do I remove every element satisfying some predicate from a java.util.ArrayList, in place?
- How do I sort the mutable.ArrayBuffer by some predicate, in place?
- How do I sort the java.util.ArrayList by some predicate, in place?

Time how long it takes you to answer each question (how to do it, or whether it cannot be done) by browsing the list of methods. For me it takes >10x as long to answer each one for mutable.ArrayBuffer, due to the huge pile of "useless" copying methods.

SPOILER BELOW

It turns out you can't perform either operation on a mutable.ArrayBuffer, which is pretty surprising given these are very common operations to want to do on any mutable collection.
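For concreteness, here is roughly what the spoiler looks like in practice (the values are illustrative): java.util.ArrayList gets removeIf and sort as single in-place calls since Java 8, while on ArrayBuffer you end up hand-rolling a loop:

```scala
import scala.collection.mutable.ArrayBuffer

// java.util.ArrayList (Java 8+): list.removeIf(p) and list.sort(cmp),
// one call each, both in place.
// mutable.ArrayBuffer: no built-in equivalent, so you hand-roll a loop.
// Note that remove(i) shifts the tail on every call, making this loop
// O(n^2); a proper built-in in-place filter could be O(n).
val buf = ArrayBuffer(3, 1, 4, 1, 5, 9, 2, 6)
var i = 0
while (i < buf.length) {
  if (buf(i) % 2 == 0) buf.remove(i) // drop the even elements
  else i += 1
}
// buf now holds only the odd elements, in their original order
```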
> But, I'm sure that some people have valid use cases where they find these methods useful,

Let's skip the hypotheticals and ask: who are these people and what are their use cases? In theory everything is useful to someone, somewhere, maybe. In the end there's always an empirical judgement of how many people you think will want this functionality. "Non-zero, maybe" isn't satisfactory =P

Could we kick off community builds with instrumentation via Abide to see what the actual usage numbers on the various APIs are? That would help give us something concrete to talk about, and perhaps suggest alternatives for the use cases we do find.
> as a side note, I find it funny how we go at great lengths to prevent our users from shooting themselves in the foot

While Scala's immutable collections are nice, Scala's mutable collections API is one of the most easily-foot-shooting APIs I know of: 100s of methods you don't want, barely any methods you want, and missing several methods you *do* want. Python, Java, and many other languages provide more ergonomic mutable collections. Let's not be too proud of ourselves =D
> Should a conversion from a mutable collection to a sequence also be enabled through an explicitly imported implicit conversion?

What's wrong with just an explicit conversion?

def freeze[T](input: mutable.Seq[T]): immutable.Seq[T]

Not everything needs to be an implicit.
--
> Why do you say they are more ergonomic? In which ways?

Here's what you get when you want the methods on a Python list: a small number of operations, basically all of which are useful to everyone who wants a mutable list. And all can be understood by anyone who has ever programmed before, without reading any docs!
Here's what you get when you ask for the methods on a Scala mutable buffer: how many of these are methods that someone can call on a mutable buffer without horrendous performance implications? Less than half. In fact, methods which look very similar (++ and ++=) have totally different semantics and performance implications.
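A minimal illustration of the ++ vs ++= difference being described:

```scala
import scala.collection.mutable.ArrayBuffer

val b = ArrayBuffer(1, 2)

val c = b ++ Seq(3) // allocates and returns a NEW buffer; b is untouched
b ++= Seq(3)        // appends in place and returns b itself

// b and c now have the same contents but are different objects:
// one extra character, completely different cost model.
```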
As an exercise, can you tell me which methods on that list will result in O(n) copying of the whole collection, which is basically what you never want?
I'm not sure if it makes sense trying to salvage the request for strawmen, but I think
Otherwise it feels we are just swapping some well-known disadvantages against other well-known disadvantages.
--
Martin Odersky
EPFL
On Tuesday, 6 October 2015 00:34:14 UTC+2, Haoyi Li wrote:

> Why do you say they are more ergonomic? In which ways?
>
> Here's what you get when you want the methods on a python list: a small number of operations, basically all of which are useful to everyone who wants a mutable list. And all can be understood by anyone who has ever programmed before, without reading any docs!
That's a bit unfair comparison, because it does not include Python's built-in functions, many of which are list-related:
https://docs.python.org/2/library/functions.html
That adds more functions to the table.
Plus, there's Python's itertools, which has more list-related methods:
https://docs.python.org/2/library/itertools.html
> Here's what you get when you ask for the methods on a scala mutable buffer: how many of these are methods that someone can call on a mutable buffer without horrendous performance implications? Less than half. In fact, methods which look very similar (++ and ++=) have totally different semantics and performance implications.
I have recently opened an issue inviting strawman designs for new collection architectures:

https://github.com/lampepfl/dotty/issues/818

The issue lists some requirements for a design to be acceptable. Since then there has been a discussion about several points that are all outside the requirements outlined in #818. I think it's worthwhile to discuss these, but do not think #818 is the proper place to do it. That's why I would like to continue the discussion here.

- Martin
-- Francois ARMAND http://rudder-project.org http://www.normation.com
Method names like :+, :\, ++:, or &~ are hard to read and even harder to google. Such method names encourage developers to create their own scary symbolic methods, which in turn hurts the reputation of Scala and leads to bad code.

- We cannot get enough improvement while staying within requirement (1).
Or does it just mean that existing code looking like that should keep compiling/doing the same with minimal interaction?
If it's the second (i.e. we need more time), then there's still
nothing to stop us from starting the discussion now. In fact, if this
should arrive in 2.13, we have about 2 years to do it, or if we miss
that release, almost 4 years. That looks like plenty of time to me,
*if we start the experiments now*. That's what I was trying to achieve
with the strawman initiative.
Yes to both.
That's a very pessimistic sentiment! Let's see the strawman proposals
first. It could be that the conclusion in the end is, no, it's not
worth it. But I think we should try and evaluate the evidence before
reaching that conclusion.
"ambitious" proposals are ruled out by definition
But, what about ResizableArray. It looks like an array, so why deny it
the same operations?
There's the other valid use case where you build up a mutable
collection and once it is fully constructed want to use it as if it
was immutable. Copying to make it immutable is not permitted for
performance reasons. How would we support that?
- Martin
On Mon, Oct 5, 2015 at 10:30 PM, Haoyi Li <haoy...@gmail.com> wrote:
>> So if we do not want to go there, where to draw the line?
>
> I say we draw the line between Arrays and everything else. They're already
> weird special things (reified generics, rubbish toString, rubbish equality,
> ...), so IMHO it's perfectly fine to have them be weird-special mutable
> objects with immutable operations. They're not even really collections in
> the first place!
>
> After that, immutable.* operations on mutable.T forall T in {Seq, Set, Map,
> Buffer, Queue, PriorityQueue, ...} should be all killed.
>
> On Mon, Oct 5, 2015 at 1:23 PM, martin odersky <martin....@epfl.ch>
> wrote:
>>
>> Here is a first response to
>>
>> >> We gloss over the mutable/immutable distinction
>>
>> > We probably want mutable collections to not have the same API as the
>> > immutable ones. One large problem with the current mutable collections is
>> > that by inheriting all the immutable methods, it's
>> > nigh impossible to figure out where the "useful" mutable methods are.
>> > e.g. Buffer's append/remove is what you generally care about, but
>> > they're lost in the huge sea of immutable methods you basically never
>> > want. The same applies to mutable.Set, mutable.Queue, etc.
>>
>> > By that logic, ArrayBuffer probably shouldn't be a collection, and
>> > perhaps making it convertible to one through .view or .iterator is enough?
>>
>> If we take this idea to its logical conclusion, we should abandon
>> `map` and friends on arrays because they are also mutable. This would
>> render obsolete virtually every Scala book in existence. Besides, it
>> would be counter-productive. Map and friends are perfectly good
>> methods on arrays.
>> So if we do not want to go there, where to draw the line?
>>
>> - Martin
>>
>>
>>
>>
>> On Mon, Oct 5, 2015 at 10:18 PM, martin odersky <martin....@epfl.ch>
>> wrote:
>> > I have recently opened an issue inviting strawman designs for new
>> > collection architectures:
>> >
>> > https://github.com/lampepfl/dotty/issues/818
>> >
>> > The issue lists some requirements for a design to be acceptable. Since
>> > then there has been a discussion about several points that are all
>> > outside the requirements outlined in #818. I think it's worthwhile to
>> > discuss these, but do not think #818 is the proper place to do it.
>> > That's why I would like to continue the discussion here.
>> >
>> > - Martin
>> >
>> >
>> >
>> > --
>> > Martin Odersky
>> > EPFL
>>
>>
>>
>> --
>> Martin Odersky
>> EPFL
(as a side note, I find it funny how we go at great lengths to prevent our users from shooting themselves in the foot)
I wanted to bring up specialization as well - I think that it will be relevant even with the AST fusion.
People commonly use a primitive type parameter and silently incur the cost of boxing:
def addNums(b: ArrayBuffer[Int]): Unit = {
  b += 1
  b += 2
  b += 3
}
Since we applied specialization to Function0,1,2,..., I see no reason why collections such as array buffers would not specialize for a specific set of primitive types.
The common problem here is instantiating arrays, which typical collections (buffers, hash maps, hash tries, queues, heaps) do a lot and which specialization doesn't support in the generic context. The way some rogue collection frameworks solved this in the past is by introducing a custom ClassTag-gy type-class that knows how to instantiate arrays.
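To make the ClassTag approach concrete, here is a minimal sketch (PrimBuffer is a made-up name; real frameworks differ in the details) of a growable buffer that instantiates unboxed arrays generically:

```scala
import scala.reflect.ClassTag

// Sketch: a minimal growable buffer whose backing store is allocated
// through a ClassTag, so that Array[Int] stays an unboxed int[] on
// the JVM instead of an Array[AnyRef] full of boxed Integers.
class PrimBuffer[T: ClassTag](initialCapacity: Int = 8) {
  private var elems = new Array[T](initialCapacity) // needs the ClassTag
  private var count = 0

  def +=(x: T): this.type = {
    if (count == elems.length) {           // grow by doubling
      val bigger = new Array[T](elems.length * 2)
      Array.copy(elems, 0, bigger, 0, count)
      elems = bigger
    }
    elems(count) = x
    count += 1
    this
  }

  def apply(i: Int): T = elems(i)
  def length: Int = count
}

val b = new PrimBuffer[Int](2)
b += 1; b += 2; b += 3 // triggers one resize along the way
```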
I have to agree with Rex and Viktor. I think it would be best to
continue cleaning up the existing collections and do a major redesign
as a clean break in a different namespace. Otherwise we will have to
repeat almost all of the design decisions for backwards compatibility,
which kind of defeats the purpose of a redesign.
For the cleanup of the existing collections:
1. Can we please somehow deal with the fact that filterKeys and
mapValues return views? Just adding mapValuesNow to make it clear
that mapValues is lazy might be the best we can do given backwards
compatibility. But something needs to be done.
https://issues.scala-lang.org/browse/SI-4776
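A minimal reproduction of the gotcha (my own example, not taken from the ticket):

```scala
// mapValues returns a lazy view: the mapping function is re-run on
// EVERY access, which surprises people when it is expensive or
// side-effecting.
var evaluations = 0
val doubled = Map("a" -> 1).mapValues { v => evaluations += 1; v * 2 }

doubled("a")
doubled("a")
// evaluations is now 2, not 1
```

The usual workaround is to force the view (e.g. doubled.map(identity)), which is exactly the kind of non-obvious incantation a mapValuesNow method would make explicit.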
2. What about just ripping out parallel collections and all the
complexity they produce? I have not seen many usages of them in the
wild. Maybe it would be possible to add .par as an enhancement method,
or use java streams. But the complexity added by having parallel
collections intermingled with normal collections is staggering.
On Tue, Oct 6, 2015 at 9:48 AM, Haoyi Li <haoy...@gmail.com> wrote:

> For example, I want:
> - in-place filter,
> - in-place sort,
> - in-place distinct,
> - in-place drop,
> - in-place dropWhile,
> - in-place take

Yes, these would be good to have. But maybe under different names? Otherwise we get into the nonsensical situation that filter copies on arrays (which it has to, there's no way around it!) yet destroys on ResizableArrays and ArrayBuffers. But we could use a systematic naming scheme for these. E.g.

filterReplace, sortReplace, ...

or

filter_!, sort_!, ...

- Martin
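A sketch of what one of these could look like (the name filterReplace and its placement as a free function are hypothetical): an O(n) in-place filter for ArrayBuffer using a read/write cursor, rather than repeated remove(i) calls which each cost O(n):

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical filterReplace / filter_!: keep only elements satisfying
// p, in place, in O(n), by compacting with a read and a write cursor.
def filterReplace[T](buf: ArrayBuffer[T])(p: T => Boolean): buf.type = {
  var read = 0
  var write = 0
  while (read < buf.length) {
    if (p(buf(read))) {
      buf(write) = buf(read)
      write += 1
    }
    read += 1
  }
  buf.remove(write, buf.length - write) // truncate the leftover tail
  buf
}

val xs = ArrayBuffer(1, 2, 3, 4, 5, 6)
filterReplace(xs)(_ % 2 != 0)
// xs now contains only the odd elements
```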
I actually like that most operations on scala collections work the same between mutable and immutable collections.
And I don't care if it breaks and if I have to sed 10,000s of lines of code, if the goal is clear and substantially better.
> 2. What about just ripping out parallel collections and all the
> complexity they produce? I have not seen many usages of them in the
> wild. Maybe it would be possible to add .par as an enhancement method,
> or use java streams. But the complexity added by having parallel
> collections intermingled with normal collections is staggering.
Rip out parallel collections? Are you serious? That would squander one of the main advantages of functional programming in Scala!

I just started using parallel collections, and they couldn't be simpler to use. How hard is it to add ".par"? Oh, yes, I will concede that first eliminating all the vars can be a non-trivial task, but it is usually not that bad once you get the hang of it. I am also finding that the speedup I get with parallel collections is not as much as I had hoped for, but that should change when I get a CPU with more cores.
That reminded me of the work by Paul Phillips on something similar: https://github.com/paulp/psp-std/blob/master/doc/views.md seems to describe desirable goals for a redesigned Scala collections library.
Cheers,
-- Francois ARMAND http://rudder-project.org http://www.normation.com
Simon,
I get it that you are a skeptic, but your tone could be a lot less
dismissive. It's not fun for me to discuss on this level, so I will
stop doing it.
I encourage you nevertheless to come up with a concrete proposal
whether it conforms to the requirements or not. Proposals are good,
overly negative attitudes aren't.
- Martin
>> That's a very pessimistic sentiment! Let's see the strawman proposals
>> first. It could be that the conclusion in the end is, no, it's not
>> worth it. But I think we should try and evaluate the evidence before
>> reaching that conclusion.
>
>
> The way the request for proposals is worded predetermines the result: It
> will be a race to the bottom looking for the least useless change possible,
> while more ambitious approaches are excluded from the get go.
Sorry, this is ridiculous.
Can we try to make scala a bit more democratic please?
> But we could use a systematic naming scheme for these.
Sounds fine; I'm sure we'll be able to come up with a reasonable name if we decide this is something we want to do =)

I guess the last thing I have to contribute to this theme (mutable vs immutable collections) is the following: mutable collections are used very differently from immutable ones.

Most languages make the mistake of freezing mutable collections and calling them "immutable" while still leaving all the operations the same. From a Scala background we think that's ridiculous, because we know how nice immutable collections with a proper immutable API are to use, vs. something like Guava's ImmutableList. They end up with immutable collections which are the worst of both worlds: expensive to update (no structural sharing) and clunky to use, with random methods that blow up with exceptions (which nobody wants), but people still use them because immutability is sometimes that valuable.

On the other hand, Scala has made the exact same mistake in reverse: we have taken our immutable API, chucked it on a set of mutable collections, and called it a day. Never mind that the performance characteristics are all wrong, the use cases are different, and the styles of use are different. As a result, we have mutable collections that are the worst of both worlds: expensive to update (because all the operators copy!) or clunky to use (because you're back to while-loops), with random methods that blow up in execution time (which nobody wants), but people sometimes use them because perf is sometimes that valuable.

In reality, using mutable collections is a style in and of itself, not a subset or superset of the style of using immutable collections. For example:

- With mutable collections, get-and-remove is one of the most common things to do. In Javascript it's called .shift, in Python .pop. In Scala it's missing entirely.
- With mutable collections, you do things in place. You sort in place, you filter in place, you do everything in place.
- Even if you didn't care about perf and wanted to use the existing filter/sort/etc. operations on a val b: mutable.Buffer, you couldn't! You'd need a var b: mutable.Buffer, which is twice as much mutability as anyone really needs and will probably get you yelled at during code review.
- Since we already have immutable collections satisfying the convenient-if-maybe-expensive use case, the mutable collections only really satisfy the remaining must-be-relatively-performant use case.
- Thus, while for "convenient-if-maybe-expensive" you usually don't care too much about the implementation or how long it takes, with mutable collections you care exactly about the implementation of every operation and its performance characteristics.
- Following from that, maybe it's OK to have hundreds (!!!) of operations on dozens of immutable collections, since even if we don't know how they work we don't care. With mutable collections, though, you want the opposite: a small set of operations on a small set of collections, so that you can fully understand how every operation works, and how much every operation costs, on the mutable collection that's in front of you.

With mutable collections, you don't want a sprawl of methods for every convenience, abstracted away for maximum re-use. Maybe that's the goal for the immutable collections, and that's fine, but the mutable collections are their own world. Scala's current mutable collections are about as elegant as Guava's immutable collections are in Java: clunky, expensive, and missing tons of useful functionality while still exposing methods which definitely do not do what you want.
On Tue, Oct 6, 2015 at 12:54 AM, martin odersky <martin....@epfl.ch> wrote:
On Tue, Oct 6, 2015 at 9:48 AM, Haoyi Li <haoy...@gmail.com> wrote:
> That's a bit unfair comparison, because it does not include Python's built-in functions, many of which are list-related

I count:
- 12 itertools methods,
- 12 list-related methods in builtins (excluding constructors),
- 9 methods on lists,
- 3 operators (+, *, [])

I may have missed one or two, but that's roughly it: 36 methods. I count 203 methods in http://www.scala-lang.org/api/current/index.html#scala.collection.mutable.ArrayBuffer

> I'd like to challenge that. Buffer has 130+ methods defined on it. I don't think there are 65 methods among them that have horrendous performance implications.

I'm defining "horrendous" here to be "O(n) when you can do something similar in O(less than n) with mutable collections", because that's about the upper bound on how bad performance can be on a collection. Here's the rough breakdown of how I'd classify the first ~120 methods I found in the Scaladoc before my fingers got tired. Roughly, I said something is "Bad" if it takes an O(n) copy to do something that in a mutable collection you really don't want to copy (that's the whole point of using a mutable collection!), "Inherited" is all the PartialFunction stuff, and "???" is stuff that IMHO really has no business being there for non-perf reasons. Bad and ??? add up to 40/120, so 1/3 instead of 1/2. Perhaps someone with stronger fingers can do a more complete analysis. That's 1/3 stuff that IMHO shouldn't be there.

Going through has revealed a few things to me:

- The querying methods on sequences apply just as well to mutable and immutable sequences. For things which return a T or Option[T], there is nothing to be gained writing them yourself. We can and should share all these with the immutable collections.
- The transformation methods on mutable.Seqs really should be in-place transforms, in contrast to those on immutable.Seqs, which should return new ones. This mirrors how ++= and += are in-place appends on mutable vs new Seqs on immutable. (Despite what it might seem, this fact is totally non-transparent to a user and has bitten me more than once.)
For example, I want:
- in-place filter,
- in-place sort,
- in-place distinct,
- in-place drop,
- in-place dropWhile,
- in-place take

methods. Doing things in place because of performance is why we're using mutable buffers in the first place, after all! It's annoying to have to re-implement all these in-place ops ad hoc with while-loops all the time. map, collect and flatMap obviously can't be done in place, but when working with mutable buffers you tend to just use foreach and do everything in the side effect for performance.

> - Should ArrayBuffer support map, flatMap, filter? I believe the answer is yes, because otherwise we render for comprehensions over ArrayBuffers impossible.

I don't see why for-comprehensions over ArrayBuffers are a goal in and of themselves. When I use ArrayBuffers, it's because I *don't* want excessive copying and allocations. If I *do* want copying and allocations, I'd be willing to call freeze() or something before my for-comprehension and make it explicit.

> - If it supports filter, then why not also support partition and scan? They are just variants of filter, after all.

I'd want an in-place filter, and maybe a semi-in-place partition. In fact I've wanted it multiple times, and now I've found out Java 8 has it! Scan is hard to say, but if I'm working with mutable.Buffer I'd probably avoid using it entirely and just rely on foreach + side effects for perf. Or, if I don't care about perf, I'd be willing to call .toList and copy the whole thing to get access to these methods.

> - What about ++? Why forbid concatenating two resizable arrays into one?
> - What about zip? It's clearly useful and it clearly constructs a new collection.

I've never called map, flatMap, filter, partition, scan, ++ or zip on a mutable.Buffer in three years, including a lot of time spent writing fairly-high-performance stuff in Metascala/Scalatags/uPickle/etc. using mutable.Buffer all over the place. I call those methods a *lot* otherwise, when using immutable collections. But the only reason I end up using mutable.Buffers is for performance, so I end up writing manual while-loops because these methods are too expensive. I'd love it if there were in-place versions I could use, but there aren't, so it's back to while loops.

Anyway, this is all just based on my experience as one amateur person writing Scala part time for fun =D So perhaps it's not worth taking too seriously. Other people may have different experiences. It would be nice to have actual method-usage stats from the community build and from real-world code-bases, but I do not have sufficient $s/PhDs to pull that off. Without numbers, we could argue in circles forever about how "clearly useful" things are, and even if people get convinced and we have consensus, we're all still blind.

On Mon, Oct 5, 2015 at 11:45 PM, martin odersky <martin....@epfl.ch> wrote:

On Tue, Oct 6, 2015 at 1:29 AM, Aleksandar Prokopec <aleksanda...@gmail.com> wrote:
On Tuesday, 6 October 2015 00:34:14 UTC+2, Haoyi Li wrote:

>> Why do you say they are more ergonomic? In which ways?
>>
>> Here's what you get when you want the methods on a python list: a small number of operations, basically all of which are useful to everyone who wants a mutable list. And all can be understood by anyone who has ever programmed before, without reading any docs!
That's a bit unfair comparison, because it does not include Python's built-in functions, many of which are list-related:
https://docs.python.org/2/library/functions.html
That adds more functions to the table.
Plus, there's Python's itertools, which has more list-related methods:
https://docs.python.org/2/library/itertools.html
>> Here's what you get when you ask for the methods on a scala mutable buffer: how many of these are methods that someone can call on a mutable buffer without horrendous performance implications? Less than half. In fact, methods which look very similar (++ and ++=) have totally different semantics and performance implications.

> I'd like to challenge that. Buffer has 130+ methods defined on it. I don't think there are 65 methods among them that have horrendous performance implications.

Let's look at it in more detail, and things get fuzzy:

- we have already established that arrays should get the full complement of immutable methods.
- ResizableArray is like array, so why draw the boundary there?
- ResizableArray is not used a lot because most people use ArrayBuffer to get a resizable array. So maybe ArrayBuffer needs to support these immutable operations as well?

In more detail:

- Should ArrayBuffer support map, flatMap, filter? I believe the answer is yes, because otherwise we render for comprehensions over ArrayBuffers impossible.
- If it supports filter, then why not also support partition and scan? They are just variants of filter, after all.
- What about ++? Why forbid concatenating two resizable arrays into one?
- What about zip? It's clearly useful and it clearly constructs a new collection.

And so on. So what operations do we want to throw out? Probably the most "dubious" ones are +, +:, :+, -, which copy the collection while adding or removing a single element. There are few situations where these make sense, and maybe we should indeed phase them out for mutable collections. I think, arguably, doing a + instead of += on a mutable collection counts as "exotic use" and could be deprecated. But note that these operations are even more often misused for immutable collections: xs :+ x on lists is one of the most common rookie mistakes in Scala.

Cheers

- Martin
A couple of remarks here:
1. There are many use-cases where one deals with small collections, comprised of several elements, where performance implications are less important.
2. There are use-cases where copying is less expensive than the actual useful work performed on the collection.
3. Non-strict map, flatMap, filter, ... could reduce or eliminate performance overhead due to copying.
4. The optimizer could fuse many of these operations, or eliminate copying, in particular where a copying bulk operation is followed by a fold.
> As an exercise, can you give me which methods on that list will result in O(n) copying of the whole collection, which is basically what you never want?
I'm not excluding that you might be right about "basically what you never want", but as you pointed out yourself - we would need numbers to make definite claims about this.
Here are a few that you don't want to ever call if you want mutable-seq performance:

++, ++:, +:, -, --, collect, diff, distinct, drop, dropRight, dropWhile, filter, filterNot, flatMap, groupBy, grouped

Here are a few that you *do* want:

++=, +=, collectFirst, append, appendAll, fold{Left, Right}, forall, foreach, count, clear()

Here are a few which are just wtf:

<<(cmd: Message[Int]), +(other: String), addString(b: StringBuilder)

On Mon, Oct 5, 2015 at 3:16 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote:

Given the scope, I would strongly suggest abandoning the collection redesign. It doesn't address the pain points we should worry about, and the restrictions mean it is at best yet another temporary measure.
This feels like a great recipe for a Python 2/Python3 disaster: Too many small, subtle changes which will bite people with not enough associated benefit to encourage strong adoption.
If we really want to do something, I would probably prefer working on Josh's views, despite not being enamored about it: At least it's made clear that they are provisional band-aids.
But the scope just means that we will have the same discussion a few years down the road, and I prefer breaking people's code once, not twice.
--
Martin Odersky
EPFL
Twitter discussions: Miles and others have stated explicitly that they
do not want a standard library. So of course they will not get
involved in improving it. I think that's a pity, it would be great to
see wider discussion about the ideas and design trade-offs.
To get the ball started I have added a link to a first strawman
proposal to the #818 issue:
https://github.com/lampepfl/dotty/issues/818
def intersectionIsEmpty[A](it1: Iterator[A], it2: Iterator[A]) =
!(it1 exists (it2 contains _))
> Parser combinators is
the same thing. It would be great to have a separate module which
improves on the current state.
If you really are interested, such a thing exists http://lihaoyi.github.io/fastparse/
It's almost drop-in, except a lot better to use. Not to mention a 4-10x slowdown vs hand-rolled parsers, instead of scala-parser-combinators' 400x.
Is there a compiled list of all the problems (or missing features) of the current collections? I know about a bunch of them personally or because they've been mentioned here, but I don't know what other problems people may agree on.
On Tue, Oct 6, 2015 at 3:28 AM Aleksandar Prokopec <aleksanda...@gmail.com> wrote:

I wanted to bring up specialization as well - I think that it will be relevant even with the AST fusion.
People commonly use a primitive type parameter and silently incur the cost of boxing:
def addNums(b: ArrayBuffer[Int]): Unit = {
  b += 1
  b += 2
  b += 3
}
Since we applied specialization to Function0,1,2,..., I see no reason why collections such as array buffers would not specialize for a specific set of primitive types.
The common problem here is instantiating arrays, which typical collections (buffers, hash maps, hash tries, queues, heaps) do a lot and which specialization doesn't support in the generic context. The way some rogue collection frameworks solved this in the past is by introducing a custom ClassTag-gy type-class that knows how to instantiate arrays.
How does this relate to miniboxing?
--
Martin Odersky
EPFL
--
You received this message because you are subscribed to the Google Groups "scala-debate" group.
It would be great to see fastparse as a module in the stdlib!
Is there a compiled list of all the problems (or missing features) of the current collections? I know about a bunch of them personally or because they've been mentioned here, but I don't know what other problems people may agree on.
Thanks!
Daniel Armak
On Wed, Oct 7, 2015 at 7:13 PM, Dmitry Petrashko <darkd...@gmail.com> wrote:
In https://github.com/lampepfl/dotty/pull/819 I've added changes that
allow the index type to be Long on the JVM and Int on JavaScript.
This is done by defining a value class:
https://github.com/lampepfl/dotty/commit/3c6ed1f23e2da86a4cb870bf91931b9bfc4925ab
that has arithmetic operations on it, and implicit conversions from Int and to Long.
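For illustration, such a value class might look like the following sketch (names and the exact set of operations here are hypothetical; see the linked commit for the actual definition):

```scala
import scala.language.implicitConversions

// An unboxed index type (value class) with arithmetic, an implicit
// conversion from Int, and an implicit conversion to Long.
class Index(val value: Long) extends AnyVal {
  def +(that: Index): Index = new Index(value + that.value)
  def -(that: Index): Index = new Index(value - that.value)
}

object Index {
  implicit def fromInt(i: Int): Index = new Index(i.toLong)
  implicit def toLongValue(ix: Index): Long = ix.value
}
```

Because it extends AnyVal, an Index compiles down to a bare Long at runtime in most contexts, so the wider index type costs no allocation.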
--
I have recently opened an issue inviting strawman designs for new
collection architectures:
https://github.com/lampepfl/dotty/issues/818
The issue lists some requirements for a design to be acceptable. Since
then there has been a discussion about several points that are all
outside the requirements outlined in #818. I think it's worthwhile to
discuss these, but do not think #818 is the proper place to do it.
That's why I would like to continue the discussion here.
- Martin
--
Martin Odersky
EPFL
--
On Fri, Oct 9, 2015 at 2:31 PM, Matthew Pocock
Hi Matthew,
A question: If one writes
val xs: Seq[T] = myConsList
would you then expect `xs.tail` to be still a constant time operation?
- Martin
On 9 Oct 2015, at 16:07, Matthew Pocock <turingate...@gmail.com> wrote:
On 9 October 2015 at 14:28, martin odersky <martin....@epfl.ch> wrote:
On Fri, Oct 9, 2015 at 2:31 PM, Matthew Pocock
Hi Matthew,
A question: If one writes
val xs: Seq[T] = myConsList
would you then expect `xs.tail` to be still a constant time operation?
I would hope that a 'best effort' is made to keep this operation constant-time, but no, there's no longer a guarantee, as we've decided to throw away the knowledge that this is a cons list. We've weakened the contract by widening the type. It may have even invoked some referentially-transparent implicit conversion so that we're strictly not dealing with the same in-memory instances.
--Matthew
- Martin
Dr Matthew Pocock
Turing ate my hamster LTD
mailto: turingate...@gmail.com
Integrative Bioinformatics Group, School of Computing Science, Newcastle University
mailto: matthew...@ncl.ac.uk
gchat: turingate...@gmail.com
irc.freenode.net: drdozer
skype: matthew.pocock
tel: (0191) 2566550
mob: +447535664143
--
This sends a shiver down my spine: object allocation is an effect that needs to be visible, otherwise I cannot reason about the runtime resource usage of my programs. A collections library that does unprovoked allocations is rather useless to me.
To motivate why I want this kind of design: I often find myself needing tiny sets. It is efficient to implement these with List[T] and an auxiliary insert function that does a linear scan over what is already in the list.
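The tiny-set idea above can be sketched in a few lines (a minimal illustration, not part of any proposal):

```scala
// Tiny sets as plain cons lists: insert does a linear scan, so there is
// no hashing machinery, and at most one cons cell is allocated per
// successful insert. For very small n this beats a HashSet.
def insert[T](xs: List[T], x: T): List[T] =
  if (xs.contains(x)) xs else x :: xs
```

Crucially, every allocation here is visible in the source: one cons cell when the element is new, none when it is already present.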
Here's another thought: how hard would it be to make collections primitive-specialized, and backed by primitive arrays?
That may need us to take ClassTags here or there (we could fall back to Array[Any] if we can't find one), but it seems like it would reduce the memory usage of e.g. primitive mutable.Buffers severalfold, not to mention the cache-coherence improvements once they don't need to do so much pointer-chasing.
That kind of perf/memory boost alone could be sufficient to justify overhauling the collections library
Part of the pressure to rework collections is coming from the Android dev community.
Given this, I would actually prefer to try to come up with a good collection design that is geared towards developer experience rather than performance.
If we have good collections with operations that have clear semantics, optimizing them becomes substantially easier, and performance can be obtained later through general optimizations.
With Valhalla it will probably become completely impossible to close the gap between OpenJDK and Android, if Google keeps up its non-development of Android. I'm wondering whether it makes more sense to just use Scala.js for mobile these days ...
I'm almost sure that the typelevel folks have produced some code
or discussion on the subject, but I can't find it right
now.
Any other link to discussion about the subject would be
appreciated!
What I think makes designing a good collections library so hard is the diversity of features that a collection may support. Most features are of the type “can I do X efficiently/safely”. I was inspired by this discussion to catalog the different ways that one might want to traverse a collection, that is get every element and do something with it. I was surprised how well my thoughts aligned with the current design of Scala. I am by no means an expert on programming language design, but I thought I would copy ideas in here in case anyone was interested.
- Can I get an immutable sequence of the collection's elements?
  - This is allowed to take O(n) time. The collection may be infinite if the elements can be constructed as a lazy collection in substantially less than O(n).
  - This operation makes no sense for infinite collections.
  - Example: An actor-based array that cannot wait for user code, even user code passed through foreach, but it can give a copy of the current state.
  - This is the only well-behaved operation that traverses mutable collections.
- Does the collection have a foreach?
  - This is allowed to take O(k) time for processing the first k elements.*
  - An eager fold and a foreach are isomorphic. I would use the name foreach because it makes it clear that this does not terminate on infinite collections.
  - Despite the appearance of a relationship between elements and foreach, they actually have no hierarchy. foreach cannot be implemented from elements because it may take O(n) to produce the first element. elements cannot be implemented from foreach because it may never return. However, elements is the intersection between finite and foreach.
  - Example: A mutable collection that can't wait for user code, but can freeze its elements in O(1) time, can freeze and then run foreach over the frozen elements. This collection can even be infinite.
  - If the function changes the shape of the collection (number/location of elements), this operation is not well-behaved. If an effect system is in place, an eager fold that took only pure functions would be well-behaved.
- Does the collection have a lazy fold?
  - This is allowed O(n) time.
  - Obviously, this requires cooperation from the function doing the folding to actually be lazy.
  - This is a subtype of foreach. We can implement foreach using lazy fold, but not vice versa, because it may not be safe to pause folding the collection (e.g. the collection is mutable).
  - If we allowed O(k) time for the first k elements, this would be identical to head/tail, but a collection where finding the first element is not cheap would be excluded.
  - Example: A rotated sorted array, which can be traversed in O(n) time, but takes O(n) to find the first element.
- Can I get each next element one at a time so that I can iterate on my own terms?
  - Each call to next is allowed O(1) time.**
  - Example: A stream from a TPM will continue to produce elements, but cannot store or expose the state necessary to implement tail.
  - Example: A truly random stream, which has no state to save.
- When getting the head element, can I also keep the tail state so that I can iterate the collection as many times as I like?
  - Each call to head or tail is allowed O(1) time.**
  - This is a subtype of next because it is trivial to implement next from head.
- Can I get a random element?
  - This is allowed O(1) time.**
  - I suppose this is also well-behaved in the face of mutable collections, because it puts all the responsibility for getting the next element on the user. If the user shrinks the collection, it is on them if they try to get an element outside the bounds.
* There are many intricacies with the time constraint on foreach. For finite collections, we would expect this to take O(n) to complete. But what about infinite collections? That would leave no constraint on how long it takes to produce a value. Do we say it takes O(k) to process the first k entries? Then the Scala Vector wouldn’t be allowed because it gives the first element in O(log n) time. Should a dense bit set be allowed to have a foreach? It cannot produce elements in O(n) time. If not, then either Set cannot have an O(n) foreach or a dense bit set cannot be a Set. Perhaps a dense bit set should be an InSet only, not an ExSet. A dense bit set produces elements in O(capacity), so perhaps what exactly n refers to needs to be relaxed. Maybe this operation is just so abstract, it shouldn’t have a time constraint at all!
** There is some weird stuff to deal with regarding the time constraint of single elements. Lots of immutable collections use tree structures to provide efficient access and update. But here efficient means O(log n), which would put it out of the running for operations expected to take O(1). But most interestingly, it has been shown empirically that memory access is not O(1) anyway, but more like O(sqrt n), which means that everything less than O(sqrt n), like O(log n), is pretty much the same.
*** Because we are dealing with traits, everywhere it says subtype could instead be a conversion to an implicit class that provides the subtype functionality.
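The "implement foreach from a lazy fold, but not vice versa" relationship above can be sketched as follows (method names here are illustrative, not from any proposal):

```scala
// A lazy right fold over LazyList: the folding function receives the
// rest of the fold as a thunk, so it decides whether to continue.
def lazyFold[A, B](xs: LazyList[A])(z: => B)(f: (A, () => B) => B): B =
  xs match {
    case h #:: t => f(h, () => lazyFold(t)(z)(f))
    case _       => z
  }

// foreach is just the fold that always continues...
def foreachViaFold[A](xs: LazyList[A])(g: A => Unit): Unit =
  lazyFold[A, Unit](xs)(()) { (a, rest) => g(a); rest() }

// ...while early termination, which plain foreach cannot express,
// falls out for free by simply not calling the thunk:
def firstMatch[A](xs: LazyList[A])(p: A => Boolean): Option[A] =
  lazyFold[A, Option[A]](xs)(None) { (a, rest) =>
    if (p(a)) Some(a) else rest()
  }
```

`firstMatch` terminates even on an infinite LazyList, which is exactly the extra power lazy fold has over foreach.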
David Hagen
Yes, an equality typeclass is very sorely missed. However, as has been discussed, there is an impedance mismatch between properly typed equality and JVM equality, which makes it very hard to come up with a satisfactory solution that doesn't break anything.
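For reference, the kind of equality typeclass being alluded to might look like this minimal sketch (names are hypothetical, not a concrete proposal):

```scala
// A typeclass for well-typed equality: comparing values of unrelated
// types fails to compile instead of silently returning false, as
// JVM-style universal equals does.
trait Eq[A] {
  def eqv(x: A, y: A): Boolean
}

object Eq {
  implicit val intEq: Eq[Int]       = (x: Int, y: Int) => x == y
  implicit val stringEq: Eq[String] = (x: String, y: String) => x == y
}

def eqv[A](x: A, y: A)(implicit ev: Eq[A]): Boolean = ev.eqv(x, y)
```

`eqv(1, 1)` compiles and is true, while `eqv(1, "one")` fails at compile time because no `Eq[Any]` instance exists; the impedance mismatch is that existing code relies on universal `==` over `Any`.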
--