--
guava-...@googlegroups.com.
http://groups.google.com/group/guava-discuss?hl=en
unsubscribe: guava-discus...@googlegroups.com
This list is for discussion; for help, post to Stack Overflow instead:
http://stackoverflow.com/questions/ask
Use the tag "guava".
What I would like to propose (except complaining) is to add non-live
view map, filter, fold methods (different names?) so that:
- developers are more aware that there is a difference between
"original" map, filter and fold and Collections2.filter and transform,
- one may "do real" functional programming with Guava.
The final questions is - should this be part of Guava or not?
* - I use the term mutable first time in my life describing function
not class, but I think this fits very well here. Resulting collection
may change over time when one modifies input collection.
I may be spectacularly missing the point... but isn't it trivial to simulate immutable transform/filter by:
return Immutable{List,Set,...}.copyOf(Iterables.{transform,filter}(theInput, theFunctionOrPredicate));
I may be spectacularly missing the point... but isn't it trivial to simulate immutable transform/filter by:return Immutable{List,Set,...}.copyOf(Iterables.{transform,filter}(theInput, theFunctionOrPredicate));There are persistent data structures that support, for example, adding an entry to a map in O(log n) time, and leaving the original version of the map unchanged while returning a new version of the map with the added entry. This cannot be done with the existing Guava features.
While I'm a staunch advocate of functional programming (Haskell is my single favorite language), I don't think Guava is the right place to have these features.
I do think that these sorts of features could be built on top of Guava -- leveraging the ImmutableCollection hierarchy --
but Guava isn't meant to include features that will only be useful to a small subset of programmers, and much as I hate to admit it, functional programming style isn't a common use case in Java.
I think that after Java 7 comes around, and closures and such become available (?), it might be a different story. But until then, Java is a spectacularly awkward language to do functional-style programming in, and a package like Guava shouldn't bend over backwards to supply those features. (And I have to mention, additionally, that if you use your collections like a functional programmer -- never modifying them -- then Lists.filter and Collections.transform do, indeed, work like their functional equivalents.)
While I agree that doing FP in Java quickly becomes an awkward affair,
I still feel there is a place for some persistent data structures.
Just having a persistent map, set and list would be useful, and should
definitely be less disappointing than not having any.
> I believe that a very common motivation for turning to persistent data
> structures is to avoid having to worry about concurrent access to data:
> Immutable objects are automatically thread-safe, so you can pass them around
> freely between threads, even if they represent very large data structures.
Keeping the complexity of doing concurrent programming to a minimum is
for me indeed the primary reason for wanting to use persistent data
structures in Java.
> The problem with that reasoning is that you still might have to communicate
> new states to other threads, and that communication, safe as it is, can drag
> the whole multiprocessor down.
I don't really understand this yet :(
> It's far more natural and efficient in Java to decompose a mutable data
> structure and ensure that only one thread has access to any given element of
> the decomposition at one time. Doug Lea's Fork-Join framework (available
> now!) is ideal for expressing this kind of computation.
I've found it hard to squeeze it into existing code, but it is most
helpful when starting from scratch.
--
Johan Van den Neste
> I believe that a very common motivation for turning to persistent data> structures is to avoid having to worry about concurrent access to data:Keeping the complexity of doing concurrent programming to a minimum is for me indeed the primary reason for wanting to use persistent data structures in Java.
> Immutable objects are automatically thread-safe, so you can pass them around
> freely between threads, even if they represent very large data structures.
I don't really understand this yet :(
> The problem with that reasoning is that you still might have to communicate
> new states to other threads, and that communication, safe as it is, can drag
> the whole multiprocessor down.
> It's far more natural and efficient in Java to decompose a mutable dataI've found it hard to squeeze it into existing code, but it is most helpful when starting from scratch.
> structure and ensure that only one thread has access to any given element of
> the decomposition at one time. Doug Lea's Fork-Join framework (available
> now!) is ideal for expressing this kind of computation.
Thanks for the explanation. I can see how this would be a serious
problem if one were to use persistent data structures exclusively.
It's hard for me to estimate how much of a problem this would be in
specific use cases.
To be investigated!
But persistent collections are useful even without FP. For solving
CSP/optimizations problems by a divide-and-conquer style algorithm you
may use either use backtracking or a set (or priority queue) of still to
be explored states (where each state is a map of committed variable
assignments). Using the latter is much better for concurrency, much
easier to do, and more flexible (you may use heuristics for choosing the
most promising state first). It can be trivially implemented using a
persistent map like follows
StateMap s = queue.poll();
Variable v = chooseBranchVariable();
StateMap s0 = evaluate(s.with(v, false));
StateMap s1 = evaluate(s.with(v, true));
if (isAlive(s0)) queue.add(s0);
if (isAlive(s1)) queue.add(s1);
In the above s.with(v, false) means to create a map just like s with the
additional assignment of false to v. Doing this with non-persistent maps
is also possible, but it can be prohibitively time-consuming.
--
- P2 may actually need only a small portion of s1. This is quite
probable, especially in case of large data structures. The same may be
true for P1 and s0.
- With persistent structures most of the data tends to be shared between
s0 and s1.
- There are much better operations possibly available instead of cache
flush. For example, one core of an AMD Phenom CPU may get the dirty data
directly from other core's cache (using the MOESI protocol). This is
faster then simple memory access and leave the memory port free for
other work (I don't know how current INTEL CPUs works in the respect).
Of course, the situation for a machine employing 64k cores may be very
different (but this is not what most use).