Non-live views: should Guava support Functional Programming's map, filter and fold?

353 views
Skip to first unread message

koppernickus

unread,
Dec 28, 2010, 7:52:21 AM12/28/10
to guava-discuss
In short my question is: should Guava support "really" functional
programming or it is out of its scope?

I realize that the following may sound strange, but are there any
plans to create functions map, filter and fold resulting in non-live
views (completely new, immutable collections)?

The user story is...

As a Java developer I would like to use Guava for functional
programming.

The (current) problem is...

Functional programming style is useless without three major higher-
order functions: map, filter, fold. Fortunately Guava has Collections2
(and similar) with its transform and filter. The problem is that they
are not what we know from functional languages like OCaml, Scala or
even Ruby, Python or Groovy - they are MUTABLE*!

I don't say it's bad, especially that documentation is quite clear -
they return "live views" of input collection. Please don't get me
wrong - live views seems to be great feature, but:

1. Names filter and transform may mislead developers already familiar
with functional programming (developers tend to first use and then
read the documentation - well, maybe only me and my colleagues
preferring self-documenting code?).
2. Currently Guava doesn't support (immutable) functions map, filter,
fold - which makes it hard to use (I would even say useless) with
functional programming style - see for example Lists.filter and
Sets.transform. You can't? There are no such methods? Well... seems
this is a consequence of live-view feature :-)

What I would like to propose (except complaining) is to add non-live
view map, filter, fold methods (different names?) so that:
- developers are more aware that there is a difference between
"original" map, filter and fold and Collections2.filter and transform,
- one may "do real" functional programming with Guava.

The final questions is - should this be part of Guava or not?

* - I use the term mutable first time in my life describing function
not class, but I think this fits very well here. Resulting collection
may change over time when one modifies input collection.

koppernickus

unread,
Dec 28, 2010, 6:01:52 AM12/28/10
to guava-discuss
In short my question is: should Guava support functional programming
or it is out of it's scope?

I realize that the following may sound strange, but are there any
plans to create functions transform, filter and fold resulting in non-
live views (completely new, immutable collections)?

The user story is...

As a Java developer I would like to use Guava for functional
programming.

The (current) problem is...

Functional programming style is useless without three major higher-
order functions: map, filter, fold. Fortunately Guava has Collections2
(and similar) with its transform and filter. The problem is that they
are not what we know from functional languages like OCaml, Scala or
even Ruby, Python or Groovy - they are MUTABLE*!

I don't say it's bad, especially that documentation is quite clear -
they return "live views" of input collection. Please don't get me
wrong - live views seems to be great feature, but for some

1. Names filter and transform may mislead developers already familiar
with functional programming (developers tend to first use and then
read the documentation - well, maybe only me and my colleagues
preferring self-documenting code?).
2. Currently Guava doesn't support (immutable) functions map, filter,
fold - which makes it hard to use (I would even say useless) with
functional programming style - see for example Lists.filter and
Sets.transform. You can't? There are no such methods? Well... seems
this is a consequence of live-view feature :-)


Louis Wasserman

unread,
Dec 30, 2010, 11:47:19 AM12/30/10
to koppernickus, guava-discuss
While I'm a staunch advocate of functional programming (Haskell is my single favorite language), I don't think Guava is the right place to have these features.  I do think that these sorts of features could be built on top of Guava -- leveraging the ImmutableCollection hierarchy -- but Guava isn't meant to include features that will only be useful to a small subset of programmers, and much as I hate to admit it, functional programming style isn't a common use case in Java.  

I think that after Java 7 comes around, and closures and such become available (?), it might be a different story.  But until then, Java is a spectacularly awkward language to do functional-style programming in, and a package like Guava shouldn't bend over backwards to supply those features.  (And I have to mention, additionally, that if you use your collections like a functional programmer -- never modifying them -- then Lists.filter and Collections.transform do, indeed, work like their functional equivalents.)
--
guava-...@googlegroups.com.
http://groups.google.com/group/guava-discuss?hl=en
unsubscribe: guava-discus...@googlegroups.com

This list is for discussion; for help, post to Stack Overflow instead:
http://stackoverflow.com/questions/ask
Use the tag "guava".

Sam Berlin

unread,
Dec 30, 2010, 11:55:52 AM12/30/10
to koppernickus, guava-discuss
I may be spectacularly missing the point... but isn't it trivial to simulate immutable transform/filter by:
  return Immutable{List,Set,...}.copyOf(Iterables.{transform,filter}(theInput, theFunctionOrPredicate));
?

sam
 

What I would like to propose (except complaining) is to add non-live
view map, filter, fold methods (different names?) so that:
- developers are more aware that there is a difference between
"original" map, filter and fold and Collections2.filter and transform,
- one may "do real" functional programming with Guava.

The final questions is - should this be part of Guava or not?

* - I use the term mutable first time in my life describing function
not class, but I think this fits very well here. Resulting collection
may change over time when one modifies input collection.

Louis Wasserman

unread,
Dec 30, 2010, 12:03:22 PM12/30/10
to Sam Berlin, koppernickus, guava-discuss
I may be spectacularly missing the point... but isn't it trivial to simulate immutable transform/filter by:
  return Immutable{List,Set,...}.copyOf(Iterables.{transform,filter}(theInput, theFunctionOrPredicate));

There are persistent data structures that support, for example, adding an entry to a map in O(log n) time, and leaving the original version of the map unchanged while returning a new version of the map with the added entry.  This cannot be done with the existing Guava features.

Sam Berlin

unread,
Dec 30, 2010, 12:20:08 PM12/30/10
to Louis Wasserman, koppernickus, guava-discuss
On Thu, Dec 30, 2010 at 12:03 PM, Louis Wasserman <wasserm...@gmail.com> wrote:
I may be spectacularly missing the point... but isn't it trivial to simulate immutable transform/filter by:
  return Immutable{List,Set,...}.copyOf(Iterables.{transform,filter}(theInput, theFunctionOrPredicate));

There are persistent data structures that support, for example, adding an entry to a map in O(log n) time, and leaving the original version of the map unchanged while returning a new version of the map with the added entry.  This cannot be done with the existing Guava features.

Ahh, Yes.  For those cases, I agree that a custom datatype is more warranted (as opposed to adding support into Guava's Immutable{X}), primarily because supporting such types would likely mean the internals of the Immutable classes would have to change to support the different algorithm.

sam

koppernickus

unread,
Dec 30, 2010, 1:11:01 PM12/30/10
to guava-discuss
Yes, it is easily possible. But the syntax then becomes terrible.
Using copyOf every time is unacceptable. I think everyone wanting to
use Guava this way will eventually create "shortcut" functions like
this:

MyImmutableList.transform(List l) {
copyOf(Iterables.transform(l));
}

If support for Functional Programming is in scope of Guava I would
expect the above be part of Guava instead "everyone doing it by
themselves" + have in Guava bunch of "helper" "immutable" functions
working with immutable collections.

There is also one thing that is "nice to have". Better data structures
like in Closure, that do not copy full collection every time.

On Dec 30, 5:55 pm, Sam Berlin <sber...@gmail.com> wrote:
> > unsubscribe: guava-discus...@googlegroups.com<guava-discuss%2Bunsu...@googlegroups.com>

koppernickus

unread,
Dec 30, 2010, 1:27:16 PM12/30/10
to guava-discuss
I am recently "playing with" closures code folding in IntelliJ and
Eclipse. At the beginning I thought this is stupid idea, but
surprisingly in practice this works well. It seems I don't need Java
7(8) closures to program functional way in Java. I am possibly missing
something. I guess that there are cases when anonymous classes are not
enough, but so far so good... :-)

Of course I realize that I am probably in the group of 0,01% Java
developers using Java this way, but I won't be surprised if this will
change. The code written using transform/filter + code folding looks
so much better than for loop. This is only one use case, but very
common and IMO important one.

Summarizing... I bet that functional-style programming will start in
Java world before Java 8 closures will come (if ever) and I bet this
will be thanks to code folding even if this covers only some cases. Of
course I may be terribly wrong. Right now I am experimenting with
existing collection frameworks that would help me and others trying to
use functional programming with anonymous classes with code folding.

Right now Guava is for me "OK" choice but not "great" one.

Regards,
kopper...@gmail.com

On Dec 30, 5:47 pm, Louis Wasserman <wasserman.lo...@gmail.com> wrote:
> While I'm a staunch advocate of functional programming (Haskell is my single
> favorite language), I don't think Guava is the right place to have these
> features.  I do think that these sorts of features could be built on top of
> Guava -- leveraging the ImmutableCollection hierarchy -- but Guava isn't
> meant to include features that will only be useful to a small subset of
> programmers, and much as I hate to admit it, functional programming style
> isn't a common use case in Java.
>
> I think that after Java 7 comes around, and closures and such become
> available (?), it might be a different story.  But until then, Java is a
> spectacularly awkward language to do functional-style programming in, and a
> package like Guava shouldn't bend over backwards to supply those features.
>  (And I have to mention, additionally, that if you use your collections like
> a functional programmer -- never modifying them -- then Lists.filter and
> Collections.transform do, indeed, work like their functional equivalents.)
>
> Louis Wasserman
> wasserman.lo...@gmail.comhttp://profiles.google.com/wasserman.louis
> > unsubscribe: guava-discus...@googlegroups.com<guava-discuss%2Bunsu...@googlegroups.com>

Tim Peierls

unread,
Dec 30, 2010, 3:32:00 PM12/30/10
to Louis Wasserman, koppernickus, guava-discuss
On Thu, Dec 30, 2010 at 11:47 AM, Louis Wasserman <wasserm...@gmail.com> wrote:
While I'm a staunch advocate of functional programming (Haskell is my single favorite language), I don't think Guava is the right place to have these features.

I agree that Guava should not stretch to support functional programming or persistent data structures. It would be hard to find agreement on what exactly those terms mean in a Java setting, so any stated attempt to include them would be bound to disappoint whole communities.

A few years ago I wrote a Java library to let me express Okasaki's Haskell data structures more-or-less directly in Java, including batched queues, real time queues, skew binary random access lists, and catenable lists. It worked, but it was a disappointment: The resulting data structures are awkward to use in Java, and they don't perform well. The need to return the new state of the data structure after each "mutative" operation makes for very unnatural programs. The need (in some cases) to use lazy evaluation requires the use of a type to represent a suspended value, which adds a layer that hinders performance and readability. (In contrast, FP languages have these concepts built in from the start.)

In each case, it would have been easier and more efficient to write a purpose-built data structure in natural Java than to use my slavish imitation of Haskell in Java. You might look at my library and say that it could have been done better -- and you'd be right -- but the fact remains that writing to a language's strengths yields better programs.

Those who love FP but can't let go of the richness of the Java libraries should really consider Scala. It's not for me, but I don't see a downside for the serious FP zealot.

 
I do think that these sorts of features could be built on top of Guava -- leveraging the ImmutableCollection hierarchy --

Maybe, but in that case the resulting data structure types should not be mixed up with the ImmutableXxx types. The latter would be implementation details and the former would be the higher-level API. I don't think ImmutableXxx types should be dragged into service themselves as FP types. It's not a good fit.

 
but Guava isn't meant to include features that will only be useful to a small subset of programmers, and much as I hate to admit it, functional programming style isn't a common use case in Java.  

What I learned from my experience was that the ideas behind various persistent data structures are powerful and useful to all programmers; they don't have to be expressed in FP terms when you're writing in a procedural language. 
 
 
I think that after Java 7 comes around, and closures and such become available (?), it might be a different story.  But until then, Java is a spectacularly awkward language to do functional-style programming in, and a package like Guava shouldn't bend over backwards to supply those features.  (And I have to mention, additionally, that if you use your collections like a functional programmer -- never modifying them -- then Lists.filter and Collections.transform do, indeed, work like their functional equivalents.)

I believe that a very common motivation for turning to persistent data structures is to avoid having to worry about concurrent access to data: Immutable objects are automatically thread-safe, so you can pass them around freely between threads, even if they represent very large data structures. The problem with that reasoning is that you still might have to communicate new states to other threads, and that communication, safe as it is, can drag the whole multiprocessor down.

It's far more natural and efficient in Java to decompose a mutable data structure and ensure that only one thread has access to any given element of the decomposition at one time. Doug Lea's Fork-Join framework (available now!) is ideal for expressing this kind of computation.

--tim

Johan Van den Neste

unread,
Dec 31, 2010, 6:35:42 AM12/31/10
to Tim Peierls, Louis Wasserman, koppernickus, guava-discuss
>> While I'm a staunch advocate of functional programming (Haskell is my
>> single favorite language), I don't think Guava is the right place to have
>> these features.
>
> I agree that Guava should not stretch to support functional programming or
> persistent data structures. It would be hard to find agreement on what
> exactly those terms mean in a Java setting, so any stated attempt to include
> them would be bound to disappoint whole communities.

While I agree that doing FP in Java quickly becomes an awkward affair,
I still feel there is a place for some persistent data structures.
Just having a persistent map, set and list would be useful, and should
definitely be less disappointing than not having any.

> I believe that a very common motivation for turning to persistent data
> structures is to avoid having to worry about concurrent access to data:
> Immutable objects are automatically thread-safe, so you can pass them around
> freely between threads, even if they represent very large data structures.

Keeping the complexity of doing concurrent programming to a minimum is
for me indeed the primary reason for wanting to use persistent data
structures in Java.

> The problem with that reasoning is that you still might have to communicate
> new states to other threads, and that communication, safe as it is, can drag
> the whole multiprocessor down.

I don't really understand this yet :(

> It's far more natural and efficient in Java to decompose a mutable data
> structure and ensure that only one thread has access to any given element of
> the decomposition at one time. Doug Lea's Fork-Join framework (available
> now!) is ideal for expressing this kind of computation.

I've found it hard to squeeze it into existing code, but it is most
helpful when starting from scratch.

--
Johan Van den Neste

Tim Peierls

unread,
Dec 31, 2010, 11:47:21 AM12/31/10
to Johan Van den Neste, Louis Wasserman, koppernickus, guava-discuss
On Fri, Dec 31, 2010 at 6:35 AM, Johan Van den Neste <jvdn...@gmail.com> wrote:
> I believe that a very common motivation for turning to persistent data
> structures is to avoid having to worry about concurrent access to data:
> Immutable objects are automatically thread-safe, so you can pass them around
> freely between threads, even if they represent very large data structures.

Keeping the complexity of doing concurrent programming to a minimum is for me indeed the primary reason for wanting to use persistent data structures in Java.

> The problem with that reasoning is that you still might have to communicate
> new states to other threads, and that communication, safe as it is, can drag
> the whole multiprocessor down.

I don't really understand this yet :(

The Java Memory Model gives the programmer the pleasant illusion of working with a single flat memory space, with the only apparent trade-off being the need to abide by certain rules. Immutability is a conceptually simple way to abide by those rules, but the pleasant illusion is still just an illusion: In reality, communicating a newly computed piece of data, immutable or not, from one thread to another might require an expensive cache flush (or its equivalent). For example:

  final ImmutableState s0 = computeInitialState();
  Future<ImmutableState> f = exec.submit(new Callable<ImmutableState>() {
      public ImmutableState call() { return computeNewState(s0); }
  });
  // later in same thread:
  ImmutableState s1 = f.get(); 

If f.get() is called on processor P1 and computeNewState runs on processor P2 (for a typical SMP machine), the value stored in s0 will need to be passed from P1's local memory to P2's local memory, and the value to be stored in s1 will need to be passed from P2 back to P1. That communication can be very expensive.

Bottom line: Concurrent algorithms that use persistent data structures without taking into account communication costs run the risk of performance problems on multiprocessor architectures.


> It's far more natural and efficient in Java to decompose a mutable data
> structure and ensure that only one thread has access to any given element of
> the decomposition at one time. Doug Lea's Fork-Join framework (available
> now!) is ideal for expressing this kind of computation.

I've found it hard to squeeze it into existing code, but it is most helpful when starting from scratch.

Existing algorithms with an implicit sequential model are vulnerable to the communication issues above. To take full advantage of multiprocessors, it might be sensible to start from scratch. 

Fork-Join can replace leaves of computation without disturbing the rest of an existing program. The rest of your program doesn't care whether you sort an array sequentially with Arrays.sort() or in parallel using ParallelArray (built on FJ).

--tim

Johan Van den Neste

unread,
Dec 31, 2010, 3:10:07 PM12/31/10
to Tim Peierls, Louis Wasserman, koppernickus, guava-discuss
> The Java Memory Model gives the programmer the pleasant illusion of working
> with a single flat memory space, with the only apparent trade-off being the
> need to abide by certain rules. Immutability is a conceptually simple way to
> abide by those rules, but the pleasant illusion is still just an illusion:
> In reality, communicating a newly computed piece of data, immutable or not,
> from one thread to another might require an expensive cache flush (or its
> equivalent). For example:
>   final ImmutableState s0 = computeInitialState();
>   Future<ImmutableState> f = exec.submit(new Callable<ImmutableState>() {
>       public ImmutableState call() { return computeNewState(s0); }
>   });
>   // later in same thread:
>   ImmutableState s1 = f.get();
> If f.get() is called on processor P1 and computeNewState runs on processor
> P2 (for a typical SMP machine), the value stored in s0 will need to be
> passed from P1's local memory to P2's local memory, and the value to be
> stored in s1 will need to be passed from P2 back to P1. That communication
> can be very expensive.
> Bottom line: Concurrent algorithms that use persistent data structures
> without taking into account communication costs run the risk of performance
> problems on multiprocessor architectures.

Thanks for the explanation. I can see how this would be a serious
problem if one were to use persistent data structures exclusively.
It's hard for me to estimate how much of a problem this would be in
specific use cases.

To be investigated!

Maaartin-1

unread,
Dec 31, 2010, 7:57:23 AM12/31/10
to guava-...@googlegroups.com
On 10-12-30 17:47, Louis Wasserman wrote:
> I think that after Java 7 comes around, and closures and such become
> available (?), it might be a different story. But until then, Java is a
> spectacularly awkward language to do functional-style programming in,
> and a package like Guava shouldn't bend over backwards to supply those
> features.

But persistent collections are useful even without FP. For solving
CSP/optimizations problems by a divide-and-conquer style algorithm you
may use either use backtracking or a set (or priority queue) of still to
be explored states (where each state is a map of committed variable
assignments). Using the latter is much better for concurrency, much
easier to do, and more flexible (you may use heuristics for choosing the
most promising state first). It can be trivially implemented using a
persistent map like follows

StateMap s = queue.poll();
Variable v = chooseBranchVariable();
StateMap s0 = evaluate(s.with(v, false));
StateMap s1 = evaluate(s.with(v, true));
if (isAlive(s0)) queue.add(s0);
if (isAlive(s1)) queue.add(s1);

In the above s.with(v, false) means to create a map just like s with the
additional assignment of false to v. Doing this with non-persistent maps
is also possible, but it can be prohibitively time-consuming.

Louis Wasserman

unread,
Jan 2, 2011, 3:46:45 PM1/2/11
to Maaartin-1, guava-...@googlegroups.com
While it's true that persistent map implementations make backtracking much easier, it should be pointed out that in many situations, if your solution is backtracking, you're Doing It Wrong.  (And the concurrency benefits come with communication costs.)

Mind, as a Haskell fan, I totally know exactly what you mean, and backtracking solutions *are* far easier in Haskell precisely because of the persistent data structures.  (I'll point out that the use of monads plays at least as great a role in making it easier to write backtracking solutions, but not only is Java a terrible language to add monads to, but monads aren't the sort of thing that fit into most Java programmers' brains...'cause it took me a *long* time to grok those.)

That aside aside (lol), I still don't think Java's very well built for functional data structures, just because of the awkwardness of Pairs...and I absolutely don't think Guava's the right place for them.

--

Maaartin-1

unread,
Jan 2, 2011, 6:20:16 PM1/2/11
to guava-...@googlegroups.com
On 10-12-31 17:47, Tim Peierls wrote:
> > The Java Memory Model gives the programmer the pleasant illusion of
> > working with a single flat memory space, with the only apparent
> > trade-off being the need to abide by certain rules. Immutability is a
> > conceptually simple way to abide by those rules, but the pleasant
> > illusion is still just an illusion: In reality, communicating a newly
> > computed piece of data, immutable or not, from one thread to another
> > might require an expensive cache flush (or its equivalent). For example:
> >
> > final ImmutableState s0 = computeInitialState();
> > Future<ImmutableState> f = exec.submit(new
Callable<ImmutableState>() {
> > public ImmutableState call() { return computeNewState(s0); }
> > });
> > // later in same thread:
> > ImmutableState s1 = f.get();
> >
> > If f.get() is called on processor P1 and computeNewState runs on
> > processor P2 (for a typical SMP machine), the value stored in s0 will
> > need to be passed from P1's local memory to P2's local memory, and the
> > value to be stored in s1 will need to be passed from P2 back to P1. That
> > communication can be very expensive.
This is a very good point, but let me disagree a bit. The communication
may be way cheaper than transferring all the data to and from main
memory for the following reasons:

- P2 may actually need only a small portion of s1. This is quite
probable, especially in case of large data structures. The same may be
true for P1 and s0.

- With persistent structures most of the data tends to be shared between
s0 and s1.

- There are much better operations possibly available instead of cache
flush. For example, one core of an AMD Phenom CPU may get the dirty data
directly from other core's cache (using the MOESI protocol). This is
faster then simple memory access and leave the memory port free for
other work (I don't know how current INTEL CPUs works in the respect).
Of course, the situation for a machine employing 64k cores may be very
different (but this is not what most use).

Dimitris Andreou

unread,
Jan 2, 2011, 6:56:54 PM1/2/11
to Maaartin-1, guava-...@googlegroups.com
This echoes my sentiment expressed here too: http://groups.google.com/group/guava-discuss/msg/595bb076894c73ca (the last paragraph). Combined with fork/join, using a good such structure (and similarly a good stack/rope) would just be sweet for graph traversals (aka backtracking). I.e., to do a node visit/expansion, fork a job by giving it a new version of the persistent state of its parent node. Regarding the communication cost, it's fork/join's job itself to minimize it (by threads handling their local forked tasks most of the time), so this would be a non-issue.
Reply all
Reply to author
Forward
0 new messages