FP meaning of "list"

66 views
Skip to first unread message

Dave Aronson

unread,
Jun 16, 2016, 1:44:22 PM6/16/16
to elixir-l...@googlegroups.com
On Thu, Jun 16, 2016 at 12:59 PM, Greg Vaughn <gva...@gmail.com> wrote:

> In functional programming a "list" is a specific data structure,
> much more constrained than the vernacular use of the word "list".
> It inherently has this O(n) behavior for appending.

Can you please enlighten me on this, both the FP definition of a list,
and why (unless the definition makes it obvious) that makes it
inherently O(n) to append? I tried Googling it but came up with very
few things that looked relevant, among the lists of FP languages. :-)
I will now proceed to guess:

From what I can tell (without peeking under the hood) Erlang's (and
thus Elixir's) lists seem to be implemented in recursive terms, as
either an empty list, or an element prepended onto a list (which may
be empty)... and the system keeps track of *each* of those lists
individually. This makes sense as why, as I've seen in some
tutorials, [1, 2, 3] is simply shorthand for [1 | [2 | [3 | []]]].
Both are 1, prepended onto the list that consists of (2, prepended
onto the lists that consists of (3 prepended onto a list that consists
of (empty))). So, to append 4, it would need to reach down through
the layers (that's what makes it O(n)), to get to the empty list, and
replace that with 4 prepended onto a (new) list that consists of
(empty). Or rather, since things are immutable, create a new empty
list, prepend 4 to that, prepend 3 to *that* new list, and so on...
which would be O(n) even if the actual appending were instantaneous.
Whether LISP works similarly, I frankly forget, having barely touched
it since school, more years ago than I care to admit. ;-)

Anyway, is that at least close to why? It seems to me that the
"keeping track of each layer" part is an implementation detail, and
the immutability is... well, necessary for *pure* FP as I grok it
(which admittedly isn't very well, having done mostly imperative or
OO), but not necessarily all practical FP. (...he says, realizing
he's probably opening a very big can of religiously flaming worms....)

--
Dave Aronson, consulting software developer of Codosaur.us,
PullRequestRoulette.com, Blog.Codosaur.us, and Dare2XL.com.

Peter Hamilton

unread,
Jun 16, 2016, 1:54:24 PM6/16/16
to elixir-l...@googlegroups.com
Lists are a singly-linked list of immutable nodes.

Let's look at an example:

A -> B -> C -> D

That's a list of 4 nodes. A points to B points to C points to D points to nothing.

If we wanted to prepend Z to the list, we just create a node Z that points to A. Notice that A/B/C/D don't need any changes.

If we wanted to append Z it's a very different story.

D now needs to point to Z. Since D is immutable, we create D' that points to Z.
C now needs to point to D'. We create C' to point to D'.
B now needs to point to C'. We create B' to point to C'.
A now needs to point to B'. We create A' to point to B'.

Does that make sense?

--
You received this message because you are subscribed to the Google Groups "elixir-lang-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-ta...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-talk/CAHxKQij_qCAhY-Eg7TXmgk9yLzbwTqQjT%3DaaxWr9amMA3mWgaQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

max s

unread,
Jun 16, 2016, 2:29:24 PM6/16/16
to elixir-l...@googlegroups.com
Hi Dave,

When dealing with enumerables you have basically the choice between arrays and linked lists (well there are more kinds like comprehension/streams/etc... but it's out of scope). Building a linked list is super fast (prepending is O(1) so building a list of n elements is O(n)). Building a list with arrays, if you embrace immutability (you should!), is O(n) as you need to copy the entire array each time ; therefore it's quadratic complexity when building a list of n elements.

Moreover they are easy to write and fit the recursivity (mapping, folding) design which is held very dear in functional programmers heart :)
If you need to constantly append things at the end of your linked list, you might to ask yourself the question : "but do I need to prepend too or is appending the only thing I want to do?". If yes, just switch the datastructure to a queue instead. 

The only reason you would want to handle arrays would be if you need random access data (maps). Otherwise, it's better to stick with linked lists. That is why functional programming language gives you both datastructures ready to use (linked lists and maps).

Max

Greg Vaughn

unread,
Jun 16, 2016, 2:50:11 PM6/16/16
to elixir-l...@googlegroups.com
Yep, you got it. If I were to be pedantic (and even then I'll probably miss something) I might call it a: Persistent Immutable Singly Linked List of Cons Cells

Persistent is how we can make immutability practical. Data structures when "updated" to a new copy, still share as much memory as possible with the old one. For example:

a = [1, 2, 3]
b = [0 | a]

The 1, 2 and 3 elements in memory are shared with both a and b. b simply has a new cons cell at the head of the list with a pointer to the 1 cell. a retains its pointer to the 1 cell and has no knowledge of the 0 cell. If instead you did:

a = [1, 2, 3]
b = a ++ [4]

b has to receive a new copy of the 1, 2 and 3 cons cells, plus the new 4 cell. Otherwise, a could traverse its list and find the 4 element at the end, violating immutability.

Similar types of things are done with maps using a lower level data structure called a "trie".

The clojure community has done some good writeups lately of persistent in-memory data structures, plus there are wikipedia pages if you want to search for more details.

-Greg Vaughn

PS. On the FP "can of worms" comment -- yes I can imagine a strictly defined FP to be about manipulating mutable data structures with higher order functions. However, you gain referential transparency with immutable data structures which are a foundation for concurrent benefits of FP, therefore practically when someone says "FP" they typically imply immutability as well as functions as first class.

Dave Aronson

unread,
Jun 16, 2016, 3:03:36 PM6/16/16
to elixir-l...@googlegroups.com
On Thu, Jun 16, 2016 at 1:54 PM, Peter Hamilton
<petergh...@gmail.com> wrote:

> If we wanted to append Z it's a very different story.
>
> D now needs to point to Z. Since D is immutable, we create D' that points to
> Z.
> C now needs to point to D'. We create C' to point to D'.

And so on. Okay, I think I get it now: it's the immutability not of
the list itself (which would also necessitate a copying even if
appending were fast), but of the *nodes*, that makes appending slow,
because of the mutation needed on the former end propagating all the
way back to the head.

This reminds me very much of some of the times I've designed linked
lists in C, specifically the pat of deciding whether the list-node
should contain the data item (e.g., have all the struct fields), or
merely point to it (i.e., have only a pointer to the struct or
whatever is being listed). Going with the "contain" decision,
considering the entire node to be the "element", and making *that*
immutable, seems to be what leads to this inefficiency. If there is
some Erlang/Elixir equivalent of having the list-node separate from
the datum, and considering the list-node simply a hidden
implementation detail that could therefore be mutable, then maybe we
could get efficient appending. If I grok in fullness, addressing this
in the definitions of Elixir would hinge on mutability in Erlang, so
that would be another layer we'd need to address. It's turtles all
the way down. :-)

I'm tempted to try to cobble something together with structs. The
pattern matching on function args would probably be rather challenging
though....

Thanks!

Brad O'Hearne

unread,
Jun 16, 2016, 3:04:21 PM6/16/16
to elixir-lang-talk
This is a good thread, with a good question and responses -- this will be helpful to those coming after whose searches will turn up this thread. I really appreciate the answers everyone gave -- a really good complement to the other thread on prepending / appending. 

Peter Hamilton

unread,
Jun 16, 2016, 3:45:50 PM6/16/16
to elixir-l...@googlegroups.com
You'll find that maps (and therefore structs) are implemented in a similar manner, albeit much more complex. Inserting an element into the map is generally going to create a shallow copy of the map with the element added, much like prepending to a linked list. However, because of the underlying structure of a map, it might do a shallow copy of one branch of the map and create a full copy of another branch (allowing it to insert the node with the correct balancing). Off the top of my head I can't explain exactly how it works, but it's in this general vein.

If you want to play around with some erlang data structures, I recommend looking at the queue and gb_tree data structures. They are fairly transparent (if you inspect them you will see they are just tuples and lists) and inserting and removing items is pretty easy to follow. There's been a lot of academic research into these data structures and so it's a pretty educational experience.
Thanks!

--
Dave Aronson, consulting software developer of Codosaur.us,
PullRequestRoulette.com, Blog.Codosaur.us, and Dare2XL.com.

--
You received this message because you are subscribed to the Google Groups "elixir-lang-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-ta...@googlegroups.com.

Robert Virding

unread,
Jun 18, 2016, 2:03:26 PM6/18/16
to elixir-lang-talk

And so on.  Okay, I think I get it now: it's the immutability not of
the list itself (which would also necessitate a copying even if
appending were fast), but of the *nodes*, that makes appending slow,
because of the mutation needed on the former end propagating all the
way back to the head.

Yes, a list is not one blob but a linked structure where each link is itself a list. That is why for example if you want to get the length of the list you literally have to step down on the whole list until you get to the end. And as has pointed out here all structures are immutable and modifying them will cause parts of them to be copied. This also applies to internal data structures like maps. If I take a map and modify it the old map is still there, accessible and unchanged.

Robert

P.S. shh don't tell anyone but you can use list cells as general pairs to be any form of binary tree structure. Using them as lists going down to the right is just one way which is supported and check in all libraries, but it is not the only way:

iex(1)> [[:a|:b]|:c]
[[:a | :b] | :c]

Reply all
Reply to author
Forward
0 new messages