Maps : curious about data structure and memory usage

549 views
Skip to first unread message

Andy Gooden

unread,
Apr 6, 2016, 9:21:32 PM4/6/16
to elixir-lang-talk
Hi all, 
Sorry if this has already been covered.  Did a search and didn't see quite the answer...

in general, I'm curious about how maps are stored under the hood in Elixir 1.2 .. as a hashmap, linked list, tree structure, etc
and how this storage + immutability effects performance (both in memory usage and speed) for common operations such as inserts, deletes, updates.

A more specific question:

When we do:
map = Map.put(map, key, val)

are we getting a deep copy of the map to preserve immutability under the hood?

For example, from a "memory standpoint", would it be better to do option A?

Option A:

# map is a large map, but does not have keys :a, :b, :c
map = Map.merge( map,  %{a: 1, b: 2, c: 3} )

Option B:
map = map
|> Map.put (:a, 1)
|> Map.put (:b, 2)
|> Map.put (:c, 3)

Thanks in advance for any info (or pointers to previous answers :)
Andy


Ben Wilson

unread,
Apr 6, 2016, 9:56:14 PM4/6/16
to elixir-lang-talk
I can't address all of these, but I can address some.

The first thing to note is that answers to these questions only start to matter when dealing with LOTs of data in tight loops. Choose A or B on the basis of whichever ends up being clearer in your use case, it won't matter 99% of the time.

Maps over some relatively small number of items (32 I believe) are stored as a HAMT. https://en.wikipedia.org/wiki/Hash_array_mapped_trie Lookups should be O(log(n))

When it comes to evaluating A vs B there's a few things to keep in mind. First, Map.merge really just delegates to :maps.merge which itself is implemented as a NIF, meaning it's written in C. This means that it will be very fast and generally beat out manual calls to Map.put. If you already have two maps and you definitely want them to be combined, that's the way to go.

There's also some interesting cases about building maps. Basically, building a list of N {key, value} tuples and then calling :maps.from_list can be faster than N Map.put calls.

Anyway back to your question. If you already have two maps and you want to merge them, Map.merge is better both from a technical but also a semantic perspective. Merging is what you're doing. If you have one map and you want to add a bunch of stuff to it, chances are you just want a bunch of Map.put. If you don't have a map but want to build one, Map.new will do the fastest thing for whatever data you've given it. If you're literally just typing a handlful of key value pairs into your code, do whichever makes the most sense semantically, you're dealing with too few keys for the performance to matter.

I'm definitely missing bits to your answer here, hopefully those more knowledgeable than I will fill in the gaps :)

James Norton

unread,
Apr 7, 2016, 11:30:26 AM4/7/16
to elixir-lang-talk
Regarding the "are we getting a deep copy of the map to preserve immutability", because the original map is immutable there is no need for a deep copy. Have a look at persistent data structures - https://en.wikipedia.org/wiki/Persistent_data_structure.

-James


On Wednesday, April 6, 2016 at 9:21:32 PM UTC-4, Andy Gooden wrote:

Andy Gooden

unread,
Apr 7, 2016, 9:12:56 PM4/7/16
to elixir-lang-talk
Thank y'all for the replies.

@Ben : 
in my particular instance, I don't already have a second map I want to merge, but rather I'm adding several {k,v} pair to the map.  So, I'd agree that piping thru multiple Map.puts best represents what I'm really trying to do.

I am definitely curious about how things work under the hood.  Do you have any additional insights into why it is that building a list of tuples and call :maps.from_list is faster than the Map.puts?  Is it function call overhead of the Map.puts?  or would it have to do with the :maps.from_list call (perhaps) only having to construct the tree one time since it has visibility to all the keys at once.. instead of repeatedly inserting, and potentially having to restructure the tree during building?

@James : 
I guess to clarify, is I was thinking on the Map.put call, the incoming map would be passed by reference, but an entirely new copy of the map would be created that included the new {k,v} pair.   But it seems, from reading more about the persistent data structures, that the common parts of the old/ new map would be shared .. preventing an entirely new copy being created.  

This, for me, is much easier to visualize when prepending something onto a list (ie the old/ new list share the tail  .. found this post: https://groups.google.com/d/msg/elixir-lang-talk/REjQgVlZNcI/SYQrCb3AQRsJ  )
For the HAMT, I suppose there could be places the new key would land that could cause the old and new trees to be quite a bit different, which would require more under the hood work.  

I'm really glad that I asked this question, because I'm familiar with the nitty gritty's of various data structures as implemented in an imperative language, and naively didn't think much about how they'd be different in a functional language until recently.  I've been writing in Erlang for about three years now and only recently began using Elixir, so functional languages/ programming still feels relatively new.  

Thanks again for the insights!
Reply all
Reply to author
Forward
0 new messages