Space Optimized ImmutableMap/Set?


Sam Berlin

Jul 5, 2009, 4:35:23 PM
to Google Collections Library - users list
Hi Folks,

The current implementations of ImmutableMap (and ImmutableSet) seem to favor speed over space.  Is this just the nature of the implementations, and if you want a space-optimized version you should use HashMap or HashSet instead?  Does the performance boost that ImmutableSet's javadoc refers to come from the double storage of an 'entries' or 'elements' list plus the separate hash 'table'?  Would there be any benefit in creating a space-optimized variant (via a new builder method like '.spaceOptimized()') that does not store the duplicate entries/elements (and does not guarantee iteration order), and that reduces the Map variant's table size to equal the entry count (as opposed to the current 2x the entry count)?

Sam

Jared Levy

Jul 5, 2009, 10:56:15 PM
to Sam Berlin, Google Collections Library - users list
As you mention, the double-storage makes the iteration order of the set and map views the same as the insertion order. One array is used for iteration, and one is used for hash-based lookups. We could provide an implementation that doesn't specify the iteration order and takes less memory; its performance would be a little worse for some operations.

Note that the ImmutableSortedSet and ImmutableSortedMap implementations have a single array and therefore require less memory. However, their accessor performance is O(log(N)) instead of O(1).

There's another way we could reduce the memory usage of ImmutableMap and ImmutableSortedMap. Instead of storing a Map.Entry for each element, we could store an array of keys and an array of values. Whenever an entry is retrieved through the entrySet() view, it could be created on the fly. That approach requires less memory but more time.
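The parallel-arrays idea above can be sketched roughly like this (a hypothetical illustration with invented names, not the real ImmutableMap internals):

```java
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical sketch: keys and values live in two parallel arrays, and a
// Map.Entry is only allocated when the entrySet() view asks for one.
public class ParallelArrayMapSketch {
    private final Object[] keys;
    private final Object[] values;

    public ParallelArrayMapSketch(Object[] keys, Object[] values) {
        this.keys = keys;
        this.values = values;
    }

    // Creates the entry on the fly: less memory at rest, more work per call.
    public Map.Entry<Object, Object> entryAt(int i) {
        return new AbstractMap.SimpleImmutableEntry<Object, Object>(keys[i], values[i]);
    }

    public static void main(String[] args) {
        ParallelArrayMapSketch m = new ParallelArrayMapSketch(
            new Object[] {"a", "b"}, new Object[] {1, 2});
        System.out.println(m.entryAt(1));  // the Entry object is allocated only here
    }
}
```

The trade is exactly the one described: no per-entry object sitting in memory, but a small allocation every time an entry is requested.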

ImmutableSet probably requires less memory than HashSet, since HashSet is based on a HashMap. I can't tell, from looking over the code, whether HashMap or ImmutableMap requires more memory.

Jim Andreou

Jul 6, 2009, 6:02:15 AM
to Jared Levy, Sam Berlin, Google Collections Library - users list
A quick experiment (on Sun's 32-bit Java 6) showed that, generally, ImmutableSet memory consumption is about 37% to 62% of that of HashSet.
On the other hand, ImmutableMap's memory consumption is about 112% to 162% of that of a HashMap.

ImmutableSortedMap is consistently at 37% of a TreeMap (a bit better for <10 elements).

ImmutableSortedSet is consistently at 87% of a TreeSet (a bit better for <10 elements).

Dimitris


Sam Berlin

Jul 6, 2009, 9:12:07 AM
to Jim Andreou, Jared Levy, Google Collections Library - users list
Looks like for every collection other than a non-sorted Map, the ImmutableX variant is far better in terms of memory consumption.  It'd be wonderful to see ImmutableMap fall into that category too, especially if iteration order isn't a required trait.

Sam

Kevin Bourrillion

Jul 6, 2009, 2:01:02 PM
to Jim Andreou, Jared Levy, Sam Berlin, Google Collections Library - users list
The space consumption of ImmutableSet is roughly (size + tablesize + 4) (multiply by 32 or 64 bits for your VM). If we dropped the array that maintains the ordering, it would be just, perhaps, (tablesize + 2).  But HashSet's consumption is (6 * size + tablesize + 7).  Since we've already subtracted (5 * size + 3) from the memory usage, we figured that should be good enough.

Of course, there's the issue of tablesize; HashSet lets you configure it (in a roundabout way); we don't, and always size it to be 25-50% full.  While I'm reluctant to create a non-iteration-ordered ImmutableSet because of the sharp behavioral difference, it's possible that we could allow tweaking the density if there's really the need.

I'm really surprised about the ImmutableMap findings.  That really shouldn't be the case... will look into it.


Jared Levy

Jul 6, 2009, 2:24:33 PM
to Jim Andreou, Sam Berlin, Google Collections Library - users list
Thanks, Jim.

I'd expect ImmutableSortedSet to show a larger memory improvement than ImmutableSortedMap, since the TreeSet implementation is based on a TreeMap while ImmutableSortedSet requires less memory than an ImmutableSortedMap. Jim, did you record those stats correctly?



Jim Andreou

Jul 6, 2009, 2:50:03 PM
to Jared Levy, Sam Berlin, Google Collections Library - users list
Oh, crap. That was too quick. I mistakenly did "(jdkMemory - googleMemory) / jdkMemory" for half the cases instead of just "googleMemory / jdkMemory".

So the following corrections are in order:

ImmutableSortedSet is _13%_ of TreeSet (instead of the 87% I said earlier)
ImmutableSortedMap is _63%_ of TreeMap (not 37%)

ImmutableSet and ImmutableMap are still 37% to 62% of HashSet and 112% to 162% of HashMap, respectively (sorry about the confusion).


This is the (quick&dirty) code on the ImmutableSortedSet case:

import com.google.common.collect.ImmutableSortedSet;
import java.util.TreeSet;

// MemoryMeasurer and ObjectFilters are from my own measuring tool, which
// directly counts the bytes of all reachable objects.
public class GoogleCollectMemoryBench {
    public static void main(String[] args) {
        for (int i = 1; i < 5000; i++) {
            System.out.println("Objects: " + i);
            ImmutableSortedSet.Builder<Element> b = ImmutableSortedSet.<Element>naturalOrder();
            for (int j = 0; j < i; j++) { b.add(new Element()); }

            long googleMemory = MemoryMeasurer.count(b.build(), ObjectFilters.stopAtAnyInstanceOf(Element.class));
            System.out.println("ImmutableSortedSet: " + googleMemory);

            TreeSet<Element> treeSet = new TreeSet<Element>();
            for (int j = 0; j < i; j++) { treeSet.add(new Element()); }

            long jdkMemory = MemoryMeasurer.count(treeSet, ObjectFilters.stopAtAnyInstanceOf(Element.class));
            System.out.println("TreeSet: " + jdkMemory);

            System.out.println(((double)googleMemory) / jdkMemory);
        }
    }

    static class Element implements Comparable<Element> {
        static int counter = 0;
        final int id = counter++;

        public int compareTo(Element o) {
            return id - o.id;
        }
    }
}




Sam Berlin

Jul 6, 2009, 3:15:51 PM
to Kevin Bourrillion, Jim Andreou, Jared Levy, Google Collections Library - users list
For Immutable collections, is it necessary to size the table at 25-50% full?  I would think that unless sizing the table larger buys an extreme performance optimization, you'd want the excess space in the table to be as small as possible (because you can guarantee nothing will ever use it).

Sam

Kevin Bourrillion

Jul 6, 2009, 3:27:14 PM
to Jared Levy, Jim Andreou, Sam Berlin, Google Collections Library - users list
The empirically measured memory usage of ImmutableMap is

  8 * tablesize + 5.6 * size + 80

Whereas that of HashMap -- when also using 25-50% full as the guideline -- is

  4 * tablesize + 24 * size + 64

Considering that the median tablesize should be 2.8 * size, the net effect is that an ImmutableMap should be about 20% smaller.  Of course, that would make it 30-40% smaller than LinkedHashMap, whose behavior it supports (not going to run those numbers right now).
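Plugging the assumed median tablesize of 2.8 * size into the two formulas confirms the rough 20% figure; a quick arithmetic check, using only the empirical constants quoted above:

```java
// Byte-count formulas measured above; the constant terms barely matter at scale.
public class MapMemoryFormulas {
    public static double immutableMapBytes(double size, double tablesize) {
        return 8 * tablesize + 5.6 * size + 80;
    }

    public static double hashMapBytes(double size, double tablesize) {
        return 4 * tablesize + 24 * size + 64;
    }

    public static void main(String[] args) {
        double size = 10000;
        double tablesize = 2.8 * size;  // assumed median table size
        double ratio = immutableMapBytes(size, tablesize) / hashMapBytes(size, tablesize);
        System.out.println(ratio);  // ~0.80, i.e. ImmutableMap about 20% smaller
    }
}
```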



Kevin Bourrillion

Jul 6, 2009, 3:33:51 PM
to Jared Levy, Jim Andreou, Sam Berlin, Google Collections Library - users list
Whoops, I expressed those as byte counts on a 32-bit VM; for the record, I should express them as reference counts, just as I did with ImmutableSet above.

HashSet:       6 * size + tablesize + 7        (roughly 8.8 * size)
ImmutableSet:  size + tablesize + 4            (roughly 3.8 * size)

HashMap:       6 * size + tablesize + 16       (roughly 8.8 * size)
ImmutableMap:  1.4 * size + 2 * tablesize + 20 (roughly 7 * size)

(Of course, it's no accident that HashSet and HashMap are the same, or that the Immutable win is so much greater with sets.)

Kevin Bourrillion

Jul 6, 2009, 3:38:44 PM
to Sam Berlin, Jim Andreou, Jared Levy, Google Collections Library - users list
Yes, this can certainly be revisited.  We opted to take the simplest approach we could until we had stronger data supporting a different calculation.

Sam Berlin

Jul 6, 2009, 3:43:31 PM
to Kevin Bourrillion, Jared Levy, Jim Andreou, Google Collections Library - users list
To offer one suggestion that is very likely something you've already thought of and dismissed for good reason...  is it possible for ImmutableMap to reduce the tablesize requirements by storing the Entry in the table, as opposed to the key & value separately?  The Entry[] is already stored (for iteration order), so keeping an Entry allocated adds no extra space.  The difference would be limited to retrieving the key by an array index plus a getter (as opposed to currently just an array index), and the value by a getter (as opposed to currently an array index).

From the looks of it, this would reduce the space requirements by a factor of 2 without any meaningful impact on speed.

Sam
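Sam's suggestion might look something like this (a hypothetical sketch with invented names, not the actual ImmutableMap code): the table holds the already-allocated entries, so keys and values need no separate slots, at the cost of one getKey() call per probe.

```java
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical sketch: the hash table stores the Entry objects themselves,
// halving the table's slot requirements versus separate key/value slots.
public class EntryTableSketch {
    private final Map.Entry<Object, Object>[] table;

    @SuppressWarnings("unchecked")
    public EntryTableSketch(int tableSize) {
        table = new Map.Entry[tableSize];  // tableSize must be a power of two
    }

    public void put(Object key, Object value) {
        int i = key.hashCode() & (table.length - 1);
        while (table[i] != null) {
            i = (i + 1) & (table.length - 1);  // linear probe on collision
        }
        table[i] = new AbstractMap.SimpleImmutableEntry<Object, Object>(key, value);
    }

    public Object get(Object key) {
        for (int i = key.hashCode() & (table.length - 1);
                table[i] != null; i = (i + 1) & (table.length - 1)) {
            // one extra getter call per probe compared to a raw key array
            if (table[i].getKey().equals(key)) {
                return table[i].getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        EntryTableSketch m = new EntryTableSketch(8);
        m.put("a", 1);
        System.out.println(m.get("a"));  // prints 1
    }
}
```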

Kevin Bourrillion

Jul 6, 2009, 3:47:49 PM
to Jared Levy, Jim Andreou, Sam Berlin, Google Collections Library - users list
FYI, here is the admittedly very clumsy and overkill-y, but effective, code that I used to make the measurements. Run it with -Xms2g or you'll get garbage results.  Then I painstakingly studied the output by hand to come up with the simple formulas.

import com.google.common.collect.*;
import java.util.*;
import java.util.concurrent.CountDownLatch;

public class MapMemory {
  private static final Runtime rt = Runtime.getRuntime();

  public static void main(String[] args) throws Exception {
    Map<Integer, String>[] holders = new Map[10];
    Integer[] integers = new Integer[200000];
    for (int i = 0; i < 200000; i++) {
      integers[i] = new Integer(i);
    }
    List<Integer> list = Arrays.asList(integers);
    Collections.shuffle(list);

    for (int count = 0 ; count < 3; count++) {
      for (int size : Arrays.asList(160000, 140000, 120000, 100000, 80000, 60000, 50000, 40000, 30000, 25000, 20000, 18000, 16000, 14000, 12000, 10000, 9000, 8000, 7000, 6000)) {
        for (int i = 0; i < 10; i++) {
          holders[i] = Maps.newHashMapWithExpectedSize(size);
          for (Integer intagar : list.subList(0, size)) {
            holders[i].put(intagar, "");
          }
        }
        // ... then try it this way instead ...
        //      ImmutableMap.Builder<Integer, String> builder = ImmutableMap.builder();
        //      for (Integer intagar : list.subList(0, size)) {
        //        builder.put(intagar, "");
        //      }
        //      for (int i = 0; i < 10; i++) {
        //        holders[i] = builder.build();
        //      }
        //      builder = null;

        gcTillYouDrop();
        long mem = rt.freeMemory();

        for (int i = 0; i < 10; i++) {
          holders[i] = null;
        }
        gcTillYouDrop();
        long diff = rt.freeMemory() - mem;
        System.out.format("%s - %s%n", size, diff / 10);
      }
      System.out.println();
    }
  }

  // Standard hyper-paranoid "gc everything" procedure from Martin Buchholz
  static void gcTillYouDrop() {
    rt.gc();
    rt.runFinalization();
    final CountDownLatch latch = new CountDownLatch(1);
    new Object() {
      protected void finalize() {
        latch.countDown();
      }
    };
    rt.gc();
    try {
      latch.await();
    }
    catch (InterruptedException ie) {
      throw new Error(ie);
    }
  }
}

Jim Andreou

Jul 6, 2009, 3:52:50 PM
to Kevin Bourrillion, Jared Levy, Sam Berlin, Google Collections Library - users list
I could be wrong again, but I can't reproduce this claim: for HashMap with load factor 0.5f, I get ImmutableMap to be 112% to 129% of HashMap (with spikes to 162% for size() == 512, 1024, 2048, 4096, etc.; presumably HashMap resizes just one step later than ImmutableMap).

I don't know. Perhaps my tool is buggy; too tired to check right now. (Note, though, that I don't try to infer memory usage from heap usage; I directly measure the reachable objects.) I could make it print out the cost of each part of the structures to be sure, perhaps later.



Kevin Bourrillion

Jul 6, 2009, 4:00:37 PM
to Jared Levy, Jim Andreou, Sam Berlin, Google Collections Library - users list
There's a problem with my description of HashFoo as taking "roughly 8.8 * size" memory.

The problem is that that result depends on the assumption that you're sizing the table to be 25-50% full -- I made that assumption so this could be an "apples to apples" comparison.

But it's not an apples-to-apples comparison, because the design of HashFoo (which uses chaining) is more performance-tolerant of collisions than the design of ImmutableFoo (which uses open addressing with linear probing).

So it's fair to size HashFoos a little smaller, meaning that the memory comparison of HashMap and ImmutableMap is probably a wash.  Of course, ImmutableSet is still a strong win, memory-wise.

(We really should undertake some proper benchmarking of the maps at various sizes and densities.  ImmutableMap can almost certainly be made faster than it is.)
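The collision strategy Kevin attributes to the Immutable collections can be illustrated with a minimal linear-probe lookup (an invented sketch, not the real implementation):

```java
// Minimal open-addressing (linear-probe) set: colliding elements step forward
// through a flat, sparse array instead of chaining through per-entry nodes.
public class LinearProbeSketch {
    private final Object[] table;

    public LinearProbeSketch(Object[] elements, int tableSize) {
        table = new Object[tableSize];  // tableSize: power of two, kept sparse
        for (Object e : elements) {
            int i = e.hashCode() & (tableSize - 1);
            while (table[i] != null) {
                i = (i + 1) & (tableSize - 1);  // probe the next slot on collision
            }
            table[i] = e;
        }
    }

    public boolean contains(Object o) {
        for (int i = o.hashCode() & (table.length - 1);
                table[i] != null; i = (i + 1) & (table.length - 1)) {
            if (table[i].equals(o)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LinearProbeSketch s = new LinearProbeSketch(new Object[] {"a", "b", "c"}, 8);
        System.out.println(s.contains("b"));  // true
        System.out.println(s.contains("z"));  // false
    }
}
```

Open addressing spends no memory on per-entry nodes, but probe runs lengthen quickly as the table fills, which is why such a table is kept sparser than a chained one.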

Kevin Bourrillion

Jul 6, 2009, 4:06:01 PM
to Jim Andreou, Jared Levy, Sam Berlin, Google Collections Library - users list
Let me know what you get running my code and/or what you conclude from reading my code.

My code is using Maps.newHashMapWithExpectedSize(), which itself is subject to the same situation described earlier (it currently does the simplest thing of sizing to be 25-50% full, pending the day when we come back and study what the right default behavior really is).  I see now that this is inappropriate, since HashMap's use of closed addressing / open hashing / whatever you call it makes it more robust to collisions.

By the way, when considering the cost of collisions, we do have to take into account that some equals() methods of user objects are slow, and make collisions more costly across the board.

Jim Andreou

Jul 6, 2009, 4:43:12 PM
to Kevin Bourrillion, Jared Levy, Sam Berlin, Google Collections Library - users list
Hey Kevin,

I'm attaching what I got by printing out the internals of the structures, for maps of size 7 and of size 8. (This is meant as a sanity check). As mentioned, powers of two are unfair to ImmutableMap, since HashMap resizes one step later (the text illustrates it).

I used a HashMap with load factor 0.5f (that is exactly 25-50% full, right?).

First it explores the ImmutableMap and accounts for its objects, reporting the cost of each thing along with a running total that adds them up. Then the HashMap.

Even though HashMap has more expensive nodes (24 bytes vs 16), its Entry[] table is only of size 16 (80 bytes), while ImmutableMap contains an Object[] of size 64. Did I do something wrong? 64 is too big for 8 keys + 8 values. It looks like the ImmutableMap has a load factor of just 0.25 (i.e. 12.5 to 25% full).

I hope there is no glaring mistake again... (sorry, didn't try your code yet)

Dimitris

maps_size8.txt
maps_size7.txt

Jim Andreou

Jul 6, 2009, 4:46:35 PM
to Kevin Bourrillion, Jared Levy, Sam Berlin, Google Collections Library - users list
It looks like the ImmutableMap has a load factor of just 0.25 (i.e. 12.5 to 25% full).

Most probably this is the issue. I tried using a HashMap with load factor 0.25 as well, and then it becomes slightly more costly than ImmutableMap, just as you said earlier (ImmutableMap is about ~90% of HashMap). (Except for sizes that are powers of two, where ImmutableMap doubles while HashMap does not, so the balance there is inverted.)

Kevin Bourrillion

Jul 6, 2009, 5:10:28 PM
to Jim Andreou, Jared Levy, Sam Berlin, Google Collections Library - users list


On Mon, Jul 6, 2009 at 1:46 PM, Jim Andreou <jim.a...@gmail.com> wrote:
It looks like the ImmutableMap has a load factor of just 0.25 (i.e. 12.5 to 25% full).


What's probably confusing the issue here is that ImmutableMap uses a single array to store alternating keys and values.  So if you have 6 entries, the array size will end up being 32.
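That interleaved layout also explains the arithmetic: 6 keys sized to 25-50% full gives 16 key positions, doubled to 32 slots by interleaving the values. A hypothetical sketch of a single [k0, v0, k1, v1, ...] table (invented names, not the real internals):

```java
// Illustrative sketch: keys sit at even indices, each value right after its key.
public class AlternatingArraySketch {
    private final Object[] table;  // length is 2 * (a power of two)

    public AlternatingArraySketch(int tableSize) {
        table = new Object[tableSize];
    }

    public void put(Object key, Object value) {
        int i = (key.hashCode() & (table.length / 2 - 1)) * 2;
        while (table[i] != null) {
            i = (i + 2) % table.length;  // probe the next key slot
        }
        table[i] = key;
        table[i + 1] = value;
    }

    public Object get(Object key) {
        for (int i = (key.hashCode() & (table.length / 2 - 1)) * 2;
                table[i] != null; i = (i + 2) % table.length) {
            if (table[i].equals(key)) {
                return table[i + 1];  // the value sits right after its key
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // 6 entries -> 16 key positions (25-50% full) -> 32 array slots
        AlternatingArraySketch m = new AlternatingArraySketch(32);
        String[] keys = {"a", "b", "c", "d", "e", "f"};
        for (int i = 0; i < keys.length; i++) {
            m.put(keys[i], i);
        }
        System.out.println(m.get("c"));  // prints 2
    }
}
```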

Sam Berlin

Jul 6, 2009, 5:19:43 PM
to Jim Andreou, Kevin Bourrillion, Jared Levy, Google Collections Library - users list
If this is indeed the issue, where a 0.25 load factor ends up with ImmutableMap being smaller, I would say that the comparison is tweaking the factors to be in favor of ImmutableMap.  Yes, strictly speaking it is an "all-things-being-equal" comparison, but it's one that's unfair to HashMap: with a HashMap I have the ability to say "don't waste space, I know I'll have X items forever", and I don't have that ability with ImmutableMap (despite it implicitly knowing it will have X items forever).

Sam