byte arrays in memory

24 views

ConcurrentInvertedRadixTreearraybyteconsumptionmemory

Skip to first unread message

Alper Sağlam

unread,

Jul 30, 2017, 3:27:09 PM7/30/17

to concurrent-trees-discuss

Hi,

When I had done some trials with the concurrent trees I saw the byte array that take a lot of spaces that I expected. Can you help me to explain the memory usage of those byte array according to the code that I pasted below

public class TreeCreator {

    public ConcurrentInvertedRadixTree createTree() {
        ConcurrentInvertedRadixTree tree = new ConcurrentInvertedRadixTree(new DefaultByteArrayNodeFactory());

        for(int i = 0; i < 10000000; i++){
            StringBuilder builder = new StringBuilder("alper");
            builder.append(i);
            builder.trimToSize();
            tree.put(builder, VoidValue.SINGLETON);
        }

        return tree;
    }
}

public class App
{
    public static void main( String[] args )
    {
        TreeCreator creator = new TreeCreator();
        ConcurrentInvertedRadixTree tree = creator.createTree();

        Scanner scan = new Scanner(System.in);
        scan.nextInt();
    }
}

Here this is what I don't understand. In your implementations you are keeping byte arrays in incomingedge arrays and outgoing edge arrays as bytes, do we really need to keep another byte array ourside of ByteArrayNodeLeafValue and ByteArrayNodeNoneLeafValue.

Here is a memory snapshot after garbage collection:

Niall Gallagher

unread,

Jul 30, 2017, 7:48:01 PM7/30/17

to concurrent-t...@googlegroups.com

Hi Alper,

I don't understand what you mean. It's an intrinsic part of radix trees to store character data at any location in the tree. In fact the goal is to store common prefixes within the tree, and to store trailing suffixes in the leaves.

In the current implementation the bytes for each edge is only stored once.

In your example you create 10 million unique strings. Because these are unique strings, right away this means we should expect at least 10 million arrays. You can expect that these trees may reduce the lengths of the arrays stored, but not necessarily the number of arrays. They can reduce the number of arrays too, but this only occurs when strings are not unique. (This applies when ByteArrayNodeFactory is used. It's a different case when CharSequenceNodeFactory is used.)

In your example, those strings share a common prefix "alper". This shared prefix will be stored once. There will be many other shared prefixes after "alper" between subsets of the strings too, as many of the numbers you append will also share common prefixes. Those shared prefixes will each be stored once. However then the question is where will the bytes for the trailing characters of each string, which do not share prefixes with the other strings be stored? Those trailing characters will be stored in separate byte arrays in leaf nodes.

Forgive me if I've misunderstood your example. I'm happy to help if you could clarify.

Btw if you are looking to reduce memory overhead you could take a look at https://github.com/npgall/concurrent-trees/blob/master/documentation/NodeFactoryAndMemoryUsage.md

Best regards,

Niall

--
-- You received this message because you are subscribed to the "concurrent-trees-discuss" group.
http://groups.google.com/group/concurrent-trees-discuss
---
You received this message because you are subscribed to the Google Groups "concurrent-trees-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to concurrent-trees-discuss+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all

Reply to author

Forward

0 new messages