If those are serialized String objects, then I'm seeing the following mismatch between measurement and calculation:
The graph I'm using has 4.9 million nodes, each of which has 40 string properties of 16 characters each. It has 70 million directed edges, each of which has 1 string property of 140 characters.
Assuming JVM String objects incur a 2x overhead, the total in-memory size of these properties is: (40*16*4.9*10^6 + 70*10^6*140) * 2 / 2^30 = 24 GB. This roughly matches the on-disk footprint:
10G neostore.propertystore.db
17G neostore.propertystore.db.strings
So I think this matches Chris' explanation well (these two store files are serialized String objects).
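For reference, here is the same back-of-the-envelope estimate as a small Python sketch. The constant and function names are just illustrative; the one-byte-per-character figure and the flat 2x String overhead are assumptions of the estimate, not measured values.

```python
# Rough estimate of the in-memory size of all string properties.
# Assumptions (not measured): 1 byte per character before overhead,
# and a flat 2x overhead for JVM String objects.
NODES = 4_900_000          # 4.9 million nodes
NODE_PROPS = 40            # string properties per node
NODE_PROP_CHARS = 16       # characters per node property
EDGES = 70_000_000         # 70 million directed edges
EDGE_PROP_CHARS = 140      # characters in the single edge property
JVM_OVERHEAD = 2           # assumed String overhead factor


def estimate_bytes(overhead=JVM_OVERHEAD):
    """Return the estimated property size in bytes."""
    node_bytes = NODES * NODE_PROPS * NODE_PROP_CHARS
    edge_bytes = EDGES * EDGE_PROP_CHARS
    return (node_bytes + edge_bytes) * overhead


def to_gib(num_bytes):
    """Convert bytes to GiB (2^30 bytes)."""
    return num_bytes / 2**30


print(f"{to_gib(estimate_bytes()):.1f} GiB")  # prints "24.1 GiB"
```

Changing `JVM_OVERHEAD` shows how sensitive the total is to the per-String overhead assumption.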
However, after this warmup [1] to load the whole graph including node & relationship properties, the JVM heap memory usage is: Max 68.6 GB, Allocated 67.6 GB, Used 55.9 GB.
Where does this mismatch (56 GB used vs. < 30 GB estimated) come from? What's wrong with my calculation or understanding? It cannot be the other stores (node / relationship), since `du -shc *store.db*` reports 29 GB total on disk, 27 GB of which are the property stores.
Any help would be appreciated!
Zongheng