I was allocating a big temporary_buffer and noticed that Seastar did not let me use most of my memory.
In short, the free memory reported by seastar::memory::stats().free_memory() is roughly my total memory divided by the shard count, but the largest buffer I can allocate is that value divided by the shard count again. With 16G of memory on my machine, I can allocate 4G with one shard (one core), 2G with two shards, 2G with three shards, and 1G with four shards.
Expected behavior: I can allocate a buffer as large as the size reported by free_memory(), i.e. about my total memory divided by the shard count.
Actual behavior: I can only allocate a buffer of about my total memory divided by the square of the shard count.
Can you try with 8 and 16 shards? You might need --overprovisioned.
The test program can be found at https://github.com/chulup/ext_sort/blob/master/src/memory_test.cpp; here are the results of runs with different numbers of shards:
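The linked test program is not reproduced here. Below is a hedged sketch of one way to measure "the largest buffer I can allocate". The hypothetical helper largest_allocatable() binary-searches for the largest size accepted by a caller-supplied allocation attempt. In a Seastar program that callback might construct a temporary_buffer<char> of the given size and catch std::bad_alloc. The helper name, the callback, and the assumption that success is monotonic (if n bytes fit, any smaller size fits too) are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper: returns the largest size in [0, upper] for which
// try_alloc(size) succeeds. Assumes success is monotonic in size and that
// try_alloc frees whatever it allocated before returning.
std::size_t largest_allocatable(const std::function<bool(std::size_t)>& try_alloc,
                                std::size_t upper) {
    assert(try_alloc);       // a probe callback must be supplied
    std::size_t lo = 0;      // invariant: size lo is treated as allocatable
    std::size_t hi = upper;  // invariant: every size above hi fails
    while (lo < hi) {
        // Midpoint rounded up so lo always advances; written this way to
        // avoid overflow even when upper is SIZE_MAX.
        std::size_t mid = lo + (hi - lo) / 2 + (hi - lo) % 2;
        if (try_alloc(mid)) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return lo;
}
```

Because each probe costs one allocation attempt, the search needs only about log2(upper) attempts. Each shard could run it independently to report its own limit.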
You can use the "scylla memory" gdb command from https://github.com/scylladb/scylla/blob/master/scylla-gdb.py (we should move the seastar stuff into seastar.git) to see how memory was fragmented.
In general, you should not rely on large allocations, as they cannot work reliably in a long-running server, but it would be nice to be able to allocate almost all of memory at startup. However, I think you hit an edge case:
- a 4GB shard actually has slightly less than 4GB, because some memory is reserved for the OS
- some memory is already allocated by Seastar itself, at lower addresses
So the memory map looks like this:
[allocated slabs] [free slabs up to 1GB boundary] [1GB slab] [1GB slab] [free slabs up to 4GB-epsilon boundary]
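The following is a minimal model of how such a map can arise. It assumes a buddy-style allocator in which every free block is a power of two in size and is aligned to its own size. The hypothetical buddy_free_blocks() greedily splits a free range into the largest such blocks. Offsets are in MiB relative to the shard start. The 64 MiB allocated prefix and the 64 MiB "epsilon" used in the example are made-up values, not Seastar's actual figures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One free block; offset and size are in MiB, relative to the shard start.
struct Block {
    std::uint64_t offset;
    std::uint64_t size;
};

// Largest power of two that is <= n (requires n > 0).
std::uint64_t floor_pow2(std::uint64_t n) {
    std::uint64_t p = 1;
    while (p <= n / 2) p *= 2;
    return p;
}

// Split the free range [start, end) into blocks whose size is a power of two
// and whose offset is a multiple of that size. At each step, take the largest
// block allowed both by the alignment of `start` and by the remaining length.
std::vector<Block> buddy_free_blocks(std::uint64_t start, std::uint64_t end) {
    assert(start <= end);
    std::vector<Block> blocks;
    while (start < end) {
        std::uint64_t by_size = floor_pow2(end - start);
        // start & (~start + 1) isolates the lowest set bit of start, i.e. the
        // largest power of two dividing it; offset 0 is aligned to anything.
        std::uint64_t by_align = (start == 0) ? by_size : (start & (~start + 1));
        std::uint64_t size = by_align < by_size ? by_align : by_size;
        blocks.push_back({start, size});
        start += size;
    }
    return blocks;
}
```

buddy_free_blocks(64, 4032) yields 64, 128, 256 and 512 MiB blocks up to the 1GB boundary. It then yields two 1GB blocks at offsets 1024 and 2048, followed by 512, 256, 128 and 64 MiB blocks up to 4032. This is the same shape as the map, and it contains no 2GB block anywhere.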
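Seastar has used buddy allocation since commit 33d8f74fc83a12601618a4a9fa7ad1f3a9955c73, which means a 2GB allocation has to be 2GB-aligned. The two free 1GB slabs together cover a contiguous 2GB range starting at the 1GB offset, but that range is not 2GB-aligned. The only 2GB-aligned candidates, [0, 2GB) and [2GB, 4GB), respectively overlap Seastar's own allocations and run past the 4GB-epsilon end of the shard. So a 4GB-epsilon shard has no slab that can satisfy a 2GB allocation.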
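The same alignment rule can be used to check the numbers from the original report. The hypothetical largest_buddy_block() returns the largest power-of-two block that is aligned to its own size and fits between the end of the allocated prefix and the end of the shard. For illustration only, assume that 512 MiB of the 16G is held back for the OS and that Seastar's own allocations occupy the first 64 MiB of each shard. Under those assumptions the model predicts 4G, 2G, 2G and 1G for one to four shards, matching the reported results. The real reserve and prefix sizes will differ.

```cpp
#include <cassert>
#include <cstdint>

// Largest block (MiB) that is a power of two, aligned to its own size, and
// lies entirely inside the free range [prefix, shard_end) of one shard.
std::uint64_t largest_buddy_block(std::uint64_t prefix, std::uint64_t shard_end) {
    assert(prefix <= shard_end);
    std::uint64_t best = 0;
    for (std::uint64_t p = 1; p <= shard_end; p *= 2) {
        // First offset at or above prefix that is a multiple of p.
        std::uint64_t slot = (prefix + p - 1) / p * p;
        if (slot + p <= shard_end) best = p;  // p grows, so best ends up largest
    }
    return best;
}
```

With usable = 16384 - 512 MiB, largest_buddy_block(64, usable / n) gives 4096, 2048, 2048 and 1024 for n = 1, 2, 3, 4. The limit tracks how the shard size falls between powers of two rather than free_memory() itself.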
Try running with --memory 9G --smp 2 or --memory 13G --smp 3, and you should see 2GB allocations succeed.
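The suggested --memory values can be sanity-checked with a simple model. Assume each shard receives exactly memory/smp with nothing held back. Then a 2GB block needs a shard at least as large as the first 2GB-aligned offset above the allocated prefix, plus 2GB. That comes to 4GB whenever the prefix is smaller than 2GB. The helper names and the 64 MiB prefix below are illustrative assumptions, not Seastar code.

```cpp
#include <cassert>
#include <cstdint>

// Smallest per-shard size (MiB) that leaves room for an aligned free block of
// `block` MiB when the first `prefix` MiB of the shard are already in use.
std::uint64_t min_shard_size(std::uint64_t block, std::uint64_t prefix) {
    assert(block > 0 && (block & (block - 1)) == 0);  // block must be a power of two
    std::uint64_t slot = (prefix + block - 1) / block * block;  // first aligned offset >= prefix
    return slot + block;
}

// Smallest total memory (MiB) for `smp` shards, assuming an even split with
// no reserve taken out.
std::uint64_t min_total_memory(std::uint64_t block, std::uint64_t prefix, std::uint64_t smp) {
    return min_shard_size(block, prefix) * smp;
}
```

Here min_total_memory(2048, 64, 2) is 8192 MiB and min_total_memory(2048, 64, 3) is 12288 MiB. So 9G and 13G both leave some headroom over the 8G and 12G minimums, while 16G across four shards does not.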
--
You received this message because you are subscribed to the Google Groups "seastar-dev" group.
Visit this group at https://groups.google.com/group/seastar-dev.
To view this discussion on the web visit https://groups.google.com/d/msgid/seastar-dev/81def54c-5b64-4d13-a517-5f1515af4ca7%40googlegroups.com.
The original report was posted by Michael Shulbaev on 24/10/2018 10.17.