Hi,
We recently introduced a new, ZeroMQ-based distributed system for bquery (see
https://github.com/visualfabriq/bqueryd). In short: we chunk large bcolz files into smaller 'shards' (normally 1 million records each) and divide those over servers, creating a massively parallel querying system. We have tables with over 3 billion records, which means over 3000 shards spread across a series of servers, where all the work is done in parallel.
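For illustration, the sharding arithmetic above can be sketched like this (a minimal sketch; the actual bqueryd distribution logic is more involved, and the round-robin placement and `shard_size` default here are my assumptions):

```python
# Sketch of the sharding arithmetic: split a table of n_rows into
# ~1-million-record shards and spread them over servers round-robin.
# The real bqueryd placement logic may well differ.

def shard_ranges(n_rows, shard_size=1_000_000):
    """Return (start, stop) row ranges, one per shard."""
    return [(start, min(start + shard_size, n_rows))
            for start in range(0, n_rows, shard_size)]

def assign_shards(shards, servers):
    """Assign shards to servers round-robin."""
    placement = {server: [] for server in servers}
    for i, shard in enumerate(shards):
        placement[servers[i % len(servers)]].append(shard)
    return placement

shards = shard_ranges(3_000_000_000)  # a 3-billion-record table
print(len(shards))                    # -> 3000 shards
```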
We are putting the last touches on it, but in general it has gone live and is working well. (We also entered a CfP for PyData Amsterdam with this; let's see if they are interested.)
Now we run on AWS on c4.2xlarge instances, which means 8 CPUs and 8000 IOPS with a 256 kB IO block size. From a previously CPU-limited operation we have now moved to being IO-limited (meaning we are a lot faster than before, but this bottleneck still leaves a lot of room for improvement).
So I think we have reached the point where we should no longer rely on "expectedlen" alone, but instead start fine-tuning chunklen and the compression settings. This is a classic "communicating vessels" problem though, so I wondered if anyone else has experience here? I will try to share ours in some follow-up posts here if people are interested.
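To make the "communicating vessels" point concrete, here is a stdlib-only illustration of the tradeoff (zlib stands in for the Blosc codecs bcolz actually uses, and the synthetic int64 column is made up, but the shape of the tradeoff is the same): turning up the compression level shrinks the on-disk chunk at the cost of more CPU work.

```python
# Higher compression level -> smaller on-disk chunk, more CPU spent.
# zlib is only a stand-in here; bcolz uses Blosc under the hood.
import array
import zlib

# Synthetic int64 column: monotonically increasing values compress well.
raw = array.array('q', range(1_000_000)).tobytes()  # 8 MB uncompressed

for level in (1, 5, 9):
    size = len(zlib.compress(raw, level))
    print(f"zlib level {level}: {size / 1024:.0f} kB compressed")
```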
In short:
- We want to make full use of both CPU and disk, so that neither sits idle while the other is the bottleneck
- The latest bquery can push everything except the aggregation result itself out-of-core, so we run with a very low memory profile -> memory is not an issue
- CPU is now underused, with disk IO being the main bottleneck
- After compression we want to end up with a chunk size that is ideally just under 256 kB. It is more important to be under the limit than close to it, because exceeding the limit doubles the IO count
- Ideally we would have a variable chunklen that is set by testing during writing/compression, but 1) that is not supported now and 2) it would make writing slower (normally not really an issue, as we write WORM-style: once, then read many times)
- So we need to adjust chunklen so that the compressed chunk file is most often near, yet under, 256 kB (say, under it 90% of the time)
- We have ctables with mixed int64 and float64 columns, which complicates matters further because the columns can have different on-disk sizes and compression ratios
- After this we might still be IO-limited, but it should be a lot less than before (I will do some research into the current average on-disk size of our chunks)
- If we are still IO-limited, we can turn up compression so we offload more to the CPU and less to the file system
- That would, however, also mean changing chunklen again to stay near 256 kB chunks
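As a starting point for the tuning described above, one could estimate chunklen from a sample of real rows: measure compressed bytes per row, then size the chunk with a safety margin so most chunks land just under 256 kB. This is a minimal sketch; zlib again stands in for Blosc, and the 0.9 margin and clevel are guesses that would need tuning against the "90% under the limit" target.

```python
# Rough chunklen estimator: compress a representative sample, derive
# compressed bytes per row, and back out how many rows fit in one
# 256 kB IO block with a safety margin. The margin (0.9) is a guess
# to be tuned so ~90% of real chunks stay under the limit.
import array
import zlib

TARGET = 256 * 1024  # IO block size in bytes

def estimate_chunklen(sample_bytes, sample_rows, clevel=5, margin=0.9):
    """Suggest a chunklen from a sample of raw (uncompressed) row data."""
    ratio = len(zlib.compress(sample_bytes, clevel)) / len(sample_bytes)
    raw_per_row = len(sample_bytes) / sample_rows   # uncompressed bytes/row
    compressed_per_row = raw_per_row * ratio        # compressed bytes/row
    return int(TARGET * margin / compressed_per_row)

# Example with a synthetic int64 column of 100k rows:
sample = array.array('q', range(100_000)).tobytes()
chunklen = estimate_chunklen(sample, 100_000)
print(chunklen)
```

The suggested value would then presumably be passed to bcolz along the lines of `bcolz.carray(data, chunklen=chunklen, cparams=bcolz.cparams(clevel=5))`; in practice the int64 and float64 columns would each want their own sample, since their compression ratios differ.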
Anyone have experience with this? Or else, interested in the results?
BR,
Carst