On Thu, 22 Jul 2021 10:19:11 +0200:
> On the other hand, I will compare the achieved data rates. What
> performance impact (loss/gain of (perceived) peak transfer rate) do you
> measure with native caching compared to buffered mode?
OK, so I applied the change to our 7.1.5 client. This is the result
(the cluster is busy, so the uncached data rate is moderate).
$ for bs in 512 4K 16K 256K 1M; do
    echo "======== bs=$bs =========="
    for d in work work_cached; do
      echo 3 >/proc/sys/vm/drop_caches
      echo "***** $d *******"
      for n in 1 2 3; do dd if=$d/testzero bs=$bs of=/dev/null; done
    done
  done
======== bs=512 ==========
***** work *******
8388608+0 records in
8388608+0 records out
4294967296 bytes (4.3 GB) copied, 21.307 s, 202 MB/s
8388608+0 records in
8388608+0 records out
4294967296 bytes (4.3 GB) copied, 21.8692 s, 196 MB/s
8388608+0 records in
8388608+0 records out
4294967296 bytes (4.3 GB) copied, 22.3251 s, 192 MB/s
***** work_cached *******
8388608+0 records in
8388608+0 records out
4294967296 bytes (4.3 GB) copied, 99.4572 s, 43.2 MB/s
8388608+0 records in
8388608+0 records out
4294967296 bytes (4.3 GB) copied, 8.21897 s, 523 MB/s
8388608+0 records in
8388608+0 records out
4294967296 bytes (4.3 GB) copied, 8.19792 s, 524 MB/s
======== bs=4K ==========
***** work *******
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 9.82118 s, 437 MB/s
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 10.997 s, 391 MB/s
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 10.5302 s, 408 MB/s
***** work_cached *******
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 91.0911 s, 47.2 MB/s
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 1.39473 s, 3.1 GB/s
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 1.31659 s, 3.3 GB/s
======== bs=16K ==========
***** work *******
262144+0 records in
262144+0 records out
4294967296 bytes (4.3 GB) copied, 10.5552 s, 407 MB/s
262144+0 records in
262144+0 records out
4294967296 bytes (4.3 GB) copied, 12.229 s, 351 MB/s
262144+0 records in
262144+0 records out
4294967296 bytes (4.3 GB) copied, 10.8311 s, 397 MB/s
***** work_cached *******
262144+0 records in
262144+0 records out
4294967296 bytes (4.3 GB) copied, 90.2333 s, 47.6 MB/s
262144+0 records in
262144+0 records out
4294967296 bytes (4.3 GB) copied, 0.711975 s, 6.0 GB/s
262144+0 records in
262144+0 records out
4294967296 bytes (4.3 GB) copied, 0.649651 s, 6.6 GB/s
======== bs=256K ==========
***** work *******
16384+0 records in
16384+0 records out
4294967296 bytes (4.3 GB) copied, 9.44409 s, 455 MB/s
16384+0 records in
16384+0 records out
4294967296 bytes (4.3 GB) copied, 8.92695 s, 481 MB/s
16384+0 records in
16384+0 records out
4294967296 bytes (4.3 GB) copied, 7.98414 s, 538 MB/s
***** work_cached *******
16384+0 records in
16384+0 records out
4294967296 bytes (4.3 GB) copied, 88.6168 s, 48.5 MB/s
16384+0 records in
16384+0 records out
4294967296 bytes (4.3 GB) copied, 0.551815 s, 7.8 GB/s
16384+0 records in
16384+0 records out
4294967296 bytes (4.3 GB) copied, 0.487542 s, 8.8 GB/s
======== bs=1M ==========
***** work *******
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 26.9552 s, 159 MB/s
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 24.0293 s, 179 MB/s
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 23.6395 s, 182 MB/s
***** work_cached *******
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 25.2812 s, 170 MB/s
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 20.2517 s, 212 MB/s
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 25.3047 s, 170 MB/s
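Before interpreting the numbers, one caveat about my own method: writing 3 to
drop_caches only discards clean pages, so the usual refinement is to sync
first. It should not matter much here, since the test file is only ever read,
but it keeps the cold-read figures honest:

$ sync; echo 3 >/proc/sys/vm/drop_caches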
So I can see the cache working nicely once the data has been read. For the 1M
block size the cache stays inactive, presumably by design, since that matches
the cache block size.
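If that presumption is right, a quick bisection over intermediate block sizes
should show exactly where caching stops kicking in (the sizes below are
guesses; the real threshold is whatever cache block size the client is
configured with):

$ for bs in 256K 512K 768K 1M; do
    echo 3 >/proc/sys/vm/drop_caches
    echo "----- bs=$bs -----"
    dd if=work_cached/testzero bs=$bs of=/dev/null   # cold read, populates the cache
    dd if=work_cached/testzero bs=$bs of=/dev/null   # fast only if this bs is still cached
  done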
What is very consistent, though, is that the first read, the one that
populates the cache, is extremely slow at below 50 MB/s, compared to the
direct (uncached) read rate of roughly 200 to 500 MB/s.
There must be some fairly obvious bottleneck in the management of the cache
blocks. Any chance of identifying it? Paying with a factor of four or more on
the initial transfer rate is a rather hefty tradeoff.
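In case it helps, here is how I would try to pin it down on our side, assuming
perf is usable on the client (a system-wide profile on a busy node will also
catch unrelated load, so the result is only a hint):

$ echo 3 >/proc/sys/vm/drop_caches
$ perf record -a -g -- dd if=work_cached/testzero bs=4K of=/dev/null
$ perf report --sort dso,symbol | head -40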
Alrighty then,