On 20/12/2023 18:08, Theo wrote:
> Vir Campestris <vir.cam...@invalid.invalid> wrote:
>> This is not the right group for this - but I don't know where is.
>> Suggestions on a postcard please...
>
> I'm crossposting this to comp.arch, where they may have some ideas.
>
<snip>
>
> For 'series length 8B/16B/32B' do you mean 8 bytes? ie 8B is a single 64
> bit word transferred?
>
Yes. My system has a 64 bit CPU and 64 bit main memory.
> What instruction sequences are being generated for the 8/16/32/64 byte
> loops? I'm wondering if the compiler is using different instructions,
> eg using MMX, SSE, AVX to do the operations. Maybe they are having
> different caching behaviour?
>
It runs the same loop every time; only the value of the loop size
changes between runs.
> It would help if you could tell us the compiler and platform you're using,
> including version.
>
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Which of course tells you I'm running Ubuntu!
On 20/12/2023 18:58, MitchAlsup wrote:
>
> Can we see the code ??
>
> Can you present a table of the timing results ??
I've run this with more detailed increments on the line size, but here
are my results for powers of 2.
Size 1 gave 3.82242e+09 bytes/second.
Size 2 gave 3.80533e+09 bytes/second.
Size 4 gave 2.68017e+09 bytes/second.
Size 8 gave 2.33751e+09 bytes/second.
Size 16 gave 2.18424e+09 bytes/second.
Size 32 gave 2.10243e+09 bytes/second.
Size 64 gave 1.99371e+09 bytes/second.
Size 128 gave 1.98475e+09 bytes/second.
Size 256 gave 2.01653e+09 bytes/second.
Size 512 gave 2.00884e+09 bytes/second.
Size 1024 gave 2.02713e+09 bytes/second.
Size 2048 gave 2.01803e+09 bytes/second.
Size 4096 gave 3.26472e+09 bytes/second.
Size 8192 gave 3.85126e+09 bytes/second.
Size 16384 gave 3.85377e+09 bytes/second.
Size 32768 gave 3.85293e+09 bytes/second.
Size 65536 gave 2.06793e+09 bytes/second.
Size 131072 gave 2.06845e+09 bytes/second.
The code will continue, but the results are roughly stable for larger sizes.
I have put the code in a signature block, so that replies do not quote
it all over again. I've commented it, but no doubt not in all the
right places! I'd be interested to know what results other people get.
Thanks
Andy
--
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>
int main()
{
    // If your computer is much slower or faster than mine
    // you may need to adjust this value.
    constexpr size_t NextCount = 1 << 28;
    std::vector<uint64_t> CacheStore(NextCount, 0);
    // Get a raw pointer to the vector.
    // On my machine (Ubuntu, g++) this improves
    // performance. Using vector's operator[]
    // results in a function call.
    uint64_t *CachePtr = &CacheStore[0];
    // SetSize is the count of the uint64_t items to be tested.
    // I assume that when this is too big for a cache the data
    // will overflow to the next level.
    // Each loop it doubles in size. I've run it with smaller
    // increments too, and the behaviour
    // is still confusing.
    // (size_t rather than auto, so the comparison with NextCount
    // does not mix signed and unsigned types.)
    for (size_t SetSize = 1; SetSize < NextCount; SetSize <<= 1)
    {
        size_t loopcount = 0;
        size_t j = NextCount / SetSize;
        auto start = std::chrono::steady_clock::now();
        // The outer loop repeats enough times so that the data
        // written by the inner loops of various sizes is
        // approximately constant.
        for (size_t k = 0; k < j; ++k)
        {
            // The inner loop modifies data
            // within a set of words.
            for (size_t l = 0; l < SetSize; ++l)
            {
                // read-modify-write some data.
                // Comment this out
                // to confirm that the looping is not
                // the cause of the anomaly
                ++CachePtr[l];
                // this counts the actual number
                // of memory accesses.
                // The integer division used for j means
                // that for SetSize values which do not
                // divide NextCount the count is not
                // completely consistent.
                ++loopcount;
            }
        }
        // Work out how long the loops took in microseconds,
        // then scale to seconds
        auto delta =
            std::chrono::duration_cast<std::chrono::microseconds>
            (std::chrono::steady_clock::now() - start).count()
            / 1e6;
        // calculate how many bytes per second, and print.
        std::cout << "Size " << SetSize << " gave "
                  << (double)loopcount * (double)sizeof(uint64_t) /
                     delta << " bytes/second." << std::endl;
    }
    return 0;
}