Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Sieve of Eratosthenes

215 views
Skip to first unread message

Bonita Montero

unread,
Aug 16, 2023, 9:38:10 AM8/16/23
to
I tried to implement the sieve of Eratosthenes as fast as possible.
I didn't think it would make a big difference, but having a bit-wise
vector<bool> instead of a vector<char> makes the code twice as fast
for my AMD 7950X PC and an upper bound of one billion. The additional
overhead for the bit-wise access is less than the additional overhead
of fetching more memory.
I also tried Eulers sieve but as I expected filtering the numbers by
modulo operations with the recent prime numbers takes much more time
than simply resetting a bit; the resulting code becomes multiple
times slower.

#include <iostream>
#include <vector>
#include <charconv>
#include <string_view>
#include <algorithm>
#include <fstream>
#include <cctype>
#include <cstring>

#if defined(_MSC_VER)
#pragma warning(disable: 4996)
#endif

using namespace std;

// Sieve of Eratosthenes up to an upper bound given in argv[1] (decimal or
// 0x-prefixed hex).  The resulting primes are written to argv[2] (default
// "primes.txt"); an explicitly empty argv[2] suppresses output so only the
// sieve itself is benchmarked.
int main( int argc, char **argv )
{
    // USE_BOOL selects the packed vector<bool> sieve (one bit per number);
    // set to false to benchmark the byte-per-number vector<char> variant.
    constexpr bool USE_BOOL = true;
    // Output is buffered and flushed in ~1 MiB chunks.
    constexpr size_t BUF_SIZE = 0x100000;
    try
    {
        size_t max;
        if( argc < 2 )
            return EXIT_FAILURE;
        char const *sizeStr = argv[1];
        bool hex = sizeStr[0] == '0' && (sizeStr[1] == 'x' || sizeStr[1] == 'X');
        sizeStr += 2 * hex;
        // Reject anything that is not a fully consumed number.
        if( from_chars_result fcr = from_chars( sizeStr, sizeStr + strlen( sizeStr ), max, !hex ? 10 : 16 ); (bool)fcr.ec || *fcr.ptr )
            return EXIT_FAILURE;
        using bool_t = conditional_t<USE_BOOL, bool, char>;
        // sieve[i] == true  <=>  i not yet known to be composite.
        vector<bool_t> sieve( max, true );
        for( size_t p = 2; p < max; )
        {
            // Strike out all proper multiples of the current prime ...
            for( size_t m = 2 * p; m < max; m += p )
                sieve[m] = false;
            // ... then advance p to the next still-set (prime) entry.
            while( ++p < max && !sieve[p] );
        }
        // Empty second argument: sieve only, skip the output phase.
        if( argc >= 3 && !*argv[2] )
            return EXIT_SUCCESS;
        ofstream ofs;
        ofs.exceptions( ofstream::failbit | ofstream::badbit );
        ofs.open( argc >= 3 ? argv[2] : "primes.txt", ofstream::trunc );
        // Empty union => vector does not zero-initialize its storage.
        union ndi_t { }; // non-default-initialized
        vector<ndi_t> buf;
        buf.resize( BUF_SIZE + 32 ); // 32 bytes of slack for the last number
        char
            *bufBegin = (char *)buf.data(),
            *bufEnd = bufBegin;
        // FIX: was `i != max`, which never terminates (and indexes out of
        // bounds) for max < 2; `i < max` handles max == 0 and 1 correctly.
        for( size_t i = 2; i < max; ++i )
            if( sieve[i] )
            {
                to_chars_result tcr = to_chars( bufEnd, bufEnd + 32, i );
                (bufEnd = tcr.ptr + 1)[-1] = '\n';
                // Flush once at least BUF_SIZE bytes have accumulated.
                // FIX: cast avoids a signed/unsigned comparison.
                if( size_t(bufEnd - bufBegin) >= BUF_SIZE )
                    // FIX: string_view(pointer, pointer) is C++20-only — the
                    // reason this post "did not compile"; use pointer+length.
                    ofs << string_view( bufBegin, size_t(bufEnd - bufBegin) ),
                    bufEnd = bufBegin;
            }
        ofs << string_view( bufBegin, size_t(bufEnd - bufBegin) );
    }
    catch( bad_alloc const & )
    {
        cout << "out of memory" << endl;
    }
    catch( ios_base::failure const & )
    {
        cout << "I/O error" << endl;
    }
}

I also optimized the file output because simply doing a ofs << i << endl
is multiple times slower than buffering a megabyte of data until pushing
it to the file. The numbers printed to the buffer are directly printed
inside the buffer to make the code more efficient. So if the upper bound
is one billion the time taken to print the numbers to the file is within
the measurement error.

wij

unread,
Aug 16, 2023, 1:57:11 PM8/16/23
to
Confirmed, but your code doesn't compile.

--------------------------------- t.cpp
#include <Wy.stdio.h>
#include <Wy.time.h>
#include <CSCall/VLInt.h>
#include <vector>

using namespace Wy;

// [Syn] Print prime numbers less or equal to n
//
// Algo: sieve
//
// Sieve benchmark variant 1: Wy's VLInt<unsigned int> used as a growable
// bit array.  NOTE(review): despite the name nothing is printed here; the
// function only builds the sieve.  A set bit presumably means "composite"
// (the comment below says bit=0 means prime) — confirm against VLInt docs.
void print_prime1(size_t n)
{
Errno r;
VLInt<unsigned int> barr; // bit=0 =prime

for(size_t i=2; i<=n; ++i) {
if(barr.bit(i)) {
continue;
}
// Cond: i is a prime number
// Mark every proper multiple of i as composite; set_bit reports
// failure through an Errno, which is escalated to an exception.
for(size_t j=2*i; j<=n; j+=i) {
if((r=barr.set_bit(j,true))!=Ok) {
WY_THROW(r);
}
}
}
};

// Sieve benchmark variant 2: Wy's Array<char>, one byte per number;
// barr[i] != 0 marks i as composite.  Builds the sieve only — despite
// the name, nothing is printed.  Array's operator[] is range-checked
// per the author's note further below.
void print_prime2(size_t n)
{
// NOTE(review): `r` is unused in this variant — leftover from
// print_prime1.
Errno r;
Array<char> barr(n+1,0,0); // bit=0 =prime

for(size_t i=2; i<=n; ++i) {
if(barr[i]) {
continue;
}
// Cond: i is a prime number
for(size_t j=2*i; j<=n; j+=i) {
barr[j]=1;
}
}
};

// Sieve benchmark variant 3: std::vector<char>, one byte per number, with
// range-checked access via at(); barr[i] != 0 marks i as composite.
// Builds the sieve only — despite the name, nothing is printed.
//
// n: inclusive upper bound of the sieved range.
void print_prime3(size_t n)
{
std::vector<char> barr(n+1,0); // 0 = (potential) prime

for(size_t i=2; i<=n; ++i) {
if(barr.at(i)) {
continue;
}
// Cond: i is a prime number
for(size_t j=2*i; j<=n; j+=i) {
barr.at(j)=1;
}
}
}

// Sieve benchmark variant 4: std::vector<bool> (packed, one bit per
// number) with range-checked access via at(); barr[i] == true marks i as
// composite.  Builds the sieve only — despite the name, nothing is printed.
//
// n: inclusive upper bound of the sieved range.
void print_prime4(size_t n)
{
std::vector<bool> barr(n+1,false); // false = (potential) prime

for(size_t i=2; i<=n; ++i) {
if(barr.at(i)) {
continue;
}
// Cond: i is a prime number
for(size_t j=2*i; j<=n; j+=i) {
barr.at(j)=true;
}
}
}

// Benchmark driver: run each sieve variant over the same range and report
// the per-variant timing.  CLOCK_THREAD_CPUTIME_ID measures only this
// thread's CPU time, so other system activity is excluded.
int main()
try {
constexpr size_t Cnt=100000000L; // upper bound 1E8 for every variant
Timespec tm0;

tm0=now(CLOCK_THREAD_CPUTIME_ID);
print_prime1(Cnt);
cout << "VLInt<unsigned>: " << (now(CLOCK_THREAD_CPUTIME_ID)-tm0) << WY_ENDL;

tm0=now(CLOCK_THREAD_CPUTIME_ID);
print_prime2(Cnt);
cout << "Array<char>: " << (now(CLOCK_THREAD_CPUTIME_ID)-tm0) << WY_ENDL;

tm0=now(CLOCK_THREAD_CPUTIME_ID);
print_prime3(Cnt);
cout << "vector<char>: " << (now(CLOCK_THREAD_CPUTIME_ID)-tm0) << WY_ENDL;

tm0=now(CLOCK_THREAD_CPUTIME_ID);
print_prime4(Cnt);
cout << "vector<bool>: " << (now(CLOCK_THREAD_CPUTIME_ID)-tm0) << WY_ENDL;

// NOTE(review): no << before WY_ENDL — this compiles only if WY_ENDL
// expands to a string literal (literal concatenation); verify.
cout << "Ok" WY_ENDL;
return 0;
}
// Function-level try block: any Wy error escaping a variant lands here.
catch(const Errno& e) {
cerr << wrd(e) << WY_ENDL;
return -1;
}
catch(...) {
cerr << "unknown stack unwind" << WY_ENDL;
throw;
};
----------------------------------

[]$ g++ t.cpp -lwy
[]$ ./a.out
VLInt<unsigned>: 6.162083886
Array<char>: 3.555815058
vector<char>: 5.307075961
vector<bool>: 30.257717265

I am surprised that std::vector's at(size_t) is unexpectedly slow.

Note: Wy::Array's operator[] checks index range, should be equ. to std::vector's at(size_t)

Bonita Montero

unread,
Aug 16, 2023, 2:22:02 PM8/16/23
to
-std=c++20
Show me the times to print primes below 10000000.


Paavo Helde

unread,
Aug 16, 2023, 2:45:12 PM8/16/23
to
16.08.2023 20:57 wij kirjutas:
>
> []$ g++ t.cpp -lwy
>
> I am surprised std::vector's at(size_t) is unexpected slow.

You are missing -O2 or -O3. There is no point to talk about the speed of
unoptimized code.

wij

unread,
Aug 16, 2023, 3:10:09 PM8/16/23
to
[]$ g++ t.cpp -lwy -O2
[]$ ./a.out
VLInt<unsigned>: 1.259642293
Array<char>: 1.525368272
vector<char>: 1.484200452
vector<bool>: 1.533252238

----------------
-O2 makes sense. But the problem now is why VLInt is faster? (VLInt allocates memory on demand)

wij

unread,
Aug 16, 2023, 3:17:39 PM8/16/23
to
[]$ g++ t12.cpp -lwy -std=c++20
[]$ ./a.out
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 2
...(cut)
889 9999901 9999907 9999929 9999931 9999937 9999943 9999971 9999973 9999991
pi(10000000)= 664579
print_prime: 2.740338666
Ok

--------------------- t12.cpp
#include <Wy.stdio.h>
#include <Wy.time.h>
#include <CSCall/VLInt.h>

using namespace Wy;

// [Syn] Print prime numbers less or equal to n
//
// Algo: sieve
//
// Sieve over Wy's VLInt bit array, then print every prime <= n separated
// by spaces, followed by the prime count pi(n).  This is the variant that
// actually prints (unlike print_prime1..4 above).
void print_prime(size_t n)
{
Errno r;
VLInt<unsigned int> barr; // bit=0 =prime

// Phase 1: build the sieve — set the bit of every composite number.
for(size_t i=2; i<=n; ++i) {
if(barr.bit(i)) {
continue;
}
// Cond: i is a prime number
for(size_t j=2*i; j<=n; j+=i) {
if((r=barr.set_bit(j,true))!=Ok) {
WY_THROW(r);
}
}
}

// print prime number (from 2)
// Phase 2: every still-clear bit is a prime; count them while printing.
size_t pcnt=0;
for(size_t i=2; i<=n; ++i) {
if(barr.bit(i)==false) {
cout << wrd(i) << ' ';
++pcnt;
}
}
cout << WY_ENDL;
cout << "pi(" << n << ")= " << pcnt << WY_ENDL;
};

int main()
try {
Timespec tm0;

tm0=now(CLOCK_THREAD_CPUTIME_ID);
print_prime(10000000);
cout << "print_prime: " << (now(CLOCK_THREAD_CPUTIME_ID)-tm0) << WY_ENDL;

Alf P. Steinbach

unread,
Aug 16, 2023, 5:17:54 PM8/16/23
to
On 2023-08-16 3:37 PM, Bonita Montero wrote:
> I tried to implement the sieve of Erastosthenes as fast as possible.
> I didn't thought it would make a big difference, but having a bit-wise
> vector<bool> instead of a vector<char> makes the code twice as fast
> for my AMD 7950X PC and an upper bound of one billion. The additional
> overhead for the bit-wise access is less than the additional overhead
> of fetching more memory.


An apparent mystery, because I get only a marginal difference, down at
10% or less.

On my computer the code below is slightly faster than yours for a
10'000'000 sieve, respectively (typical results, three runs, only the
vector allocation + sieve code measured, not file i/o)

0.0477802 seconds.
0.0466974 seconds.
0.0472202 seconds.

... for my code, versus

0.0502888 seconds.
0.0482821 seconds.
0.0505414 seconds.

... for yours with the `std::string_view` instantiations fixed,
producing the same file contents.


#include <fmt/core.h>
#include "cppm-Timer.hpp"

#include <vector>
using std::ofstream,
std::vector;

#include <stdlib.h> // EXIT_FAILURE
#include <stdio.h> // fopen, fclose

// Timed sieve of Eratosthenes over vector<bool>; the primes are then
// written to primes.alf.txt via the {fmt} library.  Only allocation +
// sieving is timed, not the file output.
auto main() -> int
{
constexpr int n = 10'000'000;

cppm::Timer timer;
auto is_composite = vector<bool>( n + 1 );
// Invariant at the top of each iteration: i is at or below the next
// unmarked candidate; the inner while skips already-marked entries.
for( int i = 2; i < n; ) {
while( is_composite[i] ) { ++i; }
const int prime = i;
// Mark all proper multiples of the found prime.
for( int j = 2*prime; j < n; j += prime ) { is_composite[j]
= true; }
++i;
}
timer.stop();
fmt::print( "{} seconds.\n", timer.seconds() );

// Output phase (untimed): one prime per line.
auto f = fopen( "primes.alf.txt", "w" );
if( not f ) { return EXIT_FAILURE; }
for( int i = 2; i < n; ++i ) {
if( not is_composite[i] ) {
fmt::print( f, "{:d}\n", i );
}
}
fclose( f );
}


The timer code:


#pragma once

#include <chrono>
#include <optional>
#include <type_traits>

namespace cppm {
    namespace chrono = std::chrono;       // <chrono>
    using std::optional,                  // <optional>
        std::conditional_t;               // <type_traits>

    // Elapsed time expressed in seconds as a double.
    using Duration = chrono::duration<double>;

    // Minimal stopwatch: starts ticking on construction, stop() freezes
    // the end time (only the first call has an effect), and duration() /
    // seconds() report end - start.  Calling duration() before stop()
    // throws std::bad_optional_access.
    class Timer
    {
        // Prefer high_resolution_clock, but fall back to steady_clock
        // whenever high_resolution_clock is allowed to jump backwards.
        using Hrc = chrono::high_resolution_clock;
        using Sc  = chrono::steady_clock;

    public:
        using Clock      = conditional_t<Hrc::is_steady, Hrc, Sc>;
        using Time_point = chrono::time_point<Clock>;

        Timer() : m_start_time{ Clock::now() } {}

        // Record the end time exactly once; later calls are no-ops.
        void stop()
        {
            if( !m_opt_end_time ) {
                m_opt_end_time = Clock::now();
            }
        }

        auto duration() const -> Duration
        {
            return m_opt_end_time.value() - m_start_time;
        }

        auto seconds() const -> double
        {
            return duration().count();
        }

    private:
        Time_point m_start_time;
        optional<Time_point> m_opt_end_time{};
    };
} // namespace cppm


I have some other more advanced prime sieve code that I made for an
example, but I think it's much slower, it's just a way that can go on
and on forever, or more precisely till the end of the integer type's
range or it runs out of memory for a map of the deduces primes.


- Alf

Alf P. Steinbach

unread,
Aug 16, 2023, 5:21:51 PM8/16/23
to
On 2023-08-16 11:17 PM, Alf P. Steinbach wrote:
> ...

Unfortunately the `using std::ofstream`, left in the code due to
imperfect editing, compiled -- with Visual C++. Gah! Not used.


- Alf (wishing for old Modula-2, just with better syntax etc.)


Alf P. Steinbach

unread,
Aug 16, 2023, 5:39:19 PM8/16/23
to
On 2023-08-16 11:17 PM, Alf P. Steinbach wrote:
Well, I've become stupid in my old days. Reduced running time by say 25%
or so by very simple optimization. I did not think of that until /after/
I'd posted.

New running time, three tries:

0.0365686 seconds.
0.0362102 seconds.
0.0359283 seconds.

New better shiny sparkling code:

#include <fmt/core.h>
#include "cppm-Timer.hpp"

#include <vector>
using std::vector;

#include <stdlib.h> // EXIT_FAILURE
#include <stdio.h> // fopen, fclose

// Second version: marking starts at prime*prime, and primes above sqrt(n)
// do no marking at all — every composite below n has a factor <= sqrt(n).
auto main() -> int
{
constexpr int n = 10'000'000;
// NOTE(review): hand-computed floor(sqrt(1e7)); must be kept in sync
// with n manually.
constexpr int sqrt_n = 3'162;

cppm::Timer timer;
auto is_composite = vector<bool>( n + 1 );
for( int i = 2; i < n; ) {
while( is_composite[i] ) { ++i; }
const int prime = i;
if( prime < sqrt_n ) {
// Smaller multiples were already marked by smaller primes.
for( int j = prime*prime; j < n; j += prime ) {
is_composite[j] = true;
}
}
++i;
}
timer.stop();
fmt::print( "{} seconds.\n", timer.seconds() );

// Output phase (untimed): one prime per line.
auto f = fopen( "primes.alf.txt", "w" );
if( not f ) { return EXIT_FAILURE; }
for( int i = 2; i < n; ++i ) {
if( not is_composite[i] ) {
fmt::print( f, "{:d}\n", i );
}
}
fclose( f );
}

Clue: for efficiency, avoid totally needless work.


- Alf

Alf P. Steinbach

unread,
Aug 16, 2023, 5:59:46 PM8/16/23
to
Oh, I'm REALLY old.

Even less slow, now clocking in at

0.0260816 seconds.
0.0267381 seconds.
0.0269437 seconds.

Code:

#include <fmt/core.h>
#include "cppm-Timer.hpp"

#include <vector>
using std::vector;

#include <stdlib.h> // EXIT_FAILURE
#include <stdio.h> // fopen, fclose

// Third version: even numbers are struck out up front, so the marking
// loop starts at prime*prime and steps 2*prime, touching odd multiples
// only — roughly halving the marking work of the previous version.
auto main() -> int
{
constexpr int n = 10'000'000;
// Hand-computed floor(sqrt(n)); keep in sync with n.
constexpr int sqrt_n = 3'162;

cppm::Timer timer;
auto is_composite = vector<bool>( n + 1 );
// Pre-mark all even numbers > 2.
for( int i = 4; i < n; i += 2 ) { is_composite[i] = true; }
for( int i = 3; i < n; ) {
while( is_composite[i] ) { ++i; }
const int prime = i;
if( prime <= sqrt_n ) {
// prime is odd, so stepping 2*prime stays on odd multiples.
for( int j = prime*prime; j < n; j += 2*prime ) {
is_composite[j] = true;
}
}
++i;
}
timer.stop();
fmt::print( "{} seconds.\n", timer.seconds() );

// Output phase (untimed): one prime per line; starts at 2, which was
// never marked.
auto f = fopen( "primes.alf.txt", "w" );
if( not f ) { return EXIT_FAILURE; }
for( int i = 2; i < n; ++i ) {
if( not is_composite[i] ) {
fmt::print( f, "{:d}\n", i );
}
}
fclose( f );
}

But every little optimization adds complexity and verbosity.

- Alf


Chris M. Thomasson

unread,
Aug 16, 2023, 7:27:27 PM8/16/23
to
On 8/16/2023 2:59 PM, Alf P. Steinbach wrote:
[...]

Fwiw, keep in mind what your system is currently doing when you run a
benchmark. Personally, I shutdown as many unnecessary processes as I can
before I run any benchmark. Bare bones.

red floyd

unread,
Aug 16, 2023, 9:06:43 PM8/16/23
to
On 8/16/2023 2:17 PM, Alf P. Steinbach wrote:
> On 2023-08-16 3:37 PM, Bonita Montero wrote:
>> I tried to implement the sieve of Erastosthenes as fast as possible.
>> I didn't thought it would make a big difference, but having a bit-wise
>> vector<bool> instead of a vector<char> makes the code twice as fast
>> for my AMD 7950X PC and an upper bound of one billion. The additional
>> overhead for the bit-wise access is less than the additional overhead
>> of fetching more memory
[redacted]

Here's mine. I get rid of the vector<bool> by storing the prime factors
in their own vector, and only checking for divisibility by primes.

It can be sped up by checking for divisibility only by values less than
or equal to sqrt(candidate value).

I only check up to 1 million, but you can obviously change max_prime.
Also change to long long for values greater than 2 billion.

--- cut here --

#include <iostream>
#include <ostream>
#include <vector>
#include <cmath>
#include <cstdlib>

// Exclusive upper bound for the primes to emit.
const long max_prime = 1000000;
// Only primes below this bound can be a nontrivial factor of a candidate
// below max_prime.
const long max_factor =
    static_cast<long>(std::sqrt(static_cast<double>(max_prime))) + 1;

// Trial division against previously found primes: print every prime below
// max_prime, one per line, starting with 2.  Only odd candidates are
// tested, and only primes up to sqrt(max_prime) are kept as divisors.
int main(int, char *[])
{
    std::vector<long> divisors;

    std::cout << "2" << std::endl;
    for (long candidate = 3; candidate < max_prime; candidate += 2)
    {
        bool is_prime = true;
        for (const long d : divisors)
        {
            if (candidate % d == 0)
            {
                is_prime = false;
                break;
            }
        }
        if (is_prime)
        {
            std::cout << candidate << std::endl;

            // Remember only divisors small enough to matter later.
            if (candidate < max_factor)
                divisors.push_back(candidate);
        }
    }

    return EXIT_SUCCESS;
}

Tim Rentsch

unread,
Aug 16, 2023, 10:26:37 PM8/16/23
to
red floyd <no.spa...@its.invalid> writes:

> On 8/16/2023 2:17 PM, Alf P. Steinbach wrote:
>
>> On 2023-08-16 3:37 PM, Bonita Montero wrote:
>>
>>> I tried to implement the sieve of Erastosthenes as fast as
>>> possible. I didn't thought it would make a big difference, but
>>> having a bit-wise vector<bool> instead of a vector<char> makes the
>>> code twice as fast for my AMD 7950X PC and an upper bound of one
>>> billion. The additional overhead for the bit-wise access is less
>>> than the additional overhead of fetching more memory
>
> [redacted]
>
> Here's mine. I get rid of the vector<bool> by storing the prime
> factors in their own vector, and only checking for divisibility by
> primes. [...]

Here are some suggestions for everyone.

Except for 2, 3, and 5, numbers can be considered mod 30, and
there are (conveniently) only 8 candidates: 1, 7, 11, 13, 17,
19, 23, and 29. For numbers 7 and up, every prime must be
one of those candidates (mod 30).

By storing the results for blocks of 30 in a byte, we can
store results for all 32 bit primes in 143165577 bytes.

There are some extra tricks taking advantage of what happens
with the bit numbers. For example 7 * 37 has two 7's, which
is 49, which is 19 mod 30, so the bit for that product will
be in the 19 position. Being clever will avoid any shifting
or modding.

I suggest computing all 32 bit primes for the benchmark
tests. Shouldn't take too long.

Bonita Montero

unread,
Aug 17, 2023, 2:43:12 AM8/17/23
to
Am 17.08.2023 um 03:06 schrieb red floyd:

> Here's mine.  I get rid of the vector<bool> by storing the prime factors
> in their own vector, and only checking for divisibility by primes.

If I modify my code to use the same, called Euler's sieve, my code
becomes 19 times slower on my 7950X with an upper bound of 1E6.

Bonita Montero

unread,
Aug 17, 2023, 5:38:46 AM8/17/23
to
This is the code:

#include <iostream>
#include <vector>
#include <charconv>
#include <string_view>
#include <algorithm>
#include <fstream>
#include <cctype>
#include <cstring>

using namespace std;

// Trial-division ("Euler's sieve" per the thread) variant: every candidate
// is tested against all previously found primes.  Quadratic-ish and, as
// the author measured, orders of magnitude slower than the bitmap sieve.
// Upper bound comes from argv[1] (decimal or 0x-hex); primes are written
// to argv[2] (default "primes.txt"); empty argv[2] suppresses output.
int main( int argc, char **argv )
{
    // Output is buffered and flushed in ~1 MiB chunks.
    // FIX: the leftover, unused USE_BOOL constant from the sieve version
    // has been removed.
    constexpr size_t BUF_SIZE = 0x100000;
    try
    {
        size_t max;
        if( argc < 2 )
            return EXIT_FAILURE;
        char const *sizeStr = argv[1];
        bool hex = sizeStr[0] == '0' && (sizeStr[1] == 'x' || sizeStr[1] == 'X');
        sizeStr += 2 * hex;
        // Reject anything that is not a fully consumed number.
        if( from_chars_result fcr = from_chars( sizeStr, sizeStr + strlen( sizeStr ), max, !hex ? 10 : 16 ); (bool)fcr.ec || *fcr.ptr )
            return EXIT_FAILURE;
        // Keep a prime only if no earlier prime divides it.
        vector<size_t> primes;
        for( size_t p = 2; p < max; ++p )
        {
            bool has = false;
            for( size_t soFar : primes )
                if( !(p % soFar) )
                {
                    has = true;
                    break;
                }
            if( !has )
                primes.emplace_back( p );
        }
        // Empty second argument: compute only, skip the output phase.
        if( argc >= 3 && !*argv[2] )
            return EXIT_SUCCESS;
        ofstream ofs;
        ofs.exceptions( ofstream::failbit | ofstream::badbit );
        ofs.open( argc >= 3 ? argv[2] : "primes.txt", ofstream::trunc );
        // Empty union => vector does not zero-initialize its storage.
        union ndi_t { }; // non-default-initialized
        vector<ndi_t> buf;
        buf.resize( BUF_SIZE + 32 ); // 32 bytes of slack for the last number
        char
            *bufBegin = (char *)buf.data(),
            *bufEnd = bufBegin,
            *const bufFlush = bufBegin + BUF_SIZE;
        for( size_t p : primes )
        {
            to_chars_result tcr = to_chars( bufEnd, bufEnd + 32, p );
            bufEnd = tcr.ptr;
            *bufEnd++ = '\n';
            if( bufEnd >= bufFlush )
                // FIX: string_view(pointer, pointer) is C++20-only; use the
                // portable pointer+length constructor instead.
                ofs << string_view( bufBegin, size_t(bufEnd - bufBegin) ),
                bufEnd = bufBegin;
        }
        ofs << string_view( bufBegin, size_t(bufEnd - bufBegin) );
    }
    catch( bad_alloc const & )
    {
        cout << "out of memory" << endl;
    }
    catch( ios_base::failure const & )
    {
        cout << "I/O error" << endl;
    }
}

I tried that with an upper bound of 1E8 and after 2,5 hours of computing
I stopped the process. With my primary code the process ends after 0.4s.

Alf P. Steinbach

unread,
Aug 17, 2023, 6:50:55 AM8/17/23
to
On 2023-08-17 11:38 AM, Bonita Montero wrote:
> Am 17.08.2023 um 08:42 schrieb Bonita Montero:
>> Am 17.08.2023 um 03:06 schrieb red floyd:
>>
>>> Here's mine.  I get rid of the vector<bool> by storing the prime factors
>>> in their own vector, and only checking for divisibility by primes.
>>
>> If I modify my code to use the same, called Euler's sieve, my code
>> becomes 19 times slower on my 7950X with an upper bound of 1E6.
>>
>
> This is the code:

[snip]


>         vector<size_t> primes;
>         for( size_t p = 2; p < max; ++p )
>         {
>             bool has = false;
>             for( size_t soFar : primes )
>                 if( !(p % soFar) )
>                 {
>                     has = true;
>                     break;
>                 }
>             if( !has )
>                 primes.emplace_back( p );
>         }

[snip]

> I tried that with an upper bound of 1E8 and after 2,5 hours of computing
> I stopped the process. With my primary code the process ends after 0.4s.

One less unreasonable way to check against already computed primes is to
generate and update the smallest multiple of each already known prime,
that is larger than the current number to be checked.

It reduces the complexity for each number check to roughly logarithmic
in the number of primes so far, instead of linear. And in total roughly
O(n log n) instead of O(n^2). Quadratic is just impractical, as you found.

Code for the collection-of-smallest-multiples approach:

#include <fmt/core.h>
#include "cppm-Timer.hpp"

#include <queue>
#include <vector>
using std::priority_queue,
std::vector;

#include <stdio.h> // fopen, fclose

// Convenience alias: a priority_queue backed by a std::vector with a
// caller-supplied comparison functor.
template< class Item, class Compare_func >
using Priority_q = std::priority_queue<Item, std::vector<Item>, Compare_func>;

// The multiple n*base of a known prime `base`; value() is the composite
// number this entry currently represents.
struct Multiple
{
    int base;
    int n;

    auto value() const -> int { return base * n; }
};

// Orders Multiples so the SMALLEST value ends up on top of a
// std::priority_queue — which is a max-heap, hence the inverted compare.
struct Descending_order
{
    auto operator()( const Multiple& a, const Multiple& b ) const -> bool
    {
        return b.value() < a.value();
    }
};

// Incremental sieve driven by a min-ordered priority queue of composite
// multiples: primes are produced in ascending order without a fixed-size
// bit array, at the cost of heap work per candidate.  Memory grows with
// the number of primes found, so no upper bound must be known in advance.
auto main() -> int
{
constexpr int n = 10'000'000;

cppm::Timer timer;
auto primes = vector<int>();
primes.reserve( n/2 ); // generous upper bound on the prime count
const auto produce = [&]( const int prime ) { primes.push_back(
prime ); };

produce( 2 );
// For each known prime, the heap holds its smallest multiple not yet
// passed; Descending_order puts the SMALLEST value on top.
auto larger_composites = Priority_q<Multiple, Descending_order>();
larger_composites.push( Multiple{2, 2} );
for( int current = 3; current < n; ++current ) {
bool is_composite = false;
// Pop and advance every pending multiple equal to `current`; if
// none matched, no known prime divides `current`.
for( ;; ) {
const Multiple next_composite = larger_composites.top();
if( current < next_composite.value() ) {
break;
}
larger_composites.pop();
is_composite = true;
larger_composites.push( Multiple{next_composite.base, 1
+ next_composite.n} );
}
if( not is_composite ) {
produce( current );
larger_composites.push( Multiple{current, 2} );
}
}
timer.stop();
fmt::print( "{} seconds.\n", timer.seconds() );

// Output phase (untimed): one prime per line.
auto f = fopen( "primes.alf-gen.txt", "w" );
if( not f ) { return EXIT_FAILURE; }
for( const int prime: primes ) { fmt::print( f, "{:d}\n", prime
); }
fclose( f );
}

It runs roughly 100 times slower than the unslowest of the earlier
programs I posted in this thread, using 3.5736102 seconds to produce the
primes up to 10'000'000, but I believe still much faster than the
quadratic time code you showed (I would guess maybe 400 seconds = 6'40
minutes for the 1 billion case?), and has the possible advantage that
one doesn't need to know in advance a cap on the prime size.

It can just run and run, gobbling up memory as it goes.


- Alf

Bonita Montero

unread,
Aug 17, 2023, 7:10:26 AM8/17/23
to
Am 17.08.2023 um 12:50 schrieb Alf P. Steinbach:

> One less unreasonable way to check against already computed primes is to
> generate and update the smallest multiple of each already known prime,
> that is larger than the current number to be checked.

Beginning with the smaller primes already seen does make sense
since you can filter the values earlier. F.e. there are more
values which are a multiple of two than a multiple of 11.

Bonita Montero

unread,
Aug 17, 2023, 11:47:38 AM8/17/23
to
I did it the same way with some simpler code:

#include <iostream>
#include <queue>
#include <vector>
#include <charconv>
#include <fstream>

using namespace std;

// Heap-driven incremental sieve: for every prime p the priority queue
// holds p's smallest multiple not yet passed; a candidate equal to the
// top of the heap is composite, otherwise it is a new prime.
int main( int argc, char **argv )
{
constexpr size_t BUF_SIZE = 0x100000; // output flush threshold (~1 MiB)
size_t max; // inclusive upper bound, parsed from argv[1]
if( argc < 2 )
return EXIT_FAILURE;
char const *sizeStr = argv[1];
// Accept "0x"/"0X"-prefixed hex as well as decimal.
bool hex = sizeStr[0] == '0' && (sizeStr[1] == 'x' || sizeStr[1] == 'X');
sizeStr += 2 * hex;
// NOTE(review): strlen needs <cstring> and sort below needs
// <algorithm>; neither is included in this post — presumably pulled in
// transitively. Verify on other toolchains.
if( from_chars_result fcr = from_chars( sizeStr, sizeStr + strlen(
sizeStr ), max, !hex ? 10 : 16 ); (bool)fcr.ec || *fcr.ptr )
return EXIT_FAILURE;
// p = the prime, pNext = its smallest multiple not yet reached.
struct next_multiple { size_t p, pNext; };
// Min-heap on pNext (priority_queue is a max-heap, hence ">").
auto nm_less = []( next_multiple const &left, next_multiple const
&right ) { return left.pNext > right.pNext; };
priority_queue<next_multiple, vector<next_multiple>, decltype(nm_less)>
nextMultiples( nm_less );
nextMultiples.emplace( 2, 4 );
// Cached pointer to the heap's top; re-fetched after every mutation.
auto *hitOrNot = &nextMultiples.top();
for( size_t p = 3; p <= max; ++p )
{
if( p < hitOrNot->pNext ) [[unlikely]]
{
// p lies below every pending multiple => p is a new prime.
nextMultiples.emplace( p, p + p );
hitOrNot = &nextMultiples.top();
continue;
}
// p matched at least one pending multiple => composite; advance
// every heap entry whose multiple equals p.
do
{
auto nextIncr = *hitOrNot;
nextMultiples.pop();
nextIncr.pNext += nextIncr.p;
nextMultiples.emplace( nextIncr );
hitOrNot = &nextMultiples.top();
} while( p == hitOrNot->pNext );
}
// Drain the heap (one entry per prime) and sort ascending for output.
vector<size_t> primes;
primes.reserve( nextMultiples.size() );
while( !nextMultiples.empty() )
{
auto const next = nextMultiples.top();
primes.emplace_back( next.p );
nextMultiples.pop();
}
sort( primes.begin(), primes.end() );
// NOTE(review): the earlier versions return EXIT_SUCCESS for the
// "no output requested" case; EXIT_FAILURE here looks like a typo.
if( argc >= 3 && !*argv[2] )
return EXIT_FAILURE;
ofstream ofs;
ofs.exceptions( ofstream::failbit | ofstream::badbit );
ofs.open( argc >= 3 ? argv[2] : "primes.txt", ofstream::trunc );
union ndi_t { }; // non-default-initialized buffer storage
vector<ndi_t> buf( BUF_SIZE + 32 );
char
*bufBegin = (char*)buf.data(),
*bufEnd = bufBegin,
*bufFlush = bufBegin + BUF_SIZE;
for( size_t p : primes )
{
to_chars_result tcr = to_chars( bufEnd, bufEnd + 32, p );
bufEnd = tcr.ptr;
*bufEnd++ = '\n';
// NOTE(review): string_view(pointer, pointer) requires C++20.
if( bufEnd >= bufFlush )
ofs << string_view( bufBegin, bufEnd ),
bufEnd = bufBegin;
}
ofs << string_view( bufBegin, bufEnd );
}

If I print all primes <= 1E8 that's still 82 times slower than my
initial solution.

wij

unread,
Aug 17, 2023, 4:43:39 PM8/17/23
to
Code Comment:
1. The use of 'not' is impressive, bringing dead to life.
2. The creation of class Timer may be questionable. Or it is just specifically
to simply and emphasize the main codes.
3. I don't know what "fmt::print( f, "{:d}\n", i );" really mean, but in general
I think cstring format in its current usage (e.g. "foo %s= %d\n") does not
provide real benefit for the executing time spent interpreting it.

Alf P. Steinbach

unread,
Aug 17, 2023, 8:00:49 PM8/17/23
to
On 2023-08-17 10:43 PM, wij wrote:
> On Thursday, August 17, 2023 at 5:17:54 AM UTC+8, Alf P. Steinbach wrote:
[snip]
>
> Code Comment:
> 1. The use of 'not' is impressive, bringing dead to life.

I guess that means you didn't like it, but I do.

It may be an acquired taste, like coffee and beer. And I came to C++
from Pascals and Modula-2 and such. But try it: you may come to like it.


> 2. The creation of class Timer may be questionable. Or it is just specifically
> to simply and emphasize the main codes.

Factoring out that functionality in a header makes it easy to reuse.

And it was reused.

But even with no hope of reuse you should not be afraid to define useful
abstractions, as opposed to just using others' abstractions.


> 3. I don't know what "fmt::print( f, "{:d}\n", i );" really mean, but in general
> I think cstring format in its current usage (e.g. "foo %s= %d\n") does not
> provide real benefit for the executing time spent interpreting it.

With the {fmt} library's `fmt::print`, which will become C++23's
`std::print`, the format string is interpreted at compile time.

That's almost like a lint implemented via C++ template system, plus that
it's just about the fastest string formatting around, plus handles
Unicode (guaranteed by the standard) if the output device can do it.

The design of the compile time format string handling does however not
adhere to the principle of least surprise, and that caused the
standard's wording to be a bit sabotage-like for some time. That was
eventually fixed. But means that somewhere I have this C++20 code:

#if __cpp_lib_format >= (2022*100 + 7)
#define myprefix_FMTLIB_API_PROVIDES_FORMAT_STRING_TYPE
constexpr bool api_provides_format_string_type = true;


- Alf

wij

unread,
Aug 18, 2023, 8:01:58 AM8/18/23
to
Not only time, but many kind of resources are used by std::print.

My idea of the 'format' issue was adding a format member to class String should
be enough, e.g. "String& String::fmt(..)", where the .. means unclear.

So, the usecase (example) printf("IC0123 qty:%5d, price:%.2f\n",qty,prc) can be
translated like the following:

result << String("IC0123 qty:") << wrd(qty).fmt("5") << ", price:"
<< wrd(prc).fmt(".2") << "\n";

('resutl' can be String or any device)

The main point is that such provision does not deserve any complexity adding to
the language, so burden to the user as well. Of course, who knows what the stdc++'s
head is thinking about.

wij

unread,
Aug 18, 2023, 8:03:10 PM8/18/23
to
Just came to me, libwy already had the basic capability of 'format' :

wrd(i).indent('0',5) // left align, width=5, fill-character='0';
wrd(i).indent(5,' ') // right align, width=5, fill-character=' ';
wrd(val,10,2) // float number=val, radix=10, fraction digits=2
wrd(val, 0,2) // float number=val, scientific notation, fraction digits=2

I never remember all these rules of format specifiers:
https://cplusplus.com/reference/cstdio/printf/

As to the example shown "constexpr int n = 10'000'000;", there are many kind of
preferences, e.g. "10,000,000" or "1000 0000",... or 'unicode'. IMO, these are
better be done by programmers, not in 'standard'.

Bonita Montero

unread,
Aug 21, 2023, 5:00:33 AM8/21/23
to
I tried to combine the bitmap-approach with the heap-approach,
having a bitmap that fits into the cache and which is a window
inside the rows of numbers.

#include <iostream>
#include <vector>
#include <charconv>
#include <string_view>
#include <algorithm>
#include <fstream>
#include <cctype>
#include <cstring>
#include <queue>

#if defined(_MSC_VER)
#pragma warning(disable: 26495) // uninitialized member
#endif

using namespace std;

// Segmented sieve: combines the priority-queue ("heap") approach with a
// small bitmap window (BITMAP_SIZE bytes, sized to stay cache-resident)
// that slides over the number line in chunks of BITS numbers.
int main( int argc, char **argv )
{
try
{
constexpr size_t
BITMAP_SIZE = 0x80000, // 512 KiB window
BITS = BITMAP_SIZE * 8; // numbers covered per window
size_t max;
if( argc < 2 )
return EXIT_FAILURE;
char const *sizeStr = argv[1];
// Accept "0x"/"0X"-prefixed hex as well as decimal.
bool hex = sizeStr[0] == '0' && (sizeStr[1] == 'x' || sizeStr[1] == 'X');
sizeStr += 2 * hex;
if( from_chars_result fcr = from_chars( sizeStr, sizeStr + strlen(
sizeStr ), max, !hex ? 10 : 16 ); (bool)fcr.ec || *fcr.ptr )
return EXIT_FAILURE;
// Union with a no-op default ctor => elements stay uninitialized.
union ndi_t { uint8_t c; ndi_t() {} };
vector<ndi_t> bitmap( BITMAP_SIZE );
// p = prime, pNext = smallest multiple of p not yet cleared.
struct next_multiple { size_t p, pNext; };
// Min-heap on pNext (priority_queue is a max-heap, hence ">").
auto nm_less = []( next_multiple const &left, next_multiple const
&right ) { return left.pNext > right.pNext; };
priority_queue<next_multiple, vector<next_multiple>, decltype(nm_less)>
multiples( nm_less );
// w = first number represented by the current window.
for( size_t w = 2; w <= max; w += BITS )
{
// Set all bits: everything in the window is presumed prime.
for_each( bitmap.begin(), bitmap.end(), [&]( ndi_t &n ) { n.c = 0xFF;
} );
// NOTE(review): wEnd is exclusive but clamped to max rather than
// max + 1, so the bound `max` itself is never examined — looks
// like an off-by-one; confirm the intended inclusiveness.
size_t wEnd = w + BITS <= max ? w + BITS : max;
// Clear the bits of all multiples of nm.p inside [w, wEnd) and
// store the first multiple beyond the window back into nm.pNext.
auto clearInterval = [&]( next_multiple &nm )
{
size_t b = nm.pNext - w;
for( ; b < wEnd - w; b += nm.p )
bitmap[b / 8].c &= ~(1u << b % 8);
nm.pNext = w + b;
};
// First strike out multiples of primes found in earlier windows
// whose next multiple falls into this window ...
if( multiples.size() ) [[likely]]
for( next_multiple const *pNm; (pNm = &multiples.top())->pNext < wEnd; )
{
auto nm = *pNm; // copy before pop invalidates the reference
multiples.pop();
clearInterval( nm );
multiples.push( nm );
}
// ... then scan the window: every bit still set is a new prime;
// register it and clear its multiples within this window.
for( size_t b = 0, e = wEnd - w; b != e; ++b )
if( bitmap[b / 8].c >> b % 8 & 1 )
{
next_multiple nm { w + b, 2 * (w + b) };
clearInterval( nm );
multiples.push( nm );
}
}
// Output deliberately disabled so benchmarks measure only the sieve.
/*while (multiples.size())
cout << multiples.top().p << endl,
multiples.pop();*/
}
catch( bad_alloc const & )
{
cout << "out of memory" << endl;
}

}

This is faster than the pure heap approach but still not as fast
as the pure bitmap approach.

Mut...@dastardlyhq.com

unread,
Aug 21, 2023, 1:19:59 PM8/21/23
to
On Mon, 21 Aug 2023 11:00:15 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>I tried to combine the bitmap-approach with the heap-approach,
>having a bitmap that fits into the cache and which is a window
>inside the rows of numbers.

If only erastothenes had had C++20 he too could have come up with some
hopeless complicated solution no one understood. Or maybe he would have
done some simple baby code in C...

#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <math.h>

/* Odd-only sieve of Eratosthenes: print all primes <= argv[1], one per
 * line, 2 first.  Fixes over the posted version:
 *  - the sieve array is now num+1 bytes; the original malloc(num) was
 *    written out of bounds when marking j == num,
 *  - 1 is no longer printed (it is not prime), and 2 is printed only
 *    when num >= 2,
 *  - the allocation is checked, calloc replaces the nonstandard bzero,
 *  - marking starts at i*i with stride 2*i (equivalent coverage for the
 *    odd numbers that are inspected, but roughly half the work). */
int main(int argc, char **argv)
{
    char *composite;    /* composite[i] != 0  =>  i is not prime */
    int num;            /* inclusive upper bound from argv[1] */
    int sqr;            /* ceil(sqrt(num)): largest useful factor */
    int i;
    int j;

    if (argc != 2 || (num = atoi(argv[1])) < 1)
    {
        printf("Usage: %s <max number>\n",argv[0]);
        return 1;
    }
    composite = (char *)calloc((size_t)num + 1, 1);
    if (!composite)
    {
        printf("Out of memory\n");
        return 1;
    }

    sqr = (int)ceil(sqrt((double)num));

    for(i=3;i <= sqr;i+=2)
    {
        if (!composite[i])
        {
            /* i is odd, so stepping 2*i stays on odd multiples; even
             * numbers are never inspected below. */
            for(j=i*i;j <= num;j+=2*i) composite[j] = 1;
        }
    }

    if (num >= 2) puts("2");
    for(i=3;i <= num;i+=2)
    {
        if (!composite[i]) printf("%d\n",i);
    }
    free(composite);
    return 0;
}

Bonita Montero

unread,
Aug 21, 2023, 1:50:49 PM8/21/23
to
Am 21.08.2023 um 19:19 schrieb Mut...@dastardlyhq.com:
> On Mon, 21 Aug 2023 11:00:15 +0200
> Bonita Montero <Bonita....@gmail.com> wrote:
>> I tried to combine the bitmap-approach with the heap-approach,
>> having a bitmap that fits into the cache and which is a window
>> inside the rows of numbers.
>
> If only erastothenes had had C++20 he too could have come up with some
> hopeless complicated solution no one understood. Or maybe he would have
> done some simple baby code in C...

On my PC printing the all primes below 1E8 takes 0.827s with your
code. With my code it takes about 0.36s. Take this as a reference:

#include <iostream>
#include <vector>
#include <charconv>
#include <string_view>
#include <algorithm>
#include <fstream>
#include <cctype>
#include <cstring>

#if defined(_MSC_VER)
#pragma warning(disable: 26495) // uninitialized member
#endif

using namespace std;

int main( int argc, char **argv )
{
    // Bit-sieve of Eratosthenes; writes the primes <= argv[1] to a file
    // (argv[2], default "primes.txt") through a hand-rolled output buffer.
    // Passing an empty string as argv[2] disables output (timing runs).
    constexpr size_t BUF_SIZE = 0x100000;
    try
    {
        size_t max;
        if( argc < 2 )
            return EXIT_FAILURE;
        char const *sizeStr = argv[1];
        // accept an optional 0x / 0X prefix for hexadecimal input
        bool hex = sizeStr[0] == '0' && (sizeStr[1] == 'x' || sizeStr[1] == 'X');
        sizeStr += 2 * hex;
        if( from_chars_result fcr = from_chars( sizeStr, sizeStr + strlen(
            sizeStr ), max, !hex ? 10 : 16 ); (bool)fcr.ec || *fcr.ptr )
            return EXIT_FAILURE;
        // one bit per number, initially all marked prime
        vector<uint8_t> sieve( (max + 1 + 7) / 8, (uint8_t)-1 );
        ofstream ofs;
        union ndi_t { char c; ndi_t() {} }; // non-default-initialized byte
        using ndi_vec = vector<ndi_t>;
        using ndi_vec_it = ndi_vec::iterator;
        constexpr size_t AHEAD = 32; // worst-case digits of one prime + '\n'
        ndi_vec printBuf;
        ndi_vec_it bufBegin, bufEnd, bufFlush;
        if( argc < 3 || *argv[2] )
        {
            ofs.exceptions( ofstream::failbit | ofstream::badbit );
            ofs.open( argc >= 3 ? argv[2] : "primes.txt", ofstream::trunc );
            printBuf.resize( BUF_SIZE + AHEAD - 1 );
            bufBegin = printBuf.begin();
            bufEnd = bufBegin;
            bufFlush = bufBegin + BUF_SIZE;
        }
        auto print = [&]() { ofs << string_view( &bufBegin->c, &bufEnd->c ); };
        for( size_t p = 2; p <= max; )
        {
            // p is prime on entry.  Emit it BEFORE advancing: the old code
            // wrote p only after the while-loop below moved it to the next
            // candidate, so 2 was never printed and a bogus value > max
            // was written on the final iteration.
            if( ofs.is_open() )
            {
                to_chars_result tcr = to_chars( &bufEnd->c,
                    &to_address( bufEnd + (AHEAD - 1) )->c, p );
                bufEnd = tcr.ptr - &bufBegin->c + bufBegin; // pointer -> iterator
                bufEnd++->c = '\n';
                if( bufEnd >= bufFlush )
                    print(),
                    bufEnd = bufBegin;
            }
            // clear every multiple of p, then advance p to the next set bit
            for( size_t m = 2 * p; m <= max; m += p )
                sieve[m / 8] &= ~(1u << m % 8);
            while( ++p <= max && !(1 & sieve[p / 8] >> p % 8) );
        }
        // guard the final flush: with output disabled the iterators are
        // default-constructed and dereferencing them is undefined behaviour
        if( ofs.is_open() )
            print();
        return EXIT_SUCCESS;
    }
    catch( bad_alloc const & )
    {
        cout << "out of memory" << endl;
    }
    catch( ios_base::failure const & )
    {
        cout << "I/O error" << endl;
    }
}

The complexity is all because of speed. Using a byte-array is much
slower and using C-streams is also slow. I'm using a bit-array and
I'm using my own output buffering.

Mut...@dastardlyhq.com

unread,
Aug 21, 2023, 2:12:34 PM8/21/23
to
On Mon, 21 Aug 2023 19:50:32 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 21.08.2023 um 19:19 schrieb Mut...@dastardlyhq.com:
>> On Mon, 21 Aug 2023 11:00:15 +0200
>> Bonita Montero <Bonita....@gmail.com> wrote:
>>> I tried to combine the bitmap-approach with the heap-approach,
>>> having a bitmap that fits into the cache and which is a window
>>> inside the rows of numbers.
>>
>> If only erastothenes had had C++20 he too could have come up with some
>> hopeless complicated solution no one understood. Or maybe he would have
>> done some simple baby code in C...
>
>On my PC printing the all primes below 1E8 takes 0.827s with your
>code. With my code it takes about 0.36s.

I'm fairly sure I could speed it up if I could be bothered.

>Take this as a reference:

Sorry, can't be bothered to wade through that mess.

Bonita Montero

unread,
Aug 21, 2023, 2:22:50 PM8/21/23
to
For sure not.

>> Take this as a reference:

> Sorry, can't be bothered to wade through that mess.

It's a mess because you're overburdened with reading the code.



Mut...@dastardlyhq.com

unread,
Aug 21, 2023, 3:01:39 PM8/21/23
to
On Mon, 21 Aug 2023 20:22:36 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 21.08.2023 um 20:12 schrieb Mut...@dastardlyhq.com:
>
>> On Mon, 21 Aug 2023 19:50:32 +0200
>> Bonita Montero <Bonita....@gmail.com> wrote:
>
>>> On my PC printing the all primes below 1E8 takes 0.827s with your
>>> code. With my code it takes about 0.36s.
>
>> I'm fairly sure I could speed it up if I could be bothered.
>
>For sure not.

And why's that then? Give us your insight.

>
>>> Take this as a reference:
>
>> Sorry, can't be bothered to wade through that mess.
>
>It's a mess because you're overburdened with reading the code.

A mess is burdensome.


Bonita Montero

unread,
Aug 21, 2023, 11:27:51 PM8/21/23
to
Am 21.08.2023 um 21:01 schrieb Mut...@dastardlyhq.com:
> On Mon, 21 Aug 2023 20:22:36 +0200
> Bonita Montero <Bonita....@gmail.com> wrote:
>> Am 21.08.2023 um 20:12 schrieb Mut...@dastardlyhq.com:
>>
>>> On Mon, 21 Aug 2023 19:50:32 +0200
>>> Bonita Montero <Bonita....@gmail.com> wrote:
>>
>>>> On my PC printing the all primes below 1E8 takes 0.827s with your
>>>> code. With my code it takes about 0.36s.
>>
>>> I'm fairly sure I could speed it up if I could be bothered.
>>
>> For sure not.
>
> And why's that then? ...

Because you can only write simple code.

Öö Tiib

unread,
Aug 22, 2023, 2:26:02 AM8/22/23
to
On Tuesday, 22 August 2023 at 06:27:51 UTC+3, Bonita Montero wrote:
> Am 21.08.2023 um 21:01 schrieb Mut...@dastardlyhq.com:
> > On Mon, 21 Aug 2023 20:22:36 +0200
> > Bonita Montero <Bonita....@gmail.com> wrote:
> >> Am 21.08.2023 um 20:12 schrieb Mut...@dastardlyhq.com:
> >>
> >>> On Mon, 21 Aug 2023 19:50:32 +0200
> >>> Bonita Montero <Bonita....@gmail.com> wrote:
> >>
> >>>> On my PC printing the all primes below 1E8 takes 0.827s with your
> >>>> code. With my code it takes about 0.36s.
> >>
> >>> I'm fairly sure I could speed it up if I could be bothered.
> >>
> >> For sure not.
> >
> > And why's that then? ...
>
> Because you can only write simple code.

Do not hijack nice thread of finding primes with your optimising
textual output.
Store primes however you want to and do not time text streaming
whatsoever. The problem was to find primes.

Mut...@dastardlyhq.com

unread,
Aug 22, 2023, 2:56:26 AM8/22/23
to
On Tue, 22 Aug 2023 05:27:36 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 21.08.2023 um 21:01 schrieb Mut...@dastardlyhq.com:
>> On Mon, 21 Aug 2023 20:22:36 +0200
>> Bonita Montero <Bonita....@gmail.com> wrote:
>>> Am 21.08.2023 um 20:12 schrieb Mut...@dastardlyhq.com:
>>>
>>>> On Mon, 21 Aug 2023 19:50:32 +0200
>>>> Bonita Montero <Bonita....@gmail.com> wrote:
>>>
>>>>> On my PC printing the all primes below 1E8 takes 0.827s with your
>>>>> code. With my code it takes about 0.36s.
>>>
>>>> I'm fairly sure I could speed it up if I could be bothered.
>>>
>>> For sure not.
>>
>> And why's that then? ...
>
>Because you can only write simple code.

Simpler code = more efficient, less bugs and easier to maintain all other
things being equal. Complexity in code isn't a goal, its something to be
kept to a minimum. If you were even half the genius coder you seem to think
you are you'd already be aware of this.

Bonita Montero

unread,
Aug 22, 2023, 5:23:53 AM8/22/23
to
Am 22.08.2023 um 08:25 schrieb Öö Tiib:
> On Tuesday, 22 August 2023 at 06:27:51 UTC+3, Bonita Montero wrote:
>> Am 21.08.2023 um 21:01 schrieb Mut...@dastardlyhq.com:
>>> On Mon, 21 Aug 2023 20:22:36 +0200
>>> Bonita Montero <Bonita....@gmail.com> wrote:
>>>> Am 21.08.2023 um 20:12 schrieb Mut...@dastardlyhq.com:
>>>>
>>>>> On Mon, 21 Aug 2023 19:50:32 +0200
>>>>> Bonita Montero <Bonita....@gmail.com> wrote:
>>>>
>>>>>> On my PC printing the all primes below 1E8 takes 0.827s with your
>>>>>> code. With my code it takes about 0.36s.
>>>>
>>>>> I'm fairly sure I could speed it up if I could be bothered.
>>>>
>>>> For sure not.
>>>
>>> And why's that then? ...
>>
>> Because you can only write simple code.
>
> Do not hijack nice thread of finding primes with your optimising
> textual output.

The speedup over cout << p << endl is massive, at least with MSVC;
I've measured that, you didn't.

Bonita Montero

unread,
Aug 22, 2023, 5:27:44 AM8/22/23
to
Am 22.08.2023 um 08:56 schrieb Mut...@dastardlyhq.com:
> On Tue, 22 Aug 2023 05:27:36 +0200
> Bonita Montero <Bonita....@gmail.com> wrote:
>> Am 21.08.2023 um 21:01 schrieb Mut...@dastardlyhq.com:
>>> On Mon, 21 Aug 2023 20:22:36 +0200
>>> Bonita Montero <Bonita....@gmail.com> wrote:
>>>> Am 21.08.2023 um 20:12 schrieb Mut...@dastardlyhq.com:
>>>>
>>>>> On Mon, 21 Aug 2023 19:50:32 +0200
>>>>> Bonita Montero <Bonita....@gmail.com> wrote:
>>>>
>>>>>> On my PC printing the all primes below 1E8 takes 0.827s with your
>>>>>> code. With my code it takes about 0.36s.
>>>>
>>>>> I'm fairly sure I could speed it up if I could be bothered.
>>>>
>>>> For sure not.
>>>
>>> And why's that then? ...
>>
>> Because you can only write simple code.
>
> Simpler code = more efficient, ...

I've shown that my code is 2.3 times more efficient.
In most cases more complex code is faster.

> ... less bugs ...

Show me the bugs. I can manage such code, you don't. I used
iterators as much as possible so that iterator debugging would
find bugs if they had happened.

> Complexity in code isn't a goal, ...

It's inevitable if the code should be more efficient.

> If you were even half the genius coder you seem to think
> you are you'd already be aware of this.

I'm not a genius, but I have extreme calm in my head that
I can program something like this.

Bonita Montero

unread,
Aug 22, 2023, 5:34:35 AM8/22/23
to
Am 22.08.2023 um 11:27 schrieb Bonita Montero:

>> Simpler code = more efficient, ...

> I've shown that my code is 2.3 times more efficient.
> In most cases more complex code is faster.

On my secondary Linux PC (Threadripper 3990X) my code is 61% faster.

Bonita Montero

unread,
Aug 22, 2023, 5:59:35 AM8/22/23
to
C:\Users\Boni\Documents\Source\bitmapSieve>bg --times --affinity 1
--high x64\Release\bitmapSieve 100000000
real 9675.36ms
user 2875.00ms
sys 6718.75ms
cycles 43.369.468.839

C:\Users\Boni\Documents\Source\bitmapSieve>bg --times --affinity 1
--high x64\Release\bitmapSieve 100000000
real 370.28ms
user 281.25ms
sys 62.50ms
cycles 1.653.010.785

26 times faster on Windows (Zen4):

boni@EliteDesk:~$ g++ -march=native -std=c++20 -O2 bitmapSieve.cpp
boni@EliteDesk:~$ time ./a.out 100000000

real 0m17,552s
user 0m5,406s
sys 0m12,113s
boni@EliteDesk:~$ g++ -march=native -std=c++20 -O2 bitmapSieve.cpp
boni@EliteDesk:~$ time ./a.out 100000000

real 0m1,808s
user 0m1,680s
sys 0m0,085s

9.7 times faster on Linux (Skylake).

Öö Tiib

unread,
Aug 22, 2023, 8:15:16 AM8/22/23
to
We already know that you are especially daff but I repeat: it does not
matter to efficiency of finding primes. Open other thread if you want
to discuss text output optimisations not finding primes.

Bonita Montero

unread,
Aug 22, 2023, 8:23:17 AM8/22/23
to
Am 22.08.2023 um 14:15 schrieb Öö Tiib:

> We already know that you are especially daff but I repeat: it does not
> matter to efficiency of finding primes. Open other thread if you want
> to discuss text output optimisations not finding primes.

I'd optimized both inside the _same_ program. You and Muttley
think that simple coding is enough, I've shown multiple times
that I can do it much better.
Unfortunately printf() and cout << are by far not that fast
as they could be, even if the output is redirected to a file.

Mut...@dastardlyhq.com

unread,
Aug 22, 2023, 10:14:08 AM8/22/23
to
On Tue, 22 Aug 2023 11:27:29 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 22.08.2023 um 08:56 schrieb Mut...@dastardlyhq.com:
>> Simpler code = more efficient, ...
>
>I've shown that my code is 2.3 times more efficient.
>In most cases more complex code is faster.

No, not in most cases as endless bloated frameworks with their huge memory
and CPU requirements demonstrate on a daily basis to me. There's a big
difference between code thats fully featured and code thats needlessly
complicated because the developer(s) were unable to think clearly or see the
obvious solution.

>> Complexity in code isn't a goal, ...
>
>It's inevitable if the code should be more efficient.

Bollocks.

>> If you were even half the genius coder you seem to think
>> you are you'd already be aware of this.
>
>I'm not a genius, but I have extreme calm in my head that
>I can program something like this.

Oh right, you're a Zen dev are you? :) Whatever you say!

Bonita Montero

unread,
Aug 22, 2023, 10:17:38 AM8/22/23
to
Am 22.08.2023 um 16:13 schrieb Mut...@dastardlyhq.com:

> No, not in most cases as endless bloated frameworks ...

I'm using standard C++ and no bloated frameworks. The memory
consumption is eight times less than with your code because I store
each bool as a bit. Your code has bloat, not mine in that sense.

> There's a big difference between code thats fully featured and code
> thats needlessly compicated because the developer(s) were unable to
> think clearly or see the obvious solution.

It overburdens you, not every developer.


Mut...@dastardlyhq.com

unread,
Aug 22, 2023, 10:27:20 AM8/22/23
to
On Tue, 22 Aug 2023 16:17:21 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 22.08.2023 um 16:13 schrieb Mut...@dastardlyhq.com:
>
>> No, not in most cases as endless bloated frameworks ...
>
>I'm using standard C++ and no bloating framworks. The memory-con-
>sumtion is eight times less than with your code because I store
>each bool as a bit. Your code has bloat, not mine in that sense.

Accessing bits in a word is normally slower than accessing the whole word
on most CPUs. However due to the huge memory this sort of algo can use this
access time is probably more than offset by paging delays.

Using bits doesn't make you a genius.

>> There's a big difference between code thats fully featured and code
>> thats needlessly compicated because the developer(s) were unable to
>> think clearly or see the obvious solution.
>
>It overburdens you, not every developer.

Whoosh...

Bonita Montero

unread,
Aug 22, 2023, 10:49:03 AM8/22/23
to
Am 22.08.2023 um 16:27 schrieb Mut...@dastardlyhq.com:

> Accessing bits in a word is normally slower than accessing the whole word
> on most CPUs. ...

I have shown that when the amount of data is large enough, access time
has the greatest weight. Furthermore, most of the calculations for the
bits within the byte in other arithmetic units take place in parallel.
If I choose the data set small enough, your algorithm is faster, but
then performance doesn't count anyway.

> Using bits doesn't make you a genius.

Did I claim that for myself ?

Mut...@dastardlyhq.com

unread,
Aug 22, 2023, 10:57:44 AM8/22/23
to
On Tue, 22 Aug 2023 16:48:45 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 22.08.2023 um 16:27 schrieb Mut...@dastardlyhq.com:
>
>> Accessing bits in a word is normally slower than accessing the whole word
>> on most CPUs. ...
>
>I have shown that when the amount of data is large enough, access time
>has the greatest weight. Furthermore, most of the calculations for the
>bits within the byte in other arithmetic units take place in parallel.

CPUs only have 1 data bus no matter how many cores they have so the optimum
number of threads for this sort of algorithm is very much a suck it and
see approach depending on hardware and on a single core CPU your approach
would suck badly.

>> Using bits doesn't make you a genius.
>
>Did I claim that for myself ?

By denigrating anyone who doesn't like your coding style the implication is
there for all to see. You think yourself a better programmer than anyone
else on this group.

Bonita Montero

unread,
Aug 22, 2023, 11:01:00 AM8/22/23
to
Am 22.08.2023 um 16:57 schrieb Mut...@dastardlyhq.com:

> CPUs only have 1 data bus no matter how many cores they have so the optimum
> number of threads for this sort of algorithm is very much a suck it and
> see approach depending on hardware and on a single core CPU your approach
> would suck badly.

Data is fetched in quantities of a cacheline size. And with my code
an eight of that quantity is fetched from memory. That's the reason
while my code is much faster than yours.

> By denigrating anyone who doesn't like your coding style the implication
> is there for all to see. You think yourself a better programmer than anyone
> else on this group.

That's obvious. I'm programming C++ since 1992 and I program at all
since I'm 10.


Scott Lurndal

unread,
Aug 22, 2023, 11:49:27 AM8/22/23
to
Mut...@dastardlyhq.com writes:
>On Tue, 22 Aug 2023 16:48:45 +0200
>Bonita Montero <Bonita....@gmail.com> wrote:
>>Am 22.08.2023 um 16:27 schrieb Mut...@dastardlyhq.com:
>>
>>> Accessing bits in a word is normally slower than accessing the whole word
>>> on most CPUs. ...
>>
>>I have shown that when the amount of data is large enough, access time
>>has the greatest weight. Furthermore, most of the calculations for the
>>bits within the byte in other arithmetic units take place in parallel.
>
>CPUs only have 1 data bus no matter how many cores

That is not an accurate statement for any processor built in
the last decade or more. Most of them use a mesh, crossbar or ring structure, not
a shared bus structure to interface the cores to the memory subsystem on the
chip. Most have multiple DRAM controllers (from one at
the real low end to twenty at the high end) and stripe
the DRAM address space across the controllers. All CPUs have private
L1 and L2 caches, and large shared LLC caches (LLC is distributed
across mesh elements) which significantly reduce DRAM controller accesses.

Likewise, on the I/O side, there are multiple bridges from the
mesh/ring structures into the I/O side (e.g. one host bridge per
PCI Express root port, one or more for on-chip accelerators, etc).



Mut...@dastardlyhq.com

unread,
Aug 22, 2023, 1:56:39 PM8/22/23
to
On Tue, 22 Aug 2023 17:00:42 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 22.08.2023 um 16:57 schrieb Mut...@dastardlyhq.com:
>
>> CPUs only have 1 data bus no matter how many cores they have so the optimum
>> number of threads for this sort of algorithm is very much a suck it and
>> see approach depending on hardware and on a single core CPU your approach
>> would suck badly.
>
>Data is fetched in quantities of a cacheline size. And with my code

If it has to be paged from disk in the first place the size of the cache is a
total irrelevance as far as time taken is concerned.

>an eight of that quantity is fetched from memory. That's the reason
>while my code is much faster than yours.

As I said, it depends on the hardware. Try both on a Z80 and see what happens.

>> By denigrating anyone who doesn't like your coding style the implication
>> is there for all to see. You think yourself a better programmer than anyone
>> else on this group.
>
>That's obvious.

Thanks for proving you're as arrogant as I thought.

>I'm programming C++ since 1992 and I program at all
>since I'm 10.

If you want to get into a pissing contest about who's been programming longer
I suspect a lot of people on this group have been doing it a lot longer than
you.

Mut...@dastardlyhq.com

unread,
Aug 22, 2023, 1:59:10 PM8/22/23
to
On Tue, 22 Aug 2023 15:49:09 GMT
sc...@slp53.sl.home (Scott Lurndal) wrote:
>Mut...@dastardlyhq.com writes:
>>On Tue, 22 Aug 2023 16:48:45 +0200
>>Bonita Montero <Bonita....@gmail.com> wrote:
>>>Am 22.08.2023 um 16:27 schrieb Mut...@dastardlyhq.com:
>>>
>>>> Accessing bits in a word is normally slower than accessing the whole word
>>>> on most CPUs. ...
>>>
>>>I have shown that when the amount of data is large enough, access time
>>>has the greatest weight. Furthermore, most of the calculations for the
>>>bits within the byte in other arithmetic units take place in parallel.
>>
>>CPUs only have 1 data bus no matter how many cores
>
>That is not an accurate statement for any processor built in
>the last decade or more. Most of them use a mesh, crossbar or ring

Given millions of 8 and 16 bit processors are still designed and produced for
embedded systems that statement is nonsense and ARM based systems can use
whatever parts they want in the CPU.

>structure, not
>a shared bus structure to interface the cores to the memory subsystem on the
>chip. Most have multiple DRAM controllers (from one at
>the real low end to twenty at the high end) and stripe
>the DRAM address space across the controllers. All CPUs have private
>L1 and L2 caches, and large shared LLC caches (LLC is distributed
>across mesh elements) whic signifinctly reduce DRAM controller accesses.
>
>Likewise, on the I/O side, there are multiple bridges from the
>mesh/ring structures into the I/O side (e.g. one host bridge per
>PCI Express root port, one or more for on-chip accelerators, etc).

Thats nice, but CPU > x86.


Bonita Montero

unread,
Aug 22, 2023, 2:13:35 PM8/22/23
to
Am 22.08.2023 um 19:56 schrieb Mut...@dastardlyhq.com:

> If it has to be paged from disk in the first place the size of the
> cache is a total irrelevance as far as time taken is concerned.

No one uses software that actually pages to disk.
Paging just ensures that the software doesn't crash.

> As I said, it depends on the hardware. Try both on a Z80 and see what happens.

Your code wasn't intended for a Z-80.

> Thanks for proving you're as arrogant as I thought.

When you're good in sth. you often appear arrogant.

> If you want to get into a pissing contest about who's been programming
> longer I suspect a lot of people on this group have been doing it a lot
> longer than you.

Maybe, but at this level.


Mut...@dastardlyhq.com

unread,
Aug 22, 2023, 2:18:20 PM8/22/23
to
On Tue, 22 Aug 2023 20:13:17 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 22.08.2023 um 19:56 schrieb Mut...@dastardlyhq.com:
>
>> If it has to be paged from disk in the first place the size of the
>> cache is a total irrelevance as far as time taken is concerned.
>
>No one uses software that actually pages to disk.
>Paging is just for that the softwar doesn't crash.

I suggest you go and google how operating system virtual memory works.

>> As I said, it depends on the hardware. Try both on a Z80 and see what
>happens.
>
>Your code wasn't intended for a Z-80.

What was it intended for then given it was written in basic C?

>> Thanks for proving you're as arrogant as I thought.
>
>When you're good in sth. you often appear arrogant.

"sth"?

>> If you want to get into a pissing contest about who's been programming
>> longer I suspect a lot of people on this group have been doing it a lot
>> longer than you.
>
>Maybe, but at this level.

At your level? Absolutely.

Scott Lurndal

unread,
Aug 22, 2023, 2:25:15 PM8/22/23
to
Mut...@dastardlyhq.com writes:
>On Tue, 22 Aug 2023 15:49:09 GMT
>sc...@slp53.sl.home (Scott Lurndal) wrote:
>>Mut...@dastardlyhq.com writes:
>>>On Tue, 22 Aug 2023 16:48:45 +0200
>>>Bonita Montero <Bonita....@gmail.com> wrote:
>>>>Am 22.08.2023 um 16:27 schrieb Mut...@dastardlyhq.com:
>>>>
>>>>> Accessing bits in a word is normally slower than accessing the whole word
>>>>> on most CPUs. ...
>>>>
>>>>I have shown that when the amount of data is large enough, access time
>>>>has the greatest weight. Furthermore, most of the calculations for the
>>>>bits within the byte in other arithmetic units take place in parallel.
>>>
>>>CPUs only have 1 data bus no matter how many cores
>>
>>That is not an accurate statement for any processor built in
>>the last decade or more. Most of them use a mesh, crossbar or ring
>
>Given millions of 8 and 16 bit processors are still designed and produced for
>embedded systems that statement is nonsense and ARM based systems can use
>whatever parts they want in the CPU.

ARM based multicore systems generally use CMN-400/500/600/700 IP (from ARM)
as the processor interconnect.

https://developer.arm.com/Processors/CoreLink%20CMN-600

And all those 8 and 16-bit processors are single core, so your
statement above, which references "no matter how many cores"
doesn't apply to them.

>
>>structure, not
>>a shared bus structure to interface the cores to the memory subsystem on the
>>chip. Most have multiple DRAM controllers (from one at
>>the real low end to twenty at the high end) and stripe
>>the DRAM address space across the controllers. All CPUs have private
>>L1 and L2 caches, and large shared LLC caches (LLC is distributed
>>across mesh elements) whic signifinctly reduce DRAM controller accesses.
>>
>>Likewise, on the I/O side, there are multiple bridges from the
>>mesh/ring structures into the I/O side (e.g. one host bridge per
>>PCI Express root port, one or more for on-chip accelerators, etc).
>
>Thats nice, but CPU > x86.

What does that mean? Every major multicore processor currently
shipping (including Apple M7+) operate as described above.
Including most mid-range and high-end application specific
system-on-chips (SoC), including those using the ARM
architecture.

The proceedings from https://hotchips.org/ over the last two
decades describe the state of the art.

Bonita Montero

unread,
Aug 22, 2023, 2:51:12 PM8/22/23
to
Am 22.08.2023 um 20:18 schrieb Mut...@dastardlyhq.com:
> On Tue, 22 Aug 2023 20:13:17 +0200
> Bonita Montero <Bonita....@gmail.com> wrote:
>> Am 22.08.2023 um 19:56 schrieb Mut...@dastardlyhq.com:
>>
>>> If it has to be paged from disk in the first place the size of the
>>> cache is a total irrelevance as far as time taken is concerned.
>>
>> No one uses software that actually pages to disk.
>> Paging is just for that the softwar doesn't crash.
>
> I suggest you go and google how operating system virtual memory works.

Swapping is for stability and not for performance.
No one considers running applications actually swapping.

> What was it intended for then given it was written in basic C?

You didn't write that on a Z-80.

Mut...@dastardlyhq.com

unread,
Aug 23, 2023, 2:50:46 AM8/23/23
to
On Tue, 22 Aug 2023 18:24:58 GMT
sc...@slp53.sl.home (Scott Lurndal) wrote:
>Mut...@dastardlyhq.com writes:
>>>That is not an accurate statement for any processor built in
>>>the last decade or more. Most of them use a mesh, crossbar or ring
>>
>>Given millions of 8 and 16 bit processors are still designed and produced for
>
>>embedded systems that statement is nonsense and ARM based systems can use
>>whatever parts they want in the CPU.
>
>ARM based multicore systems generally use CMN-400/500/600/700 IP (from ARM)
>as the processor interconnect.
>
>https://developer.arm.com/Processors/CoreLink%20CMN-600
>
>And all those 8 and 16-bit processors are single core, so your
>statement above, which references "no matter how many cores"
>doesn't apply to them.

You said any processor built in the last decade or more. You didn't specific
only multicore. Do you wish to retract that statement?

Mut...@dastardlyhq.com

unread,
Aug 23, 2023, 2:52:00 AM8/23/23
to
On Tue, 22 Aug 2023 20:50:55 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 22.08.2023 um 20:18 schrieb Mut...@dastardlyhq.com:
>> On Tue, 22 Aug 2023 20:13:17 +0200
>> Bonita Montero <Bonita....@gmail.com> wrote:
>>> Am 22.08.2023 um 19:56 schrieb Mut...@dastardlyhq.com:
>>>
>>>> If it has to be paged from disk in the first place the size of the
>>>> cache is a total irrelevance as far as time taken is concerned.
>>>
>>> No one uses software that actually pages to disk.
>>> Paging is just for that the softwar doesn't crash.
>>
>> I suggest you go and google how operating system virtual memory works.
>
>Swapping is for stability and not for performance.

For stability? Seriously?

>No one considers running applications actually swapping.
>
>> What was it intended for then given it was written in basic C?
>
>You didn't write that on a Z-80.

No, but I've got a Z80 C compiler that would happily compile and run it.

Bonita Montero

unread,
Aug 23, 2023, 3:59:46 AM8/23/23
to
Am 23.08.2023 um 08:51 schrieb Mut...@dastardlyhq.com:
> On Tue, 22 Aug 2023 20:50:55 +0200
> Bonita Montero <Bonita....@gmail.com> wrote:
>> Am 22.08.2023 um 20:18 schrieb Mut...@dastardlyhq.com:
>>> On Tue, 22 Aug 2023 20:13:17 +0200
>>> Bonita Montero <Bonita....@gmail.com> wrote:
>>>> Am 22.08.2023 um 19:56 schrieb Mut...@dastardlyhq.com:
>>>>
>>>>> If it has to be paged from disk in the first place the size of the
>>>>> cache is a total irrelevance as far as time taken is concerned.
>>>>
>>>> No one uses software that actually pages to disk.
>>>> Paging is just for that the softwar doesn't crash.
>>>
>>> I suggest you go and google how operating system virtual memory works.
>>
>> Swapping is for stability and not for performance.
>
> For stability? Seriously?

Main memory has a latency from 20 - 30ns per access. My 1TB Samsung
PCIe 4.0 SSD has a synchronous latency of 32us, that's up to factor
1.800. Do you really think that a swappable application is usable ?
Swapping is just for low memory conditions so that the application
won't stop or crash.

> No, but I've got a Z80 C compiler that would happily compile and run it.

I don't believe you. And no one is using Z-80s today.

Richard Damon

unread,
Aug 23, 2023, 7:52:49 AM8/23/23
to
On 8/23/23 3:59 AM, Bonita Montero wrote:
>
> Main memory has a latency from 20 - 30ns per access. My 1TB Samsung
> PCIe 4.0 SSD has a synchronous latency of 32us, that's up to factor
> 1.800. Do you really think that a swappable application is usable ?
> Swapping is just for low memory conditions so that the application
> won't stop or crash.

You're forgetting that in that 32us you get a whole sector of data, not
just one word, so the slowdown is much less of a factor.

Also, an application being "slow" is better than the application just
not being able to run because it uses too much resource (even having
the machine all to itself).

Yes, with todays 64-bit address space, we tend to not need to design in
"swapping" as might have been needed with 16-bit machines, but many
applications are still designed based on the knowledge that they are
likely using more memory that is likely physically present, so they try
to optimize access to take that into account.

Good practice looks at the whole hierarchy, after all, data in Level 1
cache is MUCH faster to access then data in main memory, which is faster
than data on local storage, is much faster than data out on the network.

Forget that and you task suddenly hits a wall and just crawls.

Bonita Montero

unread,
Aug 23, 2023, 8:40:56 AM8/23/23
to
Am 23.08.2023 um 13:52 schrieb Richard Damon:

> Your forgetting that in thst 32us you get a whole sector of data,
> not just one word, so the slow down in much less of a factor.

I'm not forgetting that. I dropped this because the relationship is
that hughe anyway. The sector is the size of a cacheline and when you
switch on your computer the SDRAM's burst size is configured according
to the cacheline size.

> Also, and application being "slow" is better than the application just
> can't be run because it uses too much resource (even having the machine
> all to itself).

That's what I said: swapping is just a stability feature. Applications
are normally not usable when they swap, or even the whole machine isn't
usable because essential pages of other processes were swapped out.

> Yes, with todays 64-bit address space, we tend to not need to design
> in "swapping" as might have been needed with 16-bit machines, ...

Swapspace makes sense even if the swap actually isn't touched. When you
commit some memory with VirtualAlloc() under Windows and memory hasn't
been touched yet the kernel has subtracted an according amount of space
from swap so that if swapping would be necessary when the pages are
touched the first time there's enough swap space.
Solaris works the same way. But Linux by default does overcommit so that
an application might crash if there's not enough swapspace if a page is
touched the first time. But Linux can be configured like Windows and
Solaris so that there's enough swap reserved when you allocate memory.
The problem with that under Linux is that if you have a massively
forking application where the child processes modify only a small portion
of the shared memory there might be a large amount of memory subtracted
from swap.

> ... but many applications are still designed based on the knowledge
> that they are likely using more memory that is likely physically present,
> ...

That's not really an optimization but a common assumption.


Mut...@dastardlyhq.com

unread,
Aug 23, 2023, 10:20:59 AM8/23/23
to
On Wed, 23 Aug 2023 09:59:31 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 23.08.2023 um 08:51 schrieb Mut...@dastardlyhq.com:
>> For stability? Seriously?
>
>Main memory has a latency from 20 - 30ns per access. My 1TB Samsung
>PCIe 4.0 SSD has a synchronous latency of 32us, that's up to factor
>1.800. Do you really think that a swappable application is usable ?
>Swapping is just for low memory conditions so that the application
>won't stop or crash.

And low memory conditions can easily occur on machines running large databases
if large query(s) are happening even with gigs of memory so swap space is still
a thing.

And that's not what I would call "stability" but I'm not going to get into
an argument over word meanings.

>> No, but I've got a Z80 C compiler that would happily compile and run it.
>
>I don't believe you. And no one is using Z-80s today.

Believe what you like. Some TI calculators use Z80s FYI. Besides which
a lot of 8 bit microcontrollers are LESS powerful than a Z80 and they still
get programmed in C as well as assembler.

Mut...@dastardlyhq.com

unread,
Aug 23, 2023, 10:24:35 AM8/23/23
to
On Wed, 23 Aug 2023 14:40:36 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 23.08.2023 um 13:52 schrieb Richard Damon:
>> Yes, with todays 64-bit address space, we tend to not need to design
>> in "swapping" as might have been needed with 16-bit machines, ...
>
>Swapspace makes sense even if the swap actually isn't touched. When you
>commit some memory with VirtualAlloc() under Windows and memory hasn't

Does windows still use a filesystem file for swap making it horrendously
slow as swap data has to go in and out via the high level filesystem layer
or has MS discovered the concept of swap partitions yet?

>The problem with that under Linux is that if you have a massively for-
>king application where the child processes modify only a small portion
>of the shared memory there might be a large amount of memory subtracted
>from swap.
>
>> ... but many applications are still designed based on the knowledge
>> that they are likely using more memory that is likely physically present,
>> ...
>
>That's not really an optimization but a common assumption.

Sadly not in the Java world where "grab as much memory as you can on
startup" seems to be the JVM motto.


Bonita Montero

unread,
Aug 23, 2023, 10:31:07 AM8/23/23
to
Am 23.08.2023 um 16:24 schrieb Mut...@dastardlyhq.com:

> Does windows still use a filesystem file for swap making it horrendously
> slow as swap data has to go in and out via the high level filesystem layer
> or has MS discovered the concept of swap partitions yet?

That's not slow if you simply memorize the cluster addresses of the
blocks holding the pagefile in non-pageable memory. It's so simple
that I think that's how MS does it.

> Sadly not in the Java world where "grab as much memory as you can on
> startup" seems to be the JVM motto.

That's not true. Java-applications would be even less responsive
since a garbage collection involves copying the whole heap to a
new heap.

Bonita Montero

unread,
Aug 23, 2023, 10:32:36 AM8/23/23
to
Am 23.08.2023 um 16:20 schrieb Mut...@dastardlyhq.com:

> And low memory conditions can easily occur on machines running large
> databases ...

That's not true. If you have a database you chose the size of the
buffer pool and the heaps in a way that the machine doesn't swap.
Swapping is simply extremely uncommon.

> Believe what you like. Some TI calculators use Z80s FYI.

Prove that.

> Besides which a lot of 8 bit microcontrollers are LESS powerful
> than a Z80 and they still get programmed in C as well as assembler.

Show me some of them.


Mut...@dastardlyhq.com

unread,
Aug 23, 2023, 10:36:41 AM8/23/23
to
Certainly seemed to be the case back when I was involved in Java. One would
hope its evolved since then.

Bonita Montero

unread,
Aug 23, 2023, 10:39:08 AM8/23/23
to
Others just have the composure to devote
themselves to complex problems, you don't.

Mut...@dastardlyhq.com

unread,
Aug 23, 2023, 10:41:54 AM8/23/23
to
On Wed, 23 Aug 2023 16:32:19 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 23.08.2023 um 16:20 schrieb Mut...@dastardlyhq.com:
>
>> And low memory conditions can easily occur on machines running large
>> databases ...
>
>That's not true. If you have a database you chose the size of the
>buffer pool and the heaps in a way that the machine doesn't swap.
>Swapping is simply extremely uncommon.

Depends on the DB. If like Oracle its tablespaces are on bespoke disk
partitions then it will do the equivalent of swapping if it needs to.

>> Believe what you like. Some TI calculators use Z80s FYI.
>
>Prove that.

You ever heard of google? Try "ti calculator z80". You never know what
you might find.

>> Besides which a lot of 8 bit microcontrollers are LESS powerful
>> than a Z80 and they still get programmed in C as well as assembler.
>
>Show me some of them.

FFS man, are you a baby that needs to be spoon-fed? Grow up!
Google "8 bit PIC".

We already had a "discussion" with you a while back where you demonstrated your
complete and utter pig ignorance of the embedded world. Perhaps don't make
a fool of yourself again.


Mut...@dastardlyhq.com

unread,
Aug 23, 2023, 10:43:54 AM8/23/23
to
On Wed, 23 Aug 2023 16:38:51 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 23.08.2023 um 16:36 schrieb Mut...@dastardlyhq.com:
>> On Wed, 23 Aug 2023 16:30:01 +0200
>> Bonita Montero <Bonita....@gmail.com> wrote:
>>> Am 23.08.2023 um 16:24 schrieb Mut...@dastardlyhq.com:
>>>> Sadly not in the Java world where "grab as much memory as you can on
>>>> startup" seems to be the JVM motto.
>>>
>>> That's not true. Java-applications would be even less responsive
>>> since a garbage collection involves copying the whole heap to a
>>> new heap.
>>
>> Certainly seemed to be the case back when I was involved in Java. One would
>> hope its evolved since then.
>>
>
>Others just have the composure to devote
>themselves to complex problems, you don't.

It wasn't my problem other than the JVM was killing our servers performance
so go swivel on it.

Bonita Montero

unread,
Aug 23, 2023, 10:45:22 AM8/23/23
to
Java Frameworks are really complex and you're at the level of
simple details. So I think you never were involved in such projects.

Bonita Montero

unread,
Aug 23, 2023, 10:48:02 AM8/23/23
to
Am 23.08.2023 um 16:41 schrieb Mut...@dastardlyhq.com:
> On Wed, 23 Aug 2023 16:32:19 +0200
> Bonita Montero <Bonita....@gmail.com> wrote:
>> Am 23.08.2023 um 16:20 schrieb Mut...@dastardlyhq.com:
>>
>>> And low memory conditions can easily occur on machines running large
>>> databases ...
>>
>> That's not true. If you have a database you chose the size of the
>> buffer pool and the heaps in a way that the machine doesn't swap.
>> Swapping is simply extremely uncommon.
>
> Depends on the DB. If like Oracle its tablespaces are on bespoke disk
> partitions then it will do the equivalent of swapping if it needs to.

The Oracle Data Warehousing Guide says that the performance
advantage of raw partitions is between three and five percent.
And you lose much comfort features like shnapshots. So no one
actually uses raw partitions.
And this feature has nothing to do with how you dimension your
buffer pool or your heaps.

>
>>> Believe what you like. Some TI calculators use Z80s FYI.
>>
>> Prove that.
>
> You ever heard of google? Try "ti calculator z80". You never know what
> you might find.

LOL

>>> Besides which a lot of 8 bit microcontrollers are LESS powerful
>>> than a Z80 and they still get programmed in C as well as assembler.
>>
>> Show me some of them.
>
> FFS man, asre you a baby that needs to be spoon fed? Grow up!
> Google "8 bit PIC".

LOL

Daniel

unread,
Aug 23, 2023, 11:00:53 AM8/23/23
to
On Wednesday, August 23, 2023 at 10:20:59 AM UTC-4, Mut...@dastardlyhq.com wrote:

> And low memory conditions can easily occur on machines running large databases
> if large query(s) are happening even with gigs of memory so swap space is still
> a thing.
>
When I started programming in the mid eighties, memory was a scarce resource, and
one of my earliest lessons was that there were only three numbers that counted:
zero, one, and as many as you like. Many years later, when memory was abundant,
I found it astonishing that so many applications were nonperforming when
processing large amounts of data. Few were designed to process "as many as you like",
a great number, from desktop photo imaging software to multimillion dollar capital
markets applications, were designed to load everything into memory, as if it were a
zero cost resource. But large database queries need not translate to unbounded
memory requirements, RDBMs have cursors, caching on the server, and other
features to avoid it. Applications that swapped too much were generally poorly
designed.

Daniel

Mut...@dastardlyhq.com

unread,
Aug 23, 2023, 1:49:23 PM8/23/23
to
On Wed, 23 Aug 2023 16:45:06 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 23.08.2023 um 16:43 schrieb Mut...@dastardlyhq.com:
>
>> On Wed, 23 Aug 2023 16:38:51 +0200
>> Bonita Montero <Bonita....@gmail.com> wrote:
>
>>> Others just have the composure to devote
>>> themselves to complex problems, you don't.
>
>> It wasn't my problem other than the JVM was killing our
>> servers performance so go swivel on it.
>
>Java Frameworks are really complex and you're at the level of
>simple details. So I think you never were involved in such projects.

You must be a very insecure person constantly putting others down to make
yourself feel better.

What job do you do? Do you even have one?

Mut...@dastardlyhq.com

unread,
Aug 23, 2023, 1:51:24 PM8/23/23
to
On Wed, 23 Aug 2023 16:47:46 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 23.08.2023 um 16:41 schrieb Mut...@dastardlyhq.com:
>> Depends on the DB. If like Oracle its tablespaces are on bespoke disk
>> partitions then it will do the equivalent of swapping if it needs to.
>
>The Oracle Data Warehousing Guide says that the performance
>advantage of raw partitions is between three and five percent.
And you lose many comfort features like snapshots. So no one
>actually uses raw partitions.

Oh right. You're an Oracle expert too now? :)

>And this feature has nothing to do with how you dimension your
>buffer pool or your heaps.

Its a type of paging.

>> FFS man, asre you a baby that needs to be spoon fed? Grow up!
>> Google "8 bit PIC".
>
>LOL

Get a grip aspie.

Bonita Montero

unread,
Aug 23, 2023, 2:05:31 PM8/23/23
to
Am 23.08.2023 um 19:49 schrieb Mut...@dastardlyhq.com:

> You must be a very insecure person constantly
> putting others down to make yourself feel better.

No, I just feel uncomfortable if people produce such images.
You're simply lying.

Bonita Montero

unread,
Aug 23, 2023, 2:09:17 PM8/23/23
to
Am 23.08.2023 um 19:51 schrieb Mut...@dastardlyhq.com:
> On Wed, 23 Aug 2023 16:47:46 +0200
> Bonita Montero <Bonita....@gmail.com> wrote:
>> Am 23.08.2023 um 16:41 schrieb Mut...@dastardlyhq.com:
>>> Depends on the DB. If like Oracle its tablespaces are on bespoke disk
>>> partitions then it will do the equivalent of swapping if it needs to.
>>
>> The Oracle Data Warehousing Guide says that the performance
>> advantage of raw partitions is between three and five percent.
>> And you lose many comfort features like snapshots. So no one
>> actually uses raw partitions.
>
> Oh right. You're an Oracle expert too now? :)

I have a OCP-DBA for Oracle 8i and I read that 20 years ago.
I checked that in the current DWH-guide and they reverted
that statement to five to ten percent.

>> And this feature has nothing to do with how you dimension your
>> buffer pool or your heaps.

> Its a type of paging.

Paging is synchronous, i.e. the thread which pages is stopped until
the page is fetched. All modern database engines use asynchronous
I/O, thereby issuing multiple requests and answering the responses
often in a totally different order.
And it's also not paging because this is explicitly issued by the
database server, whereas real paging is a transparent memory access
which isn't noticed by the application.


Daniel

unread,
Aug 23, 2023, 2:13:25 PM8/23/23
to
On Wednesday, August 23, 2023 at 1:49:23 PM UTC-4, Mut...@dastardlyhq.com wrote:

> You must be a very insecure person constantly putting others down to make
> yourself feel better.
>
Regrettably, many participants on this group seem to like to put others down,
like Old Betsie in the village: giving them a piece of their mind. Öö Tiib does it,
Scott Lurndal does it, James, Keith, David, you're doing it now. The heavyweights
that used to post here, they never did that, they were invariably civil. But they're gone.

Best regards,
Daniel

Bonita Montero

unread,
Aug 23, 2023, 2:18:12 PM8/23/23
to
Am 23.08.2023 um 20:13 schrieb Daniel:

> Regrettably, many participants on this group seem to like to put others down,
> like Old Betsie in the village: giving them a piece of their mind. Öö Tiib
> does it, Scott Lurndal does it, James, Keith, David, you're doing it now.
> The heavyweights that used to post here, they never did that, they were
> invariably civil. But they're gone.

Muttley uses the scheme I mentioned very often,
at some point you have to say something.

Chris M. Thomasson

unread,
Aug 23, 2023, 2:21:18 PM8/23/23
to
Pot Kettle? Bonita, you have a lot of experience wrt verbal assaults, right?

Bonita Montero

unread,
Aug 23, 2023, 2:25:11 PM8/23/23
to
Muttley is just annoying. If I show a solution to a problem and think
it is technically elegant, he somehow wants to explain it away because
my approach is always too complicated for him. He never understands the
advantages of my approach and when I then explain them, in the end he
doesn't say anything more, except again that it's all too complicated.
It's been like this for months now. The guy is disturbed.
And because he then compares himself to me at that point, he produces
such images as saying that he was involved in some Java project,
although based on other statements about Java you can say that
he only has clichéd views of it.

Daniel

unread,
Aug 23, 2023, 2:37:26 PM8/23/23
to
On Wednesday, August 23, 2023 at 2:21:18 PM UTC-4, Chris M. Thomasson wrote:
> >
> Bonita, you have a lot of experience wrt verbal assaults, right?

How would you know? You don't seem to have _any_ experience of it
yourself :-) Alf neither.

Daniel

Scott Lurndal

unread,
Aug 23, 2023, 2:52:11 PM8/23/23
to
Bonita Montero <Bonita....@gmail.com> writes:
>Am 23.08.2023 um 19:51 schrieb Mut...@dastardlyhq.com:
>> On Wed, 23 Aug 2023 16:47:46 +0200
>> Bonita Montero <Bonita....@gmail.com> wrote:
>>> Am 23.08.2023 um 16:41 schrieb Mut...@dastardlyhq.com:
>>>> Depends on the DB. If like Oracle its tablespaces are on bespoke disk
>>>> partitions then it will do the equivalent of swapping if it needs to.
>>>
>>> The Oracle Data Warehousing Guide says that the performance
>>> advantage of raw partitions is between three and five percent.
>>> And you lose much comfort features like shnapshots. So no one
>>> actually uses raw partitions.
>>
>> Oh right. You're an Oracle expert too now? :)
>
>I have a OCP-DBA for Oracle 8i and I read that 20 years ago.
>I checked that in the current DWH-guide and they reverted
>that statement to five to ten percent.
>
>>> And this feature has nothing to do with how you dimension your
>>> buffer pool or your heaps.
>
>> Its a type of paging.
>
>Paging is synchronous, i.e. the thread which pages is stopped until
>the page is fetched.

Actually, sometimes the operating system will anticipate a future
page fault and preload the page. Particularly on POSIX systems
that support the madvise(2) or posix_madvise(2) system call(s).

Mut...@dastardlyhq.com

unread,
Aug 24, 2023, 1:19:00 PM8/24/23
to
What images?

>You're simply lying.

About what? Thats how you come across - an insecure loser who thinks he's
one of the best C++ programmers on the planet.

Bonita Montero

unread,
Aug 24, 2023, 1:21:14 PM8/24/23
to
Am 24.08.2023 um 19:18 schrieb Mut...@dastardlyhq.com:
> On Wed, 23 Aug 2023 20:05:14 +0200
> Bonita Montero <Bonita....@gmail.com> wrote:
>> Am 23.08.2023 um 19:49 schrieb Mut...@dastardlyhq.com:
>>
>>> You must be a very insecure person constantly
>>> putting others down to make yourself feel better.
>>
>> No, I just feel uncomfortable if people produce such images.
>
> What images?

You try to make other believe things which aren't true.

>> You're simply lying.
>
> About what? Thats how you come across - an insecure loser who thinks he's
> one of the best C++ programmers on the planet.
>

I'm neither insecure, nor a loser.

Mut...@dastardlyhq.com

unread,
Aug 24, 2023, 1:22:38 PM8/24/23
to
On Wed, 23 Aug 2023 20:09:01 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 23.08.2023 um 19:51 schrieb Mut...@dastardlyhq.com:
>> On Wed, 23 Aug 2023 16:47:46 +0200
>> Bonita Montero <Bonita....@gmail.com> wrote:
>>> Am 23.08.2023 um 16:41 schrieb Mut...@dastardlyhq.com:
>>>> Depends on the DB. If like Oracle its tablespaces are on bespoke disk
>>>> partitions then it will do the equivalent of swapping if it needs to.
>>>
>>> The Oracle Data Warehousing Guide says that the performance
>>> advantage of raw partitions is between three and five percent.
>>> And you lose much comfort features like shnapshots. So no one
>>> actually uses raw partitions.
>>
>> Oh right. You're an Oracle expert too now? :)
>
>I have a OCP-DBA for Oracle 8i and I read that 20 years ago.

Does that hang on your wall alongside your Microsoft Certified Muppet certificates?

>> Its a type of paging.
>
>Paging is synchronous, i.e. the thread which pages is stopped until
>the page is fetched. All modern database engines use asynchronous
>I/O, thereby issuing multiple requests and answering the responses
>often in a totally different order.

Until the data can be loaded from disk a query - even a highly parallelised
one - will be partially or fully stalled however the I/O is done.

>And it's also not paging because this is explicitly issued by the
>database server, whereas real paging is a transparent memory access
>which isn't noticed by the application.

You think programs using SQL are aware of the database fetching and
saving to disk?

Mut...@dastardlyhq.com

unread,
Aug 24, 2023, 1:25:53 PM8/24/23
to
On Wed, 23 Aug 2023 20:24:55 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 23.08.2023 um 20:20 schrieb Chris M. Thomasson:
>> Pot Kettle? Bonita, you have a lot of experience wrt verbal assaults,
>> right?
>
>Muttley is just annoying. If I show a solution to a problem and think
>it is technically elegant, he somehow wants to explain it away because

Your idea of elegance is completely orthogonal to most people's.

>The guy is disturbed.

Irony day again?

>And because he then compares himself to me at that point, he produces
>such images as saying that he was involved in some Java project,

Actually I specifically said I WASNT involved except how Java slowed down
the server we were sharing with the java team. I realise English isn't your
1st language but do try to read properly.

>although based on other statements about Java you can say that
>he only has clichéd views of it.

Views gained from real world experience.

Bonita Montero

unread,
Aug 24, 2023, 1:27:42 PM8/24/23
to
Am 24.08.2023 um 19:22 schrieb Mut...@dastardlyhq.com:

>> Paging is synchronous, i.e. the thread which pages is stopped until
>> the page is fetched. All modern database engines use asynchronous
>> I/O, thereby issuing multiple requests and answering the responses
>> often in a totally different order.

> Until the data can be loaded from disk a query - even a highly parallelised
> one - will be partially or fully stalled however the I/O is done.

I wanted to explain why this isn't paging.
You don't understand the difference.

>> And it's also not paging because this is explicitly issued by the
>> database server, whereas real paging is a transparent memory access
>> which isn't noticed by the application.

> You think programs using SQL are aware of the database fetching and
> saving to disk?

What does this have to do with the discussion of
paging vs. asynchronous I/O through a database server?

Bonita Montero

unread,
Aug 24, 2023, 1:29:26 PM8/24/23
to
Am 24.08.2023 um 19:25 schrieb Mut...@dastardlyhq.com:
> On Wed, 23 Aug 2023 20:24:55 +0200
> Bonita Montero <Bonita....@gmail.com> wrote:
>> Am 23.08.2023 um 20:20 schrieb Chris M. Thomasson:
>>> Pot Kettle? Bonita, you have a lot of experience wrt verbal assaults,
>>> right?
>>
>> Muttley is just annoying. If I show a solution to a problem and think
>> it is technically elegant, he somehow wants to explain it away because
>
> Your idea of elegance is completely orthogonal to most peoples.

People who know C++ very well will appreciate my work.

> Actually I specifically said I WASNT involved except how Java slowed down
> the server we were sharing with the java team. I realise English isn't your
> 1st language but do try to read properly.

Now you row back.

>> although based on other statements about Java you can say that
>> he only has clichéd views of it.

> Views gained from real world experience.

You don't have any experience with that.
You're at the level of very simple code.


Mut...@dastardlyhq.com

unread,
Aug 24, 2023, 1:37:35 PM8/24/23
to
On Thu, 24 Aug 2023 19:27:25 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 24.08.2023 um 19:22 schrieb Mut...@dastardlyhq.com:
>
>>> Paging is synchronous, i.e. the thread which pages is stopped until
>>> the page is fetched. All modern database engines use asynchronous
>>> I/O, thereby issuing multiple requests and answering the responses
>>> often in a totally different order.
>
>> Until the data can be loaded from disk a query - even a highly parallelised
>> one - will be partially or fully stalled however the I/O is done.
>
>I wanted to explain why this isn't paging.
>You don't underdtand the difference.

They're analogous, not identical , but close enough. Both involve loading
data into memory and saving it out to disk before the operation can proceed
and in the case of ACID DBMSs the transaction data MUST be written to disk
before the operation can complete.

But I'm sure you know all this, right?

>>> And it's also not paging because this is explicitly issued by the
>>> database server, whereas real paging is a transparent memory access
>>> which isn't noticed by the application.
>
>> You think programs using SQL are aware of the database fetching and
>> saving to disk?
>
>What does this have to do with the discussion of
>paging vs. asynchronous I/O through a database server?

FFS , are you even aware of what you wrote?

Mut...@dastardlyhq.com

unread,
Aug 24, 2023, 1:40:56 PM8/24/23
to
On Thu, 24 Aug 2023 19:29:11 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 24.08.2023 um 19:25 schrieb Mut...@dastardlyhq.com:
>> Your idea of elegance is completely orthogonal to most peoples.
>
>People who know C++ very well will appreciate my work.

Obviously none of them are on this group.

>> Actually I specifically said I WASNT involved except how Java slowed down
>> the server we were sharing with the java team. I realise English isn't your
>> 1st language but do try to read properly.
>
>Now you row back.

Go back and read my post.

>>> although based on other statements about Java you can say that
>>> he only has clichéd views of it.
>
>> Views gained from real world experience.
>
>You don't have any experience with that.
>You're at the level of very simplex code.

Some of my code is currently orbiting Earth and some more is in a missile
system. But yeah, all simple stuff.

You never did tell us what job you did - so what area do you work in?

Bonita Montero

unread,
Aug 24, 2023, 1:49:59 PM8/24/23
to
Am 24.08.2023 um 19:40 schrieb Mut...@dastardlyhq.com:
> On Thu, 24 Aug 2023 19:29:11 +0200
> Bonita Montero <Bonita....@gmail.com> wrote:
>> Am 24.08.2023 um 19:25 schrieb Mut...@dastardlyhq.com:
>>> Your idea of elegance is completely orthogonal to most peoples.
>>
>> People who know C++ very well will appreciate my work.
>
> Obviously none of them are on this group.

There aren't people who program like that, but in contrast
to you most readers here understand that code.

> You never did tell us what job you did - so what area do you work in?

I'm programming C++ / (C#) software for high frequency trading.


Bonita Montero

unread,
Aug 24, 2023, 1:52:25 PM8/24/23
to
Am 24.08.2023 um 19:37 schrieb Mut...@dastardlyhq.com:
> On Thu, 24 Aug 2023 19:27:25 +0200
> Bonita Montero <Bonita....@gmail.com> wrote:
>> Am 24.08.2023 um 19:22 schrieb Mut...@dastardlyhq.com:
>>
>>>> Paging is synchronous, i.e. the thread which pages is stopped until
>>>> the page is fetched. All modern database engines use asynchronous
>>>> I/O, thereby issuing multiple requests and answering the responses
>>>> often in a totally different order.
>>
>>> Until the data can be loaded from disk a query - even a highly parallelised
>>> one - will be partially or fully stalled however the I/O is done.
>>
>> I wanted to explain why this isn't paging.
>> You don't underdtand the difference.
>
> They're analogous, not identical , but close enough. Both involve loading
> data into memory and saving it out to disk before the operation can proceed
> and in the case of ACID DBMSs the transaction data MUST be written to disk
> before the operation can complete.

But that is a very superficial comparison.
And there are no transactions in paging either.

Chris M. Thomasson

unread,
Aug 24, 2023, 2:26:16 PM8/24/23
to
Yup. Cacheline prefetch can be damn good as well, in certain scenarios...

Mut...@dastardlyhq.com

unread,
Aug 25, 2023, 12:07:36 PM8/25/23
to
On Thu, 24 Aug 2023 19:49:43 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 24.08.2023 um 19:40 schrieb Mut...@dastardlyhq.com:
>> On Thu, 24 Aug 2023 19:29:11 +0200
>> Bonita Montero <Bonita....@gmail.com> wrote:
>>> Am 24.08.2023 um 19:25 schrieb Mut...@dastardlyhq.com:
>>>> Your idea of elegance is completely orthogonal to most peoples.
>>>
>>> People who know C++ very well will appreciate my work.
>>
>> Obviously none of them are on this group.
>
>There aren't people who program like that, but in contrast
>to you most readers here understand that code.

Its not a case of not understanding it, its a case of I can't be arsed going
through it to figure it out. If you want to make a coding point KISS.

>> You never did tell us what job you did - so what area do you work in?
>
>I'm programming C++ / (C#) software for high frequency trading.

Home trader then.


Mut...@dastardlyhq.com

unread,
Aug 25, 2023, 12:08:15 PM8/25/23
to
On Thu, 24 Aug 2023 19:52:10 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 24.08.2023 um 19:37 schrieb Mut...@dastardlyhq.com:
>> They're analogous, not identical , but close enough. Both involve loading
>> data into memory and saving it out to disk before the operation can proceed
>> and in the case of ACID DBMSs the transaction data MUST be written to disk
>> before the operation can complete.
>
>But that is a very superficial comparison.
>And there are no transactions in paging either.

*sigh*

Forget it.

Bonita Montero

unread,
Aug 25, 2023, 12:10:51 PM8/25/23
to
Am 25.08.2023 um 18:07 schrieb Mut...@dastardlyhq.com:

> Its not a case of not understanding it, ...

Comment it and show that you understand it.

>>> You never did tell us what job you did - so what area do you work in?

>> I'm programming C++ / (C#) software for high frequency trading.

> Home trader then.

No, I'm not trading myself.

Mut...@dastardlyhq.com

unread,
Aug 25, 2023, 12:15:50 PM8/25/23
to
On Fri, 25 Aug 2023 18:10:35 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 25.08.2023 um 18:07 schrieb Mut...@dastardlyhq.com:
>
>> Its not a case of not understanding it, ...
>
>Comment it and show that you understand it.

Sorry pal, unlike you I have a life and have much more interesting things to
do than go through some dogs dinner code you posted by some random guy on
a newsgroup.

>>> I'm programming C++ / (C#) software for high frequency trading.
>
>> Home trader then.
>
>No, I'm no trading myself.

If you work for a fintec then I pity the devs who will have to maintain your
code in the future. More than likely they'll toss it and start again.

Bonita Montero

unread,
Aug 25, 2023, 12:18:26 PM8/25/23
to
Am 25.08.2023 um 18:15 schrieb Mut...@dastardlyhq.com:
> On Fri, 25 Aug 2023 18:10:35 +0200
> Bonita Montero <Bonita....@gmail.com> wrote:
>> Am 25.08.2023 um 18:07 schrieb Mut...@dastardlyhq.com:
>>
>>> Its not a case of not understanding it, ...
>>
>> Comment it and show that you understand it.
>
> Sorry pal, ...

That's what I expected.

>
>>>> I'm programming C++ / (C#) software for high frequency trading.
>>
>>> Home trader then.
>>
>> No, I'm no trading myself.
>
> If you work for a fintec then I pity the devs who will have to maintain
> your code in the future. More than likely they'll toss it and start again.

The company I work for has more skilled staff like you.


Mut...@dastardlyhq.com

unread,
Aug 26, 2023, 4:03:28 AM8/26/23
to
On Fri, 25 Aug 2023 18:18:09 +0200
Bonita Montero <Bonita....@gmail.com> wrote:
>Am 25.08.2023 um 18:15 schrieb Mut...@dastardlyhq.com:
>> On Fri, 25 Aug 2023 18:10:35 +0200
>> Bonita Montero <Bonita....@gmail.com> wrote:
>>> Am 25.08.2023 um 18:07 schrieb Mut...@dastardlyhq.com:
>>>
>>>> Its not a case of not understanding it, ...
>>>
>>> Comment it and show that you understand it.
>>
>> Sorry pal, ...
>
>That's what I expected.

The arrogance that you expect people to have to figure out your code then
comment on it is quite breathtaking.

>> If you work for a fintec then I pity the devs who will have to maintain
>> your code in the future. More than likely they'll toss it and start again.
>
>The company I work for has more skilled staff like you.

Ah, you think you're so much better than everyone else there. What a surprise.

Bonita Montero

unread,
Aug 26, 2023, 4:17:03 AM8/26/23
to
Am 26.08.2023 um 10:03 schrieb Mut...@dastardlyhq.com:

>>> If you work for a fintec then I pity the devs who will have to maintain
>>> your code in the future. More than likely they'll toss it and start again.

>> The company I work for has more skilled staff like you.

> Ah, you think you're so much better than everyone else there. What a surprise.

Stick with your C code and continue to ask beginner's C++ questions.


Mut...@dastardlyhq.com

unread,
Aug 26, 2023, 4:27:30 AM8/26/23
to
Do carry on with your pompous assery, I'll get some popcorn :)

Chris M. Thomasson

unread,
Aug 26, 2023, 6:34:02 PM8/26/23
to
:^)

Öö Tiib

unread,
Aug 27, 2023, 3:26:59 PM8/27/23
to
On Thursday, 17 August 2023 at 05:26:37 UTC+3, Tim Rentsch wrote:
>
> I suggest computing all 32 bit primes for the benchmark
> tests. Shouldn't take too long.
>
Took 14 minutes to find those 203280221 primes single threaded
on i7-8550U.

Ben Bacarisse

unread,
Aug 27, 2023, 7:50:05 PM8/27/23
to
What do you mean by find? Do you include printing them out because that
will depend rather more on the OS.

I wanted to try a newer algorithm, so I coded up a rather naive version
of Atkin & Bernstein's sieve in C. It needs some tidying and I imagine
a bit can be shaved off the time.

It can generate a bitmap of the 203280221 primes and then count them in
about 40 seconds on an i5-8265U CPU @ 1.60GHz (single threaded). It
takes about 55 seconds to write them (in decimal) to a file. The result
is over 2GB:

$ wc primes
203280221 203280221 2178719347 primes

--
Ben.

Bonita Montero

unread,
Aug 28, 2023, 12:16:06 AM8/28/23
to
I optimized the code somewhat further:

#include <iostream>
#include <vector>
#include <charconv>
#include <string_view>
#include <algorithm>
#include <fstream>
#include <cctype>
#include <cstring>
#include <bit>
#include "ndi.h"

#if defined(_MSC_VER)
#pragma warning(disable: 26495) // uninitialized member
#endif

using namespace std;

int main( int argc, char **argv )
{
    // Sieve of Eratosthenes over [0, max] where max is argv[1]
    // (decimal, or hex with a 0x/0X prefix). Primes are written in
    // decimal, one per line, to argv[2] (default "primes.txt"); an
    // EMPTY argv[2] suppresses output entirely (pure benchmark mode).
    // Returns EXIT_FAILURE on bad usage / unparsable limit.
    constexpr size_t BUF_SIZE = 0x100000;   // flush threshold of the print buffer
    try
    {
        size_t end;
        if( argc < 2 )
            return EXIT_FAILURE;
        char const *sizeStr = argv[1];
        bool hex = sizeStr[0] == '0' && (sizeStr[1] == 'x' || sizeStr[1] == 'X');
        sizeStr += 2 * hex;
        if( from_chars_result fcr = from_chars( sizeStr, sizeStr + strlen(
            sizeStr ), end, !hex ? 10 : 16 ); (bool)fcr.ec || *fcr.ptr )
            return EXIT_FAILURE;
        // make the bound inclusive; throw if end + 1 (+7 for the later
        // round-up to whole bytes) would wrap around size_t
        if( size_t e = end; ++end + 7 < e )
            throw bad_alloc();
        using bit_vec = vector<uint8_t>;
        using bit_it = bit_vec::iterator;
        bit_vec sieve( (end + 7) / 8, -1 );     // one bit per candidate, all set
        ofstream ofs;
        using ndi_vec = vector<ndi<char>>;      // ndi<>: skips value-init on resize
        using ndi_vec_it = ndi_vec::iterator;
        constexpr size_t AHEAD = 32;            // worst-case digits of one prime + '\n'
        ndi_vec printBuf;
        ndi_vec_it bufBegin, bufEnd, bufFlush;
        if( argc < 3 || *argv[2] )              // output requested?
        {
            ofs.exceptions( ofstream::failbit | ofstream::badbit );
            ofs.open( argc >= 3 ? argv[2] : "primes.txt", ofstream::trunc );
            printBuf.resize( BUF_SIZE + AHEAD - 1 );
            bufBegin = printBuf.begin();
            bufEnd = bufBegin;
            bufFlush = bufBegin + BUF_SIZE;
        }
        auto print = [&]() { ofs << string_view( &*to_address( bufBegin ),
            &*to_address( bufEnd ) ); };
        for( size_t p = 2; p != end; )
        {
            // clear every multiple of p, starting at 2 * p
            bit_it pSieve = sieve.begin();
            for( size_t m = p; (m += p) < end; )
                pSieve[m / 8] &= ~(1u << m % 8);
            // advance p to the next candidate whose bit survived
            pSieve += ++p / 8;
            for( uint8_t m = 1u << p % 8, mNew; ; ++p )
            {
                if( p == end ) [[unlikely]]
                    goto brk;
                if( *pSieve & m ) [[unlikely]]
                    break;                       // p is prime
                mNew = rotl( m, 1 );             // rotate probe bit ...
                pSieve += mNew < m;              // ... step a byte on wrap-around
                m = mNew;
            }
            if( !ofs.is_open() )
                continue;                        // benchmark mode: no formatting
            // format p plus newline into the print buffer
            auto [ptr, ec] = to_chars( &*bufEnd, &*to_address( bufEnd + (AHEAD -
                1) ), p );
            if( (bool)ec ) [[unlikely]]
                throw system_error( (int)ec, generic_category(), "conversion failed" );
            bufEnd += ptr - &*to_address( bufEnd );
            *bufEnd++ = '\n';
            if( bufEnd >= bufFlush )
            {
                print();
                bufEnd = bufBegin;
            }
        }
    brk:
        // flush the buffer tail - but only when output was requested;
        // the previous version dereferenced the (then uninitialized)
        // buffer iterators here even in benchmark mode, which is UB
        if( ofs.is_open() )
            print();
        return EXIT_SUCCESS;
    }
    catch( bad_alloc const & )
    {
        cout << "out of memory" << endl;
    }
    catch( ios_base::failure const & )
    {
        cout << "I/O error" << endl;
    }
    catch( system_error const &se )
    {
        cout << se.what() << endl;
    }
}

Öö Tiib

unread,
Aug 28, 2023, 2:24:08 AM8/28/23
to
On Monday, 28 August 2023 at 02:50:05 UTC+3, Ben Bacarisse wrote:
> Öö Tiib <oot...@hot.ee> writes:
>
> > On Thursday, 17 August 2023 at 05:26:37 UTC+3, Tim Rentsch wrote:
> >>
> >> I suggest computing all 32 bit primes for the benchmark
> >> tests. Shouldn't take too long.
> >>
> > Took 14 minutes to find those 203280221 primes single threaded
> > on i7-8550U.
> What do you mean by find? Do you include printing them out because that
> will depend rather more on the OS.
>
I did try to invent a sieve algorithm this weekend; it worked, but it was a
bit sluggish, so I posted it.
I did not measure printing or storing to file, only sieving and counting
(to test that there are 203280221).

> I wanted to try a newer algorithm, so I coded up a rather naive version
> of Atkin & Bernstein's sieve in C. It needs some tidying and I imagine
> a bit can be shaved off the time.
>
> It can generate a bitmap of the 203280221 primes and then count them in
> about 40 seconds on an i5-8265U CPU @ 1.60GHz (single threaded). It
> takes about 55 seconds to write them (in decimal) to a file. The result
> is over 2GB:
>
> $ wc primes
> 203280221 203280221 2178719347 primes
>
Thanks, so I apparently need to make my algorithm at least 30 times
faster for it to be worth anything. Maybe next weekend.
It is loading more messages.
0 new messages