
How useful is parallel processing for the SIEVE?


Albert van der Horst

May 26, 2017, 7:58:11 AM
For a long time I have had an implementation of the Sieve of Eratosthenes
that has a bitmap of 30 numbers per byte.
Of each sequence of 30 numbers only half are coprime with 2,
2/3 are coprime with 3 and 4/5 are coprime with 5, leaving
1/2 * 2/3 * 4/5 * 30 = 8 numbers.
For a prime p where 11*p = 1 mod(30), we must flip bit 1
in byte (11*p)/30 and then go on with a stride of p.

The sieve is filled using wheels. This means the first 7 bytes
are filled with a pattern where 7-folds are missing. This pattern
can then be repeated 11 times using an unintelligent move, and then
the 11-folds are removed, etc.
With 32 Gbyte we then have a sieve holding 10^12 numbers, and all
primes until 31 or so taken care of. 1]

Then for each prime we must remove the multiples using the masks
0x01, 0x02, 0x04, ..., 0x80.
This is a sure cache miss every time.

I've 8 processors, and I could deal out each of the masks to one
processor, running in the same Forth with 8 preemptive threads
and one sieve.

In view of the fact that cache misses remain the same, do I get
- no speedup at all, just lower processor load
- some speed up, but nowhere near a factor 8
- a decent speedup approaching 8.

(There are only a few critical loops. Assume they are written in
assembler or a good optimizing compiler.)

The basic operation is
get a byte from address N
AND it with a mask
store it back

This is one instruction for the usual Intel/AMD processor fortunately.
I think that means that two processors updating the same byte
do not interfere. (that is rare, but anyway)

Any insights are welcome.

1]
In a recent post someone wanted to fill the sieve first with zeroes;
that is several seconds of computer time wasted.


Groetjes Albert
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Anton Ertl

May 26, 2017, 11:21:09 AM
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>This is a sure cache miss every time.
>
>I've 8 processors, and I could deal out each of the masks to one
>processor, running in the same Forth with 8 preemptive threads
>and one sieve.
>
>In view of the fact that cache misses remain the same, do I get
>- no speedup at all, just lower processor load
>- some speed up, but nowhere near a factor 8
>- a decent speedup approaching 8.

If you do it properly, with locking, you have a chance of getting a
heavy slowdown. I very much doubt that you will see any speedup, but
try it.

>(There are only a few critical loops. Assume they are written in
>assembler or a good optimizing compiler.)

If you have cache misses all the time, you can afford quite a bit of instruction overhead around them.

>The basic operation is
> get a byte from address N
> AND it with a mask
> store it back
>
>This is one instruction for the usual Intel/AMD processor fortunately.
>I think that means that two processors updating the same byte
>do not interfere. (that is rare, but anyway)

AFAIK you have to use a lock prefix, or you will get a race condition
on the update. Even with LOCK, they will interfere as far as
performance goes.

The way you do it was already slow in Pentium times. At the time I
wrote a sieve that used cache blocking, and I saw a speedup (on a
Pentium-133 with 256KB L2 cache, and on an Alpha 21064A). And that's
the way to parallelize your sieve: let each processor work on a part
of the number space, and use cache blocking for the work of each
processor. Then you see very few cache misses and writing the
critical loops in assembly language pays off.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2017: http://www.euroforth.org/ef17/