Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Parallel prime counting revisited

1 view
Skip to first unread message

Albert van der Horst

unread,
Aug 27, 2010, 9:44:48 AM8/27/10
to
An update on the parallel prime counting program for the
SEAFORTH chip.

We are slowly getting more control and
the current program communicates with a PC.
It accepts an upper limit and gives as a result a count of primes,
plus a stream of numbers. Those are not sieved yet for lack of
resources and are possible prime (and could be connected to further
nodes.)
IN OUT
10 4
20 8
100 25
200 34 149 151 157 .. 193 199

The program can be sped up 6 fold by replacing the generator
by 2 generators, one for 6k+1 and one for 6k-1, later to
be merged.
They must be followed by 2 pipe lines of nodes that sieve by
5 7 11 13 17 19. Then only 51.1 % of the numbers are left,
such that the two streams can be merged.
23 29 41 } 23 29 31 37 41 43
31 37 43 }
Then it is business as usual.

Further down the line the stream is thinned out even more
and we could do several primes in a node.

So e.g. for the primes 101 103
Next multiples are 505 515, smallest is 505
Sieve all numbers up till 505.
Now the next multiples are 606 515 smallest is 515
Sieve all numbers up till 515.
Now the next multiples are 606 618 smallest is 606
Etc.
This requires 4 numbers to be stored. ( p k1.p q k2.q )
In this case: 101 505 103 515 | 101 606 103 618

(Actual programs would start at 101^2 103^2 but the idea
is the same.)

A non-trivial program exercise in the SEAFORTH world.

I start to worry about the following:
there is nowhere a buffering. If some node does a calculation
it stalls the whole pipe line. Of course it helps that fewer
and fewer numbers are streamed. Or is it?

Is it an idea to have nodes that just pass the input through?
They would act as one cell buffers.

Groetjes Albert

--
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Dennis

unread,
Aug 28, 2010, 1:52:46 PM8/28/10
to
On Aug 27, 6:44 am, Albert van der Horst <alb...@spenarnc.xs4all.nl>
wrote:

Optimizing the stalls in data streams across cores is the reason I
wrote the Timing Diagrams at http://code.google.com/p/vf-plugins/downloads/detail?name=Timing-4.zip.
Yes, you do end up with some nodes that just pass data and yes, a node
can do some data buffering. I haven't posted a good example of this,
but I am still playing with a DFT program that does this type of
thing.

DaR

Albert van der Horst

unread,
Sep 3, 2010, 12:56:07 PM9/3/10
to
In article <l7tcu...@spenarnc.xs4all.nl>,

Albert van der Horst <alb...@spenarnc.xs4all.nl> wrote:
>An update on the parallel prime counting program for the
>SEAFORTH chip.
>
>We are slowly getting more control and
>the current program communicates with a PC.
>It accepts an upper limit and gives as a result a count of primes,
>plus a stream of numbers. Those are not sieved yet for lack of
>resources and are possible prime (and could be connected to further
>nodes.)

Leon and I have reached a stable version that starts counting
at the first node where (comparing the prime of the node and
the limit) all sieving has been done.
With a 40 node SeaForth it can count primes till about 19.000
and returns the count over the serial interface to the
PC. For larger number it returns a preliminary count and a stream of
numbers, such that another chip could be appended to finish the job.

Note 1:
Very proud of translating `` BEGIN 2DUP < WHILE '' into the efficient
`` begin over not over + not -while ''

Note 2:
For the node terminal the standard rs232 code of seaforth is used (not shown).

Note 3:
Specifying all those nodes and their connections gets real cumbersome.
(Imagine a 144 node.)

--------------------------8< ----- generator.vf -------8< --------------------------
decimal

-GENERATOR- {node

\ Get a number, i.e. all decimal digits until a blank.
: getnum
@b dup BL # xor if else drop drop ; then drop
push dup 2* 2* . + 2* pop \ Multiply by 10
-48 # . + . +
getnum -;

\ Generate a stream of increasing numbers to the port identified by a.
: main
over over xor if else ; then drop
dup !a 1 # . +
main -;

: init here =p
--C01-- _connect # a!
--C_END-- _connect # b!
dup xor getnum \ Get limit
dup !a \ Pass it
dup dup xor !a \ Start with count zero
2 # main \ Initial sieve 2..limit
dup xor !a \ end sentinel
init -;

node}
--------------------------8< ----- Cs.vf --------------8< --------------------------

decimal

-? macro: --, ( x y -- x-y )
not . +
1 # . +
macro;

-? macro: -swap, ( x y -- y-x )
not . + not
macro;

-? macro: *, ( x y -- x*y )
a@ push
dup a!
dup xor
17 # for . +* unext
drop drop
a@
pop a!
macro;

-? macro: gocount, begin @a while drop 1 # . + repeat drop macro;

\ (p n*p -- p n'*p ) Accept number from stream until a zero.
\ Output this number, unless it is a multiple of p.
-? macro: sieve,
begin
@a while
\ Increment n*p until it is larger than the number
| begin over not over + not -while drop push over . + pop | repeat drop
\ Output number unless it is equal to the multiple.
over over xor if drop !b else drop drop then
repeat
macro;

-? macro: parpi,

: main here =p
@a dup !b push \ R: limit
@a \ count
@a \ prime
if
pop \ limit
over dup *, \ prime^2
dup push \ R: prime^2 , often abandoned!
-swap,
| -if \ We are over the limit: cnt p diff
drop
over 1 # . + !b
pop sieve,
| else
drop drop 1 # . + gocount, !b
then
| else
over !b
then
dup xor !b
main -;

macro;

--C01-- {node 0 org -GENERATOR- _connect =a --C02-- _connect =b parpi, node}
--C02-- {node 0 org --C01-- _connect =a --C03-- _connect =b parpi, node}
--C03-- {node 0 org --C02-- _connect =a --C04-- _connect =b parpi, node}
--C04-- {node 0 org --C03-- _connect =a --C05-- _connect =b parpi, node}
--C05-- {node 0 org --C04-- _connect =a --C06-- _connect =b parpi, node}
--C06-- {node 0 org --C05-- _connect =a --C07-- _connect =b parpi, node}
--C07-- {node 0 org --C06-- _connect =a --C08-- _connect =b parpi, node}
--C08-- {node 0 org --C07-- _connect =a --C09-- _connect =b parpi, node}
--C09-- {node 0 org --C08-- _connect =a --C10-- _connect =b parpi, node}
--C10-- {node 0 org --C09-- _connect =a --C11-- _connect =b parpi, node}
--C11-- {node 0 org --C10-- _connect =a --C12-- _connect =b parpi, node}
--C12-- {node 0 org --C11-- _connect =a --C13-- _connect =b parpi, node}
--C13-- {node 0 org --C12-- _connect =a --C14-- _connect =b parpi, node}
--C14-- {node 0 org --C13-- _connect =a --C15-- _connect =b parpi, node}
--C15-- {node 0 org --C14-- _connect =a --C16-- _connect =b parpi, node}
--C16-- {node 0 org --C15-- _connect =a --C17-- _connect =b parpi, node}
--C17-- {node 0 org --C16-- _connect =a --C18-- _connect =b parpi, node}
--C18-- {node 0 org --C17-- _connect =a --C19-- _connect =b parpi, node}
--C19-- {node 0 org --C18-- _connect =a --C20-- _connect =b parpi, node}
--C20-- {node 0 org --C19-- _connect =a --C21-- _connect =b parpi, node}
--C21-- {node 0 org --C20-- _connect =a --C22-- _connect =b parpi, node}
--C22-- {node 0 org --C21-- _connect =a --C23-- _connect =b parpi, node}
--C23-- {node 0 org --C22-- _connect =a --C24-- _connect =b parpi, node}
--C24-- {node 0 org --C23-- _connect =a --C25-- _connect =b parpi, node}
--C25-- {node 0 org --C24-- _connect =a --C26-- _connect =b parpi, node}
--C26-- {node 0 org --C25-- _connect =a --C27-- _connect =b parpi, node}
--C27-- {node 0 org --C26-- _connect =a --C28-- _connect =b parpi, node}
--C28-- {node 0 org --C27-- _connect =a --C29-- _connect =b parpi, node}
--C29-- {node 0 org --C28-- _connect =a --C30-- _connect =b parpi, node}
--C30-- {node 0 org --C29-- _connect =a --C31-- _connect =b parpi, node}
--C31-- {node 0 org --C30-- _connect =a --C32-- _connect =b parpi, node}
--C32-- {node 0 org --C31-- _connect =a --C33-- _connect =b parpi, node}
--C33-- {node 0 org --C32-- _connect =a --C34-- _connect =b parpi, node}
--C34-- {node 0 org --C33-- _connect =a --C_END-- _connect =b parpi, node}


--------------------------8< -------c_end.vf----------8< --------------------------
decimal

-? macro: connection1,
TERMINAL _connect # a!
-GENERATOR- _connect # b!
macro;

-? macro: connection2,
--C34-- _connect # b!
macro;

-? macro: recv, ( -- char)
begin rx? -until not \ wait for a keypress
macro;


-? macro: @TERMINAL @a macro;
-? macro: !TERMINAL !a macro;
-? macro: !GENERATOR !b macro;
-? macro: @--Cx-- @b macro;

--C_END-- {node

: !io ( u -) \ initialize I/O device: 0=autobps, 0<>fixed bps
@p+ !TERMINAL !TERMINAL ; | @p+ TERMINAL 's !io jump |

: tx! ( c -) \ terminal output
@p+ !TERMINAL !TERMINAL ; | @p+ not ;

: rx? ( - ~c|0) \ terminal input
@p+ !TERMINAL @TERMINAL ; | !p+ TERMINAL 's rx? jump

: /mod ( n1 n2 -- n3 )
0 # push ( n1 n2 R: n )
dup push ( n1 n2 R: n n2 )
begin ( n1-n*n2 n2 R: n n2 )
( - ) over not . + not push drop pop
-if ( n1-n*n2 R: n n2 )
pop . + pop exit ( n3 n )
else
pop pop ( n1-n*n2 n2 n )
1 # . + push ( n1-n*n2 n2 R: n+1 )
dup push ( n1-n*n2 n2 R: n+1 n2 )
then
again
;

\ Output to TERMINAL node:
\ Decimal representation of n
\ followed by crlf.
: itoa ( n -- )
100 # /mod
if itoa 0 #
else 10 # tx! 13 # tx!
then drop
10 # /mod
48 # . + tx!
48 # . + tx!
;

: passnum recv, dup !GENERATOR BL # xor if else ; then passnum -;

: init here =p
connection1,
dup xor !io \ type space at terminal for autobaud
( space ) BL # tx!
passnum
connection2,
: communicate
@--Cx-- if else init then itoa
communicate -;

node}
--------------------------8< ----- parpimain.vf -------8< --------------------------

false [if] \ Creating the layout


[else] \ The real code

\ 2
nodeConstants --C32-- --C31-- --C30-- --C29-- --C28-- --C27-- --C26-- --C25-- --C24-- --C23-- --C33-- --C34-- --C15-- --C16-- --C17-- --C18-- --C19-- --C20-- --C21-- --C22-- TERMINAL --C_END-- --C14-- --C13-- --C12-- --C11-- --C10-- --C09-- --C08-- --N19-- --N00-- -GENERATOR- --C01-- --C02-- --C03-- --C04-- --C05-- --C06-- --C07-- --N09--

[then]

[DEFINED] TERMINAL [IF] include E:\KS\Forth\vf\projects\parpi\terminal.vf [THEN]
[DEFINED] -GENERATOR- [IF] include E:\KS\Forth\vf\projects\parpi\generator.vf [THEN]
[DEFINED] --C_END-- [IF] include E:\KS\Forth\vf\projects\parpi\c_end.vf [THEN]

include E:\KS\Forth\vf\projects\parpi\Cs.vf
decimal

-? macro: connection1,
TERMINAL _connect # a!
-GENERATOR- _connect # b!
macro;

-? macro: connection2,
--C34-- _connect # b!
macro;

-? macro: recv, ( -- char)
begin rx? -until not \ wait for a keypress
macro;


-? macro: @TERMINAL @a macro;
-? macro: !TERMINAL !a macro;
-? macro: !GENERATOR !b macro;
-? macro: @--Cx-- @b macro;

--C_END-- {node

: !io ( u -) \ initialize I/O device: 0=autobps, 0<>fixed bps
@p+ !TERMINAL !TERMINAL ; | @p+ TERMINAL 's !io jump |

: tx! ( c -) \ terminal output
@p+ !TERMINAL !TERMINAL ; | @p+ not ;

: rx? ( - ~c|0) \ terminal input
@p+ !TERMINAL @TERMINAL ; | !p+ TERMINAL 's rx? jump

: /mod ( n1 n2 -- n3 )
0 # push ( n1 n2 R: n )
dup push ( n1 n2 R: n n2 )
begin ( n1-n*n2 n2 R: n n2 )
( - ) over not . + not push drop pop
-if ( n1-n*n2 R: n n2 )
pop . + pop exit ( n3 n )
else
pop pop ( n1-n*n2 n2 n )
1 # . + push ( n1-n*n2 n2 R: n+1 )
dup push ( n1-n*n2 n2 R: n+1 n2 )
then
again
;

\ Output to TERMINAL node:
\ Decimal representation of n
\ followed by crlf.
: itoa ( n -- )
100 # /mod
if itoa 0 #
else 10 # tx! 13 # tx!
then drop
10 # /mod
48 # . + tx!
48 # . + tx!
;

: passnum recv, dup !GENERATOR BL # xor if else ; then passnum -;

: init here =p
connection1,
dup xor !io \ type space at terminal for autobaud
( space ) BL # tx!
passnum
connection2,
: communicate
@--Cx-- if else init then itoa
communicate -;

node}

------------------------

0 new messages