Sep 20, 2015, 11:03:29 AM

Since more people might be interested in this, I will re-post it a second time to include more newsgroups... both threads will need to be followed if all responses are to be seen ;)

Here is your dummy decoder example:

Let's turn this into somewhat of a contest and of course also a teaching lesson... now I am a true teacher... I provided you with most of the code.

The code you will need to write yourself/replace is indicated by // ***

Good luck and may the force be with you !

Of course, as announced earlier, on 29 September 2015 I will reveal my parallel solution to all of you !

So you now have 9 days to come up with your own solution before I publish mine ! ;) =D

For those that missed the start/most of this thread, try googling for comp.arch and "Parallel decoding lesson for you" by Skybuck.

// Begin of Dummy Decoder Example

program TestProgram;

{$APPTYPE CONSOLE}

{$R *.res}

uses

System.SysUtils;

function Constrain( Para : integer ) : integer;

begin

result := Para;

if Para < 0 then Result := 0;

if Para >= 1 then Result := 1;

end;

procedure Main;

// must put variables here otherwise won't show up in debugger.

var

// information stream, input

Stream : array[0..20] of integer;

// bits representing fields of data

a1,a2,a3,a4 : integer;

b1,b2,b3 : integer;

c1 : integer;

d1,d2,d3,d4,d5,d6 : integer;

// output

RowIndex : integer;

RowCount : integer;

RowLength : array[0..5] of integer;

RowOffset : array[0..5] of integer;

DataOffset : integer;

FieldCount : integer;

FieldLength : array[0..3] of integer;

Processor : array[0..3] of integer;

// debug fields

FieldA : integer;

FieldB : integer;

FieldC : integer;

FieldD : integer;

begin

a1 := 1; a2 := 1; a3:= 1; a4 := 1;

b1 := 1; b2 := 1; b3 := 1;

c1 := 1;

d1 := 1; d2 := 1; d3 := 1; d4 := 1; d5 := 1; d6 := 1;

// compute input fields to compare it later with output fields

FieldA := (a1) or (a2 shl 1) or (a3 shl 2) or (a4 shl 3);

FieldB := (b1) or (b2 shl 1) or (b3 shl 2);

FieldC := (c1);

FieldD := (d1) or (d2 shl 1) or (d3 shl 2) or (d4 shl 3) or (d5 shl 4) or

(d6 shl 5);

// print field values

writeln( 'FieldD: ', FieldD );

writeln( 'FieldA: ', FieldA );

writeln( 'FieldB: ', FieldB );

writeln( 'FieldC: ', FieldC );

writeln;

// number of rows

Stream[0] := 6;

// row lengths

Stream[1] := 4;

Stream[2] := 3;

Stream[3] := 3;

Stream[4] := 2;

Stream[5] := 1;

Stream[6] := 1;

// sorted information stream:

// d1a1b1c1d2a2b2d3a3b3d4a4d5d6

// data bits

Stream[7] := d1;

Stream[8] := a1;

Stream[9] := b1;

Stream[10] := c1;

Stream[11] := d2;

Stream[12] := a2;

Stream[13] := b2;

Stream[14] := d3;

Stream[15] := a3;

Stream[16] := b3;

Stream[17] := d4;

Stream[18] := a4;

Stream[19] := d5;

Stream[20] := d6;

// now the decoding algorithm:

// determine number of rows

RowCount := Stream[0];

// extract row lengths

RowLength[0] := Stream[1];

RowLength[1] := Stream[2];

RowLength[2] := Stream[3];

RowLength[3] := Stream[4];

RowLength[4] := Stream[5];

RowLength[5] := Stream[6];

// determine field count

FieldCount := RowLength[0]; // row[0] indicates number of fields.

// I will help out a bit... by leaving this code in ! ;) seems somewhat obvious ;)

// first determine data offset properly ! ;) :) 1 for the row count + RowCount to skip over row lengths.

DataOffset := 1 + RowCount;

RowOffset[0] := DataOffset;

RowOffset[1] := RowOffset[0] + RowLength[0];

RowOffset[2] := RowOffset[1] + RowLength[1];

RowOffset[3] := RowOffset[2] + RowLength[2];

RowOffset[4] := RowOffset[3] + RowLength[3];

RowOffset[5] := RowOffset[4] + RowLength[4];

// somehow the data bits from the stream need to end up in these 4 processors so that it produces the same values as below:

// fields may be processed in a different order though.

// *** You will need to replace this code with your own code... and preferably it should be parallel, fast and somewhat general/scalable. ***

Processor[0] := FieldD;

Processor[1] := FieldA;

Processor[2] := FieldB;

Processor[3] := FieldC;

// print processor values.

writeln( 'Processor[0]: ', Processor[0] );

writeln( 'Processor[1]: ', Processor[1] );

writeln( 'Processor[2]: ', Processor[2] );

writeln( 'Processor[3]: ', Processor[3] );

writeln;

end;

begin

try

Main;

except

on E: Exception do

Writeln(E.ClassName, ': ', E.Message);

end;

ReadLn;

end.

// End of Dummy Decoder Example

Oh, one last thing: when you think you have a valid solution and/or need to test more, replace the initialization of the test data bits with this initialization, for example:

a1 := 1; a2 := 0; a3 := 0; a4 := 1;

b1 := 1; b2 := 1; b3 := 0;

c1 := 1;

d1 := 1; d2 := 1; d3 := 0; d4 := 0; d5 := 1; d6 := 0;

In the original posting (the one before this) they were all set to 1. That helped a bit during development, just to see if it was reading anything at all at the somewhat correct places.

But all-ones data may mislead you later, because it cannot reveal bits that are read from the wrong position within a field.

So the test data in this posting is better for spotting any inconsistencies/bugs in your solution(s).
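To check a solution against this second test set it helps to know the expected field values up front. The following is a minimal sketch in C rather than Delphi; `pack_bits` is a hypothetical helper (not part of the example program) that combines bits least significant first, the same way the example builds FieldA..FieldD with shl/or.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: pack `count` bits, least significant bit first,
   mirroring (x1) or (x2 shl 1) or (x3 shl 2) ... in the example. */
static int pack_bits(const int *bits, int count)
{
    int value = 0;
    for (int i = 0; i < count; i++)
        value |= (bits[i] & 1) << i;
    return value;
}

#ifdef DEMO_MAIN
int main(void)
{
    /* Second test set from this posting. */
    const int a[] = {1, 0, 0, 1};
    const int b[] = {1, 1, 0};
    const int c[] = {1};
    const int d[] = {1, 1, 0, 0, 1, 0};

    assert(pack_bits(a, 4) == 9);
    printf("FieldD: %d\n", pack_bits(d, 6));
    printf("FieldA: %d\n", pack_bits(a, 4));
    printf("FieldB: %d\n", pack_bits(b, 3));
    printf("FieldC: %d\n", pack_bits(c, 1));
    return 0;
}
#endif
```

With the second test set this gives FieldD = 19, FieldA = 9, FieldB = 3 and FieldC = 1, so a working decoder should print those values for Processor[0] to Processor[3]. Compile with -DDEMO_MAIN to run the sketch on its own.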

Bye,

Skybuck.

Sep 20, 2015, 1:08:39 PM

Or point your newsgroup reader to newsgroup: comp.arch

Then see thread/topic: "Parallel decoding lesson for you" by Skybuck.

Bye,

Skybuck.

Sep 21, 2015, 2:55:22 AM

Just to be clear on this, the code you have to write doesn't need to be truly parallel.

It must be parallel in potential, so the parts should be able to execute independently from each other and out of order.
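One way to picture "parallel in potential" is a per-processor decode that only reads the stream and writes its own result, so the calls can run in any order. The sketch below is an illustration in C under that assumption, not the announced solution; `decode_one` and `decode_in_order` are hypothetical names, the stream holds the second test set, and a plain `if` is used for brevity.

```c
#include <assert.h>
#include <stdio.h>

enum { ROW_COUNT = 6, FIELD_COUNT = 4 };

/* Example stream with the second test set:
   [row count][row lengths][d1 a1 b1 c1 d2 a2 b2 d3 a3 b3 d4 a4 d5 d6] */
static const int stream[21] = {
    6, 4, 3, 3, 2, 1, 1,
    1, 1, 1, 1,   /* d1 a1 b1 c1 */
    1, 0, 1,      /* d2 a2 b2    */
    0, 0, 0,      /* d3 a3 b3    */
    0, 1,         /* d4 a4       */
    1,            /* d5          */
    0             /* d6          */
};

/* Decode the field owned by processor p. Reads only, no shared writes. */
static int decode_one(int p)
{
    int value = 0;
    int offset = 1 + stream[0];          /* skip row count and row lengths */
    for (int r = 0; r < ROW_COUNT; r++) {
        int len = stream[1 + r];
        if (p < len)                     /* row r still has a bit for p */
            value |= stream[offset + p] << r;
        offset += len;
    }
    return value;
}

/* Run the four decodes in whatever order the caller asks for. */
static void decode_in_order(const int order[FIELD_COUNT], int out[FIELD_COUNT])
{
    for (int i = 0; i < FIELD_COUNT; i++)
        out[order[i]] = decode_one(order[i]);
}

#ifdef DEMO_MAIN
int main(void)
{
    const int shuffled[FIELD_COUNT] = {3, 1, 0, 2};
    int out[FIELD_COUNT];
    decode_in_order(shuffled, out);
    assert(out[0] == 19);
    for (int p = 0; p < FIELD_COUNT; p++)
        printf("Processor[%d]: %d\n", p, out[p]);
    return 0;
}
#endif
```

Because no call depends on another call's result, the order {3, 1, 0, 2} gives the same values as {0, 1, 2, 3}, which is the property asked for here.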

Bye,

Skybuck.

Sep 23, 2015, 5:26:07 AM

The example may be modified as much as needed.

For now my solution needs a little reading pad to avoid costly mods or

branches or whatever.

I think this is a nice speedy solution, so code may be modified as follows:

const

MaxProcessorCount = 4;

var

// information stream, input

// add max processor count to create a safe "padding" for reading so no out of bounds/range check errors with arrays.

Stream : array[0..20+(MaxProcessorCount-1)] of integer;
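The size of the padding can be sanity-checked with a little arithmetic. The sketch below assumes a branch-free reader in which each of the MaxProcessorCount processors reads one slot per row, whether or not that row holds a bit for it; `max_read_index` is a hypothetical helper, not part of the example.

```c
#include <assert.h>
#include <stdio.h>

/* Highest stream index touched when every one of `processor_count`
   processors reads one slot per row, needed or not. */
static int max_read_index(const int *row_length, int row_count,
                          int processor_count)
{
    int offset = 1 + row_count;      /* skip row count and row lengths */
    int max_index = 0;
    for (int r = 0; r < row_count; r++) {
        int last = offset + (processor_count - 1);
        if (last > max_index)
            max_index = last;
        offset += row_length[r];
    }
    return max_index;
}

#ifdef DEMO_MAIN
int main(void)
{
    const int row_length[6] = {4, 3, 3, 2, 1, 1};
    int highest = max_read_index(row_length, 6, 4);
    assert(highest == 23);
    printf("highest index read: %d\n", highest);
    return 0;
}
#endif
```

For the example stream the last row starts at index 20, so processor 3 reads index 23, which is exactly the last element of array[0..20+(MaxProcessorCount-1)]. Zero-filling the padding elements is a cautious extra step, since such a reader would touch them even though their values are not meant to contribute.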

Bye,

Skybuck.

Sep 23, 2015, 5:39:38 AM

Here is the C version of the example in case your Delphi-to-C skills are not so great or you are lazy lol =D:

// ParallelDecodingCVersion.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

// Begin of Dummy Decoder Example

const int MaxProcessorCount = 4;

int _tmain(int argc, _TCHAR* argv[])
{
    // information stream, input
    // add max processor count to create a safe "padding" for reading so no out of bounds/range check errors with arrays.
    int Stream[21+(MaxProcessorCount-1)];

    // bits representing fields of data
    int a1,a2,a3,a4;
    int b1,b2,b3;
    int c1;
    int d1,d2,d3,d4,d5,d6;

    // output
    int RowIndex;
    int RowCount;
    int RowLength[6];
    int RowOffset[6];
    int DataOffset;
    int FieldCount;
    int FieldLength;
    int Processor[4];

    // debug fields
    int FieldA;
    int FieldB;
    int FieldC;
    int FieldD;

    a1 = 1; a2 = 1; a3 = 1; a4 = 1;
    b1 = 1; b2 = 1; b3 = 1;
    c1 = 1;
    d1 = 1; d2 = 1; d3 = 1; d4 = 1; d5 = 1; d6 = 1;

    // compute input fields to compare it later with output fields
    FieldA = (a1) | (a2 << 1) | (a3 << 2) | (a4 << 3);
    FieldB = (b1) | (b2 << 1) | (b3 << 2);
    FieldC = (c1);
    FieldD = (d1) | (d2 << 1) | (d3 << 2) | (d4 << 3) | (d5 << 4) | (d6 << 5);

    // print field values
    printf( "FieldD: %d \n", FieldD );
    printf( "FieldA: %d \n", FieldA );
    printf( "FieldB: %d \n", FieldB );
    printf( "FieldC: %d \n\n", FieldC );

    // number of rows
    Stream[0] = 6;

    // row lengths
    Stream[1] = 4;
    Stream[2] = 3;
    Stream[3] = 3;
    Stream[4] = 2;
    Stream[5] = 1;
    Stream[6] = 1;

    // sorted information stream:
    // d1a1b1c1d2a2b2d3a3b3d4a4d5d6

    // data bits
    Stream[7] = d1;
    Stream[8] = a1;
    Stream[9] = b1;
    Stream[10] = c1;
    Stream[11] = d2;
    Stream[12] = a2;
    Stream[13] = b2;
    Stream[14] = d3;
    Stream[15] = a3;
    Stream[16] = b3;
    Stream[17] = d4;
    Stream[18] = a4;
    Stream[19] = d5;
    Stream[20] = d6;

    // now the decoding algorithm:

    // determine number of rows
    RowCount = Stream[0];

    // extract row lengths
    RowLength[0] = Stream[1];
    RowLength[1] = Stream[2];
    RowLength[2] = Stream[3];
    RowLength[3] = Stream[4];
    RowLength[4] = Stream[5];
    RowLength[5] = Stream[6];

    // determine field count
    FieldCount = RowLength[0]; // row[0] indicates number of fields.

    // first determine data offset properly ! ;) :) 1 for the row count + RowCount to skip over row lengths.
    DataOffset = 1 + RowCount;

    RowOffset[0] = DataOffset;
    RowOffset[1] = RowOffset[0] + RowLength[0];
    RowOffset[2] = RowOffset[1] + RowLength[1];
    RowOffset[3] = RowOffset[2] + RowLength[2];
    RowOffset[4] = RowOffset[3] + RowLength[3];
    RowOffset[5] = RowOffset[4] + RowLength[4];

    // somehow the data bits from the stream need to end up in these 4
    // processors so that it produces the same values as below:

    // fields may be processed in a different order though.

    // *** You will need to replace this code with your own code... and
    // preferably it should be parallel, fast and somewhat general/scalable. ***

    Processor[0] = FieldD;
    Processor[1] = FieldA;
    Processor[2] = FieldB;
    Processor[3] = FieldC;

    // print processor values.
    printf( "Processor[0]: %d \n", Processor[0] );
    printf( "Processor[1]: %d \n", Processor[1] );
    printf( "Processor[2]: %d \n", Processor[2] );
    printf( "Processor[3]: %d \n\n", Processor[3] );

    return 0;
}

// End of Dummy Decoder Example

Bye,

Skybuck :)

Sep 23, 2015, 5:43:40 AM

Also here is test set 2 to test input values:

// input test 2, c version

a1 = 1; a2 = 0; a3 = 0; a4 = 1;

b1 = 1; b2 = 1; b3 = 0;

c1 = 1;

d1 = 1; d2 = 1; d3 = 0; d4 = 0; d5 = 1; d6 = 0;

Bye,

Skybuck.

Sep 29, 2015, 3:09:07 AM

Hello,

(It is now 29 September 2015)

As promised here is Skybuck's Parallel Universal Code demonstration program.

This posting contains a Delphi and C/C++ version for you to learn from.

(I was kinda thinking of adding some kind of case statement/array and out of order execution/randomization to prove that the lines can be executed in any order, though I didn't get around to that (yet), been playing World of Warships =D at 1 fps to 30 fps lol)

(If anybody doubts that these lines can be executed out of order I may eventually add such a feature as well... maybe I will do it even for the fun of it ! ;))

(I was also maybe thinking of trying to throw it all into a nice TProcessor class with an actual thread to show it off as well... didn't get around to that yet either ;):))

(However if you closely examine the indexes used you will discover that there is no overlap, all indexes are uniquely read and such... no write-read-write-read-sequential stuff going on... it can all be read in parallel... except for building the RowOffset array).
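The no-overlap claim can be checked mechanically. The sketch below, in C, counts how often each data slot is used when processor p takes slot RowOffset[r] + p of every row r that still has a bit for it; `reads_are_disjoint` is a hypothetical helper written for this check and the 64-slot limit is an arbitrary assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if every data slot is used by exactly one (processor, row)
   pair, 0 otherwise. Handles streams of up to 64 slots. */
static int reads_are_disjoint(const int *row_length, int row_count,
                              int processor_count)
{
    int hits[64] = {0};
    int offset = 1 + row_count;      /* data starts after the header */
    int first = offset;

    for (int r = 0; r < row_count; r++) {
        if (offset + row_length[r] > 64 || offset + processor_count > 64)
            return 0;                /* outside what this sketch handles */
        for (int p = 0; p < processor_count; p++)
            if (p < row_length[r])   /* row r has a bit for processor p */
                hits[offset + p]++;
        offset += row_length[r];
    }
    for (int i = first; i < offset; i++)
        if (hits[i] != 1)
            return 0;
    return 1;
}

#ifdef DEMO_MAIN
int main(void)
{
    const int row_length[6] = {4, 3, 3, 2, 1, 1};
    int ok = reads_are_disjoint(row_length, 6, 4);
    assert(ok == 1);
    printf("disjoint: %d\n", ok);
    return 0;
}
#endif
```

For the row lengths 4, 3, 3, 2, 1, 1 and four processors every data slot from index 7 to 20 is hit exactly once. A row longer than the processor count would leave slots unread, and the check reports that as 0.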

// *** Begin of Delphi Program ***

program TestProgram;

{$APPTYPE CONSOLE}

{$R *.res}

uses

System.SysUtils;

{

Skybuck's Parallel Universal Code, Demonstration program

version 0.01 created on 19 september 2015

This demonstration program will illustrate how to decode fields of data in

parallel.

For this demonstration program bit fiddling will be avoided by allowing 1

bit per array index to illustrate the general idea.

Also lengths may be stored by 1 integer per array index. In a real world scenario it would be Skybuck Universally Encoded.

Skybuck's Parallel Universal Code, Design Document:

version 1 created on 16 september 2015 (after seeing another mentioning of IBM's processor MIL which makes me sick, as if decoding fields in parallel is hard ! LOL ;) :))

(I also wrote a little introductory question for it, see other file for it, and I even learned something from it: Parallel decoding lesson for you.txt)

All bits of the fields are split up in such a way that the first bit of each

field is right next to each other, the second bit of each field is also next

to each other, and so forth.

Conceptual view:

First, the current situation:

Field A consists out of a1a2a3a4 (4 bits)

Field B consists out of b1b2b3 (3 bits)

Field C consists out of c1 (1 bit)

Field D consists out of d1d2d3d4d5d6 ( 6 bits)

These bits are stored as follows:

"First row":

a1b1c1d1

"Second row":

a2b2d2

"Third row":

a3b3d3

"Fourth row":

a4d4

"Fifth row":

d5

"Sixth row":

d6

These rows will be stored sequentially as follows:

a1b1c1d1a2b2d2a3b3d3a4d4d5d6

Now the question is: How does a processor know where each row begins ? and

how many bits of each field there is, the answers are given below:

The bit length of each row is stored preemptively/prefixed:

"First row": 4

"Second row": 3

"Third row": 3

"Fourth row": 2

"Fifth row": 1

"Sixth row": 1

Now for each row their offset can be computed:

First row starts at 0

Second row starts at 0 + 4 = 4

Third row starts at 0 + 4 + 3 = 7

Fourth row starts at 0 + 4 + 3 + 3 = 10

Fifth row starts at 0 + 4 + 3 + 3 + 2 = 12

Sixth row starts at 0 + 4 + 3 + 3 + 2 + 1 = 13

Let's check if this is true:

0  1  2  3  4  5  6  7  8  9  10 11 12 13

a1 b1 c1 d1 a2 b2 d2 a3 b3 d3 a4 d4 d5 d6

Bingo, all match.

The processor can compute the offsets of each row.

And thus the processor can read the bits of a field in parallel, for example field A as follows:

Read field A bit 0 at offset 0

Read field A bit 1 at offset 4

Read field A bit 2 at offset 7

Read field A bit 3 at offset 10

Now the question is how can the processor know when to stop reading ?

A marker/meta bit could be used like described in Skybuck's Universal Code

version 1, which indicates if the field continues or stops.

These bits can be stored in the same way as the data bits above.

And thus for each data bit, a meta bit can be read as well.

This way the processor knows that C2 does not exist and can stop reading

field C

For huffman codes that would not even be required, since the processor can see in the huffman tree when it reaches a leaf/end node and then it will know the end of C was reached.

So marker/meta bits can be avoided by using Huffman codes ! Pretty neato ! ;) =D

However huffman has a drawback that some fields might get large, and may not be suited for rare information and modifications and so forth ! ;)

The last thing to do to make this usable is to include another prefix field which indicates how many prefixes there are.

So final stream will look something like:

[Number of Fields][Set of Field Lengths][Set of Field Data Bits

Intermixed/Parallel]

First field can be universally coded.

Second set of fields can be universally coded.

Third set of field contains data bits intermixed/in parallel.

Additional:

The problem can be solved by sorting the fields from largest to smallest

field, so new stream looks like:

(sorting from smallest to largest would be possible too... but code below

assumes first field is largest so it uses

largest to smallest sorting solution which is also applied to the stream A,

so stream A below is sorted that way)

Stream A: 433211d1a1b1c1d2a2b2d3a3b3d4a4d5d6

Bye,

Skybuck.

}

function Constrain( Para : integer ) : integer;

begin

result := Para;

if Para < 0 then Result := 0;

if Para >= 1 then Result := 1;

end;

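procedure Main;

// must put variables here otherwise won't show up in debugger.

const

MaxProcessorCount = 4;

var

// information stream, input
// add max processor count to create a safe "padding" for reading so no out of bounds/range check errors with arrays.
Stream : array[0..20+(MaxProcessorCount-1)] of integer;

// bits representing fields of data
a1,a2,a3,a4 : integer;
b1,b2,b3 : integer;
c1 : integer;
d1,d2,d3,d4,d5,d6 : integer;

// output
RowIndex : integer;
RowCount : integer;
RowLength : array[0..5] of integer;
RowOffset : array[0..5] of integer;
DataOffset : integer;
FieldCount : integer;
FieldLength : array[0..3] of integer;
FieldRowMultiplier : array[0..3,0..5] of integer;
Processor : array[0..3] of integer;

// debug fields
FieldA : integer;
FieldB : integer;
FieldC : integer;
FieldD : integer;

begin

// input test 2
a1 := 1; a2 := 0; a3 := 0; a4 := 1;
b1 := 1; b2 := 1; b3 := 0;
c1 := 1;
d1 := 1; d2 := 1; d3 := 0; d4 := 0; d5 := 1; d6 := 0;

// compute input fields to compare it later with output fields
FieldA := (a1) or (a2 shl 1) or (a3 shl 2) or (a4 shl 3);
FieldB := (b1) or (b2 shl 1) or (b3 shl 2);
FieldC := (c1);
FieldD := (d1) or (d2 shl 1) or (d3 shl 2) or (d4 shl 3) or (d5 shl 4) or (d6 shl 5);

// print field values
writeln( 'FieldD: ', FieldD );
writeln( 'FieldA: ', FieldA );
writeln( 'FieldB: ', FieldB );
writeln( 'FieldC: ', FieldC );
writeln;

// number of rows
Stream[0] := 6;

// row lengths
Stream[1] := 4; Stream[2] := 3; Stream[3] := 3;
Stream[4] := 2; Stream[5] := 1; Stream[6] := 1;

// sorted information stream:
// d1a1b1c1d2a2b2d3a3b3d4a4d5d6

// data bits
Stream[7] := d1; Stream[8] := a1; Stream[9] := b1; Stream[10] := c1;
Stream[11] := d2; Stream[12] := a2; Stream[13] := b2;
Stream[14] := d3; Stream[15] := a3; Stream[16] := b3;
Stream[17] := d4; Stream[18] := a4;
Stream[19] := d5;
Stream[20] := d6;

// now the decoding algorithm:

// determine number of rows
RowCount := Stream[0];

// extract row lengths
RowLength[0] := Stream[1];
RowLength[1] := Stream[2];
RowLength[2] := Stream[3];
RowLength[3] := Stream[4];
RowLength[4] := Stream[5];
RowLength[5] := Stream[6];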

// let's assume first field is largest so it always consumes all row information.

// should be 0 if negative, should be zero if zero, should be 1 if positive, so Constrain will do the trick to constrain it.

// the -0, -1, -2, -3 represents subtracting the processor number/identity from it... to allow parallel processing ! ;)

// constraint is wrong... haha.

FieldRowMultiplier[0,0] := Constrain(RowLength[0]-0); // should be set to one if larger.

FieldRowMultiplier[0,1] := Constrain(RowLength[1]-0);

FieldRowMultiplier[0,2] := Constrain(RowLength[2]-0);

FieldRowMultiplier[0,3] := Constrain(RowLength[3]-0);

FieldRowMultiplier[0,4] := Constrain(RowLength[4]-0);

FieldRowMultiplier[0,5] := Constrain(RowLength[5]-0);

// now second field may consume less if there are not enough bits.

FieldRowMultiplier[1,0] := Constrain(RowLength[0]-1);

FieldRowMultiplier[1,1] := Constrain(RowLength[1]-1);

FieldRowMultiplier[1,2] := Constrain(RowLength[2]-1);

FieldRowMultiplier[1,3] := Constrain(RowLength[3]-1);

FieldRowMultiplier[1,4] := Constrain(RowLength[4]-1);

FieldRowMultiplier[1,5] := Constrain(RowLength[5]-1);

// now third field may consume less if there are not enough bits.

FieldRowMultiplier[2,0] := Constrain(RowLength[0]-2);

FieldRowMultiplier[2,1] := Constrain(RowLength[1]-2);

FieldRowMultiplier[2,2] := Constrain(RowLength[2]-2);

FieldRowMultiplier[2,3] := Constrain(RowLength[3]-2);

FieldRowMultiplier[2,4] := Constrain(RowLength[4]-2);

FieldRowMultiplier[2,5] := Constrain(RowLength[5]-2);

// now fourth field may consume less if there are not enough bits.

FieldRowMultiplier[3,0] := Constrain(RowLength[0]-3);

FieldRowMultiplier[3,1] := Constrain(RowLength[1]-3);

FieldRowMultiplier[3,2] := Constrain(RowLength[2]-3);

FieldRowMultiplier[3,3] := Constrain(RowLength[3]-3);

FieldRowMultiplier[3,4] := Constrain(RowLength[4]-3);

FieldRowMultiplier[3,5] := Constrain(RowLength[5]-3);

// now compute field lengths

// not necessary to multiply anything just add them up ! ;) =D

FieldLength[0] :=

FieldRowMultiplier[0,0] +

FieldRowMultiplier[0,1] +

FieldRowMultiplier[0,2] +

FieldRowMultiplier[0,3] +

FieldRowMultiplier[0,4] +

FieldRowMultiplier[0,5];

FieldLength[1] :=

FieldRowMultiplier[1,0] +

FieldRowMultiplier[1,1] +

FieldRowMultiplier[1,2] +

FieldRowMultiplier[1,3] +

FieldRowMultiplier[1,4] +

FieldRowMultiplier[1,5];

FieldLength[2] :=

FieldRowMultiplier[2,0] +

FieldRowMultiplier[2,1] +

FieldRowMultiplier[2,2] +

FieldRowMultiplier[2,3] +

FieldRowMultiplier[2,4] +

FieldRowMultiplier[2,5];

FieldLength[3] :=

FieldRowMultiplier[3,0] +

FieldRowMultiplier[3,1] +

FieldRowMultiplier[3,2] +

FieldRowMultiplier[3,3] +

FieldRowMultiplier[3,4] +

FieldRowMultiplier[3,5];

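// though the field multipliers could come in handy later to read the bits ! nice ! ;) =D

// for each row the offset must be calculated, this can be done serially or by all processors for themselves at the same time:

// row zero starts after the number of rows which is indicated by the first stream value =D

// first determine data offset properly ! ;) :) 1 for the row count + RowCount to skip over row lengths.

DataOffset := 1 + RowCount;

RowOffset[0] := DataOffset;
RowOffset[1] := RowOffset[0] + RowLength[0];
RowOffset[2] := RowOffset[1] + RowLength[1];
RowOffset[3] := RowOffset[2] + RowLength[2];
RowOffset[4] := RowOffset[3] + RowLength[3];
RowOffset[5] := RowOffset[4] + RowLength[4];

// first determine number of fields

FieldCount := RowLength[0]; // row[0] indicates number of fields.

// now that all row offsets are calculated it's possible to decode the stream into each processor, by each processor.

// each processor knows its own number/identity represented by the +0 +1 +2 +3 down below:

// the general idea is:

// row[] here is RowOffset[]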

{

Processor[0] :=

Stream[Row[0]+0]+Stream[Row[1]+0]+Stream[Row[2]+0]+Stream[Row[3]+0]+Stream[Row[4]+0]+Stream[Row[5]+0];

Processor[1] :=

Stream[Row[0]+1]+Stream[Row[1]+1]+Stream[Row[2]+1]+Stream[Row[3]+1]+Stream[Row[4]+1]+Stream[Row[5]+1];

Processor[2] :=

Stream[Row[0]+2]+Stream[Row[1]+2]+Stream[Row[2]+2]+Stream[Row[3]+2]+Stream[Row[4]+2]+Stream[Row[5]+2];

Processor[3] :=

Stream[Row[0]+3]+Stream[Row[1]+3]+Stream[Row[2]+3]+Stream[Row[3]+3]+Stream[Row[4]+3]+Stream[Row[5]+3];

}

// however a processor should only include the bits of a stream if it's within the field's length, for that we need
// to compute each field length and here we have an oops ! ;) :) cannot compute field length if multiple fields
// per row. or can we ?! ;) :) perhaps we can... we know that fields have at least 1 bit because of row 0
// and we know which fields have 2 bits because of row 1 and so forth... and since all fields
// are distributed in parallel... we don't actually need to know where their bits are.... cause they are all packed lol...
// we only need to know what max length is or something... though how can we then be sure that a field ends up
// where it needs to be... well we don't... a field can end up in any processor.

// so how should it actually look like then... well as follows:

// since the fields are stored as follows: A,B,C,D we know A ends up in processor 0, B in 1, C in 2, D in 3 and so forth.
// thus... processor 0 can determine length of A by looking at row[0], row[1], row[2], row[3], row[4], row[5].
// but how to know which count matches to whose field ?
// is the length of row 5 for A or B or C or D ? we don't know do we ?! ;)

// now we should be able to solve the problem... we know the bits belong to the first few fields... cool ! ;) =D
// now we should be able to solve it easily... by only including a bit from the stream if the multiplier is set to 1 ;) :)
// and now only thing left to do is shifting the bits into proper position and or-ing them together ! ;)

Processor[0] :=

((Stream[RowOffset[0]+0]*FieldRowMultiplier[0,0]) shl 0) or

((Stream[RowOffset[1]+0]*FieldRowMultiplier[0,1]) shl 1) or

((Stream[RowOffset[2]+0]*FieldRowMultiplier[0,2]) shl 2) or

((Stream[RowOffset[3]+0]*FieldRowMultiplier[0,3]) shl 3) or

((Stream[RowOffset[4]+0]*FieldRowMultiplier[0,4]) shl 4) or

((Stream[RowOffset[5]+0]*FieldRowMultiplier[0,5]) shl 5);

Processor[1] :=

((Stream[RowOffset[0]+1]*FieldRowMultiplier[1,0]) shl 0) or

((Stream[RowOffset[1]+1]*FieldRowMultiplier[1,1]) shl 1) or

((Stream[RowOffset[2]+1]*FieldRowMultiplier[1,2]) shl 2) or

((Stream[RowOffset[3]+1]*FieldRowMultiplier[1,3]) shl 3) or

((Stream[RowOffset[4]+1]*FieldRowMultiplier[1,4]) shl 4) or

((Stream[RowOffset[5]+1]*FieldRowMultiplier[1,5]) shl 5);

Processor[2] :=

((Stream[RowOffset[0]+2]*FieldRowMultiplier[2,0]) shl 0) or

((Stream[RowOffset[1]+2]*FieldRowMultiplier[2,1]) shl 1) or

((Stream[RowOffset[2]+2]*FieldRowMultiplier[2,2]) shl 2) or

((Stream[RowOffset[3]+2]*FieldRowMultiplier[2,3]) shl 3) or

((Stream[RowOffset[4]+2]*FieldRowMultiplier[2,4]) shl 4) or

((Stream[RowOffset[5]+2]*FieldRowMultiplier[2,5]) shl 5);

Processor[3] :=

((Stream[RowOffset[0]+3]*FieldRowMultiplier[3,0]) shl 0) or

((Stream[RowOffset[1]+3]*FieldRowMultiplier[3,1]) shl 1) or

((Stream[RowOffset[2]+3]*FieldRowMultiplier[3,2]) shl 2) or

((Stream[RowOffset[3]+3]*FieldRowMultiplier[3,3]) shl 3) or

((Stream[RowOffset[4]+3]*FieldRowMultiplier[3,4]) shl 4) or

((Stream[RowOffset[5]+3]*FieldRowMultiplier[3,5]) shl 5);

// *** STILL TO DO (solved by extending array): ********************************
// *** ^^^ may have to look into potential out of range problem ^^^ ****
// *********************************************************************

// for now it seems ok, also as long as stream array has +NumberOfProcessors scratch pad at end it may be ok ! ;) :)
// one last problem might remain... the row offset may go out of range...
// we could either use a scratch pad or solve this in another way.
// I think it's best to leave it as is and perhaps make the stream a bit larger or something, let's see what happens.

// print processor values.

writeln( 'Processor[0]: ', Processor[0] );

writeln( 'Processor[1]: ', Processor[1] );

writeln( 'Processor[2]: ', Processor[2] );

writeln( 'Processor[3]: ', Processor[3] );

writeln;

end;

begin

try

Main;

except

on E: Exception do

Writeln(E.ClassName, ': ', E.Message);

end;

ReadLn;

end.

// *** End of Delphi Program ***
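The two bookkeeping steps in the program above, the row offsets as a running sum of row lengths and the field lengths as a sum of 0/1 multipliers, can be written as loops. This is a compact sketch in C under the same assumptions as the demonstration (fields sorted from largest to smallest, one integer per stream slot); `constrain01`, `row_offsets` and `field_length` are hypothetical names, not the author's.

```c
#include <assert.h>
#include <stdio.h>

/* Same result as Constrain: 0 for values <= 0, 1 for values >= 1. */
static int constrain01(int x)
{
    return x > 0 ? 1 : 0;
}

/* Row r starts where row r-1 ended; row 0 starts at data_offset. */
static void row_offsets(const int *row_length, int row_count,
                        int data_offset, int *offset_out)
{
    int offset = data_offset;
    for (int r = 0; r < row_count; r++) {
        offset_out[r] = offset;
        offset += row_length[r];
    }
}

/* Processor p owns one bit in every row that is longer than p. */
static int field_length(const int *row_length, int row_count, int p)
{
    int length = 0;
    for (int r = 0; r < row_count; r++)
        length += constrain01(row_length[r] - p);
    return length;
}

#ifdef DEMO_MAIN
int main(void)
{
    const int row_length[6] = {4, 3, 3, 2, 1, 1};
    int offsets[6];
    row_offsets(row_length, 6, 1 + 6, offsets);
    assert(offsets[5] == 20);
    for (int p = 0; p < 4; p++)
        printf("FieldLength[%d]: %d\n", p, field_length(row_length, 6, p));
    return 0;
}
#endif
```

With data offset 0 the offsets come out as 0, 4, 7, 10, 12, 13, matching the design document, and with the 7-slot header they become 7, 11, 14, 17, 19, 20. The field lengths are 6, 4, 3 and 1 for processors 0 to 3, which fits the sorted order D, A, B, C.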

// *** Begin of C/C++ Program ***

// ParallelDecodingCVersion.cpp : Defines the entry point for the console application.

//

// Full C version 0.01 created on 23 september 2015 by Skybuck Flying =D

#include "stdafx.h"

// Begin of Dummy Decoder Example

const int MaxProcessorCount = 4;

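int Constrain( int Para )
{
    if (Para < 0) Para = 0;
    if (Para >= 1) Para = 1;
    return Para;
}

int _tmain(int argc, _TCHAR* argv[])
{
    // information stream, input
    // add max processor count to create a safe "padding" for reading so no out of bounds/range check errors with arrays.
    int Stream[21+(MaxProcessorCount-1)];

    // bits representing fields of data
    int a1,a2,a3,a4;
    int b1,b2,b3;
    int c1;
    int d1,d2,d3,d4,d5,d6;

    // output
    int RowIndex;
    int RowCount;
    int RowLength[6];
    int RowOffset[6];
    int DataOffset;
    int FieldCount;
    int FieldLength[4];
    int FieldRowMultiplier[4][6];
    int Processor[4];

    // debug fields
    int FieldA;
    int FieldB;
    int FieldC;
    int FieldD;

    // input test 1
    /*
    a1 = 1; a2 = 1; a3 = 1; a4 = 1;
    b1 = 1; b2 = 1; b3 = 1;
    c1 = 1;
    d1 = 1; d2 = 1; d3 = 1; d4 = 1; d5 = 1; d6 = 1;
    */

    // input test 2
    a1 = 1; a2 = 0; a3 = 0; a4 = 1;
    b1 = 1; b2 = 1; b3 = 0;
    c1 = 1;
    d1 = 1; d2 = 1; d3 = 0; d4 = 0; d5 = 1; d6 = 0;

    // compute input fields to compare it later with output fields
    FieldA = (a1) | (a2 << 1) | (a3 << 2) | (a4 << 3);
    FieldB = (b1) | (b2 << 1) | (b3 << 2);
    FieldC = (c1);
    FieldD = (d1) | (d2 << 1) | (d3 << 2) | (d4 << 3) | (d5 << 4) | (d6 << 5);

    // print field values
    printf( "FieldD: %d \n", FieldD );
    printf( "FieldA: %d \n", FieldA );
    printf( "FieldB: %d \n", FieldB );
    printf( "FieldC: %d \n\n", FieldC );

    // number of rows
    Stream[0] = 6;

    // row lengths
    Stream[1] = 4; Stream[2] = 3; Stream[3] = 3;
    Stream[4] = 2; Stream[5] = 1; Stream[6] = 1;

    // sorted information stream:
    // d1a1b1c1d2a2b2d3a3b3d4a4d5d6

    // data bits
    Stream[7] = d1; Stream[8] = a1; Stream[9] = b1; Stream[10] = c1;
    Stream[11] = d2; Stream[12] = a2; Stream[13] = b2;
    Stream[14] = d3; Stream[15] = a3; Stream[16] = b3;
    Stream[17] = d4; Stream[18] = a4;
    Stream[19] = d5;
    Stream[20] = d6;

    // now the decoding algorithm:

    // determine number of rows
    RowCount = Stream[0];

    // extract row lengths
    RowLength[0] = Stream[1];
    RowLength[1] = Stream[2];
    RowLength[2] = Stream[3];
    RowLength[3] = Stream[4];
    RowLength[4] = Stream[5];
    RowLength[5] = Stream[6];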

// let's assume first field is largest so it always consumes all row information.

// should be 0 if negative, should be zero if zero, should be 1 if positive, so Constrain will do the trick to constrain it.

// the -0, -1, -2, -3 represents subtracting the processor number/identity from it... to allow parallel processing ! ;)

// constraint is wrong... haha.

FieldRowMultiplier[0][0] = Constrain(RowLength[0]-0); // should be set to one if larger.

FieldRowMultiplier[0][1] = Constrain(RowLength[1]-0);

FieldRowMultiplier[0][2] = Constrain(RowLength[2]-0);

FieldRowMultiplier[0][3] = Constrain(RowLength[3]-0);

FieldRowMultiplier[0][4] = Constrain(RowLength[4]-0);

FieldRowMultiplier[0][5] = Constrain(RowLength[5]-0);

// now second field may consume less if there are not enough bits.

FieldRowMultiplier[1][0] = Constrain(RowLength[0]-1);

FieldRowMultiplier[1][1] = Constrain(RowLength[1]-1);

FieldRowMultiplier[1][2] = Constrain(RowLength[2]-1);

FieldRowMultiplier[1][3] = Constrain(RowLength[3]-1);

FieldRowMultiplier[1][4] = Constrain(RowLength[4]-1);

FieldRowMultiplier[1][5] = Constrain(RowLength[5]-1);

// now third field may consume less if there are not enough bits.

FieldRowMultiplier[2][0] = Constrain(RowLength[0]-2);

FieldRowMultiplier[2][1] = Constrain(RowLength[1]-2);

FieldRowMultiplier[2][2] = Constrain(RowLength[2]-2);

FieldRowMultiplier[2][3] = Constrain(RowLength[3]-2);

FieldRowMultiplier[2][4] = Constrain(RowLength[4]-2);

FieldRowMultiplier[2][5] = Constrain(RowLength[5]-2);

// now fourth field may consume less if there are not enough bits.

FieldRowMultiplier[3][0] = Constrain(RowLength[0]-3);

FieldRowMultiplier[3][1] = Constrain(RowLength[1]-3);

FieldRowMultiplier[3][2] = Constrain(RowLength[2]-3);

FieldRowMultiplier[3][3] = Constrain(RowLength[3]-3);

FieldRowMultiplier[3][4] = Constrain(RowLength[4]-3);

FieldRowMultiplier[3][5] = Constrain(RowLength[5]-3);

// now compute field lengths

// not necessary to multiply anything just add them up ! ;) =D

FieldLength[0] =

FieldRowMultiplier[0][0] +

FieldRowMultiplier[0][1] +

FieldRowMultiplier[0][2] +

FieldRowMultiplier[0][3] +

FieldRowMultiplier[0][4] +

FieldRowMultiplier[0][5];

FieldLength[1] =

FieldRowMultiplier[1][0] +

FieldRowMultiplier[1][1] +

FieldRowMultiplier[1][2] +

FieldRowMultiplier[1][3] +

FieldRowMultiplier[1][4] +

FieldRowMultiplier[1][5];

FieldLength[2] =

FieldRowMultiplier[2][0] +

FieldRowMultiplier[2][1] +

FieldRowMultiplier[2][2] +

FieldRowMultiplier[2][3] +

FieldRowMultiplier[2][4] +

FieldRowMultiplier[2][5];

FieldLength[3] =

FieldRowMultiplier[3][0] +

FieldRowMultiplier[3][1] +

FieldRowMultiplier[3][2] +

FieldRowMultiplier[3][3] +

FieldRowMultiplier[3][4] +

FieldRowMultiplier[3][5];

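// though the field multipliers could come in handy later to read the bits ! nice ! ;) =D

// for each row the offset must be calculated, this can be done serially or by all processors for themselves at the same time:

// row zero starts after the number of rows, which is indicated by the first stream value =D

// first determine data offset properly ! ;) :) 1 for the row count + RowCount to skip over row lengths.

DataOffset = 1 + RowCount;

RowOffset[0] = DataOffset;

RowOffset[1] = RowOffset[0] + RowLength[0];

RowOffset[2] = RowOffset[1] + RowLength[1];

RowOffset[3] = RowOffset[2] + RowLength[2];

RowOffset[4] = RowOffset[3] + RowLength[3];

RowOffset[5] = RowOffset[4] + RowLength[4];

// now calculate and/or determine length of each field

// first determine number of fields

// now that all row offsets are calculated it's possible to decode the stream into each processor, by each processor.

// each processor knows its own number/identity represented by the +0 +1 +2 +3 down below:

// the general idea is:

// row[] here is RowOffset[]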

/*

Processor[0] :=

Stream[Row[0]+0]+Stream[Row[1]+0]+Stream[Row[2]+0]+Stream[Row[3]+0]+Stream[Row[4]+0]+Stream[Row[5]+0];

Processor[1] :=

Stream[Row[0]+1]+Stream[Row[1]+1]+Stream[Row[2]+1]+Stream[Row[3]+1]+Stream[Row[4]+1]+Stream[Row[5]+1];

Processor[2] :=

Stream[Row[0]+2]+Stream[Row[1]+2]+Stream[Row[2]+2]+Stream[Row[3]+2]+Stream[Row[4]+2]+Stream[Row[5]+2];

Processor[3] :=

Stream[Row[0]+3]+Stream[Row[1]+3]+Stream[Row[2]+3]+Stream[Row[3]+3]+Stream[Row[4]+3]+Stream[Row[5]+3];

*/

// however a processor should only include the bits of a stream if it's within the field's length. for that we need

// to compute each field length and here we have an oops ! ;) :) cannot compute field length if multiple fields

// per row. or can we ?! ;) :) perhaps we can... we know that fields have at least 1 bit because of row 0

// and we know which fields have 2 bits because of row 1 and so forth... and since all fields

// are distributed in parallel... we don't actually need to know where their bits are.... cause they are all packed lol...

// we only need to know what max length is or something... though how can we then be sure that a field ends up

// where it needs to be... well we don't... a field can end up in any processor.

// so how should it actually look like then... well as follows:

// since the fields are stored as follows: A,B,C,D we know A ends up in processor 0, B in 1, C in 2, D in 3 and so forth.

// thus... processor 0 can determine length of A by looking at row[0], row[1], row[2], row[3], row[4], row[5].

// but how to know which count matches to whose field ?

// is the length of row 5 for A or B or C or D ? we don't know do we ?! ;)

// now we should be able to solve the problem... we know the bits belong to the first few fields... cool ! ;) =D

// now we should be able to solve it easily... by only including a bit from the stream if the multiplier is set to 1 ;) :)

// and now only thing left to do is shifting the bits into proper position and or-ing them together ! ;)

Processor[0] =

((Stream[RowOffset[0]+0]*FieldRowMultiplier[0][0]) << 0) |

((Stream[RowOffset[1]+0]*FieldRowMultiplier[0][1]) << 1) |

((Stream[RowOffset[2]+0]*FieldRowMultiplier[0][2]) << 2) |

((Stream[RowOffset[3]+0]*FieldRowMultiplier[0][3]) << 3) |

((Stream[RowOffset[4]+0]*FieldRowMultiplier[0][4]) << 4) |

((Stream[RowOffset[5]+0]*FieldRowMultiplier[0][5]) << 5);

Processor[1] =

((Stream[RowOffset[0]+1]*FieldRowMultiplier[1][0]) << 0) |

((Stream[RowOffset[1]+1]*FieldRowMultiplier[1][1]) << 1) |

((Stream[RowOffset[2]+1]*FieldRowMultiplier[1][2]) << 2) |

((Stream[RowOffset[3]+1]*FieldRowMultiplier[1][3]) << 3) |

((Stream[RowOffset[4]+1]*FieldRowMultiplier[1][4]) << 4) |

((Stream[RowOffset[5]+1]*FieldRowMultiplier[1][5]) << 5);

Processor[2] =

((Stream[RowOffset[0]+2]*FieldRowMultiplier[2][0]) << 0) |

((Stream[RowOffset[1]+2]*FieldRowMultiplier[2][1]) << 1) |

((Stream[RowOffset[2]+2]*FieldRowMultiplier[2][2]) << 2) |

((Stream[RowOffset[3]+2]*FieldRowMultiplier[2][3]) << 3) |

((Stream[RowOffset[4]+2]*FieldRowMultiplier[2][4]) << 4) |

((Stream[RowOffset[5]+2]*FieldRowMultiplier[2][5]) << 5);

Processor[3] =

((Stream[RowOffset[0]+3]*FieldRowMultiplier[3][0]) << 0) |

((Stream[RowOffset[1]+3]*FieldRowMultiplier[3][1]) << 1) |

((Stream[RowOffset[2]+3]*FieldRowMultiplier[3][2]) << 2) |

((Stream[RowOffset[3]+3]*FieldRowMultiplier[3][3]) << 3) |

((Stream[RowOffset[4]+3]*FieldRowMultiplier[3][4]) << 4) |

((Stream[RowOffset[5]+3]*FieldRowMultiplier[3][5]) << 5);

// *** STILL TO DO (solved by extending array): ***************************

// *** ^^^ may have to look into potential out of range problem ^^^ ****

// solved by adding padding for reading ;)

// *********************************************************************

// for now it seems ok, also as long as stream array has +NumberOfProcessors scratch pad at end it may be ok ! ;) :)

// one last problem might remain... the row offset may go out of range...

// we could either use a scratch pad or solve this in another way.

// I think it's best to leave it as is and perhaps make the stream a bit larger or something, let's see what happens.

// print processor values.

printf( "Processor[0]: %d \n", Processor[0] );

printf( "Processor[1]: %d \n", Processor[1] );

printf( "Processor[2]: %d \n", Processor[2] );

printf( "Processor[3]: %d \n\n", Processor[3] );

return 0;

}

// *** End of C/C++ Program ***

Bye,

Skybuck.

(It is now 29 september 2015)

As promised here is Skybuck's Parallel Universal Code demonstration program.

This posting contains a Delphi and C/C++ version for you to learn from.

(I was kinda thinking of adding some kind of case statement/array and out of

order execution/randomization to prove that the lines can be executed in any

order, though I didn't get around to that (yet), been playing World of

Warships =D at 1 fps to 30 fps lol)

(If anybody doubts that these lines can be executed out of order I may

eventually add such a feature as well... maybe I will do it even for the fun

of it ! ;))

(I was also maybe thinking of trying to throw it all into a nice

TProcessor class with an actual thread to show it off as well... didn't get

around to that yet either ;) :))

(However if you carefully examine the indexes used you will discover that

there is no overlap, all indexes are uniquely read and such... no

write-read-write-read-sequential stuff going on... it can all be read in

parallel... except for building the RowOffset array).
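
The point about non-overlapping reads can be tried out with a small C sketch. It is not the author's code: `decode_in_any_order`, the `order` array and the constants are names made up for illustration, and the layout (row count, row lengths, data, then three padding entries) is assumed from the demonstration program. Each of the 24 (processor, row) terms reads its own stream index, so evaluating them in any permutation should give the same result:

```c
#include <assert.h>
#include <stddef.h>

enum { PROCESSORS = 4, ROWS = 6, TERMS = PROCESSORS * ROWS };

/* Hypothetical sketch: evaluates the 24 (processor, row) terms in the order
   given by `order` (a permutation of 0..23) and ORs each term into the
   result of the processor it belongs to. */
void decode_in_any_order(const int *stream, const int *row_offset,
                         const int *row_length, const int *order,
                         int *processor)
{
    assert(stream && row_offset && row_length && order && processor);
    for (int p = 0; p < PROCESSORS; p++)
        processor[p] = 0;
    for (int i = 0; i < TERMS; i++) {
        int p = order[i] / ROWS;            /* processor this term belongs to */
        int r = order[i] % ROWS;            /* row this term reads from */
        int multiplier = row_length[r] > p; /* same role as FieldRowMultiplier */
        /* when multiplier is 0 the read may land in the padding, as in the original */
        processor[p] |= (stream[row_offset[r] + p] * multiplier) << r;
    }
}
```

Passing the identity permutation or its reverse gives the same four values. With real threads each processor would OR only into its own result, so no two processors write the same location.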

// ** Begin of Delphi Program ***

program TestProgram;

{$APPTYPE CONSOLE}

{$R *.res}

uses

System.SysUtils;

{

Skybuck's Parallel Universal Code, Demonstration program

version 0.01 created on 19 september 2015

This demonstration program will illustrate how to decode fields of data in

parallel.

For this demonstration program bit fiddling will be avoided by allowing 1

bit per array index to illustrate the general idea.

Also lengths may be stored by 1 integer per array index. In a real world

scenario it would be Skybuck Universally Encoded.

Skybuck's Parallel Universal Code, Design Document:

version 1 created on 16 september 2015 (after seeing another mention of

IBM's processor MIL which makes me sick, as if decoding fields in parallel

is hard ! LOL ;) :))

(I also wrote a little introductory question for it, see the other file for it,

and I even learned something from it: Parallel decoding lesson for you.txt)

All bits of the fields are split up in such a way that the first bit of each

field is right next to each other, the second bit of each field is also next

to each other, and so forth.

Conceptual view:

First the current situation:

Field A consists of a1a2a3a4 (4 bits)

Field B consists of b1b2b3 (3 bits)

Field C consists of c1 (1 bit)

Field D consists of d1d2d3d4d5d6 (6 bits)

These bits are stored as follows:

"First row":

a1b1c1d1

"Second row":

a2b2d2

"Third row":

a3b3d3

"Fourth row":

a4d4

"Fifth row":

d5

"Sixth row":

d6

These rows will be stored sequentially as follows:

a1b1c1d1a2b2d2a3b3d3a4d4d5d6
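
A minimal sketch in C of how such a stream could be produced, assuming one integer per bit as in the demonstration program. The function name `interleave_fields` and its parameters are hypothetical, not taken from the original post:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes the fields row by row, so row r holds
   element r of every field that is longer than r.
   Returns the number of rows; row_length[r] receives the size of row r. */
size_t interleave_fields(const int *const *field, const size_t *field_length,
                         size_t field_count, int *out, int *row_length)
{
    assert(field != NULL && field_length != NULL);
    assert(out != NULL && row_length != NULL);

    /* the longest field decides how many rows there are */
    size_t max_length = 0;
    for (size_t f = 0; f < field_count; f++)
        if (field_length[f] > max_length)
            max_length = field_length[f];

    size_t pos = 0;
    for (size_t r = 0; r < max_length; r++) {
        row_length[r] = 0;
        for (size_t f = 0; f < field_count; f++) {
            if (r < field_length[f]) {      /* field f still has a bit for this row */
                out[pos++] = field[f][r];
                row_length[r]++;
            }
        }
    }
    return max_length;
}
```

For the four fields above (4, 3, 1 and 6 bits) this gives six rows with lengths 4, 3, 3, 2, 1, 1.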

Now the question is: how does a processor know where each row begins, and

how many bits of each field there are? The answers are given below:

The bit length of each row is stored up front as a prefix:

"First row": 4

"Second row": 3

"Third row": 3

"Fourth row": 2

"Fifth row": 1

"Sixth row": 1

Now for each row its offset can be computed:

First row starts at 0

Second row starts at 0 + 4 = 4

Third row starts at 0 + 4 + 3 = 7

Fourth row starts at 0 + 4 + 3 + 3 = 10

Fifth row starts at 0 + 4 + 3 + 3 + 2 = 12

Sixth row starts at 0 + 4 + 3 + 3 + 2 + 1 = 13

Let's check if this is true:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13

a1 b1 c1 d1 a2 b2 d2 a3 b3 d3 a4 d4 d5 d6

Bingo, all match.
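
The offset calculation above is a running sum over the row lengths. A minimal sketch in C, assuming a hypothetical helper name `compute_row_offsets` (not from the original program) and a data section that starts at a caller-supplied offset:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fills row_offset[r] with the position of the first
   bit of row r, given the bit length of every row and the offset at which
   row 0 starts. */
void compute_row_offsets(const int *row_length, size_t row_count,
                         int data_offset, int *row_offset)
{
    assert(row_length != NULL && row_offset != NULL);
    int next = data_offset;          /* row 0 starts at the data offset */
    for (size_t r = 0; r < row_count; r++) {
        row_offset[r] = next;        /* this row starts where the previous one ended */
        next += row_length[r];
    }
}
```

With the lengths 4, 3, 3, 2, 1, 1 and a data offset of 0 this yields 0, 4, 7, 10, 12, 13, the same values as in the worked example.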

The processor can compute the offsets of each row.

And thus the processor can read each field in parallel as follows (row offset

plus the position of the field within the row):

Read field A bit 0 at offset 0 + 0 = 0

Read field B bit 0 at offset 0 + 1 = 1

Read field C bit 0 at offset 0 + 2 = 2

Read field D bit 0 at offset 0 + 3 = 3

Now the question is how can the processor know when to stop reading ?

A marker/meta bit could be used like described in Skybuck's Universal Code

version 1, which indicates if the field continues or stops.

These bits can be stored in the same way as the data bits above.

And thus for each data bit, a meta bit can be read as well.

This way the processor knows that c2 does not exist and can stop reading

field C.

For huffman codes that would not even be required, since the processor can

see in the huffman tree when it reaches a leaf/end node and then it will

know the end of C was reached.

So marker/meta bits can be avoided by using Huffman codes ! Pretty neato !

;) =D

However huffman has a drawback that some fields might get large, and may not

be suited for rare information and modifications and so forth ! ;)

The last thing to do to make this usable is to include another prefix field

which indicates how many prefixes there are.

So final stream will look something like:

[Number of Fields][Set of Field Lengths][Set of Field Data Bits

Intermixed/Parallel]

First field can be universally coded.

Second set of fields can be universally coded.

Third set of field contains data bits intermixed/in parallel.

Additional:

Once a shorter field runs out of bits, the fields after it move up a position in

the following rows, so a field no longer sits at a fixed position within each row.

This problem can be solved by sorting the fields from largest to smallest

field, so the new stream looks like:

(sorting from smallest to largest would be possible too... but the code below

assumes the first field is the largest, so it uses the

largest to smallest sorting solution, which is also applied to stream A

below)

Stream A: 433211d1a1b1c1d2a2b2d3a3b3d4a4d5d6
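
A sketch of how one processor could decode its field from such a sorted stream, written as a loop instead of the unrolled statements. It assumes the stream also starts with the row count (the program uses `DataOffset := 1 + RowCount`), which Stream A above leaves out, and `decode_processor` is a made-up name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decodes the field handled by processor p from a
   stream laid out as [row count][row lengths...][row 0][row 1]...
   with one integer per stream element, as in the demonstration program. */
int decode_processor(const int *stream, int p)
{
    assert(stream != NULL && p >= 0);
    int row_count = stream[0];
    int row_offset = 1 + row_count;     /* data starts after the count and the lengths */
    int value = 0;
    for (int r = 0; r < row_count; r++) {
        int row_length = stream[1 + r];
        if (row_length > p)             /* this row still holds a bit for processor p */
            value |= stream[row_offset + p] << r;
        row_offset += row_length;       /* next row starts right after this one */
    }
    return value;
}
```

For the second test input (A = 9, B = 3, C = 1, D = 19) the stream is 6, then 4, 3, 3, 2, 1, 1, followed by the data bits, and processors 0 to 3 return 19, 9, 3 and 1, that is D, A, B, C in sorted order. The `if` skips rows that hold no bit for this processor, so no padding is needed here; the original multiplies by a 0/1 multiplier instead and relies on padding.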

Bye,

Skybuck.

}

function Constrain( Para : integer ) : integer;

begin

result := Para;

if Para < 0 then Result := 0;

if Para >= 1 then Result := 1;

end;

procedure Main;

// must put variables here otherwise won't show up in debugger.

const

MaxProcessorCount = 4;

var

// information stream, input

Stream : array[0..20+(MaxProcessorCount-1)] of integer; // add max processor count to create a safe "padding" for reading so no out of bounds/range check errors with arrays.

// bits representing fields of data

a1,a2,a3,a4 : integer;

b1,b2,b3 : integer;

c1 : integer;

d1,d2,d3,d4,d5,d6 : integer;

// output

RowIndex : integer;

RowCount : integer;

RowLength : array[0..5] of integer;

RowOffset : array[0..5] of integer;

FieldRowMultiplier : array[0..3,0..5] of integer;

DataOffset : integer;

FieldCount : integer;

FieldLength : array[0..3] of integer;

Processor : array[0..3] of integer;

// debug fields

FieldA : integer;

FieldB : integer;

FieldC : integer;

FieldD : integer;

begin

a1 := 1; a2 := 1; a3:= 1; a4 := 1;

b1 := 1; b2 := 1; b3 := 1;

c1 := 1;

d1 := 1; d2 := 1; d3 := 1; d4 := 1; d5 := 1; d6 := 1;

// compute input fields to compare it later with output fields

FieldA := (a1) or (a2 shl 1) or (a3 shl 2) or (a4 shl 3);

FieldB := (b1) or (b2 shl 1) or (b3 shl 2);
FieldC := (c1);

FieldD := (d1) or (d2 shl 1) or (d3 shl 2) or (d4 shl 3) or (d5 shl 4)

or (d6 shl 5);
// print field values

writeln( 'FieldA: ', FieldA );

writeln( 'FieldB: ', FieldB );

writeln( 'FieldC: ', FieldC );

writeln( 'FieldD: ', FieldD );

writeln;

// let's assume first field is largest so it always consumes all row information.

// should be 0 if negative, should be zero if zero, should be 1 if positive, so "and" will do the trick to constrain it.

// the -0, -1, -2, -3 represents subtracting the processor number/identity from it... to allow parallel processing ! ;)

// (the "and" constraint is wrong... haha, so Constrain() is used.)

FieldRowMultiplier[0,0] := Constrain(RowLength[0]-0); // should be set to one if larger.

FieldRowMultiplier[0,1] := Constrain(RowLength[1]-0);

FieldRowMultiplier[0,2] := Constrain(RowLength[2]-0);

FieldRowMultiplier[0,3] := Constrain(RowLength[3]-0);

FieldRowMultiplier[0,4] := Constrain(RowLength[4]-0);

FieldRowMultiplier[0,5] := Constrain(RowLength[5]-0);

// now second field may consume less if there are not enough bits.

FieldRowMultiplier[1,0] := Constrain(RowLength[0]-1);

FieldRowMultiplier[1,1] := Constrain(RowLength[1]-1);

FieldRowMultiplier[1,2] := Constrain(RowLength[2]-1);

FieldRowMultiplier[1,3] := Constrain(RowLength[3]-1);

FieldRowMultiplier[1,4] := Constrain(RowLength[4]-1);

FieldRowMultiplier[1,5] := Constrain(RowLength[5]-1);

// now third field may consume less if there are not enough bits.

FieldRowMultiplier[2,0] := Constrain(RowLength[0]-2);

FieldRowMultiplier[2,1] := Constrain(RowLength[1]-2);

FieldRowMultiplier[2,2] := Constrain(RowLength[2]-2);

FieldRowMultiplier[2,3] := Constrain(RowLength[3]-2);

FieldRowMultiplier[2,4] := Constrain(RowLength[4]-2);

FieldRowMultiplier[2,5] := Constrain(RowLength[5]-2);

// now fourth field may consume less if there are not enough bits.

FieldRowMultiplier[3,0] := Constrain(RowLength[0]-3);

FieldRowMultiplier[3,1] := Constrain(RowLength[1]-3);

FieldRowMultiplier[3,2] := Constrain(RowLength[2]-3);

FieldRowMultiplier[3,3] := Constrain(RowLength[3]-3);

FieldRowMultiplier[3,4] := Constrain(RowLength[4]-3);

FieldRowMultiplier[3,5] := Constrain(RowLength[5]-3);

// now compute field lengths

// not necessary to multiply anything just add them up ! ;) =D

FieldLength[0] :=

FieldRowMultiplier[0,0] +

FieldRowMultiplier[0,1] +

FieldRowMultiplier[0,2] +

FieldRowMultiplier[0,3] +

FieldRowMultiplier[0,4] +

FieldRowMultiplier[0,5];

FieldLength[1] :=

FieldRowMultiplier[1,0] +

FieldRowMultiplier[1,1] +

FieldRowMultiplier[1,2] +

FieldRowMultiplier[1,3] +

FieldRowMultiplier[1,4] +

FieldRowMultiplier[1,5];

FieldLength[2] :=

FieldRowMultiplier[2,0] +

FieldRowMultiplier[2,1] +

FieldRowMultiplier[2,2] +

FieldRowMultiplier[2,3] +

FieldRowMultiplier[2,4] +

FieldRowMultiplier[2,5];

FieldLength[3] :=

FieldRowMultiplier[3,0] +

FieldRowMultiplier[3,1] +

FieldRowMultiplier[3,2] +

FieldRowMultiplier[3,3] +

FieldRowMultiplier[3,4] +

FieldRowMultiplier[3,5];

// though the field multipliers could come in handy later to read the bits ! nice ! ;) =D

// for each row the offset must be calculated, this can be done serially or by all processors for themselves at the same time:

// row zero starts after the number of rows, which is indicated by the first stream value =D

// first determine data offset properly ! ;) :) 1 for the row count + RowCount to skip over row lengths.

DataOffset := 1 + RowCount;

RowOffset[0] := DataOffset;

RowOffset[1] := RowOffset[0] + RowLength[0];

RowOffset[2] := RowOffset[1] + RowLength[1];

RowOffset[3] := RowOffset[2] + RowLength[2];

RowOffset[4] := RowOffset[3] + RowLength[3];

RowOffset[5] := RowOffset[4] + RowLength[4];

// now calculate and/or determine length of each field

// first determine number of fields

// now that all row offsets are calculated it's possible to decode the stream into each processor, by each processor.

// each processor knows its own number/identity represented by the +0 +1 +2 +3 down below:

// the general idea is:

// row[] here is RowOffset[]

{

Processor[0] :=

Stream[Row[0]+0]+Stream[Row[1]+0]+Stream[Row[2]+0]+Stream[Row[3]+0]+Stream[Row[4]+0]+Stream[Row[5]+0];

Processor[1] :=

Stream[Row[0]+1]+Stream[Row[1]+1]+Stream[Row[2]+1]+Stream[Row[3]+1]+Stream[Row[4]+1]+Stream[Row[5]+1];

Processor[2] :=

Stream[Row[0]+2]+Stream[Row[1]+2]+Stream[Row[2]+2]+Stream[Row[3]+2]+Stream[Row[4]+2]+Stream[Row[5]+2];

Processor[3] :=

Stream[Row[0]+3]+Stream[Row[1]+3]+Stream[Row[2]+3]+Stream[Row[3]+3]+Stream[Row[4]+3]+Stream[Row[5]+3];

}

// however a processor should only include the bits of a stream if it's within the field's length. for that we need

// to compute each field length and here we have an oops ! ;) :) cannot compute field length if multiple fields

// per row. or can we ?! ;) :) perhaps we can... we know that fields have at least 1 bit because of row 0

// and we know which fields have 2 bits because of row 1 and so forth... and since all fields

// are distributed in parallel... we don't actually need to know where their bits are.... cause they are all packed lol...

// we only need to know what max length is or something... though how can we then be sure that a field ends up

// where it needs to be... well we don't... a field can end up in any processor.

// so how should it actually look like then... well as follows:

// since the fields are stored as follows: A,B,C,D we know A ends up in processor 0, B in 1, C in 2, D in 3 and so forth.

// thus... processor 0 can determine length of A by looking at row[0], row[1], row[2], row[3], row[4], row[5].

// but how to know which count matches to whose field ?

// is the length of row 5 for A or B or C or D ? we don't know do we ?! ;)

// now we should be able to solve the problem... we know the bits belong to the first few fields... cool ! ;) =D

// now we should be able to solve it easily... by only including a bit from the stream if the multiplier is set to 1 ;) :)

// and now only thing left to do is shifting the bits into proper position and or-ing them together ! ;)

Processor[0] :=

((Stream[RowOffset[0]+0]*FieldRowMultiplier[0,0]) shl 0) or

((Stream[RowOffset[1]+0]*FieldRowMultiplier[0,1]) shl 1) or

((Stream[RowOffset[2]+0]*FieldRowMultiplier[0,2]) shl 2) or

((Stream[RowOffset[3]+0]*FieldRowMultiplier[0,3]) shl 3) or

((Stream[RowOffset[4]+0]*FieldRowMultiplier[0,4]) shl 4) or

((Stream[RowOffset[5]+0]*FieldRowMultiplier[0,5]) shl 5);

Processor[1] :=

((Stream[RowOffset[0]+1]*FieldRowMultiplier[1,0]) shl 0) or

((Stream[RowOffset[1]+1]*FieldRowMultiplier[1,1]) shl 1) or

((Stream[RowOffset[2]+1]*FieldRowMultiplier[1,2]) shl 2) or

((Stream[RowOffset[3]+1]*FieldRowMultiplier[1,3]) shl 3) or

((Stream[RowOffset[4]+1]*FieldRowMultiplier[1,4]) shl 4) or

((Stream[RowOffset[5]+1]*FieldRowMultiplier[1,5]) shl 5);

Processor[2] :=

((Stream[RowOffset[0]+2]*FieldRowMultiplier[2,0]) shl 0) or

((Stream[RowOffset[1]+2]*FieldRowMultiplier[2,1]) shl 1) or

((Stream[RowOffset[2]+2]*FieldRowMultiplier[2,2]) shl 2) or

((Stream[RowOffset[3]+2]*FieldRowMultiplier[2,3]) shl 3) or

((Stream[RowOffset[4]+2]*FieldRowMultiplier[2,4]) shl 4) or

((Stream[RowOffset[5]+2]*FieldRowMultiplier[2,5]) shl 5);

Processor[3] :=

((Stream[RowOffset[0]+3]*FieldRowMultiplier[3,0]) shl 0) or

((Stream[RowOffset[1]+3]*FieldRowMultiplier[3,1]) shl 1) or

((Stream[RowOffset[2]+3]*FieldRowMultiplier[3,2]) shl 2) or

((Stream[RowOffset[3]+3]*FieldRowMultiplier[3,3]) shl 3) or

((Stream[RowOffset[4]+3]*FieldRowMultiplier[3,4]) shl 4) or

((Stream[RowOffset[5]+3]*FieldRowMultiplier[3,5]) shl 5);

// *** STILL TO DO (solved by extending array): ***************************

// *** ^^^ may have to look into potential out of range problem ^^^ ****

// *********************************************************************

// for now it seems ok, also as long as stream array has +NumberOfProcessors scratch pad at end it may be ok ! ;) :)

// one last problem might remain... the row offset may go out of range...

// we could either use a scratch pad or solve this in another way.

// I think it's best to leave it as is and perhaps make the stream a bit larger or something, let's see what happens.

// print processor values.

writeln( 'Processor[0]: ', Processor[0] );

writeln( 'Processor[1]: ', Processor[1] );

writeln( 'Processor[2]: ', Processor[2] );

writeln( 'Processor[3]: ', Processor[3] );

writeln;

end;

begin

try

Main;

except

on E: Exception do

Writeln(E.ClassName, ': ', E.Message);

end;

ReadLn;

end.

// *** Begin of C/C++ Program ***

// ParallelDecodingCVersion.cpp : Defines the entry point for the console application.

//

#include "stdafx.h"

// Begin of Dummy Decoder Example

const int MaxProcessorCount = 4;

int Constrain( int Para )

{

if (Para < 0) Para = 0;

if (Para >= 1) Para = 1;

return Para;

}

int _tmain(int argc, _TCHAR* argv[])

{

// information stream, input

int Stream[21+(MaxProcessorCount-1)]; // add max processor count to create a safe "padding" for reading so no out of bounds/range check errors with arrays.

// bits representing fields of data

int a1,a2,a3,a4;

int b1,b2,b3;

int c1;

int d1,d2,d3,d4,d5,d6;

// output

int RowIndex;

int RowCount;

int RowLength[6];

int RowOffset[6];

int FieldRowMultiplier[4][6];

int DataOffset;

int FieldCount;

int FieldLength[4];

int Processor[4];

// debug fields

int FieldA;

int FieldB;

int FieldC;

int FieldD;

// input test 1

/*

a1 = 1; a2 = 1; a3 = 1; a4 = 1;

b1 = 1; b2 = 1; b3 = 1;

c1 = 1;

d1 = 1; d2 = 1; d3 = 1; d4 = 1; d5 = 1; d6 = 1;

*/

// input test 2

a1 = 1; a2 = 0; a3 = 0; a4 = 1;

b1 = 1; b2 = 1; b3 = 0;

c1 = 1;

d1 = 1; d2 = 1; d3 = 0; d4 = 0; d5 = 1; d6 = 0;

// let's assume first field is largest so it always consumes all row information.

// should be 0 if negative, should be zero if zero, should be 1 if positive, so "and" will do the trick to constrain it.

// the -0, -1, -2, -3 represents subtracting the processor number/identity from it... to allow parallel processing ! ;)

// (the "and" constraint is wrong... haha, so Constrain() is used.)

FieldRowMultiplier[0][0] = Constrain(RowLength[0]-0); // should be set to one if larger.

FieldRowMultiplier[0][1] = Constrain(RowLength[1]-0);

FieldRowMultiplier[0][2] = Constrain(RowLength[2]-0);

FieldRowMultiplier[0][3] = Constrain(RowLength[3]-0);

FieldRowMultiplier[0][4] = Constrain(RowLength[4]-0);

FieldRowMultiplier[0][5] = Constrain(RowLength[5]-0);

// now second field may consume less if there are not enough bits.

FieldRowMultiplier[1][0] = Constrain(RowLength[0]-1);

FieldRowMultiplier[1][1] = Constrain(RowLength[1]-1);

FieldRowMultiplier[1][2] = Constrain(RowLength[2]-1);

FieldRowMultiplier[1][3] = Constrain(RowLength[3]-1);

FieldRowMultiplier[1][4] = Constrain(RowLength[4]-1);

FieldRowMultiplier[1][5] = Constrain(RowLength[5]-1);

// now third field may consume less if there are not enough bits.

FieldRowMultiplier[2][0] = Constrain(RowLength[0]-2);

FieldRowMultiplier[2][1] = Constrain(RowLength[1]-2);

FieldRowMultiplier[2][2] = Constrain(RowLength[2]-2);

FieldRowMultiplier[2][3] = Constrain(RowLength[3]-2);

FieldRowMultiplier[2][4] = Constrain(RowLength[4]-2);

FieldRowMultiplier[2][5] = Constrain(RowLength[5]-2);

// now fourth field may consume less if there are not enough bits.

FieldRowMultiplier[3][0] = Constrain(RowLength[0]-3);

FieldRowMultiplier[3][1] = Constrain(RowLength[1]-3);

FieldRowMultiplier[3][2] = Constrain(RowLength[2]-3);

FieldRowMultiplier[3][3] = Constrain(RowLength[3]-3);

FieldRowMultiplier[3][4] = Constrain(RowLength[4]-3);

FieldRowMultiplier[3][5] = Constrain(RowLength[5]-3);

// now compute field lengths

// not necessary to multiply anything just add them up ! ;) =D

FieldLength[0] =

FieldRowMultiplier[0][0] +

FieldRowMultiplier[0][1] +

FieldRowMultiplier[0][2] +

FieldRowMultiplier[0][3] +

FieldRowMultiplier[0][4] +

FieldRowMultiplier[0][5];

FieldLength[1] =

FieldRowMultiplier[1][0] +

FieldRowMultiplier[1][1] +

FieldRowMultiplier[1][2] +

FieldRowMultiplier[1][3] +

FieldRowMultiplier[1][4] +

FieldRowMultiplier[1][5];

FieldLength[2] =

FieldRowMultiplier[2][0] +

FieldRowMultiplier[2][1] +

FieldRowMultiplier[2][2] +

FieldRowMultiplier[2][3] +

FieldRowMultiplier[2][4] +

FieldRowMultiplier[2][5];

FieldLength[3] =

FieldRowMultiplier[3][0] +

FieldRowMultiplier[3][1] +

FieldRowMultiplier[3][2] +

FieldRowMultiplier[3][3] +

FieldRowMultiplier[3][4] +

FieldRowMultiplier[3][5];

// though the field multipliers could come in handy later to read the bits ! nice ! ;) =D

// for each row the offset must be calculated, this can be done serially or by all processors for themselves at the same time:

// row zero starts after the number of rows, which is indicated by the first stream value =D

// first determine data offset properly ! ;) :) 1 for the row count + RowCount to skip over row lengths.

DataOffset = 1 + RowCount;

RowOffset[0] = DataOffset;

RowOffset[1] = RowOffset[0] + RowLength[0];

RowOffset[2] = RowOffset[1] + RowLength[1];

RowOffset[3] = RowOffset[2] + RowLength[2];

RowOffset[4] = RowOffset[3] + RowLength[3];

RowOffset[5] = RowOffset[4] + RowLength[4];

// now calculate and/or determine length of each field

// first determine number of fields

// now that all row offsets are calculated it's possible to decode the stream into each processor, by each processor.

// each processor knows its own number/identity represented by the +0 +1 +2 +3 down below:

// the general idea is:

// row[] here is RowOffset[]

/*

Processor[0] :=

Stream[Row[0]+0]+Stream[Row[1]+0]+Stream[Row[2]+0]+Stream[Row[3]+0]+Stream[Row[4]+0]+Stream[Row[5]+0];

Processor[1] :=

Stream[Row[0]+1]+Stream[Row[1]+1]+Stream[Row[2]+1]+Stream[Row[3]+1]+Stream[Row[4]+1]+Stream[Row[5]+1];

Processor[2] :=

Stream[Row[0]+2]+Stream[Row[1]+2]+Stream[Row[2]+2]+Stream[Row[3]+2]+Stream[Row[4]+2]+Stream[Row[5]+2];

Processor[3] :=

Stream[Row[0]+3]+Stream[Row[1]+3]+Stream[Row[2]+3]+Stream[Row[3]+3]+Stream[Row[4]+3]+Stream[Row[5]+3];

*/

// however a processor should only include the bits of a stream if it's within the field's length. for that we need

// to compute each field length and here we have an oops ! ;) :) cannot compute field length if multiple fields

// per row. or can we ?! ;) :) perhaps we can... we know that fields have at least 1 bit because of row 0

// and we know which fields have 2 bits because of row 1 and so forth... and since all fields

// are distributed in parallel... we don't actually need to know where their bits are.... cause they are all packed lol...

// we only need to know what max length is or something... though how can we then be sure that a field ends up

// where it needs to be... well we don't... a field can end up in any processor.

// so how should it actually look like then... well as follows:

// since the fields are stored as follows: A,B,C,D we know A ends up in processor 0, B in 1, C in 2, D in 3 and so forth.

// thus... processor 0 can determine length of A by looking at row[0], row[1], row[2], row[3], row[4], row[5].

// but how to know which count matches to whose field ?

// is the length of row 5 for A or B or C or D ? we don't know do we ?! ;)

// now we should be able to solve the problem... we know the bits belong to the first few fields... cool ! ;) =D

// now we should be able to solve it easily... by only including a bit from the stream if the multiplier is set to 1 ;) :)

// and now only thing left to do is shifting the bits into proper position and or-ing them together ! ;)

Processor[0] =

((Stream[RowOffset[0]+0]*FieldRowMultiplier[0][0]) << 0) |

((Stream[RowOffset[1]+0]*FieldRowMultiplier[0][1]) << 1) |

((Stream[RowOffset[2]+0]*FieldRowMultiplier[0][2]) << 2) |

((Stream[RowOffset[3]+0]*FieldRowMultiplier[0][3]) << 3) |

((Stream[RowOffset[4]+0]*FieldRowMultiplier[0][4]) << 4) |

((Stream[RowOffset[5]+0]*FieldRowMultiplier[0][5]) << 5);

Processor[1] =

((Stream[RowOffset[0]+1]*FieldRowMultiplier[1][0]) << 0) |

((Stream[RowOffset[1]+1]*FieldRowMultiplier[1][1]) << 1) |

((Stream[RowOffset[2]+1]*FieldRowMultiplier[1][2]) << 2) |

((Stream[RowOffset[3]+1]*FieldRowMultiplier[1][3]) << 3) |

((Stream[RowOffset[4]+1]*FieldRowMultiplier[1][4]) << 4) |

((Stream[RowOffset[5]+1]*FieldRowMultiplier[1][5]) << 5);

Processor[2] =

((Stream[RowOffset[0]+2]*FieldRowMultiplier[2][0]) << 0) |

((Stream[RowOffset[1]+2]*FieldRowMultiplier[2][1]) << 1) |

((Stream[RowOffset[2]+2]*FieldRowMultiplier[2][2]) << 2) |

((Stream[RowOffset[3]+2]*FieldRowMultiplier[2][3]) << 3) |

((Stream[RowOffset[4]+2]*FieldRowMultiplier[2][4]) << 4) |

((Stream[RowOffset[5]+2]*FieldRowMultiplier[2][5]) << 5);

Processor[3] =

((Stream[RowOffset[0]+3]*FieldRowMultiplier[3][0]) << 0) |

((Stream[RowOffset[1]+3]*FieldRowMultiplier[3][1]) << 1) |

((Stream[RowOffset[2]+3]*FieldRowMultiplier[3][2]) << 2) |

((Stream[RowOffset[3]+3]*FieldRowMultiplier[3][3]) << 3) |

((Stream[RowOffset[4]+3]*FieldRowMultiplier[3][4]) << 4) |

((Stream[RowOffset[5]+3]*FieldRowMultiplier[3][5]) << 5);

// *** STILL TO DO (solved by extending array): ***************************

// *** ^^^ may have to look into potential out of range problem ^^^ ****

// solved by adding padding for reading ;)

// *********************************************************************

// for now it seems ok, also as long as stream array has +NumberOfProcessors scratch pad at end it may be ok ! ;) :)

// one last problem might remain... the row offset may go out of range...

// we could either use a scratch pad or solve this in another way.

// I think it's best to leave it as is and perhaps make the stream a bit larger or something, let's see what happens.

// print processor values.

printf( "Processor[0]: %d \n", Processor[0] );

printf( "Processor[1]: %d \n", Processor[1] );

printf( "Processor[2]: %d \n", Processor[2] );

printf( "Processor[3]: %d \n\n", Processor[3] );

return 0;

}

Bye,

Skybuck.
