Re: Dummy Decoder Example (was Re: Parallel decoding lesson for you.)

Skybuck Flying

unread,

Sep 20, 2015, 11:03:29 AM9/20/15

to

Since more people might be interested in this I will re-post this a second
time to include more newsgroups... those two threads will need to be
followed if all responses are to be seen ;)

Here is your dummy decoder example:

Let's turn this into a somewhat of a contest and ofcourse also teaching
lesson... now I am a true teacher... I provided you with most of the code.

The code you will need to write yourself/replace is indicated by // ***

Good luck and may the force be with you !

Ofcourse as announced earlier on 29 september 2015 I will reveal my parallel
solution to all of you !

So you have now 9 days to come up with your own solution before I publish
mine ! ;) =D

For those that missed the start/most of this thread, try googling for
comp.arch and "Parallel decoding lesson for you" by Skybuck.

// Begin of Dummy Decoder Example

program TestProgram;

{$APPTYPE CONSOLE}

{$R *.res}

uses
System.SysUtils;

function Constrain( Para : integer ) : integer;
begin
result := Para;
if Para < 0 then Result := 0;
if Para >= 1 then Result := 1;
end;

procedure Main;
// must put variables here otherwise won't show up in debugger.
var
// information stream, input
Stream : array[0..20] of integer;

// bits representing fields of data
a1,a2,a3,a4 : integer;
b1,b2,b3 : integer;
c1 : integer;
d1,d2,d3,d4,d5,d6 : integer;

// output
RowIndex : integer;

RowCount : integer;
RowLength : array[0..5] of integer;
RowOffset : array[0..5] of integer;

DataOffset : integer;

FieldCount : integer;
FieldLength : array[0..3] of integer;

Processor : array[0..3] of integer;

// debug fields
FieldA : integer;
FieldB : integer;
FieldC : integer;
FieldD : integer;
begin
a1 := 1; a2 := 1; a3:= 1; a4 := 1;
b1 := 1; b2 := 1; b3 := 1;
c1 := 1;
d1 := 1; d2 := 1; d3 := 1; d4 := 1; d5 := 1; d6 := 1;

// compute input fields to compare it later with output fields
FieldA := (a1) or (a2 shl 1) or (a3 shl 2) or (a4 shl 3);
FieldB := (b1) or (b2 shl 1) or (b3 shl 2);
FieldC := (c1);
FieldD := (d1) or (d2 shl 1) or (d3 shl 2) or (d4 shl 3) or (d5 shl 4) or
(d6 shl 5);

// print field values
writeln( 'FieldD: ', FieldD );
writeln( 'FieldA: ', FieldA );
writeln( 'FieldB: ', FieldB );
writeln( 'FieldC: ', FieldC );
writeln;

// number of rows
Stream[0] := 6;

// row lengths
Stream[1] := 4;
Stream[2] := 3;
Stream[3] := 3;
Stream[4] := 2;
Stream[5] := 1;
Stream[6] := 1;

// sorted information stream:
// d1a1b1c1d2a2b2d3a3b3d4a4d5d6

// data bits
Stream[7] := d1;
Stream[8] := a1;
Stream[9] := b1;
Stream[10] := c1;
Stream[11] := d2;
Stream[12] := a2;
Stream[13] := b2;
Stream[14] := d3;
Stream[15] := a3;
Stream[16] := b3;
Stream[17] := d4;
Stream[18] := a4;
Stream[19] := d5;
Stream[20] := d6;

// now the decoding algorithm:

// determine number of rows
RowCount := Stream[0];

// extract row lengths
RowLength[0] := Stream[1];
RowLength[1] := Stream[2];
RowLength[2] := Stream[3];
RowLength[3] := Stream[4];
RowLength[4] := Stream[5];
RowLength[5] := Stream[6];

// determine field count
FieldCount := RowLength[0]; // row[0] indicates number of fields.

// I will help out a bit... by leaving this code in ! ;) seems somewhat
obvious ;)
// first determine data offset properly ! ;) :) 1 for the row count +
RowCount to skip over row lengths.
DataOffset := 1 + RowCount;

RowOffset[0] := DataOffset;
RowOffset[1] := RowOffset[0] + RowLength[0];
RowOffset[2] := RowOffset[1] + RowLength[1];
RowOffset[3] := RowOffset[2] + RowLength[2];
RowOffset[4] := RowOffset[3] + RowLength[3];
RowOffset[5] := RowOffset[4] + RowLength[4];

// some how the data bits from the stream needs to end up in these 4
processors so that it produces the same values
// as below:
// fields may be processed in a different order though.

// *** You will need to replace this code with your own code... and
preferably it should be parallel, fast and somewhat general/scalable. ***
Processor[0] := FieldD;
Processor[1] := FieldA;
Processor[2] := FieldB;
Processor[3] := FieldC;

// print processor values.
writeln( 'Processor[0]: ', Processor[0] );
writeln( 'Processor[1]: ', Processor[1] );
writeln( 'Processor[2]: ', Processor[2] );
writeln( 'Processor[3]: ', Processor[3] );
writeln;
end;

begin
try
Main;
except
on E: Exception do
Writeln(E.ClassName, ': ', E.Message);
end;
ReadLn;
end.

// End of Dummy Decoder Example

Oh one last thing when you think you have a valid solution and/or need to
test more,

replace the initialization of the test data bits with this initialization
for example:

a1 := 1; a2 := 0; a3:= 0; a4 := 1;
b1 := 1; b2 := 1; b3 := 0;
c1 := 1;
d1 := 1; d2 := 1; d3 := 0; d4 := 0; d5 := 1; d6 := 0;

in the original posting (the one before this they were all set to 1).

This helped a bit during the development just to see if it was reading
anything at all at the somewhat correct places.

But this may later put you off if the values are not matching input vs
output.

So this test data in this posting is better to spot any inconsistencies/bugs
in your solution(s).

Bye,
Skybuck.

Skybuck Flying

unread,

Sep 20, 2015, 1:08:39 PM9/20/15

to

Or point your newsgroup reader to newsgroup: comp.arch

Then see thread/topic: "Parallel decoding lesson for you" by Skybuck.

Bye,
Skybuck.

Skybuck Flying

unread,

Sep 21, 2015, 2:55:22 AM9/21/15

to

Just to be clear on this, the code you have to write doesn't need to be
truely parallel.

It must be parallel in potential, so it should be able to execute
independenlty from each other and out of order.

Bye,
Skybuck.

Skybuck Flying

unread,

Sep 23, 2015, 5:26:07 AM9/23/15

to

The example may be modified as much as needed.

For now my solution needs a little reading pad to avoid costly mods or
branches or whatever.

I think this is a nice speedy solution, so code may be modified as follows:

const
MaxProcessorCount = 4;

var
// information stream, input

Stream : array[0..20+(MaxProcessorCount-1)] of integer; // add max processor
count to create a safe "padding" for reading so no out of bounds/range check
errors with arrays.

Bye,
Sybuck.

Skybuck Flying

unread,

Sep 23, 2015, 5:39:38 AM9/23/15

to

Here is the C version of the example in case your Delphi-to-C skills are not
so great or you lazy lol =D:

// ParallelDecodingCVersion.cpp : Defines the entry point for the console
application.
//

#include "stdafx.h"

// Begin of Dummy Decoder Example

const int MaxProcessorCount = 4;

int _tmain(int argc, _TCHAR* argv[])
{
// information stream, input
int Stream[21+(MaxProcessorCount-1)]; // add max processor count to create a

safe "padding" for reading so no out of bounds/range check errors with
arrays.

// bits representing fields of data

int a1,a2,a3,a4;
int b1,b2,b3;
int c1;
int d1,d2,d3,d4,d5,d6;

// output
int RowIndex;

int RowCount;
int RowLength[6];
int RowOffset[6];

int DataOffset;

int FieldCount;
int FieldLength;

int Processor[4];

// debug fields
int FieldA;
int FieldB;
int FieldC;
int FieldD;

a1 = 1; a2 = 1; a3 = 1; a4 = 1;
b1 = 1; b2 = 1; b3 = 1;
c1 = 1;
d1 = 1; d2 = 1; d3 = 1; d4 = 1; d5 = 1; d6 = 1;

// compute input fields to compare it later with output fields

FieldA = (a1) | (a2 << 1) | (a3 << 2) | (a4 << 3);
FieldB = (b1) | (b2 << 1) | (b3 << 2);
FieldC = (c1);
FieldD = (d1) | (d2 << 1) | (d3 << 2) | (d4 << 3) | (d5 << 4) | (d6 << 5);

// print field values
printf( "FieldD: %d \n", FieldD );
printf( "FieldA: %d \n", FieldA );
printf( "FieldB: %d \n", FieldB );
printf( "FieldC: %d \n\n", FieldC );

// RowCount to skip over row lengths.

DataOffset = 1 + RowCount;

RowOffset[0] = DataOffset;
RowOffset[1] = RowOffset[0] + RowLength[0];
RowOffset[2] = RowOffset[1] + RowLength[1];
RowOffset[3] = RowOffset[2] + RowLength[2];
RowOffset[4] = RowOffset[3] + RowLength[3];
RowOffset[5] = RowOffset[4] + RowLength[4];

// some how the data bits from the stream needs to end up in these 4

// processors so that it produces the same values

// as below:
// fields may be processed in a different order though.

// *** You will need to replace this code with your own code... and

// preferably it should be parallel, fast and somewhat general/scalable. ***

Processor[0] = FieldD;
Processor[1] = FieldA;
Processor[2] = FieldB;
Processor[3] = FieldC;

// print processor values.

printf( "Processor[0]: %d \n", Processor[0] );
printf( "Processor[1]: %d \n", Processor[1] );
printf( "Processor[2]: %d \n", Processor[2] );
printf( "Processor[3]: %d \n\n", Processor[3] );

return 0;

}

// End of Dummy Decoder Example

Bye,
Skybuck :)

Skybuck Flying

unread,

Sep 23, 2015, 5:43:40 AM9/23/15

to

Also here is test set 2 to test input values:

// input test 2, c version

a1 = 1; a2 = 0; a3 = 0; a4 = 1;
b1 = 1; b2 = 1; b3 = 0;
c1 = 1;
d1 = 1; d2 = 1; d3 = 0; d4 = 0; d5 = 1; d6 = 0;

Bye,
Skybuck.

Skybuck Flying

unread,

Sep 29, 2015, 3:09:07 AM9/29/15

to

Hello,

(It is now 29 september 2015)

As promised here is Skybuck's Parallel Universal Code demonstration program.

This posting contains a Delphi and C/C++ version for you to learn from.

(I was kinda thinking of adding some kind of case statement/array and out of
order execution/randomization to proof that the lines can be executed in any
order, though I didn't come around to that (yet) been playing World of
Warships =D at 1 fps to 30 fps lol)
(If anybody doubt that these lines can be executed out of order I may
eventually add such a feature as welll.. maybe I will do it even for the fun
of it ! ;))
(I was also maybe thinking of trying to through it all into a nice
Tprocessor class with an actual thread to show it off as well... didn't come
around to that yet either ;):))
(However if you clearly examine the indexes used you will discover that
there is no overlap, all indexes are uniquely read and such... no
write-read-write-read-sequential stuff going on... it can all be read in
parallel... except for building rowoffset array).

// ** Begin of Delphi Program ***

program TestProgram;

{$APPTYPE CONSOLE}

{$R *.res}

uses
System.SysUtils;

{

Skybuck's Parallel Universal Code, Demonstration program

version 0.01 created on 19 september 2015

This demonstration program will illustrate how to decode fields of data in
parallel.

For this demonstration program bit fiddling will be avoided by allowing 1
bit per array index to illustrate the general idea.

Also lengths may be stored by 1 integer per array index. In a real world
scenerio it would be Skybuck Universally Encoded.

Skybuck's Parallel Universal Code, Design Document:

version 1 created on 16 september 2015 (after seeing another mentioning of
IBM's processor MIL which makes me sick, as if decoding fields in parallel
is hard ! LOL ;) :))

(I also wrote a little introductionary question for it see other file for it
and I even learned something from it: Parallel decoding lesson for you.txt)

All bits of the fields are split up in such a way that the first bit of each
field is right next to each other, the second bit of each field is also next
to each other, and so forth.

Conceptually view:

First currently situation

Field A consists out of a1a2a3a4 (4 bits)
Field B consists out of b1b2b3 (3 bits)
Field C consists out of c1 (1 bit)
Field D consists out of d1d2d3d4d5d6 ( 6 bits)

These bits are stored as follows:

"First row":
a1b1c1d1

"Second row":
a2b2d2

"Third row":
a3b3d3

"Fourth row":
a4d4

"Fiveth row":
d5

"Sixth row":
d6

These rows will be stores sequentially as follows:
a1b1c1d1a2b2d2a3b3d4a4d4d5d6

Now the question is: How does a processor know where each row begins ? and
how many bits of each field there is, the answers are given below:

The bit length of each row is stored preemptively/prefixed:

"First row": 4
"Second row": 3
"Third row": 3
"Fourth row": 2
"Fiveth row": 1
"Sixth row" : 1

Now for each row their offset can be computed:

First row starts at 0,
Second row starts at 0 + 4 = 4
Third row starts at 0 + 4 + 3 = 7
Fouth row starts at 0 + 4 + 3 + 3 = 10
Fiveth row starts at 0 + 4 + 3 + 3 + 2 = 12
Sixth row starts at 0 + 4+ 3 + 3+ 2 + 1 = 13

Let's check if this is true:

0 1 2 3 4 5 6 7 8 9 10 11 12 13
a1 b1 c1 d1 a2 b2 d2 a3 b3 d3 a4 d4 d5 d6

Bingo, all match.

The processor can compute the offsets of each row.

And thus the processor can reach each field in parallel as follows:

Read field A bit 0 at offset 0
Read field B bit 0 at offset 4
Read field C bit 0 at offset 7
Read field D bit 0 at offset 10

Now the question is how can the processor know when to stop reading ?

A marker/meta bit could be used like described in Skybuck's Universal Code
version 1, which indicates if the field continues or stops.

These bits can be stored in the same way as the data bits above.

And thus for each data bit, a meta bit can be read as well.

This way the processor knows that C2 does not exist and can stop reading
field C

For huffman codes that would not even be required, since the processor can
see in the huffman tree when it reaches a leave/end node and then it will
know the end of C was reached.

So marker/beta bits can be avoided by using Huffman codes ! Pretty neeto !
;) =D

However huffman has a drawback that some fields might get large, and may not
be suited for rare information and modifications and so forth ! ;)

The last thing to do to make this usuable is to include another prefix field
which indicates how many prefixes there are.

So final stream will look something like:

[Number of Fields][Set of Field Lengths][Set of Field Data Bits
Intermixed/Parallel]

First field can be universally coded.
Second set of fields can be universally coded.
Third set of field contains data bits intermixed/in parallel.

Additional:

The problem can be solved by sorting the fields from largest to smallest
field, so new stream looks like:
(sorting from smallest to largest would be possible too... but code below
assumes first field is largest so it uses
largest to smallest sorting solution which is also applied to the stream A,
so stream A below is sorted that way)

Stream A: 433211d1a1b1c1d2a2b2d3a3b3d4a4d5d6

Bye,
Skybuck.

}

function Constrain( Para : integer ) : integer;
begin
result := Para;
if Para < 0 then Result := 0;
if Para >= 1 then Result := 1;
end;

procedure Main;
// must put variables here otherwise won't show up in debugger.

const
MaxProcessorCount = 4;
var
// information stream, input

Stream : array[0..20+(MaxProcessorCount-1)] of integer; // add max

processor count to create a safe "padding" for reading so no out of
bounds/range check errors with arrays.

// bits representing fields of data

a1,a2,a3,a4 : integer;
b1,b2,b3 : integer;
c1 : integer;
d1,d2,d3,d4,d5,d6 : integer;

// output
RowIndex : integer;

RowCount : integer;
RowLength : array[0..5] of integer;
RowOffset : array[0..5] of integer;

FieldRowMultiplier : array[0..3,0..5] of integer;

DataOffset : integer;

FieldCount : integer;
FieldLength : array[0..3] of integer;

Processor : array[0..3] of integer;

// debug fields
FieldA : integer;
FieldB : integer;
FieldC : integer;
FieldD : integer;
begin

a1 := 1; a2 := 1; a3:= 1; a4 := 1;
b1 := 1; b2 := 1; b3 := 1;
c1 := 1;
d1 := 1; d2 := 1; d3 := 1; d4 := 1; d5 := 1; d6 := 1;

// compute input fields to compare it later with output fields

FieldA := (a1) or (a2 shl 1) or (a3 shl 2) or (a4 shl 3);

FieldB := (b1) or (b2 shl 1) or (b3 shl 2);
FieldC := (c1);

FieldD := (d1) or (d2 shl 1) or (d3 shl 2) or (d4 shl 3) or (d5 shl 4)

or (d6 shl 5);

// print field values

writeln( 'FieldA: ', FieldA );
writeln( 'FieldB: ', FieldB );
writeln( 'FieldC: ', FieldC );
writeln( 'FieldD: ', FieldD );

writeln;

// let's assume first field is largest so it always consumes all row
information.
// should be 0 if negative, should be zero if zero, should be 1 if
positive so and will do the trick to constaint it.
// the -0, -1, -2, -3 represents subtracting the processor
number/identify from it... to allow parallel processing ! ;)

// contraint is wrong... hahga unnt.
FieldRowMultiplier[0,0] := Constrain(RowLength[0]-0); // shoudl be set
to one if larger.
FieldRowMultiplier[0,1] := Constrain(RowLength[1]-0);
FieldRowMultiplier[0,2] := Constrain(RowLength[2]-0);
FieldRowMultiplier[0,3] := Constrain(RowLength[3]-0);
FieldRowMultiplier[0,4] := Constrain(RowLength[4]-0);
FieldRowMultiplier[0,5] := Constrain(RowLength[5]-0);

// now second field may consume less if there are not enough bits.
FieldRowMultiplier[1,0] := Constrain(RowLength[0]-1);
FieldRowMultiplier[1,1] := Constrain(RowLength[1]-1);
FieldRowMultiplier[1,2] := Constrain(RowLength[2]-1);
FieldRowMultiplier[1,3] := Constrain(RowLength[3]-1);
FieldRowMultiplier[1,4] := Constrain(RowLength[4]-1);
FieldRowMultiplier[1,5] := Constrain(RowLength[5]-1);

// now third field may consume less if there are not enough bits.
FieldRowMultiplier[2,0] := Constrain(RowLength[0]-2);
FieldRowMultiplier[2,1] := Constrain(RowLength[1]-2);
FieldRowMultiplier[2,2] := Constrain(RowLength[2]-2);
FieldRowMultiplier[2,3] := Constrain(RowLength[3]-2);
FieldRowMultiplier[2,4] := Constrain(RowLength[4]-2);
FieldRowMultiplier[2,5] := Constrain(RowLength[5]-2);

// now fourth field may consume less if there are not enough bits.
FieldRowMultiplier[3,0] := Constrain(RowLength[0]-3);
FieldRowMultiplier[3,1] := Constrain(RowLength[1]-3);
FieldRowMultiplier[3,2] := Constrain(RowLength[2]-3);
FieldRowMultiplier[3,3] := Constrain(RowLength[3]-3);
FieldRowMultiplier[3,4] := Constrain(RowLength[4]-3);
FieldRowMultiplier[3,5] := Constrain(RowLength[5]-3);

// now compute field lengths

// not necessary to multiply anything just add them up ! ;) =D
FieldLength[0] :=
FieldRowMultiplier[0,0] +
FieldRowMultiplier[0,1] +
FieldRowMultiplier[0,2] +
FieldRowMultiplier[0,3] +
FieldRowMultiplier[0,4] +
FieldRowMultiplier[0,5];

FieldLength[1] :=
FieldRowMultiplier[1,0] +
FieldRowMultiplier[1,1] +
FieldRowMultiplier[1,2] +
FieldRowMultiplier[1,3] +
FieldRowMultiplier[1,4] +
FieldRowMultiplier[1,5];

FieldLength[2] :=
FieldRowMultiplier[2,0] +
FieldRowMultiplier[2,1] +
FieldRowMultiplier[2,2] +
FieldRowMultiplier[2,3] +
FieldRowMultiplier[2,4] +
FieldRowMultiplier[2,5];

FieldLength[3] :=
FieldRowMultiplier[3,0] +
FieldRowMultiplier[3,1] +
FieldRowMultiplier[3,2] +
FieldRowMultiplier[3,3] +
FieldRowMultiplier[3,4] +
FieldRowMultiplier[3,5];

// though the field multipliers could come in handy later to read the
bits ! nice ! ;) =D

// for each row the offset must be calculated this can be done serially
or by all processors for themselfes at the same time:
// row zero starts after the number of rows which is indicated by the
first stream value =D

// first determine data offset properly ! ;) :) 1 for the row count +

RowCount to skip over row lengths.
DataOffset := 1 + RowCount;

RowOffset[0] := DataOffset;
RowOffset[1] := RowOffset[0] + RowLength[0];
RowOffset[2] := RowOffset[1] + RowLength[1];
RowOffset[3] := RowOffset[2] + RowLength[2];
RowOffset[4] := RowOffset[3] + RowLength[3];
RowOffset[5] := RowOffset[4] + RowLength[4];

// now calculate and/ore detemrine length of each field

// first determine number of fields

// now that all row offsets are calculated it's possible to decode the
stream into each processor, by each processor.

// each processor knows it's own number/identitiy represented by the +0
+1 +2 +3 down below:
// the general idea is:
// row[] here is RowOffset[]
{
Processor[0] :=
Stream[Row[0]+0]+Stream[Row[1]+0]+Stream[Row[2]+0]+Stream[Row[3]+0]+Stream[Row[4]+0]+Stream[Row[5]+0];
Processor[1] :=
Stream[Row[0]+1]+Stream[Row[1]+1]+Stream[Row[2]+1]+Stream[Row[3]+1]+Stream[Row[4]+1]+Stream[Row[5]+1];
Processor[2] :=
Stream[Row[0]+2]+Stream[Row[1]+2]+Stream[Row[2]+2]+Stream[Row[3]+2]+Stream[Row[4]+2]+Stream[Row[5]+2];
Processor[3] :=
Stream[Row[0]+3]+Stream[Row[1]+3]+Stream[Row[2]+3]+Stream[Row[3]+3]+Stream[Row[4]+3]+Stream[Row[5]+3];
}
// however a processor should only include the bits of a stream if it's
within the field's length for that we need
// to compute each field length and here we have an oops ! ;) :) cannot
compute field length if multiple fields
// per row. or can we ?! ;) :)perhaps we can... we know that fields have
at least 1 bit because of row 0
// and we know which fields have 2 bits because of row 2 and so forth...
and since all fields
// are ditributed parallely... we don't actually need to know where
their bits are.... cause they all packed lol...
// we only need to know what max length is or something... though how
can we dan be sure that a field ends up
// wehere it needs to be... well we dont... a field can end up in any
processor.

// so how should it actually look like then... well as follows:

// since the fields are stored as follows: A,B,C,D we know A ends up in
processor 0, B in 1, C in 2, D in 3 and so forth.
// thus... processor 0 can determine length of A by looking at row[0],
row[1], row[2], row[3], row[4], row[5], row[6].
// but how to know which count matches to who's field ?
// is the length of row 5 for A or B or C or D ? we don't know do we ?!
;)

// now we should be able to solve the problem... we know the bits belong
to the first few fields... cool ! ;) =D

// now we should be able to solve it easily... by only including a bit
from the stream if the multiplier is set to 1 ;) :)
// and now only thing left to do is shifting the bits into proper
position and or-ing them together ! ;)

Processor[0] :=
((Stream[RowOffset[0]+0]*FieldRowMultiplier[0,0]) shl 0) or
((Stream[RowOffset[1]+0]*FieldRowMultiplier[0,1]) shl 1) or
((Stream[RowOffset[2]+0]*FieldRowMultiplier[0,2]) shl 2) or
((Stream[RowOffset[3]+0]*FieldRowMultiplier[0,3]) shl 3) or
((Stream[RowOffset[4]+0]*FieldRowMultiplier[0,4]) shl 4) or
((Stream[RowOffset[5]+0]*FieldRowMultiplier[0,5]) shl 5);

Processor[1] :=
((Stream[RowOffset[0]+1]*FieldRowMultiplier[1,0]) shl 0) or
((Stream[RowOffset[1]+1]*FieldRowMultiplier[1,1]) shl 1) or
((Stream[RowOffset[2]+1]*FieldRowMultiplier[1,2]) shl 2) or
((Stream[RowOffset[3]+1]*FieldRowMultiplier[1,3]) shl 3) or
((Stream[RowOffset[4]+1]*FieldRowMultiplier[1,4]) shl 4) or
((Stream[RowOffset[5]+1]*FieldRowMultiplier[1,5]) shl 5);

Processor[2] :=
((Stream[RowOffset[0]+2]*FieldRowMultiplier[2,0]) shl 0) or
((Stream[RowOffset[1]+2]*FieldRowMultiplier[2,1]) shl 1) or
((Stream[RowOffset[2]+2]*FieldRowMultiplier[2,2]) shl 2) or
((Stream[RowOffset[3]+2]*FieldRowMultiplier[2,3]) shl 3) or
((Stream[RowOffset[4]+2]*FieldRowMultiplier[2,4]) shl 4) or
((Stream[RowOffset[5]+2]*FieldRowMultiplier[2,5]) shl 5);

Processor[3] :=
((Stream[RowOffset[0]+3]*FieldRowMultiplier[3,0]) shl 0) or
((Stream[RowOffset[1]+3]*FieldRowMultiplier[3,1]) shl 1) or
((Stream[RowOffset[2]+3]*FieldRowMultiplier[3,2]) shl 2) or
((Stream[RowOffset[3]+3]*FieldRowMultiplier[3,3]) shl 3) or
((Stream[RowOffset[4]+3]*FieldRowMultiplier[3,4]) shl 4) or
((Stream[RowOffset[5]+3]*FieldRowMultiplier[3,5]) shl 5);

// *** STILL TO DO (solved by extending array):
********************************
// *** ^^^ may have to look into potential out of range problem ^^^ ****
// *********************************************************************
// for now it seems ok, also as long as stream array has
+NumberOfProcessors scratch pad at end it may be ok ! ;) :)

// one last problem might remain... the row offset may go out of
range...
// we could either use a scratch pad or solve this in another way.
// I think it's best to leave it as as and perhaps make the stream a bit
larger or omething let's see what happens.

// print processor values.
writeln( 'Processor[0]: ', Processor[0] );
writeln( 'Processor[1]: ', Processor[1] );
writeln( 'Processor[2]: ', Processor[2] );
writeln( 'Processor[3]: ', Processor[3] );
writeln;
end;

begin
try
Main;
except
on E: Exception do
Writeln(E.ClassName, ': ', E.Message);
end;
ReadLn;
end.

// *** End of Delphi Program ***

// *** Begin of C/C++ Program ***

// ParallelDecodingCVersion.cpp : Defines the entry point for the console
application.
//

// Full C version 0.01 created on 23 september 2015 by Skybuck Flying =D

#include "stdafx.h"

// Begin of Dummy Decoder Example

const int MaxProcessorCount = 4;

int Constrain( int Para )
{
if (Para < 0) Para = 0;
if (Para >= 1) Para = 1;
return Para;

}

int _tmain(int argc, _TCHAR* argv[])
{
// information stream, input
int Stream[21+(MaxProcessorCount-1)]; // add max processor count to
create a safe "padding" for reading so no out of bounds/range check errors
with arrays.

// bits representing fields of data
int a1,a2,a3,a4;
int b1,b2,b3;
int c1;
int d1,d2,d3,d4,d5,d6;

// output
int RowIndex;

int RowCount;
int RowLength[6];
int RowOffset[6];

int FieldRowMultiplier[4][6];

int DataOffset;

int FieldCount;
int FieldLength[4];

int Processor[4];

// debug fields
int FieldA;
int FieldB;
int FieldC;
int FieldD;

// input test 1
/*

a1 = 1; a2 = 1; a3 = 1; a4 = 1;
b1 = 1; b2 = 1; b3 = 1;
c1 = 1;
d1 = 1; d2 = 1; d3 = 1; d4 = 1; d5 = 1; d6 = 1;

*/

// input test 2
a1 = 1; a2 = 0; a3 = 0; a4 = 1;
b1 = 1; b2 = 1; b3 = 0;
c1 = 1;
d1 = 1; d2 = 1; d3 = 0; d4 = 0; d5 = 1; d6 = 0;

// let's assume first field is largest so it always consumes all row
information.
// should be 0 if negative, should be zero if zero, should be 1 if
positive so and will do the trick to constaint it.
// the -0, -1, -2, -3 represents subtracting the processor
number/identify from it... to allow parallel processing ! ;)

// contraint is wrong... hahga unnt.
FieldRowMultiplier[0][0] = Constrain(RowLength[0]-0); // shoudl be set
to one if larger.
FieldRowMultiplier[0][1] = Constrain(RowLength[1]-0);
FieldRowMultiplier[0][2] = Constrain(RowLength[2]-0);
FieldRowMultiplier[0][3] = Constrain(RowLength[3]-0);
FieldRowMultiplier[0][4] = Constrain(RowLength[4]-0);
FieldRowMultiplier[0][5] = Constrain(RowLength[5]-0);

// now second field may consume less if there are not enough bits.
FieldRowMultiplier[1][0] = Constrain(RowLength[0]-1);
FieldRowMultiplier[1][1] = Constrain(RowLength[1]-1);
FieldRowMultiplier[1][2] = Constrain(RowLength[2]-1);
FieldRowMultiplier[1][3] = Constrain(RowLength[3]-1);
FieldRowMultiplier[1][4] = Constrain(RowLength[4]-1);
FieldRowMultiplier[1][5] = Constrain(RowLength[5]-1);

// now third field may consume less if there are not enough bits.
FieldRowMultiplier[2][0] = Constrain(RowLength[0]-2);
FieldRowMultiplier[2][1] = Constrain(RowLength[1]-2);
FieldRowMultiplier[2][2] = Constrain(RowLength[2]-2);
FieldRowMultiplier[2][3] = Constrain(RowLength[3]-2);
FieldRowMultiplier[2][4] = Constrain(RowLength[4]-2);
FieldRowMultiplier[2][5] = Constrain(RowLength[5]-2);

// now fourth field may consume less if there are not enough bits.
FieldRowMultiplier[3][0] = Constrain(RowLength[0]-3);
FieldRowMultiplier[3][1] = Constrain(RowLength[1]-3);
FieldRowMultiplier[3][2] = Constrain(RowLength[2]-3);
FieldRowMultiplier[3][3] = Constrain(RowLength[3]-3);
FieldRowMultiplier[3][4] = Constrain(RowLength[4]-3);
FieldRowMultiplier[3][5] = Constrain(RowLength[5]-3);

// now compute field lengths

// not necessary to multiply anything just add them up ! ;) =D
FieldLength[0] =
FieldRowMultiplier[0][0] +
FieldRowMultiplier[0][1] +
FieldRowMultiplier[0][2] +
FieldRowMultiplier[0][3] +
FieldRowMultiplier[0][4] +
FieldRowMultiplier[0][5];

FieldLength[1] =
FieldRowMultiplier[1][0] +
FieldRowMultiplier[1][1] +
FieldRowMultiplier[1][2] +
FieldRowMultiplier[1][3] +
FieldRowMultiplier[1][4] +
FieldRowMultiplier[1][5];

FieldLength[2] =
FieldRowMultiplier[2][0] +
FieldRowMultiplier[2][1] +
FieldRowMultiplier[2][2] +
FieldRowMultiplier[2][3] +
FieldRowMultiplier[2][4] +
FieldRowMultiplier[2][5];

FieldLength[3] =
FieldRowMultiplier[3][0] +
FieldRowMultiplier[3][1] +
FieldRowMultiplier[3][2] +
FieldRowMultiplier[3][3] +
FieldRowMultiplier[3][4] +
FieldRowMultiplier[3][5];

// though the field multipliers could come in handy later to read the
bits ! nice ! ;) =D

// for each row the offset must be calculated this can be done serially
or by all processors for themselfes at the same time:
// row zero starts after the number of rows which is indicated by the
first stream value =D

// first determine data offset properly ! ;) :) 1 for the row count +

RowCount to skip over row lengths.
DataOffset = 1 + RowCount;

RowOffset[0] = DataOffset;
RowOffset[1] = RowOffset[0] + RowLength[0];
RowOffset[2] = RowOffset[1] + RowLength[1];
RowOffset[3] = RowOffset[2] + RowLength[2];
RowOffset[4] = RowOffset[3] + RowLength[3];
RowOffset[5] = RowOffset[4] + RowLength[4];

// now calculate and/ore detemrine length of each field

// first determine number of fields

// now that all row offsets are calculated it's possible to decode the
stream into each processor, by each processor.

// each processor knows it's own number/identitiy represented by the +0
+1 +2 +3 down below:
// the general idea is:
// row[] here is RowOffset[]
/*
Processor[0] :=
Stream[Row[0]+0]+Stream[Row[1]+0]+Stream[Row[2]+0]+Stream[Row[3]+0]+Stream[Row[4]+0]+Stream[Row[5]+0];
Processor[1] :=
Stream[Row[0]+1]+Stream[Row[1]+1]+Stream[Row[2]+1]+Stream[Row[3]+1]+Stream[Row[4]+1]+Stream[Row[5]+1];
Processor[2] :=
Stream[Row[0]+2]+Stream[Row[1]+2]+Stream[Row[2]+2]+Stream[Row[3]+2]+Stream[Row[4]+2]+Stream[Row[5]+2];
Processor[3] :=
Stream[Row[0]+3]+Stream[Row[1]+3]+Stream[Row[2]+3]+Stream[Row[3]+3]+Stream[Row[4]+3]+Stream[Row[5]+3];
*/
// however a processor should only include the bits of a stream if it's
within the field's length for that we need
// to compute each field length and here we have an oops ! ;) :) cannot
compute field length if multiple fields
// per row. or can we ?! ;) :)perhaps we can... we know that fields have
at least 1 bit because of row 0
// and we know which fields have 2 bits because of row 2 and so forth...
and since all fields
// are ditributed parallely... we don't actually need to know where
their bits are.... cause they all packed lol...
// we only need to know what max length is or something... though how
can we dan be sure that a field ends up
// wehere it needs to be... well we dont... a field can end up in any
processor.

// so how should it actually look like then... well as follows:

// since the fields are stored as follows: A,B,C,D we know A ends up in
processor 0, B in 1, C in 2, D in 3 and so forth.
// thus... processor 0 can determine length of A by looking at row[0],
row[1], row[2], row[3], row[4], row[5], row[6].
// but how to know which count matches to who's field ?
// is the length of row 5 for A or B or C or D ? we don't know do we ?!
;)

// now we should be able to solve the problem... we know the bits belong
to the first few fields... cool ! ;) =D

// now we should be able to solve it easily... by only including a bit
from the stream if the multiplier is set to 1 ;) :)
// and now only thing left to do is shifting the bits into proper
position and or-ing them together ! ;)

Processor[0] =
((Stream[RowOffset[0]+0]*FieldRowMultiplier[0][0]) << 0) |
((Stream[RowOffset[1]+0]*FieldRowMultiplier[0][1]) << 1) |
((Stream[RowOffset[2]+0]*FieldRowMultiplier[0][2]) << 2) |
((Stream[RowOffset[3]+0]*FieldRowMultiplier[0][3]) << 3) |
((Stream[RowOffset[4]+0]*FieldRowMultiplier[0][4]) << 4) |
((Stream[RowOffset[5]+0]*FieldRowMultiplier[0][5]) << 5);

Processor[1] =
((Stream[RowOffset[0]+1]*FieldRowMultiplier[1][0]) << 0) |
((Stream[RowOffset[1]+1]*FieldRowMultiplier[1][1]) << 1) |
((Stream[RowOffset[2]+1]*FieldRowMultiplier[1][2]) << 2) |
((Stream[RowOffset[3]+1]*FieldRowMultiplier[1][3]) << 3) |
((Stream[RowOffset[4]+1]*FieldRowMultiplier[1][4]) << 4) |
((Stream[RowOffset[5]+1]*FieldRowMultiplier[1][5]) << 5);

Processor[2] =
((Stream[RowOffset[0]+2]*FieldRowMultiplier[2][0]) << 0) |
((Stream[RowOffset[1]+2]*FieldRowMultiplier[2][1]) << 1) |
((Stream[RowOffset[2]+2]*FieldRowMultiplier[2][2]) << 2) |
((Stream[RowOffset[3]+2]*FieldRowMultiplier[2][3]) << 3) |
((Stream[RowOffset[4]+2]*FieldRowMultiplier[2][4]) << 4) |
((Stream[RowOffset[5]+2]*FieldRowMultiplier[2][5]) << 5);

Processor[3] =
((Stream[RowOffset[0]+3]*FieldRowMultiplier[3][0]) << 0) |
((Stream[RowOffset[1]+3]*FieldRowMultiplier[3][1]) << 1) |
((Stream[RowOffset[2]+3]*FieldRowMultiplier[3][2]) << 2) |
((Stream[RowOffset[3]+3]*FieldRowMultiplier[3][3]) << 3) |
((Stream[RowOffset[4]+3]*FieldRowMultiplier[3][4]) << 4) |
((Stream[RowOffset[5]+3]*FieldRowMultiplier[3][5]) << 5);

// *** STILL TO DO (solved by extending array):
********************************
// *** ^^^ may have to look into potential out of range problem ^^^ ****
// solved by adding padding for reading ;)
// *********************************************************************
// for now it seems ok, also as long as stream array has
+NumberOfProcessors scratch pad at end it may be ok ! ;) :)

// one last problem might remain... the row offset may go out of
range...
// we could either use a scratch pad|solve this in another way.
// I think it's best to leave it as as and perhaps make the stream a bit
larger or omething let's see what happens.

// print processor values.
printf( "Processor[0]: %d \n", Processor[0] );
printf( "Processor[1]: %d \n", Processor[1] );
printf( "Processor[2]: %d \n", Processor[2] );
printf( "Processor[3]: %d \n\n", Processor[3] );

return 0;
}

// *** Emd of C/C++ Program ***

Bye,
Skybuck.