Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Basic,Advanced,Power of 2, Super Advanced, Slotting Distributing problem

29 views
Skip to first unread message

Skybuck Flying

unread,
Jun 6, 2022, 12:00:10 PM6/6/22
to
A. Basic Slotting Distributing problem:

1. Given a number of "total slots" (can also be described as output slots)

2. Given a number of "input numbers"

3. Distribute the "input numbers" over the "total slots" such that the "number of intermediate slots" is the same.

(It'd be helpfull to also mention the number of intermediate ranges. This will always be "input numbers-1". (Input numbers<=1 is invalid)

B. Advanced Slotting Distributing problem:

4. It is allowed to calculate "total slots" as a multiplicative of "input numbers", (do specify the multiplier)

For example: Total Slots = "Input Numbers" x2 or x3 or x4 ox 5
So for example Total Slots = (Inputs Numbers x 3)
Multiplier = 3

C. Advanced Power of 2 Slotting Distributing problem:

5. Total Slots must be a power of 2.

D. Super Advanced Power of 2 Slotting Distributing problem;

6. Input numbers must be a power of 2.

In all situations, the total ammount of slots should all be used. In such a solution is not available, then nearest fit could still be interesting.
But for now only solutions that use all slots and meet all other constraints are to be considered valid and interesting solutions.
Especially because these solutions use their "input bits" and "output bits" for input numbers and total numbers efficiently in the case
of "super advanced power of 2 slotting problem and solutions".

To describe the solutions it's usefull to also specify the multiplier that was used to multiple the number of input slots to produce the number of output/total slots.

Which combinations of input, total slots and intermediate numbers are possible ?

Is/are there formulas that can help with this ?

So examples of valid solutions:

Input Numbers = 2 (0 to 1)
Intermediate ranges = 1 (2-1=1)
Total Slots = 32 (0 to 31)
Intermediate Slots = 30 ( 1 to 30)
Multiplier = 16

This simple solution meets all criteria for D
Input numbers is power of 2
Total slots is power of 2
All slots are used.
Intermediate slots is equal (only one set)

Other random example:

Input number = 8 ( 0, 1, 2, 3, 4, 5, 6, 7)
Intermediate ranges = (8-1=7)
Multiplier = 4
Total slots = 32
Intermediate slots = 32-8=26/7 = 3,714 = 3 (inperfect fit) 3x7 = 21
Invalid solution.
Total slots used would be: 21+8 = 29 (best inexact fit)

For now I intend to brute force this problem, to find any solutions that way, but it would be interesting to have some math people look
at this, maybe they can find some formulas that can help with finding perfect solutions.

Practical application for this is:

Scale VGA palette to a larger Palette.

Classic VGA palette has 64 input colors.
Modern VGA palette has 256 input colors.

Goal is to find a new VGA palette with more input colors, say 512, 1024, 2048, 4096, 8192.

For now it's interesting to look at "rows" of 16 input colors.

So scaling 16 input colors to:
32,64,128,256,512,1024,2048,4096,8192, and so on.

Is there a combination where there is a perfect fit for intermediates and totals and such as described above ? where multiplier is also an integer, preferably also a power of 2 for shifting purposes.
(Shifting faster than multiplications and divisions and should therefore allow real-time computation to convert from input to output or vice versa, a few other instructions consuming 1 or 2 clock cycles would be allowable as well, like addition and subtraction)

All variables should be at least integer for valid solutions.

Bye for now,
Skybuck.

MitchAlsup

unread,
Jun 6, 2022, 12:28:10 PM6/6/22
to
How do you handle the case(s) where at least 1 function unit never needs more than
1 operand, and where at least 1 function unit needs 3-operands ?

How do you handle the case where there is a non-power of 2 number of function units ?

And why do you insist on using terminology different from that which has been stable for
at least 30 years ?

Skybuck Flying

unread,
Jun 6, 2022, 6:47:26 PM6/6/22
to
I have come to the conclusion that the "redistributing/intermediate insertion (for interpolation) problem" is a little bit more complex than in this problem description so far.

But the problem description is still usefull per dimension.

However currently the VGA palette is split into 16 rows of 16 shades.

Each row is it's own dimension.

For example
Dimension 0: shade 0 to 15
Dimension 1: shade 16 to 31
Dimension 2: shade 32 to 47

I do not want to shade/interpolate/insert between 15 and 16. So these dimensions are not inside a single scalable linear dimension.

Simply multiplieing or shifting these indexes would create "empty additional space" between 15 and 16, basically wasted space/slots.

Bye for now,
Skybuck.

Skybuck Flying

unread,
Jun 6, 2022, 8:25:44 PM6/6/22
to
> And why do you insist on using terminology different from that which has been stable for
> at least 30 years ?

Cause I am probably doing something that few or nobody has done before.

Anyway I think I have found an interesting case/solution but everything is very tricky.

Imagine a vga table of 16 rows x 16 shades.

The question is how to scale/interpolate the shades to something higher, preferably something that fits into a power of 2 table, perhaps the power of two table will lead to efficient computations:

Let's examine one row of shades, counting from 0 to 15:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15

There are 15 "rooms" in between, basically interpolation spaces. However we must be carefull cause here we are counting from 1.

And when dividing/calculation a distribution we need the total ammount, not counting from zero.

However when visualizing the tables, it makes sense to start from 0 to keep consistent with binary code starting from 0.

So re-visualizing above distribution challenge, leads to:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
ip0 ip1 ip2 ip3 ip4 ip5 ip6 ip7 ip8 ip9 ip10 ip11 ip12 ip13 ip14

0 to 14 "interpolation spaces".

Finding this was trial and error, I don't recall it exactly, but probably simply trying x16 and /16 and it gives a perfect fit:

interpolation table for scaling 16 shades to 256 shades with 16 intermediate shades.
total count = 256
input count = 16
output count = 16 x multiplier(16) = 256
intermediate count = 16

interpolation space 0: 0 1.. 16 17
interpolation space 1: 17 18.. 33 34
interpolation space 2: 34 35.. 50 51
interpolation space 3: 51 52.. 67 68
interpolation space 4: 68 69.. 84 85
interpolation space 5: 85 86..101 102
interpolation space 6: 102 103..118 119
interpolation space 7: 119 120..135 136
interpolation space 8: 136 137..152 153
interpolation space 9: 153 154..169 170
interpolation space 10: 170 171..186 187
interpolation space 11: 187 188..203 204
interpolation space 12: 204 205..220 221
interpolation space 13: 221 222..237 238
interpolation space 14: 238 239..254 255

Visualization the interpolation table/space above was tricky because the end point of interpolation space zero has to be repeated as begin point of interpolation space 1, otherwise it would lead to a duplicate and this is not desired.

So in reality these interpolation spaces are connected and form a single dimension/line/slots as follows:

0 ... 17 ... 34 .. 51 .. 68 .. 85 .. 102 .. 119 .. 136 .. 153 .. 170 .. 187 .. 204 .. 221 .. 238 .. 255

Which indeed gives 15 interpolation spaces counting the dots, each double dot is one interpolation space.

So the input range 0..15 is mapped to output range 0..255

where
0 ends up on 0
1 ends up on 17
2 ends up on 34
3 ends up on 51
4 ends up on 68
5 ends up on 85
6 ends up on 102
7 ends up on 119
8 ends up on 136
9 ends up on 153
10 ends up on 170
11 ends up on 187
12 ends up on 204
13 ends up on 221
14 ends up on 238
15 ends up on 255

Now the mission for the programmer is to find instructions which can convert the input range 0..15 efficiently to 0..255 and back again from 0..255 to 0..15

This would be a start but is not yet the end of the story.

For real-time interpolation the programmer would have to convert from 0..255 down to 0..15 but also compute an intermediate value which would range from 1 to 16 in the first row example.

To avoid duplicates calculated interpolation entries... the interpolation calculation would then probably be:
(InterpolationValue is then the interpolated color):

InterpolationValue = (Left * (1-T)) + (Right * T)

This interpolation formula is a bit unusual, it has the parameters for T swapped, so that it interpolates correctly from a low value/range to a high value/range.

T should not start or end at 0 or 1 because that would cause duplicates entries/interpolation colors.

Thus it seems to me computing T will have to be done as follows:

T = IntermediateValue / 17

Where immediate value would range from 1 to 16 so it would never be zero or one which is a bit interesting and again unusual, have not tested it yet, but it seems sound ! ;)

So as you can see, there are many values which are not powers of 2.

So this will make it difficult to find efficient instructions to compute this fast and in real time.

Hence the idea of using a lookup table for "palette" interpolation seems interesting. Except at higher resolutions and especially color depth, say color depths of R,G,B,A is increased from 8 bit per component to 10 bit per component or even higher 16 bit per component than such look-up tables consume a lot of space in memory !

The current lookup table would consume 16 rows x 256 entries x 3 rgb x 1 byte = 12288 bytes

where currently L1 data caches seem limited to 32 kilobytes, so this takes quite a byte out of it.

256 entries would be enough to shade from black to red. However there are some shades/gradients which are more complex where the shading goes from red, to yellow, or from green to blue to purple. Such more complex gradients would benefit from even more entries to make the gradient more smooth and prevent any visible banding.

So I would like 16 rows x 512 entries x 3 rgb x 1 byte = 24576 bytes

to make absolutely sure that no banding would be visible, but maybe this is pushing it a little bit, but I do believe it may be necessary.

Now I have not yet looked into 48 bit monitors however I know somebody that purchased a monitor from sony which claims to do 10 bits of precision per R,G,B so that is interesting.

I am not yet sure how to program for such monitors, adobe photoshop seems to use 16 bits per component, and accounting for any alpha values leads to 64 bits per color:

Now I can live without the alpha channel for now, but let's compute both, however I am no longer sure if 512 would not lead to any banding but let's assume for now;
(entry = palette index = a color slot for a mix of r,g,b)

16 rows x 512 entries x 3 rgb x 2 bytes = 49152 bytes

So this surpasses the L1 data cache size of 32 KB, this would be 48 KB.

Now if alpha channel would be included as well, perhaps for 8 byte alignment purposes or so:

16 rows x 512 entries x 4 rgba x 2 bytes = 65.536 bytes would be 64 KB.

So here again it takes up the full space of my old DreamPC 2006 L1 data cache if I recall correctly the AMD X2 3800+ processor had 64 KB L1 data cache per core lol.
(Not that it matters cause PCI Express 1.0 would be too limited to drive a 4K monitor anyway ! ;) even PCI Express 2.0 will probably struggle without compression at 120 hz cause that is what the panel can do too, 60 hz might still be doable, 120 hz would be 8 GB/sec if I recall correctly at 3840x2160)

Anyway so seeing this large lookup table requirement makes it interesting to try and find a real-time computational solution with some kind of computer instructions.

So basically the situation would be as follows:

Let's assume a beautifull situation:

4K monitor at 120 Hz at 10 bit color precision but let's make it 16 bits just to be a bit future ready but can also do 30 bits thus 32 bits precision for now, makes more realistic sense I guess so no alpha channel, those bits repurposed for 10 bit precision/bit/color depth I guess:

3280x2160x120x4 bytes = 3.400.704.000 bandwidth required/bytes to process.

Now with a say 3 to 4 to 5 Gigahertz processor it becomes apperently that the ammount of instructions that can be spent per byte or pixel is "low".... also in the sense of ammount of "compute cycles" spent.

SIMD/SSE might be able to process some more bytes per "compute cycle" ("clock cycle").

So now the final question becomes:

How to "scale up VGA colors" / "interpolate VGA colors" such that it can be done in real-time to provide more bit depth.

A lookup table might not be the solution, because lookups can requires 100 or more compute cycles ?! ;)

Multi-threading might also be part of a solution.

Also the assumption may be made that the source code to the VGA computer game is available and "interpolation" can be implemented into the game, to make the shading between vga colors more smooth.

So the VGA colors themselfes could first be "value scaled" to 10 bit per component and then later "slot/gradient/palette scaled" up further by interpolating more slots and thus increasing the final palette size for more smooth gradients/shading etc.

Bye for now,
Skybuck.
0 new messages