
Bit swizzling


Rick C. Hodgin

Sep 5, 2020, 12:45:43 PM
Are there any algorithms which take a known-at-compile-time sequence
of bitwise operations on an 8-bit to 64-bit quantity, and optimize
them down to their minimal set of operations?

For example, if I have an 8-bit byte and I want to swizzle the bits
thusly:

Input: 07 06 05 04 03 02 01 00
Output: 05 04 07 02 01 03 00 06

I can easily swizzle the data using a brute force method:

v = get_value();
o = 0;
swizzle1(o, v, 0, 6);
swizzle1(o, v, 1, 0);
swizzle1(o, v, 2, 3);
swizzle1(o, v, 3, 1);
swizzle1(o, v, 4, 2);
swizzle1(o, v, 5, 7);
swizzle1(o, v, 6, 4);
swizzle1(o, v, 7, 5);

// Untested, off the top of my head
void swizzle1(unsigned char& o, unsigned char v, int s, int d)
{
    o |= ((v >> s) & 1) << d;
}

And, of course, that algorithm can be optimized based on relative
values of s and d, and if s is 0, etc.

However, there also exists a minimal set of steps to complete that
swizzling: in the operation above, bits 5 and 4 move together, so a
swizzle2() operation, for example, could handle 2 bits at a time.

In addition, given the bit-manipulation instructions that exist on
various CPUs, there are potentially single operations that cover whole
combinations of steps, such as bitswap, where the order of bits flips
or mirrors across the center position.

Are there any existing algorithms which examine the operations that
must be conducted and then create an optimized / minimal sequence of
mechanical steps to conduct it given a constrained set of features
(such as those present on a given CPU)?

Thank you in advance.

--
Rick C. Hodgin

Hans-Peter Diettrich

Sep 5, 2020, 2:51:20 PM
On 05.09.2020 at 18:05, Rick C. Hodgin wrote:
> Are there any algorithms which take a known-at-compile-time sequence
> of bitwise operations on an 8-bit to 64-bit quantity, and optimize
> them down to their minimal set of operations?
>
> For example, if I have an 8-bit byte and I want to swizzle the bits
> thusly:
>
>     Input:   07 06 05 04 03 02 01 00
>    Output:   05 04 07 02 01 03 00 06

In your 8-bit case an array of outputs is sufficient, indexed by the input.


> In addition, given the bit operator abilities that exist on various
> CPUs there are potentially other combinations that exist behind an
> operation, such as bitswap, where the order of bits flips or mirrors
> across the center position.

Somebody must build the tables for those operations, distinct for each
CPU, then write code to make use of these tables. I don't think that it
has been done yet, except perhaps for the basic operations (AND, OR...)

> Are there any existing algorithms which examine the operations that
> must be conducted and then create an optimized / minimal sequence of
> mechanical steps to conduct it given a constrained set of features
> (such as those present on a given CPU)?

I often used Quine-McCluskey to minimize my logic circuits.

In most applications a toggled bit in the outputs invalidates all prior
optimization work, and everything has to be analyzed and constructed
again.

In general a hash code of the inputs could be used to index an array of
outputs. That scheme is insensitive to changes of output bits; only the
affected array element has to be updated.

DoDi

John Levine

Sep 5, 2020, 2:52:16 PM
In article <rivvah$1neg$2...@gioia.aioe.org>,
>> -----
>> Are there any algorithms which take a known-at-compile-time sequence
>> of bitwise operations on an 8-bit to 64-bit quantity, and optimize
>> them down to their minimal set of operations?

>Why not just use a lookup table ?. Minimum ops and fast...

Assuming you're looking for something you can implement in logic
rather than by table lookup, it sounds like a set of Karnaugh maps.

--
Regards,
John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Kaz Kylheku

Sep 5, 2020, 4:21:52 PM
On 2020-09-05, Rick C. Hodgin <rick.c...@gmail.com> wrote:
> Are there any existing algorithms which examine the operations that
> must be conducted and then create an optimized / minimal sequence of
> mechanical steps to conduct it given a constrained set of features
> (such as those present on a given CPU)?

For mapping 8 bit quantities to other 8 bit quantities, you can always
use a 256 byte look up table.

Of course, it's not practical for larger spaces. Still, it may be
possible to identify opportunities to combine multiple smaller look-up
tables.

davidl...@gmail.com

Sep 6, 2020, 11:43:15 AM
On Saturday, September 5, 2020 at 5:45:43 PM UTC+1, Rick C. Hodgin wrote:
> Are there any algorithms which take a known-at-compile-time sequence
> of bitwise operations on an 8-bit to 64-bit quantity, and optimize
> them down to their minimal set of operations?

There are some good resources on bit permutations, including this online calculator:

http://programming.sirrida.de/calcperm.php

For:
> Input: 07 06 05 04 03 02 01 00
> Output: 05 04 07 02 01 03 00 06

It gives:

x = bit_permute_step_simple(x, 0x55555555, 1); // Bit index complement 0
x = bit_permute_step_simple(x, 0x33333333, 2); // Bit index complement 1
x = bit_permute_step_simple(x, 0x0f0f0f0f, 4); // Bit index complement 2

where

t_bits bit_permute_step_simple(t_bits x, t_bits m, t_uint shift) {
    return ((x & m) << shift) | ((x >> shift) & m);
}

Approximately 12 cycles on a superscalar processor.

Source code for a more sophisticated permutation code generator is also available linked from the page.

Some notes here on the online generator:

http://programming.sirrida.de/bit_perm.html#calculator

Chris

Sep 6, 2020, 1:20:18 PM
On 09/05/20 19:50, John Levine wrote:
> In article<rivvah$1neg$2...@gioia.aioe.org>,
>>> -----
>>> Are there any algorithms which take a known-at-compile-time sequence
>>> of bitwise operations on an 8-bit to 64-bit quantity, and optimize
>>> them down to their minimal set of operations?
>
>> Why not just use a lookup table ?. Minimum ops and fast...
>
> Assuming you're looking for something you can implement in logic
> rather than by table lookup, it sounds like a set of Karnaugh maps.

Unless this is just an intellectual exercise for the fun of it, an
engineer would choose the minimal design at lowest cost to satisfy the
requirements. Table methods don't have to be in software, as a single
EPROM or gate array could do it in hardware: 8 inputs to address lines,
then 8 bits of output, scaled as required. So why make life more
difficult than necessary?

Old project here, where a programmer spent nearly 5 pages of 6800
asm to translate an input connector pin layout to that required for
the internal functions. The code was impenetrable, so we substituted a
256 byte lookup table. Less space than the code and easily
modified for new requirements...

Chris
[I think the question is whether there is a way to mechanically
generate a version of the opaque assembler. -John]

Martin Ward

Sep 7, 2020, 1:57:47 PM
On 05/09/2020 17:05, Rick C. Hodgin wrote:
> Are there any existing algorithms which examine the operations that
> must be conducted and then create an optimized / minimal sequence of
> mechanical steps to conduct it given a constrained set of features
> (such as those present on a given CPU)?

The process you are describing is called "Superoptimization":
finding the optimal code sequence for one loop-free sequence
of instructions.

https://en.wikipedia.org/wiki/Superoptimization

The term superoptimization was first coined by Alexia Massalin in the
1987 paper "Superoptimizer: A Look at the Smallest Program":

http://www.stanford.edu/class/cs343/resources/superoptimizer.pdf

Back in 1980 there was a discussion in the magazine
"Practical Electronics" regarding the shortest sequence of
instructions needed to reverse the bits in the accumulator
for a 6800 system (an operation that is needed in some Fast Fourier
Transform algorithms).

One solution (given in June 1980, page 70) consisted of just
two instructions and used no additional data.
The instructions were:

STAA PORTA write to output port
LDAB PORTB read from input port

with the output port being wired directly to the input port with
the bits reversed!

--
Martin

Dr Martin Ward | Email: mar...@gkc.org.uk | http://www.gkc.org.uk
G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4

Rick C. Hodgin

Sep 7, 2020, 1:58:16 PM
Thank you, Martin.

I saw a Google developer talk on C/C++ where they discussed generating
optimum code for a given logic flow (as dictated by source code).

I remember thinking, what if your base logic breakout isn't optimum?
What if the way the developer was designing the algorithm wasn't optimum?
What if the entire premise of what you're trying to do needed to be first
broken down into fundamentals, and then re-assembled in a truly optimum
form?

There are cases where, if you know specific input ranges, you could
substitute the algorithm with something completely unlike the actual
function, yet yield the same results.

One such idea is writing to a port, and reading back from another port,
where the external device does the processing for you as a co-processor
or custom ASIC.  The actual 6800 algorithm as dictated by the developer
may have been in C code or another language, and would've used N asm
opcodes to conduct the same workload even with an optimizing compiler.
But an intelligent compiler, one that could break down the operations
to fundamentals, understand what's trying to be accomplished, and then
re-engineer the logic to accommodate the abilities which exist (such as
those extra read/write ports), is what's really required.

In some algorithm cases, the nature of what you're trying to do may be
handled fully by a completely unexpected series of instructions.

I've long desired for my CAlive language to support that type of
optimization.  Walter Banks and I used to discuss the possibility, and
had he lived we would've continued down that line.  It goes along with
another fundamental idea I have for processing that I would like to
explore someday ... if there's time.

I appreciate your response.

--
Rick C. Hodgin



Hans-Peter Diettrich

Sep 7, 2020, 8:57:15 PM
On 07.09.2020 at 16:55, Rick C. Hodgin wrote:

> One such idea is writing to a port, and reading back from another port,
> where the external thing does the processing for you as a co-processor or
> custom ASIC.

In a more general approach you connect a ROM between the output and
input port, for any possible mapping of n-into-n bits. Nowadays a table
in flash memory is more practical.

DoDi

Tom Crick

Sep 8, 2020, 10:34:58 PM
On Mon, 7 Sep 2020 at 18:57, Martin Ward <mar...@gkc.org.uk> wrote:
>
> On 05/09/2020 17:05, Rick C. Hodgin wrote:
> > Are there any existing algorithms which examine the operations that
> > must be conducted and then create an optimized / minimal sequence of
> > mechanical steps to conduct it given a constrained set of features
> > (such as those present on a given CPU)?
>
> The process you are describing is called "Superoptimization":
> finding the optimal code sequence for one loop-free sequence
> of instructions. ...

Back in the distant past (2009), I did my PhD on superoptimisation —
provably optimal code generation using answer set programming:

https://proftomcrick.com/2012/02/18/three-papers-on-superoptimisation/

Still an area with lots of potential (IMHO)...

Best wishes,

Tom

gah4

Sep 10, 2020, 1:55:04 PM
On Saturday, September 5, 2020 at 9:45:43 AM UTC-7, Rick C. Hodgin wrote:
> Are there any algorithms which take a known-at-compile-time sequence
> of bitwise operations on an 8-bit to 64-bit quantity, and optimize
> them down to their minimal set of operations?
>
> For example, if I have an 8-bit byte and I want to swizzle the bits
> thusly:
>
> Input: 07 06 05 04 03 02 01 00
> Output: 05 04 07 02 01 03 00 06

There is a lot of work, and many algorithms, for logic optimization,
or minimization, that, for example, will find the optimal combination
of NAND and NOR gates to evaluate some logical operation.

This is used to compile Verilog or VHDL into either FPGA bit files
or ASIC masks. On the other hand, logic minimization tends to
believe that you can wire anything to anything else. That is, bit
swizzle is just wiring. The question you ask doesn't come up.
(Later on, routing has to get all the wires to the appropriate
place, but that is true in general, not just for this case.)

Note that it isn't just adjacent bits, but bits with the same spacing
in the input and output, such that you can mask with AND, shift,
and combine with OR. I suspect that with the operations usually
available: AND, OR, XOR, and shift, it wouldn't be hard to find an
algorithm for the optimal case. If you add more operations,
maybe allow also add and multiply, it gets interesting.

Rick C. Hodgin

Sep 10, 2020, 8:13:02 PM
On 9/10/20 1:34 PM, gah4 wrote:
> On Saturday, September 5, 2020 at 9:45:43 AM UTC-7, Rick C. Hodgin wrote:
>> Are there any algorithms which take a known-at-compile-time sequence
>> of bitwise operations on an 8-bit to 64-bit quantity, and optimize
>> them down to their minimal set of operations? ...

> There is a lot of work, and many algorithms, for logic optimization,
> or minimization, that, for example, will find the optimal combination
> of NAND and NOR gates to evaluate some logical operation. ...

I have a preliminary algorithm that works, but it's still not fully
optimized and is a little clunky.

I want a way to abstract the logic and work with it there.

I haven't put in any additional time on this algorithm yet, due to
some life things happening. But I plan to come back to it.

--
Rick C. Hodgin
