Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

XOR, written in awk

929 views
Skip to first unread message

Jurgen Riepe

unread,
Jun 12, 2002, 9:33:42 AM6/12/02
to
I need to do a logical bitwise XOR between two 16-bit variables or two
4-char-hex strings.
Gawk 3.* cannot be used; I have to take an older version of new awk.

An XOR-function, written in awk, is available, but it is incredibly
slow.

Does anybody have alternative solutions?

Thanks
Jurgen

Mark Katz

unread,
Jun 12, 2002, 9:58:46 AM6/12/02
to
In article <b0a8f4e1.02061...@posting.google.com>
Jurge...@web.de "Jurgen Riepe" writes:

>I need to do a logical bitwise XOR between two 16-bit variables or two
>4-char-hex strings.
>Gawk 3.* cannot be used; I have to take an older version of new awk.
>
>An XOR-function, written in awk, is available, but it is incredibly
>slow.
>

How are you holding these 16 bit numbers
Can you extract them 4 bits at a time

If yes, it would be easy to set up a 16 x 16 matrix
that holds the result of XOR'ing all the combinations of 1-16, by 1-16
You could then look it up which would be fast

The creation of that matrix is a nice exercise for those who
dont have GAWK ;-)

Mark
---
Mark Katz
Mark-IT, London. Delivering MR-IT/Internet solutions
Tel: (44) 20-8731 7516, Fax: (44) 20-8458 9554; http://www.mark-it.co.uk

Dan Haygood

unread,
Jun 12, 2002, 11:07:58 AM6/12/02
to
"Mark Katz" <ma...@ispc001.demon.co.uk> wrote in message
news:102389...@ispc001.demon.co.uk...


Why was the OP's XOR slow? Was it that integer-math code that shows up with
the search "awk XOR" on Google? No wonder that is slow.

> Can you extract them 4 bits at a time?
Since OP has 4-character hex-strings, I would suppose so. If not, printf
"%X" will get them. Of course, your table indicies become the characters
"0"-"9" and "A"-"F", instead of 0-15. Using 0-15 adds the step of
int(/16),%16 and *16+ to reassemple bytes, etc. But it might be faster.

You could take advantage of the inherent "stringiness" of awk numbers, and
make a 10x10 table indexed with "0"-"9". Using split(n,a,"") may be faster
to iterate through the characters than substr(n,i,1). Use a 10x10 table to
bootstrap a 100x100 table (only 10000 elements, not the 65K required for
0-15), and you can leap through the numbers two digits at a time. But then
you're back to substr() or math to break out the 0-99 numbers.

I'll actually perform the excercise later, hope.

- Dan

Stepan Kasal

unread,
Jun 13, 2002, 4:07:38 AM6/13/02
to
Hallo,

On Wed, 12 Jun 02 13:58:46 GMT, Mark Katz <ma...@ispc001.demon.co.uk> wrote:
> If yes, it would be easy to set up a 16 x 16 matrix
> that holds the result of XOR'ing all the combinations of 1-16, by 1-16
> You could then look it up which would be fast
>
> The creation of that matrix is a nice exercise for those who
> dont have GAWK ;-)

why? I don't understand.

The matrix should be written explicitly in the source, I guess it the
fastest way.
That part of source won't be written by hand, but by an awk program.
(Jurgen alredy has _some_ code to compute XOR, remember.)

Regards,
Stepan

Stepan Kasal

unread,
Jun 13, 2002, 4:23:25 AM6/13/02
to
On Wed, 12 Jun 2002 15:07:58 GMT, Dan Haygood <reply...@you.read.it> wrote:
> > Can you extract them 4 bits at a time?
> Since OP has 4-character hex-strings, I would suppose so. If not, printf
> "%X" will get them. Of course, your table indicies become the characters
> "0"-"9" and "A"-"F", instead of 0-15. Using 0-15 adds the step of
> int(/16),%16 and *16+ to reassemple bytes, etc. But it might be faster.
>
> You could take advantage of the inherent "stringiness" of awk numbers, and

Unfortunately, 10 is not a power of 2 anymore ;->

But if you have those data also as numbers, not only hex strings (or if you
can use --non-decimal-data option of awk), you can use printf to print them as
octal numbers and then start with 8x8 table, use that to bootstrap 64x64 table
and then index the table by four digits.

BTW: it's probably a bit quicker not to use comma, ie. instead of xor["01","23"]
use just xor["0123"].

HTH,
Stepan Kasal

Mark Katz

unread,
Jun 13, 2002, 7:36:56 AM6/13/02
to
In article <slrnaggljt...@matsrv.math.cas.cz>
ka...@matsrv.math.cas.cz "Stepan Kasal" writes:

>On Wed, 12 Jun 2002 15:07:58 GMT, Dan Haygood <reply...@you.read.it> wrote:
>> > Can you extract them 4 bits at a time?
>> Since OP has 4-character hex-strings, I would suppose so. If not, printf
>> "%X" will get them. Of course, your table indicies become the characters
>> "0"-"9" and "A"-"F", instead of 0-15. Using 0-15 adds the step of
>> int(/16),%16 and *16+ to reassemple bytes, etc. But it might be faster.
>>
>> You could take advantage of the inherent "stringiness" of awk numbers, and
>
>Unfortunately, 10 is not a power of 2 anymore ;->

Crumbs - maths has moved on so much since I got my Masters in '67!

>But if you have those data also as numbers, not only hex strings (or if you
>can use --non-decimal-data option of awk), you can use printf to print them as
>octal numbers and then start with 8x8 table, use that to bootstrap 64x64 table

This all sounds so complicated


> then index the table by four digits.

>BTW: it's probably a bit quicker not to use comma, ie. instead of xor["01","23"]>use just xor["0123"].

Why not xor["ABDF"]= - leaving your number in Hex

You have to strike a balance between memory size for this matrix
and speed. Theoretically, you could make it of size [64K x 64K] to
avoid any breaking up of the original number

As I e-said - the generation of that matrix is easily done by
a subtle awk program. Either way it's best read in as a supplementary
file rather than insert N lines of assigment statements

Mark
---
Mark Katz
Mark-IT, London. Delivering MR-IT/Internet solutions
Tel: (44) 20-8731 7516, Fax: (44) 20-8458 9554; http://www.mark-it.co.uk

>
>HTH,
> Stepan Kasal
>

Stepan Kasal

unread,
Jun 13, 2002, 10:26:35 AM6/13/02
to
Hallo,

On Thu, 13 Jun 02 11:36:56 GMT, Mark Katz <ma...@ispc001.demon.co.uk> wrote:
> In article <slrnaggljt...@matsrv.math.cas.cz>
> ka...@matsrv.math.cas.cz "Stepan Kasal" writes:
> >But if you have those data also as numbers, not only hex strings (or if you
> >can use --non-decimal-data option of awk), you can use printf to print them
> >as octal numbers and then start with 8x8 table, use that to bootstrap 64x64
>

> This all sounds so complicated

combining the table to a larger one might be quicker that reading huge file,
I guess.

> Why not xor["ABDF"]= - leaving your number in Hex

You are right, table of size 64K should work.

> You have to strike a balance between memory size for this matrix
> and speed. Theoretically, you could make it of size [64K x 64K] to
> avoid any breaking up of the original number

Well, that would assume 4GB of memory in that context where you have to
use nawk and don't have gawk.

But if the balance seems to be somewhere between 256 and 64K,
you can try 4K (ie. two digit octal numbers).

> Either way it's best read in as a supplementary
> file rather than insert N lines of assigment statements

I missed that, thanks.

Stepan

Harlan Grove

unread,
Jun 13, 2002, 3:06:50 PM6/13/02
to
Jurgen Riepe <Jurge...@web.de> wrote...

>I need to do a logical bitwise XOR between two 16-bit variables or two
>4-char-hex strings.
>Gawk 3.* cannot be used; I have to take an older version of new awk.
>
>An XOR-function, written in awk, is available, but it is incredibly
>slow.

Bitwise XOR is just bitwise addition without carry. Would the following work
better?


BEGIN { __HexDec = "00112233445566778899AaBbCcDdEeFf" }

function h2d(n , i, k, r) {
k = length(n)
r = 0
for (i = 1; i <= k; ++i) {
r = 16 * r + int((index(__HexDec, substr(n, i, 1)) - 1) / 2)
}
return r
}

function d2h(n) {
return sprintf("%X", n)
}

function Xor(a, b , k, r) {
k = 1
r = 0
while (a > 0 || b > 0) {
a = int(a)
b = int(b)
r += k * ((a % 2 + b % 2) % 2)
a /= 2
b /= 2
k *= 2
}
return r
}

BEGIN { #very simplistic testing!
a = "1234"
b = "DCBA"
print a, h2d(a), b, h2d(b)
x = Xor(h2d(a), h2d(b))
print x, d2h(x)
}


Dan Haygood

unread,
Jun 14, 2002, 12:14:01 AM6/14/02
to
"Stepan Kasal" <ka...@matsrv.math.cas.cz> wrote in message
news:slrnaggljt...@matsrv.math.cas.cz...

Arggh! Got me, got me bad.

> Unfortunately, 10 is not a power of 2 anymore ;->

10 ~ 2^3.32, always has been. It's just not an integral power of 2.

But yes, 456 xor 123 is not equal to (4 xor 1) cat (5 xor 2) cat (6 xor 3).
I spoke without thinking too hard.

And other things like bootstrapping from a 16x16 matrix to a 255x255 matrix
still hold.

- Dan

P.S. Code in a bit.

Stepan Kasal

unread,
Jun 14, 2002, 2:34:10 AM6/14/02
to
Hi Dan,

On Fri, 14 Jun 2002 04:14:01 GMT, Dan Haygood wrote:
> And other things like bootstrapping from a 16x16 matrix to
> a 255x255 matrix still hold.

^^^^^^^

Gotcha again!

Shouldn't we call you "Pentium"?

Well, I try to write some code, to supply you all with some material
that you can also laugh at me.

Stepan

Alan Linton

unread,
Jun 14, 2002, 11:53:11 AM6/14/02
to
In message <102396...@ispc001.demon.co.uk>, Mark Katz
<ma...@ispc001.demon.co.uk> writes

I think the best option is probably to use a table of 4-bit hex digits
pre calculated and built into the awk program. It's fairly easy to work
with either decimal or hex inputs and outputs to the xor function so I
have done it both ways as follows: -

BEGIN {

#table of one-hex-digit xor values
tmp[ 0] = "0 1 2 3 4 5 6 7 8 9 a b c d e f"
tmp[ 1] = "1 0 3 2 5 4 7 6 9 8 b a d c f e"
tmp[ 2] = "2 3 0 1 6 7 4 5 a b 8 9 e f c d"
tmp[ 3] = "3 2 1 0 7 6 5 4 b a 9 8 f e d c"
tmp[ 4] = "4 5 6 7 0 1 2 3 c d e f 8 9 a b"
tmp[ 5] = "5 4 7 6 1 0 3 2 d c f e 9 8 b a"
tmp[ 6] = "6 7 4 5 2 3 0 1 e f c d a b 8 9"
tmp[ 7] = "7 6 5 4 3 2 1 0 f e d c b a 9 8"
tmp[ 8] = "8 9 a b c d e f 0 1 2 3 4 5 6 7"
tmp[ 9] = "9 8 b a d c f e 1 0 3 2 5 4 7 6"
tmp[10] = "a b 8 9 e f c d 2 3 0 1 6 7 4 5"
tmp[11] = "b a 9 8 f e d c 3 2 1 0 7 6 5 4"
tmp[12] = "c d e f 8 9 a b 4 5 6 7 0 1 2 3"
tmp[13] = "d c f e 9 8 b a 5 4 7 6 1 0 3 2"
tmp[14] = "e f c d a b 8 9 6 7 4 5 2 3 0 1"
tmp[15] = "f e d c b a 9 8 7 6 5 4 3 2 1 0"

for (a = 0; a <= 15; a++) {
ahex = sprintf("%x",a)
dec[ahex] = a # table : decimal equivalent to hex digit
$0 = tmp[a]
for (b = 0; b <= 15; b++) {
bhex = sprintf("%x",b)
xortab[ahex,bhex] = $(b+1)
}
delete tmp[a]
}

}

#xor : decimal inputs, decimal output
#note : inputs must be less than 65536
function xordec(a,b, ahex,bhex,i,result) {
ahex = sprintf("%04x",a)
bhex = sprintf("%04x",b)
for (i = 1; i <= 4; i++) {
result = result*16+dec[xortab[substr(ahex,i,1),substr(bhex,i,1)]]
}
return result
}

#xor : hex inputs, hex output
#note inputs must have 4 hex digits each including leading zeros
function xorhex(ahex,bhex, i,result) {
for (i = 1; i <= 4; i++) {
result = result xortab[substr(ahex,i,1),substr(bhex,i,1)]
}
return result
}

#minimal testing
BEGIN {
srand()
for (i = 1; i <= 10; i++) {
a = int(rand()*2^16)
b = int(rand()*2^16)
ahex = sprintf("%04x",a)
bhex = sprintf("%04x",b)
print a,b,xordec(a,b),ahex,bhex,xorhex(ahex,bhex)
#validation using gawk's built-in xor
#print xor(a,b),sprintf("%04x",xor(a,b))
}
}

Note that the program uses lowercase hex. I think it is mostly obvious
how to change it if uppercase hex is required. I have found that
sprintf("%04x",a) gives lowercase hex and sprintf("%04X",a) produces
uppercase hex consistently in gawk, mawk and awk95 but I don't know if
other awks conform to this rule.

Hope this helps
--
Alan Linton

Dan Haygood

unread,
Jun 14, 2002, 12:31:16 PM6/14/02
to

"Stepan Kasal" <ka...@matsrv.math.cas.cz> wrote in message
news:slrnagj3j2...@matsrv.math.cas.cz...

OK, at this point, I should stop posting on this thread. But, what the
heck, now you can tear apart some code.

I didn't finish it all the way to come up with a numeric XOR; it does all of
its work with strings of ASCII representations of base16 digits, except for
slow_xor, which might not be too bad itself. But I haven't timed anything
yet.

AND WORST OF ALL...I haven't tested this against known good answers! I have
spot checked it, and it looks right, though. Besides, it compiled, so it
must be good!

BTW, slow_xor() can be improved upon: right now, it shifts out the longer
operand to ensure all of the bit positions are hit. It really only needs to
shift out for the length of the shorter operand (until b is zero), and then
just OR in all of the upper bits of the upper operand all at once, with
addition.

In other words,
11101010110101
00000010010100
--------^ <shift and compare only to here,
and then prepend 111010.

And, to avoid the swap, just duplicate the code with the other variable.

xor.awk
----------
function slow_xor(a, b, local, t, r, m, x, y) {
if (a < b) {t = a; a = b; b = t}
# r = 0
m = 1
while (a > 0) {
r += m * ( (x = a % 2) != (y = b % 2) )
a = (a - x) / 2
b = (b - y) / 2
m *= 2
}
return r
}

function setup_fast_xor(local, I, i, J, j, C, c, D, d, C_c, C_D, s) {
hexits = "0123456789ABCDEF"
for (i = 0; i < 16; i++) {
c = substr(hexits,i+1,1)
xorF[c c] = "0"
for (j = i + 1; j < 16; j++) {
d = substr(hexits,j+1,1)
xorF[c d] = xorF[d c] = substr(hexits,slow_xor(i,j)+1,1)
}
}

for (I = 0; I < 16; I++) {
C = substr(hexits,I+1,1)
for (i = 0; i < 16; i++) {
c = substr(hexits,i+1,1)
C_c = C c
xorFF[C_c C_c] = "00"
s = i + 1
for (J = I; J < 16; J++) {
D = substr(hexits,J+1,1)
C_D = C D
for (j = s; j < 16; j++) {
d = substr(hexits,j+1,1)
D_d = D d
xorFF[C_c D_d] = xorFF[D_d C_c] = xorF[C_D] xorF[c
d]
}
s = 0
}
}
}
}

function fast_xor_F(a, b) {
if (a == "") a = "0"
if (b == "") b = "0"
return xorF[a b]
}

function fast_xor_FF(a, b) {
while (length(a) < 2) a = "0" a
while (length(b) < 2) b = "0" b
return xorFF[a b]
}

function fast_xor_FFF(a, b, local, la, z, r, i) {
if (a > b) {t = a; a = b; b = t}
la = length(a)
# z = ""
for (d = la - length(b); d > 0; d--) z = z "0"
b = z b
# r = ""
for (i = 1; i < la; i += 2)
r = r xorFF[substr(a,i,2) substr(b,i,2)]
if (i == la)
r = r xorF[substr(a,la,1) substr(b,la,1)]
return r
}

BEGIN {
setup_fast_xor()

printf "XOR |"
for (j = 0; j < (n=16); j++)
printf " %X", j
print ""
printf "----|"
for (j = 0; j < n; j++)
printf "--", j
print ""
for (i = 0; i < 16; i++) {
printf " %X |", i
for (j = 0; j < n; j++)
printf " %s", xorF[sprintf("%X",i) sprintf("%X",j)]
print ""
}
print ""

# This table is wider than most display line lengths.
printf " XOR |"
for (j = (cl=4070); j < (ch=4110); j++)
printf " %4X", j
print ""
printf "------|"
for (j = cl; j < ch; j++)
printf "-----", j
print ""
for (i = (rl=cl); i < (rh=ch); i++) {
printf " %4X |", i
for (j = cl; j < ch; j++)
printf " %4s", fast_xor_FFF(sprintf("%X",i),sprintf("%X",j))
# printf " %4s", xorFF[sprintf("%02X",i)sprintf("%02X",j)]
# printf " %s", sprintf("%02X",i)sprintf("%02X",j)
print ""
}
print ""
}


Alan Linton

unread,
Jun 15, 2002, 7:16:00 AM6/15/02
to
In message <oToO8.9605$Ok1.7...@news2.west.cox.net>, Dan Haygood
<reply...@you.read.it> writes
>
[snip]
>

Time in seconds for 100,000 xor calculations

(1) = Harlan Grove's Xor (decimal)
(2) = Alan Linton's xordec (decimal)
(3) = Dan Haygood's slow_xor (decimal)
(4) = Alan Linton's xorhex (hex)
(5) = Dan Haygood's fast_xor_FFF (hex)

(1) (2) (3) (4) (5)
gawk 3.1.0 djgpp 14.616 4.616 12.088 4.396 4.945
mawk 1.2.2 DOS 43.516 22.527 41.538 19.450 crash
awk95 19990602 22.637 44.835 16.374 21.429 13.077

(5) crashed mawk with error message "mawk: run time error: out of
memory", caused by the large array storage requirement.

BTW gawk's built-in xor function is orders of magnitude faster.

All tests done on a Pentium 4, 1.4 GHz, Windows ME, using sh-2.04 in a
DOS window. The figures are the difference between the run times with
and without the call to the xor function.

Conclusions

- for maximum speed the algorithm needs to be optimised to use your awk
interpreter's strong points

- it's faster to use hex rather than decimal provided that does not
involve many decimal to hex conversions

- precalculating a large table helps if the awk interpreter can handle
the storage requirement but there is a significant overhead time penalty
(24 seconds for (5) using awk95)

This is the test harness: -

BEGIN {
setup_fast_xor() # same overhead for all tests
ntests = 100000
max = 2^16
decimal = 0
if (decimal) { # decimal input and output
for (i=1; i<=ntests; i++) {
a = int(rand()*max)
b = int(rand()*max)
#x = xor(a,b) # gawk built-in
#x = Xor(a,b) # Harlan Grove
#x = xordec(a,b) # Alan Linton
#x = slow_xor(a,b) # Dan Haygood
#print a,b,x,xor(a,b)
}
} else {
for (i=1; i<=ntests; i++) {
a = int(rand()*max)
b = int(rand()*max)
ahex = sprintf("%04X",a)
bhex = sprintf("%04X",b)
#x = d2h(xor(a,b)) # gawk built-in
#x = xorhex(ahex,bhex) # Alan Linton
#x = fast_xor_FFF(ahex,bhex) # Dan Haygood
#print ahex,bhex,x,d2h(xor(a,b))
}
}
}

--
Alan Linton

Patrick TJ McPhee

unread,
Jun 15, 2002, 1:46:20 PM6/15/02
to
In article <4uugQ1Cw...@cranley.demon.co.uk>,
Alan Linton <al...@cranley.demon.co.uk> wrote:

% (1) = Harlan Grove's Xor (decimal)
% (2) = Alan Linton's xordec (decimal)
% (3) = Dan Haygood's slow_xor (decimal)
% (4) = Alan Linton's xorhex (hex)
% (5) = Dan Haygood's fast_xor_FFF (hex)

Is that a 16-bit mawk?

--

Patrick TJ McPhee
East York Canada
pt...@interlog.com

Alan Linton

unread,
Jun 15, 2002, 6:46:07 PM6/15/02
to
In message <M3LO8.216$DJ5.1...@news.ca.inter.net>, Patrick TJ McPhee
<pt...@interlog.com> writes

How can I check? I couldn't get mawk to compile using gcc so I
downloaded an executable - I think it was from a gnuish collection for
MSDOS. That was years ago so I am not sure.

C:\>mawk -W version
mawk 1.2.2dos+os2 Jan 1996, Copyright (C) Michael D. Brennan

Microsoft C/C++ _MSC_VER 600
compiled limits:
max NF 32767
sprintf buffer 1020
stack size 17836

C:\>

--
Alan Linton

Harlan Grove

unread,
Jun 15, 2002, 10:06:40 PM6/15/02
to
Alan Linton <al...@cranley.demon.co.uk> wrote...
...

>Time in seconds for 100,000 xor calculations
>
>(1) = Harlan Grove's Xor (decimal)
>(2) = Alan Linton's xordec (decimal)
>(3) = Dan Haygood's slow_xor (decimal)
>(4) = Alan Linton's xorhex (hex)
>(5) = Dan Haygood's fast_xor_FFF (hex)
>
> (1) (2) (3) (4) (5)
>gawk 3.1.0 djgpp 14.616 4.616 12.088 4.396 4.945
>mawk 1.2.2 DOS 43.516 22.527 41.538 19.450 crash
>awk95 19990602 22.637 44.835 16.374 21.429 13.077
...

I thought mawk was usually fastest of all awk implementations.

Anyway, mine was a brute force procedure, so not surprising its slow.

The table-driven procedures may be improved by using smaller tables.
Xor is commutative, i.e., a XOR b == b XOR a, so a table of XOR[a b]
values will be symmetric, so no reason to fill out both upper and
lower triangles. Also, the first column and first row will be 0 to
2^N-1 in order (a XOR 0 == a for all a), the main diagonal will be 0
(a XOR a == 0 for all a), so boiling this down,

if (a == b) {
return 0
} else if (a * b == 0) {
return a + b
} else if (a > b) {
return XOR[a b]
} else {
return XOR[b a]
}

Dan Haygood

unread,
Jun 15, 2002, 11:38:13 PM6/15/02
to
"Harlan Grove" <hrl...@aol.com> wrote in message
news:d2c023b0.02061...@posting.google.com...

I considered triangular tables; I filled them that way. My concern was
whether the degradation in string lookup and the conceptualization of the
triangle was worth the extra
> } else if (a > b) {
comparison. I didn't test this out, but I'd think the extra comparison is
faster. Supposing awk uses a binary tree for storage, double the table size
requires one more search level, and thus, one more string comparison--which
should be slower than a number comparison. And, you do use less than half
the memory. Which wouldn't have crashed mawk.

The next question is "How much does an 00-FF table save over (twice) the
repetitions of the 0-F table?" Is it so much more trouble to split strings
into chunks of two, than it would be to endure twice the work walking them
one charater at a time?

This does seem like a lot of thought to put in over a machine instruction,
available on every machine, that executes in like a zillionth of a second.

- Dan


Patrick TJ McPhee

unread,
Jun 16, 2002, 1:24:19 PM6/16/02
to
In article <QQpKvwAv...@cranley.demon.co.uk>,
Alan Linton <al...@cranley.demon.co.uk> wrote:

% Microsoft C/C++ _MSC_VER 600

This is a 16-bit-only version of MSC, so I assume it's a 16-bit app.
I was merely surprised because mawk was so much slower than the other
awks, and my experience has been that it's almost always much faster.
There must be a win32 port out there by now.

Alan Linton

unread,
Jun 16, 2002, 2:01:35 PM6/16/02
to
In message <FKTO8.15713$Ok1.1...@news2.west.cox.net>, Dan Haygood
<reply...@you.read.it> writes

>"Harlan Grove" <hrl...@aol.com> wrote in message
>news:d2c023b0.02061...@posting.google.com...
[snip]

I have found that speed can be improved for some versions of awk by
storing values in a string rather than an array. It seems that using
substr is faster than finding an array element so I wrote another xor
function (xordec5) to exploit this.

The overhead for setting up the string holding a 16x16 table of 4-bit
xor values is less than 2 seconds. The benefit over not using a look-up
table is a factor of 2 or 3 on speed per xor calculation - based on
comparing (6) with (3) in the table below.

It would take a 256*256*3 = 196608 character string to store a 256x256
table of 8-bit (3 decimal digit) xor values which would probably crash
the version of mawk that I am using so I have not tried it.

(1) = Harlan Grove's Xor (decimal)
(2) = Alan Linton's xordec (decimal)
(3) = Dan Haygood's slow_xor (decimal)
(4) = Alan Linton's xorhex (hex)
(5) = Dan Haygood's fast_xor_FFF (hex)

(6) = Alan Linton's xordec5 (decimal)

(1) (2) (3) (4) (5) (6)
gawk 3.1.0 djgpp 14.616 4.616 12.088 4.396 4.945 4.175
mawk 1.2.2 DOS 43.516 22.527 41.538 19.450 crash 13.626
awk95 19990602 22.637 44.835 16.374 21.429 13.077 8.352

This is xordec5: -

BEGIN {

#table of 4-bit xor values
tmp[ 0] = " 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15"
tmp[ 1] = " 1 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14"
tmp[ 2] = " 2 3 0 1 6 7 4 5 10 11 8 9 14 15 12 13"
tmp[ 3] = " 3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12"
tmp[ 4] = " 4 5 6 7 0 1 2 3 12 13 14 15 8 9 10 11"
tmp[ 5] = " 5 4 7 6 1 0 3 2 13 12 15 14 9 8 11 10"
tmp[ 6] = " 6 7 4 5 2 3 0 1 14 15 12 13 10 11 8 9"
tmp[ 7] = " 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8"
tmp[ 8] = " 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7"
tmp[ 9] = " 9 8 11 10 13 12 15 14 1 0 3 2 5 4 7 6"
tmp[10] = "10 11 8 9 14 15 12 13 2 3 0 1 6 7 4 5"
tmp[11] = "11 10 9 8 15 14 13 12 3 2 1 0 7 6 5 4"
tmp[12] = "12 13 14 15 8 9 10 11 4 5 6 7 0 1 2 3"
tmp[13] = "13 12 15 14 9 8 11 10 5 4 7 6 1 0 3 2"
tmp[14] = "14 15 12 13 10 11 8 9 6 7 4 5 2 3 0 1"
tmp[15] = "15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0"

xorstr = ""


for (a = 0; a <= 15; a++) {

$0 = tmp[a]
for (b = 0; b <= 15; b++) {

xorstr = xorstr sprintf("%2d",$(b+1))
}
delete tmp[a]
}

}

#xor : decimal inputs, decimal output
#note : inputs must be less than 65536

function xordec5(a, b, i, j, r) {
j = ( ( int(a/4096) % 16 ) * 16 + ( int(b/4096) % 16 ) ) * 2 + 1
r = substr( xorstr, j, 2)
j = ( ( int(a/256) % 16 ) * 16 + ( int(b/256) % 16 ) ) * 2 + 1
r = r * 16 + substr( xorstr, j, 2)
j = ( ( int(a/16) % 16 ) * 16 + ( int(b/16) % 16 ) ) * 2 + 1
r = r * 16 + substr( xorstr, j, 2)
j = ( ( int(a) % 16 ) * 16 + ( int(b) % 16 ) ) * 2 + 1
r = r * 16 + substr( xorstr, j, 2)
return r
}

BEGIN {


ntests = 100000
max = 2^16

for (i=1; i<=ntests; i++) {
a = int(rand()*max)
b = int(rand()*max)

x = xordec5(a,b)
#print a,b,x
}
}

--
Alan Linton

Harlan Grove

unread,
Jun 16, 2002, 8:13:27 PM6/16/02
to
Alan Linton <al...@cranley.demon.co.uk> wrote...
...
>It would take a 256*256*3 = 196608 character string to store a 256x256
>table of 8-bit (3 decimal digit) xor values which would probably crash
>the version of mawk that I am using so I have not tried it.
...

You're missing the point that Dan caught: it's POINTLESS to use a 256x256 table
(or a string with as many entries). All that's needed is 255x128 entries and
extra logic to handle a == b, a * b == 0, or to swap a and b if a < b when all
indices into the table are a b where a > b. The first two may offer limited
efficiency, but failing to make use of the commutativity of the XOR operation
to cut the size of the table/string in half is foolish.

Then there's the quibble about the domain limitations of xordec5. Why? Loops
are good. Loops are our friends. If you initialize r to 0, then the first r =
statement can look *exactly* like the others. At that point I'm sure you're up
to rewriting with a loop to handle numbers up to 52 binary digits of double
precision.

Jurgen Riepe

unread,
Jun 18, 2002, 1:57:53 AM6/18/02
to
Thank you all for your help!!!

I borrowed several ideas and came to the 1st final version, listed
below.

(1) I split the 2 4-hex-digit-text-strings to pairs of corresponding
hex-characters, take the XOR of each character pair from a 16*16 table
(like Alan) and combine it to the 4-char-hex-XOR.

(2) The table is generated in the BEGIN section by "xor_init", the
values are computed by "xor_b2c" from binary variables, range 0..15
(like Dan & Harlan).
This was stripped down from my old slow version.
Binary-to-hex conversion is done via the "xor_c" character vector.

As for this particular application the xor-components are program
generated and therefore always 4 digit hex strings, I can omit some
checks and special cases.

Even more: I have to compute an XOR checksum along a long series of
hex data. So in the "2nd final" version (not shown) I do not combine
the checksum to one string for each XOR action, but only once at the
end, and so I do not always have to split one of the XOR parameters
and recombine the sum.

Jurgen

#**************** Bitweise XOR: 4-char Variables ******************

function xor( x, y ) {
split( x, xx, "" )
split( y, yy, "" )

return( xor_ar[xx[1] yy[1]] xor_ar[xx[2] yy[2]] xor_ar[xx[3]
yy[3]] xor_ar[xx[4] yy[4]] )
}

#**************** Bitweise XOR: Initialisation ********************

function xor_init( x, y ) {
split( "0123456789ABCDEF", xor_c, "" );

for( x = 0; x <= 15; x++ )
for( y = 0; y<= 15; y++ )
xor_ar[ xor_c[x+1] xor_c[y+1] ] = xor_c[ xor_b2c( x,y ) +
1 ]
}

#**************** Bitweise XOR: 1-hex-digit numeric Vars **********

function xor_b2c( a,b ,c,p ) {
for( p = 1 ; p <= 8; p += p )
{
if ( a%2 != b%2 ) c += p
a = int( a/2 )
b = int( b/2 )
}
return( c )
}

#*******************************************************************

0 new messages