xorEncode()

mss

Jan 3, 2010, 12:04:48 PM

Okay, just noodling around and still teaching myself gawk.
I hope my posts are not fiddling around with the scheme of things
in this newsgroup, just seeking input and enjoying the 'brainfood'.

Anyhow, here's an example of some more basic encryption.

How would you make it faster?

BEGIN {

input = "The quick brown fox jumped over the lazy dog..."
password = "We can be heros if just for one day"

tmp1 = xorEncode(input, password)
tmp2 = xorEncode(tmp1, password)

# careful... the following might garble
# your console; if so, invoke 'reset'

print "Encoded string:"
print tmp1
print "--"
print "Decoded string:"
print tmp2

}

function xorEncode(str, pass, x, y, z, q, j, ord, str_a, pass_a, buf) {

x = length(str)
y = length(pass)

for (z = 0; z <= 255; z++) {
ord[sprintf("%c", ++q)] = q
}

for (z = 1; z <= x; z++) {
str_a[z] = substr(str, z, 1)
}

for (z = 1; z <= y; z++) {
pass_a[z] = substr(pass, z, 1)
}

for (z = 1; z <= x; z++) {
j = (++j < y) ? ++j : 1
buf = buf sprintf("%c", xor(ord[str_a[z]], ord[pass_a[j]]))
}

return buf

}

--
later on,
Mike

http://topcat.hypermart.net/

mss

Jan 3, 2010, 1:11:12 PM

Take two...

BEGIN {

input = "The quick brown fox jumped over the lazy dog..."
password = "We can be heros if just for one day"

tmp1 = xorEncode(input, password)
tmp2 = xorEncode(tmp1, password)

# careful... the following might garble
# your console; if so, invoke 'reset'

print "Encoded string:"
print tmp1
print "--"
print "Decoded string:"
print tmp2

}

function xorEncode(str, pass, x, y, z, j, ord, str_a, pass_a, buf) {

# more info about xor at:
# http://en.wikipedia.org/wiki/Exclusive_or

x = length(str)
y = length(pass)

for (z = 1; z <= 255; z++) {ord[sprintf("%c", z)] = z}

for (z = 1; z <= x; z++) {str_a[z] = substr(str, z, 1)}
for (z = 1; z <= y; z++) {pass_a[z] = substr(pass, z, 1)}

for (z = 1; z <= x; z++) {

j = (++j <= y) ? ++j : 1

mss

Jan 3, 2010, 1:34:04 PM

Best so far (for me that is).

BEGIN {

input = "The quick brown fox jumped over the lazy dog..."
password = "We can be heros if just for one day"

tmp1 = xorEncode(input, password)
tmp2 = xorEncode(tmp1, password)

# careful... the following might garble
# your console; if so, invoke 'reset'

print "Encoded string:"
print tmp1
print "--"
print "Decoded string:"
print tmp2

}

function xorEncode(str, pass, x, y, z, j, ord, buf) {

# more info about xor at:
# http://en.wikipedia.org/wiki/Exclusive_or

x = length(str)
y = length(pass)

for (z = 1; z <= 255; z++) {ord[sprintf("%c", z)] = z}

for (z = 1; z <= x; z++) {

j = (++j <= y) ? ++j : 1

buf = buf sprintf("%c", xor(ord[substr(str, z, 1)], \
ord[substr(pass, j, 1)]))
}

return buf
}

pk

Jan 3, 2010, 1:37:24 PM

mss wrote:

> buf = buf sprintf("%c", xor(ord[substr(str, z, 1)], \

Just in case: xor() is a GNU awk extension, though it would probably not be
too difficult to implement yourself.
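
For anyone on an awk without xor(), here is a minimal sketch of what such a replacement might look like. The name bxor() is made up for this example (it is not a built-in anywhere), and it assumes both arguments are non-negative integers. It simply walks the two numbers bit by bit:

```shell
xor_demo() {
  awk '
  # bxor() is a hypothetical helper, not a built-in.
  # Bitwise XOR of two non-negative integers, one bit at a time.
  function bxor(a, b,    r, bit) {
      r = 0
      for (bit = 1; a > 0 || b > 0; bit *= 2) {
          # The low bits differ exactly when their sum is 1.
          if ((a % 2) + (b % 2) == 1)
              r += bit
          a = int(a / 2)
          b = int(b / 2)
      }
      return r
  }
  BEGIN { print bxor(84, 87), bxor(255, 0), bxor(170, 85) }
  '
}
xor_demo   # prints: 3 255 255
```

It will be slower than the gawk built-in, but it should run on any POSIX awk.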

Anton Treuenfels

Jan 4, 2010, 1:03:23 AM

"mss" <m...@dev.null> wrote in message news:hhqnur$1q5$1...@news.albasani.net...

> function xorEncode(str, pass, x, y, z, j, ord, buf) {
>
> # more info about xor at:
> # http://en.wikipedia.org/wiki/Exclusive_or
>
> x = length(str)
> y = length(pass)
>
> for (z = 1; z <= 255; z++) {ord[sprintf("%c", z)] = z}
>
> for (z = 1; z <= x; z++) {
> j = (++j <= y) ? ++j : 1
> buf = buf sprintf("%c", xor(ord[substr(str, z, 1)], \
> ord[substr(pass, j, 1)]))
> }
>
> return buf
> }

Odd that GAWK provides xor() but not ord(), although I see its on-line
manual discusses creating an _ord_ array that functions essentially the same
way yours does.

Your line:

j = ( ++j <= y ) ? ++j : 1

appears to have two mistakes. First, it increments twice if j <= y, thus
skipping half the potential characters in 'pass'. Second, if ever ++j == y,
the second increment makes j = y + 1. Using that as an index into 'pass'
probably results in a null string, which in turn probably results in
silently using a value of zero as the second argument of the xor() function.
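
To see both effects concretely, the short sketch below runs that expression for a pretend password length of 5 and prints each j it produces, followed by a modular alternative (k % y + 1) for comparison. The length 5 and the names bad, good and k are assumptions made only for this illustration:

```shell
index_demo() {
  awk '
  BEGIN {
      y = 5                              # pretend the password has 5 characters
      for (z = 1; z <= 6; z++) {
          j = (++j <= y) ? ++j : 1       # the expression under discussion
          k = k % y + 1                  # modular wrap-around, for comparison
          bad  = bad  (z > 1 ? " " : "") j
          good = good (z > 1 ? " " : "") k
      }
      print bad
      print good
  }
  '
}
index_demo
# prints:
# 2 4 6 1 3 5
# 1 2 3 4 5 1
```

The first line shows the skipped positions and the out-of-range index 6 (y + 1); the second line cycles through 1..5 as intended.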

You might be interested in the split() function, particularly in its ability
to break a string into individual characters:

function xorEncode(str, pass, x, y, z, j, ord, sbuf, pbuf) {

x = split( str, sbuf, "" )
y = split( pass, pbuf, "" )

# presumably you don't really do this each time...
# - also if you're only using the ASCII character set, 1..127 will do,
#   or even 32..126 if you know you're only going to use printable
#   ASCII characters in your strings

for (z = 1; z <= 255; z++) {ord[sprintf("%c", z)] = z}

str = ""
for ( j = z = 1; z <= x; z++ ) {
str = str sprintf("%c", xor(ord[sbuf[z]], ord[pbuf[j]]) )
j = ( j < y ) ? j + 1 : 1
}

return str
}

You could write a debug/test function that takes an encoded (or any, really)
string and replaces unprintable characters with some other printable
representation so they don't mess up your console when you print it.
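
A rough sketch of such a helper, under stated assumptions: visible() and the ORD table are names invented here, the printable range is taken to be ASCII 32..126, and the replacement format (\xNN in hex) is just one possible choice.

```shell
visible_demo() {
  awk '
  # visible() is a hypothetical helper: it returns s with every character
  # outside printable ASCII (32..126) replaced by a \xNN escape.
  function visible(s,    i, c, n, out) {
      for (i = 1; i <= length(s); i++) {
          c = substr(s, i, 1)
          n = ORD[c]                     # characters missing from ORD count as 0
          out = out ((n >= 32 && n <= 126) ? c : sprintf("\\x%02x", n))
      }
      return out
  }
  BEGIN {
      for (i = 1; i <= 255; i++) ORD[sprintf("%c", i)] = i
      print visible("a\tb\033c")         # a, TAB, b, ESC, c
  }
  '
}
visible_demo   # prints: a\x09b\x1bc
```

Note that a literal backslash in the input is passed through unchanged, so the output is meant for eyeballing, not for converting back.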

Just some thoughts.

- Anton Treuenfels

mss

Jan 4, 2010, 12:17:24 PM

Anton Treuenfels wrote:

...

> Your line:
>
> j = ( ++j <= y ) ? ++j : 1
>
> appears to have two mistakes. First, it increments twice if j <= y, thus
> skipping half the potential characters in 'pass'. Second, if ever ++j == y,
> the second increment makes j = y + 1. Using that as an index into 'pass'
> probably results in a null string, which in turn probably results in
> silently using a value of zero as the second argument of the xor() function.

Yes, you're correct (I only discovered this myself late last night...)

> You might be interested in the split() function, particularly in its ability
> to break a string into individual characters:

...

> x = split( str, sbuf, "" )
> y = split( pass, pbuf, "" )

You know, I thought about that Anton... but what advantage does it have
over substr()?

> # presumably you don't really do this each time...
> # - also if you're only using the ASCII character set 1..127 will do, or
> even 32..126
> # if you know you're only going to use printable ASCII characters in
> your strings

I'd prefer 32-126, but what if someone wishes to use (say) the character for
the British pound '£' (decimal 163)? It's not that uncommon in my thinking...

> You could write a debug/test function that takes an encoded (or any, really)
> string and replaces unprintable characters with some other printable
> representation so they don't mess up your console when you print it.

I wrote a quick & dirty debugger so I can (knock on wood) avoid this sort
of thing in the future:

http://topcat.hypermart.net/papers/debug.txt

> Just some thoughts.

And thanks for your thoughts too. I'm glad you folks are offering helpful
insights, as I'm still learning =)

Here's where I'm at (I've added your change to 'j'):

BEGIN {

input = "The quick brown fox jumped over the lazy dog..."
password = "We can be heros if just for one day"

tmp1 = xorEncode(input, password)
tmp2 = xorEncode(tmp1, password)

print "Encoded string:"

print tmp1
print "--"
print "Decoded string:"
print tmp2

}

function xorEncode(str, pass, x, y, z, j, ord, buf) {

# more info about xor at:

# http://en.wikipedia.org/wiki/XOR_cipher
# note: xor is specific to GNU gawk

x = length(str)
y = length(pass)

for (z = 1; z <= 255; z++) {ord[sprintf("%c", z)] = z}

for (j = z = 1; z <= x; z++) {

buf = buf sprintf("%c", xor(ord[substr(str, z, 1)], \
ord[substr(pass, j, 1)]))

j = ( j < y ) ? j + 1 : 1
}

return buf

}

Anton Treuenfels

Jan 4, 2010, 6:21:22 PM

"mss" <m...@dev.null> wrote in message news:hht7r3$tnm$1...@news.albasani.net...

> Anton Treuenfels wrote:
>> You might be interested in the split() function, particularly in its
>> ability
>> to break a string into individual characters:
>

> You know, I thought about that Anton... but what advantage does it have
> over substr()?

In one line it converts an unwieldy string into easily manipulated
individual characters. It also turns function calls into array lookups in
the main loop. Since in AWKs array lookups generally involve a hash function
rather than a single integer multiply the time savings may not be as great
as they would be otherwise, but it also makes the code look cleaner.
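
A small side-by-side sketch of the two approaches, for illustration only. The variable names are made up, and note that splitting on an empty separator is a common extension (gawk, mawk and others accept it) and not something POSIX guarantees:

```shell
chars_demo() {
  awk '
  BEGIN {
      s = "fox"
      # substr(): one function call per character inside the loop.
      for (i = 1; i <= length(s); i++)
          viaSubstr = viaSubstr substr(s, i, 1) "|"
      # split() with an empty separator: one call up front, then array lookups.
      n = split(s, ch, "")
      for (i = 1; i <= n; i++)
          viaSplit = viaSplit ch[i] "|"
      print n, viaSubstr, viaSplit
  }
  '
}
chars_demo   # prints: 3 f|o|x| f|o|x|
```

Both loops visit the same characters; the difference is only where the work happens.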

>> # presumably you don't really do this each time...
>> # - also if you're only using the ASCII character set 1..127 will do,
>> or
>> even 32..126
>> # if you know you're only going to use printable ASCII characters in
>> your strings

Actually I take that back. The encoded string can contain any value 0..255
in its chars.
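
One way to live with that, sketched here as an assumption and not something proposed in the thread: emit the encoded bytes as two hex digits each, so the result is always printable and a zero byte (text and key character identical) is harmless. The names bxor(), xorEncodeHex(), xorDecodeHex(), ORD and HEX are all invented for this example, and bxor() stands in for gawk's xor() so the sketch runs on other awks too:

```shell
hex_demo() {
  awk '
  # bxor() is a hypothetical portable stand-in for the gawk xor() built-in.
  function bxor(a, b,    r, bit) {
      r = 0
      for (bit = 1; a > 0 || b > 0; bit *= 2) {
          if ((a % 2) + (b % 2) == 1) r += bit
          a = int(a / 2); b = int(b / 2)
      }
      return r
  }
  # XOR each character with the cycling key; emit two hex digits per byte.
  function xorEncodeHex(str, pass,    i, j, y, k, out) {
      y = length(pass)
      for (i = 1; i <= length(str); i++) {
          j = (i - 1) % y + 1            # key index cycles 1..y
          k = bxor(ORD[substr(str, i, 1)], ORD[substr(pass, j, 1)])
          out = out sprintf("%02x", k)
      }
      return out
  }
  # Read two hex digits at a time and undo the XOR with the same key.
  function xorDecodeHex(hex, pass,    i, j, y, hi, lo, out) {
      y = length(pass)
      for (i = 1; 2 * i <= length(hex); i++) {
          j = (i - 1) % y + 1
          hi = index(HEX, substr(hex, 2 * i - 1, 1)) - 1
          lo = index(HEX, substr(hex, 2 * i, 1)) - 1
          out = out sprintf("%c", bxor(16 * hi + lo, ORD[substr(pass, j, 1)]))
      }
      return out
  }
  BEGIN {
      HEX = "0123456789abcdef"
      for (i = 1; i <= 255; i++) ORD[sprintf("%c", i)] = i
      enc = xorEncodeHex("The fox", "Whe")
      print enc
      print xorDecodeHex(enc, "Whe")
  }
  '
}
hex_demo
# prints:
# 030000770e0a2f
# The fox
```

The two 00 pairs in the output are the zero bytes: "h" and "e" line up with the same letters in the key.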

- Anton Treuenfels

Loki Harfagr

Jan 5, 2010, 2:45:57 AM

Mon, 04 Jan 2010 17:21:22 -0600, Anton Treuenfels did cat :

> "mss" <m...@dev.null> wrote in message
> news:hht7r3$tnm$1...@news.albasani.net...
>> Anton Treuenfels wrote:
>>> You might be interested in the split() function, particularly in its
>>> ability
>>> to break a string into individual characters:
>>
>> You know, I thought about that Anton... but what advantage does it have
>> over substr()?
>
> In one line it converts an unwieldy string into easily manipulated
> individual characters. It also turns function calls into array lookups
> in the main loop. Since in AWKs array lookups generally involve a hash
> function rather than a single integer multiply the time savings may not
> be as great as they would be otherwise, but it also makes the code look
> cleaner.

Exactly, the comfort in reading/writing is perfect for "ideas montages" but
the clarity has a price :-) Mss, check again your other recent thread
in the branch where it is compared and discussed ;-)
(precisely those few should be clear enough:
http://groups.google.com/group/comp.lang.awk/msg/86d2f426c69fe4fb?dmode=source&output=gplain
http://groups.google.com/group/comp.lang.awk/msg/b38795dc74ca7a71?dmode=source&output=gplain
http://groups.google.com/group/comp.lang.awk/msg/b849ea2a41128db2?dmode=source&output=gplain
http://groups.google.com/group/comp.lang.awk/msg/826ccb49dcc01b77?dmode=source&output=gplain
)

mss

Jan 5, 2010, 8:18:07 AM

Anton Treuenfels wrote:

> In one line it converts an unwieldy string into easily manipulated
> individual characters. It also turns function calls into array lookups in
> the main loop. Since in AWKs array lookups generally involve a hash function
> rather than a single integer multiply the time savings may not be as great
> as they would be otherwise, but it also makes the code look cleaner.

Well, I had a chance to think about this some more Anton,
and two things come to mind (that you mentioned)...

- it removes what could be a bottleneck from the 2nd loop

- it makes the main string easier to read

These are decidedly good things. Here's the next iteration
using your input. I made two changes... the 1st, is a
cosmetic change only (arrays are renamed to better reflect
their meaning). And 2nd'ly, I chose not to change an input
parameter (str = "") in the function because it seems wiser
to let it be.

function xorEncode(str, pass, x, y, z, j, ord_a, str_a, pass_a, buf) {

# more info about xor at:
# http://en.wikipedia.org/wiki/XOR_cipher
# note: xor is specific to GNU gawk

x = split(str, str_a, "")
y = split(pass, pass_a, "")

for (z = 1; z <= 255; z++) {ord_a[sprintf("%c", z)] = z}

for (j = z = 1; z <= x; z++) {

buf = buf sprintf("%c", xor(ord_a[str_a[z]], ord_a[pass_a[j]]))
j = (j < y) ? j + 1 : 1
}

return buf

}

mss

Jan 5, 2010, 8:24:01 AM

Loki Harfagr wrote:

> Exactly, the comfort in reading/writing is perfect for "ideas montages" but
> the clarity has a price :-) Mss, check again your other recent thread
> in the branch where it is compared and discussed ;-)
> (precisely those few should be clear enough:

Hey Hey Loki (please call me Mike friend).

Yes you're correct, I should strive for greater efficiency but
_NOT_ at the expense of readability...

You must remember I'm only a newbie & not as well versed at using
AWK as you are =)

And something else that is difficult for me to explain...

I want to code stand-alone functions(). This allows me to modularize
my code with reusable building blocks & simplifies debugging.

I need you to 'roll back the focus' & think in terms of building an
entire application & not just a single function right now.

This is frustrating because I don't have the vocabulary to convey my
plans, & can only grasp at an explanation... Here's a screen shot:

<http://topcat.hypermart.net/glyphs/stash-win.png>

So... to understand what I mean, please follow these steps:

1. Download the pre-release of the project I'm working on here:

<http://topcat.hypermart.net/code/stash_pre.tar.gz> [27KB]

2. Inside the archive are two folders, one for Linux,
the other for Windows. Choose one of the two folders,
and within that folder, read the text file named 'readme'.

Here is a portion of the readme:

[quote]

Stash is a small knowledge base system intended for managing fragments of
text like your book collection, code snippets, phone numbers, etc, from
your browser. Any type of textual data can be stored in a free form
fashion with only a minimum of fuss. While not a big time, multi-user
database, Stash is nevertheless well suited to administering several
thousand records.

And since your data is stored as a simple ASCII file, you're not locked
into a proprietary format, thus allowing you to mine your data with any
number of other tools. With that design philosophy in mind, the goal here
is to 'grab your data and get back to the task at hand'. Bottom line: If
you need a solution that's both lightweight and portable, Stash is a solid
choice.

Stash has two interfaces, a command line interface and a browser based
interface...

[/quote]

Let me know your thoughts Loki, it's vital to the project to
have experienced folks like you offering input...

Kees Nuyt

Jan 5, 2010, 2:58:11 PM

On Tue, 5 Jan 2010 13:18:07 +0000 (UTC), mss <m...@dev.null>
wrote:

>function xorEncode(str, pass, x, y, z, j, ord_a, str_a, pass_a, buf) {
>
># more info about xor at:
># http://en.wikipedia.org/wiki/XOR_cipher
># note: xor is specific to GNU gawk
>
> x = split(str, str_a, "")
> y = split(pass, pass_a, "")
>

To improve performance in scripts where xorEncode() is called
more than once, the building of the constant array ord_a[] :

> for (z = 1; z <= 255; z++) {ord_a[sprintf("%c", z)] = z}

should be in the BEGIN{} pattern. Of course, it has the
disadvantage of being a global variable. Being constant, I'd call
it ORD_A[] .
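
A minimal sketch of that arrangement, assuming a small wrapper function ord() (a hypothetical name, not a built-in) that only reads the shared table:

```shell
ord_demo() {
  awk '
  BEGIN {
      # Built once for the whole run; ORD_A is global by design.
      for (z = 1; z <= 255; z++) ORD_A[sprintf("%c", z)] = z
      print ord("A"), ord("z"), ord("~")
  }
  # ord() is a hypothetical wrapper: it looks up the first character of c
  # in the table filled in by BEGIN and never rebuilds it.
  function ord(c) { return ORD_A[substr(c, 1, 1)] }
  '
}
ord_demo   # prints: 65 122 126
```

Any function that needs character codes can then use the table without paying for the 255 sprintf() calls again.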

> for (j = z = 1; z <= x; z++) {
> buf = buf sprintf("%c", xor(ord_a[str_a[z]], ord_a[pass_a[j]]))
> j = (j < y) ? j + 1 : 1
> }
>
> return buf
>
>}
--

) Kees
(
c[_] Inertia makes the world go round.
-- [#359]