BEGIN {
    from = "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm0987654321"
    to = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890"
    for (i = 1; i <= length(from); i++) {
        letter[substr(from, i, 1)] = substr(to, i, 1)
    }
}
{
    for (i = 1; i <= length($0); i++) {
        char = substr($0, i, 1)
        if (match(char, "[a-zA-Z]|[0-9]") != 0) {
            printf("%c", letter[char])
        } else {
            printf("%c", char)
        }
    }
    printf("\n")
}
--
later on,
Mike
You can write this as...
if (match(char, /[a-zA-Z0-9]/)) {
or as...
if (match(char, /[[:alnum:]]/)) {
But why not just...
if (char in letter) {
> printf("%c", letter[char])
> } else {
> printf("%c", char)
> }
> }
> printf("\n")
> }
If you're using GNU awk you can make use of FS = "", which splits each record into single-character fields...
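BEGIN { FS = ""    # one field per character (a GNU awk extension)
    # split() uses FS, so both strings are split into single characters
    n = split("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890",a)
    split("NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm0987654321",b)
    for (i=1; i<=n; i++) t[a[i]] = ""b[i]    # "" forces a string, so %c prints the digit itself
}
{
    for (i=1; i<=NF; i++)
        printf "%c",(($i in t)?t[$i]:""$i)
    print ""
}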
Or use index() into a string of allowed characters and add some offset
(e.g. 13) modulo the length of the string to get the respective new index.
(This depends on the character class subsets you use, and special
handling of your number range is necessary.)
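A minimal sketch of that index()-plus-offset idea, under my own assumptions (Janis gave no code): letters are rotated by 13 modulo 26 within an alphabet string, and the digits get the special handling d -> (11 - d) % 10, which yields the same 0<->1, 2<->9, 3<->8, 4<->7, 5<->6 pairs as the from/to strings in the first post. The wrapper name rot13_offset is hypothetical.

```shell
#!/bin/sh
# Sketch only: rotate each character's position in an alphabet string.
# Letters: (position + 13) mod 26. Digits: d -> (11 - d) mod 10.
# Anything else is passed through unchanged. rot13_offset is hypothetical.
rot13_offset() {
  printf '%s\n' "$1" | awk '
    BEGIN {
      upper  = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      lower  = "abcdefghijklmnopqrstuvwxyz"
      digits = "0123456789"
    }
    {
      out = ""
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if ((k = index(upper, c)) > 0)
          c = substr(upper, (k - 1 + 13) % 26 + 1, 1)
        else if ((k = index(lower, c)) > 0)
          c = substr(lower, (k - 1 + 13) % 26 + 1, 1)
        else if ((k = index(digits, c)) > 0)
          c = substr(digits, (11 - (k - 1)) % 10 + 1, 1)
        out = out c
      }
      print out
    }'
}

rot13_offset 'Hello, World 42'    # Uryyb, Jbeyq 79
```

Because both rotations are their own inverse, running the output through rot13_offset again gives back the original text.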
Janis
> if (match(char, /[a-zA-Z0-9]/)) {
Sure enough. [a-zA-Z] | [0-9] is redundant in my 1st post.
> if (match(char, /[[:alnum:]]/)) {
Is this portable? And also, something I don't yet 'grok'...
Why must double brackets be used in [[:those:]] classes?
> if (char in letter) {
I like this best myself too.
> (This is depending on the character class subsets you use, and special
> handling for your number range is necessary.)
Further to that end, another very nifty implementation here:
http://rosettacode.org/wiki/Rot-13#AWK
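The other change WRT your post was to use /.../ instead of "...".
It makes little difference for the given character set, but with the
former you have less trouble in general (a string is escape-processed
before it is used as a regex, so backslashes must be doubled); I prefer
regex constants where possible.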
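A small sketch of the difference: the literal dot that the constant /a\.b/ expresses must be written "a\\.b" in string form, because awk first turns "\\" into a single backslash. The helper name match_demo is hypothetical.

```shell
#!/bin/sh
# Regex constant /.../ versus the same regex given as a string "...".
# Each comparison yields 1 (match) or 0 (no match).
# match_demo is a hypothetical helper name.
match_demo() {
  printf '%s\n' 'a.b' 'axb' | awk '{
    const_hit  = ($0 ~ /a\.b/)     # regex constant: literal dot
    string_hit = ($0 ~ "a\\.b")    # same regex as a string: backslash doubled
    print $0, const_hit, string_hit
  }'
}

match_demo
# a.b 1 1
# axb 0 0
```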
>
>> if (match(char, /[[:alnum:]]/)) {
>
> Is this portable?
It's POSIX standard, I'm sure. And quite surely not portable WRT very
old awks. I also think it isn't mentioned in the book by A, K, and W.
WRT the rot13 task in general, and character classes specifically, you
have to consider locales. E.g. in German, how would the umlauts äöüÄÖÜ
and the ß be rot13'ed? You also have to consider that those additional
characters would be in the [:alnum:] character class as well.
> And also, something I don't yet 'grok'...
> Why must double brackets be used in [[:those:]] classes?
To be able to differentiate syntactically between characters and classes
of characters. The outer brackets define the character set, and the
inner [:alnum:] denotes the predefined set of alphanumeric characters.
You can, for example, add an underscore to the alnum set with either of
these expressions:
[[:alnum:]_] [_[:alnum:]]
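A sketch using both spellings, negated so that gsub() deletes every character that is neither alphanumeric nor an underscore. It assumes an awk with POSIX character-class support (gawk has it; very old awks may not) and ASCII input, since the class contents depend on the locale. The function names are hypothetical.

```shell
#!/bin/sh
# [^[:alnum:]_] and [^_[:alnum:]] describe the same set: inside the outer
# brackets, [:alnum:] is one member and _ is another; order does not matter.
# keep_word_chars and keep_word_chars2 are hypothetical names.
keep_word_chars() {
  printf '%s\n' "$1" | awk '{ gsub(/[^[:alnum:]_]/, ""); print }'
}
keep_word_chars2() {
  printf '%s\n' "$1" | awk '{ gsub(/[^_[:alnum:]]/, ""); print }'
}

keep_word_chars 'foo_bar-42!'     # foo_bar42
keep_word_chars2 'foo_bar-42!'    # foo_bar42
```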
>
>> if (char in letter) {
>
> I like this best myself too.
Yes, it's the most elegant.
Janis
function rot13(str, from, to, q, letter, char, buf) {
    # rot13 for awk
    # more info at: http://en.wikipedia.org/wiki/ROT13
    # a slight modification of the example found at:
    # http://www.miranda.org/~jkominek/rot13/awk/
    # authors: Janis Papanagnou and Michael Sanders
    from = "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm0987654321"
    to = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890"
    for (q = 1; q <= length(from); q++) {
        letter[substr(from, q, 1)] = substr(to, q, 1)
    }
    for (q = 1; q <= length(str); q++) {
        char = substr(str, q, 1)
        if (char in letter) {
            buf = buf sprintf("%c", letter[char])
        } else {
            buf = buf sprintf("%c", char)
        }
    }
    return buf
}
You're welcome. And thanks for your attribution :-)
Though I don't feel like a co-author; I just gave some feedback.
>
> function rot13(str, from, to, q, letter, char, buf) {
>
> # rot13 for awk
> # more info at: http://en.wikipedia.org/wiki/ROT13
> # a slight modification of the example found at:
> # http://www.miranda.org/~jkominek/rot13/awk/
> # authors: Janis Papanagnou and Michael Sanders
# author: Michael Sanders (with comments from Janis Papanagnou)
>Okay, let's see. Here's the version I'll be using for now.
>This iteration encapsulates the functionality in its
>own function. Thanks for your help & input, Janis.
>
>function rot13(str, from, to, q, letter, char, buf) {
>
># rot13 for awk
># more info at: http://en.wikipedia.org/wiki/ROT13
># a slight modification of the example found at:
># http://www.miranda.org/~jkominek/rot13/awk/
># authors: Janis Papanagnou and Michael Sanders
>
>from = "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm0987654321"
>to = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890"
>
> for (q = 1; q <= length(from); q++) {
> letter[substr(from, q, 1)] = substr(to, q, 1)
> }
If you call this function more than once in a session, the above should
be in a BEGIN block. Also, you may need a 'buf = ""' here to clear the old one?
> for (q = 1; q <= length(str); q++) {
> char = substr(str, q, 1)
> if (char in letter) {
> buf = buf sprintf("%c", letter[char])
> } else {
> buf = buf sprintf("%c", char)
> }
> }
>
> return buf
>
>}
>
>--
sig delim --> s/-- /-- / :)
Grant.
--
http://bugsplatter.id.au
>>from = "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm0987654321"
>>to = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890"
>>
>> for (q = 1; q <= length(from); q++) {
>> letter[substr(from, q, 1)] = substr(to, q, 1)
>> }
>
> If you call this function more than once in a session, the above should
> be in a BEGIN block
Yes, however for the sake of brevity...
> you may need a 'buf = ""' here to clear old one?
Repeated use here has no side effects...
My understanding is that 'buf' only lasts for the life of the function,
since it's locally scoped:
A function definition:
function myfunction (param, local)
{
    print param    # the value passed to the function
    print local    # a locally scoped variable declared in the formal
                   # parameter list (hides global variables with the
                   # same name)
    print global   # a global variable
}
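A quick sketch to check that reasoning: a local declared as an extra parameter starts out empty on every call, while a global keeps its value between calls. All names here are hypothetical.

```shell
#!/bin/sh
# with_local: buf is an extra (local) parameter, reset on every call.
# with_global: gbuf is global, so it keeps accumulating across calls.
# scope_demo, with_local, with_global and gbuf are hypothetical names.
scope_demo() {
  awk '
    function with_local(str,    buf) { buf = buf str; return buf }
    function with_global(str)        { gbuf = gbuf str; return gbuf }
    BEGIN {
      print with_local("abc");  print with_local("xyz")
      print with_global("abc"); print with_global("xyz")
    }'
}

scope_demo
# abc
# xyz
# abc
# abcxyz
```

So inside rot13(), where buf is in the parameter list, no 'buf = ""' is needed.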
> sig delim --> s/-- /-- / :)
On my end it contains the space 'before leaving the box'... Mutt problem?
> You're welcome. And thanks for your attribution :-)
> Though, I don't feel like being a co-author; just gave some feedback.
Corrected.
> for (q = 1; q <= length(str); q++) {
> char = substr(str, q, 1)
> if (char in letter) {
> buf = buf sprintf("%c", letter[char])
> } else {
> buf = buf sprintf("%c", char)
> }
> }
Isn't this bit here doing too much work, since the contents of 'char' and
'letter[]' are already one-character strings (so there's no need to pass
them through sprintf("%c", ...))? Here's one alternative:
for ( q = length(str); q; --q ) {
    char = substr( str, q, 1 )
    if ( char in letter )
        buf = letter[ char ] buf
    else
        buf = char buf
}
Um, I also like to count down to avoid repeated evaluation of 'length(str)',
which always gives the same result.
- Anton Treuenfels
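Anton's count-down loop dropped into a complete script, as a sketch (the wrapper is mine, not his): the string is walked from its last character to its first and each translated character is prepended, so length(str) is evaluated only once. The table is built once in BEGIN, as Grant suggested. rot13_down is a hypothetical name.

```shell
#!/bin/sh
# Count-down rot13: prepend each translated character, last to first.
# letter[] is global and filled once in BEGIN; q, char, buf are locals.
# rot13_down is a hypothetical wrapper name.
rot13_down() {
  printf '%s\n' "$1" | awk '
    function rot13(str,    q, char, buf) {
      for (q = length(str); q; --q) {
        char = substr(str, q, 1)
        if (char in letter)
          buf = letter[char] buf
        else
          buf = char buf
      }
      return buf
    }
    BEGIN {
      from = "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm0987654321"
      to   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890"
      for (q = 1; q <= length(from); q++)
        letter[substr(from, q, 1)] = substr(to, q, 1)
    }
    { print rot13($0) }'
}

rot13_down 'Hello, World 2009'    # Uryyb, Jbeyq 9112
```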
...
>
> Isn't this bit here doing too much work, since the contents of 'char' and
> 'letter[]' are already one character strings? Here's one alternative:
Yes, it's not very efficient at all, to be honest, but neither am
I at awk yet...
> for ( q = length(str); q; --q ) {
> char = substr( str, q, 1 )
> if ( char in letter )
> buf = letter[ char ] buf
> else
> buf = char buf
> }
I'll study this more...
> Um, I also like to count down to avoid repeated evaluation of 'length(str)',
> which always gives the same result.
Agreed, & in fact I'd already thought of this too; the function now uses variables
rather than evaluating the length(s) with every iteration (I know, 'ouch'), so...
x = length(from)
y = length(str)
Thanks Anton, I'm learning.
> Agreed, & in fact already thought of this too, the function now uses variables
> rather evaluating the length(s) with every iteration (I know 'ouch'), so...
>
> x = length(from)
> y = length(str)
>
> Thanks Anton, I'm learning.
Here's where I'm at currently:
function rot13(str, from, to, x, y, z, letter, char, buf) {
    # rot13 for awk
    # more info at: http://en.wikipedia.org/wiki/ROT13
    # a slight modification of the example found at:
    # http://www.miranda.org/~jkominek/rot13/awk/
    from = "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm0987654321"
    to = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890"
    x = length(from)
    y = length(str)
    for (z = 1; z <= x; z++) {
        letter[substr(from, z, 1)] = substr(to, z, 1)
    }
    for (z = 1; z <= y; z++) {
        char = substr(str, z, 1)
        if (char in letter) {
            buf = buf letter[char]
        } else {
            buf = buf char
        }
    }
    return buf
}
BEGIN {
    FS = OFS = ""    # one field per character: a gawk extension, not POSIX
    from = rng("A","Z") rng("a","z") "0987654321"
    to = rng("N","Z") rng("A","M")
    to = to tolower( to ) "1234567890"
    for (i=1; i<=length(to); i++)
        map[ substr( from, i, 1 ) ] = substr( to, i, 1 )
}
{
    for (i=1; i<=NF; i++) $i = rot13( $i )
    print
}
# build the string of ASCII characters from lo to hi;
# i and c are declared local so they can't clobber the caller's variables
function rng( lo, hi,    all, i, c )
{   for (i=1; i<128; i++)
    {   c = sprintf( "%c", i )
        if ( lo <= c && c <= hi ) all = all c
    }
    return all
}
function rot13( c )
{   return (c in map) ? map[c] : c
}
--
> BEGIN {
> FS = OFS = ""
> from = rng("A","Z") rng("a","z") "0987654321"
> to = rng("N","Z") rng("A","M")
> to = to tolower( to ) "1234567890"
> for (i=1; i<=length(to); i++)
> map[ substr( from, i, 1 ) ] = substr( to, i, 1 )
> }
>
> {
> for (i=1; i<=NF; i++) $i = rot13( $i )
> print
> }
>
> function rng( lo, hi, all )
> { for (i=1; i<128; i++)
> { c = sprintf( "%c", i )
> if ( lo <= c && c <= hi ) all = all c
> }
> return all
> }
>
> function rot13( c )
> { return (c in map) ? map[c] : c
> }
Interesting take, thanks for sharing. Take a look at the other posts
in this thread to see how it's progressing, for more ideas...
My goals are:
- A single reusable generic function
- Isolated variables (we don't want to clobber the rest of a script)
- A rigor that *reduces complexity*
> Here's where I'm at currently:
>
> function rot13(str, from, to, x, y, z, letter, char, buf) {
>
> # rot13 for awk
> # more info at: http://en.wikipedia.org/wiki/ROT13
> # a slight modification of the example found at:
> # http://www.miranda.org/~jkominek/rot13/awk/
>
> from = "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm0987654321"
> to = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890"
> x = length(from)
> y = length(str)
>
> for (z = 1; z <= x; z++) {
> letter[substr(from, z, 1)] = substr(to, z, 1)
> }
You rebuild the letter[] array every time the function is called, which is
probably not too efficient, especially if you call it lots of times. Since
it doesn't change, you could just build it once and put it in a global
variable.
>
> for (z = 1; z <= y; z++) {
> char = substr(str, z, 1)
> if (char in letter) {
> buf = buf letter[char]
> } else {
> buf = buf char
> }
> }
It's just cosmetic syntax, but how about
for (z = 1; z <= y; z++) {
    buf = buf (((char = substr(str, z, 1)) in letter)?letter[char]:char)
}
or even (to save a variable!)
while (y) {
    buf = (((char = substr(str, y--, 1)) in letter)?letter[char]:char) buf
}
But I agree that in this case there is no real difference in efficiency,
so it might be worth keeping it more readable if that's better for you.
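Both loop bodies in one runnable sketch (the wrapper and function names are hypothetical). Note that the whole conditional expression has to be parenthesised before it is concatenated to buf: ?: binds more loosely than string concatenation in awk, so without the parentheses the test becomes "buf (...)" and buf is overwritten instead of extended.

```shell
#!/bin/sh
# up(): counts up and appends; down(): counts down and prepends.
# Both should give identical results. Names are hypothetical.
rot13_both() {
  printf '%s\n' "$1" | awk '
    BEGIN {
      from = "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm0987654321"
      to   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890"
      for (z = 1; z <= length(from); z++)
        letter[substr(from, z, 1)] = substr(to, z, 1)
    }
    function up(str,    y, z, char, buf) {
      y = length(str)
      for (z = 1; z <= y; z++)
        buf = buf (((char = substr(str, z, 1)) in letter) ? letter[char] : char)
      return buf
    }
    function down(str,    y, char, buf) {
      y = length(str)
      while (y)
        buf = (((char = substr(str, y--, 1)) in letter) ? letter[char] : char) buf
      return buf
    }
    { print up($0) "|" down($0) }'
}

rot13_both 'abc XYZ 123'    # nop KLM 098|nop KLM 098
```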
>
> You rebuild the letter[] array every time the function is called, which is
> probably not too efficient, especially if you call it lots of times. Since
> it doesn't change, you could just build it once and put it in a global
> variable.
Hey pk.
Yes, you're right. (It's only included in the function to illustrate;
otherwise I'll use it in BEGIN{}.)
> for (z = 1; z <= y; z++) {
> buf = buf (((char = substr(str, z, 1)) in letter)?letter[char]:char)
> }
>
> or even (to save a variable!)
>
> while (y) {
> buf = (((char = substr(str, y--, 1)) in letter)?letter[char]:char) buf
> }
>
> But I agree that in this case there is not a real difference in efficiency
> so it might be worth keeping it more readable if that's better for you.
The 2nd fragment is nifty! You and Anton seem to have a knack for
'unwinding' a loop.
Question... I've only been coding in C (Pelles C) for about two years now,
and nearly all of that is Windows API related. Can you provide a blurb
or two on how '?'..':' works in AWK? I typically use if or switch/case
statements...
Anyhow, thanks for the ideas & stay tuned... will work some of these ideas
into the mix, I like the thinking.
It's a conditional expression; if you've coded in C you might have seen
the identical construct there.
A conditional statement...
if (c) x = 1 ; else x = 2 ;
and an equivalent conditional expression...
x = c ? 1 : 2
Conditional expressions are long-established constructs; even Algol had them.
They were long considered deprecated as being less performant than the
equivalent conditional statement. Nowadays, since compilers typically have
optimizers, it shouldn't make a difference in performance any more.
Janis
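The two forms from Janis' example side by side, as a small awk sketch; both assign the same value. cond_demo is a hypothetical name.

```shell
#!/bin/sh
# Conditional statement versus conditional expression in awk.
# cond_demo is a hypothetical helper name.
cond_demo() {
  awk -v c="$1" 'BEGIN {
    c = c + 0                      # force a numeric value, so "0" is false
    if (c) x1 = 1; else x1 = 2     # conditional statement
    x2 = c ? 1 : 2                 # conditional expression
    print x1, x2
  }'
}

cond_demo 1    # 1 1
cond_demo 0    # 2 2
```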
> [...]
> if (c) x = 1 ; else x = 2 ;
...
> x = c ? 1 : 2
Thanks Janis!
function rot13(str, from, to, x, y, z, letter, char, buf) {
    # rot13 for gawk
    # more info at: http://en.wikipedia.org/wiki/ROT13
    # a modification of the example found at:
    # http://www.miranda.org/~jkominek/rot13/awk/
    from = "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm0987654321"
    to = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890"
    x = length(from)
    y = length(str)
    for (z = 1; z <= x; z++) {
        letter[substr(from, z, 1)] = substr(to, z, 1)
    }
    for (z = 1; z <= y; z++) {
        char = substr(str, z, 1)
        buf = (char in letter) ? buf letter[char] : buf char
    }
    return buf
}
And see the thread started by AaronM, "increment letters", 10/18/2008
> And see the thread started by AaronM, "increment letters", 10/18/2008
Will do, appreciate the heads up.
Hi, Mike
In my trials, I found that indexing $z in the "from" string and
returning the char at that index in the "to" string was slightly
faster than building and using an associative array to relate the two
strings.
Also, as a New Year's pleasantry, I send this:--
BEGIN { FS = ""
    s = "aNbOcPdQeRfSgThUiVjWkXlYmZnAoBpCqDrEsFtGuHvIwJxKyLzM"
}
{   l = ""
    for (i = 1; i <= NF; i++) {
        n = index(s, $i)
        l = l (n ? (n%2 ? tolower(substr(s, n+1, 1)) : \
                          toupper(substr(s, n-1, 1))) : $i)
    }
    print l
}
Best wishes to all Awkers for the New Year, Brian
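A sketch of the two-string lookup Brian describes in his first paragraph (his timing code isn't shown, so this is only one way to write it): index() finds the character's position in "from", and the character at the same position in "to" replaces it, with no associative array. It uses substr() rather than FS = "", so it doesn't rely on the gawk extension. rot13_index is a hypothetical name; no speed claim is implied.

```shell
#!/bin/sh
# Two parallel strings instead of an associative array.
# rot13_index is a hypothetical wrapper name.
rot13_index() {
  printf '%s\n' "$1" | awk '
    BEGIN {
      from = "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm0987654321"
      to   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890"
    }
    {
      out = ""
      n = length($0)
      for (z = 1; z <= n; z++) {
        c = substr($0, z, 1)
        k = index(from, c)                  # 0 when c is not in the table
        out = out (k ? substr(to, k, 1) : c)
      }
      print out
    }'
}

rot13_index 'Hello, World 2009'    # Uryyb, Jbeyq 9112
```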
> Hi, Mike
>
> In my trials, I found that indexing $z in the "from" string and
> returning the char at that index in the "to" string was slightly
> faster than building and using an associative array to relate the two
> strings.
>
> Also, as a New Year's pleasantry, I send this:--
>
> BEGIN { FS = ""
> s = "aNbOcPdQeRfSgThUiVjWkXlYmZnAoBpCqDrEsFtGuHvIwJxKyLzM"
> }
>
> { l = ""
> for (i = 1; i <= NF; i++) {
> n = index(s, $i)
> l = l (n ? (n%2 ? tolower(substr(s, n+1, 1)) : \
> toupper(substr(s, n-1, 1))) : $i)
> }
> print l
> }
>
> Best wishes to all Awkers for the New Year, Brian
How nifty is that? Brian, this is great. It's cool the way you've
interwoven the string like that. I can see that shaving a
few cycles off the time too...
This is a keeper for my snippet collection. Thank you kind sir.
As a follow-up (I'd have posted earlier if I hadn't been
hit by a gastro/celebration virus these last days ;-( ),
here are my two and a half cents ;-)
- allow me to insist that your "rot13" is, strictly speaking,
  a "rot13 plus a reversal symmetry on the digits".
- as the chosen "cipher" is bijective and its own inverse, there's no need
  to list the full set twice (as Brian also noticed ;-)
- as you insist on using a single function I wrote my example that way,
  but for input with many records you'd really better think about
  a _pre_function generating "global span" arrays.
- the input 'str' can be parsed with 'substr' if you like,
  but here I chose the 'split' way ;-)
----------
#!/bin/awk -f
# ROT13 and crossrev5 in awk
# a slight modification on a slight modification of
# http://www.miranda.org/~jkominek/rot13/awk/
# note: split(..., "") into single characters is a gawk extension
###
function rot13xrev5(str, ALF, res, h, i, j, lin, lDNA, rDNA, alf) {
    ALF="ABCDEFGHIJKLMnopqrstuvwxyz09876NOPQRSTUVWXYZabcdefghijklm12345"
    h=(split(ALF,alf,""))/2
    for(j=h+(i=1); i<=h; j=h+ ++i){
        lDNA[alf[i]]=alf[j]
        rDNA[alf[j]]=alf[i]
    }
    j=split(str,lin,res="")
    for(i=1; i<=j; i++)
        res=res (index(ALF,lin[i])?(lDNA[lin[i]] rDNA[lin[i]]):lin[i])
    return res
}
{ print rot13xrev5($0) }
----------
> better think about a _pre_function generating "global span" arrays.
Yes! *It's only in the function for the example*. Otherwise, with
repeated use, the lookup table (that long string) should be in BEGIN{}.
And thanks, I'll study your example, Loki!
Hi, Mike--
The single-string code I posted lightheartedly seems to run slower
than a script using two strings mapped to each other as in your code.
Haven't tried Loki's; I was influenced by his responses to the AaronM
10/18/2008 posting mentioned earlier.
Loki, I hope you're recovered from the holidays. Brian
Hah! Well, of course it goes the same path (hash and index), hence runs
slower than 'direct' access (string pointer + displacement). Below I post
what is probably my "best" effort version: on a one-million-line sample
input it runs in 'one TimeUnit', while mss's code runs in 151/100 TU
and my first sample code with all the arrays took 555/100 TU :D)
> I was influenced by his responses to the AaronM 10/18/2008
> posting mentioned earlier.
Er?-) You do mean the "easy to obfuscate" idea, don't you ?-)
>
> Loki, I hope you're recovered from the holidays. Brian
Thanks Brian (I hope so too ,D). Time will tell: if anyone comes up with a
much faster version than the one below, I'll know I'm not yet
back to sourcery ,-)
-------------------
#!/bin/awk -f
# ROT13 and crossrev5 in awk
# a slight modification on a slight modification of
# http://www.miranda.org/~jkominek/rot13/awk/
###
function _pre_rot13xrev5(h, i, j, alf) {
    ALF="ABCDEFGHIJKLMnopqrstuvwxyz09876NOPQRSTUVWXYZabcdefghijklm12345"
    h=(split(ALF,alf,""))/2
    for(j=h+(i=1); i<=h; j=h+ ++i){
        DNA[i]=alf[j]
        DNA[j]=alf[i]
    }
}
function rot13xrev5(str,ALF,res, h, i, c) {
    i=1+length(str)
    while(--i){
        c=substr(str,i,1)
        h=index(ALF,c)
        res=(h?DNA[h]:c) res
    }
    return res
}
BEGIN{ _pre_rot13xrev5() }
{ print rot13xrev5($0,ALF) }
-------------------
If we're seriously starting to inspect performance, we should note at
that point that one should always use the right tool for the task.
On Unixes, e.g., you'd apply the tr(1) command, which not only runs
ten times faster than the awk program below but is also much simpler
to write: just half a line of code (40 characters or so).
;-}
Good recovery, Loki.
Janis
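Janis doesn't show the command; one way to write it with tr(1), as a sketch: each character in the first set is replaced by the character at the same position in the second set. The second function adds the thread's digit pairs (0<->1, 2<->9, 3<->8, 4<->7, 5<->6). Both function names are hypothetical.

```shell
#!/bin/sh
# Plain ROT13, and ROT13 plus the "crossrev5" digit mapping, using tr(1).
# rot13_tr and rot13_digits_tr are hypothetical names.
rot13_tr() {
  printf '%s\n' "$1" | tr 'A-Za-z' 'N-ZA-Mn-za-m'
}
rot13_digits_tr() {
  printf '%s\n' "$1" | tr 'A-Za-z0-9' 'N-ZA-Mn-za-m1098765432'
}

rot13_tr 'Hello, World 2009'           # Uryyb, Jbeyq 2009
rot13_digits_tr 'Hello, World 2009'    # Uryyb, Jbeyq 9112
```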
> Loki Harfagr wrote:
> [...]
>>
>> Hah! well, of course it goes the same path (hash and index) hence runs
>> slower than 'direct' (string-pointer + displacement), I post below the
>> probably "best" effort version, on a one million lines sample input it
>> runs in 'one TimeUnit' while mss code runs in 151p100 TU and my sample
>> first coed with all the arrays was 555p100 :D)
>>
>>> I was influenced by his responses to the AaronM 10/18/2008 posting
>>> mentioned earlier.
>>
>> Er?-) You do mean the "easy to obfuscate" idea, don't you ?-)
>>
>>> Loki, I hope you're recovered from the holidays. Brian
>>
>> Thanks Brian, (I hope too ,D) time will tell and if anyone comes with a
>> highly faster version that below I'll know I'm not yet back to sourcery
>> ,-)
>
> If we're seriously starting to inspect performance we should at that
> point note that one should always use the right tool for the task. On
> Unix'es, e.g., you'd apply the tr(1) command, which not only runs ten
> times faster than this awk program below but is also much easier in the
> code, actually just half a line of code (40 characters, or so). ;-}
Oh well, yes indeed :-) but the OP seemed to be exploring paths
in awk, and I just added the "speed-optimized" version
because Brian mentioned the performance gap between using array hashes and
using 'semi-direct' targeting :-)
I remember I once tried to build a 'rice' and 'arith' toolbench in gawk, just
because I thought it would save the trainees from struggling with
asm or C (or whatever). Well, the tools somewhat worked, but the performance
was really beyond the human life span when testing on actual big files :D)
> Good recovery, Loki.
Thank you Janis; after my first day at work it's already much better,
now back in my den ;D)
>
> Janis
>
>
>> -------------------
>> #!/bin/awk -f
>> # ROT13 and crossrev5 in awk
...