Is it possible to call a function that calculates the md5sum of a
named string in gawk? Something like:
{HashedPassword=md5sum($2)}
The background is that I have a delimited file with a column of sensitive
data, and a request came in to encrypt it using an md5sum.
thanks
Andrew
(OT: MD5 is not encryption; it is a one-way hash.)
Awk has no built-in function to calculate that. You have to either implement
it yourself, call an external command from within awk, or preprocess the
input to replace the field with its md5 before it's passed to awk.
Here is an example using the md5sum command available on Linux, assuming
the field whose md5 we want is $2.
$ echo 'a b c' | awk '{
command="printf \"%s\" \"" $2 "\" | md5sum"
command | getline $2
close(command)
sub(/[[:blank:]].*/,"",$2)
print "output line is: " $0}'
output line is: a 92eb5ffee6ae2fec3ad71c777531578f c
Please note that if $2 contains quotes, they would need to be escaped before
calling the command.
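One way to do that escaping, sketched here under the assumption that the command runs through /bin/sh and that GNU md5sum is on PATH: wrap the field in single quotes and turn each embedded single quote into the sequence '"'"' (close quote, double-quoted quote, reopen). The names md5_field2 and shq are hypothetical, chosen for this sketch only. Like the example above, it uses printf '%s' so no newline is hashed.

```shell
#!/bin/sh
# Sketch (not from the thread): hash field 2 of each line safely.
# Assumes /bin/sh quoting rules and GNU md5sum on PATH.
# md5_field2 and shq are hypothetical names chosen for this example.
md5_field2() {
  # q holds one single quote, so the awk program below can stay inside
  # shell single quotes (the program text must not contain a literal one).
  awk -v q="'" '
    # Quote s for /bin/sh: wrap it in single quotes and replace each
    # embedded single quote with q dq q dq q (close, quoted quote, reopen).
    function shq(s) {
      gsub(q, q "\"" q "\"" q, s)
      return q s q
    }
    {
      # printf %s hashes the exact bytes of the field, with no newline
      cmd = "printf " q "%s" q " " shq($2) " | md5sum"
      h = ""
      cmd | getline h      # md5sum prints the hash, two spaces, and a dash
      close(cmd)           # one pipe per line; close it before the next
      sub(/ .*/, "", h)    # keep only the hash
      $2 = h
      print
    }'
}

echo 'a b c' | md5_field2
echo "a it's c" | md5_field2
echo 'a * c' | md5_field2
```

Each line still starts a shell, printf and md5sum, so this addresses correctness rather than speed.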
Just to follow up on this. I think your example calculated the md5sum
incorrectly. The md5 for a file containing just the letter b is
3b5d5c3712955042212316173ccf37be, and in your example above it
generates a different md5 result.
Anyway, I had a play around and got it working, but I'll reiterate:
without your help I'd never have worked this out, so really, thanks
again. Here's a corrected version that just echoes the field to
md5sum, and this seems to generate good hashes for my data.
The same note about escaping quotes applies to this example too, but
as it happens it doesn't affect my current task.
echo 'a b c' | awk '{command="echo "$2"|md5sum" ;command | getline $2;
close(command);sub(/[[:blank:]].*/,"",$2); print "output line is: " $0}'
output line is: a 3b5d5c3712955042212316173ccf37be c
A word of warning for anyone picking this up later: this runs
very slowly for me.
Unfortunately I'm in a very constrained environment and need to do
this in awk, and for me speed is not an issue, but if you have a
choice, keep in mind that starting a shell and an md5sum process for
every line is definitely hurting performance.
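If the sensitive column has many repeated values, one possible mitigation (not something proposed in the thread) is to cache each digest in an awk array, so md5sum runs once per distinct value rather than once per line. This is a sketch assuming /bin/sh and GNU md5sum; hash_field2_cached, shq and the cache array are hypothetical names. It does not help when every value is unique.

```shell
#!/bin/sh
# Sketch: memoize digests of field 2 so each distinct value spawns md5sum
# only once. Assumes GNU md5sum on PATH; hash_field2_cached is a
# hypothetical name used only for this example.
hash_field2_cached() {
  awk -v q="'" '
    # Quote s for /bin/sh (embedded single quotes become q dq q dq q).
    function shq(s) {
      gsub(q, q "\"" q "\"" q, s)
      return q s q
    }
    {
      if (!($2 in cache)) {
        # First time this value is seen: run md5sum once and remember it
        cmd = "printf " q "%s" q " " shq($2) " | md5sum"
        h = ""
        cmd | getline h
        close(cmd)
        sub(/ .*/, "", h)
        cache[$2] = h
      }
      $2 = cache[$2]
      print
    }'
}

# "b" appears twice, but md5sum is started only twice in total (b, then g)
printf 'a b c\nd b e\nf g h\n' | hash_field2_cached
```

The cache lives in memory for the whole run, so with a very large number of distinct values memory use grows accordingly.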
cheers,
Andrew
> Thank you very much for this snippet. I've never seen the "command"
> syntax in awk before. It's very useful.
"command" is just the name of a variable. I could have called it "xhdfkgj"
and it would have worked just the same.
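To illustrate: in `expr | getline var`, the left-hand side is just a string expression, so the command can sit in a variable with any name or be written as a literal. A minimal sketch; getline_demo is a hypothetical wrapper name, and the variable names are as arbitrary as the reply says.

```shell
#!/bin/sh
# The command string can live in any variable, or be a literal.
getline_demo() {
  awk 'BEGIN {
    xhdfkgj = "echo hello"        # any variable name holds the command
    xhdfkgj | getline greeting    # read one line of its output
    close(xhdfkgj)
    print greeting

    "echo world" | getline other  # a string literal works the same way
    close("echo world")           # close() takes the same string
    print other
  }'
}

getline_demo
```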
> Just to follow up on this. I think your example calculated the md5sum
> incorrectly. The md5 for a file containing just the letter b is
> 3b5d5c3712955042212316173ccf37be, and in your example above it
> generates a different md5 result.
Your file most likely includes a trailing newline, so I'm pretty sure that
what you're doing is generating the md5 for "b\n", not for "b" alone.
> Anyway, I had a play around and got it working, but I'll reiterate:
> without your help I'd never have worked this out, so really, thanks
> again. Here's a corrected version that just echoes the field to
> md5sum, and this seems to generate good hashes for my data.
> The same note about escaping quotes applies to this example too, but
> as it happens it doesn't affect my current task.
>
> echo 'a b c' | awk '{command="echo "$2"|md5sum" ;command | getline $2;
> close(command);sub(/[[:blank:]].*/,"",$2); print "output line is: " $0}'
>
> output line is: a 3b5d5c3712955042212316173ccf37be c
No, it's wrong. The command "echo" outputs a trailing newline, so you're
taking the md5 of the string PLUS a newline, not the string only. If you
want the md5 of the string only, you have to use printf as in my first
example, or "echo -n" if your echo supports that. Also, not quoting the
argument as you're doing is very dangerous and will give you (in the best
case) unexpected results. Hint: try with
echo 'a * c' | awk ....
and see what you get. The md5 of '*' is 3389dae361af79b04c9c8e7057f60cc6.
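The trailing-newline difference is easy to see directly from the shell. A small sketch, assuming GNU md5sum (which prints the hash followed by two spaces and a dash); digest_of is a hypothetical helper name.

```shell
#!/bin/sh
# digest_of: hypothetical helper; hashes its argument exactly as given
# (printf %s adds no newline) and prints only the hex digest.
digest_of() {
  printf '%s' "$1" | md5sum | cut -d' ' -f1
}

with_nl=$(echo b | md5sum | cut -d' ' -f1)  # echo appends a newline: digest of "b\n"
without_nl=$(digest_of b)                    # digest of "b" alone
echo "echo b:      $with_nl"
echo "printf %s b: $without_nl"

# Quoting the argument keeps * literal; left unquoted inside a shell
# command line, it would be expanded to file names first.
echo "digest of *: $(digest_of '*')"
```

The first two lines should correspond to the 3b5d5c... and 92eb5ffe... values quoted earlier in the thread.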
If:
1. you have the C source for an md5sum function,
2. you are using gawk, and
3. you are up to hacking a bit,
then you can write a loadable builtin.
All this isn't super straightforward, but it's not overly awful either.
There are examples and some minimal documentation in the gawk distribution.
Arnold
In article <7217ab14-f3bd-4918...@t21g2000yqi.googlegroups.com>,
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
I stand very much corrected, thanks for explaining it.
A
That's a good idea which I'll look into.
thanks
A