function() return VAR vs ..return $0

someone

20 Feb 2023, 12:50:04 p.m.

Hello again,
I wrote a small looping AWK script to practice use of functions and have
a few questions which maybe some of you could weigh in on.

The script:
--
#! /usr/bin/awk -f
# dwmstat.awk -- populate dwm(1) window mgr status area.

BEGIN {
    while(1) {
        status_str = " " temp() " | " load() " | " date() " "
        #system("xsetroot -name '"status_str"'")
        printf "%s\n", status_str
        sleep()
    }
}

function temp() {
    while ("sensors -A coretemp-isa-0000"|getline) {
        if ($0 ~ /Package/) {
            sub("\\+","",$4)
            TEMP = "core: " $4
            break
        }
    }
    close("sensors -A coretemp-isa-0000")
    return TEMP
}

function uptime() {
    "uptime -p" |getline
    close("uptime -p")
    sub("up","&:")
    sub("ou","")
    sub("utes","")
    return $0
}

function load() {
    "uptime" |getline
    close("uptime")
    sub("^.*age:","load:")
    return $0
}

function date() {
    "date '+%a %b %d %Y | %I:%M.%S %p'" |getline
    close("date '+%a %b %d %Y | %I:%M.%S %p'")
    sub("\n","")
    return $0
}

function sleep () {
    return system("sleep 5");\
    close("sleep 5")
}

--
The script returns a status line that looks like this:
core: 33.0°C | load: 0.05, 0.10, 0.09 | Mon Feb 20 2023 | 10:44.49 AM

The commented-out xsetroot(1) line will eventually be used to write the
status area via the "-name" parameter; 'printf "%s\n", status_str' is
just for testing.

The questions:
All the functions just return "$0" and the script as written appears to
run fine. However I've also written a version that uses VARS in the
various functions, i.e.
--

function load_v() {
    "uptime" |getline LOAD
    close("uptime")
    sub("^.*age:","load:",LOAD)
    return LOAD
}

--

I did this because I noticed that, while omitting the function vars
mostly works, in some cases (splitting date and time into two functions,
for example) the 'return $0' is contaminated with data returned from
other functions. Is this because "$0" is ultimately a global variable
or something else, say, a lack of garbage collection?

In case it matters:
- OS system being used: Debian 11.x
- AWKs being used: gawk and mawk

Other questions:
- should 'load_v()' be 'load( LOAD)' ? Why?
- are all these close() calls really necessary?
- can this script be improved or streamlined further?

Regards,
jeorge

Janis Papanagnou

20 Feb 2023, 3:50:02 p.m.

It matters, e.g., for 'date', which gawk supports with built-in functions.
(I cannot tell about mawk's 'date' support.)

> - OS system being used: Debian 11.x
> - AWKs being used: gawk and mawk
>
> Other questions:
> - should 'load_v()' be 'load( LOAD)' ? Why?

Local variables (like LOAD) should be declared in the function argument
list to create a local variable instance (and not a global variable).

'getline var' is preferable to a plain 'getline' because it does not
overwrite $0 and leaves awk's native read loop intact.
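
The two points above can be seen side by side in a minimal sketch. It is not
taken from the original script: the names demo_getline, first_line and
first_line_bare are made up for illustration, and "echo" stands in for the
real commands so the result is predictable.

```sh
# demo_getline: contrast 'cmd | getline var' (local var) with a bare
# 'cmd | getline', which overwrites the global $0.
demo_getline() {
    echo "input record" | awk '
    # "line" is listed as an extra parameter, so it is local to the function.
    function first_line(cmd,    line) {
        cmd | getline line      # sets only "line"; $0 and NF stay untouched
        close(cmd)
        return line
    }
    # Bare getline: replaces the current record as a side effect.
    function first_line_bare(cmd) {
        cmd | getline
        close(cmd)
        return $0
    }
    {
        a = first_line("echo from-command")
        print "after getline var: $0 = " $0
        b = first_line_bare("echo from-command")
        print "after bare getline: $0 = " $0
    }'
}
```

Calling demo_getline prints the record once unchanged and once replaced by
the command output, which is the "contamination" asked about: $0 is a single
global record, not something private to each function.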

> - are all these close() calls really necessary?

close() is necessary for commands that need to be re-invoked anew. To
make that clear some examples...

Without a close(), every '"ps" | getline var' returns in var the next line
of the same 'ps' output. Likewise '"date" | getline var' keeps reading the
output of the one same 'date' call; that output is a single line, so
subsequent calls hit end-of-file and deliver nothing until the command is
closed.
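
A small sketch of that behaviour, using "echo tick" as a stand-in for 'date'
(the function name demo_close and the variable names are illustrative only).
It records the return value of each getline: 1 for a line read, 0 for
end-of-file.

```sh
# demo_close: show that a command stream must be closed before the
# command can be run again.
demo_close() {
    awk 'BEGIN {
        cmd = "echo tick"
        r1 = (cmd | getline v1)     # 1: one line read
        r2 = (cmd | getline v2)     # 0: same stream, already at end-of-file
        close(cmd)                  # forget the finished stream
        r3 = (cmd | getline v3)     # 1: the command is started anew
        close(cmd)
        print r1, r2, r3, v1, "[" v2 "]", v3
    }'
}
```

Running demo_close prints "1 0 1 tick [] tick": the second read gets nothing,
and only after close() does the same command string produce output again.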

> - can this script be improved or streamlined further?

To prevent code duplication and errors I'd put the commands as strings
and use, e.g.,
date_cmd = "date '+%a %b %d %Y | %I:%M.%S %p'"
date_cmd | getline
close (date_cmd)
(and similar for the other external commands, especially for those that
need a close()).

I'd use arguments for the functions, e.g. function sleep(sec), to make
them more universally usable in case of extensions.
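
Put together, the two suggestions might look like the sketch below. This is
one possible shape, not the poster's code: status_line, first_line and nap
are hypothetical names, and "date +%Y" is used only to keep the output short.

```sh
# status_line [SECONDS]: print one status field, then sleep SECONDS.
status_line() {
    awk -v secs="${1:-0}" '
    BEGIN {
        date_cmd = "date +%Y"     # stored once, reused for getline and close
        print "year: " first_line(date_cmd)
        nap(secs)
    }
    # Run cmd and return its first output line; "line" is a local.
    function first_line(cmd,    line) {
        cmd | getline line
        close(cmd)                # same string, so no risk of a mismatch
        return line
    }
    # Sleep for sec seconds instead of a hard-coded 5.
    function nap(sec) {
        return system("sleep " (sec + 0))   # + 0 forces a plain number
    }'
}
```

With the command held in one variable, the string passed to close() cannot
drift from the one used with getline, which is the duplication risk in the
original date() function.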

And it's not obvious to me why all this shell functionality is embedded
in an awk script, and what the awk code frame actually adds here.

Janis

>
> Regards,
> jeorge

jeorge

25 Feb 2023, 2:34:22 p.m.

Thanks for the feedback. Ya, embedding shell commands in AWK frame was
just a practice thing mostly; probably not much of an additional burden
on most modern computers.

jeorge

Ed Morton

26 Feb 2023, 4:43:36 p.m.

Embedding shell commands in AWK introduces a massive burden on any
computer, often turning tasks that should run in seconds or minutes into
tasks that take hours or days to run, due to awk having to create a
subshell each time it has to call such a command. Consider this with
just 1000 lines of input:

1) Call a GNU awk function to print the seconds since the epoch:

$ time seq 1000 | awk '{print systime()}' >/dev/null

real 0m0.040s
user 0m0.000s
sys 0m0.000s

2) Embed a shell command to do the same thing:

$ time seq 1000 | awk '{system("date +%s")}' >/dev/null

real 0m29.628s
user 0m0.420s
sys 0m2.410s

3) Doing the same thing in a shell loop (slow but still much faster than
calling it from awk):

$ time { seq 1000 | while IFS= read -r; do date +%s; done >/dev/null; }

real 0m17.796s
user 0m0.858s
sys 0m2.198s

Regards,

Ed.

jeorge

26 Feb 2023, 10:06:50 p.m.

On 2/26/23 2:43 PM, Ed Morton wrote:
> Embedding shell commands in AWK introduces a massive burden on any
> computer, often turning tasks that should run in seconds or minutes into
> tasks that take hours or days to run, due to awk having to create a
> subshell each time it has to call such a command. Consider this with
> just 1000 lines of input:
>
> 1) Call a GNU awk function to print the seconds since the epoch:
>
> $ time seq 1000 | awk '{print systime()}' >/dev/null
>
> real    0m0.040s
> user    0m0.000s
> sys     0m0.000s
>
>
> 2) Embed a shell command to do the same thing:
>
> $ time seq 1000 | awk '{system("date +%s")}' >/dev/null
>
> real    0m29.628s
> user    0m0.420s
> sys     0m2.410s
>
>
> 3) Doing the same thing in a shell loop (slow but still much faster than
> calling it from awk):
>
> $ time { seq 1000 | while IFS= read -r; do date +%s; done >/dev/null; }
>
> real    0m17.796s
> user    0m0.858s
> sys     0m2.198s

Hmm, I guess my computer is a bit faster:

$ time seq 1000 | awk '{print systime()}' >/dev/null

real 0m0.004s
user 0m0.003s
sys 0m0.002s

$ time seq 1000 | awk '{system("date +%s")}' >/dev/null

real 0m0.836s
user 0m0.782s
sys 0m0.099s

$ time { seq 1000 | while IFS= read -r; do date +%s; done >/dev/null; }

real 0m0.826s
user 0m0.676s
sys 0m0.218s

But I do take your point -- using systime() is over 200 times faster.
Probably one simply uses something other than awk when things need to
happen quickly/efficiently and awk lacks a built-in.

Looking at my practice script some of the data could be pulled from
/proc, i.e. load and uptime. Other things like core temp, pulled from
sensors(1), or battery charge, pulled from upower(1), might not be too
bad if only done every minute or so.
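
As a rough sketch of the /proc idea, the load figures can be read with
'getline var < file', which starts no external process at all. The function
name load_from and its optional path argument are assumptions added here so
the code can be tried on a sample file; /proc/loadavg itself exists on Linux
only.

```sh
# load_from [FILE]: print "load: a, b, c" from a loadavg-style file.
load_from() {
    awk -v path="${1:-/proc/loadavg}" 'BEGIN {
        if ((getline line < path) > 0) {      # > 0 means a line was read
            split(line, f, " ")               # first three fields are the loads
            printf "load: %s, %s, %s\n", f[1], f[2], f[3]
        } else {
            print "load: n/a"                 # missing or unreadable file
        }
        close(path)
    }'
}
```

The explicit "> 0" test matters because getline returns -1 when the file
cannot be opened, and -1 would count as true in a bare condition.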

Anyway, I appreciate the feedback. I should probably try to rein in my
compulsion to over-apply awk as I learn more about it.

jeorge

Janis Papanagnou

27 Feb 2023, 1:23:50 a.m.

Even though in this specific test case we only want the "seconds since
Epoch" that systime() returns, in the general case, to be fair, we
should compare the time functions with formatting included; instead
of

$ time seq 100000 | awk '{print systime()}' >/dev/null

better

$ time seq 100000 | awk '{print strftime("%s")}' >/dev/null

Otherwise we'd only measure the single specific case.

And since we're at it; values of magnitude "0m0.004s" might measure
just noise. To get a more accurate result the N of 'seq <N>' should
be made larger.
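
For the status script, the formatted variant would replace the external
'date' call. The sketch below is an illustration under one assumption:
strftime() is a gawk built-in and is not guaranteed in other awks, so the
hypothetical wrapper now_formatted falls back to date(1) when gawk is not
installed.

```sh
# now_formatted [FORMAT]: print the current time in strftime(3) format.
now_formatted() {
    fmt=${1:-%a %b %d %Y | %I:%M.%S %p}
    if command -v gawk >/dev/null 2>&1; then
        # gawk formats the time itself; no extra process per call
        gawk -v fmt="$fmt" 'BEGIN { print strftime(fmt) }'
    else
        # fallback for systems without gawk: one external process
        date "+$fmt"
    fi
}
```

The default format is the one from the original date() function, so
"now_formatted" alone yields the same kind of field as in the status line.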

Janis

Kenny McCormack

27 Feb 2023, 2:26:29 a.m.

In article <tth6o8$2o3p$1...@nnrp.usenet.blueworldhosting.com>,
jeorge <som...@invalid.invalid> wrote:
>> Embedding shell commands in AWK introduces a burden on small
>> computers, often turning tasks that should run in micro-seconds into
>> tasks that take seconds or minutes to run, due to awk having to create a
>> subshell each time it has to call such a command.

I have made some edits to the above text, to better reflect modern reality.
...

>Looking at my practice script some of the data could be pulled from
>/proc, i.e. load and uptime. Other things like core temp, pulled from
>sensors(1), or battery charge, pulled from upower(1), might not be too
>bad if only done every minute or so.

Given what I think I understand about your task(s), it'd probably be better
to just write it as a (bash) shell script. Modern bash has most of what
you need to do real scripting. About the only thing missing is decimal
(aka, floating point) arithmetic, and this can usually be easily done via a
"bc" co-process (or, in a pinch, a call to awk).

Note that many, but not all, of the things you list can be done "natively"
in gawk, using direct access to things in /proc and/or /sys, but it is
often easier and clearer to use the access tools mentioned above (the
various things listed with (1) after their names).
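
The "in a pinch, a call to awk" option for decimal arithmetic could look
like this minimal sketch; the helper name fadd is hypothetical, and two
decimal places are chosen arbitrarily.

```sh
# fadd A B: add two decimal numbers by delegating the arithmetic to awk.
fadd() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a + b }'
}
```

For example, "fadd 1.5 2.25" prints 3.75. Each call still starts one awk
process, so this suits occasional calculations rather than tight loops.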


Janis Papanagnou

27 Feb 2023, 3:08:47 a.m.

On 27.02.2023 08:26, Kenny McCormack wrote:
> In article <tth6o8$2o3p$1...@nnrp.usenet.blueworldhosting.com>,
> jeorge <som...@invalid.invalid> wrote:
>
>> [...]
>
> Given what I think I understand about your task(s), it'd probably be better
> to just write it as a (bash) shell script.

Yeah, that's what I also thought when I upthread asked for a rationale
of using awk as technical frame for a shell task. - I read the OP's
statements as if he's just experimenting with awk.

> Modern bash has most of what
> you need to do real scripting. About the only thing missing is decimal
> (aka, floating point) arithmetic, and this can usually be easily done via a
> "bc" co-process (or, in a pinch, a call to awk).

Note that the shell features are minimalistic here, so any standard
POSIX shell will do, and then you can use ksh (instead of bash) to
also do the FP arithmetic in the shell and avoid clumsy and inefficient
workarounds with external processes.

Janis

>
> [...]

Kpop 2GM

8 Mar 2023, 2:43:42 p.m.

> > […]

If you only care about Unix epochs numerically, and don't mind constantly resetting your rand() seed, then here's one way to extract the epoch within just about any awk, even those without systime():

for _____ in 1; do (____='($!NF = sprintf("%.0s%.*f", srand(), ___ = ( (__=srand()) ~ "#" ) * 6, substr(__, ++___)))^_'; for ___ in 'gawk -P' 'gawk -c' 'gawk -M' 'gawk -l time' 'gawk -be' nawk mawk1 mawk2 ; do echo " $___ ::\n\n$( (time ( jot 1000 | $(printf '%s' "$___" ) "$____" ) | gcat -b ) | gtail -n 3 )\n"; done; gawk -p- "$____" <<<'' ) done | gsed -zE 's/\t/ /g; s/ /. /g'

( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.01s user 0.00s system 88% cpu 0.018 total
gcat -b 0.00s user 0.00s system 7% cpu 0.018 total
( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.01s user 0.00s system 90% cpu 0.017 total
gcat -b 0.00s user 0.00s system 9% cpu 0.016 total
( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.48s user 0.01s system 99% cpu 0.484 total
gcat -b 0.00s user 0.00s system 0% cpu 0.484 total
gawk: warning: The time extension is obsolete. Use the timex extension from gawkextlib instead.
( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.01s user 0.00s system 89% cpu 0.017 total
gcat -b 0.00s user 0.00s system 8% cpu 0.017 total
( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.01s user 0.00s system 91% cpu 0.016 total
gcat -b 0.00s user 0.00s system 9% cpu 0.016 total
( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.00s user 0.00s system 102% cpu 0.007 total
gcat -b 0.00s user 0.00s system 20% cpu 0.007 total
( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.00s user 0.00s system 106% cpu 0.004 total
gcat -b 0.00s user 0.00s system 41% cpu 0.004 total
( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.00s user 0.00s system 103% cpu 0.006 total
gcat -b 0.00s user 0.00s system 22% cpu 0.006 total
gawk -P ::

. 998. . 1678303956
. 999. . 1678303956
. 1000. . 1678303956

gawk -c ::

. 998. . 1678303956
. 999. . 1678303956
. 1000. . 1678303956

gawk -M ::

. 998. . 1678303956
. 999. . 1678303956
. 1000. . 1678303956

gawk -l time ::

. 998. . 1678303956
. 999. . 1678303956
. 1000. . 1678303956

gawk -be ::

. 998. . 1678303956
. 999. . 1678303956
. 1000. . 1678303956

nawk ::

. 998. . 1678303956
. 999. . 1678303956
. 1000. . 1678303956

mawk1 ::

. 998. . 1678303956
. 999. . 1678303956
. 1000. . 1678303956

mawk2 ::

. 998. . 1678303956.854256
. 999. . 1678303956.854260
. 1000. . 1678303956.854263

1678303956
. . # gawk profile, created Wed Mar. 8 14:32:36 2023

. . # Rule(s)

. . 1. ($! NF = sprintf("%.0s%.*f", srand(), ___ = ((__ = srand()) ~ "#") * 6, substr(__, ++___))) ^ _ { # 1
. . 1. . . print
. . }

** srand() needs to be called twice since it only returns the previous seed
** mawk2 uniquely provides micro-second precision for platforms that support it. Floor it or round it to align with the others.
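
A readable sketch of the same srand() trick follows. It is a simplification,
not the one-liner above: epoch_via_srand is a hypothetical name, and the
prefix stripping is an assumption inferred from the one-liner's test for "#"
in the returned value.

```sh
# epoch_via_srand: print seconds since the epoch without systime().
epoch_via_srand() {
    awk 'BEGIN {
        srand()                  # seed from the time of day
        t = srand()              # returns the seed set by the previous call
        sub(/^[^0-9]*/, "", t)   # assumption: some awks prefix the value
        printf "%d\n", t         # drop any fractional part
    }'
}
```

As noted above, this reseeds rand() on every use, so it is a poor fit for a
script that also relies on a reproducible random sequence.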