Re: Why the gawk has asort and asorti instead of only supplying the asorti to users?

Kenny McCormack

Oct 1, 2016, 12:09:56 PM
In article <nsoj5l$2ci$1...@dont-email.me>,
Hongyi Zhao <hongy...@gmail.com> wrote:
>Hi all,
>
>From the descriptions of the manual of gawk, it seems all of the
>functions done by asort also can be done by asorti when using the
>optional string how -- ie., valid values for any of the strings valid
>for PROCINFO["sorted_in"].

You've misunderstood.

Basically, the asort/asorti functions are obsolete. Think of them as "Try #1"
at providing the "array sorting" capability. Now that we can do in GAWK
what we could always do in TAWK, there's not really any need to think about
asort/asorti.

To be more explicit about this, the PROCINFO["sorted_in"] method affects
how the "for (i in A) ..." construct works. It has nothing to do with how
asort/asorti work.

--
"You can safely assume that you have created God in your own image when
it turns out that God hates all the same people you do." -- Anne Lamott

Kaz Kylheku

Oct 1, 2016, 12:34:32 PM
On 2016-10-01, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> In article <nsoj5l$2ci$1...@dont-email.me>,
> Hongyi Zhao <hongy...@gmail.com> wrote:
>>Hi all,
>>
>>From the descriptions of the manual of gawk, it seems all of the
>>functions done by asort also can be done by asorti when using the
>>optional string how -- ie., valid values for any of the strings valid
>>for PROCINFO["sorted_in"].
>
> You've misunderstood.

So did you. Hongyi only said that the sorting method argument in
asorti takes the same strings as can be stored in PROCINFO["sorted_in"],
not that he is storing anything there, expecting it to affect those
functions.

Thomas 'PointedEars' Lahn

Oct 1, 2016, 2:42:41 PM
Kenny McCormack wrote in <news:comp.unix.shell>:

> Hongyi Zhao <hongy...@gmail.com> wrote:
>> From the descriptions of the manual of gawk, it seems all of the
>> functions done by asort also can be done by asorti when using the
>> optional string how -- ie., valid values for any of the strings valid
>> for PROCINFO["sorted_in"].
>
> You've misunderstood.
>
> Basically, the asort/asorti functions are obsolete. Think of them as "Try
> #1" at providing the "array sorting" capability.

How did you get that idea?

> Now that we can do in GAWK what we could always do in TAWK, there's not
> really any need to think about asort/asorti.

What “could [we] always do” in TAWK?

> To be more explicit about this, the PROCINFO["sorted_in"] method affects
> how the "for (i in A) ..." construct works. It has nothing to do with how
> asort/asorti work.

Indeed. And assigning a value to PROCINFO["sorted_in"] affects for-in
iteration of *all* arrays after that point.

F'up2 <news:comp.lang.awk>

--
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.

Thomas 'PointedEars' Lahn

Oct 1, 2016, 10:21:35 PM
Hongyi Zhao wrote in <news:comp.unix.shell>:

> On Sat, 01 Oct 2016 20:23:08 +0200, Thomas 'PointedEars' Lahn wrote:
>
> Thanks a lot for your help.

You’re welcome.

> Now, another problem confusing me:
>
>> $ echo | gawk '{
>
> Above code can also be done with:
>
> $ gawk 'BEGIN{

Yes, thanks. I was trying to get rid of the pipe with here-doc but to no
avail; so I left the example this way.

> But, why your original method with `echo' also work?

Without file arguments, (g)awk reads from the standard input stream. Not
specifying a pattern for an action is equivalent to executing that action for
every line of input. (Not specifying an action for a pattern is equivalent
to '{ print }'.) See the section “PATTERNS AND ACTIONS” in the gawk man
page.

“echo” without arguments and redirection writes one empty line to the
standard output stream (it writes one newline character), so this code is
executed exactly once, as if the “BEGIN” pattern had been used.

You should post (g)awk questions in <news:comp.lang.awk> instead. X-Post &
F'up2 set.

Hongyi Zhao

Oct 2, 2016, 1:30:16 AM
On Sun, 02 Oct 2016 04:21:33 +0200, Thomas 'PointedEars' Lahn wrote:

> “echo” without arguments and redirection writes one empty line to the
> standard output stream (it writes one newline character), so this code
> is executed exactly once, as if the “BEGIN” pattern had been used.

Although echo does the job, if I feed an empty file to gawk for this
test, it fails:

werner@debian-01:~$ echo > 111
werner@debian-01:~$ gawk '{
    a[2] = 4;
    a[12] = 19;
    asort(a, values);
    for (i in values) { print i " => " values[i]; }
}' 111
1 => 4
2 => 19
werner@debian-01:~$ touch 222
werner@debian-01:~$ gawk '{
    a[2] = 4;
    a[12] = 19;
    asort(a, values);
    for (i in values) { print i " => " values[i]; }
}' 222
werner@debian-01:~$

Why?

Regards
--
.: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.

Janis Papanagnou

Oct 2, 2016, 1:42:00 AM
A {...} clause with an implicit true condition will be executed for every
line of a file. With echo used as above, you create a file with one empty
line, and with touch you create a file without any lines; so in the latter
case the clause will not be executed.

I suggest not using echo if you don't want to process an [empty] line;
use the BEGIN {...} clause instead for a procedural, non-data-driven sample
like the one you have here.
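
The difference can be checked directly with a small sketch. count_runs is a made-up helper name, and printf '' stands in for the empty file created by touch:

```shell
# Count how many times a pattern-less action runs for the input on stdin.
count_runs() {
    awk '{ n++ } END { print n + 0 }'
}

echo | count_runs        # input is one empty line: the action runs once
printf '' | count_runs   # input has no lines at all: the action never runs

# BEGIN does not depend on input, so it runs exactly once either way.
awk 'BEGIN { print "ran without reading any input" }'
```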

Janis

>
> Regards
>

Hongyi Zhao

Oct 2, 2016, 1:56:32 AM
On Sun, 02 Oct 2016 07:41:59 +0200, Janis Papanagnou wrote:

> A {...} clause with an implicit true condition will be executed for
> every line of a file. With echo used as above, you create a file with
> one empty line, and with touch you create a file without any lines; so
> in the latter case the clause will not be executed.
>
> I suggest not using echo if you don't want to process an [empty] line;
> use the BEGIN {...} clause instead for a procedural, non-data-driven
> sample like the one you have here.
>
> Janis

Thanks a lot.

Marc de Bourget

Oct 6, 2016, 4:53:26 PM
Not exactly the same topic, but I tested GAWK a bit with and without
PROCINFO["sorted_in"] = "@ind_num_asc".

I've noticed that
    for (i in ARRAY)
        print i
yields the same results (numerical order) with or without PROCINFO
if the array indices (i) are only numbers.
Am I right, and what are the exact rules if PROCINFO["sorted_in"] is not set?
Is this new behaviour?

Input file:
1
20
4
11
33
2
5
3
18
9

Output file:
1
2
3
4
5
9
11
18
20
33

Kaz Kylheku

Oct 6, 2016, 5:06:07 PM
On 2016-10-06, Marc de Bourget <marcde...@gmail.com> wrote:
> Not exactly the same topic, but I tested a bit GAWK with and without
> PROCINFO["sorted_in"] = "@ind_num_asc"
>
> I've noticed that
> for (i in ARRAY)
> print i
> yields the same results (numerical order) with or without PROCINFO
> if the array indices (i) are only numbers.

Arrays are associative structures in Awk, probably implemented
as hash tables. It's conceivable that consecutive integer objects
hash in a way so they end up ordered in the hash table in some
predictable patterns.

You have not tried a large enough range of numbers to
see the discontinuities in the hashing:

$ awk 'BEGIN { for (i = 0; i < 1000; i++) a[i] = i; for (i in a) print i; }'
[ ... big snip ... ]
166
125
0 <-- here is zero, finally
167
126
1 <-- 1 is ordered after zero
168
127
2 <---
169
128
3 <---
129
4 }
5 |
6 > <--- consecutive run ..
7 |
8 |
9 }
860
[ ... big snip ... ]
10 <---
795
754
713
480
93
52
11 <---
796
755
714
481
440
94
53
12 <---
797
756
715
482
441
400
95
54
13 <---

Marc de Bourget

Oct 6, 2016, 5:17:11 PM
Hi Kaz,

my output is properly ordered from 0 to 999 :-)
GNU Awk 4.1.4, MINGW version from Eli.
https://sourceforge.net/projects/ezwinports/files/
BEGIN {
for (i = 0; i < 1000; i++) a[i] = i; for (i in a) print i >"out.txt"
}

Ed Morton

Oct 6, 2016, 5:40:14 PM
On 10/6/2016 4:17 PM, Marc de Bourget wrote:
<snip>
> my output is properly ordered from 0 to 999 :-)
> GNU Awk 4.1.4, MINGW version from Eli.
> https://sourceforge.net/projects/ezwinports/files/
> BEGIN {
> for (i = 0; i < 1000; i++) a[i] = i; for (i in a) print i >"out.txt"
> }
>

It's a fluke, don't rely on it. See
https://www.gnu.org/software/gawk/manual/gawk.html#Scanning-an-Array.

Ed.

Andrew Schorr

Oct 6, 2016, 10:56:20 PM
Try starting at -1 and see what happens. I think the behavior you're seeing is an artifact of the data structure used for arrays whose subscripts are non-negative integers. It's a side effect of the optimized implementation.

bash-4.3$ ./gawk 'BEGIN { for (i = 0; i < 10; i++) a[i]; for (i in a) print i} '
0
1
2
3
4
5
6
7
8
9

bash-4.3$ ./gawk 'BEGIN { for (i = -1; i < 10; i++) a[i]; for (i in a) print i} '
0
7
4
5
-1
8
3
6
2
1
9

As you can see, the integer array implementation (which also supports negative integers) does not have this behavior.

Regards,
Andy

Marc de Bourget

Oct 20, 2016, 12:06:09 PM
Hi Andy,

thank you and sorry for the late reply, I was on holiday.
Yes, the behaviour is as described. For some reason positive integers
are ordered properly even without the help of PROCINFO["sorted_in"].
For ordering negative integers correctly,
PROCINFO["sorted_in"] = "@ind_num_asc" has to be used.

Another hint. I have tried this:
BEGIN {
    PROCINFO["sorted_in"] = "@ind_num_asc"

    b["8"] = "1"
    b["2"] = "1"
    b["m"] = "1"
    b["-1"] = "1"
    b["11"] = "1"
    b["1"] = "1"
    b["z"] = "1"
    b["a"] = "1"
    b["9"] = "1"
    for (i in b)
        print i
}

The result is:
-1
a
m
z
1
2
8
9
11

Wouldn't it be nicer if the result was:
-1
1
2
8
9
11
a
m
z

Kenny McCormack

Oct 20, 2016, 12:21:44 PM
In article <621bb39a-ef56-473e...@googlegroups.com>,
Marc de Bourget <marcde...@gmail.com> wrote:
...
>The result is:
>-1
>a
>m
>z
>1
>2
>8
>9
>11

I assume (without testing) that when you tell it to sort numerically, that
means that each thing which is to be sorted must be converted to a number
first. In which case, all of your alphabetics get converted to 0 (which is
between -1 and 1, as shown above).

>Wouldn't it be nicer if the result was:
>-1
>1
>2
>8
>9
>11
>a
>m
>z

I suppose, but you'll have to go back to alphabetical sorting, if you want
alphabetics to sort predictably.

--
Christianity is not a religion.

- Rick C Hodgin -

Kaz Kylheku

Oct 20, 2016, 12:35:18 PM
On 2016-10-20, Marc de Bourget <marcde...@gmail.com> wrote:
> Wouldn't it be nicer if the result was:
> -1
> 1
> 2
> 8
> 9
> 11
> a
> m
> z

I got your back.

$ txr
This is the TXR Lisp interactive listener of TXR 154.
Use the :quit command or type Ctrl-D on empty line to exit.
1> (sort '("a" 2 8 9 "z" -1 "m" 11 1))
(-1 1 2 8 9 11 "a" "m" "z")

Ed Morton

Oct 20, 2016, 5:38:19 PM
No, the current ordering makes perfect sense and is consistent with other tools:

$ cat file
8
2
m
-1
11
1
z
a
9

$ sort -n file
-1
a
m
z
1
2
8
9
11

Regards,

Ed.

Marc de Bourget

Oct 25, 2016, 11:13:51 AM
Not sure if it is perfect, but it is generally really good.
In any case it is better than the default PROCINFO["sorted_in"]="@unsorted"

So wouldn't it be nice to make PROCINFO["sorted_in"] = "@ind_num_asc"
the default for GAWK?

Can we

Ed Morton

Oct 25, 2016, 7:30:06 PM
No, that's arguably less likely to be what's desired than, say, "@ind_str_asc",
since array indices are strings and that's the default for "sort". But by far the
MOST important thing for a loop on the indices is to execute as fast as possible
by default, which I assume is what the current hash order provides. If you
HAVE to impose a different order on it, then you are taking a performance hit and
need to write SOMETHING to enable that functionality, so it may as well be a
statement of the sort order, with no default.

Ed.

>
> Can we
>
