
Implementing WHINY_USERS with asorti()


Kenny McCormack

Jul 1, 2010, 5:04:42 PM
As many of you know, GAWK does not do array sorting (N.B. In the way
that I define that term; you may well have your own definitions - you
are welcome, of course, to said own definitions) by default, but it
can be enabled by setting the WHINY_USERS variable. TAWK, of course,
does array sorting by default (and this can be turned off by setting
some other variable, if you so desire).

Anyway, normally, I set WHINY_USERS and everything's fine. However, I
found myself in a situation today where I could not do this, so I needed
to implement that functionality in terms of other GAWK builtins. This
seems to be the ticket:

{
    # Build up the array in the main loop
    A[whatever] = something
}
END {
    n = asorti(A,A1)
    for (i = 1; i<=n; i++)
        print A1[i],A[A1[i]]
}

Note that this (as far as I can tell) gets you the same functionality as
WHINY_USERS, but not quite the same (not quite as good) as TAWK, since
TAWK automatically sorts numbers numerically. To get this working in
GAWK, you have to zero-fill your indexes.

--
(This discussion group is about C, ...)

Wrong. It is only OCCASIONALLY a discussion group
about C; mostly, like most "discussion" groups, it is
off-topic Rorsharch [sic] revelations of the childhood
traumas of the participants...

Hermann Peifer

Jul 2, 2010, 4:18:32 AM
On 01/07/2010 23:04, Kenny McCormack wrote:
> As many of you know, GAWK does not do array sorting (N.B. In the way
> that I define that term; you may well have your own definitions - you
> are welcome, of course, to said own definitions) by default, but it
> can be enabled by setting the WHINY_USERS variable. TAWK, of course,
> does array sorting by default (and this can be turned off by setting
> some other variable, if you so desire).
>
> Anyway, normally, I set WHINY_USERS and everything's fine. However, I
> found myself in a situation today where I could not do this, so I needed
> to implement that functionality in terms of other GAWK builtins. This
> seems to be the ticket:
>
> {
>     # Build up the array in the main loop
>     A[whatever] = something
> }
> END {
>     n = asorti(A,A1)
>     for (i = 1; i<=n; i++)
>         print A1[i],A[A1[i]]
> }
>
> Note that this (as far as I can tell) gets you the same functionality as
> WHINY_USERS, but not quite the same (not quite as good) as TAWK, since
> TAWK automatically sorts numbers numerically. To get this working in
> GAWK, you have to zero-fill your indexes.
>

Kenny,

I am not quite sure what the question is, but the above solution is
already given in the GAWK manual:

--- snip ---

END {
    n = asorti(source, dest)
    for (i = 1; i <= n; i++) {
        do something with dest[i]          # Work with sorted indices directly
        ...
        do something with source[dest[i]]  # Access original array via sorted indices
    }
}

--- snip ---

http://www.gnu.org/manual/gawk/html_node/Array-Sorting.html#Array-Sorting

What could be mentioned is that neither WHINY_USERS nor asorti() sorting
is locale-aware.

Hermann

Aharon Robbins

Jul 2, 2010, 5:33:49 AM
In article <i0k7ck$1qm$1...@news.albasani.net>,

Hermann Peifer <pei...@gmx.eu> wrote:
>http://www.gnu.org/manual/gawk/html_node/Array-Sorting.html#Array-Sorting
>
>What could be mentioned is that neither WHINY_USERS nor asorti() sorting
>is locale-aware.
>
>Hermann

Good point. I'll try to remember to mention that.

FWIW, I think this is as it should be, since it guarantees consistency;
if someone really needs locale based sorting, they should just give
up and call the external sort utility.
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL

pk

Jul 2, 2010, 5:57:09 AM
Aharon Robbins wrote:

> In article <i0k7ck$1qm$1...@news.albasani.net>,
> Hermann Peifer <pei...@gmx.eu> wrote:
>>http://www.gnu.org/manual/gawk/html_node/Array-Sorting.html#Array-Sorting
>>
>>What could be mentioned is that neither WHINY_USERS nor asorti() sorting
>>is locale-aware.
>>
>>Hermann
>
> Good point. I'll try to remember to mention that.
>
> FWIW, I think this is as it should be, since it guarantees consistency;
> if someone really needs locale based sorting, they should just give
> up and call the external sort utility.

Somewhat related: is it expected that asorti() only sorts by string even if
the index has explicitly been given numeric type, e.g.

$ awk 'BEGIN{a[10+0];a[4+0];n=asorti(a);for(i=1;i<=n;i++)print a[i]}'
10
4

Seems to me this is not consistent with asort().

Hermann Peifer

Jul 2, 2010, 6:16:30 AM

"An important aspect about arrays to remember is that array subscripts
are always strings..."

http://www.gnu.org/manual/gawk/html_node/Numeric-Array-Subscripts.html#Numeric-Array-Subscripts

Hermann

Grant

Jul 2, 2010, 7:27:43 AM

Also, it's easy enough to force numeric sorting with some formatting
to add leading zeroes ;) They can be removed later, again with easy
formatting.

Grant.

Kenny McCormack

Jul 2, 2010, 10:59:54 AM
In article <i0kd5j$86l$1...@speranza.aioe.org>, pk <p...@pk.invalid> wrote:
...

>Somewhat related: is it expected that asorti() only sorts by string even if
>the index has explicitly been given numeric type, e.g.
>
>$ awk 'BEGIN{a[10+0];a[4+0];n=asorti(a);for(i=1;i<=n;i++)print a[i]}'
>10
>4
>
>Seems to me this is not consistent with asort().

Correct. Others have noted the reasons why this is so, but I think it
should be made clearer in the documentation (and by "documentation", I
mean "man gawk"). It says:

asort(s [, d])   Returns the number of elements in the source array s.
                 The contents of s are sorted using gawk's normal rules
                 for comparing values, and the indices of the sorted
                 values of s are replaced with sequential integers
                 starting with 1. If the optional destination array d
                 is specified, then s is first duplicated into d, and
                 then d is sorted, leaving the indices of the source
                 array s unchanged.

asorti(s [, d])  Returns the number of elements in the source array s.
                 The behavior is the same as that of asort(), except
                 that the array indices are used for sorting, not the
                 array values. When done, the array is indexed
                 numerically, and the values are those of the original
                 indices. The original values are lost; thus provide a
                 second array if you wish to preserve the original.

Note the second sentence under "asorti". It should say "The behavior is
the same ... except that ... and the sorting is always done in string
mode". For that matter, the previous paragraph ("asort") should be a
little clearer (and less "Unix man speak") about the fact that asort
*does* do correct numeric sorting.

pk

Jul 2, 2010, 11:08:18 AM
Kenny McCormack wrote:

> asorti(s [, d])  Returns the number of elements in the source array s.
>                  The behavior is the same as that of asort(), except
>                  that the array indices are used for sorting, not the
>                  array values. When done, the array is indexed
>                  numerically, and the values are those of the original
>                  indices. The original values are lost; thus provide a
>                  second array if you wish to preserve the original.
>
> Note the second sentence under "asorti". It should say "The behavior is
> the same ... except that ... and the sorting is always done in string
> mode".

Yes, agreed, and in case WHINY_USERS gets documented some day, it should be
noted there as well (I think it's the same underlying mechanism).

> For that matter, the previous paragraph ("asort") should be a
> little clearer (and less "Unix man speak") about the fact that asort
> *does* do correct numeric sorting.

It does if awk thinks that the elements are numeric, otherwise it does
string sort (but this is the same dual behavior awk has almost everywhere).

Aharon Robbins

Jul 4, 2010, 1:51:19 PM
I thought the text was pretty clear but perhaps I'm biased. A patch
with suggested wording changes will be welcomed.

Arnold

In article <i0kuta$45d$2...@news.xmission.com>,
