
Numeric array index sort for gawk

jh

Sep 2, 2008, 10:31:58 PM
I need to sort array indices in numeric order. However, all array
indices are strings, and "... integer values are always converted to
strings as integers, no matter what the value of CONVFMT may happen to
be...", so asorti() sorts integer indices in alphabetic order, not
numeric order. In other words, this program:


BEGIN {
    pos[2] = "a"
    pos[10] = "a"
    pos[30] = "a"
    pos[110] = "a"
    n = asorti(pos, sorted)
    for (i = 1; i <= n; i++) print sorted[i]
}

prints:

10
110
2
30

I want:

2
10
30
110

It can probably be hacked by making the integers into floats and using
the right CONVFMT, but it would look messy.

Instead, here are 3 small functions that provide ascending and
descending numeric array index sorts using the external "sort" command
and gawk's co-process operator, so there are no temporary files or
extra arrays.


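# Ascending numeric sort of arr1's indices into arr2; returns the count.
function asortina(arr1, arr2,    cmd) {
    cmd = "sort -n"
    return __asort(arr1, arr2, cmd)
}

# Descending numeric sort of arr1's indices into arr2; returns the count.
function asortind(arr1, arr2,    cmd) {
    cmd = "sort -nr"
    return __asort(arr1, arr2, cmd)
}

# Feed the indices to the co-process, then read them back in sorted order.
function __asort(arr1, arr2, cmd,    i, n, m) {
    n = 0
    for (i in arr1) print i |& cmd
    close(cmd, "to")    # send EOF so sort can start writing
    while ((cmd |& getline m) > 0)
        arr2[++n] = m
    close(cmd, "from")
    return n
}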

Jim Hart

pk

Sep 3, 2008, 6:47:27 AM

Or just by doing

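BEGIN {
    pos[2] = "a"
    pos[10] = "a"
    pos[30] = "a"
    pos[110] = "a"

    # i+0 stores each index as a number, so asort() compares numerically
    for (i in pos) sorted[j++] = i+0
    n = asort(sorted)
    for (i = 1; i <= n; i++) print sorted[i]
}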

jh

Sep 3, 2008, 7:22:30 AM
That's much better. Thank you!

Kenny McCormack

Sep 3, 2008, 8:11:57 AM
In article <faCdnaKPtpfr7iPV...@neonova.net>,

The man page does not make it that clear (yes, I know it can be read in,
but it is not stated explicitly) that different rules apply for the
sorting algorithms of asort() and asorti(). But it does seem to be true.

Manuel Collado

Sep 3, 2008, 2:34:13 PM
Kenny McCormack wrote:

I assume the sorting algorithm is the same, but asort sorts values of
mixed types (numbers and/or strings) while asorti sorts index values,
which are always strings. The latter is documented in section "7.7 Using
Numbers to Subscript Arrays" of "GAWK: Effective AWK Programming". But
you probably already know that.

--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado

Kenny McCormack

Sep 3, 2008, 3:29:04 PM
In article <g9mlg1$6qd$1...@heraldo.rediris.es>,
Manuel Collado <m.co...@lml.ls.fi.upm.es> wrote:
...

>I assume the sorting algorithm is the same, but asort sorts values of
>mixed types (numbers and/or strings) while asorti sorts index values,
>that are always strings. The latter is documented in section "7.7 Using
>Numbers to Subscript Arrays" of "GAWK: Effective AWK Programming". But
>you probably already know that.

Yes. I was just making the (small) point that it could be made more
explicit.

Note, BTW, and apropos of nothing, that TAWK sorts numeric indices as
numbers. I.e., if you do "for i in A", it looks at the indices, and, if
they are all numbers, sorts accordingly. Just another fine TAWK feature...

GAWK *could* (but doesn't) do the same thing.

Anton Treuenfels

Sep 3, 2008, 11:10:26 PM

"Kenny McCormack" <gaz...@shell.xmission.com> wrote in message
news:g9moi0$dja$1...@news.xmission.com...

> Note, BTW, and apropos of nothing, that TAWK sorts numeric indices as
> numbers. I.e., if you do "for i in A", it looks at the indices, and, if
> they are all numbers, sorts accordingly. Just another fine TAWK feature...

Yes. Beyond that, if you happen to know that the indices are consecutive
integers, you can skip invoking the sort by using "for (i = min; i <= max;
i++)" instead of "for (i in A)". That "in" keyword invokes the sort, which
loses you time if you don't need that done.

You can gain even if numeric indices are not consecutive, just monotonically
increasing (or decreasing). Just skip the indices that aren't there: "if
(!(i in A)) continue".
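
The skip-the-gaps loop might look like the sketch below. It uses only standard awk, so it is not specific to TAWK or gawk; the array contents and the min and max bounds are assumed for the example.

```shell
out=$(awk 'BEGIN {
    A[2] = "a"; A[10] = "b"; A[30] = "c"; A[110] = "d"
    min = 1; max = 110    # assumed known bounds on the indices

    # Count upward and skip the gaps; (i in A) tests membership
    # without creating the element.
    for (i = min; i <= max; i++) {
        if (!(i in A)) continue
        print i, A[i]
    }
}')
printf '%s\n' "$out"
```

The loop visits every integer between the bounds, so it pays off when the indices are reasonably dense; for a handful of indices spread over a huge range, sorting them is likely cheaper.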

You can even be really really tricky and use the presence or absence of a
particular numeric index to tell you meta information about the data that
is/is not at that index, but that's really straying far off the point.

- Anton Treuenfels

