
converting list to table


jay

Nov 5, 2009, 9:34:50 AM
Hi All,

I'm new to using a newsgroup for help, but I have exhausted my online
searching for a solution.

I have a long list of associations between pairs of values, each with
a hit count, as follows:

(var) to (var) = (var) hits
A B 1
B A 1
A B 3
B A 3
A C 7
C A 2

And need to build a table as follows that accumulates the above:

row is (from)
column is (to)

  A B C
A 0 4 7
B 4 0 0
C 2 0 0

I just can't seem to figure out how to manage this programmatically.

Any help or guidance is much appreciated!


Cheers,


Jay

Ed Morton

Nov 5, 2009, 9:51:29 AM
jay wrote:
> Hi All,
>
> Am new to using a newsgroup for help, but have exhausted my searching
> online for a solution
>
> I have a long list of data of associations between values with a value
> to that association as follows..
>
> (var) to (var) = (var) hits
> A B 1
> B A 1
> A B 3
> B A 3
> A C 7
> C A 2
>
> And need to build a table as follows that accumulates the above:
>
> row is (from)
> column is (to)
>
>   A B C
> A 0 4 7
> B 4 0 0
> C 2 0 0
>

$ cat file


A B 1
B A 1
A B 3
B A 3
A C 7
C A 2

$ cat tst.awk
{ hits[$1,$2]+=$3; keys[$1]; keys[$2] }
END{
    printf "%1s",""
    for (col in keys) {
        printf "%2s",col
    }
    print ""
    for (row in keys) {
        printf "%1s",row
        for (col in keys) {
            printf "%2d",hits[row,col]
        }
        print ""
    }
}

$ awk -f tst.awk file


  A B C
A 0 4 7
B 4 0 0
C 2 0 0

Note that getting the output alphabetically sorted isn't guaranteed: the
order in which "for (x in array)" visits the keys is unspecified. See
the "in" operator in the awk man pages.
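
If a fixed order matters, one portable option is to copy the keys into a numerically indexed array and sort that before printing. The following is only a sketch under that assumption, not part of the script above: the shell wrapper name pivot and the array name sorted are made up for illustration, and the insertion sort is just one simple choice that works in any POSIX awk.

```shell
# Hypothetical wrapper: same accumulation as tst.awk, but with explicitly sorted keys.
pivot() {
    awk '
    { hits[$1,$2] += $3; keys[$1]; keys[$2] }
    END {
        # Copy the keys into a 1..n indexed array (traversal order does not matter here).
        n = 0
        for (k in keys) sorted[++n] = k
        # Insertion sort; appending "" forces a string comparison.
        for (i = 2; i <= n; i++) {
            v = sorted[i]
            for (j = i - 1; j >= 1 && (sorted[j] "") > (v ""); j--)
                sorted[j + 1] = sorted[j]
            sorted[j + 1] = v
        }
        # Header row, then one row per key, both in sorted order.
        printf "%1s", ""
        for (c = 1; c <= n; c++) printf "%2s", sorted[c]
        print ""
        for (r = 1; r <= n; r++) {
            printf "%1s", sorted[r]
            for (c = 1; c <= n; c++) printf "%2d", hits[sorted[r], sorted[c]]
            print ""
        }
    }' "$@"
}

printf 'A B 1\nB A 1\nA B 3\nB A 3\nA C 7\nC A 2\n' | pivot
```

Because both the header and the rows are driven by the same sorted list, the layout no longer depends on how a given awk arranges the array internally. gawk 4.0 and later also offer PROCINFO["sorted_in"] to control traversal order, but that is gawk-specific.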

Ed.

pk

Nov 5, 2009, 10:19:03 AM
Ed Morton wrote:

> $ cat tst.awk
> { hits[$1,$2]+=$3; keys[$1]; keys[$2] }
> END{
> printf "%1s",""
> for (col in keys) {
> printf "%2s",col
> }
> print ""
> for (row in keys) {
> printf "%1s",row
> for (col in keys) {
> printf "%2d",hits[row,col]
> }
> print ""
> }
> }

This is something I've always wondered. Of course the order in which items
are returned when doing "for (i in array)" is undefined.
But is it guaranteed to be the same in subsequent runs (if the array isn't
changed, of course)? I don't see any reason why it shouldn't (and probably
all the implementations do), but then there's nothing that requires that
AFAICT.
I suppose relying on that would be relying on the implementation behaving
that way.

(I'm just curious, not argumentative)

pk

Nov 5, 2009, 10:29:36 AM
pk wrote:

> Ed Morton wrote:
>
>> $ cat tst.awk
>> { hits[$1,$2]+=$3; keys[$1]; keys[$2] }
>> END{
>> printf "%1s",""
>> for (col in keys) {
>> printf "%2s",col
>> }
>> print ""
>> for (row in keys) {
>> printf "%1s",row
>> for (col in keys) {
>> printf "%2d",hits[row,col]
>> }
>> print ""
>> }
>> }
>
> This is something I've always wondered. Of course the order in which items
> are returned when doing "for (i in array)" is undefined.
> But is it guaranteed to be the same in subsequent runs (if the array isn't
> changed, of course)?

Note: by "subsequent runs" here I meant doing "for (i in array)" again later
in the same program, not subsequent runs of the program.

Ed Morton

Nov 5, 2009, 11:16:21 AM
> in the same program, not subsequent runs of the program.

POSIX doesn't specify it, but the GNU awk manual says:

"The order in which elements of the array are accessed by this
statement is determined by the internal arrangement of the array
elements within awk and cannot be controlled or changed."

so unless the array gets re-organised (somehow???) between "for ... in"
loops, the order will be consistent. I think a lot of awk scripts
would break if the order could change.
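
A quick way to check this on a given awk is to traverse an unmodified array twice and compare the key sequences. This is only an empirical sketch, not a guarantee from POSIX; the wrapper name same_order and the sample keys are made up for illustration.

```shell
# Hypothetical check: do two traversals of an untouched array visit keys in the same order?
same_order() {
    awk 'BEGIN {
        arr["foo"] = 3; arr["bar"] = 7; arr["baz"] = 11
        for (i in arr) first = first " " i      # first traversal
        for (i in arr) second = second " " i    # second traversal, array not modified
        print (first == second ? "same" : "different")
    }'
}

same_order
```

A "same" result only says something about the awk it was run on; it cannot prove that every implementation behaves this way.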

Ed.

Aharon Robbins

Nov 5, 2009, 3:00:14 PM
In article <hcur02$ple$1...@aioe.org>, pk <p...@pk.invalid> wrote:
>> This is something I've always wondered. Of course the order in which items
>> are returned when doing "for (i in array)" is undefined.
>> But is it guaranteed to be the same in subsequent runs (if the array isn't
>> changed, of course)?
>
>Note: by "subsequent runs" here I meant doing "for (i in array)" again later
>in the same program, not subsequent runs of the program.

As Ed said, if nothing changes the array in the meantime, you should get the
same order. It's just a hash table. For gawk, before starting a for loop,
gawk allocates an array of pointers whose size is the number of elements
in the array, loops over the hash chains filling in the array, and then
runs the loop body.

Long ago it did things differently, but that implementation led to serious
problems.

Arnold
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL

Kenny McCormack

Nov 6, 2009, 11:01:15 AM
In article <a2d3dfd9-b637-4632...@j4g2000yqe.googlegroups.com>,
Ed Morton <morto...@gmail.com> wrote:
...

>"The order in which elements of the array are accessed by this
>statement is determined by the internal arrangement of the array
>elements within awk and cannot be controlled or changed."

If the GAWK manual says this, it is in error. You and I both know how
to fix this problem, entirely within the confines of GAWK as publicly
released.

>so unless the array gets re-organised (somehow???)between "for ... in"
>loops, the order will be consistent. I think a lot of awk scripts
>would break if the order could change.

I'm a little surprised at this comment. I'm curious which sorts of
scripts you have in mind here. I've written lots of AWK scripts and I
can't think of ever having to iterate over a given array more than once
(without, of course, reloading the array in between).

Ed Morton

Nov 6, 2009, 11:39:14 AM
On Nov 6, 10:01 am, gaze...@shell.xmission.com (Kenny McCormack)
wrote:
> In article <a2d3dfd9-b637-4632-aee8-c5dc1cd90...@j4g2000yqe.googlegroups.com>,

> Ed Morton  <mortons...@gmail.com> wrote:
> ...
>
> >"The order in which elements of the array are accessed by this
> >statement is determined by the internal arrangement of the array
> >elements within awk and cannot be controlled or changed."
>
> If the GAWK manual says this, it is in error.  You and I both know how
> to fix this problem, entirely within the confines of GAWK as publicly
> released.

Actually, I don't know if WHINY_USERS controls the order of access of
the "in" operator or the way in which the elements are arranged
internally, so the above statement might be correct.

> >so unless the array gets re-organised (somehow???)between "for ... in"
> >loops, the order will be consistent. I think a lot of awk scripts
> >would break if the order could change.
>
> I'm a little surprised at this comment.  I'm curious which sorts of
> scripts you have in mind here.  I've written lots of AWK scripts and I
> can't think of ever having to iterate over a given array more than once
> (without, of course, reloading the array in between).

One example, printing headers then contents:

BEGIN{
    arr["foo"]=3
    arr["bar"]=7
    for (i in arr) {
        printf "%5s",i
    }
    print ""
    for (i in arr) {
        printf "%5s",arr[i]
    }
    print ""
}

Ed.

Kenny McCormack

Nov 8, 2009, 10:29:15 AM
In article <a0ad7238-635b-4f73...@g31g2000vbr.googlegroups.com>,

Ed Morton <morto...@gmail.com> wrote:
>On Nov 6, 10:01 am, gaze...@shell.xmission.com (Kenny McCormack)
>wrote:
>> In article <a2d3dfd9-b637-4632-aee8-c5dc1cd90...@j4g2000yqe.googlegroups.com>,
>> Ed Morton <mortons...@gmail.com> wrote:
>> ...
>>
>> >"The order in which elements of the array are accessed by this
>> >statement is determined by the internal arrangement of the array
>> >elements within awk and cannot be controlled or changed."
>>
>> If the GAWK manual says this, it is in error. You and I both know how
>> to fix this problem, entirely within the confines of GAWK as publicly
>> released.
>
>Actually, I don't know if WHINY_USERS controls the order of access of
>the "in" operator or the way in which the elements are arranged
>internally so the above statement might be correct.

I think we're parsing words here. I think it is pretty clear that
WHINY_USERS does not affect the internal storage order of GAWK arrays.
It only affects the traversal order of the "for ... in" construct.

I realize (I think. That is, I am assuming that I am parsing what you
wrote as you intended) that you are talking at the "standardsese" level,
not at the "actual reality" level, but even so, some source code
comments are in order:

1) I personally added array sorting to mawk (only for personal use;
   did not put it into the "official" sources). It was quite
   simple once I figured out where to fix it (and that wasn't very
   hard either). Basically added about 5 lines of code, which
   consisted of a call to "qsort()" on the indices array (in the C
   code). I did this because I needed a version of AWK that did
   array sorting on Linux (everywhere else that I use AWK, I can
   use TAWK).

2) Sometime later (after doing the mawk fix), I learned about GAWK's
   WHINY_USERS feature. Checking the sources showed that they had
   done pretty much the same thing I had done (with mawk). If you
   are interested, search the GAWK sources for "Shazzam!" (or
   however it is spelled).

So, it is pretty clear that that's the way to do it - sort the traversal
array (after having assembled it by iterating through the AWK array).

>> >so unless the array gets re-organised (somehow???)between "for ... in"
>> >loops, the order will be consistent. I think a lot of awk scripts
>> >would break if the order could change.
>>
>> I'm a little surprised at this comment. I'm curious which sorts of
>> scripts you have in mind here. I've written lots of AWK scripts and I
>> can't think of ever having to iterate over a given array more than once
>> (without, of course, reloading the array in between).
>
>One example, printing headers then contents:
>
>BEGIN{
> arr["foo"]=3
> arr["bar"]=7
> for (i in arr) {
> printf "%5s",i
> }
> print ""
> for (i in arr) {
> printf "%5s",arr[i]
> }
> print ""
>}

Interesting. FWIW, I don't see the point in printing out data elements
for human consumption in an (apparently) random order. I mean, I do see
your point - that if you print out the headers in an (apparently) random
order, then you need to print out the data in the same (apparently) random
order. It is just that in real life, you will probably want explicit
control over the ordering.
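
One way to get that explicit control in any awk is to record the keys in the order they are first stored and loop over that list instead of using "for ... in". This is a sketch only: the function put, the array order, and the shell wrapper ordered_report are hypothetical names, not anything from the scripts above.

```shell
# Hypothetical wrapper: print headers then contents in insertion order.
ordered_report() {
    awk '
    # Store a value and remember the position at which its key was first seen.
    function put(k, v) {
        if (!(k in arr)) order[++n] = k
        arr[k] = v
    }
    BEGIN {
        put("foo", 3)
        put("bar", 7)
        for (i = 1; i <= n; i++) printf "%5s", order[i]        # headers
        print ""
        for (i = 1; i <= n; i++) printf "%5s", arr[order[i]]   # contents
        print ""
    }'
}

ordered_report
```

Here both loops walk order[1..n], so headers and values line up, and the columns appear in the order the keys were added rather than in hash order.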
