Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

transpose row and column

51 views
Skip to first unread message

T. C.

unread,
Nov 4, 2010, 4:22:41 PM11/4/10
to
Hi,
I have a data file as:

group id price1 price2
-------------------------------------------------------------
a 1 20 10
b 1 21 15
c 1 30 26
d 1 19 12
a 2 0.5 0.6
b 2 0.1 0.9
c 2 0.3 1.1
d 2 0.4 0.8
a 3 : :
b 3 : :
c 3 : :
d 3 : :

I would like to have output as:

id a b c d
1 20 10 21 15 30 26 19 12
2 0.5 0.6 0.1 0.9 0.3 1.1 0.4 0.8
3 : : : : : : : :

How could I use awk to do so?

Thanks!!!!

Janis Papanagnou

unread,
Nov 4, 2010, 4:38:17 PM11/4/10
to

That has been asked and answered before; google this group and you'll find
a couple proposals. I recall that I've provided even a brief solution for
handling sparse matrices.

Janis

Janis Papanagnou

unread,
Nov 4, 2010, 4:47:07 PM11/4/10
to

Sorry, I was wrong assuming and implying it was here; actually it was posted
in comp.unix.shell...


awk ' BEGIN { FS = "\t" }

{ for (col=1; col<=NF; col++) if ($col) matrix[NR,col] = $col }

END { for (col=1; col<=NF; col++)
for (row=1; row<=NR; row++)
printf("%s%s",
((row,col) in matrix) ? matrix[row,col] : 0,
(row<NR) ? FS : RS)
}
'

For your demand you'd have to restrict the collection of data to NR>2 as in
(untested)...

NR>2 { for (col=1; col<=NF; col++) if ($col) matrix[NR,col] = $col }

to skip the headers.

>
> Janis

Janis Papanagnou

unread,
Nov 4, 2010, 4:50:06 PM11/4/10
to
On 04/11/10 21:47, Janis Papanagnou wrote:
> On 04/11/10 21:38, Janis Papanagnou wrote:
>> On 04/11/10 21:22, T. C. wrote:
>>> Hi,
>>> I have a data file as:
>>>
>>> [snip]

Forget what I wrote. I answered your Subject: without looking at your data.
I see that you don't want a transposed matrix, as the subject suggests.

Sorry for the noise.

Loki Harfagr

unread,
Nov 5, 2010, 7:38:25 AM11/5/10
to
Thu, 04 Nov 2010 13:22:41 -0700, T. C. did cat :

as you didn't really specify what to do with namespace or
ordering and you didn't post a sample of what you already tried
maybe this would be either a convenient solution or a starter
or at least an idea about what to do not ?-)
--------
awk '
BEGIN{print "id","a","b","c","d"}
NR<3{next}
{
n=$2
$1=$2=""
w[n]=w[n] OFS $0
}
END{ for(n in w) print n,w[n] }
' yerfile
--------

Ed Morton

unread,
Nov 5, 2010, 8:41:54 AM11/5/10
to

Something like this, untested:

awk '
NR<3 { next }
{ id=$2; grp=$1
ids[id]; grps[grp]
for (p=3; p<=NF; p++) {
prices[p]
price[id,grp,p] = $p
}
}
END {
printf "id"
for (grp in grps) {
printf "\t%s",grp
}
print ""
for (id in ids) {
printf "%s",id
for (grp in grps) {
for (p in prices) {
printf "\t%s",price[id,grp,p]
}
}
print ""
}
}' file

Figuring out the output formatting left as an exercise....

Ed.

Dominic Fandrey

unread,
Nov 9, 2010, 10:03:10 AM11/9/10
to
On 04/11/2010 21:22, T. C. wrote:
> Hi,
> I have a data file as:
>
> group id price1 price2
> -------------------------------------------------------------
> a 1 20 10
> b 1 21 15
> c 1 30 26
> d 1 19 12
> ...

>
> I would like to have output as:
>
> id a b c d
> 1 20 10 21 15 30 26 19 12
> 2 0.5 0.6 0.1 0.9 0.3 1.1 0.4 0.8
> 3 : : : : : : : :

This one produces the following output:
group a b c d a b c d a b c d
id 1 1 1 1 2 2 2 2 3 3 3 3
price1 20 21 30 19 0.5 0.1 0.3 0.4 : : : :
price2 10 15 26 12 0.6 0.9 1.1 0.8 : : : :

Here it is:
NR == 1 {
rows = NF
}
NF == rows {
width = 0
for (i = 1; i <= rows; i++)
width = ((width >= length($i)) * width + (width < length($i)) * length($i));
for (i = 1; i <= rows; i++)
row[i] = sprintf("%s%" ((NR > 1) * 2) "s%" width "s", row[i], "", $i);
}
END {
for (i = 1; i <= rows; i++)
print(row[i]);
}


All lines not having the same format as the first line are ignored.
Have fun deciphering it.

T. C.

unread,
Nov 9, 2010, 9:09:41 PM11/9/10
to
Hi,

Thank you all for the information.
I have try the other file as:

AB01 1 1 2
AB05 1 3 4
AB10 1 5 6
AB15 1 7 8
AB01 2 9 10
AB05 2 11 12
AB10 2 13 14
AB15 2 15 16
AB01 3 17 18
AB05 3 19 20
AB10 3 21 22
AB15 3 23 24

I try to use the following:
NR<1 { next }
{
id=$2; m=$1
$1=$2=""
ms[m]=m
w[id]=w[id] OFS $0
}

END{
printf "id "
for (m in ms) printf "%s ",m
print ""
for(id in w) print id,w[id]
}

The output is like:
id AB10 AB01 AB05 AB15
1 1 2 3 4 5 6 7 8
2 9 10 11 12 13 14 15 16
3 17 18 19 20 21 22 23 24

Somehow the order of m changed. How could I have the order like "AB01
AB05 AB10 AB15"?
Thanks!!!

Hermann Peifer

unread,
Nov 10, 2010, 3:26:08 AM11/10/10
to

You could make use of gawk's WHINY_USERS feature:

$ WHINY_USERS= gawk -f yourscript.awk yourdata
id AB01 AB05 AB10 AB15


1 1 2 3 4 5 6 7 8
2 9 10 11 12 13 14 15 16
3 17 18 19 20 21 22 23 24

$

Hermann

PS
I am not quite sure when you expect this rule to be executed:
NR<1 { next }

T. C.

unread,
Nov 10, 2010, 9:58:01 AM11/10/10
to

Thanks!!!
I remove the "NR<1 { next }" from the script.

One more question. Could I have the output for id and w[id] with space
delimiter?
How could I modify this part "for(id in w) print id,w[id]"?

Thanks!!

Hermann Peifer

unread,
Nov 11, 2010, 2:35:26 AM11/11/10
to

I'm not sure if I understand your question. Can you post expected output?

BTW, this is actually not needed: m=$1; ms[m]=m
You could simplify it to: ms[$1]

Hermann

T. C.

unread,
Nov 11, 2010, 11:50:16 AM11/11/10
to

Thanks!!
I would expect the output as:


id AB01 AB05 AB10 AB15
1 1 2 3 4 5 6 7 8
2 9 10 11 12 13 14 15 16
3 17 18 19 20 21 22 23 24

It is the same as before but with space delimiter.

Hermann Peifer

unread,
Nov 12, 2010, 2:42:58 AM11/12/10
to
On 11/11/2010 17:50, T. C. wrote:
> I would expect the output as:
> id AB01 AB05 AB10 AB15
> 1 1 2 3 4 5 6 7 8
> 2 9 10 11 12 13 14 15 16
> 3 17 18 19 20 21 22 23 24
>

$ cat yourscript.awk
{
ms[$1]
id=$2
w[id]=w[id] sep[id] $3 OFS $4
sep[id]=OFS
}

END{
printf "id"
for (m in ms) printf " %s",m
print ""
for(id in w) print id,w[id]
}

$ WHINY_USERS= awk -f yourscript.awk yourdata


id AB01 AB05 AB10 AB15
1 1 2 3 4 5 6 7 8
2 9 10 11 12 13 14 15 16
3 17 18 19 20 21 22 23 24

Hermann

0 new messages