
Sorting a five-column data file in ascending order, first by column 3 and then by column 4


mohsen...@gmail.com

Feb 3, 2013, 5:08:47 PM
Hi all,

I have an array with five columns which has about 304 lines.
Each line gives the vertex coordinates of a polygon.
The first 12 lines as an example are as below:
================================
Nr Line X Y Z
================================
1 1 1.8086 1.2637 0.0000
2 2 2.2161 1.3098 0.0000
3 3 2.2308 1.3023 0.0000
4 4 2.2507 1.1640 0.0000
5 5 2.2507 1.0069 0.0000
6 6 2.2407 0.9966 0.0000
7 7 1.6978 0.9819 0.0000
8 8 1.6492 0.9913 0.0000
9 9 1.4365 0.9858 0.0000
10 10 1.4265 0.9756 0.0000
11 11 1.4265 0.9324 0.0000
12 12 1.4365 0.9225 0.0000
================================
Nr and Line are not always equal.
Sometimes they have different values.
================================

I have to sort these vertices in ascending order,
first by the X column and then by the Y column.

The output should be something like below:
======================
Nr X Y
======================
11 1.4265 0.9324
10 1.4265 0.9756
12 1.4365 0.9225
9 1.4365 0.9858
146 1.4701 1.2364
145 1.4701 1.2603
147 1.4800 1.2275
144 1.4800 1.2716
13 1.6463 0.9248
8 1.6492 0.9913
148 1.6723 1.2487
149 1.6723 1.2487
======================
In the main part of my AWK program I use

X[$1] = $3
Y[$1,$3] = $4

In the END part to get the values for X I have the following lines:

NX = asort(X, X_Data)
for (i = 1; i <= NX; i++) {
printf "%-4s%8.4f\n", i, X_Data[i] > OUTPUT2
}

It works properly and sorts the values of X.
But how can I also sort by the values of Y (column 4)?
How can I sort the two-dimensional array Y[$1,$3]?

Best regards
Mohsen

Hermann Peifer

Feb 4, 2013, 10:31:59 AM
With Gawk >= 4, you could do something like this:

gawk 'BEGIN {PROCINFO["sorted_in"]="@ind_num_asc"}
{a[$3][$4]=$1 FS $3 FS $4}
END{for (i in a) for (j in a[i]) print a[i][j]}'

However, you would lose records in the case of identical X and Y values.
Perhaps a bit OT: Have you considered using the sort utility?
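
The record loss mentioned above can be illustrated with a small, portable sketch. It is not the gawk one-liner itself: it uses a classic awk array keyed by (X, Y) and only counts entries. The Line values in the sample rows are assumed, since the thread shows only Nr, X and Y for rows 148 and 149.

```shell
# Sample rows in the thread's layout (Nr Line X Y Z).
# The Line column values are assumed for illustration.
data='13 13 1.6463 0.9248 0.0000
148 148 1.6723 1.2487 0.0000
149 149 1.6723 1.2487 0.0000'

# An array keyed only by (X, Y) holds one entry per distinct pair, so the
# second row with the same coordinates overwrites the first one.
summary=$(printf '%s\n' "$data" | awk '
    { seen[$3, $4] = $1 }          # a later Nr replaces an earlier Nr
    END {
        n = 0
        for (k in seen) n++
        print NR " lines, " n " distinct (X,Y) pairs"
    }')
echo "$summary"
```

Any key that includes something unique per record, such as NR, avoids the overwrite.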

Hermann

mohsen...@gmail.com

Feb 4, 2013, 5:28:22 PM
Hello Hermann

Thanks a lot for your quick answer.
You have written a two-dimensional array using double bracket pairs "a[i][j]".
I thought that this syntax was not possible in AWK.
Until now I have used only one bracket pair with a separator character "a[i,j]".

> With Gawk >= 4, you could do something like this:
>
> gawk 'BEGIN {PROCINFO["sorted_in"]="@ind_num_asc"}
>
> {a[$3][$4]=$1 FS $3 FS $4}
>
> END{for (i in a) for (j in a[i]) print a[i][j]}'

My question is about the END part:
Does the first for loop go through both dimensions and the second one only through the first dimension?

> However, you would lose records in the case of identical X and Y values.

I have noticed that in my case the output file has one line fewer than the input file, because X and Y are equal on two lines.

>
> Perhaps a bit OT: Have you considered using the sort utility?
>

What do you mean? I use the asort function. Or
should I write the sort program from scratch instead of using a built-in function?

Best regards
Mohsen

Janis Papanagnou

Feb 4, 2013, 5:54:41 PM
I suppose herman means that if you're on a Unix system (also available on
WinDOS, e.g. through cygwin) there's the powerful sort command that makes
it possible to sort on columns with preferences defined per column.

With awk: have you considered not sorting on X and Y individually, but on
the concatenated values, $3 $4? This is possible in your case because you
seem to have regular fixed-width data in those two numeric columns.
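
A minimal sketch of the concatenated-key idea, under some assumptions: X and Y are non-negative, and the key is zero-padded so that plain string order matches numeric order. For portability it hands the decorated lines to the external sort rather than sorting inside awk; with gawk, asorti() on the same keys would be the in-awk counterpart. The Line values in the sample rows are assumed.

```shell
# Sample rows in the thread's layout (Nr Line X Y Z).
data='9 9 1.4365 0.9858 0.0000
10 10 1.4265 0.9756 0.0000
11 11 1.4265 0.9324 0.0000
12 12 1.4365 0.9225 0.0000'

# 1. Prefix each record with one fixed-width key built from X and Y.
# 2. Sort the lines as plain strings (LC_ALL=C gives byte order).
# 3. Strip the key again, leaving Nr, X and Y.
sorted=$(printf '%s\n' "$data" |
    awk '{ printf "%09.4f%09.4f %s %s %s\n", $3, $4, $1, $3, $4 }' |
    LC_ALL=C sort |
    cut -d" " -f2-)
echo "$sorted"
```

Negative coordinates or wider numbers would need a different key layout.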

Janis


Janis Papanagnou

Feb 4, 2013, 6:00:44 PM
On 04.02.2013 23:54, Janis Papanagnou wrote:
>
> I suppose herman means [...]

Ouch. - Sorry Hermann!

Janis

j....@mchsi.com

Feb 4, 2013, 6:56:54 PM
gawk 4.x only:

function sort_func(i1, v1, i2, v2)
{
    # sort order -- column 3, 4 ascending numeric
    if (v1[3] == v2[3])
        return v1[4] < v2[4] ? -1 : v1[4] > v2[4]
    return v1[3] < v2[3] ? -1 : 1
}

{
    for (i = 1; i <= NF; i++)
        a[NR][i] = $i
    a[NR][3] += 0    # want numeric context
    a[NR][4] += 0    # ditto
}

END {
    PROCINFO["sorted_in"] = "sort_func"
    for (k in a)
        print a[k][1], a[k][3], a[k][4]
}


John

Hermann Peifer

Feb 4, 2013, 8:07:57 PM
On 2013-02-04 20:28, mohsen...@gmail.com wrote:
>
> You have written a two-dimensional array using double bracket pairs "a[i][j]".
> I thought that this syntax was not possible in AWK.

Obviously, it is possible with Gawk 4.x, i.e. since June 2011. The NEWS
file states: "21. Arrays of arrays added.",
http://git.savannah.gnu.org/cgit/gawk.git/tree/NEWS

>
> I have noticed that in my case the output file has one line fewer than the input file, because X and Y are equal on two lines.
>

You mentioned that your records represent vertices of polygons. In the case
of properly closed polygons, the first and last vertices are identical. So
I was expecting that you would lose lines.

A simpler way of getting to the indicated sorted output could be:

sort -n -k3 -n -k4 yourfile | awk '{print $1,$3,$4}'
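
As a sketch of the same approach with each key limited to a single field: -k3 without an end position extends the key to the end of the line, while -k3,3n and -k4,4n restrict each numeric key to exactly one column. LC_ALL=C is an added assumption so that "." is read as the decimal point regardless of locale. The Line values in the sample rows are assumed.

```shell
# Sample rows in the thread's layout (Nr Line X Y Z).
data='8 8 1.6492 0.9913 0.0000
9 9 1.4365 0.9858 0.0000
10 10 1.4265 0.9756 0.0000
11 11 1.4265 0.9324 0.0000
12 12 1.4365 0.9225 0.0000'

# Sort numerically by column 3, break ties by column 4,
# then keep only Nr, X and Y.
sorted=$(printf '%s\n' "$data" |
    LC_ALL=C sort -k3,3n -k4,4n |
    awk '{ print $1, $3, $4 }')
echo "$sorted"
```

Header or separator lines in the real file would have to be filtered out before sorting.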

Hermann

mohsen...@gmail.com

Feb 5, 2013, 8:47:50 AM
Hi Hermann
>
> A simpler way of getting to the indicated sorted output could be:
>
> sort -n -k3 -n -k4 yourfile | awk '{print $1,$3,$4}'
>
That's true, this way is much simpler and I don't lose lines.

Mohsen

mohsen...@gmail.com

Feb 5, 2013, 8:53:15 AM
> gawk 4.x only:
> function sort_func(i1, v1, i2, v2)
> {
> # sort order -- column 3, 4 ascending numeric
> if (v1[3] == v2[3])
> return v1[4] < v2[4] ? -1 : v1[4] > v2[4]
> return v1[3] < v2[3] ? -1 : 1
> }
> {
> for (i = 1; i<=NF; i++)
> a[NR][i] = $i
> a[NR][3] += 0 # want numeric context
> a[NR][4] += 0 # ditto
> }
> END {
> PROCINFO["sorted_in"] = "sort_func"
> for (k in a)
> print a[k][1], a[k][3], a[k][4]
> }
>
> John

Thanks a lot for your sort function. It is also very interesting. :-)

Mohsen

Hermann Peifer

Feb 5, 2013, 1:15:52 PM
Just to add another Gawk solution, which keeps all lines:

awk '
{a[$3][$4][NR]=$1 FS $3 FS $4}
END{
PROCINFO["sorted_in"]="@ind_num_asc"
for (i in a)
for (j in a[i])
for (k in a[i][j])
print a[i][j][k]
}'

But the bottom line remains the same: if all you are after is sorting by
columns, you might want to stay with plain sort.

Hermann

mohsen...@gmail.com

Feb 6, 2013, 5:03:34 AM
Hi Hermann
>
> But the bottom line remains the same: if all you are after is sorting by
> columns, you might want to stay with plain sort.

That's true. I only want to sort by columns, and therefore I will stay with your plain sort suggestion.

Best regards
Mohsen

Kenny McCormack

Feb 6, 2013, 5:20:58 AM
In article <84690b0e-b971-4e44...@googlegroups.com>,
Just to be contrarian, I would argue to the contrary - for the following
reasons:

1) Unix sort's syntax is bizarre and weird. I've never been able to
figure it out (without poring over the man page every time I have to
use it).
2) Your project will probably eventually expand to the point where you
need to do the sorting "internally" - i.e., in your main (AWK) script -
anyway, so you might as well do it now.
3) Now that (g)AWK does support sorting internally (without the, some
would say, ugly "WHINY_USERS" feature), you might as well learn it
and learn to love it.
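
For comparison, sorting inside the awk script does not strictly require gawk. The following is a portable sketch only: it swaps in a hand-written insertion sort for gawk's built-in sorting, which is adequate for a few hundred lines. The helper name before() and the array names are made up for this illustration, and the Line values in the sample rows are assumed.

```shell
# Sample rows in the thread's layout (Nr Line X Y Z).
data='148 148 1.6723 1.2487 0.0000
11 11 1.4265 0.9324 0.0000
149 149 1.6723 1.2487 0.0000
10 10 1.4265 0.9756 0.0000'

sorted=$(printf '%s\n' "$data" | awk '
    # Return 1 when row i must come before row j: X first, then Y.
    function before(i, j) {
        if (x[i] != x[j]) return x[i] < x[j]
        return y[i] < y[j]
    }
    # Store each record; adding 0 forces numeric comparison later on.
    { nr[NR] = $1; x[NR] = $3 + 0; y[NR] = $4 + 0; idx[NR] = NR }
    END {
        # Insertion sort over the index array; rows with equal (X, Y)
        # keep their input order, so no record is dropped.
        for (i = 2; i <= NR; i++) {
            cur = idx[i]
            for (j = i - 1; j >= 1 && before(cur, idx[j]); j--)
                idx[j + 1] = idx[j]
            idx[j + 1] = cur
        }
        for (i = 1; i <= NR; i++)
            printf "%d %.4f %.4f\n", nr[idx[i]], x[idx[i]], y[idx[i]]
    }')
echo "$sorted"
```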

--
They say compassion is a virtue, but I don't have the time!

- David Byrne -

mohsen...@gmail.com

Feb 6, 2013, 5:11:49 PM
> Just to be contrarian, I would argue to the contrary - for the following
> reasons:
> 1) Unix sort's syntax is bizarre and weird. I've never been able to
> figure it out (without poring over the man page every time I have to
> use it).
> 2) Your project will probably eventually expand to the point where you
> need to do the sorting "internally" - i.e., in your main (AWK) script -
> anyway, so you might as well do it now.
> 3) Now that (g)AWK does support sorting internally (without the, some
> would say, ugly "WHINY_USERS" feature), you might as well learn it
> and learn to love it.
> --
> They say compassion is a virtue, but I don't have the time!
>
> - David Byrne -

Hi David

Perhaps you are right.
If I try to learn it now, it could be that I will love it.

I promise to do my sorting only with (g)AWK, following the suggestions I received, instead of using a weird shell command.

Best regards
Mohsen

JD

May 24, 2013, 7:44:40 AM