Creating an array of n-dimensional array references, and sorting by each of the n columns

Anand Rao

Sep 23, 2014, 6:27:01 AM

to unix-and-perl-...@googlegroups.com

Hi all,

I have an array of array references.

@array = ($a, $b, $c, $d, ..., $z);

Each array reference points to an array of 5 elements.

@{$a} = ($len, $count, $domlen, $seq, $ID);

I want to sort @array by the elements of its constituent arrays, one at a time,

i.e. sort @array by $len, then by $count, ... and finally by $ID.

What syntax should I use here?

I tried something along the lines of the suggestion at http://www.perlmonks.org/?node_id=674374 by jettero,

but I get an error message. I suppose this is because it tries to sort by the entire arrays within the array (which is not what I want).

my @b = sort {

    $a->[0] <=> $b->[0] || # the result is -1,0,1 ...
    $a->[1] <=> $b->[1]    # so [1] when [0] is same

} @a;

Nikhil Joshi

Sep 23, 2014, 7:08:44 AM

to unix-and-perl-...@googlegroups.com

If I'm understanding your question correctly, I think what you want is something like this:

@sorted_array_by_len = sort {$a->[0] <=> $b->[0]} @array;

to sort by $len, and then just increase the index for the other variables:

@sorted_array_by_count = sort {$a->[1] <=> $b->[1]} @array;

to sort by $count, etc. Of course, you'll have to change the operator if any of those elements are strings:

@sorted_array_by_seq = sort {$a->[3] cmp $b->[3]} @array;

These are just creating an anonymous sort function using the special variables $a and $b.
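To make the role of $a and $b concrete, here is a minimal sketch. The rows are made up and follow the column order from the original question (len, count, domlen, seq, ID); the sub names by_len and by_id are only illustrative.

```perl
use strict;
use warnings;

# Hypothetical rows: [ len, count, domlen, seq, ID ]
my @array = (
    [ 125, 3, 46, 'MAEFR', 'a.2' ],
    [ 100, 3, 48, 'MAEFR', 'a.1' ],
    [ 100, 3, 46, 'MAEFR', 'a.3' ],
);

# A named comparator does the same job as the inline { ... } block:
# sort sets $a and $b to the two elements (here, array refs) being compared.
sub by_len { $a->[0] <=> $b->[0] }   # numeric column
sub by_id  { $a->[4] cmp $b->[4] }   # string column

my @sorted_by_len = sort by_len @array;
my @sorted_by_id  = sort by_id  @array;

# Print the IDs in each sorted order.
print join(' ', map { $_->[4] } @sorted_by_len), "\n";
print join(' ', map { $_->[4] } @sorted_by_id),  "\n";
```

Note that @array itself is left unchanged; sort returns a new list of the same references in a different order.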

- Nik.

--
You received this message because you are subscribed to the Google Groups "Unix and Perl for Biologists" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unix-and-perl-for-bi...@googlegroups.com.
To post to this group, send email to unix-and-perl-...@googlegroups.com.
Visit this group at http://groups.google.com/group/unix-and-perl-for-biologists.
For more options, visit https://groups.google.com/d/optout.

--
Nikhil Joshi
Bioinformatics Analyst/Programmer
UC Davis Bioinformatics Core
http://bioinformatics.ucdavis.edu/
najoshi -at- ucdavis -dot- edu
530.752.2698 (w)

Anand K S Rao

Sep 23, 2014, 7:49:38 AM

to unix-and-perl-...@googlegroups.com

Thank you for your reply, Nikhil.

Yes, I do understand how this sorting (numerical or alphabetical) and associated syntax works.

But I did not explain my question well enough the first time around, so let me try again, this time with some examples (BEFORE sorting).

I have tried to present 4 different cases below, but I will explain only case 1 here in this e-mail.

my @a1 = ($ID1, $len1, $dom1, $domlen1, $seq1);

my @a2 = ($ID2, $len2, $dom2, $domlen2, $seq2);

my @a3 = ($ID3, $len3, $dom3, $domlen3, $seq3);

my $line1 = \@a1;

my $line2 = \@a2;

my $line3 = \@a3;

my @array = ($line1, $line2, $line3); # an array of array references.

I want to sort @array first by element [1] of @a1, @a2, @a3, i.e. $len, then by element [2] ($dom), then by element [3] ($domlen), then by element [4] ($seq, which would be an alphabetical sort), and finally by element [0] ($ID).

After this sort, I want to extract the "ID" from the first dereferenced array in @array.

Was that a better explanation?

So for this scenario, I am unable to use the same syntax because my array is itself an array of references.

I know in principle it's the same kind of sort, but I can't seem to get the syntax for it right... Could you please help?

Thank you!

Anand

Array name	ID	Full isoform length	# of domains	total length of all FBDs	full sequence

case 1: chosen based on sorted length
@a1	a.1	100	3	48	MAEFR
@a2	a.2	125	3	46	MAEFR
@a3	a.3	100	3	46	MAEFR

case 2: chosen based on sorted number of domains
@b1	b.1	100	3	41	MGAFD
@b2	b.2	100	4	45	MGAFRQ
@b3	b.3	100	3	46	MMGFRPP
@b4	b.4	100	3	42	MGAFD

case 3: chosen based on sorted cumulative length of domain
@c1	c.1	130	2	48	MGAFARAF
@c2	c.2	130	2	80	MGAFARAF

case 4: when all are equal, then choose the element by sorted IDs
@d1	d.1	200	3	46	MALETGD
@d2	d.2	200	3	46	MALETGD
@d3	d.3	200	3	46	MALETGD
@d4	d.4	200	3	46	MALETGD

--

----------------------------------------------------------------------------------------------------------------------------------
“I would rather be ashes than dust! I would rather that my spark should burn out in a brilliant blaze than it should be stifled by dry-rot. I would rather be a superb meteor, every atom of me in magnificent glow, than a sleepy and permanent planet. The function of man is to live, not to exist. I shall not waste my days trying to prolong them. I shall use my time.” - Jack London(1876-1916)

Claudius Kerth

Sep 25, 2014, 3:44:22 PM

to unix-and-perl-...@googlegroups.com

Hi Anand,

the following Perl code should do what you described in your last post. It takes your table in a file as input.

use strict;
use warnings;
use Data::Dumper;

my @table;

while (<>) {
    chomp;
    my @line = split;
    push @table, \@line; # for this to work, @line must be lexically scoped inside the loop
}
#print Dumper(@table); exit;

# print table
foreach my $line (@table) {
    print "@{$line}", "\n";
}

# sort table
my @table_sorted = sort {
    $a->[1] <=> $b->[1]
        ||
    $a->[2] <=> $b->[2]
        ||
    $a->[3] <=> $b->[3]
        ||
    $a->[4] cmp $b->[4]
        ||
    $a->[0] <=> $b->[0]    # ID column; corrected to cmp in the follow-up post
} @table;

print "\n\n";

# print sorted table
foreach my $line (@table_sorted) {
    print "@{$line}", "\n";
}
# sorting and printing could be combined, so that a temporary @table_sorted
# wouldn't need to be created

print "\n\n";

# extract ID from first line after sorting
# ($table_sorted[0] is already a reference, so no @{...} is needed)
print $table_sorted[0]->[0], "\n";

If you don't have it installed yet, get `perldoc`, then:

$ perldoc -f sort

As you can see, this is cumbersome to programme in Perl. However, sorting a table by different columns is a trivial task for the Unix `sort` command:

$ sort -k2,2n -k3,3n -k4,4n -k5,5d -k1,1d <your_table.tsv>
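On the Perl side, much of the repetition goes away if the sort keys are written as data, roughly mirroring the list of -k options above. The following is only a sketch under assumptions: the rows are made up in the order (ID, len, dom, domlen, seq), and the names @keys and $compare_rows are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical rows: [ ID, len, dom, domlen, seq ]
my @table = (
    [ 'a.1', 100, 3, 48, 'MAEFR' ],
    [ 'a.2', 125, 3, 46, 'MAEFR' ],
    [ 'a.3', 100, 3, 46, 'MAEFR' ],
);

# Sort keys in priority order: [ column index, 'n' (numeric) or 's' (string) ]
my @keys = ( [1, 'n'], [2, 'n'], [3, 'n'], [4, 's'], [0, 's'] );

# Compare two rows key by key; the first non-zero result decides the order.
my $compare_rows = sub {
    my ($x, $y) = @_;
    for my $key (@keys) {
        my ($col, $type) = @$key;
        my $cmp = $type eq 'n' ? $x->[$col] <=> $y->[$col]
                               : $x->[$col] cmp $y->[$col];
        return $cmp if $cmp;
    }
    return 0;   # rows are equal on every key
};

my @table_sorted = sort { $compare_rows->($a, $b) } @table;

print "@$_\n" for @table_sorted;
print "first ID: $table_sorted[0][0]\n";   # prints "first ID: a.3"
```

With these rows, a.1 and a.3 tie on len and dom, so domlen decides and a.3 (46) sorts before a.1 (48). Changing the sort order then only means editing @keys.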

claudius

Claudius Kerth

Sep 25, 2014, 3:48:57 PM

to unix-and-perl-...@googlegroups.com

In the sort block, the last comparison (on the ID column)

$a->[0] <=> $b->[0]

needs to be replaced with:

$a->[0] cmp $b->[0]

sorry!
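A small sketch of why the operator matters here, using the IDs from case 4 of the table (the variable names are only illustrative):

```perl
use strict;
use warnings;

my @ids = ('d.3', 'd.1', 'd.4', 'd.2');   # IDs from case 4 in the table

# cmp compares its operands as strings, so the IDs sort as expected.
my @by_cmp = sort { $a cmp $b } @ids;
print "@by_cmp\n";                        # prints "d.1 d.2 d.3 d.4"

# <=> converts both operands to numbers first. 'd.1' has no leading digits,
# so it becomes 0 (with an "isn't numeric" warning under 'use warnings'),
# and any two such IDs compare as equal.
my ($x, $y) = ('d.1', 'd.2');
my $numeric_result;
{
    no warnings 'numeric';   # silence the warning for this demonstration only
    $numeric_result = $x <=> $y;
}
print "numeric comparison: $numeric_result\n";   # prints "numeric comparison: 0"
```

So with <=> on the ID column the final tie-break never actually breaks a tie.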

claudius

Anand K S Rao

Sep 26, 2014, 12:28:36 AM

to unix-and-perl-...@googlegroups.com

Thank you, Claudius.

Since my array is an array of array references, a user at PerlMonks suggested using

$$a[0] <=> $$b[0], or $$a[4] cmp $$b[4], depending on whether the sort is numerical or alphabetical,

which works, but I am not sure what the $$ means, and I could not find its meaning on the internet. Perhaps you could explain?

Thanks again!

Sincerely,

Anand

Claudius Kerth

Sep 26, 2014, 5:11:50 AM

to unix-and-perl-...@googlegroups.com

Hi Anand,

Here $$a[0] is equivalent to ${$a}[0], which in turn is equivalent to $a->[0]. It's confusing to me, too.

See `$ man perlref`, section "Using references".
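A quick way to convince yourself is to try all three spellings on one reference. This is a minimal sketch with a made-up row; the variable names are only for illustration.

```perl
use strict;
use warnings;

my @row = ('a.1', 100, 3, 48, 'MAEFR');
my $ref = \@row;              # reference to the array

# Three spellings of "element 0 of the array that $ref points to":
my $arrow  = $ref->[0];       # arrow notation
my $braces = ${$ref}[0];      # explicit block around the reference
my $short  = $$ref[0];        # the same with the braces dropped

print "$arrow $braces $short\n";   # prints "a.1 a.1 a.1"

# The whole array is dereferenced with @{...}:
my @whole = @{$ref};
print scalar(@whole), "\n";        # prints 5
```

So the first $ in $$a[0] says "a scalar element", and $a is the reference being followed. This is unrelated to the special variable $$ (the process ID), which may be why searching for "$$" does not turn up much.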

claudius
