Creating an array of n-dimensional array references, and sorting by each of the n columns


Anand Rao

Sep 23, 2014, 6:27:01 AM
to unix-and-perl-...@googlegroups.com
Hi all,

I have an array of array references.
@array = ($a, $b, $c, ..., $z);   # pseudocode: every element is an array reference

Each array reference points to an array of 5 elements.
@{$a} = ($len, $count, $domlen, $seq, $ID);

I want to sort @array based on the elements of its constituent arrays, one by one,
i.e., sort @array by $len, then by $count, ... and finally by $ID.

What syntax should I use here?

I tried something along the lines of the suggestion by jettero at http://www.perlmonks.org/?node_id=674374,
but I get an error message. I suppose this is because it tries to sort by the entire arrays within the array (which is not what I want):
    my @b = sort { $a->[0] <=> $b->[0] ||   # the result is -1,0,1 ...
                   $a->[1] <=> $b->[1]      # so [1] when [0] is same
                 } @a;

Nikhil Joshi

Sep 23, 2014, 7:08:44 AM
to unix-and-perl-...@googlegroups.com
If I'm understanding your question correctly, I think what you want is something like this:

@sorted_array_by_len = sort {$a->[0] <=> $b->[0]} @array;

to sort by $len, and then just increase the index for the other variables:

@sorted_array_by_count = sort {$a->[1] <=> $b->[1]} @array;

to sort by $count, etc. Of course, you'll have to change the operator if any of those elements are strings:

@sorted_array_by_seq = sort {$a->[3] cmp $b->[3]} @array;


These are just creating an anonymous sort function using the special variables $a and $b.
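
As a minimal sketch of the single-column sorts above, here is a self-contained version that can be run as-is. The three rows are hypothetical sample data (not from the original post), laid out as [len, count, domlen, seq, ID] as in the question.

```perl
use strict;
use warnings;

# Hypothetical sample rows laid out as [len, count, domlen, seq, ID].
my @array = (
    [ 125, 2, 46, 'MAEFR',   'a.2' ],
    [ 100, 3, 48, 'MGAFRQ',  'a.1' ],
    [ 110, 4, 41, 'MALETGD', 'a.3' ],
);

# Numeric column: <=> returns -1, 0 or 1.
my @by_len = sort { $a->[0] <=> $b->[0] } @array;

# String column: cmp compares the values as text.
my @by_id = sort { $a->[4] cmp $b->[4] } @array;

# Swapping $a and $b reverses the order (largest count first).
my @by_count_desc = sort { $b->[1] <=> $a->[1] } @array;

# Print the IDs in each resulting order.
print join(' ', map { $_->[4] } @by_len), "\n";         # a.1 a.3 a.2
print join(' ', map { $_->[4] } @by_id), "\n";          # a.1 a.2 a.3
print join(' ', map { $_->[4] } @by_count_desc), "\n";  # a.3 a.1 a.2
```

Note that sort returns a new list of the same references; the rows themselves are not copied.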

- Nik.

--
You received this message because you are subscribed to the Google Groups "Unix and Perl for Biologists" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unix-and-perl-for-bi...@googlegroups.com.
To post to this group, send email to unix-and-perl-...@googlegroups.com.
Visit this group at http://groups.google.com/group/unix-and-perl-for-biologists.
For more options, visit https://groups.google.com/d/optout.



--
Nikhil Joshi
Bioinformatics Analyst/Programmer
UC Davis Bioinformatics Core
http://bioinformatics.ucdavis.edu/
najoshi -at- ucdavis -dot- edu
530.752.2698 (w)

Anand K S Rao

Sep 23, 2014, 7:49:38 AM
to unix-and-perl-...@googlegroups.com
Thank you for your reply, Nikhil. 

Yes, I do understand how this sorting (numerical or alphabetical) and associated syntax works.

But I did not explain my question well enough the first time around, so let me try again, this time with some examples (BEFORE sorting).

I have tried to present 4 different cases below, but I will explain only case 1 here in this e-mail.

my @a1 = ($ID1, $len1, $dom1, $domlen1, $seq1);
my @a2 = ($ID2, $len2, $dom2, $domlen2, $seq2);
my @a3 = ($ID3, $len3, $dom3, $domlen3, $seq3);
my $line1 = \@a1;
my $line2 = \@a2;
my $line3 = \@a3;
my @array = ($line1, $line2, $line3); # an array of array references.

I want to first sort @array by element [1] of @a1, @a2, @a3, i.e. $len, then by element [2] ($dom), then by element [3] ($domlen), then by element [4] ($seq, which would be an alphabetical sort), and finally by element [0] ($ID).
After this sort, I want to extract the ID from the first dereferenced array in @array.

Was that a better explanation?

So for this scenario, I am unable to use the same syntax because my array is itself an array of references.
I know that in principle it's the same kind of sort, but I can't seem to get the syntax for it right... Could you please help?

Thank you!
Anand
Array name  ID   Full isoform length  # of domains  Total length of all FBDs  Full sequence

case 1: chosen based on sorted length
@a1         a.1  100                  3             48                        MAEFR
@a2         a.2  125                  3             46                        MAEFR
@a3         a.3  100                  3             46                        MAEFR

case 2: chosen based on sorted number of domains
@b1         b.1  100                  3             41                        MGAFD
@b2         b.2  100                  4             45                        MGAFRQ
@b3         b.3  100                  3             46                        MMGFRPP
@b4         b.4  100                  3             42                        MGAFD

case 3: chosen based on sorted cumulative length of domain
@c1         c.1  130                  2             48                        MGAFARAF
@c2         c.2  130                  2             80                        MGAFARAF

case 4: when all are equal, then choose the element by sorted IDs
@d1         d.1  200                  3             46                        MALETGD
@d2         d.2  200                  3             46                        MALETGD
@d3         d.3  200                  3             46                        MALETGD
@d4         d.4  200                  3             46                        MALETGD
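
For reference, here is a rough sketch of how the case 1 rows could be stored as an array of array references and read back. It assumes the column order (ID, len, dom, domlen, seq) described above; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Case 1 rows, laid out as (ID, len, dom, domlen, seq).
my @a1 = ('a.1', 100, 3, 48, 'MAEFR');
my @a2 = ('a.2', 125, 3, 46, 'MAEFR');

# \@name takes a reference to a named array ...
my @array = (\@a1, \@a2);

# ... while [ ... ] builds an anonymous array and returns a reference to it.
push @array, [ 'a.3', 100, 3, 46, 'MAEFR' ];

# Each element of @array is a scalar that holds a reference.
for my $row (@array) {
    my ($id, $len, $dom, $domlen, $seq) = @{$row};   # dereference the whole row
    print "$id: len=$len dom=$dom domlen=$domlen seq=$seq\n";
}

# Single fields can be reached with the arrow operator.
my $first_id  = $array[0]->[0];
my $third_len = $array[2][1];    # the arrow is optional between subscripts
```

The backslash (not a forward slash) is what creates the reference, and a plain parenthesised list (not qw) is needed when the elements are variables.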


Claudius Kerth

Sep 25, 2014, 3:44:22 PM
to unix-and-perl-...@googlegroups.com
Hi Anand,

the following Perl code should do what you described in your last post. It takes your table in a file as input.

 21 use strict;
 22 use warnings;
 23 use Data::Dumper;
 24 
 25 my @table;
 26 
 27 while(<>){
 28     chomp;
 29     my @line = split;
 30     push @table, \@line; # note, in order for this to work "@line" needs to be lexically scoped inside the loop
 31 }
 32 #print Dumper(@table); exit;
 33 
 34 # print table
 35 foreach my $line (@table){
 36     print "@{$line}", "\n";
 37 }
 38 
 39 # sort table
 40 my @table_sorted = sort { $a->[1] <=> $b->[1]
 41                             ||
 42                         $a->[2] <=> $b->[2]
 43                             ||
 44                         $a->[3] <=> $b->[3]
 45                             ||
 46                         $a->[4] cmp $b->[4]
 47                             ||
 48                         $a->[0] <=> $b->[0]
 49                     } @table;
 50 
 51 print "\n\n";
 52 
 53 # print sorted table
 54 foreach my $line (@table_sorted){
 55     print "@{$line}", "\n";
 56 }
 57 # sorting and printing could be combined, so that a temporary @table_sorted
 58 # wouldn't need to be created
 59 
 60 print "\n\n";
 61 
 62 # extract ID from first line after sorting
 63 print $table_sorted[0]->[0], "\n";

If you don't have it installed yet, get `perldoc`, then:

$ perldoc -f sort

As you can see, this is cumbersome to programme in Perl. However, sorting a table by different columns is a trivial task for the Unix `sort` command:

$ sort -k2,2n -k3,3n -k4,4n -k5,5d -k1,1d <your_table.tsv>
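
For anyone who would rather stay in Perl, the repetition in the sort block can be reduced with a small helper. This is only a sketch: `make_comparator` and its [column, type] key format are hypothetical names invented here, not part of Perl or of the script above. The rows are case 1 from Anand's table, and ascending order is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: build a comparison sub from [column, type] pairs,
# where type is 'n' (numeric, uses <=>) or 's' (string, uses cmp).
sub make_comparator {
    my @keys = @_;
    return sub {
        my ($x, $y) = @_;
        for my $key (@keys) {
            my ($col, $type) = @{$key};
            my $result = $type eq 'n'
                ? $x->[$col] <=> $y->[$col]
                : $x->[$col] cmp $y->[$col];
            return $result if $result;   # first non-zero result decides
        }
        return 0;                        # all keys compared equal
    };
}

# Case 1 rows, laid out as (ID, len, dom, domlen, seq).
my @table = (
    [ 'a.1', 100, 3, 48, 'MAEFR' ],
    [ 'a.2', 125, 3, 46, 'MAEFR' ],
    [ 'a.3', 100, 3, 46, 'MAEFR' ],
);

# Same key order as the sort command: len, dom, domlen, seq, then ID.
my $cmp = make_comparator([1, 'n'], [2, 'n'], [3, 'n'], [4, 's'], [0, 's']);
my @table_sorted = sort { $cmp->($a, $b) } @table;

print "@{$_}\n" for @table_sorted;

# ID of the first row after sorting.
my $top_id = $table_sorted[0]->[0];
print "$top_id\n";   # a.3
```

The key list plays the same role as the -k options of the Unix command, so changing the sort priority means editing one line rather than the whole block.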

claudius

Claudius Kerth

Sep 25, 2014, 3:48:57 PM
to unix-and-perl-...@googlegroups.com
 48                         $a->[0] <=> $b->[0]

needs to be replaced with:

 48                         $a->[0] cmp $b->[0]
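
A short sketch of why the operator matters for IDs such as a.1 and a.2 (the ID values are taken from the table earlier in the thread): a string that does not start with a number evaluates to 0 in numeric context, so <=> sees every such ID as equal.

```perl
use strict;
use warnings;

my @ids = ('a.3', 'a.1', 'a.2');

my $numeric_result;
{
    # Silence the "isn't numeric" warnings that this comparison triggers.
    no warnings 'numeric';
    $numeric_result = 'a.1' <=> 'a.2';   # both sides are treated as 0
}

my $string_result = 'a.1' cmp 'a.2';     # compared as text
my @string_sorted = sort { $a cmp $b } @ids;

print "numeric: $numeric_result, string: $string_result\n";   # numeric: 0, string: -1
print "@string_sorted\n";                                      # a.1 a.2 a.3
```

With `use warnings` enabled and no `no warnings 'numeric'` block, Perl reports the non-numeric arguments, which is a useful hint that the wrong operator is in use.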

sorry!

claudius

Anand K S Rao

Sep 26, 2014, 12:28:36 AM
to unix-and-perl-...@googlegroups.com
Thank you, Claudius.

Since my array is an array of array references, a user at PerlMonks suggested doing

$$a[0] <=> $$b[0], or $$a[4] cmp $$b[4], depending on whether the sort is numerical or alphabetical,
which works, but I am not sure what the $$ means, and I could not find its meaning on the internet. Perhaps you could explain?

Thanks again!
Sincerely,
Anand


Claudius Kerth

Sep 26, 2014, 5:11:50 AM
to unix-and-perl-...@googlegroups.com
Hi Anand,

Here, $$a[0] is equivalent to ${$a}[0], which in turn is equivalent to $a->[0]. It's confusing to me, too.

See `$ man perlref`, section "Using references".
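
The three spellings can be checked side by side with a small sketch. The variable `$ref` and the sample row are illustrative only; `$ref` is used instead of `$a` to keep it apart from sort's special variables.

```perl
use strict;
use warnings;

my @row = ('a.1', 100, 3, 48, 'MAEFR');
my $ref = \@row;              # $ref holds a reference to @row

my $v1 = $$ref[0];            # shorthand: element 0 of the array $ref points to
my $v2 = ${$ref}[0];          # the same, with the reference in explicit braces
my $v3 = $ref->[0];           # arrow notation

print "$v1 $v2 $v3\n";        # a.1 a.1 a.1

# The same pattern gives the whole array: @$ref or @{$ref}.
my $count = scalar @{$ref};
print "$count\n";             # 5
```

So the second $ in $$a[0] is not a separate operator: the first $ is the sigil of the element being fetched, and $a is the reference that is being dereferenced.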

claudius