Running TrinityStats.pl and align_and_estimate_abundance.pl on DIAG

Joel Shore

May 14, 2015, 3:20:02 PM
to trinityrn...@googlegroups.com
Dear All,

Apologies for sending this, which is likely something simple to deal
with (I hope), but I was not able to find anyone posting on the matter.

I managed to successfully run Trinity on DIAG.

I next want to run TrinityStats.pl and, more importantly,
align_and_estimate_abundance.pl, ultimately heading toward differential
expression analysis.


I've not been able to run TrinityStats.pl, which is in the
util directory of a number of the Trinity versions on DIAG.

I'm not sure if this is because I somehow haven't included it in my path
explicitly (although I later did that without success).

Here is the error message I get, which seems to suggest it can't find
the Fasta_reader.pm module, which I do see in a directory that I've included
in my path.

Can't locate Fasta_reader.pm in @INC (@INC contains:
/opt/sge-root/diag/spool/cloud-200-223/job_scripts/../PerlLib
/usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl
/usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at
/opt/sge-root/diag/spool/cloud-200-223/job_scripts/1160567 line 8.
BEGIN failed--compilation aborted at
/opt/sge-root/diag/spool/cloud-200-223/job_scripts/1160567 line 8.

Tiago Hori

May 14, 2015, 4:02:06 PM
to Joel Shore, trinityrn...@googlegroups.com
The problem is not your PATH; it is @INC, the array of directories that Perl searches for library modules. When Trinity is compiled, the libraries should have been put in the correct places.

Have you tried to compile Trinity locally and run the script?

T.

Sent from my iPhone
--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.

Will Holtz

May 14, 2015, 4:18:40 PM
to Tiago Hori, Joel Shore, trinityrn...@googlegroups.com
Did you try putting the path to the Fasta_reader.pm in the PERL5LIB environmental variable?

-Wil

Brian Haas

May 14, 2015, 9:33:30 PM
to Will Holtz, Tiago Hori, Joel Shore, trinityrn...@googlegroups.com
The TrinityStats.pl script needs to be executed from where it's located in the Trinity installation directory, so you'd run it as:

  /path/to/TrinityHome/util/TrinityStats.pl   trinity.fasta

If you copy the script outside of that directory and try to run it from some other location, it's not going to find its libraries, which are located in a path relative to where that script exists in the trinity distro.


Joel Shore

May 15, 2015, 8:46:04 AM
to trinityrn...@googlegroups.com
Many Thanks for your responses.

I believe I have correctly followed Brian's suggestion to execute TrinityStats.pl from the util directory.
Here's the command I executed. Note that there are a few versions of Trinity on DIAG. The trinity directory below
is the one from which I successfully ran trinity. The util folder does contain TrinityStats.pl in it.

qsub -P diag -l mem_free=1G -q all.q /diag/software/trinity/util/TrinityStats.pl /diag/home/shore Trinityjoel.fasta


I seem to get the same error message

Can't locate Fasta_reader.pm in @INC (@INC contains: /opt/sge-root/diag/spool/cloud-200-178/job_scripts/../PerlLib /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /opt/sge-root/diag/spool/cloud-200-178/job_scripts/1160575 line 8.
BEGIN failed--compilation aborted at /opt/sge-root/diag/spool/cloud-200-178/job_scripts/1160575 line 8.

There is another directory within Trinity directory called PerlLib and another called PerlLibAdapters.
Do I somehow need to include those when I'm executing this job?


In terms of Tiago's and Wil's comments, I don't know how to do what is suggested. I can't run this locally as I don't have a machine
with anywhere near the capacity to do the job. My read files are
30G each.

I'd greatly appreciate any further thoughts/ideas.

Brian Haas

May 15, 2015, 8:57:45 AM
to Joel Shore, trinityrn...@googlegroups.com
Hi Joel,

Out of curiosity, if you look at the top of that script, does it include:
#!/usr/bin/env perl
use strict;
use warnings;
use FindBin;
use lib ("$FindBin::Bin/../PerlLib");
use Fasta_reader;

?

If it does, then it should be updating its own Perl library path so it finds the Fasta_reader.pm module.


If you need to, you can directly set the PERL5LIB environmental variable so it will locate the module in the trinity distro:

export PERL5LIB=/diag/software/trinity/PerlLib:${PERL5LIB}

but honestly, you shouldn't have to, and in reality if this script is having a hard time finding the PerlLib directory, then really none of the Trinity scripts (including Trinity itself) should be functioning - since they all do this sort of thing.
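
Because `use Fasta_reader;` looks for Fasta_reader.pm directly inside each @INC directory, the entry that matters is the PerlLib directory itself. A minimal sketch of setting and checking this, assuming the /diag/software/trinity install location mentioned in this thread; the two function names are hypothetical helpers, not part of Trinity:

```shell
# Prepend directory $1 to PERL5LIB, avoiding a trailing colon
# when PERL5LIB was previously unset or empty.
prepend_perl5lib() {
    PERL5LIB="$1${PERL5LIB:+:$PERL5LIB}"
    export PERL5LIB
}

# Succeed if Perl can load module $1 using the current PERL5LIB.
perl_can_load() {
    perl -M"$1" -e 1 2>/dev/null
}

# Example (path assumed from this thread):
#   prepend_perl5lib /diag/software/trinity/PerlLib
#   perl_can_load Fasta_reader && echo "Fasta_reader found"
```

If `perl_can_load Fasta_reader` still fails after this, the module is probably not where the path says it is.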


best,


~brian



--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Tiago Hori

May 15, 2015, 8:58:20 AM
to Joel Shore, trinityrn...@googlegroups.com
Hi Joel,

TrinityStats is not memory- or CPU-intensive, so it should not burden the system. I suspect that your problem may be related to how qsub works. Try running it directly on the node instead of through qsub.
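
The error messages earlier in the thread are consistent with this: the script is reported as running from the scheduler's spool directory (.../job_scripts/1160567), and the PerlLib entry in @INC is resolved relative to that directory rather than to the Trinity install. If the job does have to go through qsub, one possible workaround is to submit a small wrapper script that calls TrinityStats.pl by its full installed path. A rough sketch; the function name, the output file name, and the /path/to placeholder are assumptions:

```shell
# Write a job script that calls TrinityStats.pl by its full installed path, so that
# FindBin resolves ../PerlLib relative to the Trinity install and not the spool copy.
# Arguments: $1 = Trinity install dir, $2 = assembly FASTA, $3 = job script to create.
write_stats_job() {
    cat > "$3" <<EOF
#!/bin/sh
"$1/util/TrinityStats.pl" "$2" > "$2.stats.txt"
EOF
    chmod +x "$3"
}

# Example (qsub options copied from this thread; the FASTA path is a placeholder):
#   write_stats_job /diag/software/trinity /path/to/Trinity.fasta stats_job.sh
#   qsub -P diag -l mem_free=1G -q all.q stats_job.sh
```

Grid Engine's `qsub -b y` option, which submits the command as a binary instead of copying it as a job script, may be an alternative; nothing in this thread confirms whether that is appropriate on DIAG.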

T.


Joel Shore

May 15, 2015, 1:21:44 PM
to trinityrn...@googlegroups.com
Many Thanks Again Brian and Tiago.

I'll start by saying Tiago's suggestion seems to be more promising in that TrinityStats seems to run
although I get the error message:

################################
## Counts of transcripts, etc.
################################
Total trinity transcripts:    0
Total trinity components:    0
Illegal division by zero at /diag/software/trinity/util/TrinityStats.pl line 68.

So, I suppose something is odd about my Fasta file. Don't know what that would be since trinity created it.
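
A count of zero transcripts would also be consistent with the script never reading the assembly at all, for example if the path it was given does not point at the FASTA file (the earlier qsub command passed "/diag/home/shore Trinityjoel.fasta" as two separate arguments). A quick sanity check, sketched with a hypothetical helper function:

```shell
# Print the number of FASTA records (lines starting with '>') in file $1.
# Prints a message and fails if the file is missing or empty.
count_fasta_records() {
    if [ ! -s "$1" ]; then
        echo "missing or empty file: $1" >&2
        return 1
    fi
    # grep -c prints 0 and exits non-zero when nothing matches; keep the 0.
    grep -c '^>' "$1" || true
}

# Example (placeholder path):
#   count_fasta_records /path/to/Trinity.fasta
```

If this prints a large number but TrinityStats.pl still reports zero, the file itself is probably fine and the problem lies in how the script is being invoked.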

In terms of Brian's thoughts, here's the TrinityStats.pl script, which seems to mostly follow what Brian had suggested:

#!/usr/bin/env perl

use strict;
use warnings;

use FindBin;
use lib ("$FindBin::Bin/../PerlLib");
use Fasta_reader;
use BHStats;

I also tried Brian's export statement, which gave the same error message.

I suspect that perhaps I'm doing something manifestly incompetent?

Thanks again

Joel




Tiago Hori

May 15, 2015, 1:23:54 PM
to Joel Shore, trinityrn...@googlegroups.com
Can you run head on your Trinity.fasta and send the output?

T.


Joel Shore

May 15, 2015, 1:32:00 PM
to trinityrn...@googlegroups.com
Tiago,

 here's the fasta head

>comp0_c0_seq1 len=311 path=[3:0-310]
TGAAATTCCATCTGAATTGGTGGCCTTGTCCAGGACTTCTTTTCCGTCATGGTAGAAATC
AATTCAAATTTCTGCACTCATGGATGGTTCAGTTTGCCCATGAAATTTTCTGATCCTTAA
CATTCACTTCCCTACGCGTTCGGCCAAGTTCCTTGATTTGTTGGCAGTACCCGAAAATGG
TGGAGTAACCCCTTCGGTGATACGATACCTAAGGGAGTATCAGTTGCCCGGCAATTATAA
ACAATTAACAGAGAATACCAACGATGGAATGAAGGGAAAACTTATCCCAGAACTCGGATA
ATTAACAAAAG
>comp0_c1_seq1 len=317 path=[5756:0-316]
ATAATGTAGACCACATCAAAATCAAATTGATAGACAAGCATCATCAAAAGGATTATCAAT
ATTACATGCCAAATTTCTAGAGTCCGATTCGCGTATATGCTAATGGCATCAAAGAAGCAG




Tiago Hori

May 15, 2015, 1:39:18 PM
to Joel Shore, trinityrn...@googlegroups.com
Which version of Trinity did you use?

Make sure you are using the TrinityStats.pl from the same version you used for your assembly. The naming conventions changed between releases, and the stats script uses the names to group the transcripts into Trinity genes. I actually don't even recognize that naming convention. It is neither the version 2 nor the July 2014 release.
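
To illustrate why the naming matters: the headers posted above follow a compN_cN_seqN pattern, and any grouping into components depends on parsing that pattern. The sketch below counts transcripts and components by stripping the _seqN suffix. This is an assumption about how such names could be grouped, not the actual logic of any TrinityStats.pl release, and the function name is hypothetical:

```shell
# Print "transcripts<TAB>components" for FASTA file $1, assuming headers such as
# ">comp0_c0_seq1 len=311 ..." where the component is the ID minus "_seqN".
count_transcripts_and_components() {
    awk '
        /^>/ {
            id = substr($1, 2)          # drop the leading ">"
            n_transcripts++
            sub(/_seq[0-9]+$/, "", id)  # comp0_c0_seq1 -> comp0_c0
            if (!(id in seen)) { seen[id] = 1; n_components++ }
        }
        END { printf "%d\t%d\n", n_transcripts, n_components }
    ' "$1"
}
```

A script that expects a different naming scheme would fail to match these headers, which is one way to end up with zero transcripts and zero components.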


T.


Joel Shore

May 15, 2015, 1:47:31 PM
to trinityrn...@googlegroups.com
OK it ran!
That was without using the qsub shtick, and I corrected my specification of directories.
I think I used a 2013 version. I could perhaps redo the assembly with a newer version, at the risk of being strangled by the DIAG
folks for consuming computing time and resources.

Anyway, here's the assembly stats.

Note that what I really want to do is map the reads, and the read-alignment script is also in the util folder.
I was, in part, using TrinityStats.pl to see if I could run something from the util folder.

Many thanks for your advice Tiago and Brian.

Cheers
Joel


################################
## Counts of transcripts, etc.
################################
Total trinity transcripts:    226631
Total trinity components:    103380
Percent GC: 41.47

########################################
Stats based on ALL transcript contigs:
########################################

    Contig N10: 4759
    Contig N20: 3827
    Contig N30: 3206
    Contig N40: 2718
    Contig N50: 2315

    Median contig length: 793
    Average contig: 1309.44
    Total assembled bases: 296758903


#####################################################
## Stats based on ONLY LONGEST ISOFORM per COMPONENT:
#####################################################

    Contig N10: 4293
    Contig N20: 3195
    Contig N30: 2478
    Contig N40: 1909
    Contig N50: 1390

    Median contig length: 387
    Average contig: 758.83
    Total assembled bases: 78447826
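
For readers unfamiliar with the N50 figures above: by the usual definition, N50 is the contig length such that contigs of that length or longer contain at least half of all assembled bases (N10, N20 and so on use 10%, 20%, ...). A minimal sketch of that calculation for a FASTA file; the function name is hypothetical, and this follows the conventional definition, not the TrinityStats.pl source:

```shell
# Print the N50 of FASTA file $1: sort contig lengths from longest to shortest,
# then report the length at which the running total reaches half of all bases.
n50() {
    awk '
        /^>/ { if (len) print len; len = 0; next }   # new record: emit previous length
        { len += length($0) }                        # sequence may span several lines
        END { if (len) print len }
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                cum += lens[i]
                if (cum >= total / 2) { print lens[i]; exit }
            }
        }
    '
}

# Example (placeholder path):
#   n50 /path/to/Trinity.fasta
```

As a small worked case, contigs of length 10, 6, 4 and 2 total 22 bases; the running total first reaches 11 at the second contig, so the N50 is 6.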


