align_and_estimate_abundance.pl

449 views
Skip to first unread message

Mark Chapman

unread,
Mar 25, 2015, 7:28:38 PM3/25/15
to trinityrn...@googlegroups.com
Hello all,

I've come across something unusual with the align_and_estimate_abundance.pl script in Trinity (I'm using 2.0.6). I've run this many times and it's given a sensible output, but this time my .genes.results and .isoforms.results outputs both say that all my reads are mapping to only one of the transcripts. I.e. if I run /count_features_given_MIN_FPKM_threshold.pl I get:

neg_min_fpkm    num_features
-4048583        1
0       59034

I've looked in the STDERR and STDOUT files and they don't have anything obvious to say its failed. They tell me that ca. 85% of reads have mapped back to the reference, so this doesn't seem to be the problem, eg:

# reads processed: 17622001 # this is correct (and the F and R files have the same number of reads)
# reads with at least one reported alignment: 15298195 (86.81%) # this is encouraging
# reads that failed to align: 2323806 (13.19%)

My command is:
/local/software/trinity/trinityrnaseq-2.0.6/util/align_and_estimate_abundance.pl \
--transcripts /scratch/mc1c12/EPseq_Sept14/EP_maptorefMarch15/TrinityX3_17.fasta --seqType fq \
--left /scratch/mc1c12/EP_Wildreadsmerged/UND_MM0710_left.fastq \
--right /scratch/mc1c12/EP_Wildreadsmerged/UND_MM0710_right.fastq \
--debug --trinity_mode --est_method RSEM --aln_method bowtie --SS_lib_type RF --prep_reference --output_prefix UND_MM0710 --output_dir /scratch/mc1c12/EPseq_Sept14/EP_maptorefMarch15/UND_MM0710

Can anyone see anything I've missed?

Many thanks, Mark

--
Dr. Mark A. Chapman
+44 (0)2380 594396
------------------------------------
Centre for Biological Sciences
University of Southampton
Life Sciences Building 85
Highfield Campus
Southampton
SO17 1BJ

Brian Haas

unread,
Mar 25, 2015, 8:10:52 PM3/25/15
to Mark Chapman, trinityrn...@googlegroups.com
Hi Mark,

That certainly sounds pretty weird.  If you run

  cat RSEM.isoforms.results| perl -lane 'if ($F[4] > 0) { print;}'

are you only seeing one entry, or do you see lots of them?

I'm wondering where the point of failure is here... starting by looking at the RSEM files themselves.

~b

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Mark Chapman

unread,
Mar 25, 2015, 8:28:36 PM3/25/15
to Brian Haas, trinityrn...@googlegroups.com
Hi Brian,
Just one:
[mc1c12@cyan01 C150]$ cat C150.isoforms.results| perl -lane 'if ($F[4] > 0) { print;}'
TR17383|c0_g3_i1        TR17383|c0_g3   254     247.00  2.00    1000000.00      4048583.00      100.00

(note that if i look at the isoforms.results outputs from different runs of 'align and estimate' I always have the reads mapping to one, but always that same, transcript)

Not something I have ever seen before or can rationalise how it would happen!

Brian Haas

unread,
Mar 25, 2015, 8:30:10 PM3/25/15
to Mark Chapman, trinityrn...@googlegroups.com
Ah..   just a second... I think I might know what's going on.

~b

Mark Chapman

unread,
Mar 25, 2015, 8:31:43 PM3/25/15
to Brian Haas, trinityrn...@googlegroups.com
OK, sounds promising!

FYI samtools flagstat gives:
[mc1c12@cyan01 UND_MM0710]$ samtools flagstat UND_MM0710.bowtie.bam
61867224 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
61867224 + 0 mapped (100.00%:-nan%)
61867224 + 0 paired in sequencing
30933612 + 0 read1
30933612 + 0 read2
61867224 + 0 properly paired (100.00%:-nan%)
61867224 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Brian Haas

unread,
Mar 25, 2015, 8:37:33 PM3/25/15
to Mark Chapman, trinityrn...@googlegroups.com
Try running RSEM again using that bam file. Note, it's got to be a name-sorted file to work with RSEM (not coordinate sorted), but the wrapper should be doing the right thing here.... unless you give the bam file directly to the script rather than having it run the aligner.

best,

~b

Mark Chapman

unread,
Mar 26, 2015, 2:54:41 AM3/26/15
to Brian Haas, trinityrn...@googlegroups.com

Thanks Brian, Do you mean to run RSEM alone or run the align and estimate script feeding it with the bam instead of the reads? Sorry, bit confused :/
Thanks!

Brian Haas

unread,
Mar 26, 2015, 6:27:49 AM3/26/15
to Mark Chapman, trinityrn...@googlegroups.com
The simplest thing to do is to just rerun the align-estimate script in a new directory, to be sure that no earlier intermediates are regenerated.  Let's try that first.

There is a way to feed the align-estimate script a bam file (what I was thinking to do earlier), but let's look into that later.

best,

~b

Mark Chapman

unread,
Mar 26, 2015, 10:56:47 AM3/26/15
to Brian Haas, trinityrn...@googlegroups.com
Hi Brian,

I ran the same script in a new directory. It ran through all the steps (it seems) and gave the usual outputs, but again when I try /count_features_given_MIN_FPKM_threshold.pl
I get:
neg_min_fpkm    num_features
-3937008        1
0       92709

​and only get one locus in the list when i try cat RSEM.isoforms.results| perl -lane 'if ($F[4] > 0) { print;}'.

Any ideas?

Thanks, Mark

Brian Haas

unread,
Mar 26, 2015, 11:05:38 AM3/26/15
to Mark Chapman, trinityrn...@googlegroups.com
Let's do a couple of things.

First, try running the sample data set:

    cd trinityrnaseq/sample_data/test_Trinity_Assembly

     ./runMe.sh 1

The trailing 1 will force it to run the RSEM pipeline after doing the assembly.  Let's see if the RSEM output files look as expected. 

If the RSEM results look funky, then try rebuilding the RSEM codebase in the trinity-plugins area.

If they look ok, then we'll go back to looking at your data set.

best,

~b

Ken Field

unread,
Mar 26, 2015, 4:05:33 PM3/26/15
to Brian Haas, Mark Chapman, trinityrn...@googlegroups.com
Mark-
Have you taken a close look at your fastq headers? This might be a crazy idea, but I recall that RSEM is more strict about the format of the header lines than the rest of the Trinity pipeline. 

Ken


Mark Chapman

unread,
Mar 26, 2015, 4:16:30 PM3/26/15
to Ken Field, Brian Haas, trinityrn...@googlegroups.com
Hi Ken,

I'm using samples which have been run through older versions of Trinity (and the same [1.2.9] ver of RSEM), so that's the main reason I'm so perplexed. The fastqs look 'typical', eg:

@HISEQ2500-01:151:H7PUNADXX:1:2106:5332:24110 1:N:0:CAGGTTCT
GGAGGAGAGGGGATGATAGGAGGAATAATGGAAGGTGGAGGTAACAATGGGTTTGGAGGAAGGATTGAAGGTGGCGGAAGAAGTGGATTCGGAGGCAAGG
+
BC@FFDFDHHHHHJJJJJJJHHJGIIIJJJJHIJJDGIEHIBFIIJJJJJJFHHIJEHHFFFFEEEEEEED?CDDDDDDDBDDACDCDDDDDDDDDDDDD

There's been much talk of 'spaces in headers causing problems'and there IS a space here... but as they've run before I'm guessing this space isn't a problem.

Does anything look fishy?

Thanks!

Ken Field

unread,
Mar 26, 2015, 4:29:52 PM3/26/15
to Mark Chapman, Brian Haas, trinityrn...@googlegroups.com
I do like to change all the spaces to dashes and also place the -1 or -2 to indicate whether it is left or right. However, I'm not sure why that would cause trouble now if it was okay before.
Ken

Mark Chapman

unread,
Mar 27, 2015, 3:56:38 AM3/27/15
to Ken Field, Brian Haas, trinityrn...@googlegroups.com
Brian, I am having trouble with the sample data - you said to# out the java version check in the Trinity script, so I #'d out lines 815-835 - is this enough? Because when i run runMe.sh now I get the error:

CMD: /scratch/mc1c12/trinityrnaseq-2.0.6/util/support_scripts/get_Trinity_gene_to_trans_map.pl trinity_out_dir/Trinity.fasta > trinity_out_dir/Trinity.fasta.gene_trans_map
Can't open trinity_out_dir/Trinity.fasta: No such file or directory

So it seems Trinity isnt making the typical Trinity.fasta file. But it has started doing the first few steps (its made 'trinity_out_dir' and there are a few files in there (including a suspiciously empty 'chrysalis' folder).

I've already fixed several errors about path settings and 'permission denied's but this error isnt specific enough for me to pinpoint which bit is failing so I am at a loss as to what can be changed.

Any help greatly appreciated as always!

[Thanks for the idea Ken, Its something I will change in the future as matter of course]

--Mark

Brian Haas

unread,
Mar 27, 2015, 7:33:27 AM3/27/15
to Mark Chapman, Ken Field, trinityrn...@googlegroups.com
It is very suspicious that the chrysalis folder is empty.

Try starting over by first running the cleanMe.pl script there, followed by the 
   runMe.sh 1

and let's see how that goes.

I just noticed an option I had added to Trinity to bypass the java versioning check:

  --bypass_java_version_check

You could just use that flag, but it sounds like you commented out the relevant section anyway.  I don't expect that to be related to this latest weird problem.  Cleaning / restarting should hopefully reset everything and we can take it from there.

best,

~brian


Mark Chapman

unread,
Mar 27, 2015, 8:05:54 AM3/27/15
to Brian Haas, Ken Field, trinityrn...@googlegroups.com
Thanks Brian,

But now I get a thread termination part way through 'Phase 1':

Converting input files. (in parallel)Friday, March 27, 2015: 11:50:15   CMD: gunzip -c /scratch/mc1c12/trinityrnaseq-2.0.6/sample_data/test_Trinity_Assembly/reads.left.fq.gz > /scratch/mc1c12/trinityrnaseq-2.0.6/sample_data/test_Trinity_Assembly/reads.left.fq
Friday, March 27, 2015: 11:50:15        CMD: gunzip -c /scratch/mc1c12/trinityrnaseq-2.0.6/sample_data/test_Trinity_Assembly/reads.right.fq.gz > /scratch/mc1c12/trinityrnaseq-2.0.6/sample_data/test_Trinity_Assembly/reads.right.fq
Friday, March 27, 2015: 11:50:15        CMD: gunzip -c /scratch/mc1c12/trinityrnaseq-2.0.6/sample_data/test_Trinity_Assembly/reads2.left.fq.gz > /scratch/mc1c12/trinityrnaseq-2.0.6/sample_data/test_Trinity_Assembly/reads2.left.fq
Friday, March 27, 2015: 11:50:15        CMD: gunzip -c /scratch/mc1c12/trinityrnaseq-2.0.6/sample_data/test_Trinity_Assembly/reads2.right.fq.gz > /scratch/mc1c12/trinityrnaseq-2.0.6/sample_data/test_Trinity_Assembly/reads2.right.fq
Friday, March 27, 2015: 11:50:15        CMD: /scratch/mc1c12/trinityrnaseq-2.0.6/trinity-plugins/fastool/fastool --rev  --illumina-trinity --to-fasta /scratch/mc1c12/trinityrnaseq-2.0.6/sample_data/test_Trinity_Assembly/reads.left.fq >> left.fa 2> /scratch/mc1c12/trinityrnaseq-2.0.6/sample_data/test_Trinity_Assembly/reads.left.fq.readcount 
Thread 1 terminated abnormally: Error, cmd: /scratch/mc1c12/trinityrnaseq-2.0.6/trinity-plugins/fastool/fastool --rev  --illumina-trinity --to-fasta /scratch/mc1c12/trinityrnaseq-2.0.6/sample_data/test_Trinity_Assembly/reads.left.fq >> left.fa 2> /scratch/mc1c12/trinityrnaseq-2.0.6/sample_data/test_Trinity_Assembly/reads.left.fq.readcount  died with ret 32512 at ../../Trinity line 2116.
Use of uninitialized value in array dereference at ../../Trinity line 1211.
Friday, March 27, 2015: 11:50:15        CMD: /scratch/mc1c12/trinityrnaseq-2.0.6/trinity-plugins/fastool/fastool --illumina-trinity --to-fasta /scratch/mc1c12/trinityrnaseq-2.0.6/sample_data/test_Trinity_Assembly/reads.right.fq >> right.fa 2> /scratch/mc1c12/trinityrnaseq-2.0.6/sample_data/test_Trinity_Assembly/reads.right.fq.readcount 
etc...

Any ideas why it might have died here?

Thanks, Mark

Brian Haas

unread,
Mar 27, 2015, 8:49:42 AM3/27/15
to Mark Chapman, Ken Field, trinityrn...@googlegroups.com
Did all the software build properly under

   make
And
  make plugins 

in the base installation directory?

Also, add --verbose to the trinity command in the runMe.sh file and try again
Thx,

-Brian
(by iPhone)

Mark Chapman

unread,
Mar 27, 2015, 10:39:52 AM3/27/15
to Brian Haas, Ken Field, trinityrn...@googlegroups.com
Hi Brian,

Good point, and I don't recall, so to make sure, I removed trinity and reinstalled. 'make' completed fine and 'make plugins' had the following error:

cd collectl && ./build_collectl.sh
/bin/sh: ./build_collectl.sh: Permission denied
make[1]: *** [collectl_target] Error 126
make[1]: Leaving directory `/scratch/mc1c12/trinityrnaseq-2.0.6/trinity-plugins'
make: *** [plugins] Error 2

I thought I'd try the sample data anyway, and after a lot of messing with PATHs and modules I got it to run, and no errors. BUT I still have the reads all mapping to just one transcript:

[mc1c12@cyan01 trinityrnaseq-2.0.6]$ /scratch/mc1c12/trinityrnaseq-2.0.6/util/misc/count_features_given_MIN_FPKM_threshold.pl RSEM.genes.results > cumul_counts.txt
[mc1c12@cyan01 test_Trinity_Assembly]$ head cumul_counts.txt 
neg_min_fpkm    num_features
-141383 1
0       68

(Are there only 68 genes in the sample data?)

I can also confirm I have java version 1.6.0_24 (but I #'d out the java check). If this is likely the problem then I will have to talk to my server people, BUT this did create an assembly previously.

Thanks again (again!)
--Mark

Brian Haas

unread,
Mar 27, 2015, 10:55:42 AM3/27/15
to Mark Chapman, Ken Field, trinityrn...@googlegroups.com
It's important to make sure the RSEM in the trinity-plugins builds ok.

I'm not sure what the deal is with collectl giving you the permission-denied...

Try this:

  cd trinity-plugins

  make rsem

and see if that builds it.

Once you're able to run the 'runMe.sh 1' in the sample data area, you should get an RSEM isoforms.results file like attached.

If it looks considerably different (still only getting result for a single transcript), then something is still very wrong, and I'm at a loss to explain it.  The next step would be to pull the latest RSEM code down from the RSEM software site, install it, and run it separately. If it continues to misbehave, the RSEM group provides excellent support for it.

I wish I had better insights here...

best,

~brian

RSEM.isoforms.results

Brian Haas

unread,
Mar 27, 2015, 11:52:50 AM3/27/15
to Mark Chapman, trinityrn...@googlegroups.com
Phew!  :)

Great to see things coming along now. :)

~brian


On Fri, Mar 27, 2015 at 11:44 AM, Mark Chapman <markcha...@gmail.com> wrote:
OK, looking better. That was clearly an issue (although I was directing my PATH to a previously installed (and previously functional) version of RSEM). Now I have:

[mc1c12@cyan01 test_Trinity_Assembly]$ head RSEM.isoforms.results
transcript_id   gene_id length  effective_length        expected_count  TPM     FPKM    IsoPct
TR10|c0_g1_i1   TR10|c0_g1      253     13.18   0.00    0.00    0.00    0.00
TR10|c0_g1_i2   TR10|c0_g1      3745    3448.45 717.00  18208.76        7810.35 100.00
TR11|c1_g1_i1   TR11|c1_g1      245     10.29   1.00    8507.46 3649.14 100.00
TR11|c2_g1_i1   TR11|c2_g1      239     8.34    0.00    0.00    0.00    0.00
TR12|c0_g1_i1   TR12|c0_g1      304     38.40   1.00    2280.90 978.35  100.00
TR13|c0_g1_i1   TR13|c0_g1      229     5.54    1.00    15809.27        6781.13 100.00
TR13|c1_g1_i1   TR13|c1_g1      1413    1116.45 573.00  44946.90        19279.24        100.00
TR13|c2_g1_i1   TR13|c2_g1      1089    792.45  374.00  41331.74        17728.57        100.00
TR14|c0_g1_i1   TR14|c0_g1      307     40.21   3.00    6534.01 2802.66 100.00

So it seems that my initial problem was my local RSEM after all (although again it used to work which makes this perplexing). I will have a go directing my earlier scripts to the new RSEM and see if this works and report back later...

Thanks Brian!
--Mark

Reply all
Reply to author
Forward
0 new messages