quality of Trinity assembly

616 views
Skip to first unread message

hsiu-ching Ma

unread,
Mar 23, 2015, 6:44:37 PM3/23/15
to Brian Haas, Tiago Hori, trinityrn...@googlegroups.com
Hi, 

I am trying to assess the quality of my assembly before I proceed further and I am wondering if there are any suggestions as to what can be explored?

So far, i am using:

1) transcript fullness assessment as suggested by Trinity using both transcripts and protein databases

2) run bowtie_PE_separate_then_join.pl


My Q:

Do you recommend mapping Trinity transcripts to a reference genome sequence and do reciprocal blastn searches?

Also, is it worth running Detonate REF-EVAL? 

I am aware that I have a large amount of reads mapped to rRNA (~20%) in my output, would I better off filtering out these reads before running Trinity in silico normalisation and assembly? Will this improve the quality of the assembly?

Any suggestions will be greatly appreciated.

Best Regards
Karen

Tiago Hori

unread,
Mar 23, 2015, 6:53:02 PM3/23/15
to hsiu-ching Ma, Brian Haas, trinityrn...@googlegroups.com
Mapping against the genome could be trick unless you have a intron exon boundary library. 

DETONATE will give a more complete picture than separate then join method.

I would also use the script that extracts all putative full length sequences and compare that to your estimated number of transcripts.

It also depends on what you are looking for. For differential analysis for  example what really matters is unique mapping and fragmentation may not necessarily be a huge issue.

I also find that it is hard to determine the absolute quality of an assembly without a reference transcriptome. What I will often do is compare different assemblers and a merged assembly.

The rRNA should have been taken care at library prep, but that does not always work. The normalization should reduce the presence of those. 

T.

Sent from my iPhone
--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.

hsiu-ching Ma

unread,
Mar 24, 2015, 7:16:44 PM3/24/15
to Tiago Hori, Brian Haas, trinityrn...@googlegroups.com
Hi Tiago,

Thanks for the reply. My library is a normal non-stranded library, so I suppose mapping Trinity transcripts back to the genome will not tell me anything about the quality of the Trinity transcriptome. I came across a paper in which the authors does this to access their transcriptome assembly and I was just not sure if this is something i should try.

Detonate: both RSEM-EVAL and REF-EVAL should be tested?

I have run the script that extract full length sequences , i.e. analyze_blastPlus_topHit_coverage.pl. I ran this script with the CDS sequences and only ~10% of transcripts mapped at least 80% to the CDS. I suppose this % is not good enough?

my Q:

1. I am actually trying to do 2 things with my Trinity assembly. First is to generate a good transcriptome assembly and to do annotation and second to compare gene expression between different samples. At this stage, I thought I should make sure the Trinity assembly I have is good before I proceed further.

Any suggestions as to  how to proceed with these 2 types of analysis?

2. If there is a okish transcriptome available for my species, anything i can do to test the quality of my Trinity assembly?

Thanks.

Best Regards
Karen


Tiago Hori

unread,
Mar 25, 2015, 11:53:35 AM3/25/15
to hsiu-ching Ma, Brian Haas, trinityrn...@googlegroups.com
You would have to use RSEM eval, as REF-eval requires a reference.

You can map your reads to you transcriptome and the Trinity generated one and if the percent properly mapped read are much higher in the old transcriptome, you could try to use CAP3 or CD_HIT to merge them. CD-HIT is actually better.

Can you send the results of bowtie_PE_separate_then_join.pl?

T.
Ultimate stresses sportsmanship and fair play. Competitive play is encouraged, but NEVER AT THE EXPENSE OF RESPECT BETWEEN PLAYERS, adherence to the rules and the BASIC JOY OF PLAY.

On Mar 24, 2015, at 08:46 PM, hsiu-ching Ma <hck...@gmail.com> wrote:

Hi Tiago,

Thanks for the reply. My library is a normal non-stranded library, so I suppose mapping Trinity transcripts back to the genome will not tell me anything about the quality of the Trinity transcriptome. I came across a paper in which the authors does this to access their transcriptome assembly and I was just not sure if this is something i should try.

Detonate: both RSEM-EVAL and REF-EVAL should be tested?
 
bowtie_PE_separate_then_join.pl

hsiu-ching Ma

unread,
Mar 25, 2015, 2:14:35 PM3/25/15
to Tiago Hori, Brian Haas, trinityrn...@googlegroups.com
Hi Tiago,

Thanks for your kind offer to look at the result for me, I really appreciate it. The hpc is down here and will be back tomorrow or Friday, I will send it as soon as the hpc is back and running, if this is ok. 

Apart from merging with the old transcriptome, any suggestion if i want to use only the de novo assembly?

Thanks.

Best Regards
Karen



Mark Chapman

unread,
Mar 25, 2015, 6:21:42 PM3/25/15
to hsiu-ching Ma, Tiago Hori, Brian Haas, trinityrn...@googlegroups.com
Hello, As Tiago says, DETONATE is a pretty good way of assessing how 'good' a transcriptome is. Do you by any chance have access to the raw reads from the previous assembly? Were they from the same individual? Or completely different? You could always use CD-HIT to assess what proportion of your assembly is covered in the initial assembly, as well as vice versa to see if its worth combining them. Just some ideas :) Thanks, Mark
Dr. Mark A. Chapman
+44 (0)2380 594396
------------------------------------
Centre for Biological Sciences
University of Southampton
Life Sciences Building 85
Highfield Campus
Southampton
SO17 1BJ

Brian Haas

unread,
Mar 25, 2015, 6:34:10 PM3/25/15
to Mark Chapman, hsiu-ching Ma, Tiago Hori, trinityrn...@googlegroups.com
We definitely need to modernize the section on our web documentation on how to assess the quality of an assembly.

~b
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

hsiu-ching Ma

unread,
Mar 26, 2015, 7:52:27 PM3/26/15
to Mark Chapman, Tiago Hori, Brian Haas, trinityrn...@googlegroups.com
Hi Mark,

Thanks for the email. The only reference I have for the species of interest is in flybase and no, unfortunately I don't have the raw reads and I don't think the transcriptome was generated based on these second generation sequencing data. They are the same species, but not sure if they are the exact same strain.

Thanks.

Best Regards
Karen


hsiu-ching Ma

unread,
Mar 30, 2015, 7:31:36 PM3/30/15
to Tiago Hori, Brian Haas, trinityrn...@googlegroups.com
Hi Tiago,

I was able to get the read alignment statistics from the '.o' file after running 
bowtie_PE_separate_then_join.pl

However, this time when I run the analysis, 
This is the kind of output I am getting:

$ perl /data/apps/trinity/r2015-2.0.3/util/SAM_nameSorted_to_uniq_count_stats.pl bowtie.nameSorted.bam
Æ20,000,000Å  lines read 


#read_type count pct
single 10781121 100.00

Total aligned reads: 10781121



I do not seems to be able to obtain the alignment stats as stated here:



Any suggestions?

Thanks.

Best Regards
Karen

   

On Wed, Mar 25, 2015 at 8:53 AM, Tiago Hori <tiag...@me.com> wrote:

Mark Chapman

unread,
Mar 31, 2015, 2:28:47 AM3/31/15
to hsiu-ching Ma, Tiago Hori, Brian Haas, trinityrn...@googlegroups.com
​Hi Karen,
Can you provide your full 'bowtie_PE_separate_then_join.pl' command as it seems you are only able to detect reads that map singly and not as pairs. ​Do you have 10781121 reads? Or hasnt it used the F and R? (I presume you have PE data?).
Thanks, Mark

hsiu-ching Ma

unread,
Mar 31, 2015, 2:43:17 PM3/31/15
to Mark Chapman, Tiago Hori, Brian Haas, trinityrn...@googlegroups.com
Hi Mark,

Thanks for the email. Yes, I have PE data, non-stranded and here is the command i used

perl /data/apps/trinity/r2015-2.0.3/util/bowtie_PE_separate_then_join.pl --seqType fq --left ../../../R207-P1-READ1-trimmed-paired-new.txt --right ../../../R207-P1-READ2-trimmed-paired-new.txt --target Trinity.fasta --aligner bowtie -- -p 4 --all --best --strata -m 300



Thanks.

Best Regards
Karen

Mark Chapman

unread,
Mar 31, 2015, 4:33:55 PM3/31/15
to hsiu-ching Ma, Brian Haas, Tiago Hori, trinityrn...@googlegroups.com

Hi Karen,
The first thing I see is that the filenames are .txt but you say they are .fq. Could this be causing the issue? Also you can add --SS_lib_type parameter to make it clear its strand-specific. Maybe this would fix it?
Thanks, Mark

Tiago Hori

unread,
Mar 31, 2015, 4:35:32 PM3/31/15
to Mark Chapman, hsiu-ching Ma, Brian Haas, trinityrn...@googlegroups.com
Also, Brian has mentioned in the past that it is a bad idea to have your data and Trinity installation in the same file tree.

T.

Sent from my iPhone

hsiu-ching Ma

unread,
Mar 31, 2015, 5:18:33 PM3/31/15
to Mark Chapman, Brian Haas, Tiago Hori, trinityrn...@googlegroups.com
Hi Mark,

My library and not stranded one and also, I have ran with the same file name, i.e.R207-P1-READ1-trimmed-paired-new.txt in the past and it didn;t give me any problems. Indeed, the content is in fastq format.

Thanks.

Karen


hsiu-ching Ma

unread,
Mar 31, 2015, 5:20:40 PM3/31/15
to Mark Chapman, Brian Haas, Tiago Hori, trinityrn...@googlegroups.com
Hi Mark,

This is the message I got:

Error, cmd: /data/apps/trinity/r2015-2.0.3/util/../util/support_scripts/merge_left_right_nameSorted_SAMs.pl --left_sam left/left.nameSorted.bam --right_sam right/right.nameSorted.bam -D 2000 | samtools view -bt target.fa.fai -S - > combined.nameSorted.pre.bam died with ret 256 at /data/apps/trinity/r2015-2.0.3/util/bowtie_PE_separate_then_join.pl line 630


Thanks.

Karen

Mark Chapman

unread,
Mar 31, 2015, 5:33:57 PM3/31/15
to hsiu-ching Ma, Tiago Hori, Brian Haas, trinityrn...@googlegroups.com

Hi Karen,
Sorry I misread about stranded/unstranded. Maybe try renaming the files to .fq to see is anything changes. Is samtools in your PATH? Maybe Brian would suggest upgrading to ver. 2.0.6 too? These are just some ideas :)
Thanks, Mark

Brian Haas

unread,
Mar 31, 2015, 6:00:16 PM3/31/15
to Mark Chapman, hsiu-ching Ma, Tiago Hori, trinityrn...@googlegroups.com
Upgrading to v2.0.6 won't hurt, but I'm not sure it's going to fix this specific problem.

Try running that failed command directly and let's see if there's a useful error message.

best,

~brian

Mark Chapman

unread,
Mar 31, 2015, 6:15:59 PM3/31/15
to Brian Haas, Tiago Hori, hsiu-ching Ma, trinityrn...@googlegroups.com

Hi Karen,
Were your sequence files .txt files when you received them? If not then is it possible they have been saved as .txt files more recently and hence the line breaks could have been changed? (Just another idea!)
--Mark

hsiu-ching Ma

unread,
Apr 3, 2015, 3:13:07 PM4/3/15
to Tiago Hori, Brian Haas, trinityrn...@googlegroups.com
Hi Tiago,

Finally, I have the result of bowtie_PE_separate_then_join.pl. Can you please take a look for me?



Fri Apr 03 12:10:55 hcma@compute-1-13:/bio/hcma/new_trinity_analysis/merged_maleANDf_trinity_r2015-2.0.3_output/output_with/trinity_garr/bowtie_out
1052 $ cat SAM_nameSorted_merged_with.o3264330 

#read_type count pct
proper_pairs 224421146 80.03
improper_pairs 45037334 16.06
right_only 5685546 2.03
left_only 5282764 1.88

Total aligned reads: 280426790


Thanks for your time.

Best Regards
Karen





On Wed, Mar 25, 2015 at 8:53 AM, Tiago Hori <tiag...@me.com> wrote:

Tiago Hori

unread,
Apr 3, 2015, 3:23:47 PM4/3/15
to hsiu-ching Ma, Brian Haas, trinityrn...@googlegroups.com
Hi Karen,

That looks great. 80% proper pairs is what you want to see for a good assembly. Getting above that takes a lot depth and/or multi-assemblies.

T.

Sent from my iPhone

Ken Field

unread,
Apr 4, 2015, 11:11:56 AM4/4/15
to Tiago Hori, hsiu-ching Ma, Brian Haas, trinityrn...@googlegroups.com
I would add that if your read lengths overlap with your fragment lengths, most of the 'improper pairs' may in fact just be overlapping reads. In my case, I had 200 basepair fragments and 100 basepair reads and I found that almost all of the improper pairs were actually just overlaps.

It would be great if bowtie_PE_separate_then_join.pl would have a separate category of overlapping reads versus truly improper pairs (i.e. pairs that map to the same contig but in the wrong orientation relative to each other).

Ken
Ken Field, Ph.D.
Associate Professor of Biology
Program in Cell Biology/Biochemistry
Bucknell University
Room 203A Biology Building

Tiago Hori

unread,
Apr 4, 2015, 1:05:26 PM4/4/15
to Ken Field, hsiu-ching Ma, Brian Haas, trinityrn...@googlegroups.com
Ken,

That is a great point! I should start looking at that!

T.

Sent from my iPhone

Ken Field

unread,
Apr 5, 2015, 11:26:34 AM4/5/15
to Tiago Hori, hsiu-ching Ma, Brian Haas, trinityrn...@googlegroups.com
This thread reminds me of a question that I have had about overlapping reads and RSEM and eXpress.

When using the align_and_estimate_abundance.pl script, do overlapping reads lead to false increases in the apparent expression of the overlapped region for either RSEM or eXpress estimates of read counts? 

I think that this is important to know because longer reads are becoming the norm and overlaps will be very common, even after quality trimming, when we have 125 bp paired reads on 200 bp fragments. 

I know that the program flash can join these overlaps, but I ran into all kinds of problems in the downstream analysis when I tried to use flash to preprocess my reads for Trinity.

Ken

hsiu-ching Ma

unread,
Nov 5, 2015, 8:22:29 PM11/5/15
to Ken Field, Tiago Hori, Brian Haas, trinityrn...@googlegroups.com
Hi, 

I ran the following script:


perl /data/apps/trinity/r2015-2.1.1/util/bowtie_PE_separate_then_join.pl --seqType fq --left ../../../R207-P2-READ1-trimmed-paired-new.txt --right ../../../R207-P2-READ2-trimmed-paired-new.txt --target ../Trinity.fasta --SS_lib_type RF --aligner bowtie -- -p $CORES --all --best --strata -m 300

Then

perl /data/apps/trinity/r2015-2.1.1/util/SAM_nameSorted_to_uniq_count_stats.pl bowtie_out.nameSorted.bam



#read_type count pct
single 1 100.00

Total aligned reads: 1

But i am getting a wired output.

Thanks.

Best Regards
Karen



Brian Haas

unread,
Nov 5, 2015, 8:28:57 PM11/5/15
to hsiu-ching Ma, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
Hi Karen,

If you run 
    samtools flagstat  bowtie_out.nameSorted.bam

does it report useful stats?

If you want to share the bam file with me, I can give it a look.

best,

~b

hsiu-ching Ma

unread,
Nov 5, 2015, 8:34:44 PM11/5/15
to Brian Haas, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
Hi Brian,

Thanks for the email. I did what you say and below is what I got:

Of course, see attached my bam file.

$ samtools flagstat bowtie_out.nameSorted.bam
0 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplimentary
0 + 0 duplicates
0 + 0 mapped (-nan%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Thanks.

Best Regards
Karen





bowtie_out.nameSorted.bam

Brian Haas

unread,
Nov 5, 2015, 9:05:59 PM11/5/15
to hsiu-ching Ma, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
My best guess is that there's something about your input files that it doesn't like.

Is your R207-P2-READ2-trimmed-paired-new.txt  in fastq file format?

To test whether or not the bowtie-PE process can operate given your installation, and assuming that you have the very latest Trinity version installed, you can 

   cd trinityrnaseq/sample_data/test_Trinity_Assembly/

then
  
    make test_trinity

to build a small trinity assembly, and then

   make test_bowtie_PE_read_estimates

to test the bowtie-PE process.

This should tell us whether or not the system is functioning as intended, and then we can focus on what's different about your data.

best,

~b

hsiu-ching Ma

unread,
Nov 5, 2015, 9:11:11 PM11/5/15
to Brian Haas, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
It is so strange because i ran the same script using 2.0,3 a few months ago and everything worked last time. 

The R207-P2-READ2-trimmed-paired-new.txt  is indeed in fastq format, but then again, i used the same format last time as well. 

is Trinity 2.1.1 compatible with bowtie/0.12.8 and samtools 1,1?

Thanks.

Best Regards
Karen



Brian Haas

unread,
Nov 5, 2015, 9:16:15 PM11/5/15
to hsiu-ching Ma, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
It should be.  Let's see how the tests go.

~b

hsiu-ching Ma

unread,
Nov 5, 2015, 9:21:43 PM11/5/15
to Brian Haas, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
Has it got anything to do with the new Trinity naming stystem?


Output of running Trinity/2.0.3:

transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct
c10000_g1_i1 c10000_g1 406 207.11 5.00 0.24 0.18 100.00
c10001_g1_i1 c10001_g1 383 184.82 8.00 0.42 0.33 100.00




Output of running Trinity/2.1.1:

transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct
TRINITY_DN10000_c0_g1_i1 TRINITY_DN10000_c0_g1 378 180.67 20.91 0.58 0.81 100.00
TRINITY_DN10000_c0_g2_i1 TRINITY_DN10000_c0_g2 840 639.92 73.09 0.57 0.80 100.00

Thanks.

Best Regards
Karen

hsiu-ching Ma

unread,
Nov 5, 2015, 9:24:50 PM11/5/15
to Brian Haas, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
Thu Nov 05 18:24:20 hcma@compute-1-13:/data/apps/trinity/r2015-2.1.1
1020 $ ls
Analysis       hpc_conf     Makefile   README         Trinity             util
Butterfly      Inchworm     ,make.log  README.md      Trinity.original
Chrysalis      LICENSE      notes      Release.Notes  trinity-plugins
galaxy-plugin  LICENSE.txt  PerlLib    sample_data    trinityrnaseq.wiki


This is all i see, how can i 

cd trinityrnaseq/sample_data/test_Trinity_Assembly/

Thanks

Brian Haas

unread,
Nov 5, 2015, 9:25:28 PM11/5/15
to hsiu-ching Ma, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
It shouldn't.   Again, let's see how the test goes given the sample data provided. 

hsiu-ching Ma

unread,
Nov 5, 2015, 9:33:11 PM11/5/15
to Brian Haas, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
Sorry, i found 'sample_data', but then

Thu Nov 05 18:31:36 hcma@compute-1-13:/data/apps/trinity/r2015-2.1.1/sample_data/test_Trinity_Assembly
1028 $ ls
cleanme.pl                 Makefile        reads2.left.fq.gz   reads.right.fq.gz
__indiv_ex_sample_derived  misc_run_tests  reads2.right.fq.gz  runMe.sh
longReads.fa               README          reads.left.fq.gz    test_FL.sh

Thu Nov 05 18:31:37 hcma@compute-1-13:/data/apps/trinity/r2015-2.1.1/sample_data/test_Trinity_Assembly
1029 $ make test_trinity
./runMe.sh
module () {  eval `/data/apps/modules/Modules/$MODULE_VERSION/bin/modulecmd bash $*`
}
#!/bin/bash -ve


if [ -e reads.right.fq.gz ] && [ ! -e reads.right.fq ]; then
    gunzip -c reads.right.fq.gz > reads.right.fq
fi
./runMe.sh: line 5: reads.right.fq: Permission denied
make: *** [test_trinity] Error 1


Thanks

Brian Haas

unread,
Nov 5, 2015, 9:40:56 PM11/5/15
to hsiu-ching Ma, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
I see....  Yes, it's going to be difficult to do this given a central installation.

In any case, you'll find sample data there that you can explore and attempt to execute the various steps with.




hsiu-ching Ma

unread,
Nov 6, 2015, 1:28:08 PM11/6/15
to Brian Haas, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
I have asked the people who installed it to test it for me as suggested. I will let you know what they says.

Thanks.

Best Regards
Karen

Brian Haas

unread,
Nov 6, 2015, 1:45:32 PM11/6/15
to hsiu-ching Ma, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
sounds good.  best of luck!

~b

hsiu-ching Ma

unread,
Nov 6, 2015, 2:50:56 PM11/6/15
to Brian Haas, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
Hi Brian,

The people from hpc installed an app to test gam files and these are the 2 results i got using a ban output that has not worked with the 'SAM_nameSorted_to_uniq_count_stats.pl'

-------------------

output using files that has NOT worked with SAM_nameSorted_to_uniq_count_stats.pl:

Number of records read = 391736570
Number of valid records = 391736570

TotalReads(e6) 391.74
MappedReads(e6) 391.74
PairedReads(e6) 391.74
ProperPair(e6) 391.74
DuplicateReads(e6) 0.00
QCFailureReads(e6) 0.00

MappingRate(%) 100.00
PairedReads(%) 100.00
ProperPair(%) 100.00
DupRate(%) 0.00
QCFailRate(%) 0.00

TotalBases(e6) 38234.95
BasesInMappedReads(e6) 38234.95
Returning: 0 (SUCCESS)

[1]+  Done                    bam validate --in bowtie.bam


--------------------------

output using files that has worked with SAM_nameSorted_to_uniq_count_stats.pl:

Number of records read = 568404620
Number of valid records = 568404620

TotalReads(e6) 568.40
MappedReads(e6) 568.40
PairedReads(e6) 568.40
ProperPair(e6) 568.40
DuplicateReads(e6) 0.00
QCFailureReads(e6) 0.00

MappingRate(%) 100.00
PairedReads(%) 100.00
ProperPair(%) 100.00
DupRate(%) 0.00
QCFailRate(%) 0.00

TotalBases(e6) 55461.44
BasesInMappedReads(e6) 55461.44
Returning: 0 (SUCCESS)
jobs
[1]+  Done                    bam validate --in bowtie.bam


-----------------------

They seems pretty similar, so does this means the problem is with the 'SAM_nameSorted_to_uniq_count_stats.pl'?

Thanks.

Best Regards
Karen

Brian Haas

unread,
Nov 6, 2015, 2:54:34 PM11/6/15
to hsiu-ching Ma, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
Hi Karen,

What happens when you run the SAM_nameSorted_to_uniq_count_stats.pl script on one of those bam files?

~b

hsiu-ching Ma

unread,
Nov 6, 2015, 2:59:09 PM11/6/15
to Brian Haas, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
Output of running  SAM_nameSorted_to_uniq_count_stats.pl script:

1) didn;t work with 2.1.1:

perl /data/apps/trinity/r2015-2.1.1/util/SAM_nameSorted_to_uniq_count_stats.pl bowtie_out.nameSorted.bam



#read_type count pct
single 1 100.00

Total aligned reads: 1

But i am getting a wired output.


2) worked with 2.0.3 in the past:

$ cat SAM_nameSorted_merged_with_new.o3279429 

#read_type count pct
proper_pairs 224421146 80.03
improper_pairs 45037334 16.06
right_only 5685546 2.03
left_only 5282764 1.88

Total aligned reads: 280426790

Thanks.

Karen

hsiu-ching Ma

unread,
Nov 6, 2015, 3:01:48 PM11/6/15
to Brian Haas, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
Hi Brian,

The hpc people have copied the files to my dir, so i can test it.

I did 'make test_trinity' and this is what i got below: 


$ make test_trinity
./runMe.sh
module () {  eval `/data/apps/modules/Modules/$MODULE_VERSION/bin/modulecmd bash $*`
}
#!/bin/bash -ve


if [ -e reads.right.fq.gz ] && [ ! -e reads.right.fq ]; then
    gunzip -c reads.right.fq.gz > reads.right.fq
fi

if [ -e reads.left.fq.gz ] && [ ! -e reads.left.fq ]; then
    gunzip -c reads.left.fq.gz > reads.left.fq
fi

if [ -e reads2.right.fq.gz ] && [ ! -e reads2.right.fq ]; then
    gunzip -c reads2.right.fq.gz > reads2.right.fq
fi

if [ -e reads2.left.fq.gz ] && [ ! -e reads2.left.fq ]; then
    gunzip -c reads2.left.fq.gz > reads2.left.fq
fi



#######################################################
##  Run Trinity to Generate Transcriptome Assemblies ##
#######################################################

../../Trinity --seqType fq --max_memory 2G --left reads.left.fq.gz,reads2.left.fq.gz --right reads.right.fq.gz,reads2.right.fq.gz --SS_lib_type RF --CPU 4 --no_cleanup
./runMe.sh: line 26: ../../Trinity: No such file or directory
make: *** [test_trinity] Error 1


Thanks a lot for your time.
Karen

Brian Haas

unread,
Nov 6, 2015, 3:14:10 PM11/6/15
to hsiu-ching Ma, Ken Field, Tiago Hori, trinityrn...@googlegroups.com

If you make your bam file available to me, I'll give it a look.  It isn't obvious to me why it's causing trouble.


hsiu-ching Ma

unread,
Nov 6, 2015, 3:18:15 PM11/6/15
to Brian Haas, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
Hi Brian,

Please see attached, this is the bam file (using r2015-2.1.1) that has not worked with the script 'SAM_nameSorted_to_uniq_count_stats.pl

Thanks.
Karen
bowtie_out.nameSorted.bam

hsiu-ching Ma

unread,
Nov 6, 2015, 3:18:54 PM11/6/15
to Brian Haas, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
Hi Brian, did you see this?

Brian Haas

unread,
Nov 6, 2015, 3:24:03 PM11/6/15
to hsiu-ching Ma, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
That bam file is empty - no reads.  I'll look into why the script is reporting 1 read, as that's incorrect.

If the bam file has no reads, then it sounds like a problem earlier on, and it's a different problem than the counting script not working on your other large bam files correctly.

Let's focus for a moment on the large bam files for which you have statistics indicating that there are plenty of reads.  What happens when you run the counting script on one of them?

As for running the 'make' tests in Trinity, it's another can of worms, and I honestly just don't have resources to begin exploring it, unfortunately, unless it's a problem that many are encountering.


hsiu-ching Ma

unread,
Nov 6, 2015, 6:20:03 PM11/6/15
to Brian Haas, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
Hi Brian,

Sorry for the delay in replying.

Using the 'SAM_nameSorted_to_uniq_count_stats.pl' from r-2015.2.0.3

#read_type count pct
proper_pairs 224421146 80.03
improper_pairs 45037334 16.06
right_only 5685546 2.03
left_only 5282764 1.88

Total aligned reads: 280426790


However, i ran the script from trinity/r-2015.2.1.1 using the same dataset, this is what i got:

perl /data/apps/trinity/r2015-2.1.1/util/SAM_nameSorted_to_uniq_count_stats.pl bowtie_out.nameSorted.bam 
Can't exec "samtools": No such file or directory at /data/apps/trinity/r2015-2.1.1/util/../PerlLib/SAM_reader.pm line 35.
Error, cannot open file bowtie_out.nameSorted.bam at /data/apps/trinity/r2015-2.1.1/util/../PerlLib/SAM_reader.pm line 34.
SAM_reader::_init('SAM_reader=HASH(0x1585768)') called at /data/apps/trinity/r2015-2.1.1/util/../PerlLib/SAM_reader.pm line 24
SAM_reader::new('SAM_reader', 'bowtie_out.nameSorted.bam') called at /data/apps/trinity/r2015-2.1.1/util/SAM_nameSorted_to_uniq_count_stats.pl line 58



Does this mean anything?

Thanks

Best Regards
Karen


hsiu-ching Ma

unread,
Nov 6, 2015, 6:24:03 PM11/6/15
to Brian Haas, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
As for running the 'make' tests in Trinity, it's another can of worms, and I honestly just don't have resources to begin exploring it, unfortunately, unless it's a problem that many are encountering............i see, so does this means that there is problem with my actual Trinity.fasta file, or if i got an '' at the end of running it, then things are fine?

Thanks.

Karen

Brian Haas

unread,
Nov 6, 2015, 9:26:51 PM11/6/15
to hsiu-ching Ma, Ken Field, Tiago Hori, trinityrn...@googlegroups.com
For some reason, the version of the script that's breaking is not finding the 'samtools' software in your PATH, but apparently the first one is.   That script hasn't changed between the releases, so it sounds like it's some issue with the computing environment.   If you have a version that's working, then use that one - for lack of better advice.
Reply all
Reply to author
Forward
0 new messages