Merging multiple bam files to perform genome-guided assembly


Lucila Traverso

Mar 27, 2018, 10:36:42 PM
to trinityrnaseq-users
Hi all,

I am new to Trinity and am trying to perform a genome-guided assembly with Trinity v2.1.1. I have 8 paired-end (PE) samples. Because I aligned my reads with STAR first, I have one BAM file per sample (8 in total), and I ran samtools merge to combine them into a single BAM file. My question is: do I have to run samtools sort after this, or is the merged file ready to use with --genome_guided_bam?

Once this is done, I am planning to run Trinity with all 16 FASTQ files and the merged BAM to obtain a single assembly. Is my approach correct?

Thanks a lot for your help and advice.

Best,
Lucila.

Brian Haas

Mar 28, 2018, 8:25:48 AM
to Lucila Traverso, trinityrnaseq-users
Hi Lucila,

After you merge the BAM files, you'd need to coordinate-sort them (samtools sort).
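A minimal sketch of the merge-then-sort step, followed by a genome-guided Trinity call, written as bash functions. The sample file names, the 10000 bp maximum intron length, and the memory/CPU values are placeholders chosen for illustration, not recommendations, and `run` is a hypothetical helper that only prints each command when `DRY_RUN=1`. The Trinity call passes only the sorted BAM, following the genome-guided example in the Trinity documentation; confirm the options against the usage text your Trinity version prints.

```shell
#!/usr/bin/env bash
# Sketch only: merge per-sample STAR BAMs, coordinate-sort the merged file,
# then hand the sorted BAM to Trinity's genome-guided mode.
# File names and resource values are hypothetical placeholders.
set -euo pipefail

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# merge_and_sort OUT.bam IN1.bam [IN2.bam ...]
# samtools merge writes an intermediate BAM; samtools sort then
# coordinate-sorts it (its default sort order) into OUT.bam.
merge_and_sort() {
  local out="$1"
  shift
  local unsorted="${out%.bam}.unsorted.bam"
  run samtools merge "$unsorted" "$@"
  run samtools sort -o "$out" "$unsorted"
}

# trinity_gg SORTED.bam
# Placeholder values: 10000 bp max intron, 20G memory, 10 CPUs.
trinity_gg() {
  run Trinity --genome_guided_bam "$1" \
    --genome_guided_max_intron 10000 \
    --max_memory 20G --CPU 10
}
```

For eight files named sample1.bam through sample8.bam (hypothetical names), usage would be `merge_and_sort merged.coordSorted.bam sample{1..8}.bam` followed by `trinity_gg merged.coordSorted.bam`. Setting `DRY_RUN=1` first shows the exact commands without running anything.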

Before going through all this, just be sure to read through:

You might also want to compare this against running StringTie and against a full genome-free de novo assembly to see how they all fare.

best,

~brian


--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-users+unsub...@googlegroups.com.
To post to this group, send email to trinityrnaseq-users@googlegroups.com.
Visit this group at https://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas


Lucila Traverso

Apr 6, 2018, 2:50:22 PM
to trinityrnaseq-users
Thank you so much, Brian!
I have another question: I am using paired-end reads for the assembly, but also some single-end reads (the ones that survived the Trimmomatic step as unpaired). I joined them all into a single file (with cat) and passed it as --left. But I haven't aligned them with STAR, so the BAM file only contains coordinates for the paired-end reads. Do you think that could be a problem? Would it be better to also align them and add that information to the BAM file?

Thank you again; this group is very helpful to me!

Lucila.

Brian Haas

Apr 6, 2018, 3:34:12 PM
to Lucila Traverso, trinityrnaseq-users
Hi Lucila,

I haven't tried mixing paired- and single-end data with the genome-guided pipeline. I'm not anticipating any trouble, but I wouldn't be surprised if it broke. You can give it a try. You'd need to align the PE and SE reads separately, then merge the alignments and coordinate-sort them before using the result as input.

If it breaks, I could probably make it work w/ a little effort.
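Brian's suggestion (align PE and SE reads separately, then merge and coordinate-sort) could be sketched with STAR as below, shown for a single sample. It assumes a prebuilt STAR index directory, uncompressed FASTQ input (gzipped files would also need `--readFilesCommand zcat`), a placeholder thread count of 10, and hypothetical output prefixes `pe_` and `se_`. With `--outSAMtype BAM SortedByCoordinate`, STAR names its output `<prefix>Aligned.sortedByCoord.out.bam`. A hypothetical `run` helper prints the commands instead of executing them when `DRY_RUN=1`.

```shell
#!/usr/bin/env bash
# Sketch only: align paired-end and single-end reads in separate STAR runs,
# then merge both BAMs and coordinate-sort the result for Trinity.
# Index path, FASTQ names, prefixes and thread count are hypothetical.
set -euo pipefail

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# star_align GENOME_DIR PREFIX READS1 [READS2]
# One FASTQ argument gives a single-end run; two give a paired-end run.
star_align() {
  local genome_dir="$1" prefix="$2"
  shift 2
  run STAR --runThreadN 10 --genomeDir "$genome_dir" \
    --readFilesIn "$@" \
    --outSAMtype BAM SortedByCoordinate \
    --outFileNamePrefix "$prefix"
}

# pe_se_to_gg_bam GENOME_DIR R1.fq R2.fq UNPAIRED.fq OUT.bam
pe_se_to_gg_bam() {
  local genome_dir="$1" r1="$2" r2="$3" se="$4" out="$5"
  local unsorted="${out%.bam}.unsorted.bam"
  star_align "$genome_dir" pe_ "$r1" "$r2"
  star_align "$genome_dir" se_ "$se"
  # STAR appends Aligned.sortedByCoord.out.bam to each prefix.
  run samtools merge "$unsorted" \
    pe_Aligned.sortedByCoord.out.bam se_Aligned.sortedByCoord.out.bam
  run samtools sort -o "$out" "$unsorted"
}
```

Keeping the two STAR runs separate means each alignment carries the correct paired or unpaired flags; the final merge and sort only combine them into the single coordinate-sorted BAM that --genome_guided_bam expects.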

best,

~b


Lucila Traverso

Apr 6, 2018, 3:50:21 PM
to trinityrnaseq-users
I ran it as I described, with paired- and single-end reads, but with the BAM file from the paired-end reads only. Phase 1 completed successfully, but Phase 2 gave me an error that is repeated many times (the log file is 42 MB) and starts with:

** The inchworm process failed.sh: /usr/local/bioinformatic/trinityrnaseq-2.1.1/Inchworm/bin//inchworm: No such file or directory

Then it finishes with:


We are sorry, commands in file: [FailedCommands] failed.  :-(

Error, cmd: /usr/local/bioinformatic/trinityrnaseq-2.1.1/trinity-plugins/parafly/bin/ParaFly -c trinity_GG.cmds -CPU 10 -v  died with ret 256 at /usr/local/bioinformatic/trinityrnaseq-2.1.1/Trinity line 2183.

I think this is an installation problem, so I am trying to fix it. Then I will try your suggestion.

Thank you so much,
Lucila.



Brian Haas

Apr 6, 2018, 3:58:43 PM
to Lucila Traverso, trinityrnaseq-users
yeah, looks like an installation issue.

Running 'make' (or 'make clean && make') in the base installation directory should solve this.
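After rebuilding, one quick way to confirm that the binary from the error message now exists is to test for it directly. This sketch checks only the `Inchworm/bin/inchworm` path reported in the log above; `check_inchworm` is a hypothetical helper, and a passing check does not prove the rest of the installation is complete.

```shell
#!/usr/bin/env bash
# Sketch only: verify that the Inchworm executable named in the error
# message exists under a Trinity installation directory.
set -euo pipefail

# check_inchworm TRINITY_HOME
# Prints "ok" and returns 0 if the binary is present and executable;
# otherwise prints the missing path to stderr and returns 1.
check_inchworm() {
  local bin="$1/Inchworm/bin/inchworm"
  if [ -x "$bin" ]; then
    echo "ok"
  else
    echo "missing or not executable: $bin" >&2
    return 1
  fi
}
```

With the installation path from the log, that would be `check_inchworm /usr/local/bioinformatic/trinityrnaseq-2.1.1`.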

best,

~b


Lucila Traverso

Apr 13, 2018, 1:16:19 PM
to trinityrnaseq-users
Thanks, Brian, it worked! Apparently it also works with the mix of paired- and single-end input files.
But I have a question: after Phase 2 the program stopped, and the log file ends with:

All commands completed successfully. :-)

Friday, April 13, 2018: 11:59:30    CMD: find Dir_*  -name '*inity.fasta'  | /usr/local/bioinformatic/trinityrnaseq-2.1.1/util/support_scripts/GG_partitioned_trinity_aggregator.pl TRINITY_GG > Trinity-GG.fasta.tmp

Finished. See Trinity-GG.fasta for reconstructed transcripts



I was expecting Phase 3 to start. Is something wrong with my files? Is something missing in my assembly results?

Thank you so much.
Lucila.


Brian Haas

Apr 14, 2018, 7:56:35 AM
to Lucila Traverso, trinityrnaseq-users
Hi Lucila,

It's all good. There are just two phases in Trinity v2: (1) partitioning reads into clusters and (2) assembling the reads. But there are three primary programs involved (Inchworm, Chrysalis, and Butterfly), which is where Trinity gets its name.
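To sanity-check a finished genome-guided run, one could count the per-partition assemblies that the aggregation step collects (files matching `*inity.fasta` under the `Dir_*` directories, the same pattern as the `find` command in the log) and the transcripts in the final `Trinity-GG.fasta`. This is a hypothetical helper that assumes it is run from the Trinity output directory; the counts are only a rough completeness check, not a quality assessment.

```shell
#!/usr/bin/env bash
# Sketch only: summarize a finished genome-guided Trinity run.
# Assumes the current directory is the Trinity output directory.
set -euo pipefail

# count_fasta_records FILE: number of header lines (lines starting with ">").
# grep -c still prints 0 when nothing matches; "|| true" keeps its
# non-zero exit status from stopping the script.
count_fasta_records() {
  grep -c '^>' "$1" || true
}

# gg_summary [FINAL_FASTA]
gg_summary() {
  local final="${1:-Trinity-GG.fasta}"
  local n_parts
  # Per-partition outputs under top-level Dir_* directories.
  n_parts="$(find . -path './Dir_*' -name '*inity.fasta' | wc -l | tr -d ' ')"
  echo "partition assemblies: $n_parts"
  echo "transcripts in $final: $(count_fasta_records "$final")"
}
```

A final FASTA with zero records, or far fewer partition assemblies than expected, would be a reason to look back through the Phase 2 log.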

best,

~b

