De novo, GG and merging assemblies with CAP3


jeremy le luyer

May 21, 2015, 2:04:26 PM
to trinityrn...@googlegroups.com
Hi Trinity users,

I completed the assemblies of 6 samples (141M reads, 100 bp PE) using Trinity in both de novo and genome-guided (GG) modes for a polyploid fish species for which a good-quality genome was recently made available. I also assembled the exact same reads with Trans-ABySS.

According to this paper: http://www.biomedcentral.com/content/pdf/s12864-014-1192-7.pdf , merging assemblies from different tools and reassembling them with CAP3 seems to work pretty well.

Would it be a good idea to try reassembling my 3 assemblies? Would I lose some important features from my GG assembly (I mean information for paralog segregation, for instance)?

Also, I noticed that significantly fewer bases (226836315 vs. 318550269) were assembled with Trinity in GG mode than in de novo mode. Here are some of the results:


Trinity_denovo.fasta

Total trinity 'genes': 174317
Total trinity transcripts: 305694
Percent GC: 47.47
Contig N50: 2140
Median contig length: 466.5
Average contig: 1042.06
Total assembled bases: 318550269
min length: 200 bp



Trinity-GG.fasta

Total trinity 'genes': 258910
Total trinity transcripts: 287074
Percent GC: 46.20
Contig N50: 1361
Median contig length: 408
Average contig: 790.17
Total assembled bases: 226836315
min length: 224 bp
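
For anyone who wants to recompute the summary figures above on a merged or re-clustered assembly, here is a minimal sketch of how total bases, average contig length and N50 relate to a list of contig lengths. It is not the Trinity stats script; the function name and the toy lengths are illustrative assumptions.

```python
def assembly_stats(lengths):
    """Return (total_bases, mean_length, n50) for a list of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    n50 = 0
    # Walk contigs from longest to shortest; N50 is the length of the contig
    # at which the running sum first reaches half of the total bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return total, total / len(lengths), n50


# Toy contig lengths, not taken from the assemblies above.
print(assembly_stats([2, 2, 2, 3, 3, 4, 8, 8]))  # (32, 4.0, 8)
```

Computing the same three numbers for each candidate assembly (de novo, GG, merged) keeps the comparison on equal footing, although N50 alone says little about transcriptome quality.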


Regards,

Jérémy

Tiago Hori

May 21, 2015, 2:52:24 PM
to jeremy le luyer, trinityrn...@googlegroups.com
Re-assembling is usually not a good idea because assemblers such as Trinity and Trans-ABySS are not meant to deal with long sequences.

I prefer CD-HIT, but CAP3 should do the trick. When you have multiple assemblies, my opinion is that they should be treated as ESTs.

If you have multiple k-mer assemblies in Trans-ABySS, I would merge those within Trans-ABySS. The same goes for Velvet/Oases.

T.

Sent from my iPhone
--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.

jeremy le luyer

May 21, 2015, 3:46:08 PM
to trinityrn...@googlegroups.com

Hi,

Yes, the point of using Trans-ABySS was to try different k-mer lengths. I have already merged the different k-mer assemblies within Trans-ABySS to obtain a single merged Trans-ABySS assembly.

According to He et al. (2015), the 'best' results are apparently obtained by merging assemblies from different assemblers (Soaptrans, Oases and Edena) and re-assembling them using CAP3.

When using cd-hit (I guess cd-hit-est), do you usually keep the default % identity (0.9)?

Thank you!

Jérémy

Brian Haas

May 21, 2015, 3:51:27 PM
to jeremy le luyer, trinityrn...@googlegroups.com
Different groups have used different metrics for defining 'best'. I'm very curious to know what 'best' is under the latest/greatest methods:


The CAP3 method could certainly help for some subset of transcripts, but I do wonder if it'll do more harm than good in the entire scheme of things (merging transcripts that shouldn't be merged). You'll want to be stringent in your parameter settings.
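
As a rough illustration of what "stringent" could look like, the sketch below builds a CAP3 command line that raises the two overlap cutoffs above their defaults. The values 98 and 100 and the input file name are assumptions chosen for illustration, not settings recommended anywhere in this thread.

```python
import os
import shutil
import subprocess


def cap3_command(fasta, min_identity=98, min_overlap=100):
    """Build a CAP3 command line with stricter-than-default overlap cutoffs."""
    return [
        "cap3", fasta,
        "-p", str(min_identity),  # overlap percent identity cutoff (CAP3 default: 90)
        "-o", str(min_overlap),   # overlap length cutoff in bases (CAP3 default: 40)
    ]


# Hypothetical input: all assemblies concatenated into one FASTA file.
fasta = "merged_assemblies.fasta"
cmd = cap3_command(fasta)
print(" ".join(cmd))

# Only run when CAP3 is installed and the input file actually exists.
if shutil.which("cap3") and os.path.exists(fasta):
    with open(fasta + ".cap3.log", "w") as log:
        subprocess.run(cmd, stdout=log, check=True)
```

CAP3 writes the merged contigs and the unmerged singlets to separate files next to the input, so both need to be combined to get the full transcript set back.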

~b




--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Mark Chapman

May 21, 2015, 4:32:33 PM
to Brian Haas, jeremy le luyer, trinityrn...@googlegroups.com
Hi all,
I wonder if the multiple isoforms per 'gene' from Trinity would all get assembled together even in the absence of the Trans-ABySS data. This might be a good place to start, to see whether CAP3 (or CD-HIT-EST) is over-zealous in collapsing contigs, and it could be used to see the effect of changing the parameters.
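
One way to run this check, sketched under assumptions: parse the `.clstr` file that CD-HIT-EST writes and flag clusters whose members come from more than one Trinity 'gene'. The sketch assumes transcript IDs end in `_i<N>` and that the IDs are not truncated in the `.clstr` file (CD-HIT shortens descriptions by default; its `-d 0` option is meant to keep the ID up to the first space). The function names and the sample lines are hypothetical.

```python
import re
from collections import defaultdict

ISOFORM_SUFFIX = re.compile(r"_i\d+$")   # trailing isoform tag, e.g. "_i2"
MEMBER_ID = re.compile(r">(\S+?)\.\.\.")  # sequence ID inside a .clstr member line


def trinity_gene(transcript_id):
    """Strip the isoform suffix to get the Trinity 'gene' identifier."""
    return ISOFORM_SUFFIX.sub("", transcript_id)


def clusters_from_clstr(lines):
    """Group member IDs by cluster name from the lines of a .clstr file."""
    clusters = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            current = line[1:]
        elif line and current is not None:
            match = MEMBER_ID.search(line)
            if match:
                clusters[current].append(match.group(1))
    return dict(clusters)


def multi_gene_clusters(clusters):
    """Return only the clusters that mix transcripts from several genes."""
    mixed = {}
    for name, members in clusters.items():
        genes = sorted({trinity_gene(m) for m in members})
        if len(genes) > 1:
            mixed[name] = genes
    return mixed


# Made-up .clstr content for illustration only.
example = [
    ">Cluster 0",
    "0\t2799nt, >c10_g1_i1... *",
    "1\t2214nt, >c10_g1_i2... at +/97.50%",
    "2\t1800nt, >c42_g3_i1... at +/95.10%",
    ">Cluster 1",
    "0\t900nt, >c7_g1_i1... *",
]
print(multi_gene_clusters(clusters_from_clstr(example)))
# {'Cluster 0': ['c10_g1', 'c42_g3']}
```

Running this on the Trinity-only assembly at several identity thresholds would show how quickly distinct 'genes' start to be collapsed together.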
Thanks, Mark
Dr. Mark A. Chapman
+44 (0)2380 594396
------------------------------------
Centre for Biological Sciences
University of Southampton
Life Sciences Building 85
Highfield Campus
Southampton
SO17 1BJ

Tiago Hori

May 21, 2015, 5:00:51 PM
to Brian Haas, jeremy le luyer, trinityrn...@googlegroups.com
Yep. Pretty much what Brian said. I found that, by DETONATE metrics, the merged assembly is not always the best, for the reasons he pointed out.

I am going to go out on a limb here and assume you work with salmonids, because nobody but me is crazy enough to work with sturgeon. The average similarity between recent paralogs in salmonids is around 90%, so I would go with at least 95% identity.
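
A minimal sketch of a cd-hit-est call at that threshold, written as a small command builder. The file names and thread count are hypothetical, and the word-size choice follows the ranges suggested in the CD-HIT user guide (10 to 11 for 0.95 and above, 8 to 9 for 0.90 to 0.95); treat it as a starting point rather than a tested recipe.

```python
def cd_hit_est_command(fasta, out_prefix, identity=0.95, threads=4):
    """Build a cd-hit-est command line for a given identity threshold."""
    if not 0.90 <= identity <= 1.0:
        raise ValueError("this sketch only covers thresholds from 0.90 to 1.0")
    # Word size should match the threshold: larger words for higher identity.
    word_size = 10 if identity >= 0.95 else 8
    return [
        "cd-hit-est",
        "-i", fasta,            # input FASTA
        "-o", out_prefix,       # output FASTA; clusters go to <out_prefix>.clstr
        "-c", str(identity),    # sequence identity threshold
        "-n", str(word_size),   # word length
        "-T", str(threads),     # number of threads
        "-M", "0",              # no memory limit
    ]


# Hypothetical file names.
print(" ".join(cd_hit_est_command("merged_assemblies.fasta", "merged_cdhit95.fasta")))
```

Raising `identity` toward 0.98 or higher is the conservative direction when recent paralogs are only about 90% similar on average, since some pairs will sit well above that average.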

T.


Tiago Hori

May 21, 2015, 5:03:47 PM
to Brian Haas, jeremy le luyer, trinityrn...@googlegroups.com
Also, to Brian's point, I worry that they did not measure re-mapping as a quality metric. To me, proper mapping is the best measure of whether paralogs have been properly separated.

And because re-mapping is the basis of DE, I find it an essential quality metric.
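
To make the re-mapping comparison concrete, here is a small sketch that pulls the key percentages out of a Bowtie 2 alignment summary after mapping the reads back to each candidate assembly. Using Bowtie 2 is an assumption on my part (the thread does not name an aligner), and the log text below is made up purely to exercise the parser.

```python
import re

PATTERNS = {
    "concordant_unique": r"\(([\d.]+)%\) aligned concordantly exactly 1 time",
    "concordant_multi": r"\(([\d.]+)%\) aligned concordantly >1 times",
    "overall": r"([\d.]+)% overall alignment rate",
}


def mapping_summary(bowtie2_log):
    """Extract concordant-unique, concordant-multi and overall rates (in %)."""
    summary = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, bowtie2_log)
        if match is None:
            raise ValueError(f"could not find '{key}' in the alignment summary")
        summary[key] = float(match.group(1))
    return summary


# Made-up summary for illustration; real numbers come from the aligner's stderr.
toy_log = """1000 reads; of these:
  1000 (100.00%) were paired; of these:
    100 (10.00%) aligned concordantly 0 times
    600 (60.00%) aligned concordantly exactly 1 time
    300 (30.00%) aligned concordantly >1 times
90.00% overall alignment rate
"""
print(mapping_summary(toy_log))
# {'concordant_unique': 60.0, 'concordant_multi': 30.0, 'overall': 90.0}
```

Comparing these rates across the de novo, GG and merged assemblies gives a like-for-like view; a drop in concordant mapping after merging would be one hint that contigs were joined that should not have been.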

T.


Farbod Emami

Dec 13, 2015, 2:58:26 PM
to trinityrnaseq-users, bh...@broadinstitute.org, jeremy....@gmail.com
Dear Tiago,
I am working on sturgeon, and I have many isoforms per gene and transcript. I do not know how I could reduce these huge, unnecessary numbers. It is really painful in DEG analysis: for example, I have a transcript with 12 isoforms, of which one is highly differentially expressed and the other 11 are co-expressed!