When is "Quality trimming using Trimmomatic" really needed?


Farbod Emami

Oct 1, 2015, 3:45:07 AM
to trinityrnaseq-users
Dear Brian, Hi.
First of all, let me thank you for this new version of Trinity; it seems easier to install, and the documentation seems better organized than the old one!

I am using the Trinity package for de novo assembly of the transcriptome of a non-model fish. Since I have many reads in my Illumina paired-end FASTQ files, I usually use the "--normalize_reads" parameter.
Before running Trinity, I checked the quality of my FASTQ files with FastQC, and it reported no adapter contamination or low-quality regions.
Do I need to run "--quality_trimming_params"?
Does it have any effect on improving the assembly (e.g. creating more robust contigs or better isoforms) or not?
Thank you again

Mark Chapman

Oct 1, 2015, 4:07:07 AM
to Farbod Emami, trinityrnaseq-users
Hi Farbod,

My experience is that quality trimming can't hurt. You're not going to lose anything, and you might gain a few more robust contigs at the expense of split or poorly supported ones. My advice would be to run a parallel assembly with trimming and see whether the results look particularly different. If you're starting from scratch, I'd say definitely do it; if you're wondering about an older assembly, then check whether the results are particularly different before deciding you need to redo every analysis.

BW, Mark

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
Dr. Mark A. Chapman
+44 (0)2380 594396
------------------------------------
Centre for Biological Sciences
University of Southampton
Life Sciences Building 85
Highfield Campus
Southampton
SO17 1BJ

Farbod Emami

Oct 1, 2015, 4:14:01 AM
to trinityrnaseq-users, farbo...@gmail.com
Dear Dr. Mark,
Hi.
Yes, I am using the new version of Trinity to redo the assembly from the beginning :)

So,
1- What parameters for "--trimmomatic" do you suggest?

2- For normalization I usually use just "--normalize_reads"; is that enough, or must I add any extra parameter with it?

Thank you and best wishes
Farbod


Matthew MacManes

Oct 1, 2015, 6:07:03 AM
to Farbod Emami, trinityrnaseq-users
Mark and others, 

You’ve just touched upon my pet-peeve issue: trimming. Quality trimming CAN and DOES (often) make assemblies worse. See http://journal.frontiersin.org/article/10.3389/fgene.2014.00013/abstract for a description of the effects.

Trinity includes the trimming recommendations from the above paper. I would be surprised if raw Illumina data did NOT contain any adapters, even if FastQC says otherwise. I’d go ahead and trim with the Trinity defaults by passing --trimmomatic without specifying anything else, which will take care of adapters and of bases with a Phred score <5. You can combine this with --normalize_reads, also passed to Trinity without other modification.
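
To make the suggestion above concrete, here is a minimal Python sketch that only assembles a Trinity command line; it does not run anything. The read file names, output directory, memory and CPU values are placeholders assumed for illustration, not values from this thread, and the exact options should be checked against `Trinity --help` for the installed version. The helper at the top converts the Phred threshold mentioned above into an error probability using the standard definition P = 10^(-Q/10).

```python
import shlex


def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def trinity_command(left, right, output_dir, trim=True, normalize=True,
                    max_memory="50G", cpu=8):
    """Build (but do not execute) a Trinity command as a list of arguments.

    All argument values are hypothetical placeholders chosen by the caller.
    """
    cmd = [
        "Trinity",
        "--seqType", "fq",
        "--left", left,
        "--right", right,
        "--max_memory", max_memory,
        "--CPU", str(cpu),
        "--output", output_dir,
    ]
    if trim:
        # No --quality_trimming_params given, so Trinity's defaults apply.
        cmd.append("--trimmomatic")
    if normalize:
        cmd.append("--normalize_reads")
    return cmd


# Hypothetical input files for a paired-end run.
trimmed = trinity_command("reads_1.fq", "reads_2.fq", "trinity_out_trimmed")
untrimmed = trinity_command("reads_1.fq", "reads_2.fq", "trinity_out_raw",
                            trim=False)

print(shlex.join(trimmed))
print(shlex.join(untrimmed))
print(f"Error probability at Phred 5: {phred_to_error_prob(5):.3f}")
```

Building the second command with `trim=False` gives the untrimmed control for a side-by-side comparison on the same reads.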

Matt
______________________________________________
Matthew MacManes, Ph.D. 
University of New Hampshire  I  Assistant Professor of Genome Enabled Biology
Department of Molecular, Cellular, & Biomedical Sciences
Durham, NH  03824
Phone: 603-862-4052  I  Twitter: @macmanes | Web: genomebio.org
Office: 189 Rudman Hall | Laboratory: 145 Rudman Hall


Tiago Hori

Oct 1, 2015, 6:29:36 AM
to Matthew MacManes, Farbod Emami, trinityrnaseq-users
I'd say it also depends on how complex your transcriptome is. Our experience with some tetraploid fish genomes is that if your error rate increases, you get more mis-assembled chimeric contigs. We did trim more aggressively in that instance and it did help.

One thing to remember is that Trinity does not greedily extend the reads but rather the identified k-mers. In the de Bruijn graph the k-mer, not the read, is really the assembly unit, and that greatly reduces the impact of lower-quality bases on the assembly.
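
A toy Python sketch of that point: a single miscalled base can only affect the k-mers that overlap it (at most k of them), and those k-mers show up at low abundance next to well-supported correct ones. This illustrates the general principle only and is not Trinity's implementation; the read sequence, the coverage of 10 reads, and k = 5 are made up for readability (Trinity's default is k = 25).

```python
from collections import Counter

K = 5  # toy value; an assumption for this example only


def kmers(seq, k):
    """Return every overlapping k-mer of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


clean_read = "ACGTTGCAAGCTTAGGCATC"
# Same read with one substitution (C -> A at 0-based position 10).
error_read = clean_read[:10] + "A" + clean_read[11:]

# Nine error-free copies of the read plus one copy carrying the error.
reads = [clean_read] * 9 + [error_read]

counts = Counter()
for read in reads:
    counts.update(kmers(read, K))

clean_kmers = set(kmers(clean_read, K))
error_kmers = [km for km in kmers(error_read, K) if km not in clean_kmers]

print(f"k-mers introduced by the error: {len(error_kmers)} (at most k = {K})")
for km in error_kmers:
    print(km, "count =", counts[km])
print("lowest count among correct k-mers:",
      min(counts[km] for km in clean_kmers))
```

The rest of the erroneous read still contributes correct k-mers, which is why one bad base costs far less in a k-mer based assembler than discarding or misplacing the whole read would.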

T.

"Profanity is the only language all programmers understand"
Sent from my iPhone, the universal excuse for my poor spelling.

Farbod Emami

Oct 1, 2015, 6:34:58 AM
to trinityrnaseq-users, macm...@gmail.com, farbo...@gmail.com
Thank you all,
So it seems the answer is "YES": perform trimming.
And the preferred way is to just use "--trimmomatic" without any extra parameters in the script, right?

Farbod Emami

Oct 2, 2015, 3:11:23 AM
to trinityrnaseq-users, macm...@gmail.com, farbo...@gmail.com
Dear Mark and other friends, Hi
I have finished a de novo assembly with --normalize_reads and without --trimmomatic first, and now I am running another assembly with BOTH --normalize_reads and --trimmomatic on the same data (it will finish tomorrow).
So, which values (e.g. in the TrinityStats.pl output) should I compare to see which assembly is better? N50? Number of contigs? The statistics based on the longest isoform per gene?
Which one?
Thank you in advance
Farbod

Mark Chapman

Oct 2, 2015, 5:01:51 AM
to Farbod Emami, trinityrnaseq-users, macm...@gmail.com
Hi Farbod,
As discussed frequently on here, there's no single statistic from Trinity that says assembly A is better or worse than assembly B. You can use DETONATE to compare your assemblies if you want to do this comprehensively. Some information from the Trinity output might be useful but doesn't prove anything. For example, if the N50 of one assembly was 500 and the N50 of another based on the same data was 20, then I would presume there's a problem with the latter, but slight differences are hard to interpret. If your N50 goes up a bit, does this mean that you're assembling contigs better and longer? Or does it mean you have some erroneous contigs where paralogous loci have been joined?
So I'd say interpret your Trinity output with a pinch of salt...
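
A small Python sketch of why N50 alone is ambiguous. N50 is the length L such that contigs of length L or longer contain at least half of the assembled bases. The two contig-length lists below are invented for illustration: the second is the first with two contigs wrongly joined into one, which raises N50 even though the assembly got worse.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical assemblies of the same 1500 bases.
assembly_a = [500, 400, 300, 200, 100]
# Same contigs, but the 300 and 200 bp contigs were chimerically joined.
assembly_b = [500, 500, 400, 100]

print("N50 of assembly A:", n50(assembly_a))
print("N50 of assembly B:", n50(assembly_b))
```

The statistic rewards any join, correct or not, so it needs to be read alongside measures that look at content rather than length.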
BW, Mark


Tiago Hori

Oct 2, 2015, 5:58:11 AM
to Mark Chapman, Farbod Emami, trinityrnaseq-users, macm...@gmail.com
The new Trinity wiki has a really nice section on assembly quality assessment; I suggest reading it.

T.


Ken Field

Oct 2, 2015, 6:12:19 AM
to Tiago Hori, Mark Chapman, Farbod Emami, trinityrnaseq-users, Matthew MacManes
I would also add BUSCO to that list. It is a great way to compare the representation of certain highly conserved genes in the two assemblies.

Ken
Ken Field, Ph.D.
Associate Professor of Biology
Program in Cell Biology/Biochemistry
Bucknell University
Room 203A Biology Building

Tiago Hori

Oct 2, 2015, 7:05:18 AM
to Ken Field, Mark Chapman, Farbod Emami, trinityrnaseq-users, Matthew MacManes
Ken,

I was taking a quick peek at BUSCO and I could not get a clear picture of whether you do or do not need a reference transcriptome to use it. Can you give us a brief blurb on how it works?

T.
