Trimmomatic - adapter contamination


krabb...@gmail.com

Jul 5, 2017, 5:33:57 PM
to trinityrnaseq-users
Trinity Users, 

I am trying to deposit Trinity assemblies in NCBI's TSA database, but there are significant hits against the UniVec database and NCBI will not accept the assemblies. I ran Trimmomatic as part of the Trinity pipeline, but apparently some Illumina adapter sequences made it through the trimming step. NCBI's comment is to remove the adapters from the raw reads and start over with the assembly. I am wondering whether this is a common issue for Trinity pipeline users, or whether I have missed an important step in the pipeline. At minimum, I hope this message saves someone else from similar problems.

Thank you, 
Trevor


Brian Haas

Jul 5, 2017, 8:35:55 PM
to krabb...@gmail.com, trinityrnaseq-users
Hi Trevor,

If your experience is anything like mine was, getting a transcriptome assembly submitted is a long, painful back-and-forth with NCBI. If NCBI made their full pipeline available for users to run independently and resolve issues before submitting, we could automate the clean-up process, but that wasn't the case when I did this months ago.

My guess is that if you redo the assembly after more aggressive trimming, NCBI's screen will still find some adaptors, or sequences that it thinks are adaptors, and reject the submission.

I'd suggest asking for the full report of required corrections, including regions matching adaptors and obvious contaminants, and then removing those sequences directly (trim the ends of offending contigs, mask internal regions with N characters if they'll allow that, or remove offending sequences altogether) before resubmitting. If they come back with a bunch of new offending sequences and regions that weren't in the original report, you'll have grounds to complain.
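
A minimal sketch of that clean-up in Python, assuming the flagged regions from NCBI's report have already been collected as 1-based, inclusive (start, end) pairs per contig. The function name `clean_contig` and the `min_len` cutoff are hypothetical; they are not part of Trinity or of NCBI's process, and the default of 200 is only a placeholder to check against NCBI's current requirements.

```python
def clean_contig(seq, spans, min_len=200):
    """Trim flagged spans at contig ends and mask internal ones with N.

    seq     : contig sequence as a string
    spans   : list of 1-based, inclusive (start, end) flagged regions
    min_len : hypothetical minimum length; shorter results are dropped
    Returns the cleaned sequence, or None if the contig should be removed.
    """
    keep_from, keep_to = 0, len(seq)  # 0-based, half-open window to keep
    ordered = sorted((start - 1, end) for start, end in spans)

    # Spans that touch the 5' end are trimmed off.
    for s, e in ordered:
        if s <= keep_from:
            keep_from = max(keep_from, e)

    # Spans that touch the 3' end are trimmed off.
    for s, e in sorted(ordered, key=lambda span: span[1], reverse=True):
        if e >= keep_to:
            keep_to = min(keep_to, s)

    # Remove the contig altogether if too little sequence is left.
    if keep_to - keep_from < min_len:
        return None

    # Whatever is still inside the kept window is internal: mask with N.
    chars = list(seq)
    for s, e in ordered:
        for i in range(max(s, keep_from), min(e, keep_to)):
            chars[i] = "N"
    return "".join(chars[keep_from:keep_to])
```

Whether masking with N is acceptable is still something to confirm with NCBI, as noted above; if it is not, internal hits would have to be handled by splitting or removing the contig instead.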

I'm still suffering PTSD from my experience here.

best wishes,

~b

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-users+unsub...@googlegroups.com.
To post to this group, send email to trinityrnaseq-users@googlegroups.com.
Visit this group at https://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

Pavel-EGT

Jul 14, 2017, 6:48:02 PM
to trinityrnaseq-users
Hi,

I recently had the same issue and had to remove the contigs. The problem was that I had already done all the DGE analysis and more, so I had to redo almost everything.

I agree with Dr. Haas; it would be useful to integrate all of this into Trinotate.

regards
P~ 

Pavel-EGT

Jul 14, 2017, 6:51:41 PM
to trinityrnaseq-users
Sorry, I meant into the Trimmomatic step of the Trinity pipeline.


krabb...@gmail.com

Aug 2, 2017, 11:38:47 AM
to trinityrnaseq-users, krabb...@gmail.com
Hi Brian, 

Thank you for your reply.  To follow up, per NCBI's suggestion, I have run the following on raw reads to identify sequences that match the UniVec database:

blastn -task blastn -reward 1 -penalty -3 -evalue 700 -searchsp 1750000000000 -dust yes -gapopen 3 -gapextend 3 -query myreads -db UniVec  

Indeed, many raw reads (and contigs) have adapter contamination that was missed by Trimmomatic.  I have run CutAdapt/TrimGalore (with default parameter settings) and it seems to resolve this problem.  I do not know if adapter contamination is a ubiquitous problem, but Trinity users might consider trying TrimGalore instead of Trimmomatic.  The next step will be to see how adapter trimming affects assembly results.  
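
The hits from a search like the one above could be summarised per read or contig roughly as follows. This is only a sketch: it assumes the blastn command was run with `-outfmt 6` added (BLAST+ tabular output, which is not in the command as posted), and the function name `flagged_spans` is hypothetical. It reports every hit and does not try to reproduce whatever thresholds NCBI applies.

```python
from collections import defaultdict


def flagged_spans(tabular_lines):
    """Merge UniVec hit coordinates per query from BLAST tabular output.

    Expects the default -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    Returns {qseqid: [(start, end), ...]} with 1-based inclusive spans.
    """
    raw = defaultdict(list)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qstart, qend = int(fields[6]), int(fields[7])
        raw[fields[0]].append((min(qstart, qend), max(qstart, qend)))

    merged = {}
    for qseqid, spans in raw.items():
        spans.sort()
        out = [spans[0]]
        for start, end in spans[1:]:
            last_start, last_end = out[-1]
            if start <= last_end + 1:
                # Overlapping or adjacent hit: extend the previous span.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[qseqid] = out
    return merged
```

Called on the lines of the tabular output file, the keys of the returned dictionary are the reads (or contigs) with at least one UniVec hit.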

Hopefully someone else finds this useful. 

Best, 
Trevor



Brian Haas

Aug 2, 2017, 11:51:37 AM
to krabb...@gmail.com, trinityrnaseq-users
Thanks for the info!

We should take this up with the Trimmomatic folks. I wonder if some parameter adjustments might be needed.

Would you be able to send me some example reads that were missed by Trimmomatic but captured by TrimGalore?

We could swap out Trimmomatic for TrimGalore, but it would be easier to just reparameterize it or drop in a new version.

best,
~b


Mark Chapman

Aug 2, 2017, 12:13:59 PM
to Brian Haas, krabb...@gmail.com, trinityrnaseq-users
This may have nothing to do with it, but I see that Trimmomatic, as wrapped in Trinity, pulls the "TruSeq3-PE.fa" file of adapters, but of course there are other adapter files for other HiSeq/TruSeq data. Could it just be that the wrong adapters were searched for?
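
A rough way to check which adapter family is actually present before choosing an adapter file, sketched in Python. The function name `adapter_hit_fraction` is hypothetical, matching is exact-substring only, and the two sequences in the usage example are the short signatures commonly associated with Illumina TruSeq-style and Nextera adapters; treat them as assumptions and confirm them against your library kit documentation.

```python
def adapter_hit_fraction(reads, adapters):
    """Fraction of reads containing each candidate adapter (exact match).

    reads    : iterable of read sequences (strings)
    adapters : dict mapping a label to an adapter sequence
    """
    reads = list(reads)
    if not reads:
        return {name: 0.0 for name in adapters}
    counts = {name: 0 for name in adapters}
    for read in reads:
        upper = read.upper()
        for name, adapter in adapters.items():
            if adapter.upper() in upper:
                counts[name] += 1
    return {name: counts[name] / len(reads) for name in adapters}


# Assumed candidate signatures; verify against your kit documentation.
CANDIDATES = {
    "truseq_like": "AGATCGGAAGAGC",
    "nextera": "CTGTCTCTTATA",
}

example_reads = [
    "ACGTAGATCGGAAGAGCTT",
    "ACGTACGT",
    "ttctgtctcttatacc",
    "AGATCGGAAGAGCAAA",
]
fractions = adapter_hit_fraction(example_reads, CANDIDATES)
```

Running this on a sample of a few thousand raw reads is usually enough to see whether a signature other than the TruSeq one dominates; it says nothing about partial adapters at read ends, which a real trimmer handles.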
Cheers, Mark




--
Dr. Mark A. Chapman
+44 (0)2380 594396
------------------------------------
Centre for Biological Sciences
University of Southampton
Life Sciences Building 85
Highfield Campus
Southampton
SO17 1BJ

Bob Zimmermann

Jan 30, 2018, 2:49:12 PM
to trinityrnaseq-users
Thanks for the thread! I just got in touch with NCBI about this. They first described how the pipeline works (below) and also clarified that they are working on a way to make it public. Until then, it seems the easiest approach is to submit the transcriptome early and remove any transcripts that turn out to be unsubmittable.

See below for the pipeline description. Best, Bob

Our suite of foreign contamination screens uses BLAST to screen the submitted sequences against:

1. a common contaminants database that contains vector sequences, bacterial insertion sequences, E. coli and phage genomes
2. a database of adaptors, linkers, and primers
3. a database of mitochondrial genomes
4. the chromosomes of unrelated organisms
5. a database of ribosomal RNA genes

Suspect spans are re-BLASTed against:

1. the chromosomes of unrelated organisms
2. the chromosomes of related organisms
3. the NCBI nt BLAST database of nucleotide sequence from all traditional divisions of GenBank, EMBL, and DDBJ
4. the NCBI htgs BLAST database of sequences from the HTG division of GenBank, EMBL, and DDBJ

Tristan Lefebure

Jan 30, 2018, 5:16:09 PM
to Mark Chapman, Brian Haas, krabb...@gmail.com, trinityrnaseq-users
I agree with Mark: Trimmomatic, as run by Trinity, searches for TruSeq adapters. Depending on your library preparation, this might not be enough.
Best
--
Tristan


Brian Haas

Jan 30, 2018, 7:59:14 PM
to trinityrnaseq-users
NCBI's public release of their process is way overdue.  We all eagerly await it. :-)

Björn Usadel

Feb 1, 2018, 6:34:47 AM
to trinityrnaseq-users
Did you figure out whether it was the adapter definition? If not, please let us know so we can adapt Trimmomatic.
(TruSeq3 works in most cases, but not all, so it is a good default.)

Thijmen18

Jan 10, 2019, 2:49:29 PM
to trinityrnaseq-users
Hi all,

I posted a question about a similar issue here: https://groups.google.com/forum/#!topic/trinityrnaseq-users/zPWYwb8SGQI
Any thoughts about how to deal with this issue are most welcome.


David Mathog

Jan 14, 2019, 12:30:10 AM
to Thijmen18, trinityrnaseq-users
https://github.com/trinityrnaseq/trinity_community_codebase/wiki/Removing-adapters-from-Trinity-Transcriptome-assemblies
https://github.com/trinityrnaseq/trinity_community_codebase/blob/master/trim_adapters.pl

Regards,

David Mathog