Barcode trimming


Kruttika Phalnikar

Mar 2, 2017, 10:07:57 AM
to Qiime 1 Forum
Dear QIIME community,

Thank you for reading this post.

I am analysing 16S MiSeq data. I have received demultiplexed data files, with forward and reverse reads for each sample. I was trying to remove barcodes/adaptors/primers from my sequences. However, I think my sequences are already adaptor/barcode free, for the following reasons:

I searched for the barcode/primer/adaptor sequences in a few of my fastq files by opening each file as plain text and doing a Ctrl+F search, using the index sequences from the header lines as search terms. Apart from the header lines, I don't see these sequences anywhere in the reads. I also ran nucleotide BLAST on single reads from the fastq files, and many of them get 100% query coverage. I assume that if my fastq files contained barcodes or adaptors, a read would not get 100% coverage across its entire length. In addition, if barcodes were present, the same barcode sequence should appear at the beginning or end of every read in a given sample's fastq file, and I don't see this. This at least suggests that there are no barcodes/adaptors in the sequencing reads. Is this logic correct?

I do have the barcode in the header line, though. Example:

@MG00HS19:748:HH3V7BCXX:2:1101:2956:2184 1:N:0:CGGAGCCTAAGGCTAT
CCTACGGGGGGCTGCAGTCGAGAATTTTGGGCAATGGGGGCAACCCTGACCCAGCAATGCCGCGTGCGGGATGAAGGTCTTCGGATTGTAAACCGCTGTCAAGAGGGACGAATACAATTGACGGTACCTCTGGAGGAAGTCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGACAAGCGTTGTTCGGATTCATTGGGCGTAAAGGGTCCGCAGGTGGTTTGATAAGTCTGACGTGAAATAC
+
DDDDDIIIIIIIIIIIIIIIIHHIHIIIIIIHIIIIIIIIIHIIIIIIIIIIIIIIIIGIIIIHIHHIIGHHIIIIIIIGIIIIIIIHIIIIIIHIIIIIIEHHHIIIIIHIIIHIIIIHIIIIIIIIIIIIIHIIIGIIIIIIIIIHIIIHIHHHIIIIIHHIIHIIIHIHGHIH@HHIIHHIIIIHIHHIIIGHHHGHHIIHHHFEHCCHHHGHIHIHDCHCHIH9EGHF@@GHHFF?6GCEEHHHIHE
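
The manual Ctrl+F check described above can also be scripted so that it covers every read in a file rather than a few. The following is a minimal Python sketch under stated assumptions, not a tested QIIME step: the function names are hypothetical, the index is taken to be whatever follows the last colon of the header line, and the primer `CCTACGGGNGGCWGCAG` (the widely used 341F V3-V4 forward primer) is only an assumed placeholder that should be replaced with the primer actually used in the run. The example record is the one shown above, truncated for brevity. If the header index is really two indices written back to back, each half would need to be searched separately.

```python
import io

# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# ASSUMPTION: 341F forward primer; swap in the primer used for your library.
PRIMER_341F = "CCTACGGGNGGCWGCAG"

# Example record from this post; read and quality lines are truncated.
EXAMPLE_RECORD = (
    "@MG00HS19:748:HH3V7BCXX:2:1101:2956:2184 1:N:0:CGGAGCCTAAGGCTAT\n"
    "CCTACGGGGGGCTGCAGTCGAGAATTTTGGGCAATGGGGG\n"
    "+\n"
    "DDDDDIIIIIIIIIIIIIIIIHHIHIIIIIIHIIIIIIII\n"
)


def read_fastq(handle):
    """Yield (header, sequence) pairs from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield header, sequence


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T/N sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def starts_with_primer(read, primer):
    """True if the read begins with the primer, honouring IUPAC codes."""
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[code] for base, code in zip(read, primer))


def summarize(handle, primer):
    """Count reads containing their header index or starting with the primer."""
    counts = {"reads": 0, "index_in_read": 0, "primer_at_start": 0}
    for header, seq in read_fastq(handle):
        index = header.rsplit(":", 1)[-1]
        counts["reads"] += 1
        if index and (index in seq or reverse_complement(index) in seq):
            counts["index_in_read"] += 1
        if starts_with_primer(seq, primer):
            counts["primer_at_start"] += 1
    return counts


if __name__ == "__main__":
    # For a real file, replace the StringIO with open("sample_R1.fastq").
    print(summarize(io.StringIO(EXAMPLE_RECORD), PRIMER_341F))
```

A high `primer_at_start` count with a zero `index_in_read` count would suggest that barcodes are absent from the reads while primers are still present, which is a separate question from barcode trimming.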

multiple_join_paired_ends.py, multiple_split_libraries_fastq.py and the rest of the commands work just fine, so the QIIME workflow is not giving any errors.

Do you think it should be fine to go ahead without any trimming?

On a slightly different note: does having barcodes/adaptors in the reads affect the results?
As far as I know, picking OTUs and assigning taxonomy depend only on similarity to the V3-V4 region of 16S (the region I sequenced). So would these adaptor/barcode sequences interfere with OTU picking or taxonomy assignment?


Thank you so much
Kruttika

--

Kelsey Jesser

Mar 2, 2017, 11:29:11 AM
to Qiime 1 Forum
Hi Kruttika,

That's how I receive my amplicon data as well. I'm new to bioinformatics and am not an expert by any means, but I like to run FastQC on my sequencing files as a first step. Among other things, this program will quickly check your sequences for common barcode/adapter sequences. You can download it here: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ and it can be run either from the command line or through its GUI.

In terms of trimming/filtering, I like BBMap's BBDuk program, which I run on my paired files before moving into QIIME. Here's a link: http://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/. When my adapter content is very low I lose only a fraction of a percent of my total reads, and when it is high it cleans them right up.

Also, my understanding is that simply joining paired ends in QIIME is an excellent quality filtering step.

Hope that's helpful!

Kelsey

jonsan

Mar 2, 2017, 1:12:56 PM
to Qiime 1 Forum
Hi Kruttika, 

This is correct! In this case, the barcodes are sequenced in a separate index read on the Illumina machine, so they appear only in the header and not in the sequence read itself. You should be good to go!
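
For reference, the header line in the question follows the usual Illumina layout (instrument, run, flowcell, lane, tile, x, y, then read number, filter flag, control number and index sequence). A minimal Python sketch that splits such a line into named fields is given below; the class and function names are hypothetical, and it assumes the header has exactly this eleven-field form.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    read_number: int      # 1 for forward reads, 2 for reverse reads
    is_filtered: bool     # True when the flag is "Y" (read failed the filter)
    control_number: int
    index_sequence: str   # barcode recorded from the index read


def parse_header(line):
    """Split an Illumina FASTQ header line into its named fields."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    read_id, description = line[1:].strip().split(" ", 1)
    instrument, run, flowcell, lane, tile, x_pos, y_pos = read_id.split(":")
    read_number, filtered, control, index = description.split(":")
    return IlluminaHeader(
        instrument, int(run), flowcell, int(lane), int(tile),
        int(x_pos), int(y_pos), int(read_number), filtered == "Y",
        int(control), index,
    )


if __name__ == "__main__":
    header = parse_header(
        "@MG00HS19:748:HH3V7BCXX:2:1101:2956:2184 1:N:0:CGGAGCCTAAGGCTAT"
    )
    print(header.read_number, header.index_sequence)
```

The barcode sits in the last field of the header, which is why it shows up in a text search of the file without being part of any read.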

-jon