Hi!
I'm analyzing 16S datasets from several published studies. The authors of a study carried out in Kenya amplified the V4 hypervariable region of the 16S rRNA gene to obtain their taxonomic profiles (
https://bmcmicrobiol.biomedcentral.com/articles/10.1186/s12866-016-0748-x). I downloaded the .fastq file they published (a single .fastq for all the samples), and I'm having some trouble analyzing the data.
In the supplementary material they provide the barcodes and primers used in the study. With this information I first converted the .fastq into .fasta + .qual files, since I was not able to obtain a separate .fastq for the barcodes. Then I demultiplexed with split_libraries.py as follows:
split_libraries.py -f fasta_qual/Kenia.fna -m Kenia_map.txt -q fasta_qual/Kenia.qual -l 200 -b 8 -o demultiplexed
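(For completeness, the earlier .fastq to .fasta + .qual conversion was done with QIIME 1's convert_fastaqual_fastq.py, roughly like this; `Kenia.fastq` is just what I named the downloaded file:)

```shell
# Split the combined .fastq into a .fasta plus a .qual file,
# which is the input format split_libraries.py expects
convert_fastaqual_fastq.py \
    -f Kenia.fastq \
    -c fastq_to_fastaqual \
    -o fasta_qual/
```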
To check the demultiplexing I ran validate_demultiplexed_fasta.py, and the results show that some barcodes (0.191 of sequences) and linker primers (0.002 of sequences) are still present. After that I ran
pick_open_reference_otus.py, and all of the taxonomy comes back as Unassigned. I got the same result with pick_de_novo_otus.py.
I have some questions about my steps:
The authors sequenced with MiSeq 2x300 bp technology, but the amplicons after demultiplexing are around 500 bp long. This seems strange to me, since I expected MiSeq sequences of roughly 200 bp. What does 2x300 bp refer to? Could it be the reason the OTU-picking step is failing?
The demultiplexing step is failing to completely remove the primers and barcodes. Could these leftover sequences be interfering with OTU picking and taxonomy assignment? Any suggestions for removing them completely?
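One thing I was considering is trimming the leftover 5' primer from the demultiplexed reads with cutadapt before OTU picking; a rough sketch (the primer sequence below is the standard 515F V4 primer, used here only as a placeholder for the actual primer listed in the paper's supplement):

```shell
# Remove the 5' (linker) primer from the demultiplexed fasta;
# reads where the primer is not found are dropped entirely
cutadapt -g GTGYCAGCMGCCGCGGTAA \
    --discard-untrimmed \
    -o demultiplexed/seqs_trimmed.fna \
    demultiplexed/seqs.fna
```

Would this be a reasonable approach, or is there a QIIME-native way to do it?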
Additionally, I have another dataset where the authors amplified two hypervariable regions (V5-V8). For the OTU-picking step in that case, should I use pick_closed_reference_otus.py? Am I right?