pick closed reference otus - extremely high number of failures


Angela Zou

Aug 14, 2017, 12:11:57 PM
to Qiime 1 Forum
Hello,

I ran pick closed reference OTUs with enable_rev_strand_matching set to True, using the default uclust parameters, against the newest Silva database. Afterwards, the log file reported 564836 failures, even though I had only 565513 sequences in my seqs.fna to begin with. I am considering decreasing the word length parameter and increasing the max rejects parameter. Is there anything else I can do? I also checked the sequences for the primers listed in the study the reads came from and found none. I should mention that most of the sequences are relatively short (50 to 258 bp) and were downloaded from the MG-RAST server, so I am not sure how they were processed beforehand. Here is a sample of the reads:

>mgp101Cjejuni_0 EDADH4P01EL7SF
TAATATAAAATTTCATCTTCCGCTTCATTTACTGTATCAAGAAATACAATTTCTAACTAACAGCCTTCCCTGTCGTTTCCCTCTTTTTCTTGCCGCTTTATTGCACCTTACGCTCAGG
>mgp101Cjejuni_1 EDADH4P01AMK1E
TATCCGACTTATGCGCTTTTCCGTCTGCGCCCATATACTGACCATCCACATCCGTG
>mgp101Cjejuni_2 EDADH4P01EHEF0
CAGGACAGCATGCCGCAGGGCCGGATACTGCGCGCCTTCAGCACGGGAAGGAGGGTGTACCGCACCGCATCTTCCGTTCCGCTT
>mgp101Cjejuni_3 EDADH4P01ESJFN
CTGTAGTGCCGAACGTGGTCTGTTTGATCACCCCAGTCAGTAAGTATATCCCAAA
>mgp101Cjejuni_4 EDADH4P01EFTTC
TCCCAAAACCACTACCTTATTGATCAACGCATCGCGATACCCCTGTACAATGTAAAACATGGGAT
>mgp101Cjejuni_5 EDADH4P01AMW4O
GTGTTCAACCAGATTTTCCTGATGATATATCTGTCCGTGTGGTTACGCCGTTCGACGTATTGACGGGAGGCCGAACT
>mgp101Cjejuni_6 EDADH4P01APLQV
ATCGCCTTTATCTCCTCAGCGCTTTTCCCCTGTTTGATGGCTTTCCGTACGTAGTCCGTCCCCACCAGCAA
>mgp101Cjejuni_7 EDADH4P01C6LGI
TAACCAAAATAAAAACTTTACAAGAAATTTAGTCAGGTAAATAATAAATTAGATCGATCTTACTACGGAAAAGTAAGAAAGAAATAGTACATCAAATTTACTAG
>mgp101Cjejuni_8 EDADH4P01COLFV
GGGGTTGCGGACTGCGGAAGGGACTTAAGGAGATAGCGGAACGGTTTTGGGAAAGCCGGCCAGAGAGGGTGAAAGCCCCGTAGGTGAAATCTCCAAGAGCCTGGCAGGATCCAGAGTACCACGGG
>mgp101Cjejuni_9 EDADH4P01DXJMV
GACGGAACCTGTTCGGTAGCTGGGCTTTTTGGGCGGTAGTGCAAATATCGTCTAAATTTGTCGGTGGACAAGTTCTTTCATAGCGGCT
>mgp101Cjejuni_10 EDADH4P01ARZFX
TCATAAGATGTAAATTCTTTCCCGTATGAGTAGAGGTAAGATAGCCTAAAAGTGCAATAATGCAATCGGAACAACCACCTGTNTAAAAAATAAAATGAAAAAATAAACCCGGCCCGT
>mgp101Cjejuni_11 EDADH4P01E1VYJ
GGCTGCGTTTAACCAGATATTAGAGTCAAGCCTTTACAGTTCCTTGCTGATCGGGCGA
>mgp101Cjejuni_12 EDADH4P01AP4Z0
GGAGCTGTTGACAAAGCCGCCCGCCGGCCGCATAATAAAGCCATCGATATACAAAATCTGGATGCCCCGCATCCACCCGAGAGGAAGTATTCATGGATACCA
>mgp101Cjejuni_13 EDADH4P01D4YBN
AAGGCCGGCACGCTCTACGTGGTCGGCATTGGGCCGGGAAGGCCGGATCAGATGACGGCCCAGGCCCGTCAGGCCTTGGAGCGGAGCCGGGTGATTGCCGGATATCCGG
>mgp101Cjejuni_14 EDADH4P01EXBQI
TATCATCGTCCAGGATGTGGTCGTTCACCCTGATTTCATTCATTATATAAGTGGTCATCTGCAAACAGTCTTTCAACTCCCGGAACGACGGGCGAGGAAG


Any idea what I can do? Thank you! 
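To put the log numbers in proportion: 565513 minus 564836 leaves only 677 reads that matched the reference, a failure rate of about 99.88%. The Python sketch below is not part of QIIME; `read_fasta`, `length_summary` and `failure_rate` are hypothetical helper names. It parses FASTA text such as seqs.fna and summarises read lengths, a quick way to confirm the 50 to 258 bp range before spending time on uclust parameters.

```python
from statistics import median


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def length_summary(records):
    """Return count, min, max and median read length for non-empty input."""
    lengths = [len(seq) for _, seq in records]
    if not lengths:
        raise ValueError("no FASTA records found")
    return {
        "n": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "median": median(lengths),
    }


def failure_rate(failures, total):
    """Fraction of input sequences that failed to hit the reference."""
    return failures / total


if __name__ == "__main__":
    # Two reads copied from the sample above.
    sample = """>mgp101Cjejuni_1 EDADH4P01AMK1E
TATCCGACTTATGCGCTTTTCCGTCTGCGCCCATATACTGACCATCCACATCCGTG
>mgp101Cjejuni_3 EDADH4P01ESJFN
CTGTAGTGCCGAACGTGGTCTGTTTGATCACCCCAGTCAGTAAGTATATCCCAAA
"""
    print(length_summary(read_fasta(sample)))
    print(f"{failure_rate(564836, 565513):.2%} of reads failed")
```

For the full dataset, `sample` could be replaced with the contents of seqs.fna read from disk.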

Jose Antonio Navas Molina

Aug 14, 2017, 1:10:00 PM
to Qiime 1 Forum
Hi Angela,

The Silva database contains reference sequences for small-subunit (16S/18S, SSU) and large-subunit (23S/28S, LSU) ribosomal RNA. Tony pointed out to me that your sequences look like metagenomic (shotgun) data, so there is a high probability that most of your reads do not come from SSU or LSU rRNA genes and hence fail to cluster against the reference.

You can either check whether the dataset includes a specific set of files containing the subunits of interest, or you will need to use metagenomic tools to analyze the data.
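One rough way to gauge how much rRNA is in the reads is to scan them for a highly conserved 16S segment on both strands. The Python sketch below assumes the widely used 515F primer sequence (GTGCCAGCMGCCGCGGTAA, with IUPAC ambiguity codes) as the probe; the helper names are hypothetical. This is a heuristic, not a substitute for aligning reads against an rRNA reference: short shotgun reads taken from other parts of the 16S gene will not contain this site, so a low hit fraction is only weak evidence on its own, while a noticeable number of hits does point to some rRNA content.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Assumed probe: the 515F primer site, a conserved 16S region.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"


def iupac_to_regex(primer):
    """Compile a degenerate primer into a regex pattern."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (other characters kept as-is)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_motif(seq, pattern):
    """True if the motif occurs on either strand of the read."""
    seq = seq.upper()
    return bool(pattern.search(seq) or pattern.search(reverse_complement(seq)))


def fraction_with_motif(seqs, primer=PRIMER_515F):
    """Fraction of reads carrying the (assumed) conserved motif."""
    pattern = iupac_to_regex(primer)
    hits = sum(has_motif(s, pattern) for s in seqs)
    return hits / len(seqs) if seqs else 0.0
```

The sequences passed to `fraction_with_motif` could come from any FASTA parser; exact matching is used here for simplicity, so reads with sequencing errors in the site are counted as misses.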

Hope this helps!