Abnormally high percentage of chimeric sequences


Julien Chamberland

Dec 8, 2016, 10:33:30 AM
to Qiime 1 Forum
Hello,

From the same MiSeq run, I compared the sequencing results obtained with primers targeting either the V3-V4 or the V6-V8 region of the 16S rRNA gene.

For the V6-V8 region, everything worked perfectly all the way through.

For the V3-V4 region, I have problems starting at the identify_chimeric_seqs.py step, where I get an abnormally high percentage of chimeric sequences (40 to 50%). As for the V6-V8 region, I used the Greengenes rep_set database and usearch61:

identify_chimeric_seqs.py -m usearch61 -i ~/$Projet/seqs.fna -r $Database/gg_13_8_otus/rep_set/97_otus.fasta -o ~/$Projet/chimeras

Has anybody had this issue before?

P.S. My reads were trimmed with Trimmomatic without any issue, and then assembled with Pandaseq:

pandaseq -F -f R1_pe.fasta.gz -r R2_pe.fastq.gz -w output.fasta -l 420 -L 490 -N -o 10
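
The -l 420 -L 490 options in the command above set PANDAseq's minimum and maximum assembled length. To see how many assembled reads fall inside that window, here is a minimal Python sketch; it assumes the sequences have already been loaded as a list of strings, and the helper name `in_length_window` is hypothetical, not part of PANDAseq or QIIME.

```python
# Minimal sketch: check assembled read lengths against the -l 420 / -L 490
# bounds from the PANDAseq command. Sequences are assumed to be plain strings
# already loaded from the assembler output (loading is not shown).


def in_length_window(seqs, min_len=420, max_len=490):
    """Split sequences into (kept, dropped) using an inclusive length window."""
    kept, dropped = [], []
    for seq in seqs:
        if min_len <= len(seq) <= max_len:
            kept.append(seq)
        else:
            dropped.append(seq)
    return kept, dropped


if __name__ == "__main__":
    # Toy sequences of length 400, 430, 490 and 500 (not real data).
    demo = ["A" * 400, "C" * 430, "G" * 490, "T" * 500]
    kept, dropped = in_length_window(demo)
    print(len(kept), "kept,", len(dropped), "dropped")
```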


Thanks,
Julien



Antonio González Peña

Dec 12, 2016, 9:06:45 AM
to Qiime 1 Forum
Interesting. 

Have you tried skipping de novo chimera checking (--suppress_usearch61_denovo) and comparing the results? Another option is to assign taxonomy to those "chimeric" sequences, or BLAST them against NR, to see whether they are real chimeras.
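
Pulling the flagged reads out of seqs.fna for BLAST can be sketched as follows (QIIME 1's filter_fasta.py covers the same job). This is a minimal Python sketch that assumes seqs.fna-style headers, where the sequence ID is the first whitespace-separated token, and a chimeras.txt-style list with one ID per line; the function names are hypothetical.

```python
# Minimal sketch: select the FASTA records whose IDs appear in an ID list.
# Assumes the ID is the first whitespace-separated token of each header and
# the ID list has one ID per line (only the first token of a line is used).


def read_fasta(lines):
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_flagged(fasta_lines, id_lines):
    """Return FASTA text containing only the records listed in id_lines."""
    flagged = {line.split()[0] for line in id_lines if line.strip()}
    out = []
    for header, seq in read_fasta(fasta_lines):
        if header.split()[0] in flagged:
            out.append(f">{header}\n{seq}\n")
    return "".join(out)
```

For example, `select_flagged(open("seqs.fna"), open("chimeras/chimeras.txt"))` would return FASTA text ready to submit to BLAST; the chimeras.txt path inside the -o directory is an assumption.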

Hope this helps.

Julien Chamberland

Dec 17, 2016, 10:36:36 AM
to Qiime 1 Forum
Hi Antonio,

Thank you for your answer. I tried suppressing the de novo step, but the results are very similar between the de novo and reference-based chimera detection steps. Your BLAST idea was a good one: I found that many of the sequences flagged as chimeric were not chimeras.

I've tried different trimming and assembly parameters, but I always get the same results in the chimera identification step.

Does anyone know whether pick_open_reference_otus.py filters chimeric sequences?

Thanks,

Julien

Antonio González Peña

Dec 20, 2016, 7:56:52 AM
to Qiime 1 Forum
By default, none of the QIIME workflows or pipelines do chimera checking.

Note that I have never used them, so I can't give you first-hand suggestions, and they are not wrapped in QIIME 1.

Just as a reminder, other options for "high"-quality sequence processing are DADA2 (via QIIME 2) or deblur (via Qiita).

Hagen

Jan 5, 2017, 10:09:49 AM
to Qiime 1 Forum
Hi, 

I have the same issue: a high proportion (40-50%) of my sequences are identified as chimeric by usearch61 (de novo only). From BLASTing randomly chosen "chimeric" sequences, it looks like many of these are false positives: most of the sequences I checked that were annotated as chimeras do not look chimeric (i.e., the whole sequence shows high similarity to a single bacterium). I will try the new tools suggested above, but I would also like to know what the reason behind this could be. The quality of the sequencing run was good overall, the same primers have given low levels of chimeras before on the same kind of samples (samples with high diversity), and the other quality-filtering steps in my pipeline retained most of the sequences. Is there anything else I should keep in mind or consider?

Thanks,
Live H.

TonyWalters

Jan 5, 2017, 10:23:17 AM
to Qiime 1 Forum
Hello Hagen,

Chimera checking is still a very imperfect science, so I probably won't have an exact answer for you.

Do the sequences also get flagged when reference-based chimera checking is on (i.e., are they flagged as chimeric by both the reference and the de novo methods)?
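
That question amounts to comparing two sets of flagged IDs. Assuming you have one list of IDs flagged by the reference step and one flagged by the de novo step (for instance from two runs, one with --suppress_usearch61_denovo and one with --suppress_usearch61_ref), a minimal Python sketch of the comparison follows; `compare_flags` is a hypothetical helper.

```python
# Minimal sketch: compare the IDs flagged by the reference and de novo steps.
# Inputs are plain iterables of sequence IDs; how you collect them is assumed.


def compare_flags(ref_flagged, denovo_flagged):
    """Return IDs flagged by both methods and by only one of them."""
    ref, denovo = set(ref_flagged), set(denovo_flagged)
    return {
        "both": ref & denovo,
        "ref_only": ref - denovo,
        "denovo_only": denovo - ref,
    }


if __name__ == "__main__":
    # Toy IDs only, not real results.
    result = compare_flags(["S1_0", "S1_4", "S2_7"], ["S1_4", "S2_7", "S2_9"])
    for key, ids in result.items():
        print(key, len(ids), sorted(ids))
```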

To split up the data on a per-sample basis for de novo checking, you can use the --split_by_sampleid option with identify_chimeric_seqs.py -m usearch61; this will prevent abundance-based chimera hits across samples (after all, chimeras should form within a particular PCR reaction, not between reactions). There are a number of other parameters implemented for usearch61 in identify_chimeric_seqs.py that could alter the specificity/sensitivity (http://qiime.org/scripts/identify_chimeric_seqs.html), but I would try the splitting option first.
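
The idea behind --split_by_sampleid can be pictured as grouping reads by sample before the abundance-based de novo step, so each read is only compared against putative parents from its own sample. The Python sketch below shows just that grouping; it assumes QIIME 1 split_libraries-style headers of the form `>SampleID_counter ...` with no underscores inside the sample ID, and it illustrates the concept rather than QIIME's internal code.

```python
from collections import defaultdict

# Minimal sketch: group sequence IDs by sample, as a per-sample de novo check
# would. Assumes headers like ">SampleID_counter other info" and sample IDs
# without underscores, so the last "_" separates the counter.


def sample_of(header):
    """Return the sample ID from a header such as '>S1.a_17 M00123:...'."""
    seq_id = header.lstrip(">").split()[0]
    return seq_id.rsplit("_", 1)[0]


def group_by_sample(headers):
    """Map each sample ID to the list of its sequence IDs, in input order."""
    groups = defaultdict(list)
    for header in headers:
        seq_id = header.lstrip(">").split()[0]
        groups[sample_of(header)].append(seq_id)
    return dict(groups)
```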

-Tony

Julien Chamberland

Jan 7, 2017, 3:35:50 PM
to Qiime 1 Forum
Hi Tony,
Thank you for your answer. I've tried the --split_by_sampleid option, but it did not change the number of chimeric sequences in my samples.

Without --split_by_sampleid:
ref_non_chimeras     13149
ref_chimeras          7637
denovo_chimeras       7870
denovo_non_chimeras  12916

With --split_by_sampleid:
ref_non_chimeras     13149
ref_chimeras          7637
denovo_chimeras       7870
denovo_non_chimeras  12916
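
To express these counts as percentages, the flagged fraction for each method is chimeras / (chimeras + non_chimeras). A minimal Python sketch, assuming the two-column `name count` layout pasted above (whether QIIME's usearch61 log uses exactly this layout is an assumption); the helper names are hypothetical.

```python
# Minimal sketch: turn "name count" lines into percent-flagged per method.


def parse_counts(text):
    """Read whitespace-separated 'name count' lines into a dict."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def chimera_percentages(counts):
    """Percent of input flagged as chimeric by the ref and de novo steps."""
    out = {}
    for method in ("ref", "denovo"):
        flagged = counts[f"{method}_chimeras"]
        total = flagged + counts[f"{method}_non_chimeras"]
        out[method] = round(100 * flagged / total, 1)
    return out
```

For the tables above this gives about 36.7% (reference) and 37.9% (de novo); the R1-only counts later in the thread work out to about 28.0% and 27.4%.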

I've also tried many of the usearch61-specific parameters that affect specificity or sensitivity, such as minh, dn and xn, but they had only a very limited effect on the results.

I also worked a lot on the assembly with Pandaseq and Mothur, without any improvement.

Thanks,
Julien

TonyWalters

Jan 7, 2017, 3:46:28 PM
to Qiime 1 Forum
Hello Julien,

It might be worth ruling out the stitching process as the source of this. Can you run the chimera checking on just the R1 reads and see whether you get similar numbers?
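
identify_chimeric_seqs.py takes FASTA input, so checking R1 on its own means converting the R1 FASTQ first. A minimal Python sketch, assuming unwrapped four-line FASTQ records; the function name is hypothetical.

```python
# Minimal sketch: convert unwrapped 4-line FASTQ records to FASTA text.
# Quality lines are discarded; blank lines are ignored.


def fastq_to_fasta(lines):
    """Return FASTA text for a sequence of FASTQ lines."""
    records = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(records) % 4:
        raise ValueError("expected 4 lines per FASTQ record")
    out = []
    for i in range(0, len(records), 4):
        header, seq, plus = records[i], records[i + 1], records[i + 2]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record #{i // 4 + 1}")
        out.append(f">{header[1:]}\n{seq}\n")
    return "".join(out)
```

For a gzipped file, `gzip.open("R1.fastq.gz", "rt")` yields text lines that can be passed straight in (the filename is hypothetical).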

-Tony

Julien Chamberland

Jan 8, 2017, 3:09:19 PM
to Qiime 1 Forum
Good idea!

If I only check the R1 file, around 30% of the sequences are still flagged as chimeric:

ref_non_chimeras     28420
ref_chimeras         11061
denovo_chimeras      10809
denovo_non_chimeras  28672

Is it a sequencing issue?

Thanks.
Julien

TonyWalters

Jan 8, 2017, 3:27:03 PM
to Qiime 1 Forum
Hello Julien,

That's still quite a few chimeras. I'm afraid I don't have a solid answer on this one. It could be that low-quality regions with incorrect base calls are mucking up the chimera checker. I haven't seen any prior chimera studies suggesting that this is a major factor in chimera checking, but conceivably it could cause mix-ups when matching parts of the reads to putative parent/child sequences.
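
One way to probe the low-quality hypothesis is to score each read before chimera checking and see whether the flagged reads stand out. The Python sketch below uses expected errors (the sum of per-base error probabilities 10^(-Q/10)), a common read-quality summary not mentioned in this thread, and lists positions below a quality cutoff. It assumes Phred+33 quality strings; the Q20 cutoff is only an example.

```python
# Minimal sketch: per-read quality summaries for Phred+33 quality strings.


def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities, 10 ** (-Q / 10)."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def low_quality_positions(quality, min_q=20, offset=33):
    """0-based positions whose Phred score is below min_q (example cutoff)."""
    return [i for i, ch in enumerate(quality) if ord(ch) - offset < min_q]


if __name__ == "__main__":
    qual = "IIIII+++II"  # toy string: 'I' is Q40, '+' is Q10
    print(round(expected_errors(qual), 4), low_quality_positions(qual))
```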

If they are false positives, it might be worth collecting some of the reads that appear to have good matches to the reference database (i.e., that shouldn't be flagged as chimeras during the reference step) and sending them to the author of the chimera checking code to see if they have any ideas. Perhaps the reference database should be checked too. Can you try the SILVA 123 database (https://www.arb-silva.de/no_cache/download/archive/qiime/) and see whether it gives numbers similar to the Greengenes database?

Julien Chamberland

Jan 8, 2017, 6:18:10 PM
to Qiime 1 Forum
Hello Tony,

Thank you for your good ideas and your time!

Unfortunately, the SILVA database gives very similar results. As with the Greengenes database, usearch flags a lot of false positives. For example, this sequence:
GGAGGCAGCAGTGGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGGGGGATGAAGGCCTTCGGGTTGTAAACTCCTTTC
AACCATGACGAAGCATTATGTGACGGTAGTGGTAGAAGAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTGT
CCGGAATTACTGGGCGTAAAGAGCTCGTAGGTGGTTTGTCGCGTCGTCTGTGAAATTCCGGGGCTTAACTCCGGGCGTGCAGGCGATACGGGCATA
ACTTGAGTGCTGTAGGGGAGACTGGAATTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGAGGAACACCGATGGCGAAGGCAGGTCTCTGGGC
AGTAACTGACGCTGAGGAGCGAAAGCATGGGTAGCGAACAGGATTAGATACCCCGGTAG

was flagged as a chimera, even though it has a 98% identity score with nucleotide BLAST.
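
When reading BLAST results for a flagged read, keep in mind that BLAST's percent identity is computed over the aligned region only, so a high identity says nothing about any part of the read outside the hit; query coverage answers that second question. A minimal Python sketch of both numbers, assuming you already have the two aligned strings (gaps written as '-') and 1-based hit coordinates on the query; the helper names are hypothetical.

```python
# Minimal sketch: percent identity over an alignment, and query coverage.


def percent_identity(aligned_query, aligned_subject):
    """Identical columns / alignment length; gap columns count as mismatches."""
    if len(aligned_query) != len(aligned_subject) or not aligned_query:
        raise ValueError("aligned strings must be non-empty and equal length")
    identical = sum(1 for q, s in zip(aligned_query, aligned_subject)
                    if q == s and q != "-")
    return 100 * identical / len(aligned_query)


def query_coverage(q_start, q_end, query_len):
    """Percent of the query covered by a hit spanning 1-based q_start..q_end."""
    return 100 * (q_end - q_start + 1) / query_len
```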

There is definitely something going on with usearch. I will report this issue to the authors.