Chimera checking

SAMIK

Jul 9, 2012, 8:25:40 PM
to qiime...@googlegroups.com
Hi,
Samik again,

I was trying to run chimera checking on sequences aligned with PyNAST, following the "Chimera checking sequences with QIIME" tutorial.

Running identify_chimeric_seqs.py -m ChimeraSlayer -i rep_set_aligned.fasta -a reference_set_aligned.fasta -o chimeric_seqs.txt resulted in an error because it couldn't find the file or directory 'reference_set_aligned.fasta'.

I aligned the sequences step by step (not with the pick_otus_through_otu_table.py workflow) and couldn't find the reference_set_aligned.fasta file.

Please find the error below:

identify_chimeric_seqs.py -m ChimeraSlayer -i rep_set_aligned.fasta -a reference_set_aligned.fasta -o chimeric_seqs.txt
Usage: identify_chimeric_seqs.py [options] {-i/--input_fasta_fp INPUT_FASTA_FP}

[] indicates optional input (order unimportant)
{} indicates required input (order unimportant)

A FASTA file of sequences can be screened to remove chimeras (sequences generated due to the PCR amplification of multiple templates or parent sequences). QIIME currently includes a taxonomy-assignment-based approach, blast_fragments, for identifying sequences as chimeric and the ChimeraSlayer algorithm.

1. Blast_fragments approach:

The reference sequences (-r) and id-to-taxonomy map (-t) provided are the same format as those provided to assign_taxonomy.py. The reference sequences are in fasta format, and the id-to-taxonomy map contains tab-separated lines where the first field is a sequence identifier, and the second field is the taxonomy separated by semi-colons (e.g., Archaea;Euryarchaeota;Methanobacteriales;Methanobacterium). The reference collection should be derived from a chimera-checked database (such as the full greengenes database), and filtered to contain only sequences at, for example, a maximum of 97% sequence identity.

2. ChimeraSlayer:

ChimeraSlayer uses BLAST to identify potential chimera parents and computes the optimal branching alignment of the query against two parents.
We suggest using the PyNAST-aligned representative sequences as input.


Example usage:
Print help message and exit
 identify_chimeric_seqs.py -h

blast_fragments example: For each sequence provided as input, the blast_fragments method splits the input sequence into n roughly-equal-sized, non-overlapping fragments, and assigns taxonomy to each fragment against a reference database. The BlastTaxonAssigner (implemented in assign_taxonomy.py) is used for this. The taxonomies of the fragments are compared with one another (at a default depth of 4), and if contradictory assignments are returned the sequence is identified as chimeric. For example, if an input sequence was split into 3 fragments, and the following taxon assignments were returned:

==========  ==========================================================
fragment1:  Archaea;Euryarchaeota;Methanobacteriales;Methanobacterium
fragment2:  Archaea;Euryarchaeota;Halobacteriales;uncultured
fragment3:  Archaea;Euryarchaeota;Methanobacteriales;Methanobacterium
==========  ==========================================================

The sequence would be considered chimeric at a depth of 3 (Methanobacteriales vs. Halobacteriales), but non-chimeric at a depth of 2 (all Euryarchaeota).

blast_fragments begins with the assumption that a sequence is non-chimeric, and looks for evidence to the contrary. This is important when, for example, no taxonomy assignment can be made because no blast result is returned. If a sequence is split into three fragments, and only one returns a blast hit, that sequence would be considered non-chimeric. This is because there is no evidence (i.e., contradictory blast assignments) for the sequence being chimeric. This script can be run by the following command, where the resulting data is written to the directory "identify_chimeras/" and using default parameters (e.g. chimera detection method ("-m blast_fragments"), number of fragments ("-n 3"), taxonomy depth ("-d 4") and maximum E-value ("-e 1e-30"))
 identify_chimeric_seqs.py -i repr_set_seqs.fasta -t taxonomy_assignment.txt -r ref_seq_set.fna -o chimeric_seqs.txt

ChimeraSlayer Example: Identify chimeric sequences using the ChimeraSlayer algorithm against a user-provided reference database. The input sequences need to be provided in aligned (Py)NAST format. The reference database needs to be provided as aligned FASTA (-a). Note that the reference database needs to be the same one that was used to build the alignment of the input sequences!
 identify_chimeric_seqs.py -m ChimeraSlayer -i repr_set_seqs_aligned.fasta -a ref_seq_set_aligned.fasta -o chimeric_seqs.txt

identify_chimeric_seqs.py: error: option -i: file does not exist: 'rep_set_aligned.fasta'
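The blast_fragments rule described in the help text above (split each sequence into n fragments, assign taxonomy to each fragment, and call the sequence chimeric only if the assignments contradict each other at the chosen depth) can be sketched as below. This is a toy illustration under stated assumptions, not QIIME's implementation: the BLAST-based taxonomy assignment is skipped and fragment taxonomies are passed in directly, and the helper names (`split_into_fragments`, `parse_id_to_taxonomy`, `is_chimeric`) are hypothetical. The defaults mirror the help text (3 fragments, depth 4).

```python
# Toy sketch of the blast_fragments decision rule (hypothetical helpers,
# not QIIME's code). Taxonomy assignment by BLAST is NOT performed here.


def split_into_fragments(seq, n=3):
    """Split seq into n roughly equal-sized, non-overlapping fragments."""
    size, extra = divmod(len(seq), n)
    fragments, start = [], 0
    for i in range(n):
        # the first `extra` fragments get one additional base
        end = start + size + (1 if i < extra else 0)
        fragments.append(seq[start:end])
        start = end
    return fragments


def parse_id_to_taxonomy(lines):
    """Parse 'id<TAB>Level1;Level2;...' lines into {id: [levels]}."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        seq_id, taxonomy = line.split("\t", 1)
        mapping[seq_id] = [level.strip() for level in taxonomy.split(";")]
    return mapping


def is_chimeric(fragment_taxonomies, depth=4):
    """True only if assigned fragments disagree within the first `depth` levels.

    Fragments with no assignment (None) are ignored: a sequence is assumed
    non-chimeric unless there is contradictory evidence.
    """
    distinct = {
        tuple(levels[:depth])
        for levels in fragment_taxonomies
        if levels is not None
    }
    return len(distinct) > 1


frags = [
    "Archaea;Euryarchaeota;Methanobacteriales;Methanobacterium".split(";"),
    "Archaea;Euryarchaeota;Halobacteriales;uncultured".split(";"),
    "Archaea;Euryarchaeota;Methanobacteriales;Methanobacterium".split(";"),
]
print(is_chimeric(frags, depth=3))          # True  (Methanobacteriales vs. Halobacteriales)
print(is_chimeric(frags, depth=2))          # False (all Euryarchaeota)
print(is_chimeric([frags[0], None, None]))  # False (only one fragment assigned)
```

Ignoring unassigned fragments is what makes the method conservative: a sequence with only one BLAST hit can never show a contradiction, so it is reported as non-chimeric.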
MacQIIME kw-12650:srt $ identify_chimeric_seqs.py -m ChimeraSlayer -i alignment/rep_set_aligned.fasta -a reference_set_aligned.fasta -o chimeric_seqs.txt
Traceback (most recent call last):
  File "/macqiime/QIIME/bin/identify_chimeric_seqs.py", line 172, in <module>
    main()
  File "/macqiime/QIIME/bin/identify_chimeric_seqs.py", line 169, in main
    keep_intermediates=keep_intermediates)
  File "/macqiime/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 148, in chimeraSlayer_identify_chimeras
    keep_intermediates=keep_intermediates):
  File "/macqiime/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 133, in __call__
    keep_intermediates=keep_intermediates)
  File "/macqiime/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 582, in get_chimeras_from_Nast_aligned
    open(ref_db_aligned_fp)))
IOError: [Errno 2] No such file or directory: 'reference_set_aligned.fasta'
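Both failures above are plain missing-file errors. Relative paths are resolved against the current working directory, which is why `rep_set_aligned.fasta` was not found until the `alignment/` prefix was added, and why `reference_set_aligned.fasta` fails when no such file exists there. A pre-flight check along these lines (a hypothetical helper, not part of QIIME; standalone Python 3) shows where each path actually resolves before the script is run:

```python
import os


def find_missing_inputs(paths):
    """Return (given, resolved) pairs for input paths that are not files.

    Relative paths are resolved against the current working directory,
    the same way Python's open() interprets them.
    """
    missing = []
    for given in paths:
        resolved = os.path.abspath(os.path.expanduser(given))
        if not os.path.isfile(resolved):
            missing.append((given, resolved))
    return missing


if __name__ == "__main__":
    # paths taken from the failing command in this thread
    inputs = ["alignment/rep_set_aligned.fasta", "reference_set_aligned.fasta"]
    for given, resolved in find_missing_inputs(inputs):
        print("missing: %s (looked for %s)" % (given, resolved))
```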


Thank you in advance
Samik

Tony Walters

Jul 9, 2012, 10:13:30 PM
to qiime...@googlegroups.com
Samik,

The -a parameter is a reference alignment. QIIME doesn't generate it while analyzing a dataset, so you have to supply it yourself. You can download the Greengenes core aligned set here: http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/core_set_aligned.fasta.imputed
and pass it with your -a parameter.
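Since the help text says the -a reference must be the same alignment that was used to build the input alignment, one cheap sanity check is that both files are aligned FASTA with a single common width. This is an assumed heuristic, not a QIIME feature: equal widths are necessary but not sufficient, and the helper names below are hypothetical.

```python
# Hypothetical sanity check: are the -i and -a files aligned FASTA
# with the same number of columns? (A heuristic, not a QIIME feature.)


def read_fasta(lines):
    """Minimal FASTA reader yielding (id, sequence) pairs."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # keep only the first whitespace-delimited word of the header
            seq_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def alignment_width(records):
    """Return the shared length of all aligned records, or raise ValueError."""
    widths = {len(seq) for _, seq in records}
    if len(widths) != 1:
        raise ValueError("not a consistent alignment, lengths: %s" % sorted(widths))
    return widths.pop()


def same_width(query_lines, reference_lines):
    """True if both alignments have the same number of columns."""
    return (alignment_width(read_fasta(query_lines))
            == alignment_width(read_fasta(reference_lines)))


# Usage with real files (paths from this thread):
# with open("alignment/rep_set_aligned.fasta") as q, \
#      open("core_set_aligned.fasta.imputed") as r:
#     print(same_width(q, r))
```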

-Tony

SAMIK BAGCHI

Jul 10, 2012, 3:38:46 AM
to qiime...@googlegroups.com
Hi Tony,
Thank you. I downloaded the .fasta file and saved it into my current directory (a copy is also included by default in MacQIIME).
After running
identify_chimeric_seqs.py -m ChimeraSlayer -i alignment/rep_set_aligned.fasta -a core_set_aligned.fasta.imputed -o chimeric_seqs.txt
it still shows the following error:


Traceback (most recent call last):
  File "/macqiime/QIIME/bin/identify_chimeric_seqs.py", line 172, in <module>
    main()
  File "/macqiime/QIIME/bin/identify_chimeric_seqs.py", line 169, in main
    keep_intermediates=keep_intermediates)
  File "/macqiime/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 148, in chimeraSlayer_identify_chimeras
    keep_intermediates=keep_intermediates):
  File "/macqiime/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 133, in __call__
    keep_intermediates=keep_intermediates)
  File "/macqiime/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 592, in get_chimeras_from_Nast_aligned
    app_results = app()
  File "/macqiime/lib/python2.7/site-packages/cogent/app/util.py", line 269, in __call__
    result_paths=self._get_result_paths(data))
  File "/macqiime/lib/python2.7/site-packages/qiime/identify_chimeric_seqs.py", line 405, in _get_result_paths
    raise ApplicationError,"Calling ChimeraSlayer failed."
cogent.app.util.ApplicationError: Calling ChimeraSlayer failed.
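This error comes from the external ChimeraSlayer program, called through PyCogent's application wrapper (`cogent/app/util.py` in the traceback). For intuition about what that program computes (the help text describes it as the optimal branching alignment of a query against two parents), here is a deliberately simplified sketch: it scans every single breakpoint between two given, pre-aligned parents and counts identical non-gap columns. ChimeraSlayer's real scoring, its BLAST-based parent search, and its thresholds are not reproduced, and all names are hypothetical.

```python
# Deliberately simplified two-parent breakpoint scan (hypothetical helpers).
# NOT ChimeraSlayer's algorithm: no BLAST parent search, no divergence ratio,
# no thresholds. Query and parents must already share one alignment.

GAPS = "-."


def match_flags(query, parent):
    """1 where query and parent share the same non-gap base, else 0."""
    return [1 if q == p and q not in GAPS else 0 for q, p in zip(query, parent)]


def best_breakpoint(query, left_parent, right_parent):
    """Return (score, k): matches to left_parent on columns [0, k) plus
    matches to right_parent on columns [k, n), maximised over k."""
    left = match_flags(query, left_parent)
    right = match_flags(query, right_parent)
    n = len(left)
    suffix_right = sum(right)  # matches to right_parent on [k, n), starting at k = 0
    prefix_left = 0            # matches to left_parent on [0, k)
    best = (suffix_right, 0)
    for k in range(1, n + 1):
        prefix_left += left[k - 1]
        suffix_right -= right[k - 1]
        if prefix_left + suffix_right > best[0]:
            best = (prefix_left + suffix_right, k)
    return best


def chimera_signal(query, parent_a, parent_b):
    """Compare the best single-parent score with the best two-parent score."""
    single = max(sum(match_flags(query, p)) for p in (parent_a, parent_b))
    two = max(best_breakpoint(query, parent_a, parent_b)[0],
              best_breakpoint(query, parent_b, parent_a)[0])
    return single, two


print(best_breakpoint("AAAACCCC", "AAAAGGGG", "TTTTCCCC"))  # (8, 4)
print(chimera_signal("AAAACCCC", "AAAAGGGG", "TTTTCCCC"))   # (4, 8)
```

A two-parent score clearly above the best single-parent score is the kind of evidence a chimera checker looks for; here the query matches each parent on only half its length but both parents perfectly when split at column 4.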

Regards,
Samik
--

Kind Regards

 
Samik Bagchi
Post-Doctoral Fellow
Water Desalination and Reuse Center
King Abdullah University of Science and Technology (KAUST)
Thuwal. 23955-6900 Kingdom of Saudi Arabia

Tony Walters

Jul 10, 2012, 12:50:17 PM
to qiime...@googlegroups.com
Hello Samik,

There are two things to check to fix this: first, make sure BLAST is working correctly (see http://www.wernerlab.org/software/macqiime/macqiime-installation/installing-blast-in-os-x for install notes); second, use full absolute file paths for the files you pass to this script.
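Tony's two checks can be scripted roughly as below (a standalone Python 3 sketch with hypothetical helper names). The BLAST executable names are an assumption: QIIME 1.x setups typically relied on legacy NCBI BLAST (`blastall`, `formatdb`), so adjust them to whatever your MacQIIME install expects. The helpers only build and print the command line; they do not run it.

```python
import os
import shlex
import shutil

# Assumption: QIIME 1.x ChimeraSlayer runs relied on legacy NCBI BLAST
# binaries. These names are a guess; edit them to match your install.
BLAST_EXECUTABLES = ("blastall", "formatdb")


def missing_executables(names=BLAST_EXECUTABLES):
    """Return the names that cannot be found on $PATH."""
    return [name for name in names if shutil.which(name) is None]


def build_command(input_fp, reference_fp, output_fp):
    """Return the identify_chimeric_seqs.py argument list with absolute paths."""
    def absolute(path):
        return os.path.abspath(os.path.expanduser(path))

    return [
        "identify_chimeric_seqs.py", "-m", "ChimeraSlayer",
        "-i", absolute(input_fp),
        "-a", absolute(reference_fp),
        "-o", absolute(output_fp),
    ]


missing = missing_executables()
if missing:
    print("not on PATH:", ", ".join(missing))
cmd = build_command("alignment/rep_set_aligned.fasta",
                    "core_set_aligned.fasta.imputed",
                    "chimeric_seqs.txt")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Quoting each argument with `shlex.quote` keeps the printed command safe to paste into a shell even if a directory name contains spaces.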

-Tony