chimera check


覃千山

Jun 7, 2014, 5:04:47 AM
to qiime...@googlegroups.com
Dear experts,
I use the following command to check for chimeras:
/sam/qiime/Uparse/usearch -uchime_ref 1.fna -db /sam/qiime/Uparse/gold.fasta -uchimeout results1.uchime -uchimealns alnfile1 -chimeras ch1.fasta -nonchimeras good1.fasta -strand plus

I have two questions:
1. I have several samples that were sequenced separately. I put all the reads from the different sequencing runs into one file called 1.fna. Can I use this command to check all the samples for chimeras together?
2. If they can't be analyzed together, could I analyze each sample separately and then combine the good1.fasta outputs to continue the analysis?

Best wishes!

Adam Robbins-Pianka

Jun 10, 2014, 10:09:36 AM
to qiime...@googlegroups.com
Hello,

Just wanted to let you know that I forwarded your question to someone who might be better able to answer it than myself.

Best,
Adam



Tony Walters

Jun 10, 2014, 12:13:01 PM
to qiime...@googlegroups.com
Hello,

If you want to use usearch outside of QIIME, you will want to follow the complete pipeline that's outlined on the usearch website, so that the data can be integrated into QIIME.

On to the specific questions:
1. For reference based chimera detection, it doesn't matter if the samples are separate or merged. For de novo chimera detection, the chimeras should only be formed *within* a particular PCR reaction, so it makes sense to run de novo chimera checking on a per-sample basis and merge the separate samples later.
2. Yes.

I would also point out that some of this software is already implemented in QIIME, so if you have QIIME 1.7.0 or later, for instance, you can use usearch61 with the identify_chimeric_seqs.py script (http://qiime.org/scripts/identify_chimeric_seqs.html). You would concatenate all of your sequences, and the script can optionally split them on a per-sample basis and will merge the final results.
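
As a rough illustration of the per-sample idea in point 1, here is a minimal Python sketch that groups a combined FASTA file by sample. It assumes QIIME-style sequence labels of the form SampleID_N (the kind split_libraries.py writes). The function name split_by_sample and the toy reads are made up for this example, and this is not how QIIME does the split internally.

```python
from collections import defaultdict


def split_by_sample(fasta_lines):
    """Group FASTA records by sample ID, assuming labels like '>SampleID_42 extra'."""
    per_sample = defaultdict(list)
    sample_id = None
    for line in fasta_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith(">"):
            # The sequence ID is the first whitespace-delimited token of the header.
            seq_id = line[1:].split()[0]
            # Assumption: everything before the last underscore is the sample ID.
            sample_id = seq_id.rsplit("_", 1)[0]
        if sample_id is None:
            raise ValueError("sequence data found before the first header")
        per_sample[sample_id].append(line)
    return dict(per_sample)


# Toy input standing in for a combined seqs.fna.
combined = [
    ">S1_0 read_a", "ACGTACGT",
    ">S2_0 read_b", "TTGGCCAA",
    ">S1_1 read_c", "ACGTTTTT",
]
for sample, records in sorted(split_by_sample(combined).items()):
    print(sample, records)
```

Each per-sample group could then be written to its own file and checked de novo on its own, with the surviving sequences merged afterwards.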



覃千山

Jun 11, 2014, 10:38:49 PM
to qiime...@googlegroups.com
Dear experts,
Thanks for your advice! I tried to run usearch61 like this:
identify_chimeric_seqs.py -m usearch61 -i lib1/seqs.fna -r /sam/qiime/Uparse/gold.fasta -o usearch61_chimera_checking/

It gives this error:

StdOut:
USEARCH 6 is not freely redistributable and is thus not included in the
default QIIME package.
You may obtain a personal copy of the 32-bit program at no charge.
To use this feature, please go to:
Download USEARCH v6.1, then:
    sudo mv usearch* /usr/local/bin/usearch61
    sudo chmod a+x /usr/local/bin/usearch61
You probably also want to install USEARCH 5 as /usr/local/bin/usearch

I followed the advice, downloaded usearch6.1.544_i86linux32, and ran:
sudo mv usearch* /usr/local/bin/usearch61
sudo chmod a+x /usr/local/bin/usearch61

But when I run print_qiime_config.py -t, I get:
sam@sam-Precision-WorkStation-T7500[mtt2] print_qiime_config.py -t    [10:29AM]

USEARCH is not freely redistributable and is thus not included in the
default QIIME package.
You may obtain a personal copy of the 32-bit program at no charge.

To use this feature, please go to:

Download USEARCH v5.2.236, then:
    sudo mv usearch* /usr/local/bin/usearch
    sudo chmod a+x /usr/local/bin/usearch

You probably also want to install USEARCH 6.1 as /usr/local/bin/usearch61


1. So usearch61 is not being found by the command? How can I solve this problem?

2. In the command
identify_chimeric_seqs.py -m usearch61 -i lib1/seqs.fna -r /sam/qiime/Uparse/gold.fasta -o usearch61_chimera_checking/
seqs.fna contains the combined reads of several samples from different sequencing runs. Is the command suitable for the combined sequences from many samples?

Tony Walters

Jun 11, 2014, 10:49:23 PM
to qiime...@googlegroups.com
What result do you get if you open a new terminal and type:
which usearch61

and

usearch61 --version

?
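
If it helps, the following Python sketch lists every executable with a given name on the PATH, in search order, so that a copy shadowing the one you installed is easy to spot. The function name find_all_on_path is made up for this example; on most Linux systems `which -a usearch61` should give the same information.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            # Skip empty PATH entries.
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


# The first hit is the one that gets run when you type the bare command name.
for hit in find_all_on_path("usearch61"):
    print(hit)
```

If more than one path is printed, the first one wins, which is worth checking when an installed binary does not seem to be picked up.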

覃千山

Jun 12, 2014, 3:47:45 AM
to qiime...@googlegroups.com

Dear Tony Walters,
Thanks very much!

sam@sam-Precision-WorkStation-T7500[mtt2] which usearch61             [10:32AM]
/usr/lib/qiime/bin/usearch61

sam@sam-Precision-WorkStation-T7500[mtt2] usearch61 --version         [ 2:45PM]
USEARCH 6 is not freely redistributable and is thus not included in the
default QIIME package.
You may obtain a personal copy of the 32-bit program at no charge.
To use this feature, please go to:
    http://www.drive5.com/usearch/download.html

So I removed usearch61 from /usr/lib/qiime/bin/, ran the script again, and it worked!

sam@sam-Precision-WorkStation-T7500[mtt2] sudo rm /usr/lib/qiime/bin/usearch61
sam@sam-Precision-WorkStation-T7500[mtt2] export PATH=$PATH:/usr/local/bin
sam@sam-Precision-WorkStation-T7500[mtt2] which usearch61             [ 2:53PM]
/usr/local/bin/usearch61

identify_chimeric_seqs.py -m usearch61 -i lib1/seqs.fna -r /sam/qiime/Uparse/gold.fasta -o usearch61_chimera_checking/ --split_by_sampleid

Can the command above analyze the sequences from different sequencing runs together? Do I need to add any parameters?




Tony Walters

Jun 12, 2014, 10:19:58 AM
to qiime...@googlegroups.com
Hello,

That command should be able to handle the combined sequences. The one potential issue I can see at this point is whether the labels of the input FASTA sequences (lib1/seqs.fna) are QIIME-compatible. You probably want to check them with the validate_demultiplexed_fasta.py script (http://qiime.org/scripts/validate_demultiplexed_fasta.html) to make sure.

-Tony

覃千山

Jun 12, 2014, 9:45:52 PM
to qiime...@googlegroups.com
Dear Tony Walters,
Thank you very much!
I ran the script and it reports that everything is OK. I think that if I run split_libraries.py with mapping.txt before identify_chimeric_seqs.py, there will be no problem!
Thanks very much!


覃千山

Jun 13, 2014, 5:41:48 AM
to qiime...@googlegroups.com
Dear Tony Walters,
I found something interesting. Using the command below:
/sam/qiime/Uparse/usearch -uchime_ref lib1/seqs.fna -db /sam/qiime/Uparse/gold.fasta -uchimeout results1.uchime -uchimealns alnfile1 -chimeras ch1.fasta -nonchimeras good1.fasta -strand plus
seqs.fna has 386523 sequences, and good1.fasta (the output without chimeras) has 378125 sequences.

I then used the command below:

identify_chimeric_seqs.py -m usearch61 -i lib1/seqs.fna -r /sam/qiime/Uparse/gold.fasta -o usearch61_chimera_checking/
I get several output files.

The file chimera.txt has only 1 sequence,
and non_chimeras.txt has 386522 sequences.

identify_chimeric_seqs.log :
input_seqs_fp /MTT/mtt2/lib1/seqs.fna
output_dir /MTT/mtt2/usearch61_chimera_checking1
reference_seqs_fp /sam/qiime/Uparse/gold.fasta
suppress_usearch61_intermediates False
suppress_usearch61_ref False
suppress_usearch61_denovo False
split_by_sampleid False
non_chimeras_retention union
usearch61_minh 0.28
usearch61_xn 8.0
usearch61_dn 1.4
usearch61_mindiffs 3
usearch61_mindiv 0.8
usearch61_abundance_skew 2.0
percent_id_usearch61 0.97
minlen 64
word_length 8
max_accepts 1
max_rejects 8
HALT_EXEC False

ref_non_chimeras 383766
ref_chimeras 2757
denovo_chimeras 28
denovo_non_chimeras 386495

I have two questions:
1. Why doesn't non_chimeras.txt match the reference-based and de novo results shown in the log?
2. Why is the result of identify_chimeric_seqs.py not better than that of uchime_ref?


Tony Walters

Jun 13, 2014, 7:16:05 AM
to qiime...@googlegroups.com
I think for 1, the difference between the script and calling usearch directly is the union/intersection setting. The "non_chimeras_retention union" line in the log shows the default setting for the script, but you would want to set it to intersection to make it run like the default approach from Robert Edgar (that is, a sequence is discarded if it is labeled as a chimera in either de novo or reference-based detection). This might answer 2 as well, if by better you mean detecting more chimeras.
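
To make the union/intersection distinction concrete, here is a small Python sketch of the set logic as I understand it from the option's description: "union" keeps a sequence if either method calls it non-chimeric, while "intersection" keeps it only if both do. The function retained_non_chimeras and the toy IDs are made up for illustration; this is not QIIME's actual code.

```python
def retained_non_chimeras(all_ids, ref_chimeras, denovo_chimeras, retention="union"):
    """Return the IDs kept as non-chimeric under the given retention mode."""
    all_ids = set(all_ids)
    ref_ok = all_ids - set(ref_chimeras)        # non-chimeric by reference-based detection
    denovo_ok = all_ids - set(denovo_chimeras)  # non-chimeric by de novo detection
    if retention == "union":
        # Kept if EITHER method says non-chimeric (dropped only if both flag it).
        return ref_ok | denovo_ok
    if retention == "intersection":
        # Kept only if BOTH methods say non-chimeric (dropped if either flags it).
        return ref_ok & denovo_ok
    raise ValueError("retention must be 'union' or 'intersection'")


ids = {"s1", "s2", "s3", "s4", "s5"}
ref_flagged = {"s1", "s2"}
denovo_flagged = {"s2", "s3"}
print(sorted(retained_non_chimeras(ids, ref_flagged, denovo_flagged, "union")))
print(sorted(retained_non_chimeras(ids, ref_flagged, denovo_flagged, "intersection")))
```

Applied to the numbers in the log (2757 reference-based and 28 de novo chimeras out of 386523 sequences), and assuming the single sequence in chimera.txt is the one flagged by both methods, union discards 1 sequence, while intersection would discard 2757 + 28 - 1 = 2784 and keep 383739.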

覃千山

Jun 14, 2014, 10:19:35 PM
to qiime...@googlegroups.com
Dear Tony Walters,
The problem is still there.
/sam/qiime/Uparse/usearch -uchime_ref lib1/seqs.fna -db /sam/qiime/Uparse/gold.fasta -uchimeout results1.uchime -uchimealns alnfile1 -chimeras ch1.fasta -nonchimeras good1.fasta -strand plus
seqs.fna has 386523 sequences, and good1.fasta (the output without chimeras) has 378125 sequences.

sam@sam-Precision-WorkStation-T7500[mtt2] identify_chimeric_seqs.py -m usearch61 -i lib1/seqs.fna -r /sam/qiime/Uparse/gold.fasta -o usearch61_chimera_checking3/ --non_chimeras_retention intersection
seqs.fna has 386523 sequences, and the output without chimeras has 383739 sequences.

Why is the result of identify_chimeric_seqs.py still not better than that of uchime_ref?


Tony Walters

Jun 14, 2014, 10:25:07 PM
to qiime...@googlegroups.com
Can you define "better"? Is it just more sequences being flagged as chimeras and removed? The different methods for chimera detection do not all agree, as you can see from running usearch (your top command; you can verify the version by typing usearch --version) and usearch 6.1 (your bottom command), and there isn't a gold standard for what the false positives and false negatives are.

You can also dig through the .uc files that are created to see the direct commands to usearch that are being called, if you want to examine those.
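
To see how far two runs actually disagree, rather than only comparing totals, one option is to compare the sets of sequence IDs each run kept. The Python sketch below assumes that one result is a FASTA file (like good1.fasta) and the other is a plain text list with one ID as the first token on each line (which is how I would expect non_chimeras.txt to look). The function names and the toy data are hypothetical.

```python
def fasta_ids(lines):
    """Collect sequence IDs (first token of each '>' header) from FASTA lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def plain_ids(lines):
    """Collect IDs from a listing with one ID as the first token per line."""
    return {line.split()[0] for line in lines if line.strip()}


def compare(ids_a, ids_b):
    """Count IDs kept by both runs and IDs kept by only one of them."""
    return {
        "both": len(ids_a & ids_b),
        "only_a": len(ids_a - ids_b),
        "only_b": len(ids_b - ids_a),
    }


# Toy data; with real files, pass open file handles instead of lists.
uchime_kept = fasta_ids([">S1_0 x", "ACGT", ">S1_1", "GGCC", ">S2_0", "TTAA"])
qiime_kept = plain_ids(["S1_0", "S2_0", "S2_1"])
print(compare(uchime_kept, qiime_kept))
```

Large "only_a" or "only_b" counts would point to the two usearch versions making genuinely different calls, not just removing a different number of sequences.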

覃千山

Jun 15, 2014, 8:54:38 PM
to qiime...@googlegroups.com
Dear Tony Walters,
Thank you very much!
You are right! The versions of usearch used in the two methods are different. When I changed to the newest version, 7.0, the reference-based results were very close.
Thank you for giving me so much help!
