Hello Qiime developers,
I am trying to analyse some Illumina reads I got for my thesis project. So far I've been working with QIIME with no problems, but I would also like to try using UPARSE for the OTU picking. At this point I have an OTU table created with the UPARSE pipeline, but it has no taxonomy assignment yet. My question is: I know that you are planning to integrate UPARSE into QIIME, but that will not happen anytime soon, so is there any way I can manually "import" this UPARSE OTU table into QIIME for the taxonomy assignment and the alpha diversity analysis? Thank you very much for your help.
Best regards,
Carlos Ruiz
--
---
You received this message because you are subscribed to the Google Groups "Qiime Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qiime-forum...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Quality filtering (removing reads with low average quality) and consensus making from R1 and R2 were done by a Perl script. Now I have all candidate V3 sequences in fasta format in hand for the downstream analysis.
Then I follow these steps:
1. usearch7 -derep_fulllength sample.fa -output sample_derep.fa -sizeout
2. usearch -sortbysize sample_derep.fa -minsize 2 -output sample_NoSingleton.fa
3. usearch -uchime_denovo sample_NoSingleton.fa --chimeras chimera.fa --nonchimeras nonchimera.fa
My metagenomic set contains multiple samples. Up to chimera detection I run each sample individually. Then I combined all non-chimeric sequences into one file, AllSamples.fa.
4. Run OTU picking, pick_rep_set, and assign taxonomy from QIIME.
5. usearch -usearch_global InitialAll.fa -db AllSamples.rep_set.fa -strand plus -id 0.97 -uc AllSamples_otu.map.uc -e Allchimerea.fa
(Here InitialAll.fa is my combined sequences from even before dereplication.)
6. python uc2otutab_mod.py AllSamples_otu.map.uc > AllSamples_otu.map.txt
7. sed 's/Consensus Lineage/ConsensusLineage/' < AllSamples_otu.map.txt | sed 's/ConsensusLineage/taxonomy/' > AllSamples_otu.map.taxonomy.txt
8. biom convert -i AllSamples_otu.map.taxonomy.txt -o AllSamples_otu.map.taxonomy.biom --table-type="otu table" --process-obs-metadata taxonomy
This biom file was used for alpha diversity and beta diversity:
9. alpha_rarefaction.py and jackknifed_beta_diversity.py
When I made the biom summary table, only half of the sequences from the initial file are mapped to each sample??
I would be really glad if you could go through these steps and let me know of any flaws in the pipeline. Note: I am using usearch 7 and QIIME 1.5.0.
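One way to diagnose the "only half of the sequences are mapped" problem is to count hit (H) versus no-hit (N) records in the .uc file written by the -usearch_global mapping step. This is not part of the thread's pipeline, just a minimal sketch; `uc_mapping_stats` is a hypothetical helper name:

```python
# Hypothetical helper (not from the thread): count how many reads in a
# usearch .uc mapping file were hits (H) vs. unmapped (N), to estimate
# what fraction of input reads made it into the OTU table.
def uc_mapping_stats(uc_lines):
    hits = misses = 0
    for line in uc_lines:
        if not line.strip():
            continue
        rec_type = line.split("\t")[0]  # first tab-separated field is the record type
        if rec_type == "H":
            hits += 1
        elif rec_type == "N":
            misses += 1
    total = hits + misses
    fraction = hits / float(total) if total else 0.0
    return hits, misses, fraction

example = [
    "H\t161\t250\t100.0\t+\t0\t0\t250M\tex91;barcodelabel=B7;\tOTU_162",
    "N\t*\t250\t*\t*\t*\t*\t*\tex92;barcodelabel=B7;\t*",
]
print(uc_mapping_stats(example))  # (1, 1, 0.5)
```

If the N fraction is high, the missing reads were never assigned to any OTU at -id 0.97, which would explain a low per-sample count in the biom summary.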
Hi,
I'm using Mike's approach to implement UPARSE into QIIME; it's brilliant. However, when I try to use the uc2otutab.py script this is what happens:
$ python ~/qiime_tutorial/uc2otutab.py otu.map.uc > seqs.filtered.derep.mc2.repset.nochimeras.OTU-table.txt
**ERROR** barcodelabel= not found in read label 'ST.1_3 M01337:22:000000000-A5PCV:1:1101:24481:4785 1:N:0:1 orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0'
My barcodes were removed by default by the MiSeq, and most likely the barcode labels too. How can I work around this?
Thanks
Ida
Yes, but what file is the rep_set file?
Thanks
I'm not sure how to choose "2 cores".
Ida
Strange error with the "uc2otutab_mod.py" script as downloaded. Thoughts?
import.im6: unable to open X server `' @ error/import.c/ImportImageCommand/368.
import.im6: unable to open X server `' @ error/import.c/ImportImageCommand/368.
import.im6: unable to open X server `' @ error/import.c/ImportImageCommand/368.
import.im6: unable to open X server `' @ error/import.c/ImportImageCommand/368.
/home/lab/qiime_software/qiime-1.8.0-release/bin/uc2otutab_mod.py: line 6: FileName: command not found
/home/lab/qiime_software/qiime-1.8.0-release/bin/uc2otutab_mod.py: line 17: syntax error near unexpected token `('
Hi,
H 161 250 100.0 + 0 0 250M ex91;barcodelabel=B7; OTU_162
H 9087 250 100.0 + 0 0 250M ex40;barcodelabel=B18; OTU_9088
H 799 250 100.0 + 0 0 250M ex106;barcodelabel=B19; OTU_800
H 7 250 100.0 + 0 0 250M ex79;barcodelabel=B4; OTU_8
H 31 250 100.0 + 0 0 250M ex104;barcodelabel=B17; OTU_32
H 3023 250 100.0 + 0 0 250M ex153;barcodelabel=B3; OTU_3024
H 7596 250 97.2 + 0 0 250M ex138;barcodelabel=B22; OTU_7597
H 0 250 99.6 + 0 0 250M ex139;barcodelabel=B23; OTU_1
H 2742 250 97.6 + 0 0 250M ex130;barcodelabel=B7; OTU_2743
H 0 250 100.0 + 0 0 250M ex142;barcodelabel=B23; OTU_1
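The lines above are "H" (hit) records from a usearch .uc file: the query label (with its barcodelabel= tag) is the ninth tab-separated field and the matched OTU is the tenth. A rough sketch of how the stock uc2otutab.py pulls the sample out of such labels (reconstructed for illustration, not copied from the script):

```python
# Sketch of barcodelabel= parsing in the style of the stock uc2otutab.py,
# applied to one .uc "H" record like the lines shown above.
def get_sample_id(label):
    for field in label.split(";"):
        if field.startswith("barcodelabel="):
            return field[len("barcodelabel="):]
    raise ValueError("barcodelabel= not found in read label '%s'" % label)

line = "H\t161\t250\t100.0\t+\t0\t0\t250M\tex91;barcodelabel=B7;\tOTU_162"
fields = line.split("\t")
query_label, otu_id = fields[8], fields[9]
print(get_sample_id(query_label), otu_id)  # B7 OTU_162
```

This is exactly the parsing that fails with "barcodelabel= not found" when reads were demultiplexed by split_libraries_fastq.py instead of the UPARSE relabeling scripts.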
There are obviously different ways to get the data into QIIME. I opted for the procedure below, mainly because several of the UPARSE scripts did not work on my data. Below is a set of commands I have used to process my data via UPARSE and get the data into QIIME. I generally followed the UPARSE pipeline (http://drive5.com/usearch/manual/uparse_cmds.html) with modification: it is a combination of QIIME and UPARSE scripts used to analyze paired-end data:

# join paired ends
usearch7 -fastq_mergepairs R1.fastq -reverse R2.fastq -fastq_truncqual 3 -fastqout merged.fastq -fastaout merged.fasta

# remove unused barcodes, here is a link to the script I posted a while back:
remove_unused_barcodes.py barcodes.fastq merged.fastq merged.barcodes.fastq

# Use QIIME to demultiplex the data, with -q 0. Store output as fastq format (we will quality filter with usearch7)
split_libraries_fastq.py -v -q 0 --store_demultiplexed_fastq -m miseq2_mapping.txt --barcode_type golay_12 -b merged.barcodes.fastq --rev_comp_mapping_barcodes -i merged.fastq -o sl_out

# get quality stats
usearch7 -fastq_stats seqs.fastq -log seqs.stats.log

# remove low quality reads (trimming not required for paired-end data)
usearch7 -fastq_filter seqs.fastq -fastaout seqs.filtered.fasta -fastq_maxee 0.5 -threads 24

# dereplicate seqs
usearch7 -derep_fulllength seqs.filtered.fasta -output seqs.filtered.derep.fasta -sizeout -threads 24

# filter singletons
usearch7 -sortbysize seqs.filtered.derep.fasta -minsize 2 -output seqs.filtered.derep.mc2.fasta

# cluster OTUs (de novo chimera checking can not be disabled in usearch7)
usearch7 -cluster_otus seqs.filtered.derep.mc2.fasta -otus seqs.filtered.derep.mc2.repset.fasta

# reference chimera check
usearch7 -uchime_ref seqs.filtered.derep.mc2.repset.fasta -db gold.fa -strand plus -nonchimeras seqs.filtered.derep.mc2.repset.nochimeras.fasta -threads 24

# label OTUs using UPARSE python script
python fasta_number.py seqs.filtered.derep.mc2.repset.nochimeras.fasta OTU_ > seqs.filtered.derep.mc2.repset.nochimeras.OTUs.fasta

# map the _original_ quality filtered reads back to OTUs
usearch7 -usearch_global seqs.filtered.fasta -db seqs.filtered.derep.mc2.repset.nochimeras.OTUs.fasta -strand plus -id 0.97 -uc otu.map.uc -threads 24

# make OTU table. I modified the function 'GetSampleId' in the script 'uc2otutab.py' and renamed the script 'uc2otutab_mod.py'.
# The modified function is:
# def GetSampleId(Label):
#     SampleID = Label.split()[0].split('_')[0]
#     return SampleID
# I did this because my demultiplexed headers in the otu.map.uc looked like this:
# ENDO.O.2.KLNG.20.1_19 MISEQ03:119:000000000-A3N4Y:1:2101:28299:16762 1:N:0:GGTATGACTCA orig_bc=GGTATGACTCA new_bc=GGTATGACTCA bc_diffs=0
# and all I need is the SampleID: "ENDO.O.2.KLNG.20.1", so I split on '_'
python uc2otutab_mod.py otu.map.uc > seqs.filtered.derep.mc2.repset.nochimeras.OTU-table.txt

# convert to biom
biom convert --table-type="otu table" -i seqs.filtered.derep.mc2.repset.nochimeras.OTU-table.txt -o seqs.filtered.derep.mc2.repset.nochimeras.OTU-table.biom

# assign taxonomy
parallel_assign_taxonomy_rdp.py -v --rdp_max_memory 4000 -O 24 -t /gg_13_5/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt -r /gg_13_5/gg_13_8_otus/rep_set/97_otus.fasta -i sl_out_miseq_run_02/seqs.filtered.derep.mc2.repset.nochimeras.OTUs.fasta -o sl_out_miseq_run_02/assigned_taxonomy

# add taxonomy to BIOM table
biom add-metadata --sc-separated taxonomy --observation-header OTUID,taxonomy --observation-metadata-fp assigned_taxonomy/seqs.filtered.derep.mc2.repset.nochimeras.OTUs_tax_assignments.txt -i seqs.filtered.derep.mc2.repset.nochimeras.OTU-table.biom -o seqs.filtered.derep.mc2.repset.nochimeras.tax.OTU-table.biom

# Then off to QIIME-ing. :-)

I hope the above helps some of you get started. I'd like to see how others have integrated their UPARSE pipeline into QIIME.
Cheers! :-)
-Mike
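The uc2otutab step in the pipeline can be pictured as a small counting loop. This is a rough sketch, not the actual uc2otutab.py: it tallies "H" records from the .uc mapping file into an OTU-by-sample table, using the split-on-'_' sample parsing described in the post:

```python
# Rough sketch (not the actual uc2otutab.py) of building a tab-separated
# OTU table from the 'H' records of a usearch .uc mapping file, with the
# sample ID taken from 'SampleID_count'-style labels.
from collections import defaultdict

def build_otu_table(uc_lines):
    counts = defaultdict(lambda: defaultdict(int))  # otu -> sample -> read count
    samples = []
    for line in uc_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "H":          # only hit records contribute counts
            continue
        sample = fields[8].split()[0].split("_")[0]
        otu = fields[9]
        if sample not in samples:
            samples.append(sample)
        counts[otu][sample] += 1
    rows = ["OTUId\t" + "\t".join(samples)]
    for otu in counts:
        rows.append(otu + "\t" + "\t".join(str(counts[otu].get(s, 0)) for s in samples))
    return rows

uc = [
    "H\t0\t250\t99.0\t+\t0\t0\t250M\tS1_1 extra\tOTU_1",
    "H\t0\t250\t99.0\t+\t0\t0\t250M\tS2_7 extra\tOTU_1",
    "N\t*\t250\t*\t*\t*\t*\t*\tS1_2 extra\t*",
]
print(build_otu_table(uc))  # ['OTUId\tS1\tS2', 'OTU_1\t1\t1']
```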
On Monday, November 25, 2013 9:35:20 PM UTC-5, Blair wrote:
Hi Adam,
Yes, I did use the uc2otutab.py script. Basically I followed along with the UPARSE command-line examples using my own data and ended up with an OTU table (in txt format). This looks very much like one of the old QIIME tables, without the "# QIIME OTU table" line at the top, without "#OTU ID" in the second line (it's "OTUId" instead), and without a Consensus Lineage column.
I've been trying out a range of OTU-picking strategies on an old data set and really wanted to find out how many OTUs I'd get using the UPARSE pipeline, compared with the QIIME 1.3 de novo picking and the QIIME 1.7.0 open- and closed-reference methods. The short answer to that question was 7500 OTUs using QIIME 1.3 de novo, around 4000 OTUs using 1.7.0 open reference, 900 OTUs using 1.7.0 closed reference, and 150 OTUs using UPARSE. I wanted the taxonomy so that I could check the UPARSE OTUs against the QIIME versions (with taxonomy included). Thus I'd gotten around to attaching a QIIME-generated taxonomy to the UPARSE OTU table, but had not then attempted to convert this table to biom format. So I've not tested downstream QIIME scripts on the UPARSE OTU table.
I've just now modified the UPARSE OTU table to 'look' like a QIIME txt table (including the "# QIIME OTU table" top line, changing 'OTUId' to "#OTU ID", and pasting in the taxonomy information). This does convert to biom format (convert_biom.py -i otu_table_qiime.txt -o otu_table_qiime.biom --biom_table_type="otu table"). Also, using 'print_biom_table_summary.py' gives an output... but I've not tried using this table for anything 'downstream'. Hopefully it'll work fine, but I just don't know yet.
Cheers,
Blair
On Tuesday, November 26, 2013 2:58:55 PM UTC+13, Adamrp wrote:
Blair, I'm curious about whether or not you used the uc2otutab.py script that's on the page I linked. I haven't tried to use it, but I was just curious to hear about your experience with it. Or are you saying that even after using that script, the OTU table that is output cannot be converted to biom format?
Adam

On Mon, Nov 25, 2013 at 6:50 PM, Blair <blair....@otago.ac.nz> wrote:
Hello,
I'm not part of the QIIME team, and am certainly not a computer scientist! However, I've also been playing around with UPARSE, but using 454 data, and came across the same problem as Carlos. I've come up with a very rudimentary work-around that I'm sure could be made far more elegant. Indeed, Carlos may already have done just what I've done... so sorry if I'm teaching my grandmother to suck eggs.
I followed the UPARSE command-line examples (using my data) and got to the point of generating the OTU table, without taxonomy assignment. I then found the otu.fa file (generated in the third-to-last step of the UPARSE command-line example, 'Label OTU sequences'). This should be a fasta file with rep sequences from the UPARSE pipeline. I then went back to QIIME and ran this 'rep set' through the align_seqs.py, assign_taxonomy.py, filter_alignment.py and make_phylogeny.py scripts. The taxa_assignment.txt file (generated by the assign_taxonomy.py script) can be opened in Excel and sorted according to OTU numbers. This can then be aligned with the UPARSE OTU table and the taxonomy appended to the table. It needs a bit of adjustment to convert this new table to a format that can be converted to biom and used in QIIME downstream analyses (alpha, beta diversity etc.). But it's possible.
If someone has a better way to do this it'd be great to hear it (for example, I couldn't work out how to use the UPARSE OTU mapping data to generate the OTU table directly in QIIME).
Cheers,
Blair
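Blair's Excel step of matching taxonomy assignments to the UPARSE OTU table can also be scripted. A hedged sketch under assumed file layouts (a tab-separated OTU table whose first column is the OTU ID, and an assign_taxonomy.py-style output of "OTU<tab>lineage<tab>confidence"); `append_taxonomy` is a hypothetical helper, not a QIIME script:

```python
# Hypothetical sketch: append QIIME taxonomy assignments to a UPARSE OTU
# table. Assumes tab-separated inputs as described in the lead-in.
def append_taxonomy(otu_table_lines, tax_lines):
    tax = {}
    for line in tax_lines:                       # e.g. "OTU_1\tk__Bacteria;...\t0.99"
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            tax[parts[0]] = parts[1]
    out = [otu_table_lines[0].rstrip("\n") + "\ttaxonomy"]  # extend header row
    for line in otu_table_lines[1:]:
        line = line.rstrip("\n")
        otu_id = line.split("\t")[0]
        out.append(line + "\t" + tax.get(otu_id, "Unassigned"))
    return out

table = ["OTUId\tSampleA\tSampleB", "OTU_1\t10\t3"]
assignments = ["OTU_1\tk__Bacteria;p__Firmicutes\t0.99"]
print(append_taxonomy(table, assignments))
```

The resulting text table would still need the header tweaks Blair mentions ("#OTU ID", "# QIIME OTU table") before conversion to biom.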
On Sunday, November 24, 2013 6:14:22 PM UTC+13, Carlos Ruiz wrote:
Hi Colleen,
The -v option just sets the verbose option for the script. Just a habit of mine, especially since some of the scripts do not print anything anyway. :-)
Since I used joined paired-ends, the quality scores are quite good throughout the read. So I just left --max_bad_run_length and --min_per_read_length_fraction as is. Though if we wanted to be strict, I'd suspect we'd just set the following:
--max_bad_run_length : 250 (for MiSeq; alternatively set to the average size of your fragments, or close to it, just make it large)
--min_per_read_length_fraction : set very low (0.01) or to 0.0
On another note, be _very_ wary of using the gold.fa file for your reference database when using UPARSE with default settings. I've recently run into a case where the uchime_ref command removed some of the most abundant reads of a sample. That is, it was a read we expected to be there and was supposed to be the majority of the sample. uchime_ref removed this OTU (and other OTUs that we know should have been in the sample) and skewed our results heavily. This issue went away either when I opted to use the 13_8 greengenes database, or if I used the gold.fa database with the -minh flag of uchime_ref set to 1.5-2.5 (default is 0.28). In a nutshell, the gold.fa database is not as expansive as greengenes and may discard many valid reads. However, greengenes may have chimeras. Either way, I'd suggest printing out the chimeras to file when using uchime_ref and playing around with the -minh setting. I also recommend BLAST-checking the chimeras no matter which reference database you use, as all these methods have a small false-positive rate of detecting chimeras.
UPARSE is good, but more attention needs to be paid to the user options. I also wish there were an option to disable the de novo chimera checking. I am still validating UPARSE for my analysis and comparing it to the QIIME pipeline. So if others have experience with this, please let us know. :-)
-Mike
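For context on the -fastq_maxee 0.5 filter used in the pipeline earlier in this thread: usearch's expected-errors metric sums the per-base error probabilities implied by the Phred quality scores. A minimal illustration of that arithmetic:

```python
# Expected errors for a read: sum of per-base error probabilities,
# P(err) = 10**(-Q/10), where Q is the Phred quality score.
def expected_errors(quals):
    return sum(10 ** (-q / 10.0) for q in quals)

# A 250 bp read at uniform Q30 has 250 * 0.001 = 0.25 expected errors,
# so it passes a -fastq_maxee 0.5 threshold.
print(round(expected_errors([30] * 250), 3))  # 0.25
```

This is why joined paired-end reads, whose overlap region gets boosted quality scores, tend to survive a strict maxee filter.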
On Tuesday, December 10, 2013 5:12:05 PM UTC-5, Colleen wrote:
One additional question, for Mike or anyone really. If we are trying to just get the sequence headers or labels in the correct format to use UPARSE for OTU picking with the split_libraries_fastq.py step (so no quality filtering in QIIME), is it also necessary to adjust the -r (--max_bad_run_length) and -p (--min_per_read_length_fraction) parameters to a minimum so that no quality filtering is done in QIIME? Or, since we are using a q of 0, are the QIIME defaults for the -r and -p parameters irrelevant?
Thanks!
colleen
On Tuesday, December 10, 2013 11:04:59 AM UTC-8, Colleen wrote:
Hi Mike,
I am also trying to use UPARSE OTU picking and then use that data for downstream analyses in QIIME. I was struggling with getting my MiSeq data in the correct format to use in USEARCH, so I plan to use the split_libraries_fastq.py script in QIIME, like you apparently did. I have a quick question about the command you show below. What is the -v flag for in your split_libraries_fastq.py command line below? I don't see that as one of the options to pass with split_libraries_fastq.py...
Thanks!
colleen
Hi QIIMErs,
I am using both UPARSE and QIIME for my data analyses. I need to remove sequences that failed alignment (as I have BLASTed them and realized that they are only human sequences). I feel like UPARSE does not have that option (of excluding user-specified sequences). So after creating an OTU text table with UPARSE (uc2otutab.py), I used QIIME's make_otu_table.py to make an OTU biom table with the sequences that failed alignment removed. Here I turned on the -e (--exclude_otus_fp) option. I got a biom file that I compared with what is generated when "biom convert", assign_taxonomy.py and "biom add-metadata" are run. QIIME's biom file looks pretty OK, but has additional information in it that appears to affect the "biom summarize-table" command. The output (of biom summarize-table) appears wrong, thus the sub-sampling depth cannot be determined from it.
Kindly assist me with how to exclude sequences failing alignment (so that the biom table is "clean") for downstream processes. My current input is out.table.txt from UPARSE, but it appears slightly incompatible with QIIME's make_otu_table.py.
Thanks,
Harris.
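One possible workaround for Harris's question, not suggested anywhere in the thread: filter the UPARSE text table directly before running biom convert, dropping the OTU IDs listed in the alignment-failures file. A minimal sketch with hypothetical file contents:

```python
# Hypothetical sketch: drop rows for OTUs that failed alignment from a
# tab-separated UPARSE OTU table before converting it with 'biom convert'.
def drop_failed_otus(table_lines, failed_ids):
    failed = set(failed_ids)
    kept = [table_lines[0]]  # keep the header row
    for line in table_lines[1:]:
        if line.split("\t")[0] not in failed:
            kept.append(line)
    return kept

table = ["OTUId\tS1\tS2", "OTU_1\t5\t0", "OTU_2\t1\t9"]
print(drop_failed_otus(table, ["OTU_2"]))  # keeps the header and OTU_1 only
```

Because the filtering happens on the text table, the resulting biom file contains only the surviving OTUs and biom summarize-table should reflect the cleaned counts.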
I mentioned the details of the uc2otutab.py script changes along with the original pipeline post earlier in this thread, here. Just open uc2otutab.py in any raw text editor and replace these lines:

def GetSampleId(Label):
    Fields = Label.split(";")
    for Field in Fields:
        if Field.startswith("barcodelabel="):
            return Field[13:]
    die.Die("barcodelabel= not found in read label '%s'" % Label)

with these:

def GetSampleId(Label):
    SampleID = Label.split()[0].split('_')[0]
    return SampleID

I've attached the uc2otutab_mod.py file to this post.
-Mike

On Monday, March 17, 2014 10:39:22 PM UTC-4, Carly wrote:
Hi Mike,
Thank you for the pipeline. This has been very helpful. I am at the 'uc2otutab.py' command, and have run into an issue. It is that I do not know how to modify a command. You state you renamed the script 'uc2otutab_mod.py' so that the issue of
### uc2otutab.py otu.map.uc
###**ERROR** barcodelabel= not found in read label '38B4_2'
does not happen. But how do I do that -- modify the command that is? At this point in my career I can only write commands, not modify them!
Any help would be greatly appreciated. I am actually merging four 454 Junior runs together and using QIIME to demultiplex at first is very helpful.
Thanks!
Carly
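The replacement GetSampleId that Mike describes can be sanity-checked on both header styles mentioned in this thread. A small demo (the caveat being that a sample ID containing an underscore would be truncated by this parsing):

```python
# Demo of the modified GetSampleId from uc2otutab_mod.py: take everything
# before the first whitespace, then everything before the first '_'.
def get_sample_id(label):
    return label.split()[0].split('_')[0]

# split_libraries_fastq.py-style header (SampleID_SeqCount <other info>)
demux = ("ENDO.O.2.KLNG.20.1_19 MISEQ03:119:000000000-A3N4Y:1:2101:28299:16762 "
         "1:N:0:GGTATGACTCA orig_bc=GGTATGACTCA new_bc=GGTATGACTCA bc_diffs=0")
print(get_sample_id(demux))     # ENDO.O.2.KLNG.20.1
print(get_sample_id("38B4_2"))  # 38B4
```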
On Monday, February 24, 2014 9:10:34 AM UTC-5, Mike R wrote:
Hi Serena,
Not a problem. The split_libraries_fastq.py script can work on either single or paired-end reads. Just skip the "usearch7 -fastq_mergepairs" / "join_paired_ends.py" step and start directly with split_libraries_fastq.py. Also, the "barcodelabel=" is added to your data by the UPARSE python scripts fastq_strip_barcode_relabel.py and/or fastq_strip_barcode_relabel2.py. So, simply do not run these scripts on your raw data. Just take the raw data that you received from the sequencing facility and send it directly to split_libraries_fastq.py, then through the rest of the pipeline.
-Mike
On Sunday, February 23, 2014 10:00:25 AM UTC-5, Serena Thomson wrote:
Hi Mike,
Really appreciate you taking the time to respond to me personally, thank you. I can't follow your pipeline exactly, because I don't have paired-end data; I am analysing single reads. Therefore I can't run split_libraries_fastq.py, but I will try with the standard split_libraries.py command and see if I get this error. Ideally I just need to modify the uc2otutab script so that it doesn't look for this barcodelabel=.
Thanks again,
Serena

On Thu, Feb 20, 2014 at 7:36 PM, Mike R <soilbd...@gmail.com> wrote:
Serena,
I forgot to mention, you can also simplify things by making use of join_paired_ends.py (with the -b flag set), instead of using the `usearch -mergepairs` and `remove_unused_barcodes.py` steps (the first two steps I initially posted). So just do `join_paired_ends.py` followed by `split_libraries_fastq.py`, then on to the rest of the pipeline.
-Mike
On Thursday, February 20, 2014 12:28:57 PM UTC-7, Mike R wrote:
Hi Serena,
Did you follow my pipeline from the very beginning and only use the commands I posted? If so, then the "barcodelabel= not found" error should not arise using the pipeline exactly as posted. In fact, the commands I posted precisely circumvent this issue, as I am not using fastq_strip_barcode_relabel2.py (my data were not in a compatible format for that script). I circumvent the use of fastq_strip_barcode_relabel2.py by using split_libraries_fastq.py to demultiplex my data instead. This is why I modified uc2otutab.py the way I did. That modification will only work if you demultiplex via split_libraries_fastq.py, hence the reason for my posting an example of the OTU header I had to parse. If you notice, the OTU header is in the format you'd generally expect as output from split_libraries_fastq.py (e.g. "SampleID_SeqCount<whitespace>OtherInfo").
So, unless there are some peculiarities with your format, or I am missing something, the pipeline I posted should work. My post was just an initial "quick-and-dirty" stab in which I was able to get my data from UPARSE to QIIME with minimal effort. I would love to hear how others coerced their data from UPARSE into QIIME, especially considering the myriad formats of our respective data. :-)
Does this help?
-Mike
On Thursday, February 20, 2014 7:36:19 AM UTC-7, Serena Thomson wrote:
Hi Mike,
I am failing to see how your modification to the Sample_ID part of uc2otutab.py gets round the issue of the error it spits out when barcodelabel= is not found. I noticed you don't use the fastq_strip_barcode_relabel2.py script either, so I'm not sure how you managed this.
Serena