rarefaction curves


Danilo Ercolini

Jun 18, 2012, 10:24:40 AM
to Qiime Forum
Hello Qiime forum!
I ran the make_rarefaction_plot.py script, but I got some rarefaction
curves where the maximum value on the x axis is about 8000 (sequences
per sample), even though I have a higher number of sequences in some
samples. Why? Is there any parameter to set to get a higher value on
the x axis?
Thanks.

Tony Walters

Jun 18, 2012, 11:14:10 AM
to qiime...@googlegroups.com
Hello Danilo,

You would need to run alpha_rarefaction.py ( http://qiime.org/scripts/alpha_rarefaction.html?highlight=alpha_rarefaction ) and set the --max_rare_depth to a higher value.  The default max is the median sequences/sample.
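For example, with hypothetical file names (all paths and the depth value here are placeholders, not taken from Danilo's setup):

```shell
# Sketch only: raise the maximum rarefaction depth above the default
# (the median sequences/sample) so the x axis extends further.
alpha_rarefaction.py -i otu_table.biom -m mapping.txt -o arare_out/ \
  --max_rare_depth 20000
```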

-Tony

Danilo Ercolini

Jun 19, 2012, 6:49:43 AM
to qiime...@googlegroups.com
Hi Tony!
Many thanks for your suggestion! I have another problem now: I'd like to have an OTU heatmap showing only those OTUs with a presence higher than 1%. How can I do that? Moreover, I'd like a heatmap that displays the percentage of every OTU in each sample. Is that possible?

Tony Walters

Jun 19, 2012, 11:46:15 AM
to qiime...@googlegroups.com
Hello Danilo,

I don't think we have a direct way to do that, but you could approach the problem this way:
1.  Run per_library_stats.py (http://qiime.org/scripts/per_library_stats.html) on your OTU table to get the total sequence count.  Multiply by 0.01 to get a threshold (you'll have to choose whether to round up or down to the closest whole number).
2.  Run filter_otus_from_otu_table.py ( http://qiime.org/scripts/filter_otus_from_otu_table.html?highlight=filter_otus_from_otu_table ) on the OTU table, and use the -n parameter and the value calculated in step 1 to filter out those OTUs that have less than 1% abundance.
3.  Make a heatmap using this filtered OTU table.
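The arithmetic in step 1 can be sketched in the shell (the total below is a made-up example count, not from any real table):

```shell
# Hypothetical total sequence count reported by per_library_stats.py
total=50000
# 1% threshold; integer division rounds down (round up if you prefer)
threshold=$(( total / 100 ))
echo "$threshold"   # this value goes to filter_otus_from_otu_table.py -n
```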

Hope this helps,
Tony Walters

Newbie

Jun 28, 2012, 3:37:09 PM
to Qiime Forum
Hi everyone,

I have a quick question about the rarefaction plots. In my current
dataset, I have 30 samples with the minimum number of sequences in a
sample of 833, and the maximum number in a single sample is 4189. When
I run alpha_rarefaction.py and set my maximum rarefaction depth to 833
to standardize by my sample with the fewest sequences, I have several
samples that only plot out to ~500 sequences, although those samples
have many more sequences than that, definitely more than 833. What is
it about how rarefaction works that would cause those samples to show
so few sequences?

Thank you,
Molli

Jose Carlos Clemente

Jun 28, 2012, 3:38:46 PM
to qiime...@googlegroups.com
Hi Molli,

can you please send the result of per_library_stats.py on your OTU
table, and the exact command you used for alpha_rarefaction.py?

Thanks,
Jose

Newbie

Jun 28, 2012, 3:59:18 PM
to Qiime Forum
Hi Jose,

I am not sure that I replied correctly this last time.

The results of per_library_stats.py are as follows:
Num samples: 30

Seqs/sample summary:
Min: 286
Max: 1496
Median: 728.5
Mean: 738.566666667
Std. dev.: 306.653407322
Median Absolute Deviation: 249.0
Default even sampling depth in
core_qiime_analyses.py (just a suggestion): 286

Seqs/sample detail:
DNA.2: 286
DNA.11: 306
DNA.6: 315
DNA.1: 334
DNA.5: 399
DNA.24: 416
DNA.9: 500
DNA.14: 511
DNA.25: 513
DNA.28: 538
DNA.23: 550
DNA.18: 593
DNA.7: 607
DNA.29: 689
DNA.26: 724
DNA.3: 733
DNA.17: 737
DNA.20: 749
DNA.16: 777
DNA.4: 805
DNA.13: 903
DNA.12: 998
DNA.27: 1018
DNA.22: 1040
DNA.19: 1071
DNA.21: 1124
DNA.8: 1131
DNA.10: 1132
DNA.30: 1162
DNA.15: 1496

Total observations (sequences): 22157

These numbers match up with what I am seeing in the rarefaction plots.
But, these are the numbers I get after split_libraries_fastq.py:

Quality filter results
Total number of input sequences: 667319
Barcode not in mapping file: 29967
Read too short after quality truncation: 445745
Count of N characters exceeds limit: 607
Illumina quality digit = 0: 0
Barcode errors exceed max: 126028

Result summary (after quality filtering)
Median sequence length: 151.00
DNA.15 4189
DNA.10 3370
DNA.27 3311
DNA.30 3295
DNA.8 3255
DNA.21 3114
DNA.19 3025
DNA.22 2985
DNA.12 2859
DNA.13 2642
DNA.16 2422
DNA.17 2390
DNA.4 2354
DNA.3 2143
DNA.20 2140
DNA.26 2118
DNA.29 2009
DNA.18 1760
DNA.7 1692
DNA.23 1654
DNA.25 1630
DNA.28 1594
DNA.9 1570
DNA.14 1423
DNA.24 1269
DNA.5 1157
DNA.1 970
DNA.6 900
DNA.11 899
DNA.2 833
---

This is what I am using for alpha_rarefaction.py:
alpha_rarefaction.py -i '/home/qiime/Desktop/Shared_Folder/
06272012_MiSeq_Analysis/ucrC/uclust_ref_picked_otus/otu_table.txt' -m
'/home/qiime/Desktop/Shared_Folder/06272012_MiSeq_Analysis/
mapping_output/Rhizobox_16S_mapping_corrected.txt' -o wf_arare/ -p '/
home/qiime/Desktop/Shared_Folder/06272012_MiSeq_Analysis/
alpha_params.txt' -t '/software/gg_otus-4feb2011-release/trees/
gg_97_otus_4feb2011.tre' -e 833

What is it about the processing that leads to this decrease in
sequence number per sample? I have been following the flow of the
tutorial script as well as the Illumina processing workflow.

Thanks for the help!
Molli


Jose Carlos Clemente

Jun 28, 2012, 4:48:27 PM
to qiime...@googlegroups.com
Hi Molli,

what I think could be happening here is that when you pick OTUs, some
of the reads that passed de-multiplexing (split_libraries_fastq) fail
to hit your reference DB. That would explain the lower number of
sequences you get. You should of course set your rarefaction levels
based on what you have in the OTU table rather than on what you get
after demux.

By the way, if this is Illumina, it looks like you have a very low
number of reads per sample. Are you using MiSeq?

Jose

Newbie

Jun 28, 2012, 4:56:37 PM
to Qiime Forum
Hi Jose,

I see. It must be sequences that did not match the Greengenes database.
I will adjust my curves based on the per_library_stats.py output
instead of the split library log.

Yes, it is Illumina data. It was a very low cluster density run. This
was my first run on our new MiSeq, and I only used 8pM of sample with
40% of 8pM PhiX control and 0.2N NaOH. I am planning to up my pM of
sample load on the next run and reduce the PhiX to 30% to try to
increase my cluster density. I thought this data would be a good test
set for learning to use QIIME on Illumina data. Do you have any advice
for ways to improve cluster density and increase sample yield for
MiSeq runs? I am definitely open to suggestions. :)

Thanks so much for the help here!
Molli


Jose Carlos Clemente

Jun 28, 2012, 5:19:07 PM
to qiime...@googlegroups.com
Molli,

I don't have much experience with MiSeq, maybe others in the forum
have some suggestions about how to increase yield and density. Let us
know if you have other problems with the analysis.

Cheers,
Jose

Newbie

Jun 29, 2012, 3:25:56 PM
to qiime...@googlegroups.com
Hi Jose,

I was thinking about the reduction in sequences per sample that I am getting because a large portion of my reads does not match the Greengenes database. What do people typically do if a high number of their sequences do not match the database, causing a drastic reduction in sequence numbers per sample? From my analysis, a large number of the sequences that do match the database are Acidobacteria, and I am wondering if I am losing a lot of sequences solely because they do not match the database. Should I just try the analysis with a variety of databases? Or is there a way to still include these sequences in the downstream workflow but denote them as sequences that did not hit the database (or the various databases, if needed)?

Thank you again for all of your help with the rarefaction question!
Molli

Jose Carlos Clemente

Jun 29, 2012, 4:04:51 PM
to qiime...@googlegroups.com
Hi Molli,

there are several things you could consider:

- run OTU picking at a lower identity threshold (e.g. 90%) to make
sure your sequences are in fact 16S and not junk.
- if this works and most of the sequences now hit the reference, I would
do OTU picking at 97% with uclust_ref but allow for new clusters.
By doing this, you will capture the sequences that are failing to hit
Greengenes into new clusters.
- alternatively, try using BLAST as your OTU picking method.
- what type of environment do your samples come from? If it's human
gut and you are getting less than 90% of your reads hitting the reference,
something must be wrong with either your sequences or the
pre-processing. If it's soil or some other less well-characterized
environment, then you cannot do much, and probably a high percentage of
your reads won't hit the reference. The solution in this case is, as mentioned
above, uclust_ref with new clusters.

Jose

Newbie

Jun 30, 2012, 12:22:37 PM
to Qiime Forum
Hi Jose,

Yes, these sequences are from a soil environment, in particular
agricultural soil. I will try reducing the identity threshold for OTU
picking, and if that increases what I am getting back, I will switch
to uclust_ref. If I do end up doing this, would I still use
pick_reference_otus_through_table.py or would I use pick_otus.py? I
see where I can specify uclust_ref with pick_otus.py or specify a
different reference database path in
pick_reference_otus_through_table.py, but I only see an option to
allow new clusters with pick_otus.py.

Thank you again,
Molli


Jose Carlos Clemente

Jul 1, 2012, 7:43:09 PM
to qiime...@googlegroups.com
Hi Molli,

if you want to add new clusters, you need to use pick_otus.py, as
pick_reference_otus... would only get you the reference OTUs. The way
I'd do it is first try reducing the identity threshold, and if that
results in a higher % of reads hitting the reference, then go for
ref-based + new-cluster OTU picking.
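A sketch of that two-stage approach, with placeholder paths (the -s/--similarity flag is pick_otus.py's standard option, but double-check defaults in your QIIME version; uclust_ref allows new clusters unless they are explicitly suppressed):

```shell
# 1) Sanity check: reference-based picking at a lower identity (e.g. 90%)
pick_otus.py -i seqs.fna -m uclust_ref -r gg_91_otus.fasta -s 0.90 -o check_90/

# 2) If most reads now hit the reference, repeat at 97% with new clusters allowed
pick_otus.py -i seqs.fna -m uclust_ref -r gg_97_otus.fasta -o ucr_new/
```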

Good luck!
Jose

Newbie

Jul 2, 2012, 6:06:57 PM
to Qiime Forum
Hi Jose,

When lowering the identity threshold, do I change the percent identity
in the parameters file (e.g. pick_otus:similarity 0.90) and also
supply a different .fasta file for the rep set? Initially, I thought I
only needed to reduce the similarity threshold in the parameters file,
but when I looked back at the gg_otus-4feb2011-release files, I see
there are other files that seem to be separated by lower threshold
levels.

Thank you,
Molli


Jose Carlos Clemente

Jul 2, 2012, 11:15:09 PM
to qiime...@googlegroups.com
Hi Molli,

exactly, you need to change the percent identity and also the rep set,
so that the rep set you use matches the identity.
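As a sketch of the two settings that need to agree (paths are placeholders; the pick_otus:similarity key is the one Molli used in her question):

```shell
# Parameters file (e.g. params_91.txt) should contain:
#   pick_otus:similarity 0.91
# and the command line should point at the matching rep set:
pick_reference_otus_through_otu_table.py -i seqs.fna \
  -r gg_otus-4feb2011-release/rep_set/gg_91_otus_4feb2011.fasta \
  -p params_91.txt -o ucr_91/
```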

Jose

Newbie

Jul 3, 2012, 12:26:13 PM
to Qiime Forum
Hi Jose,

I re-ran the pick_reference_otus_through_otu_table.py with the lower
threshold (91%). And, below is the output.

qiime@qiime-VirtualBox:~/Desktop$ rm -rf ucrC ;
pick_reference_otus_through_otu_table.py -i '/home/qiime/Desktop/
Shared_Folder/06272012_MiSeq_Analysis/sl_out/seqs.fna' -o ucrC/ -r '/
home/qiime/qiime_software/gg_otus-4feb2011-release/rep_set/
gg_91_otus_4feb2011.fasta' -t '/home/qiime/qiime_software/
gg_otus-4feb2011-release/taxonomies/greengenes_tax.txt' -p '/home/
qiime/Desktop/Shared_Folder/06272012_MiSeq_Analysis/
07032012_91percent_parameters.txt'

qiime@qiime-VirtualBox:~/Desktop$ per_library_stats.py -i '/home/qiime/
Desktop/ucrC/uclust_ref_picked_otus/otu_table.biom'
Num samples: 30
Num otus: 989
Num observations (sequences): 35780.0

Seqs/sample summary:
Min: 484.0
Max: 2331.0
Median: 1185.5
Mean: 1192.66666667
Std. dev.: 482.015721274
Median Absolute Deviation: 392.5
Default even sampling depth in
core_qiime_analyses.py (just a suggestion): 484.0

Seqs/sample detail:
DNA.2: 484.0
DNA.11: 503.0
DNA.1: 515.0
DNA.6: 518.0
DNA.5: 656.0
DNA.24: 663.0
DNA.14: 785.0
DNA.9: 840.0
DNA.25: 869.0
DNA.28: 899.0
DNA.23: 901.0
DNA.7: 950.0
DNA.18: 959.0
DNA.29: 1140.0
DNA.20: 1167.0
DNA.26: 1204.0
DNA.3: 1206.0
DNA.17: 1245.0
DNA.16: 1288.0
DNA.4: 1319.0
DNA.13: 1445.0
DNA.12: 1570.0
DNA.22: 1660.0
DNA.19: 1687.0
DNA.27: 1701.0
DNA.21: 1748.0
DNA.8: 1837.0
DNA.30: 1840.0
DNA.10: 1850.0
DNA.15: 2331.0

This was an increase in sequence number per sample over what I
previously posted. So, I ran pick_otus.py using uclust_ref and the 97%
gg_otus .fasta file without suppressing new clusters. I ran the output
of this through assign_taxonomy.py and make_otu_table.py. Then, I put
the .biom file through per_library_stats.py and got the following
output:
qiime@qiime-VirtualBox:~/Desktop$ per_library_stats.py -i '/home/qiime/
Desktop/otu_table.biom'
Num samples: 30
Num otus: 28468
Num observations (sequences): 64972.0

Seqs/sample summary:
Min: 833.0
Max: 4189.0
Median: 2129.0
Mean: 2165.73333333
Std. dev.: 874.043398363
Median Absolute Deviation: 718.0
Default even sampling depth in
core_qiime_analyses.py (just a suggestion): 833.0

Seqs/sample detail:
DNA.2: 833.0
DNA.11: 899.0
DNA.6: 900.0
DNA.1: 970.0
DNA.5: 1157.0
DNA.24: 1269.0
DNA.14: 1423.0
DNA.9: 1570.0
DNA.28: 1594.0
DNA.25: 1630.0
DNA.23: 1654.0
DNA.7: 1692.0
DNA.18: 1760.0
DNA.29: 2009.0
DNA.26: 2118.0
DNA.20: 2140.0
DNA.3: 2143.0
DNA.4: 2354.0
DNA.17: 2390.0
DNA.16: 2422.0
DNA.13: 2642.0
DNA.12: 2859.0
DNA.22: 2985.0
DNA.19: 3025.0
DNA.21: 3114.0
DNA.8: 3255.0
DNA.30: 3295.0
DNA.27: 3311.0
DNA.10: 3370.0
DNA.15: 4189.0

It seems like this will solve the problem of losing sequence data when
generating rarefaction plots. Just to make sure I understand: is it
that my sequences that did not match the reference dataset but still
passed the quality filtering steps were still 16S (not junk) sequences,
since they were picked up when I dropped the identity threshold? They
just didn't happen to match anything in the reference dataset I was
initially using to generate clusters and my rep_set?

Also, I took the output taxonomy assignments generated in
assign_taxonomy.py and used them in make_otu_table.py, along with the
output from pick_otus.py, to generate an OTU table and generate a
heatmap. But, I must be doing something wrong, because when I look at
the heatmap, the taxonomy is "None", which comes from the
pick_otus.py output file (seqs_otus.txt). I thought those entries would
be replaced with the taxonomic assignments provided via -t of
make_otu_table.py, but they are not. I checked, and the taxonomic
assignments file does have taxonomic assignments listed in it. Is
there something I am not specifying correctly? Below is the code I am
using:

pick_otus.py -i '/home/qiime/Desktop/Shared_Folder/
06272012_MiSeq_Analysis/sl_out/seqs.fna' -m uclust_ref -r '/home/qiime/
qiime_software/gg_otus-4feb2011-release/rep_set/
gg_97_otus_4feb2011.fasta'

assign_taxonomy.py -i '/home/qiime/Desktop/Shared_Folder/
06272012_MiSeq_Analysis/sl_out/seqs.fna'

make_otu_table.py -i '/home/qiime/Desktop/
uclust_ref_picked_otus_newclusters/seqs_otus.txt' -o otu_table.biom -t
'/home/qiime/Desktop/rdp22_assigned_taxonomy/
seqs_tax_assignments.txt'

make_otu_heatmap_html.py -i '/home/qiime/Desktop/otu_table.biom' -o
OTU_Heatmap

Thank you again,
Molli



Newbie

Jul 5, 2012, 7:22:44 AM
to Qiime Forum
Hi everyone!

I was wondering if anyone could help me with my questions in the
previous post above re: taxonomy assignments not showing up.

Thank you,
Molli


Tony Walters

Jul 5, 2012, 10:53:24 AM
to qiime...@googlegroups.com
Hello Molli,

I think you might be pointing at the wrong directory for the taxonomic assignments to use with your OTU mapping file.

Is there an rdp_assigned_taxonomy folder in this directory: /home/qiime/Desktop/Shared_Folder/
06272012_MiSeq_Analysis/sl_out/ ?

If so, I would rerun this step with the taxonomic assignments from the folder above specified with the -t parameter:
make_otu_table.py -i '/home/qiime/Desktop/
uclust_ref_picked_otus_newclusters/seqs_otus.txt' -o otu_table.biom -t
'/home/qiime/Desktop/Shared_Folder/
06272012_MiSeq_Analysis/sl_out/XXXXX'

-Tony

Jose Carlos Clemente

Jul 5, 2012, 11:01:17 AM
to qiime...@googlegroups.com
Hi Molli,

yes, the sequences that hit your reference DB at lower threshold are
not junk, but true 16S. So it just might be that they belong to taxa
that have not been well characterized yet and thus are missing in GG.

Jose

Newbie

Jul 5, 2012, 1:42:00 PM
to Qiime Forum
Hi Tony and Jose,

Thank you for your replies. Tony, there was no rdp_assigned_taxonomy
folder in the sl_out folder. I only had one taxonomic assignment file.
Jose, I thought that too, but I know that before I was getting some
sequences that matched the GG database so I thought I would at least
see some of those showing up. I am re-running it from start to see if
maybe I just specified something wrong since I was trying to do a lot
of different things on Tuesday when I tried it.

Molli


Newbie

Jul 5, 2012, 2:52:53 PM
to Qiime Forum
Quick question. After I run:
qiime@qiime-VirtualBox:~/Desktop$ pick_otus.py -i '/home/qiime/Desktop/
Shared_Folder/06272012_MiSeq_Analysis/sl_out/seqs.fna' -m uclust_ref -
r '/home/qiime/qiime_software/gg_otus-4feb2011-release/rep_set/
gg_97_otus_4feb2011.fasta' -o uclust_ref_picked_otus/

I get 4 files: seqs_clusters.uc, seqs_failures.txt, seqs_otus.log, and
seqs_otus.txt. Is it correct that I then need to run
assign_taxonomy.py to obtain taxonomic IDs for these clusters, and if
so, what would the input file for this be? I first used my
original .fna file, but I am wondering if this should just be a
smaller representative set of my sequences; I just do not know where
that would come from. And, are there any option flags for
assign_taxonomy.py that I would need to use, given that I am using
uclust_ref without suppressing new clusters?

Thank you again,
Molli

Tony Walters

Jul 5, 2012, 3:02:46 PM
to qiime...@googlegroups.com
Hello Molli,

You want to first run pick_rep_set.py (specifying the seqs_otus.txt file as the OTU mapping file, and the seqs.fna file as the source fasta file).  Then you run assign_taxonomy.py on the rep_set.fna file that is created (and you probably want to specify an output directory during this step, just so you have control over the location of the assignments.txt file used to create the OTU table).
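Those steps, sketched with placeholder paths (the flags follow the scripts' usual usage, and the rep_set_tax_assignments.txt name is an assumption; check the actual file name assign_taxonomy.py writes):

```shell
# 1) Pick one representative sequence per OTU
pick_rep_set.py -i seqs_otus.txt -f seqs.fna -o rep_set.fna

# 2) Assign taxonomy to the representatives, into a known output directory
assign_taxonomy.py -i rep_set.fna -o assigned_taxonomy/

# 3) Build the OTU table from the OTU map plus those assignments
make_otu_table.py -i seqs_otus.txt \
  -t assigned_taxonomy/rep_set_tax_assignments.txt -o otu_table.biom
```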

-Tony

Newbie

Jul 5, 2012, 3:05:00 PM
to Qiime Forum
Hi Tony,

Thank you so much! I knew there was some fundamental portion to this
that I was not doing. :)

Thank you for clearing that up! I am sure that must be where the mix
up was.

I will follow up again once I get it to work.

Thank you!
Molli

peziza

Jul 8, 2012, 12:44:14 AM
to qiime...@googlegroups.com
Hi everyone,
I would like to smooth out the lines in my rarefaction curves. From the output file chao1.txt (in the alpha_div_collated output folder), it appears there are 5 iterations, starting with 10 sequences per sample but then jumping to 104, 198, 292, etc. I would like the number of sequences per sample to be more even, perhaps in steps of 20 (or 50) sequences per sample. I have tried using -n 20 at the end of alpha_rarefaction.py, but that did not help. Thank you, I appreciate any help with this!

Luke Ursell

Jul 8, 2012, 12:52:10 AM
to qiime...@googlegroups.com
Hi there,

The first thing to do is to run single_rarefaction.py on your OTU table at some given maximum number of sequences per sample that you would like to use (you can run per_library_stats.py to figure out what a good number might be, so that you balance dropping low-sequence samples against getting a maximum even sampling). Then, using the alpha_rarefaction.py script, set the max_rare_depth to whatever you just rarefied by, set the min_rare_depth to something you are happy with, and then adjust the -n steps so that it's even; i.e., if your min seqs are 100 and your max seqs are 1000, then maybe you'd like to do 20 steps, so that you bump up 50 seqs/sample each iteration.

Hope this helps,
Luke

Greg Caporaso

Jul 8, 2012, 11:22:58 AM
to qiime...@googlegroups.com
One modification to Luke's response: you don't want to call the
single_rarefaction.py script first; instead just set the max
rarefaction depth with -e to alpha_rarefaction.py, and follow the
additional steps that Luke recommends. The reason is
that if you call single_rarefaction.py first, all rarefied OTU tables
will only draw from that input rarefied table, whereas you really want
them to each draw from the full OTU table.

Luke Ursell

Jul 8, 2012, 9:00:04 PM
to qiime...@googlegroups.com
Yep that sounds a bit better.

Sent from my iPhone

peziza

Jul 8, 2012, 10:10:19 PM
to qiime...@googlegroups.com
Dear Luke and Greg,
Thank you both very much! I was working in an older version of MacQiime that did not support the -e option. I updated it and had to rerun some of the analysis.

Below is what I entered for the alpha_rarefaction.py command; however, the outputs in alpha_div_collated (such as chao1.txt) still do not have steps of 20 sequences per sample. What is the argument for the min_rare_depth? It is not clear from http://qiime.org/scripts/alpha_rarefaction.html what it is supposed to be. Is it not working because a min depth was not provided? Any other suggestions?

MacQIIME macbook:Q $ alpha_rarefaction.py -i otu_table.txt -m Mapfile_corrected.txt -o rare950 -p custom_parameters.txt -e 950  -n 20

Thank you!!! 



Luke Ursell

Jul 8, 2012, 11:03:50 PM
to qiime...@googlegroups.com
Hey there,

I think the problem is that you need to choose your steps so that the difference between your max seqs/sample and your min seqs/sample (set with the min_rare_depth) divides evenly into steps of 20 sequences. For example, running:


alpha_rarefaction.py -i otu_table.txt -m MapFile_corrected.txt -o rare950 -p custom_parameters.txt -e 900 -n 45 --min_rare_depth 20

will start at 20 seqs/sample, then do 45 steps to get to 900 (which means jumping up by 20 seqs/sample every iteration).
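The step arithmetic can be checked quickly (plain shell arithmetic, no QIIME needed; this assumes the step count runs inclusively from the min depth to the max, as in the example above):

```shell
min_depth=20
max_depth=900
step=20
# number of rarefaction depths from min to max inclusive
n=$(( (max_depth - min_depth) / step + 1 ))
echo "$n"   # 45
```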

Hope this helps,
Luke

peziza

Jul 9, 2012, 12:48:40 AM
to qiime...@googlegroups.com
Hi Luke,
That worked! Thanks so much! 

Have you dealt with modifying OTU tables to run rarefaction curves in EstimateS? 

Thanks again!

Luke Ursell

Jul 9, 2012, 12:56:46 AM
to qiime...@googlegroups.com
Hi,

If you convert your otu_table.biom to otu_table.txt with the convert_biom.py script, you can open the otu_table.txt file in Excel. This should allow you to add, in the second row, the information about the number of samples and the number of species (OTUs) that you need for the EstimateS formatting.
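As a sketch (the -b flag for BIOM-to-classic conversion is my assumption for this QIIME version; check convert_biom.py -h):

```shell
# Convert the BIOM table to a tab-delimited classic table that Excel can open
convert_biom.py -i otu_table.biom -o otu_table.txt -b
```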

Best,
Luke

peziza

Jul 9, 2012, 6:50:02 PM
to qiime...@googlegroups.com
Thank you, Luke! EstimateS continues to crash every time I try and use it. Thank you for all your help! 

peziza

Jul 10, 2012, 1:40:48 PM
to qiime...@googlegroups.com
I would like to run rarefaction curves for each of my treatments (by individuals instead of by sites) in EstimateS. The resulting graph would be number of OTUs (y axis) vs. number of sequence reads (x axis). To do this, the OTU table (parsed by treatment) has to be modified to accommodate EstimateS's formats. If I just input an OTU table for a particular treatment (separated out by metadata), the resulting output from EstimateS is rarefied by the number of sites (or replicates) instead of by individuals. Any suggestions?

Luke Ursell

Jul 10, 2012, 1:51:44 PM
to qiime...@googlegroups.com
Hi,

It might be easier to ask the EstimateS people what format you need to get your data in in order to visualize the output you are looking for. Once you know that, then it will be easier for us to figure out how QIIME can produce that input you require.

Best,
Luke

Lâm Nguyễn Văn

Nov 7, 2014, 11:21:21 PM
to qiime...@googlegroups.com
Hi everyone.
I have the same problem as Danilo Ercolini, but on Qiime-1.8.0. I have 3 samples, and I set --max_rare_depth to the maximum read count across samples, but QIIME does not calculate rarefaction at the last read count of each sample.

How can I calculate rarefaction at the last read count in each sample?
For example:
A: 1037 reads
B: 2194 reads
C: 3756 reads
I set --max_rare_depth to 3756 (alpha_rarefaction.py), but it does not calculate the last read counts in samples A (1037) and B (2194).
What should I do?

Barvaz Sini

Nov 8, 2014, 12:29:19 PM
to qiime...@googlegroups.com
Hi,
Could you clarify a bit what exactly the problem is? What do you mean by "it does not calculate the last read counts in samples A (1037) and B (2194)"?
What is the output you wanted, and what is the output you are getting instead?

Also, could you post the exact command you are using?

Cheers,
Amnon


shashank gupta

Jan 8, 2015, 5:30:04 AM
to qiime...@googlegroups.com
Hi Jose,
I am having the same problem.

So how should I interpret the data? Should I write in the report that when you pick OTUs, some of the reads that passed de-multiplexing (split_libraries_fastq) fail to hit the reference DB?

Thanks
Shashank

Jose Carlos Clemente

Jan 9, 2015, 3:33:09 PM
to qiime...@googlegroups.com
Hi,

if you are doing closed reference OTU picking and your coverage per sample is low after picking OTUs but not before, then yes, your reads are failing to hit the reference DB.

Cheers,
Jose
