Compare samples from different biom files

p.se...@wisplinghoff.de

May 10, 2017, 9:50:13 AM
to Qiime 1 Forum
Hello everybody,

I am puzzled by the results I get at the end of my analysis, so I hope to get some feedback from the community :)
I followed the tutorial at http://qiime.org/tutorials/tutorial.html to create a plot with stacked bar charts for gut microbiome analysis. This lets me compare the 6 barcoded samples in my fastq file with each other. The commands I used were:

#step 1
validate_mapping_file.py -m map1.txt -o validate_map
#step 2
convert_fastaqual_fastq.py -c fastq_to_fastaqual -f file.fastq -o fastaqual
#step 3
split_libraries.py -m map1.txt -f fastaqual/file.fna -q fastaqual/file.qual -o split_library_out/ -b 13 -l 140 -z truncate_only
#step 4
pick_de_novo_otus.py -i split_library_out/seqs.fna -o otus/
#step 5
assign_taxonomy.py -i otus/rep_set/seqs_rep_set.fasta -m rdp -o rdp_assigned_taxonomy
#step 6
make_otu_table.py -i otus/uclust_picked_otus/seqs_otus.txt -t rdp_assigned_taxonomy/seqs_rep_set_tax_assignments.txt -o L7_otu_table.biom
#step 7
summarize_taxa.py -i L7_otu_table.biom -o L7_taxonomy_summary/ -L 7
#step 8
plot_taxa_summary.py -i L7_taxonomy_summary/L7_otu_table_L7.txt -o L7_taxonomy_plot/

I used the same procedure to process a second fastq file containing 6 other samples. Up to this point, the pipeline works well.

My problem is that I would now like to compare 2 samples from different biom files with each other in a stacked bar chart. I googled around a bit, and these commands looked good to me:

# Split the biom file belonging to file.fastq (created at step 6 above) by SampleID
split_otu_table.py -i L7_otu_table.biom -m map1.txt -f SampleID -o split_by_sample
# The same for file2.fastq
split_otu_table.py -i L7_otu_table.biom -m map2.txt -f SampleID -o split_by_sample
# Merge the 2 biom files containing the samples I want to compare
merge_otu_tables.py -i otu_table.file.fastq.biom,otu_table.file2.fastq.biom -o merged_otu_table.biom
# Now I follow the above procedure starting at step 7 to create a bar chart with the new biom file
summarize_taxa.py -i merged_otu_table.biom -o new_taxonomy_summary/ -L 7
plot_taxa_summary.py -i new_taxonomy_summary/merged_otu_table_L7.txt -o new_taxonomy_plot/

I end up with a new plot containing 2 bar charts, but I am surprised that the percentages in one of the two charts are completely different from those in its initial version. I would expect both bar charts to show the same percentages as their initial versions.

Am I doing something wrong, or am I missing a step here? I really can't imagine why the values change in my new plot.
Any advice is really appreciated :)

Patrick

Colin Brislawn

May 10, 2017, 1:05:59 PM
to Qiime 1 Forum
Hello Patrick,

Comparing different .biom files could be easy or could be nearly impossible, depending on how you made the OTUs within those files.

If you used closed-ref OTU picking, then all your OTU IDs will match and you can safely combine OTU tables using the merge_otu_tables.py command. If you used open-ref or de novo OTU picking methods, then different OTUs in the two tables will have the exact same IDs, and your merged table will be meaningless.
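
To make the ID clash concrete, here is a toy Python sketch. Everything in it is made up for illustration: the counts, the genus names, the "denovo0"-style IDs, and the plain dicts standing in for BIOM tables. In particular, letting the first table's taxonomy win on a clash is an arbitrary choice of this toy, not a description of how merge_otu_tables.py resolves it.

```python
from collections import Counter

# Toy data, not real QIIME output. Each de novo run numbers its clusters
# independently, so the same ID can refer to different organisms per run.
run1 = {
    "taxonomy": {"denovo0": "Bacteroides", "denovo1": "Prevotella"},
    "counts": {"sampleA": {"denovo0": 80, "denovo1": 20}},
}
run2 = {
    "taxonomy": {"denovo0": "Escherichia", "denovo1": "Bacteroides"},
    "counts": {"sampleB": {"denovo0": 30, "denovo1": 70}},
}


def genus_percentages(counts, taxonomy):
    """Collapse one sample's OTU counts to genus-level percentages."""
    totals = Counter()
    for otu_id, n in counts.items():
        totals[taxonomy[otu_id]] += n
    depth = sum(totals.values())
    return {genus: 100.0 * n / depth for genus, n in totals.items()}


def naive_merge(first, second):
    """Merge two toy tables by OTU ID alone.

    Assumption of this toy: on a taxonomy clash, the first table's label wins.
    """
    taxonomy = dict(second["taxonomy"])
    taxonomy.update(first["taxonomy"])
    counts = dict(first["counts"])
    counts.update(second["counts"])
    return {"taxonomy": taxonomy, "counts": counts}


before = genus_percentages(run2["counts"]["sampleB"], run2["taxonomy"])
merged = naive_merge(run1, run2)
after = genus_percentages(merged["counts"]["sampleB"], merged["taxonomy"])

print("sampleB on its own:", before)
print("sampleB after merge:", after)
```

In this toy, sampleA keeps its percentages while sampleB's counts get relabelled with the other run's taxonomy, which would look like one bar chart staying the same and the other changing completely.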

If you want to compare samples that were processed with open-ref or de novo methods, the only easy way to do this is to combine your seqs.fna files and then run OTU picking on them together.

Does that help answer your question?
Colin

Greg Caporaso

May 10, 2017, 3:56:42 PM
to Qiime 1 Forum
Hi Patrick,
Colin is correct - you need to first combine your sequence files if you're going to run pick_de_novo_otus.py. You can't merge tables resulting from two different runs of pick_de_novo_otus.py, as the OTU identifiers are not consistent across the two runs. I think what's happening is that when you merge your OTU tables, the taxonomy associated with each OTU gets mixed up because of this. merge_otu_tables.py has a warning about this in its help text:

$ merge_otu_tables.py --help
...

Requirements: It is also very important that your OTUs are consistent across the different OTU tables. For example, you cannot safely merge OTU tables from two independent de novo OTU picking runs. Finally, either all or none of the OTU tables can contain taxonomic information: you can't merge some OTU tables with taxonomic data and some without taxonomic data.

Once you combine your sequence files and re-run pick_de_novo_otus.py, your workflow will be a lot easier since all of your samples will already be in one BIOM file. If you want to split the data into two BIOM files for separate analyses, you can run split_otu_table.py. If you want to generate a taxonomy barplot for just a couple of samples, you could use filter_samples_from_otu_table.py to create a table that has only the two samples you want, and then pass that to summarize_taxa_through_plots.py.
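
As a rough illustration of why filtering a combined table should give the percentages Patrick expected: relative abundances are computed per sample, so dropping other samples leaves each remaining sample's percentages untouched. The sketch below uses made-up names and counts, with plain Python dicts standing in for a BIOM table; it is not how the QIIME scripts are implemented.

```python
# Toy table from a single combined OTU-picking run: sample -> {taxon: count}.
# All names and counts are made up for illustration.
combined = {
    "sampleA": {"Bacteroides": 80, "Prevotella": 20},
    "sampleB": {"Bacteroides": 30, "Escherichia": 70},
    "sampleC": {"Prevotella": 50, "Escherichia": 50},
}


def relative_abundance(table):
    """Convert each sample's counts to percentages of that sample's total."""
    result = {}
    for sample, counts in table.items():
        depth = sum(counts.values())
        result[sample] = {taxon: 100.0 * n / depth for taxon, n in counts.items()}
    return result


def keep_samples(table, sample_ids):
    """Keep only the requested samples; their counts are left untouched."""
    return {sample: dict(table[sample]) for sample in sample_ids}


full = relative_abundance(combined)
subset = relative_abundance(keep_samples(combined, ["sampleA", "sampleB"]))

# Each kept sample has the same percentages as in the full table.
for sample in subset:
    print(sample, subset[sample] == full[sample])
```

Because all samples here come from one OTU-picking run, every taxon label means the same thing in every sample, so subsetting is safe in a way that merging two independent de novo runs is not.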

Hope this helps! 
Greg

Patrick

May 11, 2017, 4:46:58 AM
to Qiime 1 Forum
Hi Colin and Greg,

Thank you very much for your help! Now I get the expected results.

I had actually already read the merge_otu_tables.py help text, but I missed this important paragraph...

Have a nice day!

Patrick

Greg Caporaso

May 11, 2017, 1:11:44 PM
to Qiime 1 Forum
Great, glad it's working for you. Thanks for following up! 

Best,
Greg