Problems Generating BIOM Table for Downstream Analyses

zabe...@ucdavis.edu

Aug 17, 2017, 3:13:45 PM
to Qiime 1 Forum
Hello,

I generated a .txt version of my OTU table so I could apply a batch correction procedure, and I now want to convert this table back to .BIOM for analysis in QIIME. However, I cannot get the biom convert script to work the way I expect it to (code below). Am I running the script correctly? Also, the batch correction procedure converts the OTU counts to non-integer values (i.e., it adds decimals). Would this prevent the biom convert script from reading them correctly? Any help would be greatly appreciated. I've also attached my OTU table to this thread.

#this produces an OTU table without error
biom convert -i ./LSU_otu_table_sorted_filtered_0.00005_rare12290_batch_corrected.txt -o ./LSU_otu_table_sorted_filtered_0.00005_rare12290_batch_corrected_hdf5.biom --table-type="OTU table" --process-obs-metadata "taxonomy" --to-hdf5

#this also runs without error, but each taxonomic level in the output is empty and the summary files all say "nan"
summarize_taxa_through_plots.py -i ./LSU_otu_table_sorted_filtered_0.00005_rare12290_batch_corrected_hdf5.biom -o ./taxa_summary/batch_corrected/ -f

LSU_otu_table_sorted_filtered_0.00005_rare12290_batch_corrected.txt
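On the question about decimals, it can help to tally what the count cells actually contain before running biom convert. The sketch below uses a hypothetical helper, classify_counts, which is not part of biom or QIIME. It assumes the classic QIIME 1 tab-delimited layout with a "#OTU ID" header line and an optional trailing "taxonomy" column, and it treats "NA", "NaN" (any case) and empty cells as missing; those spellings are an assumption.

```python
import math

MISSING = {"NA", "NAN", ""}  # assumed spellings of a missing value


def classify_counts(lines):
    """Tally integer, fractional and missing count cells in a classic
    tab-delimited QIIME 1 OTU table (hypothetical helper)."""
    summary = {"integer": 0, "fractional": 0, "missing": 0}
    has_taxonomy = False
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#OTU ID"):
            # A trailing "taxonomy" column holds strings, not counts.
            has_taxonomy = fields[-1].strip() == "taxonomy"
            continue
        if line.startswith("#") or not line.strip():
            continue  # other comment lines and blank lines
        counts = fields[1:-1] if has_taxonomy else fields[1:]
        for cell in counts:
            if cell.strip().upper() in MISSING:
                summary["missing"] += 1
                continue
            value = float(cell)
            if math.isnan(value):  # e.g. "-nan", which float() also accepts
                summary["missing"] += 1
            elif value.is_integer():
                summary["integer"] += 1
            else:
                summary["fractional"] += 1
    return summary


table = [
    "#OTU ID\tS1\tS2\ttaxonomy",
    "1106614\tNaN\t3.5\tk__Fungi",
    "42\t10\t0\tk__Fungi",
]
print(classify_counts(table))  # {'integer': 2, 'fractional': 1, 'missing': 1}
```

The same function accepts an open file object, since it only iterates over lines.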

Jose Antonio Navas Molina

Aug 18, 2017, 11:37:39 AM
to Qiime 1 Forum
Hello,

I've been trying to run your biom command on the given table, and it fails with the following error: "ValueError: column index exceeds matrix dimensions"

Can you double-check that the attached table is the same as the one you passed to the command? When I open the table, the last column is missing the "taxonomy" header, which may be why you are not getting taxonomies in your HDF5 BIOM table.

After running biom convert on the table, run "biom summarize-table" on the resulting table; it should tell you whether taxonomy has been added.
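A rough pre-check of the .txt file can also be done before calling biom convert. The hypothetical check_header function below (not a biom or QIIME function) reports whether the header ends with a "taxonomy" column and whether any data row has a different number of fields than the header. It assumes the classic tab-delimited layout with a "#OTU ID" header line; a missing taxonomy header shows up as both problems at once.

```python
def check_header(lines):
    """Return a list of problems found in a classic tab-delimited OTU
    table: a missing 'taxonomy' header, or rows whose field count differs
    from the header's (hypothetical helper)."""
    problems = []
    header = None
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#OTU ID"):
            header = fields
            if fields[-1].strip() != "taxonomy":
                problems.append("header does not end with a 'taxonomy' column")
            continue
        if line.startswith("#") or not line.strip():
            continue  # other comment lines and blank lines
        if header is None:
            problems.append("no '#OTU ID' header line found before data")
            break
        if len(fields) != len(header):
            problems.append(
                "line %d has %d fields, header has %d"
                % (lineno, len(fields), len(header))
            )
    return problems


table = [
    "#OTU ID\tS1\tS2",  # note: no "taxonomy" header
    "42\t10\t0\tk__Fungi",
]
for problem in check_header(table):
    print(problem)
# header does not end with a 'taxonomy' column
# line 2 has 4 fields, header has 3
```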

Hope this helps!

zabe...@ucdavis.edu

Aug 18, 2017, 3:07:19 PM
to Qiime 1 Forum
Hi Jose,

Thanks for pointing that out.  I've attached the same file with the taxonomy header.  I just confirmed that this file produces the issues I described above, so my problem was unrelated to that.  My issue is that the taxonomies present in the .txt file appear in the .BIOM file, but all of the sequence counts are NaN.

I noticed that, following batch correction, the counts for one of my OTUs (#1106614) were replaced with 'NaN'. I'm wondering if this is causing the issue, because the biom convert script expects numerical values.
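To see which OTUs picked up missing values after batch correction, something like the hypothetical otus_with_missing helper below could be run on the batch-corrected .txt table. It is only a sketch: it assumes the classic tab-delimited layout with a "#OTU ID" header and an optional trailing "taxonomy" column, and treats "NA", "NaN" (any case) and empty cells as missing.

```python
MISSING = {"NA", "NAN", ""}  # assumed spellings of a missing value


def otus_with_missing(lines):
    """Return the IDs of OTUs whose rows contain at least one missing
    count (hypothetical helper for a classic tab-delimited OTU table)."""
    flagged = []
    has_taxonomy = False
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#OTU ID"):
            has_taxonomy = fields[-1].strip() == "taxonomy"
            continue
        if line.startswith("#") or not line.strip():
            continue  # other comment lines and blank lines
        counts = fields[1:-1] if has_taxonomy else fields[1:]
        if any(cell.strip().upper() in MISSING for cell in counts):
            flagged.append(fields[0])  # first column is the OTU ID
    return flagged


table = [
    "#OTU ID\tS1\tS2\ttaxonomy",
    "1106614\tNaN\tNaN\tk__Fungi",
    "42\t10.2\t0\tk__Fungi",
]
print(otus_with_missing(table))  # ['1106614']
```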

Running biom summarize-table on the BIOM table gives me this error:
>biom summarize-table -i ./LSU_otu_table_sorted_filtered_0.00005_rare12290_batch_corrected.biom -o ./SUMMARY_batch_corrected_otu.txt

Traceback (most recent call last):
  File "/usr/local/bin/pyqi", line 184, in <module>
    optparse_main(cmd_obj, argv[1:])
  File "/usr/local/lib/python2.7/dist-packages/pyqi/core/interfaces/optparse/__init__.py", line 275, in optparse_main
    result = optparse_cmd(local_argv[1:])
  File "/usr/local/lib/python2.7/dist-packages/pyqi/core/interface.py", line 39, in __call__
    cmd_result = self.CmdInstance(**cmd_input)
  File "/usr/local/lib/python2.7/dist-packages/pyqi/core/command.py", line 137, in __call__
    result = self.run(**kwargs)
  File "/usr/local/lib/python2.7/dist-packages/biom/commands/table_summarizer.py", line 119, in run
    lines.append('Total count: %d' % total_count)
TypeError: %d format: a number is required, not float

zabe...@ucdavis.edu

Aug 18, 2017, 3:16:59 PM
to Qiime 1 Forum
I went ahead and converted the NaN values to 0 and re-ran the scripts, and everything worked as expected. So, a word of warning to everyone using ComBat for batch correction: check those output files for missing values!
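Replacing the missing values with 0 can be scripted rather than done by hand. The generator below is a hypothetical sketch (zero_missing is not a QIIME or biom function), assuming the classic tab-delimited layout and treating "NA", "NaN" (any case) and empty count cells as missing. The header, comment lines and the taxonomy column pass through unchanged.

```python
MISSING = {"NA", "NAN", ""}  # assumed spellings of a missing value


def zero_missing(lines):
    """Yield the lines of a classic tab-delimited OTU table with missing
    count cells replaced by "0" (hypothetical helper). Header, comment
    lines and the taxonomy column are passed through unchanged."""
    has_taxonomy = False
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith("#OTU ID"):
            has_taxonomy = text.split("\t")[-1].strip() == "taxonomy"
            yield text
            continue
        if text.startswith("#") or not text.strip():
            yield text  # other comment lines and blank lines
            continue
        fields = text.split("\t")
        last = len(fields) - 1 if has_taxonomy else len(fields)
        for i in range(1, last):  # skip the OTU ID (and taxonomy, if present)
            if fields[i].strip().upper() in MISSING:
                fields[i] = "0"
        yield "\t".join(fields)


table = [
    "#OTU ID\tS1\tS2\ttaxonomy",
    "1106614\tNaN\t3.5\tk__Fungi",
]
for row in zero_missing(table):
    print(row)
```

Because zero_missing accepts any iterable of lines, an open file object can be passed directly, writing each yielded line plus "\n" to a new .txt file before running biom convert. As pointed out later in the thread, though, NaN is not necessarily the same as 0, so removing the affected OTU may be the safer choice.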

Colin Brislawn

Aug 18, 2017, 3:22:01 PM
to Qiime 1 Forum
I'm glad this is working for you!

Thanks for sharing what you found. Your post will really help other qiime users solve this problem in the future.

Also... does the .biom format support an NA value? In many omics data types, NA != 0, so perhaps supporting an NA value would be helpful. Just a thought!

Colin

Jose Antonio Navas Molina

Aug 18, 2017, 4:06:31 PM
to Qiime 1 Forum
Glad you found the issue! I would check the ComBat documentation or ask its developers what exactly NaN means in its output. As Colin points out, typically NaN != 0.

Colin, if you're interested in that feature, feel free to open an issue on the biom-format issue tracker to get the discussion rolling!

zabe...@ucdavis.edu

Aug 18, 2017, 4:17:05 PM
to Qiime 1 Forum
I went back to OTU 1106614, which gave the NaN values after batch correction. In my rarefied OTU table, 1106614 has counts in only one batch of samples and 0 counts in the other batch. My assumption is that ComBat didn't know how to handle this situation, which is why it gave missing values. Since this OTU is batch-specific and couldn't be corrected for in ComBat, I decided to just remove it for downstream analyses.
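Batch-specific OTUs like this one could be flagged in the rarefied table before running ComBat. The sketch below assumes a classic tab-delimited table and a hypothetical sample_to_batch dictionary mapping each sample ID in the header to its batch label (built by hand or from a mapping file, for example); batch_specific_otus is not a QIIME function. It flags OTUs whose total count is 0 in at least one batch but non-zero in another.

```python
def batch_specific_otus(lines, sample_to_batch):
    """Return IDs of OTUs whose total count is 0 in at least one batch but
    non-zero in another (hypothetical helper). `sample_to_batch` maps each
    sample ID in the table header to a batch label."""
    samples = []
    has_taxonomy = False
    flagged = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#OTU ID"):
            has_taxonomy = fields[-1].strip() == "taxonomy"
            samples = fields[1:-1] if has_taxonomy else fields[1:]
            continue
        if line.startswith("#") or not line.strip():
            continue  # other comment lines and blank lines
        counts = fields[1:-1] if has_taxonomy else fields[1:]
        totals = {}  # batch label -> summed count for this OTU
        for sample, cell in zip(samples, counts):
            batch = sample_to_batch[sample]
            totals[batch] = totals.get(batch, 0.0) + float(cell)
        if totals and min(totals.values()) == 0 and max(totals.values()) > 0:
            flagged.append(fields[0])
    return flagged


table = [
    "#OTU ID\tA1\tA2\tB1\tB2\ttaxonomy",
    "1106614\t5\t2\t0\t0\tk__Fungi",
    "42\t10\t0\t3\t1\tk__Fungi",
]
batches = {"A1": "run1", "A2": "run1", "B1": "run2", "B2": "run2"}
print(batch_specific_otus(table, batches))  # ['1106614']
```

Flagging on the pre-correction counts, rather than on ComBat's output, follows the reasoning in the post above: the OTU is only a problem because it is absent from an entire batch.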