txt - to .biom conversion from SourceTracker output


Rebekah Henry

Jul 13, 2016, 9:50:35 PM
to Qiime 1 Forum
Hi.

I have the full_results output from SourceTracker and would like to convert these to a biom so I can pass this through summarize_taxa and have a look at the OTUs which were applied for this analysis. (The biom used for analysis was converted using biom convert -b)

I have tried using the following:

biom convert -i /data/rebekah_data/model_ST_Paper/SourceTracker_Dusan/rerun_kew/run1/full_results/sink_predictions_WGB_contributions.txt -o /data/rebekah_data/model_ST_Paper/SourceTracker_Dusan/rerun_kew/run1/full_results/sink_predictions_WGB_contributions.biom --table-type="OTU table" --process-obs-metadata taxonomy

biom add-metadata -i /data/rebekah_data/model_ST_Paper/SourceTracker_Dusan/rerun_kew/run1/full_results/sink_predictions_WGB_contributions.biom -o /data/rebekah_data/model_ST_Paper/SourceTracker_Dusan/rerun_kew/run1/full_results/sink_predictions_WGB_contributions_meta.biom -m /data/rebekah_data/model_ST_Paper/SourceTracker_Dusan/rerun_kew/run1/map.txt

summarize_taxa.py -i /data/rebekah_data/model_ST_Paper/SourceTracker_Dusan/rerun_kew/run1/full_results/sink_predictions_WGB_contributions_sc.biom -o /data/rebekah_data/model_ST_Paper/SourceTracker_Dusan/rerun_kew/run1/full_results/taxa_summaries/WGB/ -L 5,6

However, I keep getting the following error:

  raise ValueError, ("BIOM table does not contain any "
ValueError: BIOM table does not contain any observation metadata (e.g., taxonomy). You can add metadata to it using the 'biom add-metadata' command.

Seeing as I am adding the metadata, I don't understand what's going on.

Has anyone had success turning these files into figures?

Cheers

Justine Debelius

Jul 13, 2016, 10:37:13 PM
to Qiime 1 Forum
Hi Rebekah,

To summarize taxonomy, you need to pass in taxonomic metadata. Currently you're adding sample metadata (-m), but you need to add observation metadata (--observation-metadata-fp). You'll need a file that maps each OTU ID to its taxonomy, which may be in a different location depending on your OTU picking method.
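If it's unclear which kind of metadata a given file holds, a quick check is to compare its first-column IDs against the row and column IDs of the tab-delimited table. The sketch below is only an illustration using the Python standard library: the function names are made up here and are not part of biom or QIIME, and it assumes a classic tab-delimited table whose header line starts with '#'.

```python
import csv
import io


def first_column_ids(handle):
    """Return first-column IDs of a tab-delimited file, skipping blank and '#' lines."""
    ids = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip() or row[0].startswith("#"):
            continue
        ids.append(row[0].strip())
    return ids


def table_axes(handle):
    """Read a classic tab-delimited table and return (observation_ids, sample_ids)."""
    sample_ids, obs_ids = [], []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue
        if row[0].startswith("#"):
            # Assumption: the last '#' line before the data is the header row.
            sample_ids = [cell.strip() for cell in row[1:]]
            continue
        obs_ids.append(row[0].strip())
    return obs_ids, sample_ids


def metadata_kind(metadata_ids, obs_ids, sample_ids):
    """Guess whether a metadata file describes observations (OTUs) or samples."""
    obs_hits = len(set(metadata_ids) & set(obs_ids))
    sample_hits = len(set(metadata_ids) & set(sample_ids))
    if obs_hits == 0 and sample_hits == 0:
        return "neither"
    return "observation" if obs_hits >= sample_hits else "sample"


# Tiny made-up inputs, for illustration only.
table_text = "#OTU ID\tS1\tS2\n367523\t3\t0\n187144\t1\t5\n"
mapping_text = "#SampleID\tEnv\nS1\tsource\nS2\tsink\n"
taxonomy_text = (
    "#OTUID\ttaxonomy\n"
    "367523\tk__Bacteria; p__Bacteroidetes\n"
    "187144\tk__Bacteria; p__Firmicutes\n"
)

obs_ids, sample_ids = table_axes(io.StringIO(table_text))
print(metadata_kind(first_column_ids(io.StringIO(mapping_text)), obs_ids, sample_ids))
print(metadata_kind(first_column_ids(io.StringIO(taxonomy_text)), obs_ids, sample_ids))
```

A result of "neither" would suggest that the metadata file's IDs don't match either axis of the table, in which case nothing useful can be attached.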

Thanks,
Justine

Rebekah Henry

Sep 13, 2016, 7:16:12 PM
to Qiime 1 Forum
Hi,

Have given it another try... this time have applied....

As my biom format according to qiime is 1.2.0 I first passed:

sed 's/Consensus Lineage/ConsensusLineage/' < /data/rebekah_data/Tuflow_ST_Paper/SourceTracker_Dusan/rerun_kew/run1/full_results/sink_predictions_WGB_contributions.txt | sed 's/ConsensusLineage/taxonomy/' > /data/rebekah_data/Tuflow_ST_Paper/SourceTracker_Dusan/rerun_kew/run1/full_results/sink_predictions_WGB_contributions_taxonomy.txt

biom convert -i /data/rebekah_data/Tuflow_ST_Paper/SourceTracker_Dusan/rerun_kew/run1/full_results/sink_predictions_WGB_contributions_taxonomy.txt -o /data/rebekah_data/Tuflow_ST_Paper/SourceTracker_Dusan/rerun_kew/run1/full_results/sink_predictions_WGB_contributions_taxonomy.biom --table-type="OTU table" --process-obs-metadata taxonomy

 biom add-metadata -i /data/rebekah_data/Tuflow_ST_Paper/SourceTracker_Dusan/rerun_kew/run1/full_results/sink_predictions_WGB_contributions_taxonomy.biom -o /data/rebekah_data/Tuflow_ST_Paper/SourceTracker_Dusan/rerun_kew/run1/full_results/sink_predictions_WGB_contributions_taxonomy_omd.biom --observation-metadata-fp /data/rebekah_data/bioms/run1-10/meta2-10_merged_otus.txt --sample-metadata-fp /data/rebekah_data/Tuflow_ST_Paper/SourceTracker_Dusan/all_runs_mapping_dusan_revision2.txt

And this time received this error...

Traceback (most recent call last):
  File "/usr/bin/pyqi", line 174, in <module>
    optparse_main(cmd_obj, argv[1:])
  File "/usr/lib/python2.7/dist-packages/pyqi/core/interfaces/optparse/__init__.py", line 284, in optparse_main
    result = optparse_cmd(local_argv[1:])
  File "/usr/lib/python2.7/dist-packages/pyqi/core/interface.py", line 47, in __call__
    return self._output_handler(self.CmdInstance(**cmd_input))
  File "/usr/lib/python2.7/dist-packages/pyqi/core/command.py", line 131, in __call__
    raise e
biom.exception.BiomParseException: No header line was found in mapping file.

So back to confused again....

Cheers

Embriette

Sep 13, 2016, 7:39:37 PM
to Qiime 1 Forum
Hi Rebekah,

Can you send your mapping file?

Thanks!

Embriette

Rebekah Henry

Sep 13, 2016, 9:31:33 PM
to Qiime 1 Forum
Hi Embriette,

Yep; please find attached.

I've applied this mapping file to other processes without any hitches thus far....

Cheers
all_runs_mapping_dusan_revision2.txt

Rebekah Henry

Sep 13, 2016, 9:37:16 PM
to Qiime 1 Forum
I should also mention that I have tried running biom add-metadata without passing the sample metadata and got the same response.

Will Van Treuren

Sep 13, 2016, 10:37:47 PM
to Qiime 1 Forum
Hi Rebekah, 

I think I'll be able to help better with access to all the files. Can you send me:

/data/rebekah_data/Tuflow_ST_Paper/SourceTracker_Dusan/rerun_kew/run1/full_results/sink_predictions_WGB_contributions_taxonomy.txt
/data/rebekah_data/bioms/run1-10/meta2-10_merged_otus.txt 
/data/rebekah_data/Tuflow_ST_Paper/SourceTracker_Dusan/rerun_kew/run1/full_results/sink_predictions_WGB_contributions.txt

These files will allow me to see why the biom addition isn't working. 

As a side note, I've rewritten SourceTracker to use Python behind the scenes. It's much easier to work with, and we'll be able to help you more easily, as we have a lot more experience with Python than R (the current SourceTracker backend). You can find the new SourceTracker (as well as install instructions) here.

Best,
Will 

Rebekah Henry

Sep 14, 2016, 12:35:26 AM
to Qiime 1 Forum
Hi Will,

With a bit more investigation, I think the issue may be the observation file I am passing is the wrong file (I usually don't play with this data format).

Consequently, I was wondering if you could confirm where the observation metadata information is generated. I can see what it should look like (http://biom-format.org/documentation/adding_metadata.html), but have no idea where that information comes from.

I am really looking forward to giving the python version a go, the fact that it runs parallel certainly made me very excited!

Cheers

Bek

Will Van Treuren

Sep 14, 2016, 12:52:43 AM
to Qiime 1 Forum
Hi Rebekah, 

The observation metadata is traditionally taxonomy information. It will be generated during OTU picking, likely by the script 'assign_taxonomy.py'. You might not have called this script explicitly; the workflow scripts 'pick_open_reference_otus.py' and 'pick_otus.py' will call 'assign_taxonomy.py' after the OTU table has been built. 

The original biom file you fed to SourceTracker may contain the taxonomy information. If it does not, then the biom file went through some conversion before being fed to SourceTracker (or never had taxonomy assigned in the first place). It's going to be very hard to diagnose whether you have the taxonomy information, or where it is located, without knowing the full set of steps you used to get the table.
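One way to check whether a table carries taxonomy is to look at the per-observation metadata directly. The sketch below is a minimal illustration, not a biom command: the function name is hypothetical, and it assumes the table is a JSON-encoded BIOM file (the BIOM 1.0 layout, where each entry in "rows" has an "id" and a "metadata" field). It will not work on an HDF5-encoded table.

```python
import json


def observation_metadata_keys(biom_json_text):
    """Return the sorted metadata keys found on any observation (row) of a JSON BIOM table."""
    table = json.loads(biom_json_text)
    keys = set()
    for row in table.get("rows", []):
        # "metadata" is null (None) when nothing is attached to the observation.
        metadata = row.get("metadata") or {}
        keys.update(metadata)
    return sorted(keys)


# Two made-up, heavily trimmed tables for illustration; real files have more fields.
with_taxonomy = json.dumps({
    "rows": [
        {"id": "367523", "metadata": {"taxonomy": ["k__Bacteria", "p__Bacteroidetes"]}},
        {"id": "187144", "metadata": {"taxonomy": ["k__Bacteria", "p__Firmicutes"]}},
    ],
    "columns": [{"id": "S1", "metadata": None}],
})
without_taxonomy = json.dumps({
    "rows": [{"id": "367523", "metadata": None}],
    "columns": [{"id": "S1", "metadata": None}],
})

print(observation_metadata_keys(with_taxonomy))     # ['taxonomy']
print(observation_metadata_keys(without_taxonomy))  # []
```

An empty list for the table that went into SourceTracker would mean the taxonomy was already missing before the conversion to text.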

Let me know if this explanation helps, or if I can be more specific. 
Best,
Will 

Rebekah Henry

Sep 14, 2016, 1:34:38 AM
to Qiime 1 Forum
Hi Will,

So we have been applying closed OTU data so I used the greengenes taxonomy file and set it up as an observation metadata file (except it does not have a confidence column).

It looks like this:

#OTUID taxonomy
367523 k__Bacteria; p__Bacteroidetes; c__Flavobacteriia; o__Flavobacteriales; f__Flavobacteriaceae; g__Flavobacterium; s__
187144 k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__
836974 k__Bacteria; p__Cyanobacteria; c__Chloroplast; o__Cercozoa; f__; g__; s__
310669 k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__
823916 k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pseudomonadales; f__Moraxellaceae; g__Enhydrobacter; s__

I am running add-metadata now and not getting the error - suggesting the metadata has attached. 

However when I come to run summarize taxa, it is giving me this error...

Traceback (most recent call last):
  File "/usr/lib/qiime/bin//summarize_taxa.py", line 204, in <module>
    main()
  File "/usr/lib/qiime/bin//summarize_taxa.py", line 191, in main
    md_identifier)
  File "/usr/lib/python2.7/dist-packages/qiime/summarize_taxa.py", line 42, in make_summary
    md_identifier)
  File "/usr/lib/python2.7/dist-packages/qiime/summarize_taxa.py", line 72, in sum_counts_by_consensus
    raise ValueError, ("BIOM table does not contain any "
ValueError: BIOM table does not contain any observation metadata (e.g., taxonomy). You can add metadata to it using the 'biom add-metadata' command.

Suggesting nothing has been attached...

So sorry to continue to bother.

Bek

Embriette

Sep 14, 2016, 11:51:32 AM
to Qiime 1 Forum
Hi Rebekah,

Can you please send all of the commands that you used, from creating your initial biom table up to now? As Will said, normally taxonomy is added to your biom table during biom creation, but if you used individual commands and skipped assign_taxonomy instead of using the workflow command, that could explain why you don't have taxonomy in the file. It's hard to know for sure what happened without knowing exactly what you did, so having a list of the commands that you ran to create your OTU table (as well as any downstream adjustments made to your OTU table) will help.

Thanks!

Embriette

Rebekah Henry

Sep 14, 2016, 7:25:32 PM
to Qiime 1 Forum
Hi Embriette,

I will have to chase those down (sorry, it's been a while; this is for a paper revision).

However, I am pretty certain assign taxonomy was run, as prior to running SourceTracker I could happily run summarize_taxa.py and plenty of other scripts that required that metadata on the biom.

The problem seems to be that I have converted the biom to a txt file for sourcetracker; sourcetracker has outputted the full results for each source in a .txt version and now for some reason I can't re-attach the metadata. Should I have run something like....

biom convert -i table.biom -o table.from_biom_w_consensuslineage.txt --to-tsv --header-key taxonomy --output-metadata-id "ConsensusLineage"

when converting the table, prior to running sourcetracker?

I know when running biom convert on the .txt file I don't specify --hdf5 or --json types, as this brings up an error; I'm assuming it's a version thing that shouldn't have an impact on attaching the metadata?

Cheers

Will Van Treuren

Sep 14, 2016, 7:39:35 PM
to Qiime 1 Forum
Hi Rebekah,

Without the actual files Embriette and I are not going to be able to diagnose what is going on. There are too many errors that creep in at various steps. 

If you provide the files we can help. If you are uncomfortable providing the full files, you can provide the first couple of lines, and we may be able to diagnose the error from there. 

Best,
Will 

Rebekah Henry

Sep 14, 2016, 7:43:28 PM
to Qiime 1 Forum
Hi Will,

No issue. I do have to track down the scripts as we do this in collaboration so that part is not conducted by myself. 

Also the files are very large and won't attach - so I will attach what I can and provide the heads for the others.

Cheers

Rebekah Henry

Sep 15, 2016, 1:52:56 AM
to Qiime 1 Forum
Okey doke so here is what is happening.....

So the pipeline script that is being run is:

pick_closed_reference_otus.py -i meta2-10_merged.fna -o META2-10_MERGED_CLOSED -r 97_otus.fasta -t 97_otu_taxonomy.txt -a -O 3

Which has been generating a biom file (head file attached). The head of the -t input file (97_otu_taxonomy.txt) is also attached.

The biom has then been converted...

biom convert -i /data/rebekah_data/bioms/run1-10/otu_table_1_10.biom -o /data/rebekah_data/Tuflow_ST_Paper/biom_conversions/otu_table_1_10_.txt -b

Then this applied to sourcetracker....

nohup R --slave --vanilla --args -i /data/rebekah_data/Tuflow_ST_Paper/biom_conversions/run_1_to_10.txt -o /data/rebekah_data/Tuflow_ST_Paper/biom_conversions/ST_rerun/con1/ -m /data/rebekah_data/Tuflow_ST_Paper/SourceTracker_Dusan/all_runs_mapping_dusan1.txt < $SOURCETRACKER_PATH/sourcetracker_for_qiime.r


The full result files are then being converted to bioms....(these are scripts I have used today)


biom convert -i /data/rebekah_data/Tuflow_ST_Paper/biom_conversions/ST_rerun/con1/full_results/sink_predictions_Gardiners_contributions.txt -o /data/rebekah_data/Tuflow_ST_Paper/biom_conversions/ST_rerun/con1/full_results/sink_predictions_Gardiners_contributions.biom --table-type="OTU table" --process-obs-metadata taxonomy


biom add-metadata -i /data/rebekah_data/Tuflow_ST_Paper/biom_conversions/ST_rerun/con1/full_results/sink_predictions_Gardiners_contributions.biom -o /data/rebekah_data/Tuflow_ST_Paper/biom_conversions/ST_rerun/con1/full_results/sink_predictions_Gardiners_contributions_wmd.biom --observation-metadata-fp /data/rebekah_data/Tuflow_ST_Paper/OTU_data/observation_metadata.txt


(I have attached the head of the observation metadata file and the full result file from sourcetracker).


When I then run summarize taxa I get the error that is outlined in the above messages - stating that there is no metadata.


From looking at the biom head data - it does indicate that the metadata is present (from what I can tell) - but getting it to do the reverse after use with sourcetracker is not happening....


Any hints??

files.zip

Embriette

Sep 15, 2016, 12:18:12 PM
to Qiime 1 Forum
Hi Rebekah,

Thanks for including your commands and the attachments. There are a few things going on here.

First, there is a difference in names between the otu table resulting from biom convert (otu_table_1_10_.txt), the one input into SourceTracker (run_1_to_10.txt) and the file you attached the head for (otu_1_10.txt) so I'm still not sure what the input files are and if we're looking at the right things.

Nevertheless, the major issue is that the file you are trying to convert into .biom format is not an OTU table. It appears to be the sink predictions output file resulting from SourceTracker. What this file tells you is what proportion of each of your sink samples comes from each of your source samples. It looks like in your file you have each individual sample in your dataset as a sink, and OTUs as sources (you didn't attach the mapping file you input to SourceTracker that indicates sinks and sources so I'm not 100% sure). I've never seen SourceTracker used this way before (usually some samples are labelled as sinks and the rest are labelled as sources) and I'm not sure how the OTU IDs were added as sources here (without the mapping file input to SourceTracker, it's difficult to track). Perhaps Will has some more insight.

What MIGHT work, although I'm not sure, is changing the format of your sink predictions file so that it is in OTU table format. You'd need to transpose the file so that the OTU IDs are in rows, not columns, and sample IDs are in columns, not rows. Then, you might be able to add the metadata. If that doesn't work, you'll likely have to do some custom scripting. I did note that the proportions in the file don't add up to 1, so either some samples are missing or downstream changes were made to the file, so be sure to double check that.
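A rough sketch of that transpose step, using only the Python standard library, might look like the following. The function name is made up for illustration, and it assumes a tab-delimited file with sample IDs in the first column and OTU IDs in the header row; the exact layout of the SourceTracker output may differ, so check the result by eye before converting it.

```python
import csv
import io


def transpose_to_otu_table(src, dst):
    """Rewrite a samples-by-OTUs tab-delimited table as OTUs-by-samples."""
    rows = [row for row in csv.reader(src, delimiter="\t") if row]
    if len(rows) < 2:
        raise ValueError("expected a header row and at least one data row")
    header, body = rows[0], rows[1:]
    # Assumption: the header may or may not carry a label above the ID column.
    otu_ids = header[1:] if len(header) == len(body[0]) else header
    sample_ids = [row[0] for row in body]

    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(["#OTU ID"] + sample_ids)
    for col, otu_id in enumerate(otu_ids, start=1):
        # Column `col` of every sample row becomes one OTU row.
        writer.writerow([otu_id] + [row[col] for row in body])


# Made-up two-sample, two-OTU input for illustration.
source_text = "SampleID\t367523\t187144\nS1\t0.25\t0.0\nS2\t0.5\t0.125\n"
out = io.StringIO()
transpose_to_otu_table(io.StringIO(source_text), out)
print(out.getvalue())
```

The transposed file has OTU IDs in the first column, which is the axis that observation metadata such as taxonomy is matched against.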

Best,

Embriette

Rebekah Henry

Sep 15, 2016, 8:27:35 PM
to Qiime 1 Forum
Thanks Embriette,

No, nothing was done to the file; this is just the way it is output by SourceTracker. The mapping file (attached) is just a normal mapping file.

Maybe Will can shed some light on how these can be used to generate a usable figure.

Cheers
all_runs_mapping_dusan_revision2.txt