Sourcetracker - convert between output abundance and count number

Drea

Aug 25, 2016, 11:23:43 AM
to Qiime 1 Forum
Greetings Qiimers,

A question I just want to check my logic on:

Our group is attempting to use SourceTracker to detect the impact of contamination on samples, and to remove from the data set any reads whose source is designated as either reagents or lab environment. As such, the 'unknown' group is the one that we've typically been using as a starting point for correcting our raw data. I'm wondering whether it's possible to convert the relative abundance output from SourceTracker back into read counts, based on the raw reads detected in our original sample. As an example, let's say we have a sample in which 80% of the reads come back as unknown. Would one multiply the original number of sequences in the sample (let's say 10000) by 0.8, then use the result (8000 seqs) as the total number of sequences in the sample, by which each OTU's relative abundance can be multiplied to get an estimate of the 'corrected number of sequences' per OTU? This makes sense to me logically, but I'm not sure that it works with regard to how the calculations are performed in SourceTracker. Thoughts?
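For concreteness, the arithmetic described above could be sketched in Python as follows. Only the 10000 reads and the 80% unknown proportion come from the example; the OTU names and relative abundances are made up for illustration. Note that this approach assumes the unknown proportion applies equally to every OTU, which is exactly the part that may not match how SourceTracker assigns reads.

```python
# Numbers from the example above.
total_reads = 10000
unknown_fraction = 0.8

# Hypothetical per-OTU relative abundances (must sum to 1); not real data.
otu_rel_abundance = {"otu1": 0.5, "otu2": 0.3, "otu3": 0.2}

# Reads attributed to the 'unknown' source for the whole sample.
unknown_reads = total_reads * unknown_fraction

# Scale each OTU's relative abundance by the unknown read total.
# This assumes every OTU has the same unknown proportion (a simplification).
corrected_counts = {
    otu: round(frac * unknown_reads)
    for otu, frac in otu_rel_abundance.items()
}

for otu, count in corrected_counts.items():
    print(otu, count)
```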

Thanks!

Jose Antonio Navas Molina

Aug 25, 2016, 8:49:27 PM
to Qiime 1 Forum
Hi Drea,

I've contacted the main developer of SourceTracker; he will get back to you ASAP.

Thanks,

Will Van Treuren

Aug 26, 2016, 2:21:53 PM
to Qiime 1 Forum
Hi Drea,

Rather than convert the relative abundances, you can actually get SourceTracker to give you the 'full output' for a sample. The full output is a contingency table (sources X OTUs) which shows the source origin for each feature in each sink. For instance, if you had 2 sources and 5 OTUs, the full output might look something like this (for a given sink).

         otu1  otu2  otu3  otu4  otu5
source1    10     1     0     0     4
source2    24   100   100     2     0
unknown     7     2     2     1     1

This means that source1 contributed 10 counts of otu1 to the sink, 1 count of otu2, etc. For your specific use, you could take the unknown environment and perhaps remove OTUs which were predominantly coming from the unknown but not the other sources.
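As a minimal sketch of how such a table could be summarized per OTU, the example counts can be typed into Python by hand. Treating source1 as the contaminant source (for example, reagents) is purely an assumption for illustration, as are the function and variable names; this does not read any actual SourceTracker output file.

```python
# Example table from above, typed in by hand (sources x OTUs).
otus = ["otu1", "otu2", "otu3", "otu4", "otu5"]
table = {
    "source1": [10, 1, 0, 0, 4],
    "source2": [24, 100, 100, 2, 0],
    "unknown": [7, 2, 2, 1, 1],
}

# Assumption for illustration: source1 is the contaminant source.
contaminant_sources = ["source1"]


def per_otu_summary(table, otus, contaminants):
    """Return total, kept (non-contaminant) counts and unknown fraction per OTU."""
    summary = {}
    for i, otu in enumerate(otus):
        total = sum(row[i] for row in table.values())
        removed = sum(table[source][i] for source in contaminants)
        summary[otu] = {
            "total": total,
            "kept": total - removed,
            "unknown_fraction": table["unknown"][i] / total if total else 0.0,
        }
    return summary


summary = per_otu_summary(table, otus, contaminant_sources)
for otu, stats in summary.items():
    print(otu, stats)
```

Here "kept" is the count left after dropping whatever was assigned to the assumed contaminant source, so the correction differs per OTU rather than applying one sample-wide proportion.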

I think this is analogous to what you are asking for, and the latest version of the R source code produces these.

As a recommended alternative, I've rewritten SourceTracker in Python to be faster, parallel, and (hopefully) more intuitive. In SourceTracker2 (download and install instructions here) you'd use the --per_sink_feature_assignments flag to get these full output files (on a per-sink basis). It'll be easier to help with questions if you use SourceTracker2.

Hope this helps,
Will

Drea

Aug 30, 2016, 3:22:45 PM
to Qiime 1 Forum
Thanks Will, I'll give the new version a shot. 

Drea

Aug 30, 2016, 4:16:53 PM
to Qiime 1 Forum
Ok one more question re: installation of the recommended sourcetracker2: As per the directions at the above link, I installed anaconda (v 4.1.11) and used the specified conda script to try and install. Below is the script and output. I tried running anaconda search for the biom-format and scikit-bio. There are several biom-format options, but scikit-bio doesn't have a 0.4.3 in the list that comes up (0.4.2 and 0.5.0 are the ones listed). I'm at a bit of a loss as to how to proceed.

conda create -n st2 python=3.5 numpy scipy scikit-bio=0.4.3 biom-format h5py hdf5

Fetching package metadata .......
Solving package specifications: .
Error: Packages missing in current linux-64 channels: 
  - scikit-bio 0.4.3*
  - biom-format

Close matches found; did you mean one of these?

    biom-format: nbformat

You can search for packages on anaconda.org with

    anaconda search -t conda scikit-bio

(and similarly for the other packages)

Will Van Treuren

Aug 30, 2016, 6:13:59 PM
to Qiime 1 Forum
Hi Drea,

Sorry for that error - it looks like the default conda channels differ more between setups than I realized. It should be an easy fix.

Try the following:

conda create -n st2 python=3.5 numpy scipy h5py hdf5
source activate st2
conda install scikit-bio=0.4.3 biom-format -c biocore

Let me know if that works.

Best,
Will 

Drea

Aug 31, 2016, 11:18:24 AM
to Qiime 1 Forum
Heya Will,

The first step is working and I can activate st2, but it's still not letting me install scikit-bio 0.4.3 or biom-format:

(st2) user:~/Desktop$ conda install scikit-bio=0.4.3 biom-format -c biocore
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata .........
Solving package specifications: .
Error: Packages missing in current linux-64 channels: 
  - scikit-bio 0.4.3*
  - biom-format

Close matches found; did you mean one of these?

    biom-format: nbformat

You can search for packages on anaconda.org with

    anaconda search -t conda scikit-bio

(and similarly for the other packages)

When I search for biom-format, I get options from HCC, bioconda, jorge, qiime2, travis and yoshiki.
When I search for scikit-bio, I again only get versions 0.4.2 and 0.5.0.

Thanks, 
Andr

Will Van Treuren

Aug 31, 2016, 12:08:59 PM
to Qiime 1 Forum
Hi Andr, 

Sorry about this - this is unexpected. I am attempting to figure out what's going on and I will get back to you ASAP.

Best,
Will 

Will Van Treuren

Aug 31, 2016, 1:42:39 PM
to Qiime 1 Forum
Hi Andr, 

Can you try the following commands and then tell us how it goes:

source activate st2
conda install scikit-bio=0.4.3 -c bioconda
conda install biom-format -c biocore

Best,
Will 

Drea

Aug 31, 2016, 2:20:26 PM
to Qiime 1 Forum
No dice. It still says the packages are missing... just in two separate steps.

Colin Brislawn

Aug 31, 2016, 2:49:06 PM
to Qiime 1 Forum
Hello Drea,

This is a strange bug! Maybe adding the bioconda channel first will help. 
conda config --add channels r
conda config --add channels bioconda

Then try the install again, without the -c flag. 
conda install scikit-bio=0.4.3 biom-format

I hope this helps! Honestly, I'm not sure what's going wrong, but hope adding bioconda as a permanent channel will help. 
Colin 

Drea

Aug 31, 2016, 3:30:46 PM
to Qiime 1 Forum
So I managed to install biom-format using
conda install -c bioconda biom-format
Not sure why it would make a difference having the -c bioconda in front of biom-format, but it seemed to make it work.

As for the scikit-bio, I really can't seem to find the 0.4.3 version (assuming that's what the 0.4.3 refers to). Would it be possible to use either the 0.4.2 or 0.5.0 or will those be incompatible with the rest of the software?