Hi,
I'm trying to use SourceTracker to assess whether my sample is contaminated (16S rRNA pyrosequencing, 454 FLX Titanium).
From the QIIME database I retrieved the files (map.file and otu_table.biom) for two studies that are going to be the sources:
body site samples (Study ID=449, Costello) and soil samples (ID=103, Lauber).
First I merged the OTU tables from these 2 sets, and also the mapping files:
merge_mapping_files.py -m study_103_mapping_file.txt,study_449_mapping_file.txt -o merged_mapping.txt -n 'Data not collected'
merge_otu_tables.py -i study_103_closed_reference_otu_table.biom,study_449_closed_reference_otu_table.biom -o merged_otu_table.biom
I converted the merged BIOM table to text by running:
convert_biom.py -i merged_otu_table.biom -o merged_otu_table.txt -b
As a first positive test, in the Env field of the merged mapping file I turned one soil sample into a sink and left all the remaining samples defined as sources.
Then I ran the command line:
R --slave --vanilla --args -i merged_otu_table.txt -m merged_mapping.txt -o sourcetracker_out1 < $SOURCETRACKER_PATH/sourcetracker_for_qiime.r
That works: SourceTracker found that the sample is "soil" at 82%, with a small unknown fraction.
THE PROBLEM:
Now I would like to try it on my own data (16S pyrosequencing), so I used QIIME with the closed-reference protocol, after quality trimming with:
split_libraries.py -m mapping_output/map_corrected.txt -f 16s.fna -q 16s.qual -o split_library_output -b 0 -M 1 -H 8 -a 0 -l 70 -s 30
pick_closed_reference_otus.py -i seqs.fna -r $HOME/qiime_software/gg_otus-12_10-release/rep_set/97_otus.fasta -t $HOME/qiime_software/gg_otus-12_10-release/taxonomy/97_otu_taxonomy.txt -o OTUREF/
This gave 1400 OTUs and only 5,000 sequence failures.
After this step I get an otu_table.biom, and I merged this table with the table of the source samples (body sites and soil).
I added a line to the previous merged mapping file with my SampleID and put sink in the Env field.
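A minimal sketch of a sanity check for this step, assuming a tab-delimited mapping file with sample IDs in the first column and an OTU table already converted to text with convert_biom.py -b (so that it has a '#OTU ID' header row). The function name sample_in_both and the file names in the usage comment are hypothetical:

```shell
# Succeeds (exit status 0) if sample ID $1 appears in the first column of the
# mapping file $2 and in the '#OTU ID' header row of the text OTU table $3.
sample_in_both() {
    awk -F'\t' -v id="$1" '$1 == id { found = 1 } END { exit !found }' "$2" &&
    awk -F'\t' -v id="$1" '
        /^#OTU ID/ { for (i = 2; i <= NF; i++) if ($i == id) found = 1 }
        END { exit !found }' "$3"
}

# Usage (hypothetical sample ID and file names):
#   sample_in_both my.sample merged_mapping.txt merged_otu_table.txt \
#       && echo "sample ID matches" || echo "sample ID missing or misspelled"
```

The comparison is exact, so a difference in case or a stray space between the two files would show up as a mismatch.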
Then I ran SourceTracker as previously.
No errors appear, but at the end the result is 100% unknown.
I tried the same protocol on another data set that I know is feces; however, the result is again 100% unknown with SourceTracker.
Any idea what is wrong? Could it be a difference in the reference database used between the QIIME samples downloaded from the database and my own samples?
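A minimal sketch of how the reference-database hypothesis could be checked: count the OTU IDs that two tables have in common. It assumes both tables were converted to tab-delimited text with convert_biom.py -b, with OTU IDs in the first column and header lines starting with '#'. The function names and the file name my_otu_table.txt are hypothetical:

```shell
# Print the unique OTU IDs (first column) of a text OTU table,
# skipping the '#' header lines.
otu_ids() {
    awk -F'\t' '!/^#/ && NF > 1 { print $1 }' "$1" | LC_ALL=C sort -u
}

# Print the number of OTU IDs shared by the two tables $1 and $2.
shared_otu_count() {
    a=$(mktemp); b=$(mktemp)
    otu_ids "$1" > "$a"
    otu_ids "$2" > "$b"
    LC_ALL=C comm -12 "$a" "$b" | wc -l | tr -d ' '
    rm -f "$a" "$b"
}

# Usage (hypothetical file names):
#   shared_otu_count my_otu_table.txt merged_otu_table.txt
```

A count near zero would suggest that the two tables were picked against reference sets with different OTU identifiers, so the sink would share no OTUs with any source.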
It is very difficult to find a complete procedure for using SourceTracker to compare personal data with other datasets.
Hope that you can help me.
Best
Fabrice