http://sourceforge.net/p/krona/wiki/Importing%20text%20and%20XML%20data/
Perhaps a simple text export that conforms to the input format of 'ktImportText'?
Simple example:
http://krona.sourceforge.net/examples/text.txt
Thanks for the reply.
Here is some sample output from MetaPhlAn:
==> Sample_SGI-01.bt2out.txt <==
GWZHISEQ01:97:C0LR6ACXX:6:1201:3639:1943 100146888
GWZHISEQ01:97:C0LR6ACXX:6:1201:6666:1990 100167158
GWZHISEQ01:97:C0LR6ACXX:6:1201:14657:1973 100039919
GWZHISEQ01:97:C0LR6ACXX:6:1201:20597:1960 100146591
GWZHISEQ01:97:C0LR6ACXX:6:1201:1706:2116 100146741
GWZHISEQ01:97:C0LR6ACXX:6:1201:2426:2049 100146876
GWZHISEQ01:97:C0LR6ACXX:6:1201:2354:2072 100170358
GWZHISEQ01:97:C0LR6ACXX:6:1201:2327:2104 100167880
GWZHISEQ01:97:C0LR6ACXX:6:1201:4470:2158 100140746
GWZHISEQ01:97:C0LR6ACXX:6:1201:5043:2188 100167880
==> Sample_SGI-01.profiling_output.txt <==
k__Bacteria 99.97846
k__Archaea 0.02154
k__Bacteria|p__Proteobacteria 90.73406
k__Bacteria|p__Bacteroidetes 7.88813
k__Bacteria|p__Actinobacteria 0.73572
k__Bacteria|p__Chloroflexi 0.30594
k__Bacteria|p__Acidobacteria 0.27093
k__Bacteria|p__Thermi 0.03437
k__Archaea|p__Euryarchaeota 0.02154
k__Bacteria|p__Firmicutes 0.00599
Krona can be given simple text input, such as the following:
(http://krona.sourceforge.net/examples/text.txt)
2 Fats Saturated fat
3 Fats Unsaturated fat Monounsaturated fat
3 Fats Unsaturated fat Polyunsaturated fat
13 Carbohydrates Sugars
4 Carbohydrates Dietary fiber
21 Carbohydrates
5 Protein
4
Here is the problem with this format: "If a hierarchy has more than one listing, the quantities will be added."
Since the second MetaPhlAn output is very thorough and gives the relative abundance at every taxonomic level, I can't just traverse it and add up the counts (after converting from % to count), because each level already includes the abundance of the levels below it.
It should be possible if we had a mapping from reads to taxonomic levels (which is what I suspect each number in column 2 of the first example is).
Any thoughts?
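One possible way around the additive rule, offered only as a sketch: give Krona each clade's residual, that is, its abundance minus the abundance of its direct children, so that Krona's own summing reproduces the reported totals. The function names `residual_abundances` and `to_krona_rows` are hypothetical, and the code assumes each profile line is `lineage<whitespace>percent` with levels joined by `|`, as in the sample above.

```python
def residual_abundances(profile_lines):
    """Return {lineage: percent not already accounted for by its children}."""
    totals = {}
    for line in profile_lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        lineage, percent = line.split()[:2]
        totals[lineage] = float(percent)

    residuals = dict(totals)
    for lineage, percent in totals.items():
        if '|' in lineage:
            # Subtract this clade's abundance from its direct parent.
            parent = lineage.rsplit('|', 1)[0]
            if parent in residuals:
                residuals[parent] -= percent
    # Clamp small negative values caused by rounding in the input.
    return dict((k, max(v, 0.0)) for k, v in residuals.items())


def to_krona_rows(residuals, total_reads):
    """Convert residual percentages to 'count<TAB>level1<TAB>level2...' rows."""
    rows = []
    for lineage, percent in sorted(residuals.items()):
        count = int(round(percent * total_reads / 100.0))
        if count > 0:
            # Drop the 'k__', 'p__', ... prefixes from each level name.
            names = [part.split('__', 1)[-1] for part in lineage.split('|')]
            rows.append('\t'.join([str(count)] + names))
    return rows
```

With this approach a parent such as Bacteria only carries the reads that were not assigned to any of its listed phyla, so nothing is counted twice when Krona adds the rows together.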
I am now producing Krona visualizations from your result files; the code for each script is below.
There are two shortcomings to this method of creating the Krona output, though:
- we do not obtain the list of read IDs associated with each taxon (I imagine that's what the number in the second column of *.bt2out.txt represents)
- there is no confidence value associated with the assignments
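For the first shortcoming, a rough sketch of how the read IDs could at least be grouped by whatever the second column of *.bt2out.txt holds. What that number actually refers to is not confirmed in this thread, so the code only groups by it and does not translate it to a taxon name; `reads_by_id` is a hypothetical helper.

```python
from collections import defaultdict


def reads_by_id(map_lines):
    """Group read IDs (column 1) by the identifier in column 2."""
    groups = defaultdict(list)
    for line in map_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        read_id, target_id = fields[0], fields[1]
        groups[target_id].append(read_id)
    return dict(groups)
```

Called as `reads_by_id(open('Sample_SGI-01.bt2out.txt'))`, this would return one list of read IDs per distinct second-column value.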
Here's how I generate my plots:
for a in *.bt2out.txt; do
    B=$(basename "${a}" .bt2out.txt)
    # metaphlan2krona.py writes its result to ${B}.tmp
    metaphlan2krona.py \
        -m "${a}" -p "${B}.profiling_output.txt" -k "${B}"
    /bioinformatics/asm/bio_bin/KronaTools-2.2/bin/ktImportText -o "${B}.metaphlan.krona.html" "${B}.tmp"
done
And here is the code for metaphlan2krona.py:
#!/usr/bin/env python2.7
import sys
import optparse
import re
import subprocess
import tempfile


def file_len(fname):
    # Count the lines of a file with `wc -l`.
    p = subprocess.Popen(['wc', '-l', fname], stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    result, err = p.communicate()
    if p.returncode != 0:
        raise IOError(err)
    return int(result.strip().split()[0])


def runCommandTMPLost(myCommand):
    # Run a shell command, sending its output to throwaway temp files.
    # Not called by main(); kept from the original script.
    try:
        tmp_stdout = open(tempfile.NamedTemporaryFile().name, 'w')
        tmp_stderr = open(tempfile.NamedTemporaryFile().name, 'w')
        proc = subprocess.Popen(args=myCommand, shell=True, stdout=tmp_stdout, stderr=tmp_stderr, bufsize=4096)
        tmp_stdout.close()
        tmp_stderr.close()
        returncode = proc.wait()
    except Exception as e:
        sys.stderr.write("runCommand error: %s\n" % e)


def main():
    # Parse command line
    parser = optparse.OptionParser()
    parser.add_option('-p', '--profile', dest='profile', default='', action='store', help='the metaPhLan .profile taxonomic output file')
    parser.add_option('-m', '--map', dest='map', default='', action='store', help='the metaPhLan read to taxa mapping file')
    parser.add_option('-k', '--krona', dest='krona', default='krona.out', action='store', help='the Krona output file name')
    (options, spillover) = parser.parse_args()

    # One line per read in the mapping file, so the line count is the read total.
    TotalReads = file_len(options.map)

    re_candidates = re.compile(r"s__|unclassified\t")
    re_replace = re.compile(r"\w__")
    re_bar = re.compile(r"\|")

    with open(options.profile, 'r') as f:
        metaPhLan = f.readlines()

    krona_tmp = options.krona + '.tmp'
    metaPhLan_FH = open(krona_tmp, 'w')
    for aline in metaPhLan:
        # Keep only species-level (s__) or unclassified rows; the higher
        # levels would be double counted because Krona adds quantities up.
        if re.search(re_candidates, aline):
            # Turn 'k__A|p__B<TAB>pct' into tab-separated level names.
            x = re.sub(re_replace, '\t', aline)
            x = re.sub(re_bar, '', x)
            x_cells = x.split('\t')
            lineage = '\t'.join(x_cells[0:(len(x_cells) - 1)])
            # Convert the relative abundance (%) to an approximate read count.
            abundance = float(x_cells[-1].rstrip('\n')) * (float(TotalReads) / float(100))
            metaPhLan_FH.write('%s\n' % (str(int(abundance)) + '\t' + lineage))
    metaPhLan_FH.close()


if __name__ == '__main__':
    main()
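To make the conversion step concrete, here is a small standalone sketch of what happens to a single profile line. It is a variation rather than a copy of the script above: it splits on `|` before stripping the `k__`/`p__`/... prefixes, so the row does not start its lineage with an empty field (the script's substitution leaves a tab in front of the first level name; I have not checked how ktImportText treats that). The function name `profile_line_to_krona` and the example lineage are made up for illustration.

```python
import re


def profile_line_to_krona(line, total_reads):
    """Turn 'k__A|...|s__B<TAB>percent' into 'count<TAB>A<TAB>...<TAB>B'."""
    lineage, percent = line.rstrip('\n').split('\t')
    # Strip the one-letter rank prefix (k__, p__, ..., s__) from each level.
    names = [re.sub(r'^\w__', '', part) for part in lineage.split('|')]
    # Relative abundance (%) times total reads, truncated to a whole count.
    count = int(float(percent) * total_reads / 100.0)
    return '\t'.join([str(count)] + names)


# Made-up example: 2.5% of 2000 reads is 50 reads.
row = profile_line_to_krona("k__Bacteria|s__Example_species\t2.5\n", 2000)
print(row)
```

The counts are only estimates, since they are back-calculated from percentages rather than taken from the read assignments themselves.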
I hope this helps some people out.
Ciao
Your changes make sense.
The only inconvenience I can see is that you don't get a count of the number of classified reads from your sample.
We still need to do a 'wc -l sample.mpa.txt'. Perhaps a log with various metrics would be beneficial, somewhat similar to the bowtie2 mapping metrics?
Anyway, thanks for helping me get the visualization I wanted for MetaPhlAn.
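Until something like that exists, a few of these numbers can be pulled from the mapping file directly. This is only a sketch: `mapping_metrics` is a hypothetical helper, and it assumes the file has one line per classified read with an identifier in the second column, as in the bt2out sample earlier in the thread.

```python
from collections import Counter


def mapping_metrics(map_lines):
    """Summarize a read-to-ID mapping: read total, distinct IDs, top ID."""
    counts = Counter()
    for line in map_lines:
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return {
        'classified_reads': sum(counts.values()),  # same as `wc -l` for clean input
        'distinct_ids': len(counts),
        'most_common': counts.most_common(1),
    }
```

Writing the returned dictionary to a small log file next to the profile would save the extra 'wc -l' step.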