Hello all,
This is probably not the best place to ask my question, but I think someone in this group may have similar questions. The PanPhlAn Google group seems to be closed, so I came here.
I am trying PanPhlAn on our own data and have reached the final step, panphlan_profile.py.
The column names in the panphlan_profile.py output look like g00001... How can I map them to KEGG gene identifiers such as K05711...?
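To make the goal concrete, here is a minimal sketch of the renaming I have in mind. It assumes I already had a table linking each gene-family ID to a KEGG ID, which is exactly the part I do not know how to obtain. The function name and the g00001/K05711 pairing are only placeholders for illustration.

```python
def rename_gene_columns(header, gene_to_ko):
    """Replace gene-family IDs with KEGG IDs where a mapping exists.

    IDs without a mapping are kept unchanged.
    """
    return [gene_to_ko.get(name, name) for name in header]


# Hypothetical mapping for illustration only; a real one would have to come
# from an annotation of the pangenome gene families.
gene_to_ko = {"g00001": "K05711"}

header = ["sample", "g00001", "g00002"]
renamed = rename_gene_columns(header, gene_to_ko)
print(renamed)
```

Several gene families could share one KEGG ID, so the renamed columns would not necessarily be unique.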
I looked at the most recent published paper that uses PanPhlAn,
Metagenomic Sequencing with Strain-Level Resolution Implicates Uropathogenic E. coli in Necrotizing Enterocolitis and Mortality in Preterm Infants
It says the following:
Metagenomic MLST Analysis
We developed a metagenomic approach to exploit the MLST strategy commonly used in cultivation-based typing assays (Maiden et al., 1998). Reads were mapped with Bowtie2 against a database of the known E. coli MLST sequences corresponding to distinct alleles of seven genes: adk, fumC, gyrB, icd, mdh, purA, and recA (parameters -D 20 -R 3 -N 0 -L 20 -i S,1,0.50). A consensus sequence for each loci was constructed considering the nucleotide with the highest frequency in each position. All samples where all loci obtained a minimum breadth of coverage of at least 90% were confidently mapped. For the small fraction of loci with low or non-complete coverage (2.11% of the loci in the positive samples), the best-matching reference allele from the MLST database was used to fill the uncovered positions. Reconstructed consensus alleles were used to determine the most abundant MLST (ST) profile in a sample based on known E. coli ST profiles—3,895 known profiles from the University of Warwick Medical School MLST database,
http://mlst.warwick.ac.uk/mlst/ (Wirth et al., 2006).
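As far as I understand the passage above, the consensus step could be sketched roughly as follows. This is my own reading, not the authors' code: the per-position base counts (`pileup`), the function name, and the toy sequences are assumptions, and the per-position counts would first have to be extracted from the alignments.

```python
from collections import Counter


def consensus_and_breadth(pileup, reference):
    """Build a consensus allele from per-position base counts.

    pileup: one Counter per reference position (base -> number of reads).
    reference: best-matching reference allele, used for uncovered positions.
    Returns (consensus sequence, breadth of coverage as a fraction).
    """
    consensus = []
    covered = 0
    for counts, ref_base in zip(pileup, reference):
        if counts:
            # Covered position: take the most frequent nucleotide.
            consensus.append(counts.most_common(1)[0][0])
            covered += 1
        else:
            # Uncovered position: fall back to the reference base.
            consensus.append(ref_base)
    breadth = covered / len(reference) if reference else 0.0
    return "".join(consensus), breadth


# Toy data, invented for illustration.
reference = "ACGTAC"
pileup = [
    Counter({"A": 5}),
    Counter({"C": 2, "T": 7}),
    Counter(),  # no reads cover this position
    Counter({"T": 3}),
    Counter({"A": 4, "G": 1}),
    Counter({"C": 6}),
]
seq, breadth = consensus_and_breadth(pileup, reference)
passes = breadth >= 0.90  # the 90% breadth threshold quoted above
print(seq, round(breadth, 3), passes)
```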
I ran Bowtie2 on our raw data (FASTQ) with the same parameters (-D 20 -R 3 -N 0 -L 20 -i S,1,0.50), but the output is a file in SAM format, and I am not sure how to interpret it.
To put my question simply: how can I use the PanPhlAn output to get the E. coli strain names, as shown in the abstract of the paper above?
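SAM is a tab-separated text format, so as a first sanity check I could at least count how many reads mapped to each reference allele. Below is a minimal sketch of that, assuming the standard SAM layout (header lines start with "@", column 2 is the FLAG, column 3 is the reference name, and FLAG bit 0x4 marks an unmapped read). The allele name `adk_1` and the records are made up.

```python
from collections import Counter


def count_mapped_reads(sam_lines):
    """Count mapped reads per reference sequence in SAM-formatted lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        counts[fields[2]] += 1
    return counts


# Toy SAM content, invented for illustration.
sam_lines = [
    "@SQ\tSN:adk_1\tLN:536",
    "r1\t0\tadk_1\t1\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tadk_1\t3\t42\t4M\t*\t0\t0\tACGT\tIIII",
]
counts = count_mapped_reads(sam_lines)
print(counts)
```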
...Metagenomic multilocus sequence typing analysis further defined NEC-associated strains as sequence types often associated with urinary tract infections, including ST69, ST73, ST95, ST127, ST131, and ST144. ...
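If I had the allele number for each of the seven loci, I imagine the last step would be a simple table lookup like the sketch below. The profile table and the ST labels here are invented placeholders, not real entries from the MLST database, and the function name is my own.

```python
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")


def lookup_st(alleles, profiles):
    """Return the sequence type for a seven-locus allele profile, or None."""
    key = tuple(alleles[locus] for locus in LOCI)
    return profiles.get(key)


# Invented profiles for illustration; a real table would come from the
# MLST database mentioned in the paper.
profiles = {
    (1, 1, 1, 1, 1, 1, 1): "ST-example-1",
    (2, 1, 3, 1, 1, 4, 1): "ST-example-2",
}

sample = {"adk": 2, "fumC": 1, "gyrB": 3, "icd": 1, "mdh": 1, "purA": 4, "recA": 1}
st = lookup_st(sample, profiles)
print(st)
```

A profile that is not in the table returns None, which would correspond to an unknown or novel combination of alleles.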
Our preliminary analysis of our own data also found E. coli, and our next analysis will rely heavily on PanPhlAn to get down to the strain level.
Any comments or advice are welcome. I really appreciate your help. Thanks
Ming