I've been trying to integrate two bacterial genomes to make them part of the humann2 pipeline.
https://groups.google.com/forum/#!topic/humann-users/Lj71_nno3Oo
However I would like to run metaphlan2 with my genomes included. The tutorial on the metaphlan Wiki site gives a tutorial on this. But I'm not sure how to prepare my data to add to the markers.fasta file. Should I pull out the same marker genes from my genomes that are in the markers.fasta file? Would my markers have the same score as their homologs? I've tried to modify the metaphlan2.py script as follows;
####modified###JD
import cPickle as pickle
import bz2
with open(args['mpa_pkl'], 'rb') as ifile:
db = pickle.loads(bz2.decompress(ifile.read()))
# Add the taxonomy of the new genomes
db['k__Bacteria']['g__Gibbsiella.s__quercinecans'] = 5548506
db['k__Bacteria']['g__Brenneria.s__goodwinii'] = 5395301
# Add the information of the new marker as the other markers
db['mpa_v21_m200'][new_marker_name] = {
'bacteria': the clade that the marker belongs to,
'ext': {the name of the first external genome where the marker appears,
the name of the second external genome where the marker appears,
},
'len': length of the marker,
'score': score of the marker,
'taxon': the taxon of the marker}
# To see an example, try to print the first marker information:
# print db['markers'].items()[0]
# Save the new mpa_pkl file
ofile = bz2.BZ2File('metaphlan2/db_v21/mpa_v21_m200.pkl', 'w')
pickle.dump(db, ofile, pickle.HIGHEST_PROTOCOL)
ofile.close()
I'm not sure about this, should I add new information for each marker from my new genomes to the new marker section of the script?
Thanks,
James
db['markers'][new_marker_name] = {...}
bowtie2-inspect metaphlan2/db_v20/mpa_v20_m200 > metaphlan2/markers.fasta
cat new_marker.fasta >> metaphlan2/markers.fasta
mkdir metaphlan2/db_v21/mpa_v21_m200 bowtie2-build metaphlan2/markers.fasta metaphlan2/db_v21/mpa_v21_m200
--
You received this message because you are subscribed to the Google Groups "MetaPhlAn-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metaphlan-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
for the virus this is the format i have (basically the headers):
gnl|BL_ORD_ID|0 gb|AB261990|Strain|M2|Description|Lymphocytic choriomeningitis virus genomic RNA, segment S, nearly complete sequence including cds, strain: M2.|Country||Gi|108743534|vipr-id|284567
I was going to use the Gi Id's for the core markers or should i use the Id given by the curated group (vipr-id), and how should i format it into the format you mentioned ( k__ ...| p__...| ... |s__...|t...)?
for the bacteria though the header look like this : (prn pertactin precursor), the first part makes up unique markers (prn) which contain the information to an outward csv file containing the genus, species & strain, is that marker unique enough to get the information, the genus, species & strain don't have their own uqniue markers though?
Thanks in advance.
import cPickle as pickle
import bz2
import scipy.stats as st
1.0-st.binom.sf(3,3,0.95)
with open(args['mpa_pkl'], 'rb') as ifile:
db = pickle.loads(bz2.decompress(ifile.read()))
# Add the taxonomy of the new genomes
db['taxonomy']['k__Bacteria|p__Proteobacteria|c_Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Gibbsiella|s__quercinecans|t__genome_id'] = 5548506
db['taxonomy']['k__Bacteria|p__Proteobacteria|c_Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Brenneria.s__goodwinii|t__genome_id'] = 5395301
# Add the information of the new marker as the other markers
db['markers'][new_marker_name] = {
'clade':
'ext'
'len'
'score'
'taxon'
}
ofile = bz2.BZ2File('metaphlan2/db_v21/mpa_v21_m200.pkl', 'w')
pickle.dump(db, ofile, pickle.HIGHEST_PROTOCOL)
ofile.close()
current error;
with open(args['mpa_pkl'], 'rb') as ifile:
^
SyntaxError: invalid syntax
Thanks,
James
Thanks for that, I have a few marker genes but it will likely take time to verify a large number, I understand the requirement for clade specific makers, the difficulty I have is amending the python code as I have no experience with this.
Thanks,
James