How to prepare marker sequences from a new genome

Capricy Gao

Oct 24, 2016, 5:30:14 PM
to MetaPhlAn-users
Hello there,

I have a new genome. I would like to create a marker file for this genome and include it in the MetaPhlAn analysis. How should I prepare the marker file for this genome?

Thanks.

C.

Duy Tin Truong

Oct 31, 2016, 3:44:44 AM
to Capricy Gao, MetaPhlAn-users
Hi Capricy,

To add the new markers, you have to map them against all the reference genomes that we used in the database and make sure that they are unique and core genes for their clade. For more information, you can see this:
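A marker has to satisfy both criteria at once: it is core (present in nearly all genomes of its clade) and unique (absent from genomes outside the clade). The minimal Python sketch below expresses that check, assuming you already know which genomes contain each candidate gene, for example from clustering or from mapping against the reference set. It is a hypothetical illustration, not the MetaPhlAn implementation; the function name `classify_marker` and the 0.8 core fraction are assumptions, not values given in this thread.

```python
from typing import Dict, Set


def classify_marker(
    genomes_with_gene: Set[str],
    clade_genomes: Set[str],
    core_fraction: float = 0.8,  # hypothetical threshold, not from MetaPhlAn
) -> Dict[str, bool]:
    """Decide whether a candidate gene is core for its clade and unique to it.

    core:   present in at least `core_fraction` of the clade's genomes
    unique: not found in any genome outside the clade
    """
    if not clade_genomes:
        raise ValueError("clade_genomes must not be empty")
    in_clade = genomes_with_gene & clade_genomes
    outside_clade = genomes_with_gene - clade_genomes
    core = len(in_clade) / len(clade_genomes) >= core_fraction
    unique = not outside_clade
    return {"core": core, "unique": unique, "marker": core and unique}


if __name__ == "__main__":
    clade = {"g1", "g2", "g3", "g4", "g5"}
    # Present in 4 of 5 clade genomes and nowhere else: core and unique
    print(classify_marker({"g1", "g2", "g3", "g4"}, clade))
    # Also present in a genome outside the clade: core but not unique
    print(classify_marker({"g1", "g2", "g3", "g4", "other"}, clade))
```

This sketch applies a strict uniqueness test; whether to tolerate a small number of hits outside the clade is a design choice left to you.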

I am also working on a pipeline that can automatically generate the markers for the database given a set of input reference genomes, but I need a few more months to release it.

Thanks,
Tin


--
You received this message because you are subscribed to the Google Groups "MetaPhlAn-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metaphlan-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Xin Gao

Oct 31, 2016, 9:03:24 AM
to Duy Tin Truong, MetaPhlAn-users
Hi, Tin,

Thank you very much for the reply; I'm glad to know that a new pipeline is being developed.

So for now, where can I download this all-reference-genome database? Or could you point me to an equivalent database to use? What is your cutoff for a gene to be unique if I am running usearch? What is the cutoff for a gene to be core if I am running uclust?

The link you provided covers a later step, after I already have the marker sequences, I think.

Thanks.

C.

yanan su

Jan 18, 2017, 12:15:19 AM
to MetaPhlAn-users
Hi Gao,

Have you solved your problem?
I have the same question about preparing the database with markers from a new virus genome. Could you share your solution if you have solved it?

thanks.
Su

Dave Armitage

Jan 18, 2017, 10:32:39 AM
to MetaPhlAn-users
I too am interested in any developments on this front. I have about 30 genomes I'd like to add to the marker database.

Duy Tin Truong

Jan 19, 2017, 3:01:59 AM
to Xin Gao, MetaPhlAn-users
Hi Xin Gao,

I am sorry for the late reply on this.
The reference genome database can be downloaded directly from NCBI (https://ftp.ncbi.nlm.nih.gov/genomes/) or with this tool:
https://bitbucket.org/nsegata/repophlan/wiki/Home

The cutoff should be 80% identity for genes to be clustered together. Then, you need to map the core genes against all genomes with BLAST or a similar tool to check their uniqueness.
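The uniqueness step can be sketched by post-processing BLAST tabular output (`-outfmt 6`, whose twelve default columns are listed in the code). The sketch assumes you have already BLASTed the candidate markers against the reference genomes and can map every subject sequence ID back to the clade of its genome. The function name `non_unique_markers`, the clade dictionaries, and reusing 80% as the hit-identity cutoff are all assumptions: the thread only gives 80% for clustering. It also ignores alignment coverage, which a real filter would likely need.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Set

# Column order of BLAST tabular output (-outfmt 6) with the default fields
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def non_unique_markers(
    blast_tsv: Iterable[str],
    marker_clade: Dict[str, str],
    subject_clade: Dict[str, str],
    min_identity: float = 80.0,  # assumed cutoff, reusing the clustering value
) -> Dict[str, Set[str]]:
    """Return {marker: clades it hits} for markers hitting outside their clade.

    marker_clade maps each marker (query) ID to its clade; subject_clade maps
    each reference sequence ID to the clade of the genome it came from.
    """
    offending: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank or comment lines
        hit = dict(zip(OUTFMT6_FIELDS, row))
        if float(hit["pident"]) < min_identity:
            continue  # too divergent to count as a cross-clade hit
        own = marker_clade[hit["qseqid"]]
        other = subject_clade.get(hit["sseqid"])
        if other is not None and other != own:
            offending[hit["qseqid"]].add(other)
    return dict(offending)


if __name__ == "__main__":
    # Made-up hits: m1 matches its own clade and, at 85%, another clade
    blast_output = io.StringIO(
        "m1\tref_a\t99.0\t900\t9\t0\t1\t900\t1\t900\t0.0\t1600\n"
        "m1\tref_b\t85.0\t900\t135\t0\t1\t900\t1\t900\t0.0\t1200\n"
        "m2\tref_c\t70.0\t500\t150\t2\t1\t500\t1\t500\t1e-50\t400\n"
    )
    markers = {"m1": "s__Species_A", "m2": "s__Species_B"}
    subjects = {"ref_a": "s__Species_A", "ref_b": "s__Species_C",
                "ref_c": "s__Species_A"}
    print(non_unique_markers(blast_output, markers, subjects))
```

A marker that appears in the returned dictionary should be dropped or re-examined, since reads from the listed clades could map to it.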

Thanks,
Tin

Dave Armitage

Jan 19, 2017, 12:23:50 PM
to MetaPhlAn-users
Thanks so much Tin.

I tried running repophlan to download the NCBI microbial genomes and ran into this error, which may be related to the format of the assembly summary file. Any idea how to fix it? Thanks!

$ ./repophlan_get_microbes.py --taxonomy taxonomy_reduced.txt --out_dir microbes --nproc 4 --out_summary repophlan_microbes.txt
2017-01-19 12:16:25,329 ./repophlan_get_microbes.py INFO Reading the taxonomy from taxonomy_reduced.txt...
2017-01-19 12:16:29,411 ./repophlan_get_microbes.py INFO Done.
Traceback (most recent call last):
File "./repophlan_get_microbes.py", line 291, in <module>
refseq_assemblies = get_assemblies( add_protocol(NCBI_ftp + NCBI_ASREFSEQ_file), par['out_dir'] )
File "./repophlan_get_microbes.py", line 250, in get_assemblies
if line_d['version_status'] != 'latest': continue
KeyError: 'version_status'

Dave Armitage

Jan 19, 2017, 2:17:06 PM
to MetaPhlAn-users
Never mind: I replaced 0->1 and 1->2 in line 246 and it seems to be working!
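The `KeyError: 'version_status'` suggests the script no longer found the column-header line where it expected it, which is consistent with NCBI adding a comment line above the header in `assembly_summary.txt`. Rather than hard-coding line indices, a parser can locate the header by its content. The sketch below is a hypothetical replacement, not repophlan's actual code: `read_assembly_summary` is an illustrative name, and the sample input uses a made-up, reduced set of columns and accessions.

```python
import io
from typing import Dict, Iterator, TextIO

HEADER_PREFIX = "# assembly_accession"


def read_assembly_summary(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per assembly, locating the column header by content.

    Instead of assuming the header sits on a fixed line, skip comment lines
    until the one starting with '# assembly_accession' is found.
    """
    header = None
    for line in handle:
        line = line.rstrip("\n")
        if header is None:
            if line.startswith(HEADER_PREFIX):
                header = line[2:].split("\t")  # drop the leading '# '
            continue
        if not line or line.startswith("#"):
            continue  # ignore blank lines and any later comments
        yield dict(zip(header, line.split("\t")))
    if header is None:
        raise ValueError("no '# assembly_accession' header line found")


if __name__ == "__main__":
    # Synthetic sample: a comment line precedes the real header
    sample = io.StringIO(
        "# See the README for a description of the columns\n"
        "# assembly_accession\tversion_status\torganism_name\n"
        "GCF_000001\tlatest\tExample bacterium\n"
        "GCF_000002\treplaced\tOther bacterium\n"
    )
    latest = [row["assembly_accession"]
              for row in read_assembly_summary(sample)
              if row["version_status"] == "latest"]
    print(latest)
```

Because the header is found by its `# assembly_accession` prefix, extra comment lines above it would not shift the column mapping.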

Duy Tin Truong

Jan 19, 2017, 5:58:04 PM
to Dave Armitage, MetaPhlAn-users
Hi Dave,

Yes, NCBI may change the file format over time.

Thanks,
Tin

Capricy Gao

Feb 13, 2017, 4:25:11 PM
to MetaPhlAn-users, dave.a...@gmail.com
Hi, Tin,

I am trying to download eukaryotic genomes with the corresponding taxonomy, but your package does not include:

./repophlan_get_euks.py

Could you please upload it to bitbucket?

Thanks.

C.

Duy Tin Truong

Feb 16, 2017, 4:38:07 AM
to Capricy Gao, MetaPhlAn-users, dave.a...@gmail.com
Hi Capricy,

It was included in repophlan_get_microbes.py. In other words, when you run repophlan_get_microbes.py, you will get both.
You should use "run.sh" to execute the scripts.

Cheers,
Tin

Capricy Gao

Feb 18, 2017, 9:59:05 PM
to Duy Tin Truong, MetaPhlAn-users, dave.a...@gmail.com
Hi, Tin,

Thank you very much for clarifying this. According to your source comments, only single-celled eukaryotes are downloaded by this script. So my next question is:

When you built the MetaPhlAn database, did the "all reference genomes" refer only to viral, bacterial, and single-celled eukaryotic genomes? Are the core genes checked for uniqueness only among viruses, bacteria, and single-celled eukaryotes?

Thanks.

C.

Duy Tin Truong

Feb 20, 2017, 5:26:03 AM
to Capricy Gao, MetaPhlAn-users, dave.a...@gmail.com
Hi Capricy,

Yes, and the reference genomes also include archaeal genomes.

Cheers,
Tin

Capricy Gao

Feb 20, 2017, 10:21:02 PM
to MetaPhlAn-users, capricyg...@gmail.com, dave.a...@gmail.com
Thank you very much, Tin!

I also wonder how the pipeline you are developing for automatic marker generation is progressing?

Thanks.

C.

Duy Tin Truong

Feb 21, 2017, 3:39:52 AM
to Capricy Gao, MetaPhlAn-users, dave.a...@gmail.com
Hi Capricy,

The implementation is almost done. I am testing the pipeline.

Cheers,
Tin

Flo

Feb 21, 2017, 3:45:25 AM
to MetaPhlAn-users, capricyg...@gmail.com, dave.a...@gmail.com
Hi Tin, Hi Capricy,

Great news!

Thanks a lot for your efforts!

Flo

Capricy Gao

Feb 22, 2017, 9:27:53 AM
to MetaPhlAn-users, capricyg...@gmail.com, dave.a...@gmail.com
Tin, thank you for your efforts!

We have data waiting for this pipeline!

Yanmei Ju

Jul 26, 2017, 5:40:15 AM
to MetaPhlAn-users
Hi Tin,
How is the new pipeline for picking markers from a new genome going?


Thanks!
Mei