Looking for an option to extract more than one sequence per species & gene


Stefan Pinkert

Nov 8, 2014, 1:35:13 PM
to phylogener...@googlegroups.com
Moin Will,

maybe I just overlooked it, but is there a way to extract the GenBank number in addition to the species name, and to include more than one sequence per species and gene (maybe differentiating by GI:#)? The latter option would also be extremely useful for browsing GenBank with only the genus name (differentiating at the species level).

Thanks a lot.
Cheers, Stefan


Will

Nov 10, 2014, 11:37:25 AM
to Stefan Pinkert, phylogener...@googlegroups.com
Hello,

Thanks for emailing. If I understand you correctly, you're asking two questions: (1) how can you get the GenBank accession numbers out of pG, and (2) how can you get multiple sequences for each species.

For (1), just type 'output' at the DNA download prompt. In addition, at the end of your run the file "stem.name_sequence_info.txt" has all your sequence information. To get to the end of a run quickly, just keep hitting enter and you'll run a single RAxML search, which shouldn't take too long.

For (2), there are a few ways to do this, but I admit I probably haven't been very clear! The easiest way would be to go to 'reload' mode at the DNA checking prompt, make sure you're on seqChoice 'random', then reload 'EVERYTHING'. You can 'output' a few times, and that way you'll get everything you need. Running pG several times would probably have the same effect, as would simply giving it a species file with the same species name lots of times (off the top of my head). The nerdiest way (which, I sense, may be what you want) is to run pG in an automated fashion, as a Python library (http://willpearse.github.io/phyloGenerator/scripting.html). Just pass the number of sequences you want as the 'noSeqs' argument to the 'sequenceDownload' function.
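
To illustrate the repeated-species-name idea, here is a minimal Python sketch. It is not part of pG and is untested against it. It assumes the species file is plain text with one name per line; the helper names `repeat_species` and `write_species_file` and the file name `species_x5.txt` are made up for this example.

```python
def repeat_species(species, copies):
    """Return each species name repeated `copies` times, keeping input order."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return [name for name in species for _ in range(copies)]


def write_species_file(species, copies, path):
    """Write the repeated names to `path`, one name per line (assumed format)."""
    lines = repeat_species(species, copies)
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines
```

For example, `write_species_file(["Quercus robur", "Quercus alba"], 5, "species_x5.txt")` would write ten lines, five per species. Whether pG then downloads a different sequence for each repeat presumably depends on seqChoice being 'random', as described above.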

Please also note that you can already use pG with just a genus name. You don't need the 'replace' method or anything fancy like that; you can just give genera as input to the program.

I hope that helps, let me know if you're not clear on anything. Sorry for being a bit slow on the reply; it's snowing quite a lot here in Minnesota right now! :D

Cheers,

Will