finding sequences problem


Losia Nakagawa-Lagisz

May 21, 2013, 8:48:34 PM
to phylogener...@googlegroups.com
hi,
The pG looks great and should be very useful to lots of people.
I tried to run the program for 65 species (mostly vertebrates), with a search for vertebrate genes (or just COI) for the alignment.
Unfortunately, the program was able to find just a handful of sequences for my species list, which is surprising
because I know most of these species have at least one COI sequence in the NCBI nucleotide database.
I would appreciate any suggestions on why this search did not work.
Kind regards,
Losia

Will Pearse

Jul 3, 2013, 4:46:14 AM
to phylogener...@googlegroups.com
Hello,

Apologies everyone; my reply to this email seems to have gone directly to the user, and not to the list. I've copy-pasted my reply below.

Thanks,

Will

******************************************************************
Hello,

Thanks for your email. My suggestion would be that you try using the 'sequence alias' feature of pG. Have a look at the website (http://willpearse.github.io/phyloGenerator/guide.html#startUp); it's the second paragraph of the 'gene name(s)' section. Something like '-gene COI-cox1-cytochrome_oxidase_one' should do the job.
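
For anyone building that alias string from a list of names, here is a minimal Python sketch. It assumes only what the example above shows: aliases for one gene are joined with hyphens, and spaces inside a name become underscores. The function name `gene_alias_argument` is made up for illustration and is not part of pG.

```python
def gene_alias_argument(names):
    """Join alternative names for ONE gene into a hyphen-separated string.

    Assumption (taken from the example in this thread, not from pG's code):
    aliases are separated by hyphens and spaces are written as underscores.
    """
    cleaned = []
    for name in names:
        # Collapse any run of whitespace into a single underscore.
        token = "_".join(name.split())
        # Skip empty entries and repeated aliases.
        if token and token not in cleaned:
            cleaned.append(token)
    return "-".join(cleaned)


print("-gene", gene_alias_argument(["COI", "cox1", "cytochrome oxidase one"]))
# prints: -gene COI-cox1-cytochrome_oxidase_one
```

The printed text is the same argument suggested above; check the guide for how it fits into the rest of your pG command.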

If that doesn't turn something up, please do send me another email (preferably with your species list) and I'll see what's going on.

Cheers,

Will

John Burley

Jul 28, 2013, 9:39:06 AM
to phylogener...@googlegroups.com
Hi Will,

I've had a similar problem to Losia in trying to import the sequence data into phyloGenerator. I am currently working with a list of 12 of the ~50 amphibian species from eastern Australia that I am interested in. Very few of these species have information from the COI gene, and scanning the literature and NCBI shows me that there is a range of genes that people have used. When I type 'vertebrate' into this part of the script, hoping that it will show all genes that have been sequenced for each species on the list, it does not find anything. Does this shortcut normally work, or is it best to list each of the genes separately?

It looks like a great resource.

John

wdpe...@umn.edu

Jul 28, 2013, 4:06:06 PM
to phylogener...@googlegroups.com
Hello,

Thanks for getting in touch.

I'm afraid the 'vertebrate' option just searches for COI information at the moment. If you have a list of candidate genes, you can specify them directly (and aliases for them, e.g., 'COI' and 'cytochrome oxidase one' can be searched for at the same time). There are instructions on how to do that here (http://willpearse.github.io/phyloGenerator/guide.html#startUp), and you can always choose to remove a particular gene from your search later (see here: http://willpearse.github.io/phyloGenerator/guide.html#dnaChecking) if you wish.

I hope that helps; let me know if you want to know more.

Cheers,

Will

sande...@atree.org

Aug 10, 2016, 7:03:01 AM
to phyloGenerator Users
Hi Will,

I recently started using pG and have found it to be wonderful software to work with. I work mostly on plants in the Indian subcontinent. I wanted to download sequences for ITS from NCBI. For some species the ITS gene is coded as "internal transcribed spacer 1, partial sequence; 5.8S ribosomal RNA gene, complete sequence; and internal transcribed spacer 2, partial sequence", and for a few others it is coded as "18S ribosomal RNA gene, partial sequence; internal transcribed spacer 1, 5.8S ribosomal RNA gene, and internal transcribed spacer 2, complete sequence; and 28S ribosomal RNA gene, partial sequence".

My question is: how do I tell pG that one of my genes of interest is ITS (which includes ITS1, 5.8S and ITS2)? I also tried coding this by replacing the spaces with "_".

Please let me know your suggestions.

Sandeep

Will

Aug 10, 2016, 8:08:34 AM
to sande...@atree.org, phyloGenerator Users
Hello Sandeep,

Thanks for getting in touch! I'm not at my laptop now, but if I've understood you correctly you should use whatever name for that locus (ITS) is given in the sequences' entries on GenBank. So, for ITS, often ITS1 and ITS2 will work (from memory). Use hyphens to give gene aliases - so something like ITS1-ITS2 will work.

You don't need to use the full name, which it sounds like you're doing. Check the 'gene', or maybe 'locus', entry for a sequence to see what I mean (there are no centralised lists of names that I know of).

If you're still having trouble, send me a species list and I'll take a look tomorrow.

Thanks again,

Will

Sandeep Sen

Aug 11, 2016, 1:11:25 AM
to Will, phyloGenerator Users
Hi,

Many thanks for the reply, Will! Unfortunately the problem still persists. I am attaching a list of a few species for which I already know ITS sequences are available in NCBI.

Please have a look at this.

Warmly
Sandeep


shortlist.txt

Will

Aug 11, 2016, 7:08:47 AM
to Sandeep Sen, phyloGenerator Users
Hello Sandeep,

Thanks for this. I was able to get sequences for all those species when I specified "internal_transcribed_spacer_1"; I found this by going to the nucleotide entry for a Piper species (http://www.ncbi.nlm.nih.gov/nuccore/EF056293.1) and reading off the name in the "features" section of the GenBank entry (if you're interested in how I did it). Does that answer your question?
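
If you want to do the same "read the name off the features section" step for several records, something like the rough Python sketch below could help. It assumes you have saved the GenBank flat-file text of a record yourself; the sample text in it is an invented placeholder in the general layout of a features table, not the real content of EF056293.1, and `qualifier_names` is a hypothetical helper, not part of pG.

```python
import re

# Qualifiers that commonly carry a usable locus name in a GenBank features table.
QUALIFIER = re.compile(r'/(?:gene|product|note)="([^"]*)"')

# INVENTED placeholder text for illustration only (not a real GenBank record).
SAMPLE_FEATURES = '''
FEATURES             Location/Qualifiers
     misc_RNA        <1..>600
                     /product="internal transcribed spacer 1"
     rRNA            <601..>760
                     /product="5.8S ribosomal RNA"
'''


def qualifier_names(genbank_text):
    """Return the distinct qualifier values, with spaces written as underscores."""
    found = []
    for value in QUALIFIER.findall(genbank_text):
        # Collapse whitespace (including wrapped lines) into underscores,
        # matching the "internal_transcribed_spacer_1" form used above.
        name = "_".join(value.split())
        if name and name not in found:
            found.append(name)
    return found


print(qualifier_names(SAMPLE_FEATURES))
# prints: ['internal_transcribed_spacer_1', '5.8S_ribosomal_RNA']
```

Each name it prints is a candidate to try as a gene name (or alias) in pG; which one actually matches still has to be checked against your own records.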

In passing, I noticed that you've got a few species in your list twice. That could cause trouble for you further down the line; when building phylogenies of species, most software assumes each species is only in there once. I'd get rid of all duplicates; at best they will just make things a bit slower (you'll have to download sequences twice, etc.) and at worst they could crash something further down the line.
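
Removing the duplicates can be done by hand, or with a few lines of Python such as the sketch below. It assumes a species list with one name per line, and it treats names that differ only in case or surrounding spaces as the same species; that matching rule is my assumption, so adjust it if your list needs something stricter. The names in the example are placeholders, not the contents of shortlist.txt.

```python
def unique_species(names):
    """Return the names with duplicates removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for name in names:
        # Normalise internal and surrounding whitespace.
        cleaned = " ".join(name.split())
        # Compare case-insensitively so "piper nigrum" matches "Piper nigrum".
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


# Placeholder names for illustration only.
example = ["Piper nigrum", "Piper betle", "piper nigrum ", "", "Piper betle"]
print(unique_species(example))
# prints: ['Piper nigrum', 'Piper betle']
```

To use it on a file, read the lines of your species list into a Python list, pass them to `unique_species`, and write the result back out one name per line.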

Thanks for getting in touch and, as I say, let me know if this answers your question.

Cheers,

Will


---
Need a phylogeny? Try phyloGenerator: original or new version
Measuring phylogenetic structure? Try install.packages('pez')

Will Pearse
Post-doc, ecology / evolutionary biology
Davies and Peres-Neto labs
Skype: will.pearse
Cell: (+1) 514-973-1987

Sandeep Sen

Aug 11, 2016, 1:34:32 PM
to Will, phyloGenerator Users
Thanks Will! It is working perfectly!

I will exclude those duplicate species names in my further analysis.

Thank you

Sandeep

Will

Aug 12, 2016, 4:58:19 AM
to Sandeep Sen, phyloGenerator Users
No worries! Glad to hear it.

Thanks,

Will