Missing sequences


gaiar...@gmail.com

Aug 25, 2015, 12:19:22 AM
to phyloGenerator Users
Hello Will,
I am using phyloGenerator1 for the first time and I am trying to build a phylogeny for 30 plant species. My problem is when the program checks for DNA and gives the sequence summary. I am using the presets for plants, and in the output there are a few species with '0'. However, when I search for these species on GenBank, there are sequences available ("rosmarinus_officinalis", for example).
I read your manuscript and the tutorials, but as a non-phylogeneticist I couldn't figure out the difference between these species and the others for which phyloGenerator gives the sequences.

Thanks in advance and congrats on the program,
Marilia

Will

Aug 25, 2015, 9:46:44 AM
to gaiar...@gmail.com, phyloGenerator Users
Hello,

Thanks for getting in touch and using the program.

The most likely reason is that pG is not downloading these sequences because it's checking them and finding some problem with them (e.g., they wouldn't align well with your data). Two potential solutions: try 'reload'-ing the species you're missing stuff for, and if that doesn't work try 'replace' and then 'THOROUGH'. That last one should replace species where it can with close relatives.

Two final things: I would be grateful if you could send me what operating system you're on (Windows 10, etc.). I've had a few reports like this of late, and I've got a sneaking suspicion about some newer versions of Windows... Secondly, take a look at my R package pez; for a comparatively small plant phylogeny, you might find the 'congeneric.merge' function useful when used with the Zanne et al. tree (http://datadryad.org/resource/doi:10.5061/dryad.63q27). At the very least, it would be a good way to get a constraint tree for your analysis.

Thanks again,

Will

---
Need a phylogeny? Try phyloGenerator: original or new version
Measuring phylogenetic structure? Try install.packages('pez')
Want something to read? Try PEGE journal club or willeerd

Will Pearse
Post-doc, ecology / evolutionary biology
Davies and Peres-Neto labs
Skype: will.pearse
Cell: (+1) 514-973-1987

Marilia Gaiarsa

Aug 25, 2015, 5:38:28 PM
to Will, phyloGenerator Users
Hey Will, thanks for the fast reply.

I am using Mac OS X (10.10.5).

Yesterday I ran the "speciesShorter.txt" of the British Birds demo because I thought there could be something wrong with my files and it returned all zeros! Any tips on that?
I'll try the 'replace' and then 'THOROUGH' today on my data and get back to you.

Thanks again,

Marilia
--
Marilia P. Gaiarsa
PhD Candidate, Guimarães Lab
Universidade de São Paulo, São Paulo, Brazil



Will

Aug 25, 2015, 9:02:42 PM
to Marilia Gaiarsa, phyloGenerator Users
Hello,

Oh dear, that sounds annoying, sorry! Is there a chance you could send me a copy-paste of what you've typed into pG and what it's spat out at you? That way I'll have a better idea of what's going on. It sounds suspiciously to me like a file encoding problem...

Thanks!

Will



Marilia Gaiarsa

Aug 26, 2015, 12:05:24 AM
to Will, phyloGenerator Users
Hey Will, 
I restarted my computer (!) and did the 'replace' and then 'THOROUGH' and this time it worked.
Thanks so much!! I'll keep "playing" and hopefully won't have to bother you again.

Cheers,
Marilia 

Will

Aug 26, 2015, 12:23:40 PM
to Marilia Gaiarsa, phyloGenerator Users
Hello,

Just to be sure: when you say it worked, do you mean that THOROUGH worked, or that it downloaded sequences now? Again, I would really be grateful for a copy-paste of your output if it's not downloading anything from the demo files, because they really should work!

At any rate, I'm glad it's working out! No worries about emailing; I'm grateful for the feedback.

Thanks,

Will



Marilia Gaiarsa

Aug 26, 2015, 8:50:08 PM
to Will, phyloGenerator Users
Sorry, I meant that the THOROUGH worked - but that was yesterday. I just ran everything again (to send you the print screen) and this time it only found one species (before it didn't find seven)... I used the same file from yesterday and ran the THOROUGH again, but with no success this time. 
Any tips?

Thanks and sorry for all the trouble.
Marilia

Screen Shot 2015-08-27 at 12.48.13 PM.png
Screen Shot 2015-08-27 at 12.42.50 PM.png
Screen Shot 2015-08-27 at 12.42.45 PM.png

Will

Aug 27, 2015, 8:56:12 AM
to Marilia Gaiarsa, phyloGenerator Users
Hello,

Please don't apologise - you're not causing any trouble! It looks, from this, like my instructions on the website might not have been clear enough, and so I'm sorry for that. When you first start pG, specify "-genes rbcL,matK". You're searching for a gene called 'plants', instead of using the genes that are the default for plants.

If that doesn't do the trick (...but I'm almost certain it will!) then please send me a copy-paste of everything you've typed into the program - print-screens of the end of the run aren't always everything I need. It would probably help to have the species list that you're giving pG too - some of these species (e.g., "linus_sp_m_pl_158..." or something like that) aren't really genus_species names, and I would be surprised if pG can find anything for them.
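
A quick way to screen a species list for entries like that before handing it to pG is sketched below in Python. It is only a rough check under an assumed rule (genus and species made of letters, joined by one underscore); pG's own name handling may well differ, and `split_species_names` is a made-up helper name.

```python
import re

# Assumed pattern for a usable name: genus and species, letters only,
# joined by a single underscore (e.g. "rosmarinus_officinalis").
# This is an illustrative rule, not the check pG itself performs.
NAME_PATTERN = re.compile(r"^[A-Za-z]+_[A-Za-z-]+$")


def split_species_names(names):
    """Return (looks_ok, looks_odd) lists for the given names."""
    looks_ok, looks_odd = [], []
    for raw in names:
        name = raw.strip()
        if not name:
            continue  # skip blank lines
        if NAME_PATTERN.match(name):
            looks_ok.append(name)
        else:
            looks_odd.append(name)
    return looks_ok, looks_odd


if __name__ == "__main__":
    example = ["rosmarinus_officinalis", "linus_sp_m_pl_158", ""]
    ok, odd = split_species_names(example)
    print("look fine:", ok)
    print("worth checking by hand:", odd)
```

Names that land in the second list are not necessarily wrong, they are just the ones worth looking up on GenBank by hand.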

Thanks again,

Will



Marilia Gaiarsa

Aug 29, 2015, 7:52:23 PM
to Will, phyloGenerator Users
Oh my that's embarrassing! I was in a rush to send you the print screens and didn't pay attention. I did it right this time and it worked - thanks so much. 

One last question for the specialist - should I keep trimming species 7 and 23 until the "^^^^" disappears?

And finally, I am aware that the "species" "linus_sp_m_pl_158" won't work, but I was just checking how pG would deal with those =)

Thanks and congrats again
M.
pG.pdf

Will

Aug 29, 2015, 8:15:03 PM
to Marilia Gaiarsa, phyloGenerator Users
Hello,

No worries, happens to everyone!

You should definitely trim them, because it looks like you've downloaded an entire chloroplast. Something will always be the longest/shortest sequence in a run, so trimming until the warnings go away might not always be the best bet, but in cases like this yes you should definitely trim.
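
For anyone who wants to spot this kind of outlier in a FASTA file outside pG, here is a minimal Python sketch. It is independent of pG's own warnings: the function names are made up, the toy sequences are invented, and the "three times the median length" threshold is an arbitrary assumption rather than a recommended cut-off.

```python
from statistics import median


def read_fasta(text):
    """Parse FASTA text into a dict of {name: sequence}."""
    sequences = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line
    return sequences


def flag_long_sequences(sequences, factor=3.0):
    """Return names whose length exceeds `factor` times the median length.

    The default factor of 3 is an arbitrary illustrative threshold.
    """
    if not sequences:
        return []
    typical = median(len(seq) for seq in sequences.values())
    return [name for name, seq in sequences.items()
            if len(seq) > factor * typical]


if __name__ == "__main__":
    # Invented toy data: two short sequences and one much longer one.
    toy = ">sp_a\nACGTACGT\n>sp_b\nACGTACGT\n>sp_c\n" + "ACGT" * 20 + "\n"
    print(flag_long_sequences(read_fasta(toy)))
```

As noted above, something will always be the longest sequence in a run, so a flag from a check like this is a prompt to look at the sequence, not a reason to trim automatically.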

Glad to see you're checking up on its performance :D Let me know if you find something isn't working!...

Will



estibal...@gmail.com

Feb 18, 2016, 2:16:09 AM
to phyloGenerator Users
Hi Will,

congrats on the program - seems really handy!

I am going through a similar problem as Marilia. The program runs smoothly, but doesn't seem to download any sequence from GenBank. So far I've tried different gene specifications (only rbcL, rbcL and matK, 'plant'), but none of them has worked.

Would you be able to have a look at my files and let me know if something seems to be wrong? Thanks!

Cheers,
Esti

Will

Feb 18, 2016, 10:56:47 AM
to estibal...@gmail.com, phyloGenerator Users
Hello,

Thanks for getting in touch. It's difficult for me to say what's going on without any more information, but if you send me (1) the species file (with all the names), (2) a copy-paste of what you've typed into the program, and (3) what operating system you're on (Mac, Windows, etc.) I'll take a look for you.

Cheers,

Will

PS - I'm travelling Friday-Sunday, but if you get this to me now I might be able to figure it out before I leave :D



Estíbaliz Palma

Feb 18, 2016, 6:11:35 PM
to Will, phyloGenerator Users
Hi Will, 

thanks for the quick reply. I couldn't send my files before (sorry - contacting from Australia), but not a problem if you can't have a look until next week :)

I've attached the species file and the copy/paste from my phyloGenerator session. Re the species list, I am working with the codes from GenBank directly, because some of my species lacked info in GenBank and I replaced them with the most appropriate alternative before running the program (instead of letting the program look for available alternatives). I hope this is not a problem. I've attached an Excel file in case you need to have a look at how the codes match up with species names.

Thanks again for your help!

I hope you have a very nice weekend. Cheers,

Esti
 
species.txt
GenBank codes.xlsx
FirstAttempt_FW_CODE.pdf

Will

Feb 18, 2016, 8:24:33 PM
to Estíbaliz Palma, phyloGenerator Users
Hello,

Thanks for this. If you're going to give phyloGenerator TaxonIDs from GenBank and not species names, you need to tell it. You can do this by running it with the "-taxonIDs" argument. From your file encodings, I think you're on Windows (right?), so that means you'll have to run phyloGenerator by opening command prompt, navigating to wherever pG is saved (e.g., "cd Desktop\phyloGenerator") and running it by typing "phyloGenerator.exe -species C:\location\of\your\file\species.txt -taxonIDs".

It sounds like you might have specific DNA sequences you want to work with. If you do, remember you can just download those straight from GenBank and give them to pG as DNA.

In case anyone else is reading this, remember that you can find all pG's options if you run it with the "--help" option. I find it much easier to run pG by specifying all the options ahead of time, since it means I can figure everything out in one go, and I can just copy-paste my choices for when I'm doing another analysis.
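
If you keep your choices in a script so they can be reused, the command can be assembled along these lines. This is a minimal Python sketch: `build_pg_command` is a made-up helper, it only uses flags mentioned in this thread ("-species", "-genes", "-taxonIDs"), and whether they can all be combined in a single call is an assumption worth checking against "--help".

```python
def build_pg_command(species_file, taxon_ids=False, genes=None,
                     program="phyloGenerator.exe"):
    """Assemble a phyloGenerator command as a list of arguments.

    Only flags mentioned in this thread are used; whether they can all
    be combined in one call is an assumption, so check "--help" first.
    """
    command = [program, "-species", species_file]
    if genes:
        # Gene names are joined with commas and no spaces, e.g. rbcL,matK
        command += ["-genes", ",".join(genes)]
    if taxon_ids:
        command.append("-taxonIDs")
    return command


if __name__ == "__main__":
    cmd = build_pg_command(r"C:\location\of\your\file\species.txt",
                           taxon_ids=True)
    # Print the command so it can be pasted into the command prompt.
    print(" ".join(cmd))
```

Printing the command rather than running it keeps the sketch harmless: you can read it over, paste it into the command prompt, and save it alongside your analysis.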

Let me know how this works out for you. Please do tell me if you have any more problems!

Cheers,

Will


Estíbaliz Palma

Feb 19, 2016, 1:58:10 AM
to Will, phyloGenerator Users
Awesome! It worked :)

Thanks Will, that was really helpful. For my previous attempt, I opened the program directly from the .exe file. This time, I opened the command prompt, changed my working directory, and typed the line you provided "phyloGenerator.exe -species C:\location\of\your\file\species.txt -taxonIDs". Another difference is that this time it didn't ask if I wanted to use my own gene sequences.

If you download the sequences directly from GenBank, how exactly do you give them to phyloGenerator? Especially if you have more than one gene per species. Also, and this may be a little bit off topic, but do you know of any lookup table from GenBank where gene names/codes are listed? I still need to figure out which genes to use to build the phylogeny...

Oh, and yes, I work with Windows. I forgot to mention before. 

Enjoy your weekend! And thanks again!

Cheers,
Esti

Estíbaliz Palma

Feb 19, 2016, 1:59:59 AM
to Will, phyloGenerator Users
I thought I should attach the code for my second attempt, in case someone wants to have a look :)
SecondAttempt_FW_CODE.pdf

Will

Feb 22, 2016, 11:00:11 AM
to Estíbaliz Palma, phyloGenerator Users
Hello,

Thanks for this, and thanks also for sending round what you typed in. I think it's really useful for people to know what did and didn't work, so thanks for sharing! Since you've raised a few points, I've replied in a semi-structured way below...

Not asking for DNA sequences when providing species/taxonIDs
pG tries to be intelligent, so if you give it taxon IDs to download it doesn't ask for DNA/species because it's already got those to be getting on with. The same goes for all command-line arguments: if you tell pG to align with MAFFT, it won't ask about alignment, it'll just do it. This is another advantage to using command line arguments: your analysis goes a lot quicker.

Giving pG existing DNA data
You can give pG sequences from multiple genes using the -dna argument; there's an example of this in the 'Sillwood Plants' demo folder. For example, "-dna C:\Documents\plants\rbcL_raw.fasta,C:\Documents\plants\matK_raw.fasta" would load two datasets (notice there are two absolute file paths, delimited by commas). pG uses the FASTA sequence format for raw sequences and alignments - look in the demos folder for examples.
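
As a rough illustration, the value for "-dna" could be put together like this in Python. Both helper names are made up, and the "first non-blank line starts with '>'" test is only a loose sanity check on the FASTA format, not how pG validates its input.

```python
def looks_like_fasta(text):
    """True if the first non-blank line starts with '>' (a FASTA header)."""
    for line in text.splitlines():
        if line.strip():
            return line.startswith(">")
    return False


def dna_argument(paths):
    """Join file paths with commas (no spaces) for the -dna argument."""
    for path in paths:
        if "," in path:
            # A comma inside a path would be read as a separator.
            raise ValueError(f"comma in path would split the list: {path}")
    return ",".join(paths)


if __name__ == "__main__":
    paths = [r"C:\Documents\plants\rbcL_raw.fasta",
             r"C:\Documents\plants\matK_raw.fasta"]
    print("-dna", dna_argument(paths))
```

Run as a script, this prints the same "-dna" argument as the example above, with one absolute path per gene.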

A related question I sometimes get is how you can take DNA data from two runs and use them to build a single phylogeny. Say, for example, you were building a phylogeny of 100 mammals and then you realised you forgot ten. Just run those ten new species and then copy-paste the extra sequences into a new file containing both the sequences from the first run and the second. FASTA is a very simple format - you literally just open up the sequences from the first run in Notepad (Textwrangler, emacs, whatever), copy-paste the extra sequences from the second run, then save as a new file.
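
The same copy-paste can be scripted if you prefer. Below is a minimal Python sketch working on FASTA text; `parse_fasta` and `merge_fasta` are made-up names, and skipping a name that appears in both runs is an assumption about what you would want, not something pG requires.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def merge_fasta(first_text, second_text):
    """Combine two runs, keeping the first copy of any repeated name."""
    merged, seen = [], set()
    for header, seq in parse_fasta(first_text) + parse_fasta(second_text):
        if header in seen:
            continue  # assumption: a repeated name is a duplicate to skip
        seen.add(header)
        merged.append(f">{header}\n{seq}\n")
    return "".join(merged)


if __name__ == "__main__":
    # Invented toy sequences standing in for the two runs.
    first_run = ">sp_one\nACGT\n>sp_two\nGGCC\n"
    second_run = ">sp_three\nAATT\n"
    print(merge_fasta(first_run, second_run), end="")
```

If each gene has its own file, merge gene by gene so that sequences from different genes never end up in the same file.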

Gene name lookup table for GenBank
I don't know of such a thing, but I wish there were one! I think the problem of labelling regions of sequences is a field in and of itself, but this site (http://www.ncbi.nlm.nih.gov/gene) is not a bad place to start looking.

I hope that helps. Please do let me know how you get on!

Cheers,

Will

