Hello,
Good news! I've fixed the species file (and emailed the corrected version to you), and I've also made the phylogeny/constraint tree for you using my new R package pez, which contains some phyloGenerator (pG) functions. I describe both below in case they're of use to other people.
The weird output is because your file was saved in a file encoding I've not seen before. phyloGenerator (or, more precisely, Python) was getting confused by the way your file told the computer which characters it contained. This has come up for others before: I just saved it as a UTF-8 file and it was all fine. If you use the emacs text editor, like me, type 'C-x C-m f', enter 'utf-8', and save; otherwise, I'm sure there's an option in 'save as' to alter your encoding.
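If you'd rather do the conversion in R, something like the sketch below should work. Note that resave_as_utf8 is a helper name I've made up for illustration (it is not part of pez or pG), and you need to know the file's current encoding; the "latin1" in the usage line is only a placeholder, since I don't know what your file was actually saved as.

```r
# Re-save a text file as UTF-8 using base R only.
# 'from' is the file's current encoding (an assumption you must supply).
resave_as_utf8 <- function(in_file, out_file, from) {
  # Read the file as raw bytes so R makes no guesses about its encoding
  bytes <- readBin(in_file, what = "raw", n = file.size(in_file))
  # iconv accepts a list of raw vectors and returns a character vector
  text <- iconv(list(bytes), from = from, to = "UTF-8")
  if (is.na(text)) {
    stop("Could not convert from '", from, "'; is that the right encoding?")
  }
  # Write the converted bytes back out unchanged
  writeBin(charToRaw(text), out_file)
  invisible(out_file)
}
```

You would then call, for example, resave_as_utf8("all.species.names.largedb.txt", "all.species.names.utf8.txt", from = "latin1"), swapping in the real encoding and whatever output name you like.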
You're trying to build a very large phylogeny, so I would recommend using some kind of constraint tree if you can. Here is a way to build one very easily using pez in R:
# Load pez package
library(pez)
# Load phylogeny and species
tree <- read.tree("Vascular_Plants_rooted.dated.tre")
species <- read.delim("all.species.names.largedb.txt",
                      as.is=TRUE)[,1]
# Bind missing species into the tree via their congeners,
# then drop unwanted species
tree <- congeneric.merge(tree, species)
tree <- drop.tip(tree, setdiff(tree$tip.label, species))
...for this I assume you have the phylogeny from Zanne et al. (2014) on your hard-drive somewhere. Note that, in this case, ~550 of your ~560 species are now in the phylogeny. So you can either use this as a constraint or just go ahead and use it as the tree for whatever you're doing.
pez is simply running the merge/replace option in pG on the Zanne tree.
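If you want to see exactly which of your species didn't make it into the tree, a quick check along these lines might help. It's only a sketch: report_coverage is a hypothetical helper of my own (not a pez function), and the toy names below are made up purely so the example runs on its own.

```r
# Compare a species list against a tree's tip labels.
report_coverage <- function(tip_labels, species) {
  species <- unique(species)
  # Species requested but absent from the tree
  missing <- setdiff(species, tip_labels)
  list(n_matched = length(species) - length(missing),
       missing = missing)
}

# Toy data standing in for tree$tip.label and your species vector
toy_tips <- c("Quercus_alba", "Quercus_rubra", "Acer_rubrum")
toy_species <- c("Quercus_alba", "Acer_rubrum", "Pinus_strobus")
toy <- report_coverage(toy_tips, toy_species)
print(toy$missing)
```

With the real objects from the code above you would run report_coverage(tree$tip.label, species) and look at the missing element.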
Let me know how you get on,
Will