FAQ: geogfn (geography file) format question
A question on geography file formats (I have put in fake
data instead of real data):
> I'm trying to use your BioGeoBEARS package and I have a
> problem with an input file format...
>
> I'm sorry to bother you for a simple question but I try to
> create the geogfn file with geographic range data, which is
> according to your tutorial a PHYLIP style file, and I cannot
> find how to create this kind of file.
>
> Thank you very much,
>
> Have a nice day,
Some answers for general use:
1. COMMON PROBLEMS IN GEOGRAPHY FILES
Watch out for tabs vs. spaces, and for tabs hidden at the
ends of lines. This user's problem was that their file looked
like this:
===================
5 3 (A B C)
sp1 011
sp2 010
sp3 111
sp4 100
sp5 001
===================
This has 2 problems: (a) there should be tabs instead of
spaces between the taxon names and the 0/1 area characters,
and (b) at the end of the line starting "sp4", there is a
hidden tab.
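Because tabs and trailing whitespace are invisible in most editors, it can help to check the file from within R before loading it. The following is a minimal base-R sketch, not part of BioGeoBEARS; check_geog_whitespace is a hypothetical name, and it assumes the header looks like "5[TAB]3[TAB](A B C)" and that every species line is a name, one tab, and a string of 0s and 1s:

```r
# Hypothetical helper (NOT part of BioGeoBEARS): report header/species
# lines that use spaces where a tab is expected, or that end in
# hidden whitespace such as a trailing tab.
check_geog_whitespace = function(geogfn)
{
  lines = readLines(geogfn, warn = FALSE)
  problems = character(0)
  for (i in seq_along(lines))
  {
    txt = lines[i]
    if (txt == "") next  # ignore blank lines
    if (grepl("[ \t]+$", txt))
    {
      problems = c(problems, paste0("line ", i, ": trailing space/tab"))
    }
    # Check the layout with any trailing whitespace removed
    trimmed = sub("[ \t]+$", "", txt)
    if (i == 1)
    {
      # Assumed header layout: <nspecies><TAB><nareas><TAB>(area names)
      ok = grepl("^[0-9]+\t[0-9]+\t\\(.*\\)$", trimmed)
    } else {
      # Assumed species line layout: <name><TAB><string of 0s and 1s>
      ok = grepl("^[^ \t]+\t[01]+$", trimmed)
    }
    if (!ok)
    {
      problems = c(problems, paste0("line ", i, ": expected tab-delimited layout"))
    }
  }
  problems
}

# Demonstration on a temporary copy of the problem file: a space after
# "sp1", and a hidden tab at the end of the "sp4" line
bad_fn = tempfile(fileext = ".data")
writeLines(c("5\t3\t(A B C)", "sp1 011", "sp2\t010", "sp3\t111",
             "sp4\t100\t", "sp5\t001"), bad_fn)
check_geog_whitespace(bad_fn)
```

An empty result (character(0)) means no whitespace problems of these two kinds were found; it does not check anything else about the file.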
Once these were fixed, the file loaded fine with these commands:
================================
library(BioGeoBEARS)
# wd = working directory
# Here, wd is set to your home directory ("~");
# change it to the directory containing your geography file
wd = "~"
setwd(wd)
# Set the geogfn (geogfn = geography filename)
geogfn = "test_geog.data"
# Load the input geography file into the tipranges object
tipranges = getranges_from_LagrangePHYLIP(lgdata_fn=geogfn)
tipranges
================================
2. BIOGEOBEARS GEOGRAPHY DATA INPUT FORMAT
The geography file format is the same as that for C++
LAGRANGE, which is basically a PHYLIP data format.
It looks like this:
===================
5 3 (A B C)
sp1 011
sp2 010
sp3 111
sp4 100
sp5 001
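===================
"5" means there are 5 species, and "3" means there are 3 areas.
"A B C" are the names of your areas (you could use other
abbreviations/names, although long names don't look very
good in most plots). Each following line gives a species
name, then a string of 0s and 1s with one digit per area, in
the same order as the area names (1 = present, 0 = absent).
E.g., "sp1 011" means that sp1 occurs in areas B and C.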
...however, I've now realized that tabs etc. don't always
survive being posted to the web or sent by email. Here's the
same file, with each tab shown as "[TAB]":
===================
5[TAB]3[TAB](A B C)
sp1[TAB]011
sp2[TAB]010
sp3[TAB]111
sp4[TAB]100
sp5[TAB]001
===================
...the other spaces, between "A", "B", and "C", can be ordinary
spaces (I can't remember whether I wrote the parser to accept
tabs there as well). I made [TAB] the delimiter to match the
C++ LAGRANGE input format, and because people might have
spaces in their species names.
However, a good general rule for phylogenetics programs is:
NEVER PUT SPACES IN FILENAMES OR TAXON NAMES.
USE "_" INSTEAD OF " " AND YOUR LIFE WILL BE SIMPLER.
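If your range data are already in R, letting R write the file is one way to get the tabs right and follow the underscore rule at the same time. The following is a minimal base-R sketch, not a BioGeoBEARS function; write_geog_file is a hypothetical name, and it assumes a 0/1 matrix with one row per species (row names = species names) and one column per area (column names = area names):

```r
# Hypothetical helper (NOT part of BioGeoBEARS): write a 0/1
# presence/absence matrix as a LAGRANGE/PHYLIP-style geography file.
# Rows = species (row names = species names); columns = areas.
write_geog_file = function(presence, geogfn)
{
  stopifnot(is.matrix(presence), all(presence %in% c(0, 1)),
            !is.null(rownames(presence)), !is.null(colnames(presence)))
  # Follow the "no spaces" rule: whitespace in species names -> "_"
  species = gsub("[[:space:]]+", "_", trimws(rownames(presence)))
  # Header: <number of species>[TAB]<number of areas>[TAB](area names)
  header = paste0(nrow(presence), "\t", ncol(presence), "\t(",
                  paste(colnames(presence), collapse = " "), ")")
  # Species lines: <name>[TAB]<one 0/1 digit per area, in column order>
  ranges = apply(presence, 1, paste, collapse = "")
  writeLines(c(header, paste0(species, "\t", ranges)), geogfn)
  invisible(geogfn)
}

# Example: the 5-species, 3-area file shown above
presence = matrix(c(0, 1, 1,
                    0, 1, 0,
                    1, 1, 1,
                    1, 0, 0,
                    0, 0, 1),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(paste0("sp", 1:5), c("A", "B", "C")))
# Written to a temporary directory here; use your own path instead
geogfn = file.path(tempdir(), "test_geog.data")
write_geog_file(presence, geogfn)
```

The resulting file can then be loaded with getranges_from_LagrangePHYLIP(lgdata_fn=geogfn), as in section 1. If you rename taxa this way, rename them the same way in any other files that refer to them.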
3. VIEWING THE EXAMPLE FILES
The example files for BioGeoBEARS are included within the R
package when you install BioGeoBEARS. They live in the
"extdata" (extdata = extension data) directory of the
installed package.
However, the R package can be installed in many different
places, depending on your operating system and R setup.
To find out where your R package is installed, the example
script uses the following commands:
=======================================
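# np() comes with BioGeoBEARS (a shortcut for normalizePath),
# so load the package first
library(BioGeoBEARS)
# You can find the input files at:
extdata_dir = np(system.file("extdata", package="BioGeoBEARS"))
extdata_dir
list.files(extdata_dir)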
=======================================
- "system.file" finds the path of the BioGeoBEARS installation
- "np" is just a shortcut for the function "normalizePath"
- "normalizePath" converts paths into the appropriate format
for Windows (e.g., \\example_directory\\subdirectory\\) or
Mac/Linux (e.g., /example_directory/subdirectory/)
- "extdata_dir" stores the location of the extdata directory
("extdata_dir" = "extension data directory")
- "list.files" lists the files in the input directory. It is
similar to the "ls" command in the Terminal, or the "dir"
command in the DOS/Windows command line.
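If you would rather not rely on the np() shortcut, or want a script to cope with BioGeoBEARS not being installed, the same lookup can be done with base R alone. A minimal sketch; find_extdata is a hypothetical helper name, not a BioGeoBEARS function:

```r
# Hypothetical helper (NOT part of BioGeoBEARS): return the normalized
# path of an installed package's "extdata" directory, or NA if the
# package or directory is not found. Calls base R's normalizePath()
# directly instead of the np() shortcut.
find_extdata = function(pkg)
{
  # system.file() returns "" (rather than an error) if nothing is found
  extdata_dir = system.file("extdata", package = pkg)
  if (extdata_dir == "")
  {
    return(NA_character_)
  }
  normalizePath(extdata_dir)
}

extdata_dir = find_extdata("BioGeoBEARS")
if (is.na(extdata_dir))
{
  message("BioGeoBEARS (or its extdata directory) was not found")
} else {
  print(list.files(extdata_dir))
}
```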
To get your working directory (getwd), type:
getwd()
To set your working directory, use setwd, e.g.:
setwd(extdata_dir)
After you have set your working directory, you don't have to
keep specifying the full path when referring to files in
that directory.
E.g., you can just type
list.files()
...to view the files in the current working directory.
Now that you know where the example files live, you can
navigate to that directory using Finder (Macs), Windows
Explorer (Windows), or whatever program you like.
The files can be opened in a text editor. I recommend using
a "plain text" editor (e.g., TextWrangler/BBEdit on Macs,
Notepad or NoteTab on Windows), rather than, e.g., Word,
WordPad, etc. The more complex programs can insert all
kinds of weird invisible characters into your files and screw
them up.
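One way to look for that kind of invisible debris from R is to scan the file's raw bytes. The sketch below is a rough heuristic, not a BioGeoBEARS function (check_geog_bytes is a hypothetical name), and it assumes a clean geography file is plain ASCII: it reports the numbers of lines containing non-ASCII bytes (e.g., non-breaking spaces, "smart" quotes, or a byte-order mark at the start of the file) or control characters other than tab, newline, and carriage return.

```r
# Hypothetical helper (NOT part of BioGeoBEARS): return the numbers of
# lines containing non-ASCII bytes or unexpected control characters.
# Assumes a clean geography file is plain ASCII text.
check_geog_bytes = function(geogfn)
{
  vals = as.integer(readBin(geogfn, what = "raw", n = file.info(geogfn)$size))
  is_newline = vals == 10
  # Line number of each byte = 1 + number of newlines before it
  line_of_byte = cumsum(is_newline) - is_newline + 1
  # Allowed control bytes: tab (9), newline (10), carriage return (13)
  is_bad = vals > 127 | (vals < 32 & !(vals %in% c(9, 10, 13)))
  sort(unique(line_of_byte[is_bad]))
}

# Demonstration: a non-breaking space (UTF-8 bytes C2 A0) where the
# tab should be on line 2
odd_fn = tempfile(fileext = ".data")
writeBin(c(charToRaw("5\t3\t(A B C)\nsp1"), as.raw(c(0xc2, 0xa0)),
           charToRaw("011\n")), odd_fn)
check_geog_bytes(odd_fn)
```

This only points at suspicious lines; the fix is still to retype them in a plain text editor.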
(As you can tell, I'm slowly writing a FAQ, as inspired by
questions, so keep them coming.)
--
====================================================
Nicholas J. Matzke, Ph.D.
NIMBioS Postdoctoral Fellow in Mathematical Biology
National Institute for Mathematical and Biological Synthesis
(NIMBioS, www.nimbios.org)
Cell: 510-301-0179
Email: mat...@nimbios.org
Links to CV, R packages, etc.:
http://phylo.wikidot.com/nicholas-j-matzke
Also: Brian O'Meara Lab
Postdoc office: 425a Hesler
Department of Ecology and Evolutionary Biology
University of Tennessee, Knoxville
http://www.brianomeara.info/
NIMBioS Office: Claxton Bldg. #110B
Office phone: 865-974-4873
NIMBioS:
1122 Volunteer Blvd., Suite 106
University of Tennessee
Knoxville, TN 37996-3410
Phone: (865) 974-9334
Fax: (865) 974-9300
-----------------------------------------------------
"[W]hen people thought the earth was flat, they were wrong.
When people thought the earth was spherical, they were
wrong. But if you think that thinking the earth is spherical
is just as wrong as thinking the earth is flat, then your
view is wronger than both of them put together."
Isaac Asimov (1989). "The Relativity of Wrong." The
Skeptical Inquirer, 14(1), 35-44. Fall 1989.
http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
====================================================