Re: Beauti: SNPs data into nexus format

Remco Bouckaert

Jun 12, 2012, 4:06:29 PM
to beast...@googlegroups.com
Hi Jostein,

At the moment, SNAPP only accepts nexus files with binary sequences, one
sequence for every individual. Also, names are expected to have a
separator, like underscore, in them in the hope of being able to group
them together. Something like the example below should work.

BTW, you might have some trouble running SNAPP with 190 individuals due
to the amount of computation involved. Make sure you run with as many
threads as you can. Alternatively, remove some individuals to start with
and add more when you think it runs fast enough not to outrun your
patience.

Remco


#nexus

BEGIN Taxa;
DIMENSIONS ntax=2;
TAXLABELS
[1] 'taxon_1'
[2] 'taxon_2'
;
END; [Taxa]

BEGIN Characters;
DIMENSIONS nchar=50;
FORMAT
datatype=STANDARD
missing=?
gap=-
symbols="01"
labels=left
transpose=no
interleave=no
;
MATRIX
'taxon_1' 10001010000000100010000001000001010100000010001000
'taxon_2' 01010001000000000000100000000010000100000000001000
;
End;



On Tue, 2012-06-12 at 04:58 -0700, Jostein Gohli wrote:
> Hi all,
>
>
>
> I've got a huge load of SNPs (~28K, 190 individuals) from a RAD tag
> run that I want to analyse in SNAPP (http://snapp.otago.ac.nz/).
>
>
> I want to do model selection and conversion to XML format in Beauti.
>
>
> Currently, my data is in the following format:
>
>
> SNP 1 2 3 4 5 6
> ind001 4/0 4/4 3/3 0/0 0/0 4/0
> ind002 0/0 4/4 3/0 0/0 0/0 0/0
> ind004 4/4 4/4 3/3 3/0 2/0 4/4
> ind005 0/0 4/4 1/3 3/0 2/0 4/0
> ind006 4/4 4/4 3/3 0/0 0/0 4/0
> ind007 0/0 4/4 3/3 0/0 0/0 4/4
> ind008 4/4 4/4 3/0 0/0 0/0 4/4
> ind009 4/0 4/4 3/3 0/0 0/0 0/0
>
>
>
> I need to get this data into a nexus file in order to get it into
> Beauti.
>
>
> I really hope someone can help me out with this. Thanks a lot!
>
>
> Jostein


Jostein Gohli

Jun 14, 2012, 5:15:24 AM
to beast...@googlegroups.com
Hi Remco,

Thanks a lot for your prompt answer to my query.

Could I bother you to explain exactly how my data,

e.g. 4/4 0/4 1/1

translates into binary sequences?

Thanks again :) 

Remco Bouckaert

Jun 14, 2012, 4:28:08 PM
to beast...@googlegroups.com
On Thu, 2012-06-14 at 02:15 -0700, Jostein Gohli wrote:
> Could I bother you with explaining exactly how my data
>
>
> e.g. 4/4 0/4 1/1
>
>
> translates into binary sequences.
>

Can you tell me a bit more about the meaning of this data format? Is it
diploid, with every column containing only two possible values? If so,
choose a dominant value for each column and replace each value with 1 if
it matches and 0 if it does not.

For your data

> SNP 1 2 3 4 5 6
> ind001 4/0 4/4 3/3 0/0 0/0 4/0
> ind002 0/0 4/4 3/0 0/0 0/0 0/0
> ind004 4/4 4/4 3/3 3/0 2/0 4/4
> ind005 0/0 4/4 1/3 3/0 2/0 4/0
> ind006 4/4 4/4 3/3 0/0 0/0 4/0
> ind007 0/0 4/4 3/3 0/0 0/0 4/4
> ind008 4/4 4/4 3/0 0/0 0/0 4/4
> ind009 4/0 4/4 3/3 0/0 0/0 0/0


Choose dominant value 4/4 4/4 3/3 0/0 0/0 4/0
then this translates into

> ind001 1/0 1/1 1/1 0/0 0/0 1/0
> ind002 0/0 1/1 1/0 0/0 0/0 0/0
etc.

Hope this helps,

Remco
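
A minimal Python sketch of the recoding rule described above, assuming the reference ("dominant") allele for each column has already been chosen by hand. The function names are made up for illustration and are not part of SNAPP or BEAUti.

```python
def recode_genotype(genotype, reference_allele):
    """Recode one 'a/b' genotype: 1 where the allele matches the reference, else 0."""
    alleles = genotype.split("/")
    return "/".join("1" if a == reference_allele else "0" for a in alleles)


def recode_row(genotypes, reference_alleles):
    """Recode a whole row, using one reference allele per SNP column."""
    return [recode_genotype(g, ref) for g, ref in zip(genotypes, reference_alleles)]


# First three columns of ind001 and ind002 from the table above.
# The reference alleles 4, 4 and 3 are picked by hand (an assumption).
refs = ["4", "4", "3"]
ind001 = recode_row(["4/0", "4/4", "3/3"], refs)
ind002 = recode_row(["0/0", "4/4", "3/0"], refs)
print(" ".join(ind001))  # 1/0 1/1 1/1
print(" ".join(ind002))  # 0/0 1/1 1/0
```

Note that at this stage 0 is treated as just another value; whether it should instead count as missing depends on what the codes mean.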


Jostein Gohli

Jun 18, 2012, 4:56:22 AM
to beast-users, re...@cs.auckland.ac.nz
Hi Remco

Sorry for the late reply, I thought I had this thread on e-mail alert,
but apparently not.

The data is as follows.

For one individual and one SNP position, the two numbers separated by
a slash translate into the diploid SNP genotype.

There are 5 different states and all states can occur together: 0, 1,
2, 3, 4 -> N, G, A, T, C, where N is just missing data or
"hemizygote" (when we don't have the coverage to conclude that we have
a homozygote).

So, 0/4 is the hemizygote N/C. In this case we are sure that C occurs
at this particular SNP, but we have not observed proportionally enough
C's to conclude that the individual is homozygous at this SNP
position.

3/3 is homozygote T/T

1/2 is heterozygote G/A.

NB! None of the data is phased.

Thanks again Remco!

Jostein Gohli

Jun 18, 2012, 6:51:58 AM
to beast-users, re...@cs.auckland.ac.nz
Sorry, I misunderstood you. I had triallelic data but removed it, so
yes I can use the method of coding you outlined here.

There are, however, a couple of problems. How do I code missing data?
Often there will be heterozygotes and also individuals with missing
data or insufficient coverage for calling a homozygote. So there will
be three states (e.g. N, A and C or 0, 2, 4). How can I include this
missing data? Can I just replace the N's or 0's with "?"?

Also, is it crucial that I call the dominant value (most numerous
value) as 1 and the other value as 0 throughout? What I'm asking is,
does SNAPP treat 1's and 0's differently?

Lastly, the coding you outlined:

> ind001 1/0 1/1 1/1 0/0 0/0 1/0
> ind002 0/0 1/1 1/0 0/0 0/0 0/0

I can't get this into a nexus file in a way that Beauti accepts.

Thanks again :)

Remco Bouckaert

Jun 18, 2012, 6:45:44 PM
to Jostein Gohli, beast-users
On Mon, 2012-06-18 at 03:51 -0700, Jostein Gohli wrote:
> There are however a couple of problems. How do I code missing data?
> Often there will be heterozygotes and also individuals with missing
> data or insufficient coverage for calling a homozygote. So there will
> be three states (e.g. N, A and C or 0, 2, 4). How can I include this
> missing data? Can I just replace the N's or 0's with "?"?

There is no support for missing data at the moment, but it is at the
top of the todo list. If there are not too many sites with missing data,
you could just remove these sites from the dataset.

> Also, is it crucial that I call the dominant value (most numerous
> value) as 1 and the other value as 0 throughout? What I'm asking is,
> does SNAPP treat 1's and 0's differently?

It only treats them differently in that the mutation rate from 0 to 1
differs from the rate from 1 to 0. Otherwise the choice of 1 is
arbitrary.

> Lastly, the coding you outlined:
>
> > ind001 1/0 1/1 1/1 0/0 0/0 1/0
> > ind002 0/0 1/1 1/0 0/0 0/0 0/0
>
> I can't get this into a nexus file in a way that Beauti accepts.

You have to put it in nexus format, like so

#nexus

BEGIN Taxa;
DIMENSIONS ntax=2;
TAXLABELS
[1] 'ind_001'
[2] 'ind_002'
;
END; [Taxa]

BEGIN Characters;
DIMENSIONS nchar=50;
FORMAT
datatype=STANDARD
missing=?
gap=-
symbols="01"
labels=left
transpose=no
interleave=no
;
MATRIX
'ind_001' 10001010000000100010000001000001010100000010001000
'ind_002' 01010001000000000000100000000010000100000000001000
;
End;

Remco
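
For many individuals, a file in the layout above is easier to generate than to type. A minimal Python sketch, assuming the binary sequences are already held in a dict keyed by taxon name; `to_nexus` is a hypothetical helper, not part of BEAUti or SNAPP, and it simply reproduces the block layout shown in the example.

```python
def to_nexus(sequences):
    """Return NEXUS text for a dict mapping taxon name -> string of 0/1 characters."""
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    nchar = lengths.pop()

    lines = ["#nexus", "", "BEGIN Taxa;",
             f"DIMENSIONS ntax={len(sequences)};", "TAXLABELS"]
    for i, name in enumerate(sequences, start=1):
        lines.append(f"[{i}] '{name}'")
    lines += [";", "END; [Taxa]", "", "BEGIN Characters;",
              f"DIMENSIONS nchar={nchar};", "FORMAT", "datatype=STANDARD",
              "missing=?", "gap=-", 'symbols="01"', "labels=left",
              "transpose=no", "interleave=no", ";", "MATRIX"]
    for name, seq in sequences.items():
        lines.append(f"'{name}' {seq}")
    lines += [";", "End;"]
    return "\n".join(lines) + "\n"


# Toy input (made-up sequences); names keep the underscore separator.
example = {"ind_001": "10001", "ind_002": "01010"}
nexus_text = to_nexus(example)
print(nexus_text)
```

Writing `nexus_text` to a file with a `.nex` extension should give something that can be tried in Beauti.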


Jostein Gohli

Jun 19, 2012, 5:41:18 AM
to beast-users, re...@cs.auckland.ac.nz
Hi Remco,

Thanks for your help!

When you say that there is no support for missing data, do you mean in
Beauti or SNAPP or both? You say that it's on the top of the todo
list... Would you risk giving an ETA, as I assume you are involved in
development?

Sadly, there is no complete coverage for any of the SNPs in my data
set. There are always some individuals that didn't have the necessary
coverage to call the SNP as homozygote for those individuals (when a
certain proportion of the reads are N's, we call it as hemizygote
(e.g. T/N)). I'm sure I'll find a way of treating the missing data,
which will include removing a lot of data and making certain bold
assumptions and alterations to the data. I would of course prefer it
to be treated by the software.

Thanks again Remco, I'll be sure to remember your name come
acknowledgment-time.

Jostein

Derek

Jun 19, 2012, 12:56:15 PM
to beast...@googlegroups.com
Perl and Python are well suited for 'data massaging', i.e. for writing programs which would, in your case, convert your data into NEXUS format.



Remco Bouckaert

Jun 19, 2012, 6:05:51 PM
to Jostein Gohli, beast-users
Hi Jostein,

On Tue, 2012-06-19 at 02:41 -0700, Jostein Gohli wrote:
> When you say that there is no support for missing data, do you mean in
> Beauti or SNAPP or both? You say that it's on the top of the todo
> list... Would you risk giving an ETA, as I assume you are involved in
> development?

There is no support in SNAPP, hence Beauti is configured not to load
such data. The code for missing data support is actually already
written, but it needs testing. You can get the jar from
http://code.google.com/p/snap-mcmc/downloads/detail?name=snap.jar&can=2&q=
which should replace the jar in the release, but keep in mind that it is
not tested, hence rather experimental, so it might not work as expected.


> Sadly, there is no complete coverage for any of the SNPs in my data
> set. There are always some individuals that didn't have the necessary
> coverage to call the SNP as homozygote for those individuals (when a
> certain proportion of the reads are N's, we call it as hemizygote
> (e.g. T/N)). I'm sure I'll find a way of treating the missing data,
> which will include removing a lot of data and making certain bold
> assumptions and alterations to the data. I would of course prefer it
> to be treated by the software.

If the code happens to work, you can still use all data. An alternative
approach is to fill in the blanks by randomly assigning values. It would
probably be best to sample from the distribution of the remaining values
at the same site to minimise distortions.


> Thanks again Remco, I'll be sure to remember your name come
> acknowledgment-time.

Thanks, that would be nice.

Remco
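
The random fill-in suggested above could be sketched in Python as follows. It assumes the data is held as one list of values per site and that missing values are marked with "?"; `impute_missing` is a hypothetical helper and is independent of the experimental jar.

```python
import random


def impute_missing(sites, missing="?", seed=0):
    """Fill missing values by sampling from the observed values at the same site."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    filled = []
    for site in sites:
        observed = [v for v in site if v != missing]
        if not observed:
            # Nothing to sample from: leave the site untouched.
            filled.append(list(site))
            continue
        # Drawing uniformly from the observed list samples each value
        # in proportion to its frequency at this site.
        filled.append([v if v != missing else rng.choice(observed) for v in site])
    return filled


# Toy data (made up): three sites, individuals in the same order in each.
sites = [["1", "1", "?", "0"], ["?", "?", "?"], ["0", "?", "0"]]
filled = impute_missing(sites)
print(filled)
```

Sites with no observed value at all are left as they are here; they would still need to be dropped or handled some other way.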

Jostein Gohli

Jun 20, 2012, 6:04:58 AM
to beast-users
Yeah, I was planning on using Python scripts for this stuff, but
thanks for your input :)

On Jun 19, 6:56 pm, Derek <derek.br...@gallaudet.edu> wrote:
> Perl and Python are well suited for 'data massaging', i.e. for writing
> programs which would, in your case, convert your data into NEXUS format.

Jostein Gohli

Jun 20, 2012, 6:10:04 AM
to beast-users, re...@cs.auckland.ac.nz
> There is no support in SNAPP, hence Beauti is configured not to load
> such data. The code for missing data support is actually already
> written, but it needs testing. You can get the jar from http://code.google.com/p/snap-mcmc/downloads/detail?name=snap.jar&can...
> which should replace the jar in the release, but keep in mind that it is
> not tested, hence rather experimental, so it might not work as expected.

That's great! I'll give it a try once my damn mac is back from service
-_-

> If the code happens to work, you can still use all data. An alternative
> approach is to fill in the blanks by randomly assigning values. It would
> probably be best to sample from the distribution of the remaining values
> at the same site to minimise distortions.

That sounds like a clever approach. I'll keep it in mind.

Thanks for all your help :)


Remco Bouckaert

Sep 30, 2012, 3:23:35 PM
to Emiliano, beast...@googlegroups.com
Hi Emiliano,

On Sat, 2012-09-29 at 14:33 -0700, Emiliano wrote:
> I tried to use the new jar file that takes into account missing data
> but without success: the run just crashes immediately writing only a
> single line both in the log and in the tree file. What is your
> experience? Did you succeed in running your analysis with missing
> data?

I have not seen before that the chain just stops after writing out a
single line. However, it is possible that the program still runs, but
due to the number of sequences in the analysis it takes a lot of time
for SNAPP to report the next line in the log file. How many sequences do
you have in the analysis? Are there any error messages on screen?

Kind regards,

Remco

Jostein Gohli

Oct 2, 2012, 4:42:13 AM
to beast...@googlegroups.com, Emiliano
Hi guys,

I'm actually starting up with these analyses again this week, and I have a similar problem to yours, Emiliano.

As I'm struggling, I've tried to get SNAPP to work with the example data first. I've used the aflp_25.nex data file supplied with the SNAPP package. First I tried to get it to run without alterations, which it did. Then I introduced some missing data into the data file and constructed a new XML file through Beauti. While running this file I got similar errors to Emiliano, i.e. spamming of java.lang.ArrayIndexOutOfBoundsException. (I changed nothing in Beauti except for 'Chain length', and the 'Dominant' box was left unchecked.) The files I used are attached.

PS. The file testMissing.xml runs fine in SNAPP.

PPS. Emiliano, I think you should restrict your symbols to only two states (binary). Also, you should check your process manager to see if it is using resources (javaw.exe on Windows). I had a similar experience, so it might just need some time before it starts writing to your log file.

Best regards,

Jostein


On Monday, October 1, 2012 at 11:06:29 UTC+2, Emiliano wrote:
Hi Remco,

here is the header of my NEXUS file
-------
#NEXUS

Begin data;
Dimensions ntax=29 nchar=26521;
Format datatype=STANDARD symbols="012" missing=- gap=?;
Matrix
-------------
I had a long list of "ArrayIndexOutOfBoundsException" errors, as shown in another post, but I didn't tick the "Dominant" box.
I also checked the XML file (attached) but I couldn't find this parameter. This was the first exploratory run, so I left everything at the default settings.

Thanks a lot for the help
Emiliano
aflp_25.nex
aflp_25_missing.xml
run.txt

Jostein Gohli

Oct 2, 2012, 4:45:09 AM
to beast...@googlegroups.com, Emiliano
Also Emiliano, I believe it should be as follows: "gap=- missing=?", not the other way around. And maybe use a subset of your data, not all of your 13 thousand SNPs, when playing around with SNAPP :)

Jostein


Emiliano

Oct 2, 2012, 5:47:58 AM
to beast...@googlegroups.com, Emiliano
Hi Jostein,

you're right, it's better not to mess with the full dataset! Concerning the encoding of the sequences (gap=- missing=?), I thought it didn't matter which character you use as long as you declare it, but I'll try the way you suggested. However, I suppose coding the states as binary only (01) is not really possible, since I have three states (AA=0, Aa=1, aa=2), and the transition probability between 0 and 1 or 2 and 1 on one side, and between 0 and 2 on the other side, should be different.
I will check if Java is running and simply taking a long time to write results.

Thanks a lot for the interest. It is nice to share with others the frustration of using SNP data in a phylogeographic (software) world focused on sequences!
Let's keep in touch

Emiliano

Jostein Gohli

Oct 2, 2012, 7:40:05 AM
to beast...@googlegroups.com, Emiliano
Hi again,

Regarding the coding of missing data, I thought like you (that it was arbitrary), but Remco seemed so adamant about coding missing data as 'missing=?' that I assumed it wasn't optional like it usually is.

Oh, I see (I think). I have unknown phase on my SNPs. So if I have an "AT" SNP, for instance, it can take the following forms: "AA, TT, AT, AN, TN", but never "TA". Hence, I can recode the data to only two states. If I'm not mistaken, SNAPP can only handle biallelic data, so maybe you need to recode your data somehow?

Regards

Jostein 

Remco Bouckaert

Oct 2, 2012, 5:28:41 PM
to beast...@googlegroups.com, Emiliano
Hi Emiliano,

It turns out there is a (non trivial) bug in the code that is dealing
with missing data. This same bug affects the example that Jostein
posted.

I will work on this and let you know when it is ready.

BTW encoding gaps as question marks does not make any difference (any
more).

Regards,

Remco



Jostein Gohli

Oct 3, 2012, 6:02:54 AM
to beast...@googlegroups.com, Emiliano
Thank you very much Remco!

Jostein

Emiliano

Oct 3, 2012, 12:32:38 PM
to beast...@googlegroups.com, Emiliano
Yes, thanks a lot Remco.
Emiliano

Remco Bouckaert

Oct 3, 2012, 3:28:34 PM
to beast...@googlegroups.com, Emiliano, David Bryant
Hi Everyone,

I updated the BEAST 2 add-on for SNAPP to fix the missing data issue
(but not the SNAPP stand-alone version, since there is quite a bit more
work involved in changing those, and I want to wait for the dominant
marker bug to be fixed before getting into that).

It now properly runs the example of Jostein and also the one of
Emiliano. The main issue is with missing data for sites that have no
data at all for one or more of the species. What this fix does is remove
these sites from the data, which may or may not be appropriate. When it
does this, it prints out a message on screen, for example for Emiliano's
XML file, it removes 1137 of the 26521 sites and prints:

26521 sites
25986 patterns
WARNING: removed 1137 sites becaues they have one or more branches
without data.


To use the SNAPP add-on, install BEAST 2.0 if you have not already done
so. Then install the SNAPP add-on. The easiest way to do this is to start
BEAUti, click menu File/Manage Add-ons, select SNAPP from the list and
click the install button. You may need to restart BEAUti.

If you already had the SNAPP add-on installed, uninstall it first: start
BEAUti, click menu File/Manage Add-ons, select SNAPP from the list and
click the un/install button. Then re-install by clicking the un/install
button once more.

You might notice that the dominant flag is removed from BEAUti. This is
because (apart from the bug that prevents use of it right now) we want
to discourage selecting it since it does not seem to change the outcome
from an analysis significantly, but it does add a lot of computation
time to an already computationally intensive algorithm.

Let me know if you run into any more problems,

Remco
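
The site filtering described above can be approximated before loading the data. A rough Python sketch, assuming (this is an assumption, not SNAPP's documented behaviour) that the species label is the part of the taxon name after the last underscore and that "-" or "?" marks missing data. It mimics the described effect and is not the add-on's actual code; both function names are hypothetical.

```python
def species_of(taxon):
    """Assumed naming convention: species label follows the last underscore."""
    return taxon.rsplit("_", 1)[-1]


def drop_sites_without_data(sequences, missing="-?"):
    """Drop every site at which at least one species has no observed value.

    Returns the filtered sequences and the number of sites removed.
    """
    names = list(sequences)
    nchar = len(next(iter(sequences.values())))
    groups = {}
    for name in names:
        groups.setdefault(species_of(name), []).append(name)

    keep = [
        site for site in range(nchar)
        if all(
            any(sequences[n][site] not in missing for n in members)
            for members in groups.values()
        )
    ]
    filtered = {n: "".join(sequences[n][s] for s in keep) for n in names}
    return filtered, nchar - len(keep)


# Toy data (made up): at the second site, species LG has no data at all.
data = {"C137_LG": "0-2", "C140_LG": "0-1", "C160_LP": "020", "C164_LP": "02-"}
filtered, removed = drop_sites_without_data(data)
print(removed)   # 1
print(filtered)  # {'C137_LG': '02', 'C140_LG': '01', 'C160_LP': '00', 'C164_LP': '0-'}
```

A site is kept as long as every species has at least one individual with data, so scattered missing values within a species survive the filter.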





Emiliano

Oct 4, 2012, 4:56:48 AM
to beast...@googlegroups.com, Emiliano, David Bryant
Hi Remco,

I agree with the decision to remove those sites that have no data for one of the species sample sets.
Actually, those sites should have been removed from my data beforehand; this is a problem related to the criteria I used for selection.

I'll try this new version asap and let you know. Thank you so much for the really fast job.
Best,
Emiliano 

Emiliano

Oct 8, 2012, 6:48:26 AM
to beast...@googlegroups.com, Emiliano, David Bryant
Hi Remco,

I tried to run this updated version of the add-on, but BEAUti (or me?) seems to have a problem managing the add-on.
First: the two versions of BEAUti in the beast2 package and in the snapp package are different, aren't they?
If I run the first, the title reads BEAUti2:standard, while in the second it is BEAUti2:SNAPP. The panels are obviously different.

If I run BEAUti2:standard with the SNAPP add-on already installed and load the SNP alignment, I get a message (many times, actually) that it couldn't enter a substitution model.
If I save and run this XML with BEAST2, it stops with the following error:

Error 110 parsing the xml input file

validate and intialize error: Input 'substModel' must be specified.

Error detected about here:
<beast>
<run id='mcmc' spec='MCMC'>
<distribution id='posterior' spec='util.CompoundDistribution'>
<distribution id='likelihood' spec='util.CompoundDistribution'>
<distribution id='treeLikelihood.hys_red' spec='TreeLikelihood'>
<siteModel id='SiteModel.s:hys_red' spec='SiteModel'>

If I run the BEAUti2:SNAPP version and run the XML with BEAST2, I get this other error:


Error 122 parsing the xml input file
Cannot create class: snap.MCMC. Class could not be found. Did you mean beast.core.MCMC?
Error detected about here:
<beast>
<run id='mcmc' spec='snap.MCMC'>
java.lang.Exception: Class could not be found. Did you mean beast.core.MCMC.

Maybe I am doing something wrong when installing the add-on. I just installed BEAST2, opened BEAUti, uninstalled and re-installed the SNAPP add-on, and imported the NEXUS file with the SNPs.
I am using the Windows version. If you need other details, just let me know.

Thanks a lot for the help
Emiliano


Lorna-Masters student

Mar 13, 2014, 5:17:38 AM
to beast...@googlegroups.com, jostei...@gmail.com

Hi All,
I am a new user of both BEAST and SNAPP, so my question is: how do I save my binary sequences in NEXUS format so that Beauti can convert them to an XML file to be used in SNAPP?
Kindly help.

Regards,
Lorna


Lorna Jemosop

Mar 13, 2014, 5:34:19 AM
to Jostein Gohli, beast...@googlegroups.com
Thank you for your apt reply, but where do you enter all these commands for the conversion to take place? At the moment my binary sequences are in a spreadsheet. I am trying to find a demo, without success, so I am banking on this useful discussion.


Regards,
Lorna


On Thu, Mar 13, 2014 at 12:20 PM, Jostein Gohli <jostei...@gmail.com> wrote:
The numbers 0-2 are the number of copies of the alternate base at a diploid site:

#NEXUS

Begin data;
 Dimensions ntax=21 nchar=13;
Format datatype=standard symbols="012" gap=- missing=-;
Matrix

C019_Mo 0 - 0 2 2 2 2 1 2 2 0 2 1
C022_Mo 0 2 1 2 2 2 2 2 1 2 0 2 1
C024_Mo 0 2 0 2 2 2 - 2 1 2 1 2 1
C064_EH 0 0 2 2 2 2 0 0 0 - - 2 0
C075_EH 0 0 2 2 2 2 - 0 0 0 - 2 0
C077_EH 0 0 2 2 2 2 0 0 0 0 2 2 0
C090_FL 0 2 1 2 2 2 2 2 - 2 0 2 0
C094_FL 1 2 - 2 - - - - 1 2 0 - 0
C096_FL 2 2 0 2 2 2 2 2 1 2 0 2 0
C117_GC 0 2 2 0 0 - 0 1 0 1 - - 0
C120_GC 0 2 - 0 0 0 0 0 0 1 0 2 0
C123_GC 0 2 2 0 0 0 0 0 0 1 - 2 0
C137_LG 0 0 - 2 - 2 - 1 - - - - 0
C139_LG 0 1 - 2 1 2 0 2 - - 0 - 0
C140_LG 0 0 - 2 - - - - - - - - 1
C160_LP 0 2 0 2 2 - - 2 2 2 - - 2
C164_LP 0 2 - 2 2 2 2 2 2 2 2 0 2
C165_LP 0 2 - 2 2 2 2 2 - 2 - - 2
C182_Te 0 - 2 2 2 1 0 0 0 0 0 2 2
C188_Te 0 2 2 2 2 1 0 1 1 0 0 2 2
C190_Te 0 2 - 1 2 1 - - 2 1 0 2 2
END;
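
One possible way to produce the 0-2 coding from called genotypes, sketched in Python. It assumes genotypes are written like "A/T", that the alternate base for each site is known, and that any genotype containing "N" is written as missing ("-"); the function name is hypothetical and the treatment of N is an assumption.

```python
def count_alternate(genotype, alternate, missing="N"):
    """Return '0', '1' or '2' copies of `alternate`, or '-' if the call is incomplete."""
    alleles = genotype.split("/")
    if len(alleles) != 2 or missing in alleles:
        return "-"  # assumption: hemizygote calls such as T/N count as missing
    return str(sum(1 for a in alleles if a == alternate))


# Toy genotypes (made up) for one site whose alternate base is T.
calls = ["A/A", "A/T", "T/T", "T/N"]
codes = [count_alternate(g, "T") for g in calls]
print(" ".join(codes))  # 0 1 2 -
```

If the sequences sit in a spreadsheet, exporting it as CSV and reading it with Python's `csv` module would be one way to feed rows into a function like this.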