Infinite Alleles Model for Microsatellite Data / Software Installation

Álvar Veiga Schmitz

Mar 29, 2019, 9:04:22 AM

to migrate-support

Hello! I am a potential new user of Migrate. I'm working with microsatellites, but my data don't strictly fit the stepwise mutation model. I'm reading the manual (version 4.0), but I can't figure out whether I can use my data with an infinite alleles model. My guess is that I can't, since the input uses repeat numbers instead of allele lengths (and my data contain several allele lengths that are not multiples of the repeat unit). However, in the Data Type section of the manual (page 28), one of the data type options reads: "Allele: infinite allele model, suitable for electrophoretic markers, perhaps the "best" guess for codominant markers of which we do not know the mutation model". So I am a bit confused: can I use my data with an infinite alleles model or not? If so, how should the input be formatted, given that the STR data input uses repeat numbers, which I can't obtain from my data?

On the other hand, I can't find an installation section in the manual, nor in the program folder (I downloaded version 4.4.0), nor by googling, so I have no idea how to install it.

Many thanks in advance, any help will be appreciated!

Álvar.

Peter Beerli

Mar 29, 2019, 10:07:05 AM

to migrate...@googlegroups.com

Dear Alvar,

Concerning installation: there is a README file in the distribution. Currently there is no Windows binary, but I am working on that.

Do you think any mutation model is accurate? Do you believe that your msat allele with 100 repeats is as likely to jump to 101 as to 90, 1, or 2000? If you answer yes to the last question, then use the infinite allele model; otherwise use the Brownian motion model for msats. [If you use msats you must also change the priors; the default priors are useful for DNA data.]
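A conceptual way to see the difference behind that question: under the infinite alleles model only identity matters, so every allele other than 100 is equally "different" from it, whereas under Brownian motion the size difference counts. The Python sketch below is illustrative only and is not migrate's likelihood code; the function names and the `variance` parameter (standing in for mutation rate times elapsed time) are hypothetical.

```python
import math


def iam_distance(a, b):
    """Infinite alleles: only identity matters; all different alleles are equally different."""
    return 0 if a == b else 1


def bm_squared_distance(a, b):
    """Brownian motion: the difference in repeat number matters, growing with its square."""
    return (a - b) ** 2


def bm_density(start, end, variance):
    """Gaussian density of ending at `end` after starting at `start` under Brownian motion.

    `variance` is a hypothetical stand-in for mutation rate times elapsed time.
    """
    d = end - start
    return math.exp(-d * d / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


for target in (101, 90, 1, 2000):
    # prints: target, IAM distance (always 1), Brownian squared distance
    print(target, iam_distance(100, target), bm_squared_distance(100, target))
# 101 1 1
# 90 1 100
# 1 1 9801
# 2000 1 3610000
```

Under the infinite alleles model the jump to 2000 is no less plausible than the jump to 101; under Brownian motion it is vanishingly unlikely.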

If you have fragment length data and know the repeat unit, then specify the repeat unit of each locus on the second line of your data file:

#@M l1 l2 ….

For example, if you have 2 loci that are dinucleotides and 2 that are tetranucleotides, then you use

#@M 2 2 4 4

The infinite allele model can be used with the same input format as the fragment length data, for example:

2 3 / example data
2 Tallahassee
i1 124/122 10/11 33/30
i2 122/122 11/11 29/30
2 Quincy
i3 122/122 11/11 30/30
i4 122/122 11/11 29/30

and with fragment lengths for the Brownian motion model:

2 3 / example data
#@M 2 2 3
2 Tallahassee
i1 124/122 9/11 33/30
i2 122/122 11/11 29/30
2 Quincy
i3 122/122 11/11 30/30
i4 122/122 11/11 27/30
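If you are preparing such files from a spreadsheet, a small Python sketch like the following can write the layout shown in the examples above. It assumes the whitespace-separated layout used in this thread; the function name and the nested-list input structure are hypothetical, and the migrate manual should be checked for any further requirements (for example on individual-name width).

```python
def format_migrate_msat(title, populations, repeat_lengths=None):
    """Build a migrate-style microsatellite data file as text (hypothetical helper).

    populations: list of (population_name, individuals), where individuals is a list of
    (individual_name, genotypes) and genotypes is a list of (allele1, allele2), one per locus.
    repeat_lengths: optional repeat unit per locus, written as the #@M line for fragment lengths.
    """
    nloci = len(populations[0][1][0][1])
    lines = [f"{len(populations)} {nloci} / {title}"]
    if repeat_lengths is not None:
        if len(repeat_lengths) != nloci:
            raise ValueError("need one repeat length per locus")
        lines.append("#@M " + " ".join(str(r) for r in repeat_lengths))
    for pop_name, individuals in populations:
        lines.append(f"{len(individuals)} {pop_name}")  # sample size, then population name
        for ind_name, genotypes in individuals:
            if len(genotypes) != nloci:
                raise ValueError(f"{ind_name}: expected {nloci} loci")
            alleles = " ".join(f"{a}/{b}" for a, b in genotypes)
            lines.append(f"{ind_name} {alleles}")
    return "\n".join(lines) + "\n"


data = [
    ("Tallahassee", [("i1", [(124, 122), (9, 11), (33, 30)]),
                     ("i2", [(122, 122), (11, 11), (29, 30)])]),
    ("Quincy", [("i3", [(122, 122), (11, 11), (30, 30)]),
                ("i4", [(122, 122), (11, 11), (27, 30)])]),
]
print(format_migrate_msat("example data", data, repeat_lengths=[2, 2, 3]), end="")
```

Leaving `repeat_lengths` as `None` omits the #@M line.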

Peter

--
You received this message because you are subscribed to the Google Groups "migrate-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to migrate-suppo...@googlegroups.com.
To post to this group, send email to migrate...@googlegroups.com.
Visit this group at https://groups.google.com/group/migrate-support.
For more options, visit https://groups.google.com/d/optout.

Álvar Veiga Schmitz

Mar 29, 2019, 11:18:46 AM

to migrate-support

Hello Peter, thank you for your fast response. I missed the README file; I will check it!

Regarding the fragment lengths and specifying the repeat unit in the input: wouldn't that make the program crash if a fragment length isn't a multiple of the repeat unit? As I understand it, that is why my data wouldn't fit a strict stepwise mutation model (as specified, for example, in the IMa2 software documentation, which also uses repeat numbers in its input for microsatellite data). Here is a small example of some individuals from my data, for 4 loci, all of them dinucleotides, highlighting the cases that concern me:

i1   352/363   190/241   147/147   125/125
i2   355/374   190/226   151/161   125/133
i3   344/355   190/202   147/147   125/133
i4   352/360   190/210   151/161   120/133
i5   353/353   190/216   151/157   116/125

Since all the highlighted lengths are odd and all 4 loci are dinucleotides, the repeat number is not an integer. Is that OK with your software?
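One quick way to see which alleles would count as deviating from a locus's regular pattern is to group lengths by their remainder modulo the repeat unit. An odd length at a dinucleotide locus does not by itself mean a non-integer repeat count, since flanking sequence adds to the fragment length; what matters is whether an allele's remainder matches the rest of the locus. A minimal Python sketch, assuming the most common remainder defines the regular pattern (a hypothetical helper, not part of migrate). The first locus above splits 5/5 between odd and even lengths, so its pattern would be ambiguous and it is left out here.

```python
from collections import Counter


def off_pattern_alleles(lengths, unit):
    """Return fragment lengths whose remainder modulo `unit` differs from the
    most common remainder at this locus (assumed to be the regular pattern)."""
    phase, _ = Counter(x % unit for x in lengths).most_common(1)[0]
    return [x for x in lengths if x % unit != phase]


# Both alleles of i1..i5 for the second and fourth loci in the table above
locus2 = [190, 241, 190, 226, 190, 202, 190, 210, 190, 216]
locus4 = [125, 125, 125, 133, 125, 133, 120, 133, 116, 125]
print(off_pattern_alleles(locus2, 2))  # [241]
print(off_pattern_alleles(locus4, 2))  # [120, 116]
```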

Also, I have data for 2 more loci which have compound repeats (published sequences are (CA)2GA(CA)15 and (TAGG)7TATG(TAGA)13). Does this loci fit the software assumptions? Can I use it without specifying the repeat unit?

Thank you very much for your help!

Peter Beerli

Mar 29, 2019, 12:05:32 PM

to migrate...@googlegroups.com

Dear Alvar,

If you use migrate's fragment length -> repeat number facility, then it will assume that deviations from the most common regular pattern are scoring issues, and currently it fixes these probabilistically. A dinucleotide that is mostly 100, 102, 104 but also has 103 will resolve to something like 10, 11, 12 repeats, and the 103 will be randomly assigned to either 11 or 12. With a tetranucleotide this becomes slightly more involved: for example, 100 104 108 103 96 would resolve to 10 11 12 11* 9, where the 103 may become 11 with p=0.75 and 10 with p=0.25 in different runs. If you do not like that 'rounding' scheme, then either translate using your best guess or use PGDSpider, which will translate using its own scheme.
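The probabilistic rounding described above can be sketched in Python as follows. This is an illustration under assumptions, not migrate's actual code: the function name and the explicit anchor (`ref_length` mapped to `ref_repeats`, e.g. 100 bp to 10 repeats) are hypothetical, and the rounding probability is assumed to be proportional to how close the length is to each neighbouring repeat count. That assumption reproduces the p=0.75 / p=0.25 split for the 103 in the tetranucleotide example and a 50/50 split for the dinucleotide 103.

```python
import random


def lengths_to_repeats(lengths, unit, ref_length, ref_repeats, rng=random):
    """Convert fragment lengths to repeat numbers (hypothetical sketch).

    Off-pattern lengths are rounded at random: up with probability equal to the
    fraction of a repeat unit they lie above the lower neighbouring repeat count.
    """
    repeats = []
    for length in lengths:
        # whole repeat steps from the anchor, plus leftover base pairs (0..unit-1)
        steps, leftover = divmod(length - ref_length, unit)
        if leftover and rng.random() < leftover / unit:
            steps += 1  # round up with probability leftover / unit
        repeats.append(ref_repeats + steps)
    return repeats


print(lengths_to_repeats([100, 104, 108, 103, 96], unit=4, ref_length=100, ref_repeats=10))
# 100, 104, 108 and 96 always give 10, 11, 12 and 9; 103 gives 11 (p=0.75) or 10 (p=0.25)
```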

For compound msats I would simply also use the Brownian motion model. I would expect that you see fewer mutations with it, which will lead to lower population size estimates; I would then mention that as a caveat when discussing the results.

Peter

Álvar Veiga Schmitz

Mar 29, 2019, 12:37:10 PM

to migrate-support

Thank you very much for your help, Peter, that was very clarifying! I think I will let the program decide the rounding for me, haha. I will come back if I run into any problems. Thanks again!

Alvar.
