RAD and GBS as input for migrate

robertkraus

Mar 20, 2015, 4:57:31 AM

to migrate...@googlegroups.com

Hi all!

I might get involved in a project with RAD and GBS data that will include migrate-n analysis. I have checked this group but found no discussions about GBS. Using RAD as a search term, I found this post from about two years ago (https://groups.google.com/forum/#!searchin/migrate-support/rad/migrate-support/q4kTBQFRg1k/O9PCAcmbBDYJ). There, it seems that the SNPs themselves were used. Are there examples where the whole sequence from a RAD or GBS experiment was used with a sequence evolution model? I also searched all citations of Beerli & Felsenstein 1999, 2001, the two papers that I think are usually cited when using migrate-n. Almost nothing at first glance... Do any of you have examples where RAD or GBS data were used in migrate-n? I'd prefer examples where the sequence was used instead of the SNPs.

I can share an interesting paper that cropped up in my search: Hird SM (2012) lociNGS: A Lightweight Alternative for Assessing Suitability of Next-Generation Loci for Evolutionary Analysis. PLoS ONE 7(10): e46847. doi:10.1371/journal.pone.0046847

Has anyone tried this?

Cheers,
robert

Peter Beerli

Mar 20, 2015, 6:43:37 AM

to migrate...@googlegroups.com

Robert,

you will eventually see papers with migrate-n and RAD sequences. I have had several inquiries about this, and people usually ask about SNPs first, but I told them, and tell all of you, “snps are evil” for this type of analysis: with a variable-only dataset you need to figure out how it relates to the whole genome (e.g., population size estimates need ascertainment corrections), whereas short reads do not need such a correction because the ratio of variable to ‘invariant’ sites is natural for the species or population.
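To make the ascertainment point concrete, here is a toy sketch. It is not how migrate-n estimates θ (migrate-n works with coalescent likelihoods or Bayesian inference under a mutation model); it swaps in Watterson's estimator, θ_W = S / a_n with a_n = 1 + 1/2 + ... + 1/(n-1), only because it is short. The four reads are made-up data. The per-site value is only meaningful when invariant sites are kept; dropping them, as a SNP-only matrix does, inflates it.

```python
"""Toy illustration: per-site theta from full reads vs. a SNP-only matrix.

Watterson's estimator is a simple stand-in here; it is NOT the method
migrate-n uses. The reads are made-up data.
"""


def segregating_sites(seqs):
    """Count alignment columns that contain more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta_per_site(seqs):
    """theta_W = S / a_n, divided by the number of aligned sites."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n / len(seqs[0])


def variable_columns_only(seqs):
    """Drop invariant columns, mimicking a SNP-only data set."""
    keep = [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]
    return ["".join(s[i] for i in keep) for s in seqs]


reads = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC", "ACGTACGTAC"]
full = watterson_theta_per_site(reads)
snp_only = watterson_theta_per_site(variable_columns_only(reads))
print(f"per-site theta, full reads: {full:.4f}")      # 0.1091
print(f"per-site theta, SNPs only:  {snp_only:.4f}")  # 0.5455
```

Both numbers come from the same two polymorphisms; only the number of sites in the denominator changes, which is the information an ascertainment correction has to recover for SNP-only data.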

I did not know about GBS, but after reading http://cbsu.tc.cornell.edu/lab/doc/GBS_Method_Overview1.pdf I believe that this is a great method. As far as I remember, I have not yet had an inquiry from anyone using GBS.

I have not yet read about lociNGS; thanks for the link.

Peter

PS. I suggest citing Beerli 2006 instead of Beerli and Felsenstein 1999, 2001 if you use migrate with Bayesian inference; of course, you can cite them all :-)

--
You received this message because you are subscribed to the Google Groups "migrate-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to migrate-suppo...@googlegroups.com.
To post to this group, send email to migrate...@googlegroups.com.
Visit this group at http://groups.google.com/group/migrate-support.
For more options, visit https://groups.google.com/d/optout.

Robert Kraus

Mar 20, 2015, 8:14:07 AM

to migrate...@googlegroups.com

Thanks, Peter, for the quick response! And yes, I do normally cite all three papers that you suggest ;-).

Maybe someone on this list has examples for me anyway? Either RAD or GBS. For GBS, I am posting the "official" reference in addition to Peter's link:

Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, Mitchell SE (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE 6:e19379 doi:10.1371/journal.pone.0019379

More examples??

Cheers,

robert


lpl...@umces.edu

Mar 31, 2015, 11:43:52 AM

to migrate...@googlegroups.com

Hi Robert and Peter,

I am currently trying to use a GBS SNP dataset in Migrate and/or IMa2. Perhaps I should have disclosed this in my other posts! Anyway, I have had a tough time just getting my SNP data into migrate, though that appears to be mostly solved now (issues with the interpretation of SNP data in migrate still exist, of course, as Peter highlighted above).

You can also output allelic data for each individual, from each RAD tag, using the STACKS pipeline, and I think PyRAD too (though I have little experience with PyRAD). There are a couple of papers I can think of that have used the short sequence data from RADseq for coalescent analysis in IMa2 (Loren Rieseberg's group had a paper in Molecular Ecology a year or two ago; I can track it down if you'd like). I haven't seen any papers with RAD data that use migrate, but I haven't really been looking.

Anyway, it is definitely possible to get the allelic sequence data output from Stacks (my pipeline of choice at the moment) into FASTA format, but it is not straightforward to convert this to migrate or IMa2 format. Of course, it can be done by hand, but for 100 or more loci that is not an option. I have been working on some bash scripts to get there...

So far, I have not run any Migrate analysis with sequences from my GBS data set, but I am working on that now.

LP


Peter Beerli

Mar 31, 2015, 11:55:43 AM

to migrate...@googlegroups.com

Dear LP,

In fact, there was a user who had a STACKS pipeline to generate migrate input (but I cannot find his email address anymore).

It definitely would be better to have the short reads instead of the SNPs. I have a FASTA-to-Migrate converter for a specific project that may be adaptable to generate migrate input; I guess if you have a FASTA file for each locus and a population file that maps the individuals to populations, this could work.
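A conversion along the lines Peter describes (per-locus sequences plus a file mapping individuals to populations) might look roughly like the following. This is a hypothetical sketch, not Peter's converter: the input layout `{locus: [(sample, allele, sequence), ...]}` and the function names are made up, the population file is assumed to use the two-column `sample<TAB>population` layout of a Stacks popmap, and the output follows my understanding of the migrate-n sequence layout (a line with the numbers of populations and loci plus a title, a line with the number of sites per locus, then for each population a line of per-locus sample counts plus the population name, followed by 10-character names and sequences, locus by locus). Please check it against the migrate-n manual before relying on it. It also assumes every population has data at every locus.

```python
"""Hypothetical sketch: per-locus alleles + popmap -> migrate-style text.

Layout assumptions (verify against the migrate-n manual):
  line 1: <n populations> <n loci> <title>
  line 2: number of sites per locus
  per population: per-locus sample counts, then the population name,
  followed by 10-character sample names and sequences, locus by locus.
"""


def read_popmap(lines):
    """Parse 'sample<TAB>population' lines (Stacks popmap style)."""
    popmap = {}
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            sample, pop = line.split()[:2]
            popmap[sample] = pop
    return popmap


def to_migrate(loci, popmap, title="converted"):
    """loci: {locus_id: [(sample, allele, sequence), ...]} (made-up layout)."""
    pops = sorted(set(popmap.values()))
    locus_ids = sorted(loci)
    by_pop = {p: {l: [] for l in locus_ids} for p in pops}
    for l in locus_ids:
        for sample, allele, seq in loci[l]:
            by_pop[popmap[sample]][l].append((f"{sample}:{allele}", seq))
    lengths = [len(loci[l][0][2]) for l in locus_ids]
    out = [f"{len(pops)} {len(locus_ids)} {title}", " ".join(map(str, lengths))]
    for p in pops:
        counts = [len(by_pop[p][l]) for l in locus_ids]
        out.append(" ".join(map(str, counts)) + f" {p}")
        for l in locus_ids:
            for name, seq in by_pop[p][l]:
                out.append(f"{name[:10]:<10}{seq}")  # names padded/truncated to 10
    return "\n".join(out)


# Made-up, shortened sequences for two individuals at two loci.
popmap = read_popmap(["Bj142\tpopA", "Bj146\tpopB"])
loci = {
    "12359": [("Bj142", 0, "TGCAG"), ("Bj142", 1, "TGCAG"),
              ("Bj146", 0, "TGCAG"), ("Bj146", 1, "TGCAC")],
    "63788": [("Bj142", 0, "TGCAGCG"), ("Bj142", 1, "TGCAGCG"),
              ("Bj146", 0, "TGCAGCG"), ("Bj146", 1, "TGCAGCG")],
}
print(to_migrate(loci, popmap))
```

Truncating names to 10 characters can make long sample IDs collide, so a real converter may need a renaming table.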

Concerning your other post, I suggest using the Bayesian approach, because the likelihood methods will be phased out. If you want to stick with likelihood, you will need to turn off the plots, and the skyline plots will not really work well (if at all) with the likelihood method.

Peter


lpl...@umces.edu

Mar 31, 2015, 1:21:01 PM

to migrate...@googlegroups.com

Hi Peter,

Thanks for your reply. A FASTA converter to migrate would be fantastic! What format would the file need to be in?

Here is one example (two loci, two individuals) of how I can arrange the FASTA. Note that each FASTA entry is for one individual and one allele, so there are two entries per individual per locus:

>12359 [Bj142]
TGCAGCCACGCTCTCGGTGGCCGGGGCGACACGTCCCTCTGGCACAGCATGCGGTTGGCGCCGGCCGGTGACCAGCCTGCAAGA
>12359 [Bj142]
TGCAGCCACGCTCTCGGTGGCCGGGGCGACACGTCCCTCTGGCACAGCATGCGGTTGGCGCCGGCCGGTGACCAGCCTGCAAGA
>12359 [Bj146]
TGCAGCCACGCTCTCGGTGGCCGGGGCGACACGTCCCTCTGGCACAGCATGCGGTTGGCGCCGGCCGGTGACCAGCCTGCAAGA
>63788 [Bj142]
TGCAGCGGGAGCAGCCGTCTTCGAGACGTTAGCTGTAGCAGAGACGGTGGCAGATGCAGGCTTGGCTGCAAGATCGGAAGAGCG
>63788 [Bj142]
TGCAGCGGGAGCAGCCGTCTTCGAGACGTTAGCTGTAGCAGAGACGGTGGCAGATGCAGGCTTGGCTGCAAGATCGGAAGAGCG
>63788 [Bj146]
TGCAGCGGGAGCAGCCGTCTTCGAGACGTTAGCTGTAGCAGAGACGGTGGCAGATGCAGGCTTGGCTGCAAGATCGGAAGAGCG
>63788 [Bj146]
TGCAGCGGGAGCAGCCGTCTTCGAGACGTTAGCTGTAGCAGAGACGGTGGCAGATGCAGGCTTGGCTGCAAGATCGGAAGAGCG
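For records laid out like the example above (locus ID, then the individual in square brackets, one record per allele), grouping sequences by locus and individual could be sketched as follows. The regular expression assumes only the `>LOCUS [INDIVIDUAL]` header shape shown here, and the function names are made up. The sketch also flags individuals that do not have exactly two allele records: in the example, Bj146 has only one record at locus 12359, which a converter would have to handle somehow. The sequences are LP's, truncated to their first 10 bases.

```python
import re
from collections import defaultdict

# Assumed header shape, as in the example above: ">LOCUS [INDIVIDUAL]"
HEADER = re.compile(r"^>(\S+)\s+\[([^\]]+)\]")


def parse_fasta(lines):
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def group_by_locus(lines):
    """Return {locus: {individual: [allele sequences in file order]}}."""
    loci = defaultdict(lambda: defaultdict(list))
    for header, seq in parse_fasta(lines):
        m = HEADER.match(header)
        if m is None:
            raise ValueError(f"unexpected header: {header}")
        locus, individual = m.groups()
        loci[locus][individual].append(seq)
    return loci


def not_two_alleles(loci):
    """List (locus, individual) pairs that do not have exactly two records."""
    return [
        (locus, ind)
        for locus, individuals in loci.items()
        for ind, seqs in individuals.items()
        if len(seqs) != 2
    ]


example = """>12359 [Bj142]
TGCAGCCACG
>12359 [Bj142]
TGCAGCCACG
>12359 [Bj146]
TGCAGCCACG
>63788 [Bj142]
TGCAGCGGGA
>63788 [Bj142]
TGCAGCGGGA
>63788 [Bj146]
TGCAGCGGGA
>63788 [Bj146]
TGCAGCGGGA""".splitlines()
loci = group_by_locus(example)
print(not_two_alleles(loci))  # [('12359', 'Bj146')]
```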

Would your parser be able to work with this? If not, what format should the FASTA be in (in terms of locus name and ID)? Would you be willing to share it?

LP

Robert Kraus

Apr 1, 2015, 3:40:12 AM

to migrate...@googlegroups.com

Dear LP,

this is nice to know! I am looking forward to getting in touch once you've managed to load GBS data directly into migrate-n. I don't have the actual data right now, so I have not looked into it with high priority. However, in initial searches I found lociNGS, which you might find interesting: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0046847#pone-0046847-g005. It comes with options to format data for migrate-n and IM. As I said, I have not looked into the details yet...

Cheers,

robert


Paul Maier

Apr 15, 2016, 5:40:14 PM

to migrate-support

Hi Peter (and everyone),

I believe I am the student whose email you couldn't find ;) I am happy to share the Python script (fasta2genotype.py) if anyone still needs it. It converts STACKS output files into migrate-n format, among others.

I stumbled upon this thread looking for any wisdom on using ddRADseq sequences to estimate introgression rates and divergence times for closely related lineages. Has anyone tried this? Basically, I have 4 lineages and evidence of introgression between the most divergent lineage and another one. Migrate-n doesn't take the topology into account, and I'm worried that LAMARC or IMa2p would take just shy of forever to estimate my model with 1000s of sequences.

Any thoughts or wisdom?

Peter Beerli

Apr 15, 2016, 6:29:21 PM

to migrate...@googlegroups.com

thanks Paul,

in the meantime I wrote my own (https://pbe...@bitbucket.org/pbeerli/scripts.git), but please post yours because I am sure mine can be improved.

Migrate 4 may be able to do what you want once I get it out the door (you can check it out in the newversion section on the migrate website). After running many simulations, I am rather unconvinced that good joint estimates of introgression and divergence can be made in general, although there are claims that some programs can (one with a complicated abbreviation [French authors?] that I cannot remember, and diCal2 [Steinruecken, Kamm, and Song]). Given that you have only 4 lineages, you may want to look into those, although I do not yet understand these methods in detail.

Peter


cong liu

Sep 13, 2016, 9:51:34 AM

to migrate-support

Hi Paul,

I was wondering if you could share with me your Python script that converts the Stacks .fas file into migrate-n format. Thank you very much!

Best,

Cong.

cong liu

Sep 13, 2016, 9:51:34 AM

to migrate-support

Hi Peter,

I was trying to use your stacks2mig Python script to convert the FASTA file generated by Stacks into a migrate input file, but it did not work. I got the following error:

"""

['KR', 'VLM', 'VLE', 'VLW', 'TA']

Traceback (most recent call last):

File "/home/c/cong-liu/work/scripts/stacks2mig.py", line 183, in <module>

output = to_migrate(populations,locations,infilename)

File "/home/c/cong-liu/work/scripts/stacks2mig.py", line 116, in to_migrate

sites.append(str(len(sit[0][1])))

IndexError: list index out of range

"""

Do you have any idea how to fix this? Thank you very much for your time!

Best,

Cong


Peter Beerli

Sep 13, 2016, 9:58:29 AM

to migrate...@googlegroups.com

Without any data I cannot debug this for you, but you could insert a print statement just before that command and see what is in ‘sit’:

print sit

or

print sit[0]

It seems it does not find sit[0][1] for one of your locations. Could it be that your naming scheme interferes with the script?
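Peter's diagnosis can be reproduced in isolation: indexing the first record of an empty list raises exactly this `IndexError`. The helper below is a hypothetical sketch and does not reproduce stacks2mig.py's internals; it assumes the sequences have already been parsed into `(locus, location, sequence)` tuples and lists every location/locus combination with no data, which is the situation that would leave a list like `sit` empty. The locus IDs and locations come from Cong's data; the sequences are truncated.

```python
from collections import defaultdict

sit = []  # what a location with no sequences for some locus would look like
try:
    len(sit[0][1])
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range


def find_empty_cells(records, locations):
    """records: iterable of (locus, location, sequence) tuples.

    Returns sorted (location, locus) pairs that have no sequence at all.
    """
    present = defaultdict(set)
    all_loci = set()
    for locus, location, _seq in records:
        present[location].add(locus)
        all_loci.add(locus)
    return sorted(
        (location, locus)
        for location in locations
        for locus in all_loci
        if locus not in present[location]
    )


records = [("148155", "VLW", "GCGG"), ("148157", "KR", "CTGA"), ("148157", "TA", "CTGA")]
print(find_empty_cells(records, ["KR", "TA"]))  # [('KR', '148155'), ('TA', '148155')]
```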

Peter

Peter Beerli

Sep 13, 2016, 10:07:51 AM

to migrate...@googlegroups.com

Cong,

here is Paul’s script (untested; see the top of the file for how to run it).

Peter

fasta2genotype.py

cong liu

Sep 13, 2016, 8:52:22 PM

to migrate-support

Hi Peter,

Thanks a lot for the suggestions and for Paul's script. Here is what I got from printing sit[0]:

['KR', 'VLM', 'VLE', 'VLW', 'TA']
(['118630', 'KR', 'EGP0062G07]', '0'], 'ATACCGGAACTCGACAATCTATTTAAACAGCTTCAGCTGGA')
(['148157', 'KR', 'EGP0062G07]', '0'], 'CTGATAAAAAGAAATTTGTTAAAATCTATGTGATTTTCAAT')
Traceback (most recent call last):
  File "/home/c/cong-liu/work/scripts/stacks2mig.py", line 184, in <module>
    output = to_migrate(populations,locations,infilename)
  File "/home/c/cong-liu/work/scripts/stacks2mig.py", line 116, in to_migrate
    print sit[0]
IndexError: list index out of range

I am still not sure what happened. My data include five locations: KR, VLM, VLE, VLW, and TA. The .fas file looks like this:

>CLocus_148155_Sample_18_Locus_110980_Allele_2 [VLM_EGP0062G11]
GCGGATACAGGAACGCATAAATCTGCAGGTGTGGAATGGGG
>CLocus_148155_Sample_8_Locus_60808_Allele_0 [VLW_EGP0062F09]
GCGGATACAGAAACGCATAAATCTGCAGGTGTGGAATAATG
>CLocus_148155_Sample_5_Locus_6490_Allele_0 [VLW_EGP0062F10]
GCGGATACAGAAACGCATAAATCTGCAGGTGTGGAATAATG
>CLocus_148155_Sample_6_Locus_26668_Allele_0 [VLW_EGP0062H03]
GCGGATACAGAAACGCATAAATCTGCAGGTGTGGAATAATG
>CLocus_148155_Sample_7_Locus_52288_Allele_0 [VLW_EGP0062H04]
GCGGATACAGAAACGCATAAATCTGCAGGTGTGGAATAATG
>CLocus_148157_Sample_14_Locus_10254_Allele_0 [KR_EGP0062G07]
CTGATAAAAAGAAATTTGTTAAAATCTATGTGATTTTCAAT
>CLocus_148157_Sample_21_Locus_90019_Allele_0 [KR_EGP0062G08]
CTGATAAAAAGAAATTTGTTAAAATCTATGTGATTTTCAAT
>CLocus_148157_Sample_13_Locus_173164_Allele_0 [KR_EGP0062G12]
CTGATAAAAAGAAATTTGTTAAAATCTATGTGATTTTCAAT
>CLocus_148157_Sample_10_Locus_186448_Allele_0 [TA_EGP0062G04]
CTGATAAAAAGAAATTTGTTAAAATCTATGTGATTTTCAAT
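The `sit[0]` printout above shows the closing bracket still attached to the sample name (`'EGP0062G07]'`), which suggests the header is being split on underscores without stripping the brackets. A stricter parse of these Stacks `CLocus_..._Allele_N [POP_SAMPLE]` headers could be sketched as below. This is hypothetical and is not how stacks2mig.py parses headers; it assumes, as in Cong's labels, that the population is everything before the first underscore inside the brackets. If sample labels do not carry a population prefix, a popmap lookup would be needed instead.

```python
import re
from typing import NamedTuple, Optional

# Header shape assumed from the example above, e.g.
# >CLocus_148157_Sample_14_Locus_10254_Allele_0 [KR_EGP0062G07]
STACKS_HEADER = re.compile(
    r"^>CLocus_(?P<clocus>\d+)_Sample_(?P<sample>\d+)_Locus_(?P<locus>\d+)"
    r"_Allele_(?P<allele>\d+)\s+\[(?P<label>[^\]]+)\]\s*$"
)


class Record(NamedTuple):
    clocus: str       # catalog locus ID shared across samples
    population: str   # text before the first "_" inside the brackets
    individual: str   # remainder of the bracketed label, without "]"
    allele: int


def parse_header(header: str) -> Optional[Record]:
    """Return a Record, or None if the header does not match the assumed shape."""
    m = STACKS_HEADER.match(header.strip())
    if m is None:
        return None
    population, _, individual = m.group("label").partition("_")
    return Record(m.group("clocus"), population, individual, int(m.group("allele")))


print(parse_header(">CLocus_148157_Sample_14_Locus_10254_Allele_0 [KR_EGP0062G07]"))
# Record(clocus='148157', population='KR', individual='EGP0062G07', allele=0)
```

Returning `None` for unexpected headers, rather than guessing, makes naming-scheme problems visible before any conversion runs.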

Actually, the script worked when I changed all the localities to one (a single population).

Can I send you my data for debugging?

Best,

Cong.

Peter Beerli

Sep 14, 2016, 9:19:52 AM

to migrate...@googlegroups.com

Dear Cong,

Your dataset contains large numbers of missing loci, and these will fail. Change your pipeline so that you have no missing loci (one may be able to adjust for some of it, but I have no time to do that right now).
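Keeping only loci that are sampled in every population, as Peter suggests, amounts to a simple filter. A sketch, assuming records are `(locus, population, sequence)` tuples (a hypothetical layout and hypothetical function names). When the data are regenerated, the same effect may be easier to get from the Stacks `populations` filters, such as `-p`, which sets the minimum number of populations a locus must be present in.

```python
from collections import defaultdict


def complete_loci(records, populations):
    """Return sorted locus IDs that have at least one sequence in every population.

    records: iterable of (locus, population, sequence) tuples (hypothetical layout).
    """
    seen = defaultdict(set)
    for locus, pop, _seq in records:
        seen[locus].add(pop)
    required = set(populations)
    return sorted(locus for locus, pops in seen.items() if required <= pops)


def drop_incomplete(records, populations):
    """Keep only records whose locus is complete across all populations."""
    records = list(records)
    keep = set(complete_loci(records, populations))
    return [r for r in records if r[0] in keep]


records = [("1", "KR", "ACGT"), ("1", "TA", "ACGA"), ("2", "KR", "TTGA")]
print(complete_loci(records, ["KR", "TA"]))         # ['1']
print(len(drop_incomplete(records, ["KR", "TA"])))  # 2
```

The stricter the completeness requirement, the fewer loci remain.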


cong liu

Sep 14, 2016, 8:29:03 PM

to migrate-support

Hi Peter,

Thanks a lot! I have noticed the missing-loci problem in my dataset. However, I only get about 900 loci if I change the pipeline so that it does not allow missing loci (a common problem with RAD data, I guess). Do you think 900 loci are enough? Or should I just use the SNP data instead? Thanks again!

Best,

Cong.

Felipe Torquato

Oct 21, 2016, 7:26:56 AM

to migrate-support

How about using the software PGDSpider to convert Stacks output files into MIGRATE input?

zeamne T

Oct 24, 2016, 9:18:19 AM

to migrate...@googlegroups.com

Hi Felipe! Actually, I think Peter's script stacks2mig.py works pretty well, except that I needed to do some minor search-and-replace of the headers in my FASTA file from STACKS to get them into the format the script expects. :)


Felipe Torquato

Nov 3, 2016, 4:58:57 AM

to migrate-support

I am trying to use PGDSpider, but my output file is empty.

I will try the script today...

Many thanks for the feedback!

On Monday, October 24, 2016 15:18:19 UTC+2, YC Tay wrote:

Hi Felipe! Actually I think Peter's script stacks2mig.py seems to work pretty well, except I needed to do some minor search and replace of headers in my fasta file from STACKS to make it in the format for the script. :)


Felipe Torquato

Nov 4, 2016, 9:24:41 AM

to migrate-support

Hi Tay,

How long does it take?

It has been running on the server for more than one day.

Is that OK?


zeamne T

Nov 7, 2016, 8:35:27 PM

to migrate...@googlegroups.com

Hi Felipe,

The time depends on the data set, the parameter settings, and the models that you want to test. The log file usually tells you the estimated time until the end of the run. I think it would not be surprising for it to run for a week, even!


Peter Beerli

Nov 7, 2016, 9:31:09 PM

to migrate...@googlegroups.com

Just in case you want to try a different translator from Stacks to migrate: Paul Maier has sent me his script (it has more options than mine) and test files in a zip file (attached). I have not tried them yet.

Peter

P.S. Running migrate on a large number of loci will take a while (you will want to use a computer cluster and run migrate in parallel).

fasta2genotype.zip

Felipe Torquato

Dec 5, 2016, 7:49:52 AM

to migrate-support

Hi Cong,

did you solve this problem?

I got the same error message.

If so, what did you need to do?

cheers,

Felipe

Peter Beerli

Dec 5, 2016, 9:13:30 AM

to migrate...@googlegroups.com

Felipe,

Check whether you have populations without any data; that may lead to a problem like this. But then, I have not used my script on any dataset other than the stickleback data.
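One way to run Peter's check before conversion is to compare the populations listed in the popmap with the samples that actually appear in the FASTA file. A minimal sketch, assuming a `{sample: population}` dictionary (for example parsed from a Stacks popmap) and a list of sample names already extracted from the FASTA headers; both inputs, the function name, and the sample names `VLE_X1` and `VLM_Y2` are hypothetical. It also reports FASTA samples missing from the popmap, since those cannot be assigned to any population.

```python
def check_population_coverage(popmap, fasta_samples):
    """Return (populations with no data, FASTA samples not in the popmap)."""
    fasta_samples = set(fasta_samples)
    populated = {popmap[s] for s in fasta_samples if s in popmap}
    empty_pops = sorted(set(popmap.values()) - populated)
    unknown = sorted(fasta_samples - set(popmap))
    return empty_pops, unknown


popmap = {"KR_EGP0062G07": "KR", "TA_EGP0062G04": "TA", "VLE_X1": "VLE"}
print(check_population_coverage(popmap, ["KR_EGP0062G07", "TA_EGP0062G04", "VLM_Y2"]))
# (['VLE'], ['VLM_Y2'])
```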

Peter


Sara Villa

Mar 31, 2025, 7:55:17 AM

to migrate-support

Hi Peter and all,
I am trying to prepare my Stacks output for Migrate. I am working with 2bRAD data (109 individuals, 14 populations), and I have tried Peter's script for the file conversion.
My populations command has these settings:
-r 0.80
--write-single-snp
-p 14
--fasta-samples

There is something I cannot understand in the way Peter's script processes the output of populations. When I run Peter's script on my dataset, the output is structured this way:

line 1: length of each locus extracted from the dataset (since this is 2bRAD data, a list of numbers between 37 and 42 bp, consistent with the restriction enzymes used)
line 2: number of sequences extracted for each locus (a list of '14' and '16')

from line 3, the structure is:
# locus number
1]:0 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG # sequence of the allele 0 for sample 1
1]:1 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG # sequence of the allele 1 for sample 1
2]:0 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG # sequence of the allele 0 for sample 2
2]:1 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG # sequence of the allele 1 for sample 2
..
14]:1 # sequence of the allele 1 for sample 14

If I understand correctly, this means that the script always considers sequences from only 7 or 8 individuals, with two alleles from each individual.
This is not consistent with the output of populations (populations.hapstats.tsv), which, as far as I can see, is correct.
It has a field 'Pop ID', where the ID corresponds to the ID given in the popmap, and a field 'N', i.e., the number of sequences actually extracted for each population, which equals 2n, where n is the number of individuals in that population from which the locus was extracted.
In this file there are always 14 populations, so it seems that the populations command works well in extracting loci, consistent with the setting -p 14.
For the first locus, sequences are extracted from every individual, and the numbers in the field 'N' are exactly twice the number of individuals in each population (a total of 109 individuals, 218 sequences).
For locus 11547, we have this situation:
# Locus ID  Chr  BP      Pop ID  N
11547       un   182787  1       18  # one individual is missing
11547       un   182787  2       16
11547       un   182787  3       12
11547       un   182787  4       18
11547       un   182787  5       2
11547       un   182787  6       16
11547       un   182787  7       16
11547       un   182787  8       20
11547       un   182787  9       22
11547       un   182787  10      20
11547       un   182787  11      16
11547       un   182787  12      18
11547       un   182787  13      10
11547       un   182787  14      12

for a total of 108 individuals (216 sequences) from all 14 populations. This is also consistent with the FASTA file produced by the populations command.
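To pin down where samples disappear, one could count the sequences per locus and population directly from the FASTA headers and compare them with the `N` column of `populations.hapstats.tsv`. A sketch, assuming the headers have already been parsed into `(locus, population)` pairs and that the expected counts are entered by hand from the table above (parsing hapstats is not attempted, since its full column layout is not shown here). The function names and the "observed" counts are hypothetical.

```python
from collections import Counter


def sequence_counts(pairs):
    """pairs: iterable of (locus, population) per FASTA record -> Counter."""
    return Counter(pairs)


def count_mismatches(observed, expected):
    """Return {(locus, pop): (observed, expected)} wherever the counts differ."""
    keys = set(observed) | set(expected)
    return {
        k: (observed.get(k, 0), expected.get(k, 0))
        for k in sorted(keys)
        if observed.get(k, 0) != expected.get(k, 0)
    }


# Expected N for two of the populations at locus 11547, from the table above.
expected = {("11547", "1"): 18, ("11547", "5"): 2}
# Hypothetical converter output that kept only 8 sequences for population 1.
observed = sequence_counts([("11547", "1")] * 8 + [("11547", "5")] * 2)
print(count_mismatches(observed, expected))  # {('11547', '1'): (8, 18)}
```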

The output of Peter's script for this locus is:
# 11547
1]:0 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
1]:1 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
2]:0 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
2]:1 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
3]:0 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
3]:1 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
4]:0 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
4]:1 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
7]:0 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
7]:1 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
9]:0 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
9]:1 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
10]:0 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
10]:1 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
12]:0 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
12]:1 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG

This means that Peter's script retains only samples 1, 2, 3, 4, 7, 9, 10, and 12, with 1-10 belonging to the first population and 12 belonging to the second population.

I would be really grateful to anyone who could help me solve this problem. Does anyone have any idea where the problem is? Unfortunately, I can't share my files because they are too big, but I hope the message is clear enough anyway.

Thanks in advance to anyone who is willing to help me.

Sara


Peter Beerli

Mar 31, 2025, 7:59:59 AM

to migrate...@googlegroups.com

Sara,

I have not looked in detail yet, but it is almost certain that the translation script is faulty, because the individual names look weird: for example, “7]:1” should probably look more like “7:1”.

I will see what I can find,

Peter


Sara Villa

Mar 31, 2025, 9:12:32 AM

to migrate-support

Thank you very much for your helpfulness and very quick response!
I will try to fix this detail in the meantime.
Sara

Sara Villa

Mar 31, 2025, 10:49:11 AM

to migrate-support

I also noticed that SNPs are not correctly maintained in the output: in the same locus 11546, allele 1 of sample 12 has a SNP that does not appear in line 12]:1.

This is the locus extracted from the FASTA file, with the SNP (highlighted in red in the original message: the A/C difference at position 29):

>CLocus_11547_Sample_12_Locus_11547_Allele_0 [CrRG_12]
GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
>CLocus_11547_Sample_12_Locus_11547_Allele_1 [CrRG_12]
GTAACTTAACTTTCAAACAATGTGGGGTCCATCTCACG

and this is from the script output:

12]:0 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG
12]:1 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG

So there is also something wrong in how the script reads the sequences...
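A check like this can be automated: parse the converter's `name:allele sequence` lines, key both files by (sample, allele), and report alleles whose sequences differ, and at which positions. This is a hypothetical sketch with made-up function names; it strips the stray `]` seen in names such as `12]:1`, skips `#` comment lines such as `# 11547`, and assumes the sample IDs in both inputs have already been matched up (here, sample `12`). The sequences are the ones from Sara's post; positions are reported 0-based, so 28 corresponds to position 29.

```python
def parse_converted(lines):
    """Parse lines like '12]:1 GTAAC...' into {(sample, allele): sequence}.

    Comment lines starting with '#' are skipped, and a stray ']' at the
    end of the sample name is removed.
    """
    out = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, seq = line.split()
        sample, _, allele = name.partition(":")
        out[(sample.rstrip("]"), int(allele))] = seq
    return out


def differing_positions(a, b):
    """0-based positions where two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def compare_alleles(fasta, converted):
    """Return {(sample, allele): positions} for alleles that changed (None if missing)."""
    report = {}
    for key, seq in fasta.items():
        other = converted.get(key)
        if other is None:
            report[key] = None  # allele missing from the converted file
        elif other != seq:
            report[key] = differing_positions(seq, other)
    return report


fasta = {
    ("12", 0): "GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG",
    ("12", 1): "GTAACTTAACTTTCAAACAATGTGGGGTCCATCTCACG",
}
converted = parse_converted([
    "# 11547",
    "12]:0 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG",
    "12]:1 GTAACTTAACTTTCAAACAATGTGGGGTACATCTCACG",
])
print(compare_alleles(fasta, converted))  # {('12', 1): [28]}
```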

Sara

Sara Villa

Mar 31, 2025, 10:50:46 AM

to migrate-support

Sorry, I meant locus 11547.
