FASTSTRUCTURE failing to find Admixture


tessco...@gmail.com

Jul 9, 2018, 8:01:00 PM
to structure-software
Hi everyone,

I am having problems with the output I am getting from STRUCTURE and FASTSTRUCTURE. I have used STRUCTURE before (via an online supercomputer portal that no longer exists), and am keen to use FASTSTRUCTURE on my Mac.

I have multiple datasets, but have just been trying to do test analyses using two of them: (1) 21 individuals (5000 SNPs) that should be K=1 (or 2), and (2) ~100 individuals (6000 SNPs) that should be no less than K=3. These K values are based only on the biology, PCA plots, and initial output from STRUCTURE/FASTSTRUCTURE.

I have also read in other Google Groups and on the web that other people have had similar problems (at least with FASTSTRUCTURE), but none of the conversations I can find lead to any resolution of the main issue.

As far as I know, I have correctly installed FASTSTRUCTURE, STRUCTURE, and STRUCTURE_THREADER on my computer (Mac), and have been running all of these programs from the command line.

The problem is that even for the dataset where K=3 (inferred by ChooseK in FASTSTRUCTURE and by the Evanno method in STRUCTURE), the output plot shows no admixture: it's just one solid block of colour (exactly the same as what I see when I open the plot for K=1). I have also tried running a logistic prior in FASTSTRUCTURE (via FASTSTRUCTURE alone and via STRUCTURE_THREADER), and I got the same result for the dataset that I expect to be K=1 (or 2), but for the second dataset (K=3) it was cripplingly slow and didn't seem to be writing much at all (even when left overnight)!
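For reference, the Evanno method picks K from how sharply the mean log-likelihood changes between neighbouring K values across replicate STRUCTURE runs. Below is a minimal sketch of one commonly used formulation, ΔK = |L(K+1) - 2L(K) + L(K-1)| / sd(L(K)), where L(K) is the mean ln P(D) over replicates at that K. The function name and the numbers are illustrative assumptions, not output from this thread or from any particular tool.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K (Evanno et al. 2005) from replicate STRUCTURE runs.

    lnp_by_k maps each K to a list of ln P(D) values, one per replicate.
    Returns {K: Delta K} for interior K values only: Delta K is undefined
    for the smallest and largest K tested, so it can never select K=1.
    """
    ks = sorted(lnp_by_k)
    if ks != list(range(ks[0], ks[-1] + 1)):
        raise ValueError("K values must be consecutive integers")
    means = {k: mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)|, computed on the mean ln P(D)
        second_diff = abs(means[next_k] - 2 * means[k] + means[prev_k])
        # stdev needs at least two replicates per K
        spread = stdev(lnp_by_k[k])
        delta[k] = second_diff / spread if spread > 0 else float("inf")
    return delta


# Illustrative numbers only, not results from this thread
lnp = {1: [-1000.0, -1002.0], 2: [-900.0, -904.0],
       3: [-850.0, -852.0], 4: [-845.0, -849.0]}
delta = evanno_delta_k(lnp)
print(max(delta, key=delta.get))  # 3 for these illustrative values
```

Because ΔK needs a K on either side, it cannot support K=1 on its own, which is worth keeping in mind for a dataset expected to be K=1 (or 2).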

I have tried several different methods: FASTSTRUCTURE directly from the command line, FASTSTRUCTURE and STRUCTURE via STRUCTURE_THREADER, and I even installed the graphical interface of STRUCTURE on both my Mac and a Windows machine. The only method that seems to give meaningful bar plots showing admixture was the graphical interface of STRUCTURE on the Windows machine, but of course, in the end that's an awfully slow analysis and not practical!

Could there be an issue with the commands I have used for STRUCTURE_THREADER?

FASTSTRUCTURE
structure_threader run -K 3 -i input_file_path.str -o output_file_path -t 4 -fs path_to_structure.py --ind path_to_indfile.txt --extra_opts prior=logistic

STRUCTURE
structure_threader run -K 3 -i input_file_path.str -o output_file_path -t 4 -st path_to_structure

I have also tried the command below to plot the results, and when I do, it gives strange bar plots that are even less meaningful: they consist of half-filled bars (sometimes with what could be admixture), but many of the bars end before they reach the top of the plot, so it all looks haphazard.

structure_threader plot -i file_path_to_output -f structure -K 2 3 -o output_file_path --ind path_to_indfile
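Bars that stop short of the top suggest that, for those individuals, the ancestry proportions being drawn do not sum to 1, which can happen when an output file is mis-parsed. Here is a quick sanity check, sketched under the assumption of a plain whitespace-separated Q matrix such as fastStructure's `.meanQ` output (one row per individual, one column per cluster). The helper name and file name are hypothetical.

```python
def rows_not_summing_to_one(q_text, tol=1e-3):
    """Return (row_index, row_sum) for Q-matrix rows whose proportions don't sum to ~1.

    Assumes whitespace-separated numbers, one individual per non-blank line.
    """
    rows = [line.split() for line in q_text.splitlines() if line.strip()]
    suspicious = []
    for i, fields in enumerate(rows):
        row_sum = sum(float(x) for x in fields)
        if abs(row_sum - 1.0) > tol:
            suspicious.append((i, row_sum))
    return suspicious


# Hypothetical file name; fastStructure names its files <output>.<K>.meanQ
# with open("my_run.3.meanQ") as handle:
#     print(rows_not_summing_to_one(handle.read()))
print(rows_not_summing_to_one("0.2 0.8\n0.5 0.3\n"))  # flags the second row (sum of about 0.8)
```

If the raw Q values look fine but the plot does not, the problem is more likely in how the plotting step reads the file than in the analysis itself.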

To make things even more confusing, STRUCTURE_THREADER did not plot the output from STRUCTURE, so we ran the above command (it asked for an indfile) to plot the results. I received an error saying that the K1 Rep 1 output had more individuals (apparently n=22) than the input file (n=21), even though the K1 Rep 1 output itself only has 21. Which was weird!
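One way to narrow down an off-by-one like this is to count the individuals independently in the input file and in the indfile. A minimal sketch, assuming the locus-name header has already been removed, the individual label is the first column, and ONEROWPERIND=0 (two rows per individual). The helper names are hypothetical and not part of Structure_threader, and the one-label-per-line indfile layout is an assumption.

```python
def count_str_individuals(str_text, rows_per_ind=2):
    """Count individuals in a STRUCTURE-format genotype file.

    Assumes no header line, the label in the first column, and
    `rows_per_ind` rows per individual (2 when ONEROWPERIND=0).
    Blank lines are ignored. Returns (n_individuals, n_distinct_labels).
    """
    rows = [line.split() for line in str_text.splitlines() if line.strip()]
    if len(rows) % rows_per_ind:
        raise ValueError(f"{len(rows)} data rows is not a multiple of {rows_per_ind}")
    labels = []
    for start in range(0, len(rows), rows_per_ind):
        group = {row[0] for row in rows[start:start + rows_per_ind]}
        if len(group) != 1:
            raise ValueError(f"rows from data row {start} carry different labels: {sorted(group)}")
        labels.append(rows[start][0])
    return len(labels), len(set(labels))


def count_ind_file(ind_text):
    """Count non-blank lines in an individual file (assumed: one label per line)."""
    return sum(1 for line in ind_text.splitlines() if line.strip())


genotypes = "ind_1 1 2\nind_1 1 1\nind_2 2 2\nind_2 1 2\n"
print(count_str_individuals(genotypes), count_ind_file("ind_1\nind_2\n"))  # (2, 2) 2
```

A leftover header row, a duplicated label, or an unpaired row would surface here as an error or as a mismatch between the two counts.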

I have also fluffed around with the input files, making some with the dartR vignette in RStudio, converting others with PGDSpider, and making others with PLINK.

Any help or ideas about what might be going wrong would be appreciated! Ultimately, I would prefer to get FASTSTRUCTURE (rather than STRUCTURE) to work on my Mac, but that's only because it has "fast" in the name!!!

Many thanks,
Tess

Vikram Chhatre

Jul 9, 2018, 8:03:29 PM
to structure-software
The fastest way to get help would be to provide a minimal working (or non-working) example so we can reproduce the problem on our end. Please include a small input file, mainparams, extraparams, and any other configuration files.



--
You received this message because you are subscribed to the Google Groups "structure-software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to structure-softw...@googlegroups.com.
To post to this group, send email to structure...@googlegroups.com.
Visit this group at https://groups.google.com/group/structure-software.
For more options, visit https://groups.google.com/d/optout.

f.pina...@gmail.com

Jul 10, 2018, 5:48:55 AM
to structure-software

Hi Tess,

As far as I can tell, the commands you used with Structure_threader are perfectly correct.
I’m really not sure about the STRUCTURE vs. FastStructure issue, but it is not unusual for the results of these two programs to differ. What you describe, however, is very unusual: a value of K above 1 being reported, while the plots for K=2, etc. show basically a single colour.
It might eventually be related to the issue of the plot with missing colours. I had seen that happen when using Structure_threader to wrap MavericK with no population information provided. But that only affected MavericK, and it was fixed by this commit. That means there might be another STRUCTURE output file corner case I did not account for, but I’ll have to look at your output files to diagnose and fix that particular problem.

Also, it is normal for FastStructure to take a very long time to run when you set the prior=logistic option.

The issue you are facing seems quite complex, and will require a way to make it reproducible. I’m especially baffled by the fact that the run on the Windows/GUI version results in different plots than the CLI version. Are the mainparams and extraparams files generated by the GUI version the same as the ones used in the CLI version?

That being said, I’ll just reiterate Vikram’s advice - Please try to provide us a minimal test case so we can experiment with the problem on our end. Also, remember that this is a public forum, so you may want to anonymize any data you post.

Best,

Francisco

tessco...@gmail.com

Jul 10, 2018, 6:45:44 PM
to structure-software
Hi Vikram and Francisco,

Here you are. I've put them all into a compressed folder.

I have prefixed each MainParams file with an R or a T to indicate which dataset it is for. R should have K no less than 2, and T should have K=2, based on the original output and PCA.

I haven't uploaded all the output I got from FastStructure or Structure, as I have so many folders and files now. It's been a couple of days since this saga started, and I'm worried I would send the output from a different input file or with slightly different settings (I have a rather messy folder with all these tests now!), which would make things confusing. Anyway, I am fairly confident the input files attached here are the ones I used. They were made with dartR; I then replaced the spaces in the sample names with underscores and removed all the loci information at the top. I did this in TextEdit, not Excel.
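The manual edits described here (dropping the locus header and replacing spaces in sample names) can also be scripted, which makes it easier to know exactly which version of an input file was used. A minimal sketch, assuming a tab-delimited export with one header line of locus names and the sample name as the first tab-separated field; `clean_str_file` is a hypothetical helper, not part of dartR or Structure_threader.

```python
def clean_str_file(text, header_lines=1):
    """Drop leading header line(s) and replace spaces in sample names with underscores.

    Assumes tab-delimited rows whose first field is the sample name, which may
    itself contain spaces. Blank lines are dropped.
    """
    cleaned = []
    for line in text.splitlines()[header_lines:]:
        if not line.strip():
            continue
        # split off the sample name at the first tab; genotype fields stay untouched
        name, sep, rest = line.partition("\t")
        cleaned.append(name.strip().replace(" ", "_") + sep + rest)
    return "\n".join(cleaned) + "\n"


raw = "locus1\tlocus2\nPop A 1\t1\t2\nPop A 1\t1\t1\n"
print(clean_str_file(raw), end="")
```

Running the same script on every export avoids subtle differences that can creep in when files are edited by hand.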

I've also attached a couple of the plots I got as output.

Cheers,
Tess
Files and Output.zip

f.pina...@gmail.com

Jul 23, 2018, 9:01:43 AM
to structure-software
Hi Tess,

Apologies for the delay in the reply. I've been busier than usual.
For the "T" dataset, fastStructure will always provide the same result, K=1, regardless of the seed it is given. I know you expect something else, but this is what fastStructure produces. The "BestK" approach also supports this result. (Also, fastStructure was ignoring the first 6 loci, since the program always ignores the first six columns of the input file. I have corrected this, but it did not make any difference.)
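Because fastStructure's STRUCTURE-format reader skips the first six columns of every row, a file with fewer than six label columns loses some genotype columns. A minimal sketch of one way to pad the labels with placeholder columns so that only non-genotype data is skipped. The helper name and the "0" filler are assumptions; for STRUCTURE itself, such extra columns would presumably have to be declared in mainparams (e.g. via EXTRACOLS), which is worth checking against the STRUCTURE manual.

```python
def pad_label_columns(text, n_label_cols=1, target=6, filler="0"):
    """Insert placeholder columns so each row has `target` leading non-genotype columns.

    n_label_cols: how many label columns the file already has (e.g. 1 = name only).
    Output is tab-delimited; blank lines are dropped.
    """
    if n_label_cols > target:
        raise ValueError("file already has more label columns than fastStructure skips")
    padded = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split()
        labels, genotypes = fields[:n_label_cols], fields[n_label_cols:]
        # placeholders fill the gap between the real labels and the first locus
        padded.append("\t".join(labels + [filler] * (target - n_label_cols) + genotypes))
    return "\n".join(padded) + "\n"


print(pad_label_columns("ind_1 1 2 1\nind_1 1 1 2\n"), end="")
```

After padding, the seventh column of every row should be the first locus, which is easy to confirm by eye before running fastStructure.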

Running it in STRUCTURE does take a while (about 2 h on my i7-4700MQ laptop for 100K iterations), but it is doable. Provided the machine is recent enough, the full run can be performed on a desktop-class machine in less than a day. However, there is a problem with Structure_threader parsing the resulting output file: apparently it is a **fourth** variation of the STRUCTURE output file. I will deal with this and let you know as soon as Structure_threader can handle this output as well. This issue is being tracked here. From looking at the values, however, I can see that the result is **very** different from that of fastStructure.

For the "R" dataset, K=3 is presented as the best result, and I can obtain a plot from this data that does show K=3. No matter which value of K I try, all the plots consist of only 3 different colours. I expect that the STRUCTURE results will also be quite different.

I have also noticed that you are not using the admixture model. Is this intentional?
I will post back here when I have the missing feature implemented.

Best,

Francisco

f.pina...@gmail.com

Jul 23, 2018, 9:10:50 AM
to structure-software
In fact, you can use Structure_threader as it is if you add a column after the names of the individuals stating which "population" each individual is from. Even if you set everything to "1", it should work.

Francisco
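The workaround above (a population column right after each individual's name, all set to "1") is easy to script. A minimal sketch; `add_population_column` is a hypothetical helper. For STRUCTURE to read that column, mainparams would presumably also need to declare it (POPDATA 1); that is an assumption worth checking in your mainparams.

```python
def add_population_column(text, pop="1"):
    """Insert a population column directly after the individual name on every row.

    Assumes whitespace-separated rows with the name in the first column.
    Output is tab-delimited; blank lines are dropped.
    """
    out = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, *genotypes = line.split()
        out.append("\t".join([name, pop, *genotypes]))
    return "\n".join(out) + "\n"


print(add_population_column("ind_1 1 2\nind_1 1 1\n"), end="")
```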

f.pina...@gmail.com

Jul 23, 2018, 11:10:42 AM
to structure-software
Hi again Tess,
I have just released Structure_threader v1.2.14 in pypi, which fixes this annoying bug.
You can now run STRUCTURE wrapped in Structure_threader, and you will get the plots as expected. I do recommend STRUCTURE or MavericK over fastStructure, despite the longer running times.

Best,

Francisco

Vikram Chhatre

Jul 23, 2018, 11:14:02 AM
to structure-software
It is also worth considering ADMIXTURE (Alexander et al. 2009), which I have found capable of detecting fine substructure. Also, it's a maximum-likelihood (ML) method, so it's faster than any of the Bayesian approaches.

V
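If you try ADMIXTURE, my understanding of its manual is that K is usually chosen by running each K with `--cv` (e.g. `admixture --cv input.bed 3`) and keeping the K with the lowest cross-validation error. A minimal sketch that picks that K from collected log output, assuming the logs contain lines of the form `CV error (K=2): 0.52094`; the helper name and the example numbers are illustrative only.

```python
import re

# Assumed log line format when ADMIXTURE is run with --cv: "CV error (K=2): 0.52094"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def best_k_from_cv(log_text):
    """Return (best_K, {K: cv_error}) from concatenated ADMIXTURE log output."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no 'CV error' lines found; was ADMIXTURE run with --cv?")
    return min(errors, key=errors.get), errors


# Illustrative log lines, not results from this thread
logs = "CV error (K=1): 0.55154\nCV error (K=2): 0.52094\nCV error (K=3): 0.52546\n"
best_k, errors = best_k_from_cv(logs)
print(best_k)  # 2
```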


Theresa Cole

Jul 23, 2018, 6:22:12 PM
to structure...@googlegroups.com
Thanks all. I did want Admixture turned on - I thought this was already set. Do I need to change

#define NOADMIX 0

to

#define NOADMIX 1

or is there a different setting elsewhere?

I’ll go ahead and install the new Structure_threader, and see how I go (with Structure). And I will avoid FastStructure…! I will also give Admixture a go!

T

f.pina...@gmail.com

Jul 24, 2018, 4:55:54 AM
to structure-software
That parameter is correct, but I’m fairly sure it has to be defined in the extraparams file and not in mainparams.
In fact, I’d recommend that you use a full mainparams and extraparams. Since you are already using Structure_threader, you can just run structure_threader params -o /path/to/dir/where/you/want/your/parameter/files and it will generate a mainparams and an extraparams with “default” settings for you. You just have to change them to whatever you find adequate for your analysis.

Best,

Francisco
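In STRUCTURE's parameter files, NOADMIX 0 selects the admixture model and NOADMIX 1 selects the no-admixture model, so the `#define NOADMIX 0` line quoted above already asks for admixture; what matters is that it sits in the file STRUCTURE actually reads. A minimal sketch for checking a parameter file; the helper names are hypothetical and the example line is illustrative.

```python
def read_defines(text):
    """Parse '#define NAME VALUE' lines from a STRUCTURE mainparams/extraparams file.

    Trailing '//' comments are ignored because only the third field is kept.
    """
    params = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "#define":
            params[fields[1]] = fields[2]
    return params


def uses_admixture_model(extraparams_text):
    """True if NOADMIX is 0 (admixture model), False if not, None if NOADMIX is absent."""
    value = read_defines(extraparams_text).get("NOADMIX")
    if value is None:
        return None  # not set in this file; check the other parameter file
    return value == "0"


# Illustrative line, not copied from any particular extraparams
print(uses_admixture_model("#define NOADMIX 0 // 0 = admixture model, 1 = no-admixture\n"))  # True
```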

tessco...@gmail.com

Jul 31, 2018, 7:34:26 PM
to structure-software
Hi Vikram,

Thanks for all your help; we have got STRUCTURE and STRUCTURE_THREADER to work well.

Why are the genotypes coded in a binary way (0,1)? We have SNP data, and for any SNP we expect three different genotypes (for example, AA, TT, AT). But it seems that when people use SNP data they only have 0 and 1. Why is that? Can we have three different genotypes?

Many thanks,
Tess

Vikram Chhatre

Jul 31, 2018, 8:16:15 PM
to structure-software
Can you post an excerpt from your data file showing the 0/1 encoding?  


Theresa Cole

Jul 31, 2018, 8:17:30 PM
to structure...@googlegroups.com
Hi Vikram,

It’s the same dataset that I uploaded already, in this thread.

Tess

Vikram Chhatre

Jul 31, 2018, 8:41:15 PM
to structure-software
Hi Tess,

I see nothing wrong with your data file. There are 4279 loci, and the alleles for each individual are coded on two rows (ONEROWPERIND=0). I also do not see any 0/1 coding. All genotypes are coded as either 11, 12 or 22, corresponding to a homozygote for allele "1", a heterozygote, and a homozygote for allele "2".
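With two rows per individual, any single row only ever shows one allele per locus (a 1 or a 2), which can look binary; the three genotype classes appear once both rows of an individual are read together. A minimal sketch of that pairing, assuming single-character allele codes and -9 for missing data; `genotype_classes` is a hypothetical helper.

```python
def genotype_classes(row_a, row_b, missing="-9"):
    """Combine the two allele rows of one individual into per-locus genotypes.

    Returns '11' (homozygote), '12' (heterozygote), '22' (homozygote), or None
    where either allele is missing. Sorting as strings assumes one-character codes.
    """
    if len(row_a) != len(row_b):
        raise ValueError("both allele rows must have the same number of loci")
    genotypes = []
    for a, b in zip(row_a, row_b):
        if a == missing or b == missing:
            genotypes.append(None)  # missing data at this locus
        else:
            # order-independent: '21' and '12' are both the heterozygote
            genotypes.append("".join(sorted((a, b))))
    return genotypes


row_a = ["1", "1", "2", "-9"]
row_b = ["1", "2", "2", "1"]
print(genotype_classes(row_a, row_b))  # ['11', '12', '22', None]
```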

For SNPs, I prefer to keep all four alleles intact so that the information on which polymorphism it is (A/T, A/C, A/G, etc.) is retained. Hence my example of 1/2/3/4 (A/T/G/C). In the case of your data, that information is lost, but as far as STRUCTURE is concerned, it makes no difference.
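Keeping the nucleotide identity as described amounts to a fixed allele-to-integer mapping applied before splitting each genotype across the two rows. A minimal sketch using the A/T/G/C to 1/2/3/4 mapping mentioned above; treating any other character (such as N) as missing (-9) is an assumption, and `to_two_rows` is a hypothetical helper.

```python
# Mapping from the example above: A=1, T=2, G=3, C=4
ALLELE_CODES = {"A": "1", "T": "2", "G": "3", "C": "4"}


def to_two_rows(genotypes, missing="-9"):
    """Split nucleotide genotypes ('AA', 'AT', ...) into two STRUCTURE allele rows.

    Anything that is not exactly two of A/T/G/C (e.g. 'NN') becomes missing.
    """
    row_a, row_b = [], []
    for gt in genotypes:
        gt = gt.upper()
        if len(gt) != 2 or any(base not in ALLELE_CODES for base in gt):
            row_a.append(missing)
            row_b.append(missing)
        else:
            row_a.append(ALLELE_CODES[gt[0]])
            row_b.append(ALLELE_CODES[gt[1]])
    return row_a, row_b


print(to_two_rows(["AA", "AT", "tt", "NN"]))  # (['1', '1', '2', '-9'], ['1', '2', '2', '-9'])
```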

V

Theresa Cole

Aug 1, 2018, 6:58:34 PM
to structure...@googlegroups.com
Great answer, that's wonderful and very helpful, Vikram.

Thank you very much!

Cheers!
Tess