Structure Harvester is returning .indfiles not formatted for CLUMPP

Sheina Sim

May 15, 2014, 9:12:43 PM
to structure...@googlegroups.com
Aloha, I'm trying to run CLUMPP, but the .indfiles produced by Structure Harvester contain what look like confidence intervals for the probabilities of belonging to each population, along with a string for the population number (pop 1: 0.000 0.000 0.000 | pop 2: 0.000 0.000 0.000 | etc.).

So the indfile for K = 4 looks like this:

  1   1 (0)  1 : 0.988 | Pop 2: 0.000 0.000 0.000  | Pop 3: 0.000 0.003 0.009  | Pop 4: 0.000 0.000 0.000  |
  2   2 (2)  1 : 0.987 | Pop 2: 0.000 0.000 0.000  | Pop 3: 0.000 0.004 0.009  | Pop 4: 0.000 0.000 0.000  |
  3   3 (0)  1 : 0.987 | Pop 2: 0.000 0.000 0.000  | Pop 3: 0.000 0.003 0.009  | Pop 4: 0.000 0.000 0.000  |
  4   4 (0)  1 : 0.995 | Pop 2: 0.000 0.000 0.000  | Pop 3: 0.000 0.001 0.004  | Pop 4: 0.000 0.000 0.000  |
  5   5 (0)  1 : 0.963 | Pop 2: 0.000 0.000 0.000  | Pop 3: 0.003 0.015 0.019  | Pop 4: 0.000 0.000 0.000  |
  6   6 (0)  1 : 0.993 | Pop 2: 0.000 0.000 0.000  | Pop 3: 0.000 0.001 0.005  | Pop 4: 0.000 0.000 0.000  |
  7   7 (0)  1 : 0.977 | Pop 2: 0.000 0.000 0.000  | Pop 3: 0.006 0.006 0.012  | Pop 4: 0.000 0.000 0.000  |
  8   8 (5)  1 : 0.000 | Pop 2: 0.000 0.000 0.567  | Pop 3: 0.000 0.000 0.000  | Pop 4: 0.000 0.000 0.432  |  ***
  9   9 (0)  1 : 0.994 | Pop 2: 0.000 0.000 0.000  | Pop 3: 0.000 0.000 0.006  | Pop 4: 0.000 0.000 0.000  |
 10  10 (0)  1 : 0.967 | Pop 2: 0.000 0.000 0.000  | Pop 3: 0.008 0.011 0.015  | Pop 4: 0.000 0.000 0.000  |
 11  11 (0)  1 : 0.969 | Pop 2: 0.000 0.000 0.000  | Pop 3: 0.005 0.010 0.016  | Pop 4: 0.000 0.000 0.000  |
 12  12 (2)  1 : 0.986 | Pop 2: 0.000 0.000 0.000  | Pop 3: 0.000 0.004 0.010  | Pop 4: 0.000 0.000 0.000  |
 13  13 (0)  1 : 0.995 | Pop 2: 0.000 0.000 0.000  | Pop 3: 0.000 0.001 0.004  | Pop 4: 0.000 0.000 0.000  |
 14  14 (0)  1 : 0.977 | Pop 2: 0.000 0.000 0.000  | Pop 3: 0.005 0.006 0.012  | Pop 4: 0.000 0.000 0.000  |
 15  15 (0)  1 : 0.987 | Pop 2: 0.000 0.000 0.000  | Pop 3: 0.000 0.003 0.009  | Pop 4: 0.000 0.000 0.000  |

I used the most recent Python script available on the website, since my zipped results file is too large to run on the web version (and too large to upload here).

Any help fixing this would be great. I suppose I could also just delete the unnecessary information, such as the string for the population number and the first and third values between the pipes. Is the middle value the one I want to keep, or is it the last value between the pipes? Also, what do the asterisks after individual 8 signify?

Thanks!
Sheina

Dent

May 16, 2014, 1:32:21 PM
to structure...@googlegroups.com
Hi Sheina,

This looks like it may be a parsing error. If you'd like, you can send me the input file off-list (or a link to the data) and I'll take a look at it.

Best,

Dent

Sheina Sim

May 30, 2014, 7:43:30 PM
to structure...@googlegroups.com
Hello Dent,

Thank you so much for your reply. I made the file smaller and tried it on the web version of Structure Harvester, and the resulting .indfile had the same problem. Here's a .zip of my STRUCTURE outputs (100 replicates, K = 2 to 40).

Thanks for your help!
Sheina

Dent

Jun 5, 2014, 3:05:05 PM
to structure...@googlegroups.com
Hi Sheina,

I looked at the archive you sent along and I've found the issue. When you ran STRUCTURE, you used prior population information (USEPOPINFO) together with GENSBACK. This changes the format of the Q-matrix output (see http://pritchardlab.stanford.edu/structure_software/release_versions/v2.3.4/structure_doc.pdf). As things currently stand, I'm not entirely sure how to convert the changed Q-matrix into the format that CLUMPP and DISTRUCT expect. It appears that it _might_ be possible by summing all the columns within a given set of fields (| Pop n: X X X ... |), but I haven't seen this explicitly mentioned in the STRUCTURE manual, in publications, or on the web. Both CLUMPP and DISTRUCT were written before this feature was added to STRUCTURE, so they won't accept this format directly.
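A minimal Python sketch of the column-summing idea, untested against real data and not Structure Harvester's actual code. It assumes every USEPOPINFO row has the layout in the example above: row index, label, percent missing in parentheses, the pre-defined population, a colon and that population's probability, then one `| Pop n: a b c |` block per other population. On our reading of the STRUCTURE documentation, a, b and c are the probabilities of ancestry in population n zero, one and two generations back (GENSBACK=2), but treat that as an assumption. The function name `popinfo_to_clumpp_row` is hypothetical, and the output row layout (`index label (missing) pop : q1 ... qK`, like STRUCTURE's ordinary admixture output) is an assumption about what CLUMPP expects.

```python
import re

# Row header: index, label, (percent missing), pre-defined pop, ":", its probability, "|"
HEADER = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d+)\)\s+(\d+)\s*:\s*([0-9.]+)\s*\|")
# One "Pop n: a b c |" block (assumed: a, b, c = 0, 1, 2 generations back)
POP_BLOCK = re.compile(r"Pop\s+(\d+):\s*([0-9.\s]+?)\s*\|")


def popinfo_to_clumpp_row(line, k):
    """Collapse one USEPOPINFO/GENSBACK row into K coefficients.

    Each "Pop n" block is summed into a single value; summing is the
    unverified assumption discussed in this thread.
    """
    header = HEADER.match(line)
    if header is None:
        raise ValueError(f"not a USEPOPINFO/GENSBACK row: {line!r}")
    index, label, missing, prior_pop, own_prob = header.groups()
    q = [0.0] * k
    # The value after the colon belongs to the pre-defined population.
    q[int(prior_pop) - 1] = float(own_prob)
    # Scan the remaining "| Pop n: ... |" blocks, starting after the header.
    for pop, values in POP_BLOCK.findall(line, header.end()):
        q[int(pop) - 1] = sum(float(v) for v in values.split())
    coefficients = " ".join(f"{x:.3f}" for x in q)
    return f"{index} {label} ({missing}) {prior_pop} : {coefficients}"


row = "  8   8 (5)  1 : 0.000 | Pop 2: 0.000 0.000 0.567  | Pop 3: 0.000 0.000 0.000  | Pop 4: 0.000 0.000 0.432  |  ***"
print(popinfo_to_clumpp_row(row, 4))  # 8 8 (5) 1 : 0.000 0.567 0.000 0.432
```

Trailing markers such as `***` are simply ignored. Rows in any other layout raise an error instead of being silently mangled.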

Maybe someone on the list can chime in here? What do you guys do when you have this sort of data? Do you just sum up the values for the GENSBACK fields and call that the cluster assignment? I can easily implement this in the code but I'd like to have some community input rather than making this assumption on my own.

In the meantime I've updated my code to explicitly recognize when STRUCTURE results have used USEPOPINFO and GENSBACK and to fail with a message explaining why it won't generate the CLUMPP files for you. I realize that's not the sort of solution you were looking for, but I figure it's better to fail with an explanation than to return weird data silently.
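A sketch of the fail-with-an-explanation behavior described above, assuming USEPOPINFO/GENSBACK output can be recognized by the `| Pop n:` fields in its Q-matrix rows, as in the example. The function name `require_clumpp_compatible` and the error wording are hypothetical; this is not the check Harvester actually uses.

```python
import re

# Assumed marker: USEPOPINFO/GENSBACK rows contain "| Pop n:" blocks,
# which an ordinary admixture Q-matrix row does not.
POPINFO_FIELD = re.compile(r"\|\s*Pop\s+\d+:")


def require_clumpp_compatible(q_rows):
    """Raise ValueError on the first row that uses the USEPOPINFO/GENSBACK layout."""
    for number, row in enumerate(q_rows, start=1):
        if POPINFO_FIELD.search(row):
            raise ValueError(
                f"Q-matrix row {number} uses the USEPOPINFO/GENSBACK layout "
                "('| Pop n: ...' fields); not writing CLUMPP files because "
                "CLUMPP and DISTRUCT expect one coefficient per cluster."
            )
```

Raising on the first offending row keeps the failure explicit rather than producing an indfile CLUMPP cannot read.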

Additionally, looking at your data, I think you should also run K=1: it lets you compute deltaK at K=2, and since your current peak is at K=3, this is important.
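To show why K=1 matters, here is a sketch of an Evanno-style deltaK calculation, assuming the formulation deltaK(K) = |L(K+1) - 2 L(K) + L(K-1)| / sd[L(K)], with L the mean ln P(D) over replicate runs. This simplified variant is an assumption; implementations differ in detail, and the function name `delta_k` is hypothetical. The point is structural: deltaK at K needs results for both K-1 and K+1.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Evanno-style deltaK from replicate ln P(D) values.

    lnp_by_k maps K -> list of ln P(D) values, one per replicate run.
    Simplified variant (assumption): the second-order difference is taken
    on the mean ln P(D) at each K.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        # L''(K) needs both neighbours, so the smallest and largest K
        # tested never get a deltaK value.
        if k - 1 not in means or k + 1 not in means:
            continue
        second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
        sd = stdev(lnp_by_k[k])  # requires at least two replicates
        if sd > 0:
            result[k] = abs(second_diff) / sd
    return result
```

With runs for K = 2 to 40 only, the returned dictionary has no entry for K=2; adding K=1 runs supplies the missing neighbour.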

Best,

Dent

Bart Kensinger

Sep 23, 2014, 7:06:41 PM
to structure...@googlegroups.com
Dent,

I am having this exact problem. Can I sum the columns manually to produce the individual matrix? Is there a current solution?

Thanks,
Bart

Dent

Sep 24, 2014, 10:12:55 AM
to structure...@googlegroups.com
Hey Bart,

Sadly, my answer is
¯\_(ツ)_/¯
I'm still not sure what the best solution is in this case. My instinct is that summing the columns for a particular set of fields may be reasonable, but I haven't tested that, and I haven't seen anyone use it in the literature (though I haven't been reading much in this field for some time).

If someone knows, or has successfully published a method that made it through peer review, I would love to read or hear about it so that I can update Harvester to handle these cases.

Best,

Dent

Bart Kensinger

Sep 24, 2014, 12:20:16 PM
to structure...@googlegroups.com
Dent,

   Thanks for your response. I considered summing the columns when I compared the STRUCTURE output files, and the sums mostly add up to 1 for my data. I'm going to do this and present it for peer review as an additional analysis, alongside the results of the admixture method evaluated with the Evanno method in Harvester.
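A minimal sketch of the "do the sums add up to 1?" check, assuming each individual's values have already been collapsed to one coefficient per population. The function name `rows_not_summing_to_one` is hypothetical, and the 0.005 tolerance is an assumption: each printed value is rounded to three decimals, so a row with 3K-2 printed values can be off by up to 0.0005 x (3K-2), which is 0.005 at K=4. Raise the tolerance for larger K.

```python
def rows_not_summing_to_one(rows, tol=0.005):
    """Return (label, total) for rows whose coefficients stray from 1 by more than tol.

    rows: iterable of (label, coefficients) pairs, one coefficient per population.
    """
    flagged = []
    for label, coefficients in rows:
        total = sum(coefficients)
        if abs(total - 1.0) > tol:
            flagged.append((label, total))
    return flagged


# Individuals 1 and 8 from the example, after summing each Pop block
rows = [("1", [0.988, 0.0, 0.012, 0.0]), ("8", [0.0, 0.567, 0.0, 0.432])]
print(rows_not_summing_to_one(rows))  # []
```

Anything this returns deserves a closer look before the summed matrix goes into CLUMPP.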

   I'll send you an email if it makes it through peer review so that you can update the code.

Cheers,
Bart

--
You received this message because you are subscribed to a topic in the Google Groups "structure-software" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/structure-software/_1zxFnc7C10/unsubscribe.
To unsubscribe from this group and all its topics, send an email to structure-softw...@googlegroups.com.
To post to this group, send email to structure...@googlegroups.com.
Visit this group at http://groups.google.com/group/structure-software.
For more options, visit https://groups.google.com/d/optout.



--
Bart Kensinger, PhD Candidate
Oklahoma State University, Zoology Department
430 Life Sciences West
Stillwater, OK 74078
Ph: (503)720-0886

M. Olalla Lorenzo-Carballa

Sep 18, 2016, 7:59:19 AM
to structure-software
Hello all

I am also trying to plot the results of an assignment test, and I am getting the same error posted here when running Structure Harvester.

Is there any other potential solution besides summing up the values in each column (which in my case would also add up to 1)?

Any help would be greatly appreciated.

Thanks in advance

Olalla