Generating popfile for clumpp

Mark Farman

Feb 14, 2025, 10:56:12 PM
to structure-software
So let's see if I understand this correctly. I am using Structure to identify populations in a set of 100 individuals. I performed ten replicates of identical runs at each K value from 1 to 10 and then used the Evanno method in StructureHarvester to find the best K value (i.e., the likely number of populations from which my samples were drawn). Now I wish to run Clumpp to infer the membership of each of my samples in the populations inferred by the Evanno method. To do this I need to provide a popfile which, as far as I can tell, summarizes the membership proportions of (presumably) "populations" in each of the best-K clusters. However, after a few hours of searching, I have found nowhere online that explains the content of the popfile.
The StructureHarvester documentation is no help because it suggests that the popfile has the same structure and content as the indfile, except for an extra column whose contents are not explained. This makes no sense: why is one file called an indfile and the other a popfile when they both contain "ind" information? Also, please don't tell me that it's not necessary to understand the popfile structure because StructureHarvester outputs a popfile; it doesn't. From what I gather, StructureHarvester WILL output a popfile if one provides LOCDATA/POPDATA (it's not clear which) when Structure is invoked. In my situation, however, there is no a priori POPDATA or LOCDATA, and I can imagine that in many other situations there would also be no such data. Why, then, would CLUMPP require a popfile as input when in many (most?) cases there is no a priori population data, so that popfile data would serve no purpose?

Vikram Chhatre

Feb 14, 2025, 11:19:01 PM
to structure...@googlegroups.com
Hi Mark,

It's been a while since I did this, so I can't answer all of your questions about the popfile without retracing my steps. However, note that including the POPDATA column is not the same as using that information to assign cluster memberships. STRUCTURE has specific settings (POPFLAG and USEPOPINFO) which need to be set in order to use a priori information.
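
To make the distinction concrete, here is a minimal Python sketch of how population-level membership could be summarized from individual-level results once each individual carries a POPDATA label. It assumes (from my reading of the STRUCTURE output, not a documented guarantee) that each popfile row is the per-population mean of the individuals' membership coefficients, followed by the number of individuals in that population. The function name `population_q` and the example values are hypothetical.

```python
from collections import defaultdict


def population_q(ind_q, pop_labels):
    """Average individual membership rows within each pre-defined population.

    ind_q      -- list of per-individual membership rows (one value per cluster)
    pop_labels -- POPDATA-style label for each individual, same order as ind_q
    Returns {label: (mean_membership_row, number_of_individuals)}.
    """
    sums = {}
    counts = defaultdict(int)
    for q, pop in zip(ind_q, pop_labels):
        if pop not in sums:
            sums[pop] = [0.0] * len(q)
        # Accumulate cluster-wise totals for this population.
        sums[pop] = [s + x for s, x in zip(sums[pop], q)]
        counts[pop] += 1
    return {
        pop: ([s / counts[pop] for s in sums[pop]], counts[pop])
        for pop in sums
    }


# Hypothetical example: three individuals, two clusters, two sampling labels.
example = population_q(
    [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]],
    [1, 1, 2],
)
```

Note that the labels only group the rows after the fact. They do not influence the membership coefficients themselves unless the a priori settings mentioned above are switched on.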

Also, CLUMPP can be run with DATATYPE=0, which indicates that the input is individual-level data (the indfile); in that case it does not expect a popfile as input.
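
As a rough illustration, the sketch below builds the handful of CLUMPP paramfile lines that matter for this choice. The keyword names (DATATYPE, INDFILE, OUTFILE, K, C, R) are written from my recollection of the paramfile that ships with CLUMPP, so check them against your own copy. The helper name, the file names, and the K value in the usage line are hypothetical placeholders. A real paramfile contains further settings that are not shown here.

```python
def clumpp_paramfile_text(k, n_individuals, n_runs,
                          indfile="bestK.indfile", outfile="bestK.outfile"):
    """Return the core paramfile lines for an individual-level CLUMPP run.

    Only the settings relevant to the DATATYPE choice are written; the
    remaining settings from the distributed paramfile are left out.
    """
    lines = [
        "DATATYPE 0",             # 0 = individual data, so no popfile is read
        f"INDFILE {indfile}",     # individual Q-matrices, one block per run
        f"OUTFILE {outfile}",     # aligned and averaged output
        f"K {k}",                 # number of clusters (the chosen K)
        f"C {n_individuals}",     # number of individuals
        f"R {n_runs}",            # number of replicate runs being aligned
    ]
    return "\n".join(lines) + "\n"


# 100 individuals and 10 replicate runs as in the question; K=3 is a placeholder.
text = clumpp_paramfile_text(k=3, n_individuals=100, n_runs=10)
```

Copy these values into the paramfile distributed with CLUMPP. Do not use this fragment as a complete file.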

If this doesn't help you resolve the issue you are having, let us know.

V


Farman, Mark L.

Feb 17, 2025, 12:52:59 PM
to structure...@googlegroups.com
Hi Vikram,

Thanks for the DATATYPE=0 hint. I wasn't aware of that option. It sounds like it may do the trick.

Mark

Mark L. Farman 
Professor, Department of Plant Pathology
225 Plant Science Building
1405 Veteran's Dr.
University of Kentucky
Lexington, KY 40546 USA 
tel:  (859) 218-0728 
fax:  (859) 323-1961
