Error in data format for USEPOPINFO model?


Karl Fetter

Sep 18, 2013, 3:53:33 PM
to structure...@googlegroups.com
Hello,

I'm using structure 2.3.4. I believe I have formatted my data file incorrectly for USEPOPINFO models. I want to specify populations for three samples (they are outgroups), and let the rest of the samples group where they might. 

Here is a sample of my data file:

trnH    trnL    trnK
galo1   -9  0    9    3    2
gara1   -9  0    3    3    2
gara2   -9  0    3    3    2
gath1   -9  0    10    3    2
lc1   2  1    4    4    3
lc2   2  1    4    4    3
magvir   1  1    1    1    1

col1 = sample names; col2 = PopData; col3 = PopFlag; col4-6 = loci.

I'm asking a phylogenetic question with these data, so I have two outgroups: one in the genus (lc1 and lc2) and one in the family (magvir).

It is my understanding from reading the manual that if you assign PopFlag = 0, the program will ignore PopData. But this doesn't appear to be happening when I run the program. It looks like it is assigning -9 as a population to every sample except where I specify 1 or 2.

Additionally, I supplied -9 as a value for missing data when I set up the project.

Can someone tell me if my understanding of this problem is correct & if my data file is formatted correctly? Is -9 being interpreted as a population?

Thanks,

Karl Fetter

Vikram Chhatre

Sep 18, 2013, 4:19:34 PM
to structure-software
Karl -

Unless you ask the program (by setting POPFLAG column to 1), the location information (POPDATA) isn't used by the clustering algorithm.  In addition, you can set LOCDATA column when using LOCPRIOR model.

By default, POPDATA column is simply used to organize samples in the results.  Indeed by setting the POPFLAG=0, the clustering algorithm ignores the POPDATA, but it is still needed for organizing the samples. Thus, you should include this information in your data file.  
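For anyone who wants to check a file against the rules above before loading it, here is a minimal Python sketch. It assumes the layout described in the original post (label, PopData, PopFlag, then the loci, whitespace-delimited, one row per individual, with an optional header row of marker names). The function name `check_popinfo_rows` and the specific checks are illustrative only and are not part of STRUCTURE.

```python
def check_popinfo_rows(lines, n_loci, missing="-9"):
    """Collect (label, message) warnings for rows laid out as
    label, PopData, PopFlag, then n_loci genotype columns (assumed layout)."""
    warnings = []
    for line in lines:
        fields = line.split()
        if not fields or len(fields) == n_loci:
            continue  # blank line, or the header row of marker names
        label = fields[0]
        if len(fields) != 3 + n_loci:
            warnings.append((label, f"expected {3 + n_loci} columns, found {len(fields)}"))
            continue
        popdata, popflag = fields[1], fields[2]
        if popflag not in ("0", "1"):
            warnings.append((label, f"PopFlag should be 0 or 1, found {popflag}"))
        if popdata == missing:
            # The missing-data code is meant for genotypes, not for PopData.
            warnings.append((label, "PopData equals the missing-data code"))
        elif popflag == "1" and not (popdata.isdigit() and int(popdata) > 0):
            warnings.append((label, f"PopFlag=1 but PopData is not a positive integer: {popdata}"))
    return warnings


if __name__ == "__main__":
    rows = [
        "trnH    trnL    trnK",
        "galo1   -9  0    9    3    2",
        "lc1   2  1    4    4    3",
        "magvir   1  1    1    1    1",
    ]
    for label, message in check_popinfo_rows(rows, n_loci=3):
        print(label, "->", message)
```

The check only reports problems; it does not decide which population label a sample should get.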

I hope this helps.

Vikram



--
You received this message because you are subscribed to the Google Groups "structure-software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to structure-softw...@googlegroups.com.
To post to this group, send email to structure...@googlegroups.com.
Visit this group at http://groups.google.com/group/structure-software.
For more options, visit https://groups.google.com/groups/opt_out.

Karl Fetter

Sep 18, 2013, 5:05:27 PM
to structure...@googlegroups.com
Hi Vikram,

This does help. I was very confused as to why -9 was showing up as a population in the Bar plot. Upon further reading of the manual, it looks like labeling populations as -9 is senseless, as missing data only applies to genetic data, not a priori data about the samples. 

When I use the Front End to tell structure to use the POPDATA, I do this by clicking "Update allele frequencies using only individuals with POPFLAG=1 data"... correct?

When I do this, I get strange results: in all of my simulations, from 1 to K, I get equal probability of group assignment. In other words, I have flat bars across the bar plot and no structure is found. When I use the same parameters but deselect this option, I recover groups. This seems very odd to me.

Karl

Vikram Chhatre

Sep 18, 2013, 5:43:05 PM
to structure-software
My answers are appended below:


> This does help. I was very confused as to why -9 was showing up as a population in the Bar plot. Upon further reading of the manual, it looks like labeling populations as -9 is senseless, as missing data only applies to genetic data, not a priori data about the samples.

When you don't include a population identifier, the second column is being interpreted as such, hence the -9 label.

> When I use the Front End to tell structure to use the POPDATA, I do this by clicking "Update allele frequencies using only individuals with POPFLAG=1 data"... correct?

Yes, assuming you have the POPFLAG and POPDATA columns in the data file. Check the POPINFO model information on page 11 of the user manual.

> When I do this, I get strange results: in all of my simulations, from 1 to K, I get equal probability of group assignment. In other words, I have flat bars across the bar plot and no structure is found. When I use the same parameters but deselect this option, I recover groups. This seems very odd to me.

Does the above discussion help in dealing with these questions? If not, it would be helpful to look at the bar plots.

V

 

Karl Fetter

Sep 18, 2013, 5:49:41 PM
to structure...@googlegroups.com
Thanks Vikram. Those answers do help me understand the format and structure of using POPDATA and POPFLAG. The results are still the same for me. The picture below is representative of what I get when I enable the POPDATA.

This is what I expect to get, or something similar to it.



Karl


Andrea Schreier

Sep 19, 2013, 11:22:13 AM
to structure...@googlegroups.com
Hi, Karl.  Just to double check, have you changed your second column to population identifiers and your third column to 1's for all the samples in which you want to use POPINFO?  The one will tell Structure that yes, you want to use the population identifier when doing population assignment.  (With some applications, one might only want to use the population identifier for some individuals.) 

If you have this input file format already, I would try running the software without selecting the "update allele frequencies using only individuals with POPFLAG=1."  You don't need to select that in order for the program to use the population information.  And if you tell Structure to use population data for all individuals, this feature is sort of redundant. 

Let me know if this helps!  I ran into a similar problem using the "update allele frequencies" option when doing population assignment once - all my unknowns were split 25% into four populations - and I'd be curious to see what fixes your problem.

Andrea



Karl Fetter

Sep 21, 2013, 5:03:49 PM
to structure...@googlegroups.com
Hi Andrea,

I did change the 2nd and 3rd columns to have the appropriate information. I believe that the error is coming from hidden file formatting. I can't figure out a way to save my data file as a kind=document, or a file that has no formatting. A friend sent me her data from a structure analysis and (I'm using a Mac) the file has a blank icon instead of a .txt or .csv icon. When I take her file and load it into structure, I have no problems running analyses. When I save her data as a .txt file (using TextWrangler or Excel) and load it into structure, I get problems. When I do this, I can't even load the data file into structure.

I wonder how one saves a file so that when you look at the meta-data for the file, kind=document (and not .txt, or .csv, .prn....etc, etc)?

What text editors are out there that don't add hidden formatting to the file? I've used TextWrangler for years and never had this problem.
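One way to see what a file really contains, regardless of its extension or icon, is to look at its raw bytes. The following is a rough Python sketch; the function name `describe_text_bytes` and the particular set of checks are my own choices for illustration, not something STRUCTURE provides.

```python
def describe_text_bytes(data: bytes) -> dict:
    """Summarize byte-level features that commonly trip up plain-text parsers."""
    crlf = data.count(b"\r\n")
    return {
        "utf8_bom": data.startswith(b"\xef\xbb\xbf"),
        "utf16_bom": data.startswith((b"\xff\xfe", b"\xfe\xff")),
        "rtf_header": data.startswith(b"{\\rtf"),   # file was saved as rich text
        "crlf": crlf,                                # DOS/Windows line endings
        "cr_only": data.count(b"\r") - crlf,         # classic Mac line endings
        "lf_only": data.count(b"\n") - crlf,         # Unix line endings
        "non_ascii": sum(b > 127 for b in data),     # bytes outside plain ASCII
    }


if __name__ == "__main__":
    import sys
    from pathlib import Path

    if len(sys.argv) > 1:
        # Usage: python inspect_bytes.py mydata.txt
        print(describe_text_bytes(Path(sys.argv[1]).read_bytes()))
```

Running it on both the file that loads and the file that does not, then comparing the two summaries, should show whether the difference is a byte-order mark, a rich-text wrapper, or the line endings.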

Karl

Vikram Chhatre

Sep 21, 2013, 5:37:45 PM
to structure-software
Karl -

The most common file format errors are caused by the wrong EOL (end-of-line) character in a text file.  You can use either a .txt or a .str extension; both should be plain text.  In a text editor such as Vim (available on OS X as MacVim), you can set the file format easily with:

:set ff=unix

(Use ff=dos or ff=mac instead if you need those line endings.)
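If you would rather not open an editor, the same conversion can be sketched in a few lines of Python. This is only an illustration: `normalize_eol` and `convert_file` are hypothetical helper names, and the thread does not say which line ending a given STRUCTURE build expects, so the Unix default here is an assumption.

```python
from pathlib import Path


def normalize_eol(data: bytes, eol: bytes = b"\n") -> bytes:
    """Rewrite every line ending (CRLF, lone CR, or LF) as `eol`."""
    # Collapse CRLF first so the lone-CR pass does not produce doubled newlines.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified if eol == b"\n" else unified.replace(b"\n", eol)


def convert_file(src: str, dst: str, eol: bytes = b"\n") -> None:
    """Read src as raw bytes and write a copy to dst with uniform line endings."""
    Path(dst).write_bytes(normalize_eol(Path(src).read_bytes(), eol))
```

Working on bytes rather than decoded text avoids any encoding guesswork and leaves everything except the line endings untouched.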

Are you seeing these errors after our last round of discussion on the matter?  

V


