Re: Structure 2.3.4 and SNP data


f.pina...@gmail.com

May 23, 2018, 5:43:36 AM
to structure-software
Hi Andy,

There are some example datasets available on STRUCTURE's website.
You may want to take a look at those.

You may also find PGDSpider useful for converting between formats.

Best,

Francisco


On Tuesday, 22 May 2018 17:38:30 UTC+1, Andy N wrote:
Quick question: how do I format SNP data so that Structure 2.3.4 can read it? I tried searching the manual etc., but there are no examples of this. Where can I find information about the various formats Structure can read? Thank you for your help!

Andy N

May 23, 2018, 11:06:24 AM
to structure-software
Thank you for answering and not ignoring my "simple" question. Such "simple" questions are often the biggest stumbling blocks before any work can be done, because they are so obvious to people who work in the field that they... get forgotten ;).

Unfortunately, the website with the SNP example data will not download anything; there is some sort of web error.

Is there somewhere that explains the data formats for markers? The Structure manual is VERY general, to the point that it is almost not useful at all.

I don't have any experience with markers and Structure, so I need a relatively simple manual.

Andy N

May 23, 2018, 11:12:35 AM
to structure-software
Do you know of any tutorial for SNP data?

Vikram Chhatre

May 23, 2018, 11:21:45 AM
to structure-software
Hi Andy,

The Structure manual is immensely useful, as you will find once you get going and start testing various models. Writing manuals for software programs is always tricky because you are trying to strike a balance between accessibility (for beginner users) and covering all functionality. If it catered fully to both ends of the spectrum, it would likely have been twice as long (at least).

That said, the STRUCTURE format is very simple. For SNPs, A, T, C, G are to be converted to 1, 2, 3, 4 (or any letter to any number, so long as you are consistent). You can have the data either in two rows per individual or in two columns per locus. Look at the example data set provided with STRUCTURE and format your file accordingly. There are also various tools (e.g. PGDSpider) that will convert your data from tens of different formats into STRUCTURE format and vice versa.
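A minimal Python sketch of the letter-to-number recoding described above, using the A, T, C, G to 1, 2, 3, 4 mapping from the post. The helper name `recode_bases` and the use of "N" for a failed call are assumptions for illustration only; any consistent mapping works.

```python
# Hypothetical helper; mapping follows the example in the post (A=1, T=2, C=3, G=4).
BASE_TO_INT = {"A": 1, "T": 2, "C": 3, "G": 4}
MISSING = -9  # STRUCTURE's default missing-data value, as noted in this thread


def recode_bases(calls):
    """Recode single-letter base calls into integer allele codes."""
    # Case-insensitive lookup; any other symbol (e.g. an assumed "N" for no call) becomes missing.
    return [BASE_TO_INT.get(base.upper(), MISSING) for base in calls]


print(recode_bases(["A", "T", "c", "G", "N"]))  # [1, 2, 3, 4, -9]
```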

V


--
You received this message because you are subscribed to the Google Groups "structure-software" group.
To unsubscribe from this group and stop receiving emails from it, send an email to structure-software+unsub...@googlegroups.com.
To post to this group, send email to structure-software@googlegroups.com.
Visit this group at https://groups.google.com/group/structure-software.
For more options, visit https://groups.google.com/d/optout.

Andy N

May 23, 2018, 11:55:58 AM
to structure-software
Yes, I understand. I'm writing from the perspective of a "user" who has never touched it before.




Andy N

Jun 13, 2018, 12:15:49 PM
to structure-software


Unfortunately, I still don't know how to start.

My data is provided in this format:


           SNP1  SNP1  SNP2  SNP2
Genotype1  0     1     1     0
Genotype2  0     1     0     1
Genotype3  -     -     1     0
Genotype4  0     1     0     1

Can someone suggest how to format this type of data so that STRUCTURE 2.3.4 will work with it? I keep getting errors about "not expected number of columns or rows" etc. I'm also not sure how to configure the initial parameters to even run STRUCTURE. Any ideas?

Thank you for your help!

Vikram Chhatre

Jun 13, 2018, 12:51:06 PM
to structure-software
How was this data generated? Your posted example is very small, so it is not clear whether it uses 0/1/2 coding, where 0 = zero copies of the reference allele (alt homozygote), 1 = one copy of the ref allele (heterozygote), and 2 = two copies of the ref allele (ref homozygote), or some other coding scheme.

As it stands, your example is showing a "0" genotype at SNP1 for Genotype1.  Without further information, it would be difficult to provide any feedback.

The "not expected number of columns" type of error indicates problems with end-of-line formatting. What is your operating system, are you using the front-end or the command-line version, and are you using a real text editor (not a spreadsheet or word processor)?
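End-of-line problems usually come from Windows (CRLF) or classic Mac (CR) line breaks. A minimal Python sketch that rewrites a file with Unix (LF) line endings, assuming LF is what your STRUCTURE build expects and the file fits in memory; the function names and file names are hypothetical.

```python
from pathlib import Path


def to_unix_newlines(data):
    """Convert CRLF (Windows) and lone CR (classic Mac) line breaks in bytes to LF."""
    # Replace CRLF first so the second pass does not turn it into two line breaks.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def fix_file(src, dst):
    """Write a copy of src with Unix line endings to dst."""
    Path(dst).write_bytes(to_unix_newlines(Path(src).read_bytes()))


# Usage with hypothetical file names:
# fix_file("mydata.txt", "mydata_lf.txt")
```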

 


Vikram Chhatre

Jun 13, 2018, 3:48:02 PM
to structure-software
My bad. You do seem to have data on both alleles. However, if you want to use the "-" character to code missing data, you need to tell STRUCTURE explicitly (the default is -9).
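The alternative is to recode "-" as -9 before loading. A minimal Python sketch for one line of data, assuming whitespace-separated fields and that "-" only ever appears as a whole field; `recode_missing` is a hypothetical helper name.

```python
def recode_missing(line, old="-", new="-9"):
    """Replace fields equal to `old` with `new` in one whitespace-separated line."""
    # Whole-field comparison, so existing "-9" values or labels such as "Gen-1" are untouched.
    fields = line.split()
    # Rejoin with tabs so the output is also tab-delimited.
    return "\t".join(new if field == old else field for field in fields)


print(repr(recode_missing("Genotype3 - - 1 0")))  # 'Genotype3\t-9\t-9\t1\t0'
```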

It would be useful if you answer my other two remaining questions.

V

On Wed, Jun 13, 2018 at 1:09 PM Andrzej Noyszewski <andrz...@gmail.com> wrote:
I have the SNP1 reference allele (e.g. 0) and alternate allele (e.g. 1) for each genotype (e.g. Genotype1), and likewise for SNP2 etc.



On Wed, Jun 13, 2018 at 1:57 PM Andrzej Noyszewski <andrz...@gmail.com> wrote:
Please look again. SNP1 for Genotype1 shows 1 and 0 (one of the reads is the reference). There are only 1, 0 and - (no allele). I do not have any other coding in my files. There are two rows to describe each particular marker.

I will check my files in a "real" editor; I forgot how sensitive this can be.

Thank you.
--
-- 
Andrzej Noyszewski

"There is nothing about a PhD that guarantees a person will be wiser, kinder and more ethical then someone with high school education". Dennis Prager 

"In a country well governed, poverty is something to be ashamed of. In a country badly governed, wealth is something to be ashamed of." Confucius

"We know that, for Americans, nothing — absolutely nothing — is out of reach because we don’t know the meaning of the word ‘quit.’" President Donald Trump

Andrzej Noyszewski

Jun 14, 2018, 10:56:21 AM
to structure...@googlegroups.com
OK. I can load the file now (it is still not quite as it should be, but...). It was a text editor issue.

Andrzej Noyszewski

Jun 18, 2018, 7:05:25 PM
to structure...@googlegroups.com
So, I got the file in one-row format with 0, 1, 2 and - codes. This is what the data look like initially:

Genotype1,Genotype2,Genotype3,Genotype4,Genotype5,Genotype6
0,-,1,2,1,1
0,2,0,1,0,0
1,-,1,0,0,1

I understand I need to transpose it, but what next?
Genotype1,0,0,1
Genotype2,-,2,-
Genotype3,1,0,1
Genotype4,2,1,0
Genotype5,1,0,0
Genotype6,1,0,1

??
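The transposition itself need not be done by hand. A minimal Python sketch, assuming the file is comma-separated with genotype names in the first row and every row the same length (`zip` stops at the shortest row); `transpose_csv` is a hypothetical helper name. It only swaps rows and columns and does not change the coding; on the data above it reproduces the transposed table.

```python
import csv
import io


def transpose_csv(text):
    """Turn a table with one row per SNP (header = genotype names) into one row per genotype."""
    rows = list(csv.reader(io.StringIO(text)))
    # zip(*rows) groups the i-th entry of every row, i.e. swaps rows and columns.
    return [list(column) for column in zip(*rows)]


data = """Genotype1,Genotype2,Genotype3,Genotype4,Genotype5,Genotype6
0,-,1,2,1,1
0,2,0,1,0,0
1,-,1,0,0,1
"""

for row in transpose_csv(data):
    print(",".join(row))  # first line printed: Genotype1,0,0,1
```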

f.pina...@gmail.com

Jun 19, 2018, 8:41:24 AM
to structure-software
If I were you, I would replace the "-" notation with "-9", to make it more standard.
Also, I don't think STRUCTURE supports "," as a delimiter, so I would recommend replacing the commas with " " (whitespace) or "\t" (tab).
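Swapping the delimiter can be scripted rather than done by hand. A minimal Python sketch, assuming plain comma-separated input whose fields contain no tabs; `commas_to_tabs` is a hypothetical helper name.

```python
import csv
import io


def commas_to_tabs(text):
    """Rewrite comma-separated text as tab-separated text, one line per row."""
    rows = csv.reader(io.StringIO(text))
    # Join each row's fields with tabs and end every line with a Unix newline.
    return "".join("\t".join(row) + "\n" for row in rows)


print(commas_to_tabs("Genotype1,0,0,1\nGenotype2,-9,2,-9\n"), end="")
```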

Andy N

Jun 19, 2018, 9:53:23 AM
to structure-software
True, I forgot to replace "," with tabs; I'm not used to it yet. I also replaced "-" with "-9".

In general the file can be read by STRUCTURE, but when I start a new project I set the initial parameters (for this example: # of individuals 6, ploidy 2, # of loci 3, missing data value -9) and also select "Individual ID for each individual", and then I get the error "Bad Format in Data Source: Expected 12 rows, currently have 6 rows". When I added the individual ID as a separate row, it expects 4 columns.

I also tried adding the names of the loci (selected "row with loci names" in STRUCTURE), so my file looks like this:

SNP1 SNP2 SNP3
Gen1 0 0 1
Gen2 -9 2 1
Gen3 1 1 1
Gen4 1 1 1
Gen5 0 0 1
Gen6 1 1 1

Here are links to pictures of the configuration and the error I get, so you can actually see it:




The test file is saved in Komodo Edit.

This should work(?), and I really have no idea why I'm getting this error about the number of rows. Again, thank you for helping with troubleshooting.

EDIT: When I changed the ploidy level to 1, the file was accepted. Why include ploidy as one of the parameters? I'm not sure what role the ploidy parameter plays in STRUCTURE.
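The numbers in those error messages follow from the ploidy setting. A minimal Python sketch of the counting rule they imply (not code from STRUCTURE): `one_row_per_ind` mirrors the ONEROWPERIND option in mainparams, and `extra_cols` is a hypothetical name for the label plus any other non-genotype columns. With 6 individuals, 3 loci, ploidy 2 and one row per allele copy it gives 12 rows of 4 entries; with ploidy 1 it gives 6 rows, which is why the haploid setting was accepted.

```python
def expected_layout(num_inds, num_loci, ploidy, one_row_per_ind, extra_cols=1):
    """Return (data_rows, entries_per_row), not counting an optional marker-name row.

    extra_cols: leading non-genotype columns per row (1 = just the individual label).
    """
    if one_row_per_ind:
        # One line per individual holding all ploidy * num_loci allele entries.
        return num_inds, extra_cols + ploidy * num_loci
    # Otherwise each individual spans `ploidy` lines, one allele per locus per line.
    return num_inds * ploidy, extra_cols + num_loci


print(expected_layout(6, 3, 2, one_row_per_ind=False))  # (12, 4)
print(expected_layout(6, 3, 1, one_row_per_ind=False))  # (6, 4)
```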

f.pina...@gmail.com

Jun 19, 2018, 11:01:13 AM
to structure-software
Hi Andy,

As Vikram explained, you need to have your data encoded either as 2 rows per genotype or as 2 columns per SNP. As far as I know, you cannot simply use 0, 1 or 2 to encode your data; you will need to convert that format into something STRUCTURE understands, something like this:

    SNP1 SNP2 SNP3
Gen1 0 0 1
Gen1 0 0 0
Gen2 -9 1 1
Gen2 -9 1 0

Also note that the "GenX" label must be separated from the first allele by whitespace.
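Writing this layout programmatically avoids hand-editing errors. A minimal Python sketch, assuming diploid data held as one allele pair per locus for each individual, with `None` marking a missing allele; `to_structure_rows` and the dictionary layout are hypothetical. With the genotypes from the example above it prints the same rows, tab-separated.

```python
MISSING = -9  # STRUCTURE's default missing-data value


def to_structure_rows(loci, genotypes):
    """Return STRUCTURE-style text: a marker-name row, then two rows per individual.

    genotypes: dict mapping an individual label to a list with one
    (allele_a, allele_b) pair per locus; None marks a missing allele.
    """
    lines = ["\t".join(loci)]
    for label, pairs in genotypes.items():
        for copy in (0, 1):  # first row takes each pair's first allele, second row the other
            alleles = [MISSING if pair[copy] is None else pair[copy] for pair in pairs]
            # The label is separated from the first allele by a tab.
            lines.append("\t".join([label] + [str(a) for a in alleles]))
    return "\n".join(lines) + "\n"


example = {
    "Gen1": [(0, 0), (0, 0), (1, 0)],
    "Gen2": [(None, None), (1, 1), (1, 0)],
}
print(to_structure_rows(["SNP1", "SNP2", "SNP3"], example), end="")
```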

Best,

Francisco

Vikram Chhatre

Jun 19, 2018, 11:14:17 AM
to structure-software
STRUCTURE accepted the data under the 'haploid' setting because that's how you have coded it. Every SNP has only one allele coded in your data, which is an indication of haploidy. My understanding is that your data is diploid. In other words, you need to code two alleles per SNP per genotype.


If your data is actually coded as 0/1/2, then you will need to convert that to something that STRUCTURE understands. An example with three genotypes and one SNP:

Geno1  0
Geno2  1
Geno3  2

In other words:

Geno1 is a homozygote for the reference allele (here the 0/1/2 value counts copies of the alternate allele). Let's assume the polymorphism is A/T, where A (reference) = 1 and T (alternate) = 2. Then Geno1 should be coded as 1 1.
Geno2 is a heterozygote, so code it as 1 2.
Geno3 is a homozygote for the alternate allele, which should be coded as 2 2.
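The same expansion in a minimal Python sketch. Which allele the 0/1/2 number counts depends on the program that produced the file, so it is a parameter here (`counts`, like the function name, is hypothetical); the defaults follow the A=1, T=2 example above.

```python
MISSING = -9  # STRUCTURE's default missing-data value


def dosage_to_alleles(value, ref=1, alt=2, counts="alt"):
    """Expand one 0/1/2 genotype into a pair of allele codes.

    counts="alt": the number is copies of the alternate allele (0 -> ref ref).
    counts="ref": the number is copies of the reference allele (0 -> alt alt).
    Anything other than 0, 1 or 2 (e.g. "-" or -9) is treated as missing.
    """
    try:
        dosage = int(value)
    except (TypeError, ValueError):
        return (MISSING, MISSING)
    if dosage not in (0, 1, 2):
        return (MISSING, MISSING)
    n_alt = dosage if counts == "alt" else 2 - dosage
    # Reference copies first, so every heterozygote is written in the same order.
    return tuple([ref] * (2 - n_alt) + [alt] * n_alt)


print([dosage_to_alleles(v) for v in ["0", "1", "2", "-"]])
# [(1, 1), (1, 2), (2, 2), (-9, -9)]
```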

If you are still running into issues after following these directions, please post a minimal working (or not working) example, which includes the following files:

- your data set (for 10 genotypes and 10 SNPs)
- mainparams (the main parameter file)

V


Andy N

Jun 19, 2018, 11:20:45 AM
to structure-software
Hello Francisco,

I THINK I see what you mean.

Andy N

Jun 19, 2018, 11:26:18 AM
to structure-software
Vikram: no, my data is not coded the way you presented, with SNP alleles A, T, G and C coded as 1, 2, 3 and 4.

Andy N

Jun 19, 2018, 11:48:55 AM
to structure-software
OK. So, my original data looks like this (no more 2 as a heterozygote code). Yes, my data is diploid:

Gen1 Gen2
SNP1 - 0
SNP1 - 1
SNP2 1 1
SNP2 0 0
SNP3 1 1
SNP3 0 0

Is this a format STRUCTURE will accept?

SNP1 SNP2 SNP3
Gen1 - 1 1
Gen1 - 0 0
Gen2 0 1 1
Gen2 1 0 0
(In the original file the separators are tabs; spaces are used here just for readability.)

I think that, in general, it is not important to know exactly which SNP was observed (A/T or G/T, etc.). I hope the above helps.
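The conversion from the original layout (two rows per SNP, one column per individual) to two rows per individual can also be scripted. A minimal Python sketch, assuming whitespace-separated input whose first line lists the individuals and whose two allele rows for each SNP are consecutive; `snp_rows_to_structure` is a hypothetical name. It recodes "-" to "-9" but does not reorder alleles within a pair.

```python
def snp_rows_to_structure(lines, missing_in="-", missing_out="-9"):
    """Convert two-rows-per-SNP input into a marker-name row plus two rows per individual."""
    labels = lines[0].split()
    body = [line.split() for line in lines[1:] if line.strip()]
    loci = [row[0] for row in body[::2]]  # every second row starts a new SNP
    out = ["\t".join(loci)]
    for i, label in enumerate(labels, start=1):  # column i holds this individual's alleles
        for copy in (0, 1):  # first allele row of each SNP, then the second
            alleles = [row[i] for row in body[copy::2]]
            alleles = [missing_out if a == missing_in else a for a in alleles]
            out.append("\t".join([label] + alleles))
    return out


table = [
    "Gen1 Gen2",
    "SNP1 - 0",
    "SNP1 - 1",
    "SNP2 1 1",
    "SNP2 0 0",
    "SNP3 1 1",
    "SNP3 0 0",
]
print("\n".join(snp_rows_to_structure(table)))
```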

Vikram Chhatre

Jun 19, 2018, 12:02:32 PM
to structure-software
In general, your new format looks fine except for two things:

1. The missing data should be coded as "-9", which we discussed earlier in this thread.
2. Generally you should keep the numeric pattern consistent throughout the file. Here, Gen1 is a heterozygote coded as 1 0 at SNP2 and SNP3, while Gen2, which is also a heterozygote at all three loci, is coded as 0 1 at SNP1 but 1 0 at SNP2 and SNP3.

It would be useful for you to follow the advice given when things aren't working. We still do not know which program your original data was exported from, or why you haven't used an automated utility like PGDSpider to convert your data faithfully.

Hand coding data is sure to introduce errors which will create problems with downstream results.  When you are fluent in the use of a text editor, you could delve into hand coding. But for now, I encourage you to stick to standardized format converters.

V
