Hi,
I have a question about allele order in a ped file.
I want to split a large ped file by chromosome in order to do imputation separately.
So I used the following shell command:
plink \
--noweb \
--file MyGwasData \
--chr ${chr} \
--recode \
--out MyGwasData_chr-${chr}
* "MyGwasData" is the prefix of the ped & map files.
After that, I compared MyGwasData.ped with MyGwasData_chr-${chr}.ped
and noticed that allele 1 and allele 2 had been swapped for some SNPs.
For example, here is a ped file of 4 individuals and 4 SNPs.
MyGwasData.ped
0 ID1 0 0 0 -9 C C C C A A T T
0 ID2 0 0 0 -9 C T C C A A T T
0 ID3 0 0 0 -9 C C C T A A T A
0 ID4 0 0 0 -9 C C C C A G T T
* Because I used the annotation of an Affymetrix SNP array, alleles are always listed in alphabetical order.
("A C", "A G", "A T", "C G", "C T" and "G T" do exist, but "C A", "G A", "T A", "G C", "T C" and "T G" do not exist.)
MyGwasData_chr-${chr}.ped
0 ID1 0 0 0 -9 C C C C A A T T
0 ID2 0 0 0 -9 C T C C A A T T
0 ID3 0 0 0 -9 C C T C A A T A
0 ID4 0 0 0 -9 C C C C G A T T
In this example,
the SNP2 of ID3 has been changed from "C T" to "T C" and
the SNP3 of ID4 has been changed from "A G" to "G A".
On the other hand,
the SNP2 of ID2 remained "C T" and
the SNP4 of ID3 remained "T A".
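To list such swaps systematically instead of by eye, something like the sketch below could be used. It is only an illustration: `report_swapped_alleles` is a name I made up, and it assumes both ped files are whitespace-delimited, have the usual 6 leading columns, and contain the same individuals and the same SNPs in the same order (as in the small example above).

```shell
# Hypothetical helper (not part of PLINK): print every genotype whose two
# alleles appear in swapped order between two ped files.
# Assumption: 6 leading columns, then two allele columns per SNP, and both
# files list the same individuals and SNPs in the same order.
report_swapped_alleles() {
  # $1 = original ped file, $2 = recoded ped file
  # paste -d '\n' interleaves the files: line 1 of $1, line 1 of $2, ...
  paste -d '\n' "$1" "$2" | awk '
    NR % 2 == 1 {                       # line from the original file
      n = NF
      for (i = 1; i <= NF; i++) a[i] = $i
      next
    }
    {                                   # matching line from the recoded file
      for (i = 7; i + 1 <= n; i += 2)
        if (a[i] != a[i + 1] && a[i] == $(i + 1) && a[i + 1] == $i)
          printf "%s SNP%d %s %s -> %s %s\n", a[2], (i - 5) / 2, a[i], a[i + 1], $i, $(i + 1)
    }'
}
```

Here "SNP1", "SNP2", ... are just the positions of the SNPs within the ped file, not the names from the map file.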
Do you know how this conversion happened?
At first, I thought heterozygous genotypes in reverse alphabetical order were automatically converted to alphabetical order,
but it seems that some heterozygous genotypes were not changed.
I also thought that the conversion might be based on strand ("+" and "-") information,
but a ped and map file set does not contain any strand information.
Does this conversion have any influence on downstream analysis?
(In other words, do "A B" and "B A" for an individual have the same meaning in a ped file?)
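If the order within a genotype really carries no meaning (that is the assumption I am asking about, not something I have confirmed), then the two files should become identical once every genotype is rewritten in a fixed order. A minimal sketch of that check, with `sort_genotype_alleles` being a made-up helper name:

```shell
# Hypothetical helper (not part of PLINK): rewrite each genotype so that its
# two alleles are in alphabetical order. The 6 leading columns are untouched.
# Reads the files given as arguments, or standard input if there are none.
sort_genotype_alleles() {
  awk '{
    $1 = $1                             # rebuild the line with single spaces
    for (i = 7; i + 1 <= NF; i += 2)
      # appending "" forces a string comparison, even for 1/2-coded alleles
      if (($i "") > ($(i + 1) "")) { t = $i; $i = $(i + 1); $(i + 1) = t }
    print
  }' "$@"
}

# Possible usage, assuming both files hold the same SNPs in the same order
# (original.ped and recoded.ped are placeholder names):
#   sort_genotype_alleles original.ped > original.sorted
#   sort_genotype_alleles recoded.ped  > recoded.sorted
#   cmp original.sorted recoded.sorted && echo "only the allele order differs"
```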
Thanks,