Issue loading compound genotypes in PED format file

sahir bhatnagar

Jul 2, 2014, 1:56:18 AM
to plink2...@googlegroups.com
I had to convert my genotype data (which was in tped format) to PED format. I have compound genotypes. The file looks like this:

===========
2 0200028 0 0 1 1 XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX
2 0200029 0 0 1 1 XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX
2 0200030 0 0 1 1 XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX
2 0200031 0200001 0200015 1 1 CC GG TT TG TT TT CC AA CT CC AA CC AA AA CC CC AA TC CC TT GG AA TG XX
2 0200032 0200001 0200015 2 1 CC GG TT TG TT TT CC AA CT CC AA CC AA AA CC CC AA TC CC TT GG XX TG XX
2 0200033 0200001 0200015 2 1 CC GG TT TG TT TT CC AA TT CC TA CC AA AA TC CC AA CC TC TT GG AA TG TT
2 0200034 0200001 0200015 2 1 CC GG TT TG TT TT CC AA TT CC TA CC AA AA TC CC AA CC TC TT GG AA TG TT
2 0200035 0200001 0200015 2 1 CC GG TT TG TT TT CC AA TT CC TA CC AA AA TC CC AA CC TC TT GG AA TG TT
============

When I try to create the binary files using the command:

plink.1.90beta --file chr21 --missing-genotype X --make-bed --out chr21;

I get the following warnings (this is just a subset; I have to press Ctrl+C to stop):

============
Warning: Variant 150717 (post-sort/filter) triallelic; setting rarest missing.
Warning: Variant 150718 (post-sort/filter) triallelic; setting rarest missing.
Warning: Variant 150719 (post-sort/filter) triallelic; setting rarest missing.
Warning: Variant 150720 (post-sort/filter) triallelic; setting rarest missing.
Warning: Variant 150721 (post-sort/filter) triallelic; setting rarest missing.
Warning: Variant 150722 (post-sort/filter) triallelic; setting rarest missing.
Warning: Variant 150723 (post-sort/filter) triallelic; setting rarest missing.
===========

However, when I run it on PLINK 1.07 with the following command, it reads the file correctly:

plink --noweb --file chr21 --missing-genotype X --make-bed --compound-genotypes --out chr21


===============
Options in effect:
--noweb
--file chr21
--missing-genotype X
--make-bed
--compound-genotypes
--out chr21

239352 (of 239352) markers to be included from [ chr21.map ]
1389 individuals read from [ chr21.ped ] 
1389 individuals with nonmissing phenotypes
Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)
Missing phenotype value is also -9
0 cases, 1389 controls and 0 missing
673 males, 716 females, and 0 of unspecified sex
Before frequency and genotyping pruning, there are 239352 SNPs
413 founders and 976 non-founders found
Total genotyping rate in remaining individuals is 0.69009
0 SNPs failed missingness test ( GENO > 1 )
0 SNPs failed frequency test ( MAF < 0 )
After frequency and genotyping pruning, there are 239352 SNPs
After filtering, 0 cases, 1389 controls and 0 missing
After filtering, 673 males, 716 females, and 0 of unspecified sex
Writing pedigree information to [ chr21.fam ] 
Writing map (extended format) information to [ chr21.bim ] 
Writing genotype bitfile to [ chr21.bed ] 
Using (default) SNP-major mode
============





Christopher Chang

Jul 2, 2014, 2:34:57 AM
to plink2...@googlegroups.com
Thanks for the report, I've replicated the problem and will post a fix later today.  (It's the combination of compound genotypes and a nonstandard missing genotype code that was confusing the .ped converter.)
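
To make the interaction concrete, here is a minimal Python sketch of what reading compound genotypes with a custom missing code involves: each two-character token is split into two alleles, and the custom missing character is mapped to "0", PLINK's default missing code. The function name `split_compound` and the shortened example row are hypothetical, and this is an illustration only, not PLINK's actual converter code.

```python
def split_compound(line, missing="X", n_meta=6):
    """Expand two-character genotype tokens into space-separated alleles.

    Assumes the first n_meta fields are the usual .ped metadata columns
    (family ID, individual ID, father, mother, sex, phenotype).
    """
    fields = line.split()
    meta, genos = fields[:n_meta], fields[n_meta:]
    alleles = []
    for g in genos:
        if len(g) != 2:
            raise ValueError(f"not a compound genotype: {g!r}")
        # Map the custom missing character to the default missing allele "0".
        alleles.extend("0" if a == missing else a for a in g)
    return " ".join(meta + alleles)


# Shortened, hypothetical row in the style of the file shown above.
row = "2 0200031 0200001 0200015 1 1 CC TG XX"
print(split_compound(row))
# 2 0200031 0200001 0200015 1 1 C C T G 0 0
```

If the missing character were not mapped, "X" would be counted as an ordinary allele, which is one way a variant with two real alleles can end up looking triallelic.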

Christopher Chang

Jul 2, 2014, 6:00:18 AM
to plink2...@googlegroups.com
This should work properly in the 2 Jul development build.


On Wednesday, July 2, 2014 1:56:18 PM UTC+8, sahir bhatnagar wrote:

sahir bhatnagar

Jul 2, 2014, 9:46:32 AM
to plink2...@googlegroups.com
Yes it works now. Good work.

mchr...@uw.edu

Aug 19, 2016, 5:43:26 PM
to plink2-users
Christopher,

I am also trying to recode .ped/.map files, in my case to VCF, and I am getting the same triallelic warning with recent (July 2016) builds. I also have compound genotypes and am changing the missing-genotype code. My command is:

plink --ped test.ped --map test.map --recode vcf --output-missing-genotype . --out outputfile

The contents of the ped files look similar to:

id1 id1 0 0 0 -9 CC GG GG CC -- AA AA 

Thanks for your help in advance!

Christopher Chang

Aug 19, 2016, 5:46:46 PM
to plink2-users
Try adding "--missing-genotype '-'".
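
Before choosing a value for --missing-genotype, it can help to check which non-nucleotide characters actually occur in the genotype columns. The sketch below is a hypothetical helper (`odd_alleles` is not part of PLINK) that tallies every character outside A/C/G/T, assuming six metadata columns followed by compound genotypes.

```python
from collections import Counter


def odd_alleles(lines, n_meta=6, expected="ACGT"):
    """Count allele characters that are not ordinary nucleotide codes."""
    counts = Counter()
    for line in lines:
        # Skip the metadata columns, then inspect each genotype token.
        for genotype in line.split()[n_meta:]:
            counts.update(a for a in genotype if a not in expected)
    return counts


# The example row from the post above.
rows = ["id1 id1 0 0 0 -9 CC GG GG CC -- AA AA"]
print(odd_alleles(rows))
# Counter({'-': 2})
```

Here the only unexpected character is "-", so that is the character to declare as the missing code.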

Qi Fu

Mar 1, 2017, 4:50:38 AM
to plink2-users
Hi Christopher,

I have two files, test.ped and test.map, and I want to convert the .ped file to Structure format. My .ped file looks like this:
id1 id1 0 0 0 -9 CC AA NN TT
id2 id2 0 0 0 -9 NN AA TT CC
id3 id3 0 0 0 -9 TT NN CC AA
(Note: all SNPs are coded using double-bit IUPAC nucleotide codes.)

My .map file looks like this:
1 chromosome_1-2163  0  2163
1 chromosome_1-14606  0  14606

My command is:
./plink --file test --recode structure --out Str_test

Then I got the following messages (this is just a subset):
============
Warning: Variant 1 (post-sort/filter) triallelic; setting rarest missing.
Warning: Variant 2 (post-sort/filter) triallelic; setting rarest missing.
Warning: Variant 3 (post-sort/filter) triallelic; setting rarest missing.
============

Is there a way to solve this problem?

Thank you very much.

On Saturday, August 20, 2016 5:46:46 AM UTC+8, Christopher Chang wrote:

Christopher Chang

Mar 1, 2017, 1:07:24 PM
to plink2-users
Hi,

The .ped and Structure formats require all variants to be biallelic; "TT/CC/AA" columns are not supported, and IUPAC nucleotide codes don't really work either.  You will need to represent your data in a simpler manner.
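
As a rough check of which columns violate the biallelic requirement, the following Python sketch collects the distinct non-missing alleles in each genotype column of the three rows posted above. It assumes "N" is the missing code and that each token is a two-character compound genotype; `alleles_per_variant` is a hypothetical helper, not a PLINK function.

```python
def alleles_per_variant(lines, missing="N", n_meta=6):
    """Return one set of observed non-missing alleles per genotype column."""
    seen = []
    for line in lines:
        genotypes = line.split()[n_meta:]
        # Grow the list of per-column sets to match the widest row seen.
        while len(seen) < len(genotypes):
            seen.append(set())
        for i, genotype in enumerate(genotypes):
            seen[i].update(a for a in genotype if a != missing)
    return seen


rows = [
    "id1 id1 0 0 0 -9 CC AA NN TT",
    "id2 id2 0 0 0 -9 NN AA TT CC",
    "id3 id3 0 0 0 -9 TT NN CC AA",
]
for i, alleles in enumerate(alleles_per_variant(rows), start=1):
    status = "more than two alleles" if len(alleles) > 2 else "ok"
    print(i, sorted(alleles), status)
# 1 ['C', 'T'] ok
# 2 ['A'] ok
# 3 ['C', 'T'] ok
# 4 ['A', 'C', 'T'] more than two alleles
```

In this sample, the fourth column still has three alleles after "N" is treated as missing, so changing the missing code alone would not make it biallelic.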

Qi Fu

Mar 2, 2017, 8:05:24 AM
to plink2-users
Hi Christopher,
Can I use PLINK version 1.07 to convert my .ped file to Structure format?
My command is:
plink --noweb --file test --missing-genotype N --recode --allele1234 --output-missing-genotype 0 --out Str_test

Thanks

On Thursday, March 2, 2017 2:07:24 AM UTC+8, Christopher Chang wrote: