What does . mean in the place of an allele


R K

Oct 6, 2023, 7:45:17 AM
to plink2-users
Hi. What does it mean when you have SNPs like these in your .bim file, and are these SNPs usable?

1      rs11497407  0 633147  .  G
1       rs9701872  0 632828  .  A



Many thanks

Christopher Chang

Oct 6, 2023, 11:22:21 AM
to plink2-users
This means the other allele is unknown.  The situation arises because there are older file formats (such as .ped) which don't keep track of what the minor allele is when it doesn't appear in any sample.  Note that, if your dataset has only one sample, this will actually be true of a large majority of SNPs.

Often, you don't need to do anything about this.  When you do, PLINK 2.0's --ref-allele and --alt1-allele flags can be used to fill in the missing allele codes from e.g. 1000 Genomes phase 3.
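
To illustrate why a single sample leaves so many alleles unknown, here is a minimal toy sketch in Python. It is not PLINK's actual logic; the function names are hypothetical, and it assumes genotype calls are given as allele pairs with "0" marking a missing call, as in .ped files. An allele can only be recorded if at least one sample carries it, so a lone homozygous sample reveals just one of the two.

```python
def observed_alleles(genotypes):
    """Return the sorted distinct allele codes seen in a list of (a1, a2) calls.

    Assumption: "0" marks a missing call and is ignored.
    """
    seen = set()
    for a1, a2 in genotypes:
        for allele in (a1, a2):
            if allele != "0":
                seen.add(allele)
    return sorted(seen)


def allele_columns(genotypes):
    """Toy model of the two allele columns for one variant.

    Uses "." for an allele that no sample carries. The ordering of two
    observed alleles (minor vs. major) is deliberately not modeled.
    """
    seen = observed_alleles(genotypes)
    if len(seen) == 0:
        return (".", ".")
    if len(seen) == 1:
        return (".", seen[0])
    return (seen[0], seen[1])


# One homozygous sample: only G is ever observed.
print(allele_columns([("G", "G")]))
# Adding a heterozygous sample reveals the second allele.
print(allele_columns([("G", "G"), ("A", "G")]))
```

With one sample, every site where that sample is homozygous ends up with an unknown second allele, which is why this affects most SNPs in a single-sample dataset.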

R K

Nov 8, 2023, 12:41:51 PM
to plink2-users
Thanks. What about when you see 0 for one allele and a named allele in the other allele column? This was using PLINK 1.9.
Can you explain what you mean by this: "Note that, if your dataset has only one sample, this will actually be true of a large majority of SNPs."? Do you mean one cohort?

Many thanks!

Christopher Chang

Nov 8, 2023, 4:59:46 PM
to plink2-users
- The "0" and "." allele codes have the same meaning.
- No, I did not mean "one cohort", I specifically meant "one sample".  You'll still have some SNPs like that with a decent-sized cohort, but you shouldn't have a "large majority" unless your cohort is tiny.
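
As a rough way to see how many SNPs in a .bim file are affected, the following Python sketch lists variants whose allele columns contain "." or "0". It assumes the standard six whitespace-separated .bim columns (chromosome, variant ID, cM position, bp position, allele 1, allele 2); the function names and the sample rows are illustrative only.

```python
MISSING_CODES = {".", "0"}  # both codes mean "allele unknown"


def parse_bim_line(line):
    """Split one .bim line into its six whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    chrom, variant_id, cm, pos, a1, a2 = fields
    return {
        "chrom": chrom,
        "id": variant_id,
        "cm": float(cm),
        "pos": int(pos),
        "a1": a1,
        "a2": a2,
    }


def variants_with_unknown_allele(lines):
    """Return the IDs of variants where either allele column is a missing code."""
    flagged = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse_bim_line(line)
        if record["a1"] in MISSING_CODES or record["a2"] in MISSING_CODES:
            flagged.append(record["id"])
    return flagged


example_lines = [
    "1      rs11497407  0 633147  .  G",
    # Same row as in the question, rewritten with "0" purely for illustration.
    "1       rs9701872  0 632828  0  A",
    "23 rs141052964 65.164260 41666425 A G",
]
print(variants_with_unknown_allele(example_lines))
```

For a real file, the lines could be read with `open("data.bim")`, where `data.bim` is a placeholder filename.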

R Stephanie L

Nov 15, 2023, 8:48:01 AM
to plink2-users
Is this something that is commonly seen in .bim files? I have genetic data on human genome build 38.
I am looking to do a sex check.

I have noticed the following: some SNPs look fine and accurate, but there are others I don't know how to interpret (i.e. are they correct, given that they look wrong?).
E.g. the last two look wrong: the SNP name seems to include a position that doesn't correspond to the BP position column, and two alleles that don't correspond to the ones in the allele columns. Also, what is kgp...?

23  rs141052964    65.164260  41666425  A  G
23  kgp22771763    9.828674   5178976   A  G
23  X:7223150-C-T  14.510090  7305109   A  G


I would really appreciate your advice as I have never encountered this in PLINK files. 


Thanks


DAVID J Cutler

Nov 15, 2023, 9:44:12 AM
to R Stephanie L, plink2-users
Stephanie,

Try googling kgp22771763.

Generally, kgp stands for 1000 Genomes Project, and kgp22771763 was a designation that Illumina gave to one of its probe sets on the Omni2.5-exome array. This is clearly some sort of Illumina designation.

As for

23 X:7223150-C-T

That name seems to indicate that it is X-linked (and labeled as chr23 by plink, so all is well), and it suggests that the two alleles that were assayed are C-T. Of course, if the assay were on the opposite strand of the genome reference, and there was some attempt made to report the alleles relative to the genome reference, everything would be reported as A-G.

Cheers,
dave

Christopher Chang

Nov 15, 2023, 11:39:56 AM
to plink2-users
As for the discrepancy between the positions in the IDs and the main POS column, my guess is that they are based on different reference genome builds.
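
Both explanations in this thread (a strand flip for the alleles, a different build for the position) can be checked mechanically. The Python sketch below is one hypothetical way to do it: it assumes IDs of the form `CHROM:POS-A1-A2`, as in `X:7223150-C-T`, and the helper names are made up for illustration. It only reports mismatches; it cannot tell which build an ID was based on.

```python
import re

# Assumed ID layout: CHROM:POS-ALLELE1-ALLELE2, e.g. "X:7223150-C-T".
ID_PATTERN = re.compile(
    r"^(?P<chrom>[^:]+):(?P<pos>\d+)-(?P<a1>[ACGT]+)-(?P<a2>[ACGT]+)$"
)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_positional_id(variant_id):
    """Return (chrom, pos, a1, a2) from a positional ID, or None if it doesn't fit."""
    match = ID_PATTERN.match(variant_id)
    if match is None:
        return None
    return match["chrom"], int(match["pos"]), match["a1"], match["a2"]


def reverse_complement(allele):
    """Return the allele as read on the opposite strand."""
    return allele.translate(COMPLEMENT)[::-1]


def compare_id_to_columns(variant_id, pos, a1, a2):
    """Compare what the ID encodes with the .bim position and allele columns."""
    parsed = parse_positional_id(variant_id)
    if parsed is None:
        return None  # e.g. rsIDs or kgp IDs carry no position or alleles
    _, id_pos, id_a1, id_a2 = parsed
    id_alleles = {id_a1, id_a2}
    flipped = {reverse_complement(a) for a in id_alleles}
    column_alleles = {a1, a2}
    return {
        "position_matches": id_pos == pos,
        "alleles_match": column_alleles == id_alleles,
        "alleles_match_after_strand_flip": column_alleles == flipped,
    }


# The row from the question: 23 X:7223150-C-T 14.510090 7305109 A G
print(compare_id_to_columns("X:7223150-C-T", 7305109, "A", "G"))
```

For this row, the alleles agree only after the strand flip, and the positions differ, which is consistent with both replies above.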