How does plink read bed files?

Kaiyin Zhong

Feb 12, 2015, 6:19:12 AM
to plink...@googlegroups.com
PLINK .bed files use 2 bits for each genotype data point, which is very efficient in terms of storage space but not so convenient for numerical analysis.

  • Could anyone give an intuitive explanation of how PLINK reads these bits into a matrix of int/double for linear algebra operations?
  • Which functions are involved if I want to look into the source code?
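
For reference, the 2-bit packing itself can be unpacked naively as in the sketch below. This is a minimal Python illustration of the PLINK 1 .bed layout (3 magic bytes, then one block of ceil(n_samples / 4) bytes per variant, low-order bits first), not PLINK's own code. The names `decode_bed` and `CODE_TO_DOSAGE` are hypothetical, the choice to report A1 allele counts is an assumption, and the sample count is assumed to be known already (for example from the .fam file).

```python
# Minimal sketch of decoding a variant-major PLINK 1 .bed byte string.
# decode_bed and CODE_TO_DOSAGE are hypothetical names, not PLINK functions.

BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # third byte 0x01 = variant-major order

# 2-bit code -> number of copies of the first (A1) allele; None = missing.
CODE_TO_DOSAGE = {
    0b00: 2,     # homozygous for the first allele
    0b01: None,  # missing genotype
    0b10: 1,     # heterozygous
    0b11: 0,     # homozygous for the second allele
}


def decode_bed(raw, n_samples):
    """Return a list of per-variant rows, each a list of n_samples dosages."""
    if raw[:3] != BED_MAGIC:
        raise ValueError("not a variant-major PLINK 1 .bed file")
    bytes_per_variant = (n_samples + 3) // 4  # 4 genotypes per byte, padded
    body = raw[3:]
    if len(body) % bytes_per_variant != 0:
        raise ValueError("file size does not match the sample count")

    rows = []
    for start in range(0, len(body), bytes_per_variant):
        block = body[start:start + bytes_per_variant]
        row = []
        for i in range(n_samples):
            # Sample i sits in byte i // 4, at bit offset 2 * (i % 4).
            code = (block[i // 4] >> (2 * (i % 4))) & 0b11
            row.append(CODE_TO_DOSAGE[code])
        rows.append(row)
    return rows


# One variant, five samples: codes 00, 10, 11, 01 fill the first byte
# (0b01111000 = 0x78); the fifth sample (00) starts the second byte.
example = BED_MAGIC + bytes([0x78, 0x00])
matrix = decode_bed(example, n_samples=5)
```

Here `matrix` has one row per variant, so it would need transposing to get a samples-by-variants layout.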


Christopher Chang

Feb 12, 2015, 6:47:58 AM
to plink...@googlegroups.com
PLINK 1.9 tries to avoid using matrices of int/double whenever possible; instead, almost everything is done using bitwise operations on the raw packed data. See the "Identity-by-state and software popcount" section at the end of the preprint (http://arxiv.org/abs/1410.4803) for a detailed explanation of how this works for genomic distance; the paper also points to the most relevant functions in the source code.
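
To give a flavor of working on the packed data directly, here is a rough Python sketch that tallies genotype classes for one variant with masks and popcounts, without unpacking to an int array. It is a simplified illustration of the general idea only, not the identity-by-state computation from the preprint and not PLINK's actual code; `popcount` and `genotype_counts` are hypothetical names, and a Python big integer stands in for the machine words a C implementation would use.

```python
# Sketch: count genotype classes in one packed variant block using bitwise
# operations. Function names are hypothetical, not taken from PLINK.

def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def genotype_counts(block, n_samples):
    """block: the ceil(n_samples / 4) bytes of one variant in a .bed file."""
    word = int.from_bytes(block, "little")
    # 0b0101...01 selects the low bit of every 2-bit genotype slot.
    mask = int.from_bytes(b"\x55" * len(block), "little")
    lo = word & mask         # low bit of each genotype code
    hi = (word >> 1) & mask  # high bit, shifted down into the low position

    n_hom_a2 = popcount(lo & hi)            # code 11
    n_missing = popcount(lo & (mask ^ hi))  # code 01
    n_het = popcount(hi & (mask ^ lo))      # code 10
    # Padding slots are 00, so derive code 00 from the true sample count.
    n_hom_a1 = n_samples - n_hom_a2 - n_missing - n_het
    return {"hom_a1": n_hom_a1, "het": n_het,
            "hom_a2": n_hom_a2, "missing": n_missing}


# Five samples with codes 00, 10, 11, 01, 00 (bytes 0x78, 0x00).
counts = genotype_counts(bytes([0x78, 0x00]), n_samples=5)
```

The point of this style is that a handful of AND/XOR/shift operations handle many genotypes at once, which is why unpacking into a matrix of int/double is often unnecessary.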