plink.raw


test1

Aug 1, 2017, 3:55:42 PM
to plink2-users
I received a file in plink.raw format (the output of the --recodeA function). Can I convert it to a binary .bed file? If so, what would the command be?

Thanks,

Christopher Chang

Aug 1, 2017, 9:56:40 PM
to plink2-users
This is not currently supported by plink.  However, it should be straightforward to load this in R; from there, you should be able to find an R package that can export to binary .bed.

Tim Smyser

Apr 23, 2018, 6:40:29 PM
to plink2-users
Christopher,

We are encountering the same challenge. Although we have been able to recreate *.ped genotypes with an R script, our solution relies on 'ifelse' statements and is too slow to be practical. Can you point us to any R packages in particular that might handle this conversion more efficiently?

Any insights you may have would be greatly appreciated.

Thanks,
Tim

Christopher Chang

Apr 23, 2018, 7:23:56 PM
to plink2-users
Your main job is to write a transposed version of the numeric part of the matrix to disk (R's t() function might help; there are lots of other ways to do this).  Once you've done that, it should only take a little bit of work to get plink 2.0's --import-dosage function to work.
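
The transposition step described above could look roughly like the following base R sketch. It assumes the standard .raw layout of six leading columns (FID, IID, PAT, MAT, SEX, PHENOTYPE) followed by one column per variant; `raw_to_dosage_matrix` is a hypothetical helper name, not part of PLINK or any R package, and the in-memory table is made-up example data.

```r
# Sketch only: raw_to_dosage_matrix() is a hypothetical helper, not a PLINK or package function.
# Assumes the standard .raw layout: FID IID PAT MAT SEX PHENOTYPE, then one column per variant.
raw_to_dosage_matrix <- function(raw) {
  lead <- c("FID", "IID", "PAT", "MAT", "SEX", "PHENOTYPE")
  stopifnot(identical(names(raw)[1:6], lead))
  # keep only the allele-count columns, as a numeric matrix
  geno <- as.matrix(raw[, -(1:6), drop = FALSE])
  storage.mode(geno) <- "integer"
  # transpose: variants on rows, samples on columns
  tgeno <- t(geno)
  colnames(tgeno) <- raw$IID
  tgeno
}

# made-up three-sample, two-variant example instead of a real file
raw_text <- "FID IID PAT MAT SEX PHENOTYPE rs1_A rs2_G
f1 i1 0 0 1 -9 0 2
f2 i2 0 0 2 -9 1 NA
f3 i3 0 0 1 -9 2 1"
# check.names = FALSE keeps variant IDs such as 1:12345_A unmangled
raw <- read.table(text = raw_text, header = TRUE,
                  stringsAsFactors = FALSE, check.names = FALSE)
dosage <- raw_to_dosage_matrix(raw)
```

For a real file, `read.table("plink.raw", header = TRUE, check.names = FALSE)` or data.table's `fread()` would replace the `text =` call. The resulting matrix still needs SNP ID and allele columns prepended before it resembles a dosage file.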

Kevin Keys

Sep 11, 2019, 2:01:07 PM
to plink2-users
I have seen this question before and come across the situation myself.

I had some simulated data that were converted from haplotypes to PLINK RAW. This meant that there were no corresponding BED/BIM/FAM or PED/MAP files.

But I was always disappointed with the lack of an implemented solution.

Following Chris Chang's guidance, I wrote an R function that eventually beat the result into a PLINK BED. I'll leave it here in hopes that it is useful. It worked for me, but your mileage may vary.

See below.
--Kevin

# will use data.table for faster I/O, but it's not strictly necessary for this script to work
library(data.table)

# function to perform conversion
# call outside of R with plink2 (latest build, 2019-09 or later)
# infile is a file of N samples on rows and P SNPs on columns, a header, and leftmost column with sample IDs
# remaining contents of infile are hard-called dosages (0,1,2)
# call PLINK on result of files like this:
# ~/bin/plink2 --import-dosage $OUT_DOSAGE noheader --fam $OUT_FAM --make-bed --out $YOUR_OUTFILE_PREFIX
convert.raw.to.plink2 = function(infile, out.dosage, out.fam){

    # load genotype dosages
    geno = fread(infile)

    # transpose the allele dosages
    # use as.matrix to ensure that we (efficiently) transpose the allele dosage numbers and nothing else
    tgeno = t(as.matrix(geno))

    # build the dosage file itself
    geno.dosage = data.table(cbind(paste0("snp", 1:ncol(geno)), "A", "T", tgeno))

    # could add header, this one should work with --id-delim "-" in PLINK call
    #colnames(ceu.dosage) = c("SNP", "A1", "A2", paste0("id", 1:nrow(ceu), "-id", 1:nrow(ceu)))

    # write the dosage file to disk
    # set col.names=T if you want a header on the file
    fwrite(geno.dosage, file = out.dosage, quote = F, sep = "\t", col.names=F)

    # build a dummy fam file and write that to disk
    geno.fam = data.table(cbind(paste0("id", 1:nrow(geno)), paste0("id", 1:nrow(geno)), 0, 0, 0, -9))
    fwrite(geno.fam, file = out.fam, quote = F, sep = "\t", col.names=F)

    return()
}

Kevin Keys

Sep 11, 2019, 2:13:39 PM
to plink2-users
Forgot to mention: the dosages here are the columns starting at column 6.
KLK

Christopher Chang

Sep 18, 2019, 10:48:58 AM
to plink2-users
Thanks for posting this, I may direct others to it in the future.  This particular bit of functionality will probably remain outside PLINK's scope.

Qin Hui

Sep 23, 2021, 7:24:58 PM
to plink2-users
I think it should be:
tgeno = t(as.matrix(geno[,-1]))

Qin Hui

Sep 24, 2021, 12:36:53 PM
to plink2-users
And if there are missing values, use na="NA" in fwrite().
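
Putting the two corrections together, a minimal sketch of the file-writing step might look like this. It is written in base R so it runs without data.table, with `write.table()` standing in for `fwrite()` (that swap is an assumption made for self-containment, not what the original script does). `write_dosage_inputs` is a hypothetical helper name, the "A"/"T" alleles are the same placeholders used in the script above, and the example data are made up.

```r
# Sketch only: write_dosage_inputs() is a hypothetical helper, not part of PLINK or data.table.
# ids_and_dosages: sample IDs in column 1, hard-called dosages (0/1/2, NA = missing) after it.
write_dosage_inputs <- function(ids_and_dosages, out.dosage, out.fam) {
  ids <- as.character(ids_and_dosages[[1]])
  # drop the ID column first, so that only the dosage numbers are transposed
  tgeno <- t(as.matrix(ids_and_dosages[, -1, drop = FALSE]))
  # one row per SNP: SNP ID, two placeholder alleles, then one dosage per sample
  dosage <- data.frame(SNP = rownames(tgeno), A1 = "A", A2 = "T", tgeno,
                       stringsAsFactors = FALSE)
  # write missing dosages explicitly as NA
  write.table(dosage, file = out.dosage, quote = FALSE, sep = "\t",
              row.names = FALSE, col.names = FALSE, na = "NA")
  # dummy .fam: FID, IID, father, mother, sex, phenotype
  fam <- data.frame(ids, ids, 0, 0, 0, -9)
  write.table(fam, file = out.fam, quote = FALSE, sep = "\t",
              row.names = FALSE, col.names = FALSE)
  invisible(NULL)
}

# made-up example: three samples, two SNPs, one missing call
geno <- data.frame(id = c("s1", "s2", "s3"),
                   snpA = c(0L, 1L, 2L),
                   snpB = c(2L, NA, 0L))
out.dosage <- tempfile(fileext = ".dosage")
out.fam <- tempfile(fileext = ".fam")
write_dosage_inputs(geno, out.dosage, out.fam)
dosage_lines <- readLines(out.dosage)
fam_lines <- readLines(out.fam)
```

The two files would then be passed to the plink2 --import-dosage command quoted in the script's comments. Unlike the original script, this sketch reuses the input's column names as SNP IDs and its first column as sample IDs rather than generating "snp1", "id1", and so on.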