Hello,
I am having a similar problem. I'm working with a BGEN file and manually created an accompanying .sample file using metadata compiled from several CSV files. I read the CSVs into R, combined and reformatted them, and wrote the result to a new file with a .sample extension:
```r
write.table(fox_meta, file="meta_for_plink_5Apr24.sample", sep=" ", row.names=FALSE, quote=FALSE)
```
Here are the first 7 lines of my .sample file:
```
ID_1 ID_2 missing sex CurrPDDiag age GenderId HeightCm WeightKgs RaceW RaceAA RaceAI RaceA RaceNH
0 0 0 D B P D P P B B B B B
FOX_169254 NA NA 1 1 63.4 <NA> 188 79.4 1 0 0 0 0
FOX_424382 NA NA 1 1 67.5 <NA> 175.3 77.1 1 0 0 0 0
FOX_407253 NA NA 1 1 57.4 <NA> 182.9 86.2 1 0 0 0 0
FOX_022765 NA NA 2 1 53.8 <NA> 162.6 60.3 1 0 0 0 0
FOX_144078 NA NA 1 1 52.3 <NA> 182.9 74.8 1 0 0 0 0
```
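To see how many samples the file actually contributes, it can help to count only the rows below the two header lines. The sketch below assumes the two-header layout shown above (column names on line 1, a column-type row beginning with `0 0 0` on line 2) and whitespace-separated fields; `parse_sample_ids` and `read_sample_ids` are hypothetical helper names, not part of PLINK.

```python
from typing import Iterable, List


def parse_sample_ids(lines: Iterable[str]) -> List[str]:
    """Return the ID_1 column of an Oxford-style .sample file.

    Assumes line 1 holds column names and line 2 holds column types
    starting with "0 0 0"; every later non-blank line is one sample.
    """
    rows = [line.split() for line in lines]
    rows = [r for r in rows if r]  # ignore blank lines, e.g. a trailing newline
    if len(rows) < 2:
        raise ValueError("expected a header line and a column-type line")
    header, types = rows[0], rows[1]
    if types[:3] != ["0", "0", "0"]:
        raise ValueError(f"line 2 does not look like a type row: {types[:3]}")
    # Every sample row should have as many fields as the header.
    for i, row in enumerate(rows[2:], start=1):
        if len(row) != len(header):
            raise ValueError(
                f"sample row {i} has {len(row)} fields, header has {len(header)}"
            )
    return [row[0] for row in rows[2:]]


def read_sample_ids(path: str) -> List[str]:
    with open(path) as fh:
        return parse_sample_ids(fh)
```

For example, `len(read_sample_ids("meta_for_plink_5Apr24.sample"))` gives the sample count to compare against the BGEN file.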
All the IDs are the same as in the .bgen file and are in the same order. I checked this by converting the BGEN file to VCF and extracting the sample IDs with bcftools. The .sample file has an extra line at the top from the second header row (the column-type row); perhaps that is causing the problem (similar to this user).
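The ID check described above can be made explicit by comparing the two ordered lists directly. This is a sketch assuming the VCF sample names were saved one per line (for instance with `bcftools query -l`) and that the .sample IDs exclude both header rows; `compare_id_lists` and `read_id_list` are hypothetical names. A header or type row accidentally read as data would show up in `only_in_sample`.

```python
from typing import Dict, List, Optional


def compare_id_lists(bgen_ids: List[str], sample_ids: List[str]) -> Dict[str, object]:
    """Summarise how two ordered sample-ID lists differ."""
    first_mismatch: Optional[int] = None
    for i, (a, b) in enumerate(zip(bgen_ids, sample_ids)):
        if a != b:
            first_mismatch = i
            break
    if first_mismatch is None and len(bgen_ids) != len(sample_ids):
        # One list is a prefix of the other; they diverge where the shorter ends.
        first_mismatch = min(len(bgen_ids), len(sample_ids))
    bgen_set, sample_set = set(bgen_ids), set(sample_ids)
    return {
        "n_bgen": len(bgen_ids),
        "n_sample": len(sample_ids),
        "first_mismatch_index": first_mismatch,
        "only_in_bgen": sorted(bgen_set - sample_set),
        "only_in_sample": sorted(sample_set - bgen_set),
    }


def read_id_list(path: str) -> List[str]:
    # One ID per line, ignoring blank lines.
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]
```

If `first_mismatch_index` is `None` and both counts agree, the ordering and membership match.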
When I run the following command in plink 2, I get an error:
```
plink --bgen chr1.bgen.gz --sample meta_for_plink_5Apr24.sample --geno 0.01 --mind 0.05 --out chr1_bgen_check --make-bed
```
```
Error: --bgen and --sample files contain different numbers of samples.
```
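Because the error compares two counts, it may help to read the sample count N that the BGEN header itself records. Per the BGEN specification, the header stores N as a little-endian 32-bit integer at bytes 12 to 15, followed by the magic bytes `bgen`. The function names here are hypothetical, and the gzip branch is an assumption that covers the case where the `.bgen.gz` file is a plain gzip wrapper around the BGEN file.

```python
import gzip
import struct
from typing import Tuple


def parse_bgen_header(head: bytes) -> Tuple[int, int]:
    """Return (n_variants, n_samples) from the first 20 bytes of a BGEN file.

    Layout (all little-endian uint32):
      bytes 0-3    offset to the first variant block
      bytes 4-7    header block length
      bytes 8-11   number of variants M
      bytes 12-15  number of samples N
      bytes 16-19  magic number "bgen" (some older files store zeros)
    """
    if len(head) < 20:
        raise ValueError("too short to contain a BGEN header")
    _offset, _header_len, n_variants, n_samples = struct.unpack("<4I", head[:16])
    if head[16:20] not in (b"bgen", b"\x00\x00\x00\x00"):
        raise ValueError("BGEN magic number not found")
    return n_variants, n_samples


def bgen_sample_count(path: str) -> int:
    # Assumption: sniff the gzip magic bytes (1f 8b) and decompress if present.
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(path, "rb") as fh:
        head = fh.read(20)  # only the header is read, not the whole file
    return parse_bgen_header(head)[1]
```

If `bgen_sample_count("chr1.bgen.gz")` differs from the number of data rows in the .sample file, that difference is what the error is reporting.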
Any insight? I am new to working with these file formats and plink.
Thanks!