Error: --bgen and --sample files contain different numbers of samples.


Tongzhu

Aug 4, 2022, 2:27:38 PM
to plink2-users
Hello, 

I'm new to plink and am currently using .bgen and .sample files to check the homozygosity rate on the X chromosome. 
I run it with:

PLINK v1.90b6.21 64-bit (19 Oct 2020) 
--bgen imp_chrX_v3.bgen snpid-chr
  --check-sex
  --out sex-check
  --sample imp_chrX_v3_s487296.sample

My sample file looks like this:

ID_1 ID_2 missing sex
0 0 0 D
1287965 1287965 0 1

Unfortunately, I get this error message:

Error: --bgen and --sample files contain different numbers of samples.

I assume the bgen file and sample file were generated at the same time, so they should contain the same number of samples. Why does this happen?

Thank you very much,

Tongzhu 


Christopher Chang

Aug 5, 2022, 1:20:05 AM
to plink2-users
It looks like your .bgen file contains ALL the samples, while the .sample file only refers to a single sample.  So they are mismatched.  You need to get a matched pair.

Tongzhu

Aug 5, 2022, 1:42:34 AM
to plink2-users
Thank you for the help, and sorry for the confusion. What I posted is just an excerpt of the sample file; the real file contains all the samples. However, I have not been able to check the number of samples in the .bgen file. Is there a way to check it, so that I can tell whether the two files match? Thank you.

Christopher Chang

Aug 7, 2022, 12:44:02 PM
to plink2-users
If you use plink 2.0 for this operation, it will print the number of samples in the .bgen file (and it will also be able to preserve the dosages in the file).
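
For a quick check that does not depend on plink, the sample count can also be read directly from the BGEN header. The following is a minimal sketch, assuming the file follows the published BGEN layout (a 4-byte offset, then a header block holding the header length, the variant count, the sample count, and a 4-byte magic number, all little-endian). The function name `bgen_sample_count` is made up for illustration and is not part of plink.

```python
import struct


def bgen_sample_count(path):
    """Return the sample count stored in the header block of a BGEN file."""
    with open(path, "rb") as fh:
        head = fh.read(20)
    if len(head) < 20:
        raise ValueError("file is too short to contain a BGEN header")
    # Four little-endian uint32 values: offset to the first variant block,
    # header length, number of variants, number of samples.
    _offset, _header_len, _n_variants, n_samples = struct.unpack("<4I", head[:16])
    # The next four bytes should be b"bgen" (or four zero bytes in old files).
    magic = head[16:20]
    if magic not in (b"bgen", b"\x00\x00\x00\x00"):
        raise ValueError("magic number not found; this may not be a BGEN file")
    return n_samples


# Example (file name taken from the first post in this thread):
# print(bgen_sample_count("imp_chrX_v3.bgen"))
```

Compare the result with the number of lines in the .sample file minus its two header lines. If the magic-number check fails, the file may be compressed or may not be a BGEN file at all.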

Tongzhu

Aug 8, 2022, 1:26:38 PM
to plink2-users
Thank you so much! 

Samantha Schaffner

Apr 5, 2024, 7:10:09 PM
to plink2-users
Hello,

I am having a similar problem. I'm working with a bgen file and manually created an accompanying sample file, using metadata compiled from several csv files. I read them into R, combined and reformatted them, and wrote the result to a new file with a .sample extension:

write.table(fox_meta, file="meta_for_plink_5Apr24.sample", sep=" ", row.names=FALSE, quote=FALSE)

Here's the first 6 lines of my .sample file:

        ID_1 ID_2 missing sex CurrPDDiag  age GenderId HeightCm WeightKgs RaceW RaceAA RaceAI RaceA RaceNH
          0    0       0   D          B    P        D        P         P     B      B      B     B      B
 FOX_169254   NA      NA   1          1 63.4     <NA>      188      79.4     1      0      0     0      0
 FOX_424382   NA      NA   1          1 67.5     <NA>    175.3      77.1     1      0      0     0      0
 FOX_407253   NA      NA   1          1 57.4     <NA>    182.9      86.2     1      0      0     0      0
 FOX_022765   NA      NA   2          1 53.8     <NA>    162.6      60.3     1      0      0     0      0
 FOX_144078   NA      NA   1          1 52.3     <NA>    182.9      74.8     1      0      0     0      0

All the IDs are the same as in the .bgen file and are in the same order. I checked by converting the bgen to a vcf file and extracting the sample IDs with bcftools. The sample file has an extra line in it from the second header; perhaps that is causing the problem (similar to this user).
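
In the Oxford .sample format, both header lines are expected: the first names the columns and the second gives their types, starting with `0 0 0`. Neither is counted as a sample. The following is a minimal sketch for counting the samples in such a file, assuming whitespace-delimited fields with no embedded spaces. The function name `sample_file_count` is made up for illustration.

```python
def sample_file_count(path):
    """Count the data rows in an Oxford-format .sample file."""
    with open(path) as fh:
        # Split on any whitespace and skip blank lines.
        rows = [line.split() for line in fh if line.strip()]
    if len(rows) < 2:
        raise ValueError("expected two header lines (column names, column types)")
    names, types = rows[0], rows[1]
    # The type line must start with 0 0 0 for ID_1, ID_2 and missing.
    if types[:3] != ["0", "0", "0"]:
        raise ValueError("second line should start with '0 0 0'")
    # Every data row should have one field per named column.
    if any(len(row) != len(names) for row in rows[2:]):
        raise ValueError("at least one row has a different number of fields")
    return len(rows) - 2


# Example (file name taken from the post above):
# print(sample_file_count("meta_for_plink_5Apr24.sample"))
```

If this count differs from the number of sample IDs extracted from the converted vcf, the two files are mismatched.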

When I run the following command in plink 2, I get an error: 

plink --bgen chr1.bgen.gz --sample meta_for_plink_5Apr24.sample --geno 0.01 --mind 0.05 --out chr1_bgen_check --make-bed

Error: --bgen and --sample files contain different numbers of samples.

Any insight? I am new to working with these file formats and plink.

Thanks!

Christopher Chang

Apr 5, 2024, 7:28:59 PM
to plink2-users
- Always include a full .log file when asking for troubleshooting help.
- In this and many other cases, plink 2.0 provides a more useful error message (reporting the number of samples in each file) than plink 1.9.
- I'm pretty sure your .bgen file is not gzipped, so it should not have .gz at the end of its name.
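
One way to test whether a file is really gzipped is to look at its first two bytes, since gzip streams always start with 0x1f 0x8b. The following is a minimal sketch; the function name `looks_gzipped` is made up for illustration.

```python
def looks_gzipped(path):
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"\x1f\x8b"


# Example (file name taken from the command earlier in this thread):
# print(looks_gzipped("chr1.bgen.gz"))
```

If this returns False for a file named `*.bgen.gz`, the file is probably a plain .bgen file and the `.gz` suffix is misleading.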