Error: no samples in vcf file

Dan

Oct 18, 2017, 1:54:45 AM
to plink2-users
Hello,

I have a VCF file that I am trying to convert to .bed, .bim, and .fam files. I am using the command:

plink --vcf ~/filtering/test1/outfiles/test1.vcf --make-bed --out ~/filtering/test1/outfiles/test1

And receiving the error:

Error: no samples in .vcf file.

However, when I inspect my vcf file, I see this:

##fileformat=VCFv4.1
##fileDate=20171018
##source=pyRAD.v.3.0.66
##reference=common_allele_at_each_locus
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO            FORMAT  1A_0    1B_0    1C_0    1D_0    2E_0    2F_0    2G_0    2H_0    3I_0    3J_0    3K_0    3L_0

31      20      .       C       G       20      PASS    NS=12;DP=6      GT      1|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0
61      75      .       T       C       20      PASS    NS=12;DP=6      GT      1|1     1|1     1|1     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0
63      71      .       C       T       20      PASS    NS=12;DP=6      GT      0|0     0|0     0|0     0|0     0|0     0|0     0|0     1|1     0|0     0|0     0|0     0|0
89      41      .       A       T       20      PASS    NS=12;DP=6      GT      0|0     0|0     0|0     0|0     0|0     0|0     0|0     1|0     0|0     0|0     0|0     0|0
123     68      .       A       C       20      PASS    NS=12;DP=6      GT      0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     1|0     0|0
145     20      .       A       G       20      PASS    NS=12;DP=6      GT      0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     1|0     0|0     0|0

If I am not mistaken, everything to the right of FORMAT is a sample, and their genotypes are all listed below. I am not sure why I am receiving samples as I believe the samples are right there?

Thanks so much for your help.
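Dan's reading of the layout is right: a VCF header line has eight fixed columns (#CHROM through INFO), then FORMAT, then one column per sample, all separated by tabs. The short Python sketch below shows how a strict, tab-splitting reader would locate the samples, and why one malformed fixed-column name can make them vanish. It is an illustration that assumes the reader requires the first nine fields to match exactly. It is not plink's actual parsing code, and the function name is hypothetical.

```python
from typing import List

# The nine columns that precede the sample IDs in a VCF "#CHROM" header line.
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def sample_ids(header_line: str) -> List[str]:
    """Return the sample IDs from a VCF '#CHROM' header line.

    Assumption (illustration only): fields are split on tabs, and the first
    nine fields must match FIXED_COLUMNS exactly before samples are accepted.
    """
    fields = header_line.rstrip("\r\n").split("\t")
    if fields[:9] != FIXED_COLUMNS:
        # An unexpected field such as "INFO    " (trailing spaces) means the
        # sample columns cannot be located, so this reader reports none.
        return []
    return fields[9:]


good = "\t".join(FIXED_COLUMNS + ["1A_0", "1B_0", "1C_0"])
bad = good.replace("INFO\t", "INFO    \t")

print(sample_ids(good))  # ['1A_0', '1B_0', '1C_0']
print(sample_ids(bad))   # []
```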

Dan

Oct 18, 2017, 1:56:06 AM
to plink2-users
Sorry, I meant: I am not sure why I am receiving a "no samples" error as I believe the samples are right there.

Christopher Chang

Oct 19, 2017, 11:27:24 PM
to plink2-users
Hi,

Could you send me
(i) the first few lines of the VCF file, up to the second variant, and
(ii) the .log file from the plink run?

Thanks.
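When sharing the top of a VCF for debugging, copy-paste into a mail or web form can turn tabs into runs of spaces, which hides whitespace problems. A minimal Python sketch, assuming the file can be read line by line from the start: it collects lines up to and including the second variant line and prints each with repr(), so tabs appear as \t and stray spaces stand out. The function name and the truncated sample lines are hypothetical.

```python
from typing import Iterable, List


def head_through_variant(lines: Iterable[str], n_variants: int = 2) -> List[str]:
    """Collect lines from the start of a VCF up to and including the n-th variant line.

    Header lines start with '#'; blank lines are kept but not counted as variants.
    """
    out: List[str] = []
    seen = 0
    for line in lines:
        out.append(line)
        if line.strip() and not line.startswith("#"):
            seen += 1
            if seen == n_variants:
                break
    return out


# Truncated stand-in for a real file; with a real VCF you would pass an open
# file handle instead, e.g. open("test1.vcf") (hypothetical path).
sample = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO  \tFORMAT\t1A_0\n",
    "\n",
    "31\t20\t.\tC\tG\t20\tPASS\tNS=12;DP=6\tGT\t1|0\n",
    "61\t75\t.\tT\tC\t20\tPASS\tNS=12;DP=6\tGT\t1|1\n",
    "63\t71\t.\tC\tT\t20\tPASS\tNS=12;DP=6\tGT\t0|0\n",
]

for line in head_through_variant(sample):
    print(repr(line))  # repr() shows '\t' for tabs, so stray spaces are visible
```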

Dan

Oct 22, 2017, 6:57:28 PM
to plink2-users
Hi Chris,

Thanks for your help. I have attached the .log file, and here are the first few lines of the VCF file up to the second variant:


##fileformat=VCFv4.1
##fileDate=20171018
##source=pyRAD.v.3.0.66
##reference=common_allele_at_each_locus
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO        FORMAT    1A_0    1B_0    1C_0    1D_0    2E_0    2F_0    2G_0    2H_0    3I_0    3J_0    3K_0    3L_0

31    20    .    C    G    20    PASS    NS=12;DP=6    GT    1|0    0|0    0|0    0|0    0|0    0|0    0|0    0|0    0|0    0|0    0|0    0|0
61    75    .    T    C    20    PASS    NS=12;DP=6    GT    1|1    1|1    1|1    0|0    0|0    0|0    0|0    0|0    0|0    0|0    0|0    0|0

[attachment: test1.log]

Christopher Chang

Oct 22, 2017, 11:18:01 PM
to plink2-users
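The main issue is the extra spaces after "INFO" in the last header line (the one beginning with #CHROM). The fields in that line must be separated by single tabs, so those spaces must be deleted.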
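One way to make that edit without hand-editing is to rewrite only the #CHROM line so that every field is separated by exactly one tab. A minimal Python sketch, assuming no field (including sample IDs) contains embedded spaces, so any run of spaces and tabs can be treated as one separator. The function names and file paths are hypothetical, and all other lines are copied unchanged.

```python
import re


def normalize_header(line: str) -> str:
    """Rewrite a VCF '#CHROM' line so its fields are separated by single tabs.

    Assumption: no field contains embedded spaces, so any run of spaces and/or
    tabs is one separator. Trailing whitespace and the original line ending
    are dropped and replaced with a single newline.
    """
    fields = re.split(r"[ \t]+", line.strip())
    return "\t".join(fields) + "\n"


def fix_vcf_header(src_path: str, dst_path: str) -> None:
    """Copy a VCF, normalizing only the '#CHROM' line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith("#CHROM"):
                line = normalize_header(line)
            dst.write(line)


broken = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO    \tFORMAT\t1A_0\t1B_0\n"
print(repr(normalize_header(broken)))
# '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t1A_0\t1B_0\n'
```

Called as fix_vcf_header("test1.vcf", "test1.fixed.vcf") (hypothetical paths), the fixed copy can then be passed to plink --vcf in place of the original.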

You'll also need to decide how you want to translate the IDs. plink's default behavior is to treat '_' as a delimiter between the family ID ("FID") and the individual ID ("IID"), but that doesn't work here since an individual ID of '0' is prohibited. Two simple solutions are "--double-id" and "--const-fid 0"; "--double-id" sets both the FID and IID to the full ID, while "--const-fid 0" sets all FIDs to '0'.
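To make the three ID schemes concrete, here is a small Python illustration of the (FID, IID) pair each one would produce for sample IDs from this file. The helper functions are hypothetical and only mirror the behavior described above; the default split assumes each ID contains exactly one '_', as all the IDs in this thread do. It is not plink's code.

```python
from typing import Tuple


def default_split(vcf_id: str, delim: str = "_") -> Tuple[str, str]:
    """Default behavior described above: '_' separates FID from IID.

    Assumes exactly one delimiter in the ID, as in '1A_0'.
    """
    fid, _, iid = vcf_id.partition(delim)
    return fid, iid


def double_id(vcf_id: str) -> Tuple[str, str]:
    """--double-id: both FID and IID are set to the full VCF sample ID."""
    return vcf_id, vcf_id


def const_fid(vcf_id: str, fid: str = "0") -> Tuple[str, str]:
    """--const-fid 0: every FID is '0' and the IID is the full VCF sample ID."""
    return fid, vcf_id


for sid in ["1A_0", "2E_0", "3L_0"]:
    fid, iid = default_split(sid)
    note = "  (IID '0' is prohibited)" if iid == "0" else ""
    print(f"{sid}: default={(fid, iid)}{note} double-id={double_id(sid)} const-fid={const_fid(sid)}")
```

With this data, either flag would simply be added to the original command, for example plink --vcf ~/filtering/test1/outfiles/test1.vcf --double-id --make-bed --out ~/filtering/test1/outfiles/test1.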

Dan

Oct 23, 2017, 1:21:40 AM
to plink2-users
Thank you so much!! I am no longer receiving that error.

I now get an error saying: "Error: invalid chromosome code '31' on line 14 of .vcf file. (This is disallowed for humans. Check if the problem is with your data, or if you forgot to define a different chromosome set with e.g. --chr-set.)"

However, I saw a fix online suggesting that I rename all of the chromosome codes to scaffold names (e.g. change 31 to scaffold_31) and then use the --allow-extra-chr option. When I do this, I no longer get any errors. Is this the proper way to address this error?

Thanks so much.
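The renaming Dan describes can be scripted. Below is a minimal Python sketch that prefixes the CHROM field of every data line (so 31 becomes scaffold_31) and leaves header and blank lines alone. The "scaffold_" prefix comes from the fix Dan found; the function names are hypothetical, and the sketch assumes tab-delimited data lines. The thread does not confirm whether this is the preferred approach; the error message itself also points to --chr-set as an alternative.

```python
def prefix_chrom(line: str, prefix: str = "scaffold_") -> str:
    """Prefix the CHROM field of one VCF data line.

    Header lines (starting with '#') and blank lines are returned unchanged.
    Assumes data-line fields are tab-delimited.
    """
    if line.startswith("#") or not line.strip():
        return line
    chrom, sep, rest = line.partition("\t")
    return prefix + chrom + sep + rest


def prefix_vcf_chroms(src_path: str, dst_path: str, prefix: str = "scaffold_") -> None:
    """Copy a VCF, prefixing every data line's CHROM field."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(prefix_chrom(line, prefix))


print(prefix_chrom("31\t20\t.\tC\tG\t20\tPASS\tNS=12;DP=6\tGT\t1|0\n"), end="")
```

The rewritten file would then be loaded with --allow-extra-chr added to the plink command, as in the fix Dan describes.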