The expected dosage format?

1,156 views
Skip to first unread message

Nicolas Lau

unread,
Apr 7, 2021, 5:59:17 AM4/7/21
to plink2-users
Hi Christopher,

When I tried to use --glm, I got an error saying:
    Error: Line 45 of --vcf file has an invalid DS field.
which happens to be the first line of the variants in a vcf file merged by bcftool.

I am curious maybe is it because there's a requirement of the dosage format to be used by PLINK2? Since I used dosage=HDS and vcf dosage=HDS all the time when converting the format back and forth between vcf and pgen files, and now the input I used have a dosage format as GT:DS:HDS, is it because I am using the wrong dosage fields for analysis and GP should be kept as input? 

I have also attached the line 45 in case you want to take a look (it's a bit long)

Thanks!
all_merged_vcf_line_45.txt

Christopher Chang

unread,
Apr 7, 2021, 11:36:44 AM4/7/21
to plink2-users
The issue is entries like "0|0::", which I would have expected to be formatted as "0|0:.:." or just "0|0".  The VCF specification does not make it clear that this is allowed.  However, bcftools is the reference implementation, so if its developers consider this to be allowed, it is reasonable to expect that the specification will be clarified accordingly in the future.  I'll double-check this, and probably support this in the next plink2 build.

Nicolas Lau

unread,
Apr 7, 2021, 12:09:43 PM4/7/21
to plink2-users
Hi Christopher,

Thank you for pointing out where the problem is!

FYI, I have explored and noticed that the dosage formats become inconstant after reading vcf with dosages using PLINK2 and then export to vcf. The command is just like:
    plink2 --vcf file.vcf dosage=HDS --export vcf vcf-dosage=HDS --out output_name

my original dosage vcf files have a same format like GT:DS:HDS:GP    0|0:0:0,0:1,0,0   After converting, both 0|0 and .|. formats exist in the new vcf file, and there're also some with the problem you just pointed out, which I am not sure how it happened, since I have seen in another post that you mentioned that all genotypes should be either "0|0" or ".|." Is this expected? 

Thanks!

Christopher Chang

unread,
Apr 7, 2021, 12:12:04 PM4/7/21
to plink2-users
"0|0" and ".|." are both fine; the first indicates a homozygous-REF genotype, and the second indicates a missing genotype.

Christopher Chang

unread,
Apr 7, 2021, 12:13:57 PM4/7/21
to plink2-users
It sounds like the unexpected empty strings are coming from the bcftools merge process; please clarify if that is not the case.

Nicolas Lau

unread,
Apr 7, 2021, 12:24:45 PM4/7/21
to plink2-users
Sorry, actually I think it's probably not the case, I am now trying to use bcftool to directly concatenate the vcf files of the chromosomes of individual cohort and then merging those from different cohorts, and the formats of dosage are preserved well.

As for the empty strings, I did it in a different process, I first converted the chromosome dosage vcf files into pgen files, and used --pmerge-list on the chromosomes, then converted the merged pgen files into vcf files and merged all cohorts with bcftool. I have just checked that after the process highlighted in green, the empty strings appeared in that process.

Christopher Chang

unread,
Apr 7, 2021, 12:26:57 PM4/7/21
to plink2-users
The part of that sentence you should be highlighting is "merged all cohorts with bcftools".  Look at the VCF files before and after that merge.

Nicolas Lau

unread,
Apr 7, 2021, 12:50:25 PM4/7/21
to plink2-users
Hi Christopher,

I highlighted that part because in my case the empty strings appeared the first time I load original VCF file in PLINK2, either I export it into a new VCF or pgen files.
And I have just checked, the empty strings are also in the VCF file before the all cohorts merging step.

Christopher Chang

unread,
Apr 7, 2021, 12:54:28 PM4/7/21
to plink2-users
Can you provide a set of files I can use to replicate this, then?

Christopher Chang

unread,
Apr 7, 2021, 1:13:24 PM4/7/21
to plink2-users
To clarify, you are reporting that "--export vcf vcf-dosage=HDS" can generate empty DS/HDS strings, given your merged fileset.  I need a subset of your merged fileset (could be as small as a single variant and a single sample) for which this is true, since I am unable to replicate what you're seeing when I try to merge e.g. 1000 Genomes phase 1 dosage data.

Nicolas Lau

unread,
Apr 7, 2021, 1:17:15 PM4/7/21
to plink2-users
yes, I believe so, I am working on generating a subset of my files and replicating the issue, I will return once I have the them.
Thank you.

Nicolas Lau

unread,
Apr 7, 2021, 1:42:59 PM4/7/21
to plink2-users
Hi Christopher,

Here attached are the files, The steps I was using are:

plink2 --pmerge-list mergelist.txt pfile --out test_merged
plink2 --pfile test_merged --export bcf vcf-dosage=HDS --out test_merged

If I convert test_merged.bcf to test_merged.vcf, then I can see the empty strings.

Sorry that I was mistaken in one detail, I just confirmed that the issue happened when --export to bcf, if --export to vcf then it looks fine.
在2021年4月7日星期三 UTC+2 下午7:13:24<chrch...@gmail.com> 写道:
subset_file.zip

Christopher Chang

unread,
Apr 7, 2021, 1:45:26 PM4/7/21
to plink2-users
Well, then we're back to this being a bcftools issue in how missing fields in the BCF are rendered.

In any case, if this is allowed and just not yet in the specification, I will add support for it in plink2.

Nicolas Lau

unread,
Apr 7, 2021, 1:52:53 PM4/7/21
to plink2-users
Yes, apologies that I have given you a misleading report which could have wasted your time.

Much appreciated for your help and this has been extremely helpful!

salim megat

unread,
Apr 8, 2021, 12:32:00 PM4/8/21
to plink2-users
Hi everyone, 

I was reading stuff online cause I got stucked on the same issue !!! I get an error message "Error: Line 4575 of --vcf file has an invalid DS field". 
I can confirm that the issue happens after merging cohorts with bcftools merge as I can convert back pre-merge vcf to plink just fine. 
In my case, I had to split multiallelelic variants  first and then got the malfomred DS error. Any idea what could be the issue with the vcf ? Do you think it's possible to edit the vcf it manually ? 

Many thanks for your help ! 

Salim. 
sampled.vcf

salim megat

unread,
Apr 8, 2021, 12:32:57 PM4/8/21
to plink2-users
Hi everyone, 

I was reading stuff online cause I got stucked on the same issue !!! I get an error message "Error: Line 4575 of --vcf file has an invalid DS field". 
I can confirm that the issue happens after merging cohorts with bcftools merge as I can convert back pre-merge vcf to plink just fine. 
In my case, I had to split multiallelelic variants  first and then got the malfomred DS error. Any idea what could be the issue with the vcf ? Do you think it's possible to edit the vcf it manually ? 

Many thanks for your help ! 

Salim. 



Le mercredi 7 avril 2021 à 19:52:53 UTC+2, Nicolas Lau a écrit :
sampled.vcf

Christopher Chang

unread,
Apr 8, 2021, 12:33:51 PM4/8/21
to plink2-users
The best current workaround may be to use bcftools to convert to BCF, which plink2 should be able to read properly.

salim megat

unread,
Apr 8, 2021, 12:36:59 PM4/8/21
to plink2-users
thank you for your fast answer ! 
Do you mean converting the merged vcf to bcf or converting to bcf before merging the files ? 

Thanks for this great tool that is awesome ! 

Salim. 

Christopher Chang

unread,
Apr 8, 2021, 12:40:07 PM4/8/21
to plink2-users
Either would work.  Since you already have the merged vcf, just convert that to bcf; but when doing similar things in the future, you'll want to use the bcf format as much as possible with bcftools since it's more efficient.

Nicolas Lau

unread,
Apr 8, 2021, 1:16:17 PM4/8/21
to plink2-users
Hi Salim,

I have tested on my files and can confirm you that exporting theto bcf format will cause this problem, and I have thereby merged and exported everything in vcf.gz format and now the problem is solved and I can have --glm running.

Christopher Chang

unread,
Apr 8, 2021, 1:19:12 PM4/8/21
to plink2-users
Nicolas, I think you're misunderstanding my recommendation.  I'm saying to use the bcf format with "plink2 --bcf".

yuqin gu

unread,
Jun 25, 2022, 4:46:03 AM6/25/22
to plink2-users
Hi Christopher,
when I tried to use --make-pgen,  I got an error saying:
Error: Line 2005 of --vcf file has an invalid DS field.
Then I used bcftools to convert vcf to bcf,  it's so sorry that I got an error message:
Error: Variant record #1939 of --bcf file has an invalid DS field.
Do you have any advice?
I'm appreciated for your help!
2791656146523_.pic.jpg

Christopher Chang

unread,
Jun 25, 2022, 9:52:32 AM6/25/22
to plink2-users
You were given an exact line number to look at in the VCF file.  Maybe you should see what's going on with the DS field there?

aheritas

unread,
Sep 23, 2022, 2:20:55 PM9/23/22
to plink2-users
Hi, I have run the following command:

  plink2 \
      --threads 2 \
      --memory 16384 \
      --set-all-var-ids '@:#:$r:$a' \
      --new-id-max-allele-len 100 missing \
      --vcf my_imputed_vcf.vcf.gz dosage=DS \
      --make-pgen \
      --out vcf_my_imputed_vcf


and I have encounter the following error:

Error: Line 22110290 of --vcf file has an invalid DS field.

I have checked the content of the vcf file in that line (zcat my_imputed_vcf.vcf.gz | sed -n '22110290,22110291p')


4    84906237    rs2868841    A    G    .    PASS    RAF=0.978035;AF=1.0005;BUF=0    GT:DS:GP:RC:AC:DP    1/1:2.001:0,0.001,1:0:2:2
4    84906257    rs561160477    C    T    .    PASS    RAF=0.000399361;AF=0;BUF=0    GT:DS:GP:RC:AC:DP    0/0:0:1,0,0:3:0:3



where I understand that the problem is that the DS field has a value of 2.001 and it should be less than 2? I couldn't find documentation that this is the problem but this is what I imagine that is happening.

I would appreciate your suggestions on how to troubleshoot this issue. If you need any extra information please let me know.

I have no control over the imputation method or parameters used, but I am attaching a screenshot of the VCF headers in case this is useful for debugging as well.

vcf_head_imputed.png



Christopher Chang

unread,
Sep 23, 2022, 2:28:35 PM9/23/22
to plink2-users
Correct, plink2 will only accept DS values in [0, 2].  You'll need to either use a fixed version of the upstream tool generating the nonsensical dosage, or write a short script to clamp such values to be in range before calling plink2.
Reply all
Reply to author
Forward
0 new messages