Beagle 5.2 error while reading VCF files from PLINK


AritraB

Oct 21, 2021, 2:05:56 PM
to plink2-users
Hi,

I am running into an issue during imputation with Beagle 5 and am not sure what is causing the error. I have VCF files converted from PLINK with the following command:

./plink --bfile qcd_in --chr 20 --recode vcf-iid --out qcd_chr20

This creates a VCF file, qcd_chr20.vcf, which I am using as input to Beagle 5 for imputation. The command I am running is as follows:

java -Xmx20g -jar beagle.28Jun21.220.jar \
  gt=qcd_chr20.vcf \
  ref=phase3.chr20.GRCh38.GT.crossmap.vcf.gz \
  map=plink.chr20.GRCh38.map \
  chrom=20 \
  out=imputed_qcd_chr20

The error I get is the following:

java.lang.IllegalArgumentException: No VCF records found in the specified interval. Check chromosome identifier and interval: 20
    at vcf.IntervalVcfIt.<init>(IntervalVcfIt.java:55)
    at main.Main.lambda$refSupplier$1(Main.java:292)
    at vcf.WindowIt$Reader.run(WindowIt.java:243)
    at java.lang.Thread.run(Thread.java:745)

The same error occurs for every chromosome. Can anybody help resolve this issue? Am I missing something when recoding PLINK files to VCF?

P.S.: This is cross-posted from Biostars (https://www.biostars.org/p/9494309/#9494315)
P.P.S.: I tried the above with PLINK2 pgen files as well but ran into the same issue.

I'd appreciate any help with this.

Regards,
Aritra

Christopher Chang

Oct 21, 2021, 9:18:13 PM
to plink2-users
What does the first non-header line of the VCF (i.e. after the #CHROM POS ... line) start with?
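
One way to pull out that line is sketched below with standard Unix tools. The helper name `first_record` is made up for illustration, and the output is cut to the first 10 tab-separated columns only to keep long sample lists off the screen.

```shell
# Hypothetical helper (not part of PLINK or Beagle): print the first data
# record of a VCF, i.e. the first line that does not start with '#'.
# 'gzip -cdf' decompresses .vcf.gz input and passes plain text through as-is.
first_record() {
    gzip -cdf "$1" | grep -v '^#' | head -n 1 | cut -f1-10
}

# Example with the file name from this thread:
#   first_record qcd_chr20.vcf
```

The first column of the printed line is the chromosome label that Beagle compares against its chrom= argument.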

AritraB

Oct 21, 2021, 9:32:50 PM
to plink2-users
Hi Chris,

This is the first non-header line of the VCF (after #CHROM POS):

20      62731   rs34147676      C       A       .       .       PR      GT      0/0     0/0    0/1     0/0     0/0     0/0     0/1     0/0 ..... and so on.

These are the header lines for your reference, in case you need them.
##fileformat=VCFv4.2
##fileDate=20211018
##source=PLINKv1.90
##contig=<ID=20,length=62897890>
##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT .... followed by sample IDs.

Thanks

Christopher Chang

Oct 21, 2021, 11:54:24 PM
to plink2-users
Okay, my best guess is that the other VCF input to Beagle (the ref= file) has "chr20" instead of "20" in the #CHROM column, and this is causing the problem. If that's true, you can fix it by adding "--output-chr chrM" when exporting the VCF with plink.
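
A minimal sketch for checking this guess before re-exporting: compare the chromosome label of the first record in each VCF. The helper names `chrom_label` and `same_chrom_label` are hypothetical, and the sketch assumes each file holds a single chromosome, as in this thread.

```shell
# Hypothetical helper: print the chromosome label (#CHROM column) of the
# first data record in a plain or gzip-compressed VCF.
chrom_label() {
    gzip -cdf "$1" | awk '!/^#/ { print $1; exit }'
}

# Hypothetical helper: succeed only if both VCFs use the same label.
same_chrom_label() {
    [ "$(chrom_label "$1")" = "$(chrom_label "$2")" ]
}

# Example with the file names from this thread:
#   same_chrom_label qcd_chr20.vcf phase3.chr20.GRCh38.GT.crossmap.vcf.gz \
#       || echo "chromosome labels differ"
# If the reference uses "chr20", the export command would become:
#   ./plink --bfile qcd_in --chr 20 --recode vcf-iid --output-chr chrM --out qcd_chr20
```

After such a re-export, the chrom= argument passed to Beagle would presumably need to read chr20 as well, and the chromosome column of the map= file is worth checking for the same naming.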

AritraB

Oct 22, 2021, 7:59:18 PM
to plink2-users
Thank you Chris! The reference file indeed had chr20 instead of 20.