Having a strange file read failure.
--bfile /godot/1000genomes/1000GP_Phase3/ALL.chr6.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes \
--snps rs9273445 rs11568830 rs11568831 rs28746851 rs9271685 rs9271689 rs9272416 rs9274407 rs9274529 rs9274654 rs9274655 rs9274656 rs9274660 rs9271488 rs9271640 rs9271720
Works fine
--bfile /godot/1000genomes/1000GP_Phase3/ALL.chr6.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes \
--snps rs9273445 rs9272346 rs1063355 rs28383344 rs2858864 rs3828800 rs532098 rs674313 rs9271408 rs9271488 rs9271640 rs9271720 rs9271775 rs9274407
Fails with:
(C) 2005-2017 Shaun Purcell, Christopher Chang GNU General Public License v3
--bfile /godot/1000genomes/1000GP_Phase3/ALL.chr6.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes
--snps rs9273445 rs9272346 rs1063355 rs28383344 rs2858864 rs3828800 rs532098 rs674313 rs9271408 rs9271488 rs9271640 rs9271720 rs9271775 rs9274407
32119 MB RAM detected; reserving 16059 MB for main workspace.
Allocated 1606 MB successfully, after larger attempt(s) failed.
14 out of 4997829 variants loaded from .bim file.
2504 people (0 males, 0 females, 2504 ambiguous) loaded from .fam.
Ambiguous sex IDs written to plink.nosex .
Using up to 8 threads (change this with --threads).
Before main variant filters, 2504 founders and 0 nonfounders present.
Calculating allele frequencies... done.
14 variants and 2504 people pass filters and QC.
Note: No phenotypes present.
--r2 in-phase dprime... 0% [processing]
Error: File read failure.
Removing the last three snps in --snps seems to fix the error, but it isn't just a number of snps issue, because the top one works fine. Also, reshuffling the snps in the non-working version doesn't change anything.
The file is the 1000genomes chromosome 6 file. I only ever get this error with this file, the other chromosome files work fine. However, I expect more snps on chromosome 6 because of the HLA, so it might just be a numbers thing.
The bfile is built with:
i=ALL.chr6.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz;
j=$(echo $i | sed 's/.vcf.gz//');
vftools --gzvcf $i --plink-tped --out $j &&
plink --tped $j.tped --tfam $j.tfam --out $j
I tried doing that a few times, it doesn't fix the issue. The VCF is straight of the EBI server.
Not sure how to go about debugging this. Any suggestions?
Thanks!