plink2 --export vcf --ref-from-fa


J. Rodrigo Flores

Apr 16, 2018, 11:03:38 AM
to plink2-users
Hello,

I have not been able to convert a set of PLINK files to VCF format using plink2 with the --ref-from-fa option. As I understand it, this option should read the reference allele at each position from a FASTA file and write it as REF, with the other allele (whether major or minor) encoded as ALT.
Here is my command line:

~/plink2 --bfile ~/test_example_1.5732473_sample1 --export vcf bgz --ref-from-fa ~/Homo_sapiens_assembly19.fasta -out ~/test_example_1.5732473_sample1_VCF

For simplicity I extracted 1 sample and one problematic position from a larger data set.
It all seems to work, with the following messages in the log file:

1 sample (1 female, 0 males; 1 founder) loaded from
~/test_example_1.5732473_sample1.fam.
1 variant loaded from
~/test_example_1.5732473_sample1.bim.
1 binary phenotype loaded (0 cases, 1 control).
--ref-from-fa: 0 variants changed, 0 validated.
--export vcf bgz to test_example_1.5732473_sample1.vcf.gz ... done.

...but bcftools then reports a reference mismatch:

bcftools norm --check-ref e --fasta-ref /ebc_data/seqdata/human_reference/Homo_sapiens_assembly19.fasta test_example_1.5732473_sample1.vcf.gz
Reference allele mismatch at 1:5732473 .. REF_SEQ:'G' vs VCF:'C'

I am using PLINK v2.00a1LM 64-bit Intel (11 Feb 2018)
My test files are attached.
Am I forgetting an option, or unaware of one that I should be using?
Thanks a lot!
Rodrigo Flores
test_example_1.5732473_sample1.bed
test_example_1.5732473_sample1.bim
test_example_1.5732473_sample1.fam

Christopher Chang

Apr 16, 2018, 11:30:36 AM
to plink2-users
If your alleles aren't consistently on the correct strand, --ref-from-fa won't fix that for you; you'll need to identify the flipped regions and use plink 1.x's --flip flag.  Your example is a C/T SNP where the reference allele is G; this is probably a strand issue.

(I do plan to add an easier-to-use version of --flip to plink 2.0, though it may be a while before it's available.)
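
To illustrate the strand check described above, here is a minimal Python sketch that compares a SNP's two alleles with the reference base at its position. The function name `classify_snp` and its return labels are hypothetical and not part of plink or bcftools; the sketch assumes a biallelic SNP with A/C/G/T alleles and that the reference base has already been looked up in the FASTA file.

```python
# Hypothetical helper, not part of plink or bcftools.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_snp(ref_base, allele1, allele2):
    """Classify a biallelic SNP against the reference base at its position.

    Returns one of:
      "unsupported" - an allele is not a single A/C/G/T base
      "mismatch"    - reference base matches neither strand
      "ambiguous"   - A/T or C/G SNP; strand cannot be inferred from the alleles
      "forward"     - one allele equals the reference base
      "flipped"     - the complement of one allele equals the reference base
    """
    ref_base = ref_base.upper()
    alleles = {allele1.upper(), allele2.upper()}
    if len(alleles) != 2 or not alleles <= COMPLEMENT.keys():
        return "unsupported"
    flipped = {COMPLEMENT[a] for a in alleles}
    if ref_base not in alleles | flipped:
        return "mismatch"
    if alleles == flipped:
        return "ambiguous"
    if ref_base in alleles:
        return "forward"
    return "flipped"


# The variant from this thread: C/T in the .bim, G in the reference.
print(classify_snp("G", "C", "T"))
```

Variants classified as "flipped" would be the candidates for plink 1.x's --flip; the "ambiguous" ones cannot be resolved from the alleles alone.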

J. Rodrigo Flores

Apr 17, 2018, 5:03:11 AM
to plink2-users
Thanks a lot,

I have very little experience with the plink format and cannot picture right now how I would identify all the SNPs I would need to flip. Based on a few tests in which I simply skipped the problematic SNPs, there are many of them.
For the reference of other users: so far I am getting around the issue by running bcftools norm --check-ref s, which does the switching, after converting with plink2.
I'll keep the allele flipping issue in mind.
Thanks a lot again!
Rodrigo

Christopher Chang

Apr 17, 2018, 10:15:34 AM
to plink2-users
With that approach, I recommend excluding all A/T and C/G SNPs; bcftools will not be able to determine which way they should go.
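
One possible way to build that exclusion list is sketched below in Python. It assumes the standard six-column .bim layout (chromosome, variant ID, cM position, bp coordinate, allele 1, allele 2); the function name `ambiguous_ids` is hypothetical.

```python
# Hypothetical helper: list the IDs of A/T and C/G SNPs from .bim lines.
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


def ambiguous_ids(bim_lines):
    """Return variant IDs whose two alleles form an A/T or C/G pair."""
    ids = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        variant_id, a1, a2 = fields[1], fields[4], fields[5]
        if frozenset((a1.upper(), a2.upper())) in AMBIGUOUS_PAIRS:
            ids.append(variant_id)
    return ids


# Made-up example records, for illustration only.
example = [
    "1 rs1 0 100 A T",
    "1 rs2 0 200 C T",
    "2 rs3 0 300 G C",
]
print(ambiguous_ids(example))
```

The resulting IDs could be written one per line to a text file and passed to plink's --exclude before the VCF export.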