Assistance Needed with Converting VCF File for BOLT-LMM Usage


syednaje...@gmail.com

Jul 23, 2023, 6:44:27 AM
to plink2-users
Dear PLINK Community,

I hope this message finds you well. I have encountered a challenge while attempting to convert a VCF file into a format suitable for association analyses using BOLT-LMM. Despite several days of effort, I have been unable to find a solution.

I want to convert my VCF file into the "imputed SNPs in 2-dosage format" input described in section 5.1.2 of the BOLT-LMM manual, which says this format can be produced with plink2 --dosage format=2.

So far, I have used the following command to create a PLINK binary dataset:

plink2 --vcf mergedTemp.vcf dosage=DS --make-bed --out mitoDosage

My next step is to produce a file in the --dosage format=2 layout, and this is where I am stuck.

Could anyone please advise how to generate a file in this format? I have tried several alternatives, but none has produced the desired result.

Any help would be greatly appreciated.

Best regards,
Najeeb

syednaje...@gmail.com

Jul 23, 2023, 6:56:50 AM
to plink2-users
I have also cross-posted this question at:

https://www.biostars.org/p/9570180/

Christopher Chang

Jul 23, 2023, 12:15:16 PM
to plink2-users
That section of the BOLT-LMM manual was written for plink 1.9 rather than 2.0; at the time, dosage support was not well integrated with the rest of plink.

With plink 2.0, you should target the BGEN v1.2 file format; the following command should work:

  plink2 --vcf mergedTemp.vcf dosage=DS --export bgen-1.2 bits=8 --out mitoDosage

If you also want to keep a plink2-formatted fileset, add --make-pgen to the same command. Do NOT use --make-bed here: the .bed format cannot store dosages.

Syed Najeeb Ashraf

Jul 24, 2023, 4:12:16 AM
to Christopher Chang, plink2-users
Dear Christopher,

Thanks for the reply.

I want to prepare my file specifically for the following section of the
BOLT-LMM manual. I have a VCF file in which the DS field holds the
dosage for each sample.

Imputed SNPs in 2-dosage format. You may also specify imputed SNPs as
output by the Ricopili pipeline and plink2 --dosage format=2. This
file format consists of file pairs: (1) PLINK map files containing
information about SNP locations; and (2) genotype probability files in
the 2-dosage format, which consists of a header line

SNP A1 A2 [FID IID] x N

followed by one line per SNP in the format

rsID allele1 allele0 [p(11) p(10)] x N

The third genotype probability for each entry is assumed to be
p(00)=1-p(11)-p(10) (unlike with the IMPUTE2 format).
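To make the 2-dosage layout concrete, here is a minimal sketch in POSIX shell plus awk that reads a made-up 2-dosage file with two samples; the sample IDs, rsID, alleles and probabilities are all hypothetical, as is the helper name summarize_dosage2. For each sample it takes the stored p(11) and p(10), derives the implied p(00) = 1 - p(11) - p(10), and reports the expected allele1 count 2*p(11) + p(10). Note that the reverse direction is not unique: a single dosage value such as DS fixes only 2*p(11) + p(10), not the two probabilities separately.

```shell
#!/bin/sh
# Hypothetical 2-dosage file: header "SNP A1 A2" then FID IID per sample,
# followed by one row per SNP: rsID allele1 allele0, then p(11) p(10) per sample.
cat > example.dosage2 <<'EOF'
SNP A1 A2 fam1 id1 fam2 id2
rs0001 A G 0.90 0.08 0.05 0.30
EOF

# Hypothetical helper: for every SNP row, walk the (p11, p10) pairs that start
# at column 4 and print the implied p(00) and the expected allele1 count.
summarize_dosage2() {
  awk 'NR > 1 {
    for (i = 4; i < NF; i += 2) {
      p11 = $i; p10 = $(i + 1)
      p00 = 1 - p11 - p10        # third probability is implied, not stored
      dosage = 2 * p11 + p10     # expected number of allele1 copies
      printf "%s sample%d p00=%.2f dosage=%.2f\n", $1, (i - 2) / 2, p00, dosage
    }
  }' "$1"
}

summarize_dosage2 example.dosage2
```

With the sample values above this should print `rs0001 sample1 p00=0.02 dosage=1.88` and `rs0001 sample2 p00=0.65 dosage=0.40`.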

To compute association statistics at SNPs in a list of 2-dosage files,
you may list the files within a --dosage2FileList file. Each line of
this file should contain two entries: a PLINK map file followed by the
corresponding genotype file containing probabilities for those SNPs.
(As usual, if either file ends with .gz, it is automatically unzipped;
otherwise it is assumed to be plain text.) See the example/
subdirectory for an example.
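As a sketch of the list file that --dosage2FileList expects, the loop below writes one line per chunk, each holding a PLINK map file followed by its 2-dosage genotype file. The per-chromosome file names and the output name dosage2_files.txt are hypothetical, and separating the two entries with a single space is an assumption, so check the example/ subdirectory of the BOLT-LMM distribution for the exact layout.

```shell
#!/bin/sh
# Hypothetical chunks: chr1.map + chr1.dosage2.gz, chr2.map + chr2.dosage2.gz.
# Each output line: PLINK map file, then the matching 2-dosage genotype file
# (assumed whitespace-separated).
: > dosage2_files.txt
for chr in 1 2; do
  printf '%s %s\n' "chr${chr}.map" "chr${chr}.dosage2.gz" >> dosage2_files.txt
done
cat dosage2_files.txt
```

The resulting file would then be passed to BOLT-LMM with --dosage2FileList; per the manual text above, the .gz genotype files are unzipped automatically.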



--
Dr. Najeeb Ashraf Syed, Ph.D., M.tech, M.Sc (Bioinformatics)

Chris Chang

Jul 24, 2023, 9:45:17 AM
to Syed Najeeb Ashraf, plink2-users
Please reread my response.