Model training. Convert VCF file to PrediXcan genotype dosage format


Anh Do

Jul 12, 2023, 6:07:50 PM
to PrediXcan/MetaXcan
Hi all,

Do you know if there is a PrediXcan script to convert a VCF file into the PrediXcan genotype dosage format for model training?
The PrediXcan genotype dosage described at 
with variants in rows and samples in columns.
Are we supposed to use filter_and_convert()? It doesn't work for me; I get the error messages below.
"[W::bcf_sr_add_reader] No BGZF EOF marker; file '*.vcf.gz' may be truncated
[W::bcf_hdr_check_sanity] GL should be declared as Number=G
[W::vcf_parse] Contig '1' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse_format] FORMAT 'GT' is not defined in the header, assuming Type=String
Undefined tags in the header, cannot proceed in the sample subset mode.
[W::bcf_hdr_check_sanity] GL should be declared as Number=G"

Thanks very much.

Anh

Festus

Jul 17, 2023, 10:32:14 AM
to PrediXcan/MetaXcan
Hi Anh,

You can use this approach.

VCF > Plink BED > Plink Text format.
Running plink with --recode A-transpose generates an additive (allele-count) file, which you can load into R together with the plink .bim file and reshape into the format required to train the model.
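
The reshaping step can also be done outside R. Below is a minimal Python sketch, not part of PrediXcan, under these assumptions: the file comes from plink 1.9 (for example `plink --vcf input.vcf.gz --make-bed --out geno`, then `plink --bfile geno --recode A-transpose --out geno`, where `input.vcf.gz` and `geno` are placeholder names), the resulting .traw file is tab-delimited with six leading columns (CHR, SNP, (C)M, POS, COUNTED, ALT) followed by one column per sample, and the training input wants one variant per row with a leading ID column. The function name `traw_to_dosage` and the `varID` header are illustrative choices; check the exact columns against the format description before training.

```python
import csv
import io


def traw_to_dosage(traw_lines, out):
    """Reshape plink .traw rows into a variant-by-sample dosage table.

    traw_lines: iterable of text lines (an open file or io.StringIO).
    out: writable text stream that receives the tab-delimited result.
    """
    reader = csv.reader(traw_lines, delimiter="\t")
    header = next(reader)
    # Assumed plink 1.9 .traw layout: CHR, SNP, (C)M, POS, COUNTED, ALT,
    # then one column per sample (named FID_IID by plink).
    n_meta = 6
    samples = header[n_meta:]
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # "varID" is a hypothetical header name for the variant ID column.
    writer.writerow(["varID"] + samples)
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Values are counts of the COUNTED allele (0, 1, 2, or NA),
        # copied through unchanged; row[1] is the SNP identifier.
        writer.writerow([row[1]] + row[n_meta:])


# Small in-memory demonstration with made-up data.
demo = (
    "CHR\tSNP\t(C)M\tPOS\tCOUNTED\tALT\tF1_S1\tF2_S2\n"
    "1\trs1\t0\t100\tA\tG\t0\t2\n"
)
buf = io.StringIO()
traw_to_dosage(io.StringIO(demo), buf)
print(buf.getvalue(), end="")
```

For real data, pass open file handles instead of the in-memory strings. Note that the values count the allele in the COUNTED column, so it is worth confirming that this matches the effect allele the training pipeline expects.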

Regards,

Festus

Festus

Jul 17, 2023, 10:39:42 AM
to PrediXcan/MetaXcan
filter_and_convert() is a custom function defined here, which you can use too.
[Attachment: Screenshot 2023-07-17 at 9.38.26 AM.png]

Anh Do

Jul 18, 2023, 2:11:25 PM
to PrediXcan/MetaXcan
Thanks, Festus. The Plink approach works nicely.

Anh