About generating imputation reference data

wei

Jan 7, 2024, 12:18:52 PM

to PrediXcan/MetaXcan

Hello everyone, I am attempting to generate reference data for harmonization and imputation of EAS samples using 1000 Genomes.

My steps are as follows:

1. Utilizing 1000G_hg38_eur_conversion.sh to extract the VCF for EAS
2. Employing 1000G_model_training_to_parquet.sh to convert the VCF to Parquet.
3. Assembling the Parquet files into variant metadata.

However, I encountered a problem in the second step as I am unsure about the format of the annotation file.

I attempted to use the GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze.lookup_table.txt.gz mentioned in the tutorial. However, this file lacks allele frequency, which differs from what the tutorial describes.

Has anyone successfully created their own metadata? How can I resolve this issue?

Any feedback would be greatly appreciated. Thank you.

wei

Jan 8, 2024, 3:19:57 AM

to PrediXcan/MetaXcan

I have already resolved my problem.

I found that I need to run test_1000G_to_model_training.sh, which uses predixcan_format_to_model_training.py, before step 2.

Running test_1000G_to_model_training.sh generates two files: a genotype file and an annotation file.

On Monday, January 8, 2024 at 1:18:52 AM [UTC+8], wei wrote:

wei

Jan 13, 2024, 12:13:31 PM

to PrediXcan/MetaXcan

I apologize for the frequent messages.

I would like to verify whether using my own EAS metadata gives more accurate imputation than using the EUR metadata.

I intend to run a simulation and evaluate metrics such as raw bias, percentage bias, and coverage rate.
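
To make the evaluation concrete, here is a rough sketch of how I might compute these three metrics. It assumes each simulation replicate yields a point estimate and a standard error for a single quantity whose true value is known, and it uses the common simulation-study definitions (raw bias = mean estimate minus truth, percentage bias = 100 * |raw bias / truth|, coverage rate = share of confidence intervals containing the truth). The function name `simulation_metrics` and the normal-approximation interval with z = 1.96 are my own assumptions, not part of any MetaXcan tool.

```python
from statistics import mean


def simulation_metrics(estimates, std_errors, true_value, z=1.96):
    """Summarize simulation replicates against a known true value.

    Hypothetical helper: assumes one estimate and one standard error
    per replicate, and a normal-approximation confidence interval.
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need equal-length, non-empty estimates and std_errors")
    if true_value == 0:
        raise ValueError("percentage bias is undefined when true_value is 0")

    # Raw bias: average estimate minus the true value.
    raw_bias = mean(estimates) - true_value

    # Percentage bias: raw bias relative to the true value, in percent.
    percent_bias = 100.0 * abs(raw_bias / true_value)

    # Coverage rate: fraction of intervals that contain the true value.
    covered = [
        (est - z * se) <= true_value <= (est + z * se)
        for est, se in zip(estimates, std_errors)
    ]
    coverage_rate = sum(covered) / len(covered)

    return {
        "raw_bias": raw_bias,
        "percent_bias": percent_bias,
        "coverage_rate": coverage_rate,
    }


if __name__ == "__main__":
    # Toy numbers for illustration only, not real imputation results.
    metrics = simulation_metrics(
        estimates=[0.9, 1.1, 1.3, 0.7],
        std_errors=[0.1, 0.1, 0.1, 0.1],
        true_value=1.0,
    )
    print(metrics)
```

The idea would be to run this once with the EAS metadata and once with the EUR metadata on the same simulated data, then compare the two sets of metrics.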