About generating imputation reference data


wei

Jan 7, 2024, 12:18:52 PM
to PrediXcan/MetaXcan

Hello everyone, I am attempting to generate EAS reference data for harmonization and imputation using 1000 Genomes data.

My steps are as follows:

  1. Using 1000G_hg38_eur_conversion.sh to extract the VCF for EAS.
  2. Using 1000G_model_training_to_parquet.sh to convert the VCF to Parquet.
  3. Assembling the Parquet files into variant metadata.
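
Step 1 comes down to restricting the VCF to samples from one superpopulation. A minimal sketch of picking those sample IDs, assuming a tab-separated panel file with `sample`, `pop` and `super_pop` columns (the layout used by the 1000 Genomes sample panel). The function name `samples_in_superpop` and the example rows are hypothetical and are not taken from the PrediXcan scripts:

```python
import csv
import io


def samples_in_superpop(panel_text, super_pop="EAS"):
    """Return sample IDs whose super_pop column matches `super_pop`."""
    reader = csv.DictReader(io.StringIO(panel_text), delimiter="\t")
    return [row["sample"] for row in reader if row["super_pop"] == super_pop]


# Tiny made-up panel; a real panel file lists every sample.
PANEL = (
    "sample\tpop\tsuper_pop\n"
    "S1\tCHB\tEAS\n"
    "S2\tGBR\tEUR\n"
    "S3\tJPT\tEAS\n"
)

eas_samples = samples_in_superpop(PANEL)
print(eas_samples)
```

The resulting list could be written one ID per line and passed to a subsetting tool, for example `bcftools view -S`.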

However, I encountered a problem in the second step as I am unsure about the format of the annotation file.

I attempted to use the GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze.lookup_table.txt.gz mentioned in the tutorial. However, this file lacks allele frequency, which differs from what the tutorial describes.
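
When an annotation source has no frequency column, one possible workaround is to compute the frequency from the genotypes themselves. A minimal sketch, assuming biallelic VCF records that carry a GT field; the function name `alt_allele_frequency` and the example record are hypothetical, and this is not necessarily how the tutorial scripts derive the value:

```python
def alt_allele_frequency(vcf_line):
    """ALT allele frequency from the GT fields of one biallelic VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; locate GT within it.
    gt_index = fields[8].split(":").index("GT")
    alt_count = 0
    called = 0
    for sample_field in fields[9:]:
        gt = sample_field.split(":")[gt_index]
        # Treat phased (|) and unphased (/) genotypes the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # skip missing calls
            called += 1
            if allele == "1":
                alt_count += 1
    return alt_count / called if called else float("nan")


# Made-up record with three samples.
RECORD = "chr1\t1000\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\t0|0"
print(alt_allele_frequency(RECORD))
```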

Has anyone successfully created their own metadata? How can I resolve this issue?

Any feedback would be greatly appreciated. Thank you.

wei

Jan 8, 2024, 3:19:57 AM
to PrediXcan/MetaXcan
I have already resolved my problem.
I found that, before step 2, I need to run test_1000G_to_model_training.sh, which uses predixcan_format_to_model_training.py.
Running test_1000G_to_model_training.sh generates two files: a genotype file and an annotation file.



wei

Jan 13, 2024, 12:13:31 PM
to PrediXcan/MetaXcan
I apologize for the frequent messages.
However, I would like to verify whether using my own (EAS) metadata makes the imputation more accurate than using the EUR metadata.
I intend to run a simulation and evaluate metrics such as raw bias, percentage bias, and coverage rate.
Can anyone suggest a workflow and code for this purpose?
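
For reference, these three metrics are commonly defined over simulation replicates as: raw bias = mean estimate minus the true value, percentage bias = 100 × |raw bias / true value|, and coverage rate = the share of confidence intervals that contain the true value. A minimal sketch under those definitions, assuming normal-theory intervals (estimate ± 1.96 × SE) and a nonzero true value; the function name `simulation_metrics` and the numbers are made up for illustration:

```python
import statistics


def simulation_metrics(estimates, std_errors, truth, z=1.96):
    """Raw bias, percentage bias and coverage rate across replicates."""
    raw_bias = statistics.fmean(estimates) - truth
    # Percentage bias is undefined when truth == 0.
    percent_bias = 100.0 * abs(raw_bias / truth)
    # One flag per replicate: does its interval contain the truth?
    covered = [
        est - z * se <= truth <= est + z * se
        for est, se in zip(estimates, std_errors)
    ]
    coverage_rate = sum(covered) / len(covered)
    return raw_bias, percent_bias, coverage_rate


# Made-up replicate results for a quantity whose true value is 2.0.
rb, pb, cr = simulation_metrics(
    estimates=[1.5, 2.5, 3.0, 2.0],
    std_errors=[0.5, 0.5, 0.25, 0.5],
    truth=2.0,
)
print(rb, pb, cr)
```

Running the same function once with results imputed from the EAS metadata and once with results from the EUR metadata would give the two sets of numbers to compare.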
