get GTEx variant metadata in parquet format

jfe...@gmail.com

Dec 28, 2023, 4:50:57 PM
to PrediXcan/MetaXcan
Hi all,

I am trying to build the reference metadata from GTEx data (we have access through dbGaP). However, in the tutorial at https://github.com/hakyimlab/summary-gwas-imputation/wiki/Reference-Data-Set-Compilation, the only script I can find for generating the variant metadata is the following:

python3 $REPO/get_reference_metadata.py \
-genotype $DATA/gtex_v8_eur_filtered.txt.gz \
-annotation $DATA/GTEx_Analysis_2017-06-05_v8_WholeGenomeSeq_838Indiv_Analysis_Freeze.lookup_table.txt.gz \
-filter MAF 0.01 \
-filter TOP_CHR_POS_BY_FREQ \
-output gtex_v8_eur_filtered_maf0.01_monoallelic_variants.txt.gz

However, this script writes a gzipped text file, while gwas_summary_imputation.py takes the option --parquet_genotype_metadata, which I assume expects a Parquet file. Should I convert the text file to Parquet myself (with pyarrow?), or do you have an in-house script for this?

Thanks a lot
Juan