Dear Christopher,
Thank you very much for your answer.
Okay, you are saying I am doing an unnecessary step with my commands (converting from VCF to a plink2-formatted fileset twice). But the commands themselves (besides being a bit wasteful) aren't incorrect?
I actually don't need to run the first command for my pipeline.
So if I run
plink2 --vcf crcsurvival_chr20.vcf --maf 0.02 --recode vcf
is that correct?
Sorry, this is all very new to me; I am trying to keep it as simple as possible for myself.
Also one other question:
Afterwards I am going to run Predict.py (from PrediXcan) with my VCF files and MASHR Whole Blood to predict gene expression.
After making the above changes to my VCF files (variants with alternative allele frequency <0.02 or >0.98 have been excluded), Predict.py strangely gives me an error:
INFO - Loading samples
INFO - Loading model
INFO - Acquiring on-the-fly mapping
INFO - Preparing genotype dosages
INFO - Acquiring liftover conversion
INFO - Setting whitelist from available models
INFO - Processing genotypes
INFO - Preparing prediction
INFO - Couldn't import h5py_cache. Anyway, this dependency should be removed. It has been folded into h5py
Level 9 - Processing vcfs
Level 9 - Processing vcf data/untared/Ready_VCF_files/crcsurvival_chr20_filtered.vcf
Traceback (most recent call last):
  File "MetaXcan/software/Predict.py", line 272, in <module>
    run(args)
  File "MetaXcan/software/Predict.py", line 178, in run
    raise e
RuntimeError: Missing DS field when vcf mode is imputed
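In case it helps with the diagnosis, a check along these lines could show whether the filtered VCF still carries a DS entry in its FORMAT column, which is what the error message seems to be complaining about. This is only a minimal sketch: `format_keys` is a hypothetical helper name (not part of PrediXcan or plink2), and it assumes a tab-delimited VCF that is either plain text or gzip-compressed.

```python
import gzip


def format_keys(vcf_path, max_records=1000):
    """Collect the FORMAT keys (e.g. GT, DS) seen in the first records of a VCF.

    Hypothetical helper for illustration; assumes a tab-delimited VCF where
    the FORMAT column is the 9th column of each data line.
    """
    opener = gzip.open if vcf_path.endswith(".gz") else open
    keys = set()
    seen = 0
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            # Skip meta-information lines (##...) and the header line (#CHROM...).
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            # Sites-only VCFs have no FORMAT column, so guard the index.
            if len(fields) > 8:
                keys.update(fields[8].split(":"))
            seen += 1
            if seen >= max_records:
                break
    return keys
```

Calling `"DS" in format_keys("data/untared/Ready_VCF_files/crcsurvival_chr20_filtered.vcf")` would then return False if none of the inspected records list DS, which would be consistent with the RuntimeError above.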
I have also posted this issue to the PrediXcan Google Group, but unfortunately I haven't had a reply yet. Maybe you have an idea as well?
Thank you so much for your support, I really appreciate it.
Best Regards,
Lea