Dear All,
I'm preparing data for a Mendelian randomization (MR) analysis to assess the causal effect of telomere length on a kidney phenotype in UK Biobank (UKB) data. These are the steps I have taken so far:
1. I searched prior research for summary data and found close to 800 SNPs associated with telomere length.
2. I retained only the 24 SNPs with P < 5*10^-8.
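The filter in step 2 can be sketched with awk. The file name `TL_sumstats.txt`, its column layout (SNP, BETA, SE, P) and the toy rows are assumptions made for illustration, not the actual summary data:

```shell
# Toy summary statistics (hypothetical values), tab-separated: SNP BETA SE P
printf 'SNP\tBETA\tSE\tP\n' > TL_sumstats.txt
printf 'rs0001\t0.080\t0.010\t3e-15\n' >> TL_sumstats.txt
printf 'rs0002\t0.020\t0.010\t4e-2\n' >> TL_sumstats.txt
printf 'rs0003\t-0.050\t0.008\t5e-10\n' >> TL_sumstats.txt

# Keep the header plus rows with P < 5e-8; "+ 0" forces a numeric comparison
awk -F'\t' 'NR == 1 || ($4 + 0) < 5e-8' TL_sumstats.txt > TL_sumstats_gws.txt

# One SNP ID per line, the format a --extract list uses
awk -F'\t' 'NR > 1 { print $1 }' TL_sumstats_gws.txt > TL_snplist.txt
```

In practice the SNP list would then be split by chromosome to give files such as `TL_snplist_chr1.txt`.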
3. I extracted those SNPs from the UKB imputed genetic data into per-chromosome VCF files (**files.vcf**), each containing the SNP dosages for one chromosome (chr). The command below is the example for chr1:
```
plink2 --bgen ukb22828_c1_b0_v3.bgen --sample ukb22828_c1_b0_v3_s487159.sample --threads 4 --out twas_snp_chr1 --extract TL_snplist_chr1.txt --bgen-annotate 'ref-first' --export vcf vcf-dosage=DS-force
```
4. I converted these VCF files (files.vcf) into PLINK binary filesets (file.fam, file.bim, file.bed). The example for chromosome 1 is below:
```
plink --vcf twas_snp_chr1.vcf --make-bed --out twas_snp_chr1
```
5. I merged the per-chromosome filesets into a single file.bed, file.bim and file.fam, instead of 22 of each.
```
plink --bfile twas_snp_chr1 --merge-list TL_allsnps_allchromosomes.txt --make-bed --out data_TL_4_KD
```
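The contents of `TL_allsnps_allchromosomes.txt` are not shown above. A minimal sketch of how such a list could be built, assuming the per-chromosome filesets are named `twas_snp_chr2` through `twas_snp_chr22` (chr1 is left out because it is already supplied via `--bfile`):

```shell
# Write one fileset prefix per line for chromosomes 2-22.
# plink --merge-list treats a line with a single name as a binary
# fileset prefix (.bed/.bim/.fam).
: > TL_allsnps_allchromosomes.txt
for chr in $(seq 2 22); do
    echo "twas_snp_chr${chr}" >> TL_allsnps_allchromosomes.txt
done
```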
6. I estimated a polygenic risk score (PRS) using the effect sizes (beta) from the prior research in step 2 and the genetic dosage files from step 5. This step is for the one-sample MR analysis.
7. To generate summary data for the two-sample MR analysis, I ran a GWAS to estimate the beta and SE of the association between each of the 24 SNPs and the kidney phenotype (eGFR), using the following command:
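As a rough illustration of what the score in step 6 amounts to (for each sample, the sum over SNPs of beta times dosage), here is a toy awk sketch. The file names, the two-SNP toy data, and the assumption that the dosages count the same effect allele as the published betas are all hypothetical, and this is not necessarily how the score was actually computed:

```shell
# Toy weights (hypothetical): SNP ID and published beta
printf 'rs0001 0.08\nrs0003 -0.05\n' > TL_weights.txt

# Toy dosage matrix (hypothetical): header of SNP IDs, then one row per sample
printf 'IID rs0001 rs0003\n' > TL_dosage.txt
printf 'S1 2 0\nS2 1 1\nS3 0 2\n' >> TL_dosage.txt

# First file: load the weights. Second file: map columns to SNPs via the
# header, then sum beta * dosage across SNPs for each sample.
awk 'NR == FNR { beta[$1] = $2; next }
     FNR == 1  { for (i = 2; i <= NF; i++) snp[i] = $i; print "IID", "PRS"; next }
     { s = 0
       for (i = 2; i <= NF; i++) if (snp[i] in beta) s += beta[snp[i]] * $i
       printf "%s %.4f\n", $1, s }' TL_weights.txt TL_dosage.txt > TL_prs.txt
```

If the effect alleles in the weights and the counted alleles in the dosage file differ for a SNP, the sign of that SNP's contribution would be flipped, so the alignment is worth checking before trusting the score.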
```
plink2 --bfile data/genotypes/data_TL_4_KD --glm hide-covar --pheno data/Pheno_KFs.txt --pheno-name LogeGFRcrea --covar data/Covariatesdata --covar-name PC{1..10}, Age, Sex --out output/GWAS_eGFRcrea.cvrt
```
However, the GWAS results showed NA for the effect size of about half of the SNPs, as shown below: [Attach1]
When I checked the dosages in a text file, many SNPs also showed a dosage of 0: [Attach2]
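A per-SNP tally may make this check easier to reproduce than a screenshot. The sketch below assumes the dosages are in a whitespace-delimited text file with a header row, six leading sample columns (FID, IID, PAT, MAT, SEX, PHENOTYPE, as in a PLINK `.raw` export) and one column per SNP. The file name and toy values are hypothetical:

```shell
# Toy dosage file (hypothetical values) in a .raw-like layout
printf 'FID IID PAT MAT SEX PHENOTYPE rs0001_A rs0002_G\n' > TL_dosage.raw
printf '1 1 0 0 1 -9 0 1\n' >> TL_dosage.raw
printf '2 2 0 0 2 -9 0 NA\n' >> TL_dosage.raw
printf '3 3 0 0 1 -9 0 2\n' >> TL_dosage.raw

# For each SNP column, count zero, non-zero and missing (NA) entries.
# A SNP that is zero in every non-missing sample has no variation in this file.
awk 'NR == 1 { for (i = 7; i <= NF; i++) name[i] = $i; n = NF; next }
     { for (i = 7; i <= n; i++) {
           if ($i == "NA") na[i]++
           else if ($i + 0 == 0) zero[i]++
           else nonzero[i]++ } }
     END { print "SNP", "N_ZERO", "N_NONZERO", "N_NA"
           for (i = 7; i <= n; i++)
               print name[i], zero[i] + 0, nonzero[i] + 0, na[i] + 0 }' \
    TL_dosage.raw > TL_dosage_summary.txt
```

Comparing the SNPs with `N_NONZERO` equal to 0 against the SNPs with NA in the GWAS output would show whether the two problems involve the same variants.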
So, my questions are:
1. Why do these SNPs show a dosage of 0? Do some SNPs genuinely have zero dosage, or did something go wrong in one of the steps above?
2. If something went wrong, how can I fix it so that I can generate effect sizes (beta and SE) as summary-level data for the two-sample MR analysis?
I have searched various communities (PLINK users, Stack Overflow, the Bioconductor community, etc.) but could not find a solution, so I would appreciate any help.
plink2 --pfile c1 --validate  # all files (c1, c2, c3, ..., c22) validated cleanly, with no sign of corruption