Hi there,
I have two separate sets of GWAS data ('A' and 'B'), each with one file per chromosome.
I am running a targeted gene/pathway analysis, so I have extracted the SNPs at each gene location into separate files (209 snps = 209 filtered gene-specific files).
For dataset A I ran the following commands:
1) ./plink --vcf chr10A.vcf.gz --double-id --chr 10 --from-bp 90740555 --to-bp 90786816 --make-bed --out fas_A --vcf-half-call m
2) ./plink2 --bfile fas_A --rm-dup exclude-mismatch --make-bed --out fas_A_new
3) ./plink2 --bfile fas_A_new --update-name fas.txt --recode A --out fas_A_rs
When I open the .raw file I can see that the column names have (mostly) changed and everything worked, great! (The first columns of the header are shown below.)
FID IID PAT MAT SEX PHENOTYPE rs547842383_C rs182767910_G rs533270433_T rs554505977_A rs576507671_C rs117347262_G rs1286675848_G rs4934433_C rs11202917_G rs199687700_CT rs186902729_G rs192509435_C rs183542310_G rs6586161_T rs116892683_C rs149555712_G rs2862834_G rs553707684_G rs577130308_G rs7906322_C rs7082101_A rs193021099_T rs545840760_G rs147509588_T rs189717085_G rs78020557_A rs114593289_T rs7069841_C rs11202918_A
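Since only most of the names changed, a quick check for leftover columns could look like the minimal Python sketch below. It assumes the usual six leading .raw columns (FID through PHENOTYPE). The function name and the third example column are hypothetical, added only for illustration.

```python
def non_rs_columns(header_line, n_fixed=6):
    """Return genotype column names that do not start with 'rs'.

    header_line is the first line of a .raw file; the first n_fixed
    columns (FID IID PAT MAT SEX PHENOTYPE) are skipped.
    """
    columns = header_line.split()[n_fixed:]
    return [name for name in columns if not name.startswith("rs")]


# Two real column names from dataset A plus one hypothetical un-renamed column.
example_header = (
    "FID IID PAT MAT SEX PHENOTYPE "
    "rs547842383_C rs182767910_G 10:90740747_G_A_G"
)
print(non_rs_columns(example_header))
```

To run it on a real file, pass in the first line of fas_A_rs.raw instead of the example string.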
For dataset B, when I ran plink command #1 from above, I hit terminal errors on almost all of the chr files for having 'pathologically long variants'. I spoke to the creators of the data and they suggested I try the bgen files instead of the vcf files.
So I ran the following commands:
1) ./bgenix -g chr10B.bgen -vcf -incl-range 10:90740555-90786816 > fas_B.vcf
2) ./plink2 --vcf fas_B.vcf --out fas_B
3) ./plink2 --pfile fas_B --rm-dup exclude-mismatch --make-pgen --out fas_B_new
4) ./plink2 --pfile fas_B_new --update-name fas.txt --recode A --out fas_B_rs
Now when I open the .raw file, the --update-name step clearly didn't work (see below). I have tried multiple variations with plink1 and plink2, as well as with bfile and pfile input. I'm wondering if the issue could be related to the comma separators in the variant IDs of the B GWAS data files?
FID IID PAT MAT SEX PHENOTYPE 10:90740747_G_A,10:90740747_G_A_G 10:90740815_T_C,10:90740815_T_C_T 10:90740953_C_A,10:90740953_C_A_C 10:90741122_G_A,10:90741122_G_A_G 10:90741259_T_A,10:90741259_T_A_T 10:90741374_A_G,10:90741374_A_G_A 10:90741615_C_A,10:90741615_C_A_C 10:90741955_T_C,10:90741955_T_C_T 10:90742008_A_G,10:90742008_A_G_A 10:90742049_C_T,10:90742049_C_T_C 10:90742066_A_G,10:90742066_A_G_A 10:90742467_G_C,10:90742467_G_C_G 10:90742624_A_G,10:90742624_A_G_A 10:90742982_C_T,10:90742982_C_T_C 10:90743197_G_A,10:90743197_G_A_G 10:90743317_G_A,10:90743317_G_A_G 10:90743323_C_A,10:90743323_C_A_C
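One way to test the comma hypothesis is sketched below in Python. Each .raw column name is the variant ID followed by `_` and the counted allele, so the variant IDs in dataset B appear to be the doubled form `10:90740747_G_A,10:90740747_G_A`. If fas.txt lists the old IDs in the single form `10:90740747_G_A` (an assumption, since the file's contents are not shown here), --update-name would find no exact match. The sketch builds replacement mapping lines keyed on the full doubled ID. The function names and the `rsEXAMPLE1` value are hypothetical placeholders.

```python
def strip_counted_allele(column):
    """Drop the trailing '_<allele>' suffix from a .raw column name."""
    variant_id, sep, _allele = column.rpartition("_")
    return variant_id if sep else column


def build_update_name_lines(raw_header, mapping_lines):
    """Re-key an 'old_id new_id' mapping on the IDs actually in the data.

    Returns (lines, unmatched): lines are tab-separated 'full_id new_id'
    pairs; unmatched lists the IDs with no entry in the mapping.
    """
    old_to_new = {}
    for line in mapping_lines:
        fields = line.split()
        if len(fields) >= 2:
            old_to_new[fields[0]] = fields[1]

    lines, unmatched = [], []
    for column in raw_header.split()[6:]:  # skip FID .. PHENOTYPE
        full_id = strip_counted_allele(column)
        # Try the ID as-is first, then the part before the comma.
        key = full_id if full_id in old_to_new else full_id.split(",")[0]
        if key in old_to_new:
            lines.append(f"{full_id}\t{old_to_new[key]}")
        else:
            unmatched.append(full_id)
    return lines, unmatched


# First two dataset B columns; the mapping entry is a made-up placeholder.
header = (
    "FID IID PAT MAT SEX PHENOTYPE "
    "10:90740747_G_A,10:90740747_G_A_G 10:90740815_T_C,10:90740815_T_C_T"
)
new_lines, missing = build_update_name_lines(header, ["10:90740747_G_A rsEXAMPLE1"])
print(new_lines)
print(missing)
```

If `missing` comes back listing every variant even after the comma split, the mismatch is more likely in how the IDs in fas.txt are written than in the commas themselves.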
If you have any suggestions, please let me know! I really need to get my SNP columns into rsID format for analysis. Thanks in advance!
Best,
Hannah