--update-name not working - possibly because of comma delim?

Hannah Mandle

May 11, 2023, 2:21:27 PM
to plink2-users
Hi there,

I have two separate sets of GWAS data ('A' and 'B'), with one file per chromosome.
I am running a targeted gene/pathway analysis, so I have extracted the SNPs at each gene location (209 SNPs = 209 filtered gene-specific files).

For dataset A I ran the following commands:

1) ./plink --vcf chr10A.vcf.gz   --double-id  --chr 10   --from-bp 90740555   --to-bp 90786816   --make-bed   --out fas_A  --vcf-half-call m
2) ./plink2 --bfile fas_A  --rm-dup exclude-mismatch --make-bed --out fas_A_new
3) ./plink2 --bfile fas_A_new --update-name fas.txt  --recode A --out fas_A_rs

When I open the .raw file I can see that the column names (mostly) changed and everything worked, great! (The first columns of the header line are shown below.)

FID IID PAT MAT SEX PHENOTYPE rs547842383_C rs182767910_G rs533270433_T rs554505977_A rs576507671_C rs117347262_G rs1286675848_G rs4934433_C rs11202917_G rs199687700_CT rs186902729_G rs192509435_C rs183542310_G rs6586161_T rs116892683_C rs149555712_G rs2862834_G rs553707684_G rs577130308_G rs7906322_C rs7082101_A rs193021099_T rs545840760_G rs147509588_T rs189717085_G rs78020557_A rs114593289_T rs7069841_C rs11202918_A

For dataset B, when I ran plink command #1 from above, almost all of the chromosome files failed with errors about 'pathologically long variants'. I spoke to the creators of the data and they suggested I try the bgen files instead of the vcf files.

So I ran the following commands:

1) ./bgenix -g chr10B.bgen -vcf -incl-range 10:90740555-90786816  > fas_B.vcf
2) ./plink2 --vcf fas_B.vcf --out fas_B
3) ./plink2 --pfile fas_B  --rm-dup exclude-mismatch --make-pgen --out fas_B_new
4) ./plink2 --pfile fas_B_new --update-name fas.txt  --recode A --out fas_B_rs

Now when I open the .raw file, --update-name clearly didn't work (see below). I have tried multiple variations with plink1 and plink2, as well as with bfile and pfile input. I'm wondering if the issue could be related to comma separators in the dataset B GWAS files?

FID IID PAT MAT SEX PHENOTYPE 10:90740747_G_A,10:90740747_G_A_G 10:90740815_T_C,10:90740815_T_C_T 10:90740953_C_A,10:90740953_C_A_C 10:90741122_G_A,10:90741122_G_A_G 10:90741259_T_A,10:90741259_T_A_T 10:90741374_A_G,10:90741374_A_G_A 10:90741615_C_A,10:90741615_C_A_C 10:90741955_T_C,10:90741955_T_C_T 10:90742008_A_G,10:90742008_A_G_A 10:90742049_C_T,10:90742049_C_T_C 10:90742066_A_G,10:90742066_A_G_A 10:90742467_G_C,10:90742467_G_C_G 10:90742624_A_G,10:90742624_A_G_A 10:90742982_C_T,10:90742982_C_T_C 10:90743197_G_A,10:90743197_G_A_G 10:90743317_G_A,10:90743317_G_A_G 10:90743323_C_A,10:90743323_C_A_C
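A minimal sketch of how the stored variant IDs could be read back from this header, assuming it follows the usual `<variant ID>_<counted allele>` pattern that --recode A writes. The `header` string is just the first two variant columns copied from the line above, and `ids` is a name introduced here for illustration.

```shell
# First two variant columns copied from the .raw header above.
header='FID IID PAT MAT SEX PHENOTYPE 10:90740747_G_A,10:90740747_G_A_G 10:90740815_T_C,10:90740815_T_C_T'

# Put one column per line, skip the six fixed columns (FID ... PHENOTYPE),
# then strip the trailing "_<allele>" suffix from each remaining column.
ids=$(printf '%s\n' "$header" | tr ' \t' '\n\n' | tail -n +7 | sed 's/_[^_]*$//')

printf '%s\n' "$ids"
```

Under that assumption the two IDs come out as `10:90740747_G_A,10:90740747_G_A` and `10:90740815_T_C,10:90740815_T_C`, i.e. the comma sits inside a single variant ID and is not a column separator.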

If you have any suggestions, please let me know! I really need to get my snp columns in rsID format for analysis. Thanks in advance!

Best,
Hannah

Hannah Mandle

May 11, 2023, 2:23:03 PM
to plink2-users
It wasn't until I posted the data for B in this message that I saw that the variant columns are tab-delimited, and that the commas are inside the individual variant names, together with the major and minor alleles. Does this possibly make for an easier fix? Thanks again!

Hannah Mandle

May 12, 2023, 12:15:33 PM
to plink2-users
Any suggestions on how to change the variant names from 10:90740953_C_A,10:90740953_C_A_C to rsIDs? I know how to use --update-name, but I am not sure what the old variant name in the file should be (I have tried both 10:90740953 with the rsID and 10:90740953_C_A,10:90740953_C_A_C with the rsID, but neither has worked).

Please let me know if you have any ideas! 
Thank you!!
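
One possible way to build the --update-name file, sketched under several assumptions: the old-ID column has to match the ID stored in the .pvar exactly (here taken to be the comma-joined string without the trailing counted allele), old IDs go in column 1 and new IDs in column 2, and a chr:pos to rsID lookup is available. The file names `lookup.txt`, `ids.txt` and `update_name.txt` are hypothetical, and the rsIDs are placeholders, not real identifiers.

```shell
work=$(mktemp -d)

# Hypothetical lookup table: "chr:pos rsID". The rsIDs are placeholders only.
printf '%s\n' '10:90740747 rsPLACEHOLDER1' '10:90740815 rsPLACEHOLDER2' > "$work/lookup.txt"

# Stand-in for the ID column of the .pvar; in a real run this could come from
# something like: awk '!/^#/ {print $3}' fas_B_new.pvar
printf '%s\n' '10:90740747_G_A,10:90740747_G_A' '10:90740815_T_C,10:90740815_T_C' > "$work/ids.txt"

# Key each stored ID by the text before its first underscore (chr:pos),
# then write "old-ID new-ID" pairs for every ID found in the lookup.
awk 'NR == FNR { rs[$1] = $2; next }
     { key = $1; sub(/_.*/, "", key); if (key in rs) print $1, rs[key] }' \
    "$work/lookup.txt" "$work/ids.txt" > "$work/update_name.txt"

cat "$work/update_name.txt"
```

Keying on chr:pos alone would give the same rsID to every variant at a multiallelic position, so a real lookup would probably need to include the alleles as well.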

Christopher Chang

May 12, 2023, 9:15:06 PM
to plink2-users
Please post a full .log and set of input files indicating EXACTLY what is not working.
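
One check that could accompany the .log, sketched here with stand-in data: whether any old ID in the --update-name file matches a stored variant ID exactly. The file names `pvar_ids.txt` and `old_ids.txt` are hypothetical, and the stored IDs are assumed to be the .raw column names from this thread minus the trailing counted allele.

```shell
work=$(mktemp -d)

# Stand-in for the ID column of fas_B_new.pvar (assumed form, see above).
printf '%s\n' '10:90740953_C_A,10:90740953_C_A' '10:90741122_G_A,10:90741122_G_A' > "$work/pvar_ids.txt"

# Stand-in for column 1 of fas.txt: the two spellings reported as tried.
printf '%s\n' '10:90740953' '10:90740953_C_A,10:90740953_C_A_C' > "$work/old_ids.txt"

# Count old IDs that equal a stored ID exactly (fixed strings, whole-line match).
# grep exits non-zero when the count is 0, hence the "|| true".
matched=$(grep -cFx -f "$work/pvar_ids.txt" "$work/old_ids.txt" || true)

echo "exact matches: $matched"
```

With these stand-in values the count is 0, which would be consistent with --update-name leaving every ID unchanged; the real .pvar and update file are still needed to confirm it.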