--update-name not working - possibly because of comma delim?

Hannah Mandle

May 11, 2023, 2:21:27 PM

to plink2-users

Hi there,

I have two separate sets of GWAS data ('A' and 'B'), each with one file per chromosome.

I am running a targeted gene/pathway analysis, so I have extracted the SNPs at each gene location (209 snps = 209 filtered gene-specific files).

For dataset A I ran the following commands:

1) ./plink --vcf chr10A.vcf.gz --double-id --chr 10 --from-bp 90740555 --to-bp 90786816 --make-bed --out fas_A --vcf-half-call m

2) ./plink2 --bfile fas_A --rm-dup exclude-mismatch --make-bed --out fas_A_new

3) ./plink2 --bfile fas_A_new --update-name fas.txt --recode A --out fas_A_rs

When I open the .raw file I can see that the column names have (mostly) changed and everything worked, great! (The first columns of the header line are shown below.)

FID IID PAT MAT SEX PHENOTYPE rs547842383_C rs182767910_G rs533270433_T rs554505977_A rs576507671_C rs117347262_G rs1286675848_G rs4934433_C rs11202917_G rs199687700_CT rs186902729_G rs192509435_C rs183542310_G rs6586161_T rs116892683_C rs149555712_G rs2862834_G rs553707684_G rs577130308_G rs7906322_C rs7082101_A rs193021099_T rs545840760_G rs147509588_T rs189717085_G rs78020557_A rs114593289_T rs7069841_C rs11202918_A

For dataset B, when I ran plink command #1 from above, I hit terminal errors on almost all of the chromosome files for having 'pathologically long variants'. I spoke to the creators of the data and they suggested I try the BGEN files instead of the VCF files.

So I ran the following commands:

1) ./bgenix -g chr10B.bgen -vcf -incl-range 10:90740555-90786816 > fas_B.vcf

2) ./plink2 --vcf fas_B.vcf --out fas_B

3) ./plink2 --pfile fas_B --rm-dup exclude-mismatch --make-pgen --out fas_B_new

4) ./plink2 --pfile fas_B_new --update-name fas.txt --recode A --out fas_B_rs

Now when I open the .raw file, the --update-name clearly didn't work (see below). I have tried multiple variations with plink1 and plink2, as well as with --bfile and --pfile. I'm wondering if the issue could be related to the comma separators in the dataset B files?

FID IID PAT MAT SEX PHENOTYPE 10:90740747_G_A,10:90740747_G_A_G 10:90740815_T_C,10:90740815_T_C_T 10:90740953_C_A,10:90740953_C_A_C 10:90741122_G_A,10:90741122_G_A_G 10:90741259_T_A,10:90741259_T_A_T 10:90741374_A_G,10:90741374_A_G_A 10:90741615_C_A,10:90741615_C_A_C 10:90741955_T_C,10:90741955_T_C_T 10:90742008_A_G,10:90742008_A_G_A 10:90742049_C_T,10:90742049_C_T_C 10:90742066_A_G,10:90742066_A_G_A 10:90742467_G_C,10:90742467_G_C_G 10:90742624_A_G,10:90742624_A_G_A 10:90742982_C_T,10:90742982_C_T_C 10:90743197_G_A,10:90743197_G_A_G 10:90743317_G_A,10:90743317_G_A_G 10:90743323_C_A,10:90743323_C_A_C
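One way to read the header above: in a .raw file each genotype column is named as the variant ID followed by an underscore and the counted allele, which is why dataset A shows names like rs547842383_C. A minimal sketch, assuming that default naming (no include-alt modifier) and using only column names copied from the two headers in this post, splits each name at its last underscore. The function names are made up for illustration.

```python
def split_raw_column(name):
    """Split a .raw genotype column name into (variant_id, counted_allele).

    Assumes the default '<variant ID>_<counted allele>' naming, so the
    counted allele is whatever follows the last underscore.
    """
    variant_id, sep, allele = name.rpartition("_")
    if not sep:
        raise ValueError(f"no underscore in column name: {name!r}")
    return variant_id, allele


def variant_columns(header_line):
    """Return (variant_id, counted_allele) pairs, skipping the 6 sample columns."""
    return [split_raw_column(col) for col in header_line.split()[6:]]


# Column names copied from the dataset A and dataset B headers in this post.
header_a = "FID IID PAT MAT SEX PHENOTYPE rs547842383_C rs182767910_G"
header_b = ("FID IID PAT MAT SEX PHENOTYPE "
            "10:90740747_G_A,10:90740747_G_A_G 10:90740815_T_C,10:90740815_T_C_T")

for variant_id, allele in variant_columns(header_a) + variant_columns(header_b):
    print(variant_id, allele, sep="\t")
```

If that reading is right, the dataset B variant IDs themselves contain the comma (for example 10:90740747_G_A,10:90740747_G_A), and the trailing _G is only the counted-allele suffix. The ID column of the .pvar file is the place to confirm this.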

If you have any suggestions, please let me know! I really need to get my snp columns in rsID format for analysis. Thanks in advance!

Best,

Hannah

Hannah Mandle

May 11, 2023, 2:23:03 PM

to plink2-users

It wasn't until I posted the data for B in this message that I saw the variant columns are tab-delimited, and that the commas sit inside the individual variant names, together with the major and minor alleles. Does that possibly make this an easier fix? Thanks again!

Hannah Mandle

May 12, 2023, 12:15:33 PM

to plink2-users

Any suggestions on how to change the variant names from 10:90740953_C_A,10:90740953_C_A_C to rsIDs? I know how to use --update-name, but I am not sure what the reference (old) variant name should be. I have tried both 10:90740953 and rsID, as well as 10:90740953_C_A,10:90740953_C_A_C and rsID, but neither has worked.
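For illustration, here is a rough sketch of how a two-column --update-name file (old ID in column 1, new ID in column 2, which is the default column order) could be generated once the exact old IDs are known. It assumes the old ID is the full comma-containing string as it appears in the .pvar ID column, without the counted-allele suffix that the .raw header adds; that assumption should be checked against the actual .pvar. The helper names and the rsID value are placeholders, not real identifiers.

```python
def position_key(old_id):
    """Reduce '10:90740953_C_A,10:90740953_C_A' to the 'chr:pos' key '10:90740953'."""
    first_part = old_id.split(",")[0]   # '10:90740953_C_A'
    return first_part.split("_")[0]     # '10:90740953'


def build_update_name_lines(old_ids, rsid_by_position):
    """Return 'old_id<TAB>new_id' lines for every old ID with a known rsID."""
    lines = []
    for old_id in old_ids:
        rsid = rsid_by_position.get(position_key(old_id))
        if rsid is not None:
            lines.append(f"{old_id}\t{rsid}")
    return lines


# Old IDs as assumed to appear in the .pvar ID column (to be verified).
old_ids = [
    "10:90740953_C_A,10:90740953_C_A",
    "10:90741122_G_A,10:90741122_G_A",
]
# Hypothetical lookup table; 'rsPLACEHOLDER1' stands in for a real rsID.
rsid_by_position = {"10:90740953": "rsPLACEHOLDER1"}

lines = build_update_name_lines(old_ids, rsid_by_position)
print("\n".join(lines))
```

Matching on chromosome and position alone ignores the alleles, so positions with more than one variant would need the alleles in the lookup key as well.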

Please let me know if you have any ideas!

Thank you!!

Christopher Chang

May 12, 2023, 9:15:06 PM

to plink2-users

Please post a full .log and set of input files indicating EXACTLY what is not working.
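One check that can help pin down exactly what is not working is to compare the old IDs in the --update-name file against the ID column of the .pvar file. The sketch below is only an illustration: it assumes a .pvar with a #CHROM header line and an --update-name file with the old ID in its first column, and the sample contents (including the REF/ALT values and the rsID) are hypothetical.

```python
def pvar_ids(pvar_text):
    """Collect the ID column from .pvar text that has a '#CHROM' header line."""
    id_col = None
    ids = []
    for line in pvar_text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank lines and '##' meta-information lines
        fields = line.split()
        if line.startswith("#CHROM"):
            id_col = fields.index("ID")
            continue
        if id_col is None:
            raise ValueError("expected a #CHROM header line before variant rows")
        ids.append(fields[id_col])
    return ids


def unmatched_old_ids(pvar_text, update_name_text):
    """Return old IDs (column 1 of the update file) that are absent from the .pvar."""
    known = set(pvar_ids(pvar_text))
    missing = []
    for line in update_name_text.splitlines():
        fields = line.split()
        if fields and fields[0] not in known:
            missing.append(fields[0])
    return missing


# Hypothetical file contents for illustration only.
pvar_text = (
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "10\t90740953\t10:90740953_C_A,10:90740953_C_A\tC\tA\n"
)
update_name_text = (
    "10:90740953\trsPLACEHOLDER1\n"
    "10:90740953_C_A,10:90740953_C_A\trsPLACEHOLDER1\n"
)

print(unmatched_old_ids(pvar_text, update_name_text))
```

Any old ID reported here has no exact match in the .pvar, so --update-name would have nothing to rename for that line.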
