Need Assistance: Retrieving rsID in PLINK Association Analysis Output


Say_t _

Oct 12, 2023, 10:29:31 PM
to plink2-users

Hello,

I have run a linear association analysis using PLINK v1.9 with the following command:

plink --bfile HEXA --pheno HEXA_entire.txt --pheno-name FLI --linear --ci 0.95 --covar HEXA_entire.txt --covar-name AGE,SEX,BMI --allow-no-sex --hide-covar --snps-only just-acgt --out FLI_HEXA_entire

However, my resulting association file (.assoc.linear) only contains columns such as CHR, SNP, BP, and A1, with no rsIDs for the SNPs.

Can you provide guidance on how I might obtain the rsID information in my output? I'd appreciate any assistance or references to relevant documentation that might guide me in this matter.

Also, the SNP IDs in my results look like 1:751343_T/A. Is that normal?

I appreciate it and look forward to the answer!

Sei Kim

Christopher Chang

Oct 13, 2023, 7:06:47 AM
to plink2-users
It is normal for variant IDs in PLINK files to contain just position/allele information rather than rsIDs.  In fact, this practice is now encouraged over rsIDs because it avoids the duplicate-ID problems that rsIDs often cause.
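
To illustrate what such a position/allele ID encodes, here is a minimal Python sketch that splits an ID of the form 1:751343_T/A into its parts. The CHROM:POS_ALLELE/ALLELE layout is an assumption based only on the one example ID in this thread, and the function name is hypothetical. The ID alone does not say which allele is the reference, so the sketch just returns them in order.

```python
import re

# Assumed layout (taken from the single example ID in this thread):
#   CHROM:POS_ALLELE/ALLELE, e.g. 1:751343_T/A
_ID_PATTERN = re.compile(r"^([^:]+):(\d+)_([ACGT]+)/([ACGT]+)$")


def parse_variant_id(variant_id):
    """Return (chrom, pos, first_allele, second_allele), or None if the ID
    does not follow the assumed layout (for example, an rsID)."""
    match = _ID_PATTERN.match(variant_id)
    if match is None:
        return None
    chrom, pos, first_allele, second_allele = match.groups()
    return chrom, int(pos), first_allele, second_allele
```

IDs that do not match, such as rsIDs, come back as None, which makes it easy to count how many IDs in a file are still position-based.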

However, there of course comes a time when you want to convert back to rsIDs.
- If your file is using GRCh37 reference coordinates, you can use PLINK 2.0's --recover-var-ids command on the GRCh37 1000 Genomes phase 3 .pvar.zst file from https://www.cog-genomics.org/plink/2.0/resources#phase3_1kg , which has rsIDs.
- For GRCh38, you can go to the source: see the latest-release (dbSNP 156) VCF downloads under https://www.ncbi.nlm.nih.gov/snp/docs/RefSNP_about/ , and select the GCF_000001405.40.gz file which corresponds to GRCh38.  Unfortunately, the chromosome codes in the VCF require conversion ("NC_000001.11" means chr1, etc.; see https://www.ncbi.nlm.nih.gov/grc/human/data for a conversion table) before that file is usable with --recover-var-ids.
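
The chromosome-code conversion for the dbSNP VCF could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: it derives the chromosome from the number inside the RefSeq accession (NC_000001 to NC_000022 for the autosomes, 23 for X, 24 for Y, NC_012920 for the mitochondrion) instead of using the conversion table, so please check the result against that table. The function names are hypothetical, records on other sequences are dropped, and "##contig" header lines are left untouched.

```python
import re

# Assumption: human RefSeq chromosome accessions look like NC_0000NN.version,
# where NN is the chromosome number, 23 means X and 24 means Y, and
# NC_012920 is the mitochondrial genome.
_ACCESSION = re.compile(r"^NC_0*(\d+)\.\d+$")
_SPECIAL = {23: "X", 24: "Y", 12920: "MT"}


def refseq_to_chrom(code):
    """Map e.g. 'NC_000001.11' to '1'; return None for unrecognized codes."""
    match = _ACCESSION.match(code)
    if match is None:
        return None
    number = int(match.group(1))
    if 1 <= number <= 22:
        return str(number)
    return _SPECIAL.get(number)


def convert_vcf_lines(lines):
    """Yield VCF lines with the first (CHROM) column rewritten.

    Header lines (starting with '#') pass through unchanged; records whose
    sequence is not recognized (e.g. unplaced scaffolds) are skipped.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, tab, rest = line.partition("\t")
        new_chrom = refseq_to_chrom(chrom)
        if new_chrom is not None:
            yield new_chrom + tab + rest
```

Since the VCF is gzip-compressed and large, one would normally stream it through gzip.open(..., "rt") and write the converted lines to a new file instead of loading everything into memory.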

I will look into providing a more convenient solution for the 1000-Genomes-phase-3-overlapping subset of GRCh38.

Dominick A. Leone

Oct 13, 2023, 8:50:17 AM
to Christopher Chang, plink2-users
Sei,

There are bioinformatics tools such as ANNOVAR that include rsIDs, but they all require some file manipulation to get what you want. I think Christopher's recommendation is probably the most straightforward if you're using PLINK 2. Thank you, Christopher!


Best,
Dominick Leone, MPH, MS
Doctoral Candidate, Epidemiology Department
Chronic Kidney Disease in Central America Research Group    
Boston University School of Public Health

801 Massachusetts Avenue
Biostatistics Dept; Suite 345K
Boston, MA 02118
 
Phone: (617) 893-9493
 
THINK. TEACH. DO.
FOR THE HEALTH OF ALL.






-- 
You received this message because you are subscribed to the Google Groups "plink2-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to plink2-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/plink2-users/9aa409d5-ba22-4a39-90a6-de1ce3c9227en%40googlegroups.com.

Christopher Chang

Oct 16, 2023, 6:36:28 PM
to plink2-users

Say_t _

Oct 17, 2023, 3:19:09 AM
to plink2-users
Dear Author,
I really appreciate your kind help.
It helped me a lot.

Best,
Sei Kim.

Azeem Javed

Oct 17, 2023, 4:29:29 AM
to Say_t _, plink2-users
For GRCh38 I used the BSgenome package in R.
BSgenome is a Bioconductor package, so install it with BiocManager::install("BSgenome") rather than install.packages().
Then the available.SNPs() function will show you the available dbSNP packages with rsIDs; install the one you need.
I found this very easy.


Say_t _

Oct 19, 2023, 7:42:30 AM
to plink2-users
Hello everyone.

My file is based on GRCh37.

I updated my file with the plink2 --recover-var-ids command.
After that, my assoc file shows rsIDs!

I really appreciate your kindness.

Hope you have a good day.
Thank you!!

Best, Sei

Dominick A. Leone

Oct 19, 2023, 11:25:48 AM
to Say_t _, plink2-users
If you have a file with columns for each SNP's chromosome, position, ref and alt alleles, plus a column with the rsID, then you can use R to merge the regression results with that file by a generic SNP ID (e.g. chr1:10000:A:C):

1. Import the regression and rsID files into R as data frames.
2. Create a variable generic_ID in both data frames: paste(chromosome, position, ref, alt, sep = ":")
3. Merge the regression and rsID data frames by "generic_ID". You probably don't want a full merge; instead, keep only the merge results for SNPs that are in your regression results.
4. Export the results to CSV or Excel, e.g. with write.csv().

If you prefer Python, you can use pandas. I'm sure there are other programming languages you could use.
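
For the pandas route, the merge steps could look roughly like the sketch below. It assumes both data frames already have chromosome, position, ref and alt columns; the column names CHR, BP, REF, ALT and rsID and the function names are placeholders, so adjust them to your files. Note that a PLINK 1.9 .assoc.linear file only reports A1, so the ref/alt columns for the regression results would have to come from elsewhere, for example from the variant ID or the .bim file.

```python
import pandas as pd


def add_generic_id(df, chrom="CHR", pos="BP", ref="REF", alt="ALT"):
    """Return a copy of df with a 'generic_ID' column such as '1:10000:A:C'.

    The default column names are placeholders, not a fixed PLINK layout.
    """
    out = df.copy()
    out["generic_ID"] = (
        out[chrom].astype(str) + ":" + out[pos].astype(str) + ":"
        + out[ref].astype(str) + ":" + out[alt].astype(str)
    )
    return out


def attach_rsids(results, rsid_table):
    """Left-merge so that every regression result row is kept and rows
    without a match get a missing rsID."""
    left = add_generic_id(results)
    right = add_generic_id(rsid_table)[["generic_ID", "rsID"]]
    # Keep one rsID per generic ID so the merge cannot duplicate result rows.
    right = right.drop_duplicates("generic_ID")
    return left.merge(right, on="generic_ID", how="left")
```

The merged data frame can then be exported with merged.to_csv(path, index=False). A left merge is used here because it matches step 3: it keeps only the SNPs that appear in the regression results.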






On Oct 19, 2023, at 7:27 AM, Say_t _ <kimse...@gmail.com> wrote:

Hello, again.

Sorry, I could not get this working by myself.
I ran it and got the result below.

I updated my bfile with plink2 --recover-var-ids,
but the assoc.linear file still does not show any rsIDs.
If you can help me out, please let me know how to do it.

Maybe my --recover-var-ids KARE2.bim strict-bim-order partial command isn't right?

I really appreciate all your time.

Best,
Sei Kim





PLINK v2.00a5.3 M1 (18 Oct 2023)               www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to KARE3.log.
Options in effect:
  --bfile KARE2
  --make-bed
  --out KARE3
  --recover-var-ids KARE2.bim strict-bim-order partial

Start time: Thu Oct 19 20:00:16 2023
24576 MiB RAM detected; reserving 12288 MiB for main workspace.
Using up to 8 compute threads.
5493 samples (0 females, 0 males, 5493 ambiguous; 5493 founders) loaded from
KARE2.fam.
5417091 variants loaded from KARE2.bim.
Note: No phenotype data present.
--recover-var-ids: 5417091 lines scanned.
--recover-var-ids: 5417091/5417091 IDs updated.
Writing KARE3.fam ... done.
Writing KARE3.bim ... done.
Writing KARE3.bed ... done.
End time: Thu Oct 19 20:00:23 2023

(base) sei@Seiui-MacBookAir ~ % Desktop/plink --bfile KARE3 --pheno  KARE_male.txt --pheno-name FLI --linear --ci 0.95 --covar KARE_male.txt --covar-name AGE,BMI --allow-no-sex  --hide-covar --snps-only just-acgt  --out FLI_KARE3_male
PLINK v1.90b7 64-bit (16 Jan 2023)             www.cog-genomics.org/plink/1.9/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to FLI_KARE3_male.log.
Options in effect:
  --allow-no-sex
  --bfile KARE3
  --ci 0.95
  --covar KARE_male.txt
  --covar-name AGE,BMI
  --hide-covar
  --linear
  --out FLI_KARE3_male
  --pheno KARE_male.txt
  --pheno-name FLI
  --snps-only just-acgt

Note: --hide-covar flag deprecated.  Use e.g. "--linear hide-covar".
24576 MB RAM detected; reserving 12288 MB for main workspace.
5417091 variants loaded from .bim file.
5493 people (0 males, 0 females, 5493 ambiguous) loaded from .fam.
Ambiguous sex IDs written to FLI_KARE3_male.nosex .
1748 phenotype values present after --pheno.
Using 1 thread (no multithreaded calculations invoked).
--covar: 2 out of 4 covariates loaded.
3745 people were not seen in the covariate file.
Before main variant filters, 5493 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is exactly 1.
5417091 variants and 5493 people pass filters and QC.
Phenotype data is quantitative.
Writing linear model association results to FLI_KARE3_male.assoc.linear ...
done.

