--update-name mixes chr:pos with rsID

Diandra Brkic

Feb 3, 2021, 1:43:10 PM

to plink2-users

Hi there,

I am a plink beginner. I was trying to recode chr:pos IDs (from my .vcf file) to SNP rsIDs, and I think there is something very odd with the --out file. The steps I followed were:

1. create a list of SNPs which have a MAF > 0.1% and Imputation R2 > 0.3 from .info file

2. download the ref file from https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/

3. deal with duplicates with plink2 --rm-dup force-first list --make-bed --out name.nodup

4. --exclude duplist

5. --update-name textfile.txt

Now the output of step 5 looks very weird, with V1 having rsIDs mixed with chr:pos (picture attached).

My question is: is this normal? And should I remove the chr:pos variants before merging with the ref data?

THANK YOU

Diandra

Screenshot 2021-02-03 at 19.41.58.png
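A quick way to see how widespread the mixed IDs are is to count the variants whose ID still does not look like an rsID after step 5. Variants that are not listed in the --update-name file keep their original ID, which is one plausible source of the mix. This is only a sketch: the function name `count_non_rsid` and the prefix `name.updated` are made up for illustration.

```shell
# Count variant IDs in a PLINK .bim file that do not look like rsIDs.
# Column 2 of a .bim file holds the variant ID.
count_non_rsid() {
  awk '$2 !~ /^rs[0-9]+$/ { n++ } END { print n + 0 }' "$1"
}

# Hypothetical output of the --update-name step; adjust to your own prefix.
bim="name.updated.bim"
if [ -f "$bim" ]; then
  echo "non-rsID variants: $(count_non_rsid "$bim")"
fi
```

A count well above zero suggests the update file simply does not cover those variants.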

Christopher Chang

Feb 3, 2021, 1:54:40 PM

to plink2-users

A more common approach is to use --set-all-var-ids to make *all* variants have position/allele based IDs, and then use --recover-var-ids whenever you want to return to rsIDs.

Diandra Brkic

Feb 3, 2021, 3:00:15 PM

to plink2-users

Ok cool, thank you. I know this is more common, but to be able to merge with the 1000 Genomes ref sample, and subsequently perform my --assoc analysis on a subset of SNPs that I am interested in, I need my data to be in SNP rsID format and not in chr:pos... if that makes sense?

Just to make sure I got this right: this means adding --set-all-var-ids in step 3, so it would be:

plink2 --rm-dup force-first list --set-all-var-ids --make-bed --out name.nodup ; right?

thanks again for your help

Christopher Chang

Feb 3, 2021, 3:16:39 PM

to plink2-users

* You need to specify a template string for --set-all-var-ids; please read the documentation. Also read the documentation for --recover-var-ids which I linked to in my previous response, since that lets you get the rsIDs back whenever you want.

* --assoc has been obsolete for more than a decade, since it doesn't let you use e.g. principal-component covariates to correct for large-scale population structure. You almost certainly want to use --glm (or --linear/--logistic in plink 1.9) instead.
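For the template string mentioned in the first point, a minimal sketch might look like the following. In the plink2 template syntax, '@' stands for the chromosome code, '#' for the base-pair position, and '$r' / '$a' for the reference and alternate alleles. The file prefixes are placeholders, and the exact template to use is a choice, not something prescribed in this thread.

```shell
# Hypothetical file prefixes; replace with your own.
in_prefix="name.nodup"
out_prefix="name.ids"

# '@' = chromosome, '#' = position, '$r' = REF allele, '$a' = ALT allele.
# Single quotes keep the shell from expanding $r and $a.
id_template='@:#:$r:$a'

# Only run when plink2 is on PATH and the input fileset exists.
if command -v plink2 >/dev/null 2>&1 && [ -f "${in_prefix}.bed" ]; then
  plink2 --bfile "$in_prefix" \
         --set-all-var-ids "$id_template" \
         --make-bed --out "$out_prefix"
else
  echo "skipping: plink2 or ${in_prefix}.bed not found"
fi
```

With this template a variant on chromosome 1 at position 12345 with alleles A/G would get an ID of the form 1:12345:A:G.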

Diandra Brkic

Feb 3, 2021, 3:40:11 PM

to plink2-users

As you can tell, I am a total beginner; I missed the string part. Thank you so much :)

* I am not sure I get the back-and-forth part (--set-all-var-ids / --recover-var-ids). All I would like to do is recode the chr:pos IDs into their respective rsIDs, even if that means taking the 'stringent' route and getting rid of duplicates. I know this is not ideal, but since my ultimate goal is to just focus on a subset of specific SNPs I thought this was the simpler way? Also, merging with the ref population, which has only rsIDs, would be easier. Or am I completely off the road here?

* I am aware of the --assoc limitations, but this first step was for me to try to understand the QC and how to add continuous phenotypes to the data. I am hoping to get to the --logistic part, but it will take some time.

Christopher Chang

Feb 3, 2021, 5:07:42 PM

to plink2-users

Ok, so the main premise of this discussion is that you don't have rsIDs for some of your variants, and your --update-name file doesn't help. Given that, you're probably best off using chr:pos:alleles for everything and not looking back. Ignore my comment about --recover-var-ids, and instead convert your 1000 Genomes dataset using the same --set-all-var-ids setting before merging. This is substantially easier than trying to replicate the chr:pos:alleles -> rsID mapping used in the 1000 Genomes dataset, because that can vary based on dbSNP build and filtering criteria.
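A rough sketch of that suggestion: apply one and the same template to both filesets, then check how many IDs they share before attempting a merge. The prefixes `study` and `kgp_ref` and the helper `count_shared_ids` are invented for illustration. The IDs will only line up if both datasets agree on which allele is REF and which is ALT, which this sketch assumes rather than checks.

```shell
# Same template for both datasets, single-quoted so $r and $a reach plink2.
id_template='@:#:$r:$a'

# Hypothetical prefixes: study data and the 1000 Genomes reference.
for prefix in study kgp_ref; do
  if command -v plink2 >/dev/null 2>&1 && [ -f "${prefix}.bed" ]; then
    plink2 --bfile "$prefix" \
           --set-all-var-ids "$id_template" \
           --make-bed --out "${prefix}.ids"
  else
    echo "skipping ${prefix}: plink2 or ${prefix}.bed not found"
  fi
done

# Count variant IDs (column 2) present in both .bim files.
count_shared_ids() {
  awk 'NR == FNR { seen[$2] = 1; next }
       ($2 in seen) { n++ }
       END { print n + 0 }' "$1" "$2"
}

if [ -f study.ids.bim ] && [ -f kgp_ref.ids.bim ]; then
  echo "shared IDs: $(count_shared_ids study.ids.bim kgp_ref.ids.bim)"
fi
```

A very low shared count usually points to a genome-build or allele-order mismatch rather than truly disjoint variant sets.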

Meanwhile, --linear/--logistic is not any harder to get started with than --assoc.
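To illustrate how little setup that takes, here is a hedged sketch of a plink2 --glm run with a covariate file (for example, principal components). All file names are placeholders. The `hide-covar` modifier just drops the covariate rows from the output, and plink2 chooses linear or logistic regression based on whether the phenotype is quantitative or binary.

```shell
# Hypothetical inputs: genotype prefix, phenotype file, covariate file.
geno="study.ids"
pheno="pheno.txt"
covar="covar.txt"

if command -v plink2 >/dev/null 2>&1 \
   && [ -f "${geno}.bed" ] && [ -f "$pheno" ] && [ -f "$covar" ]; then
  plink2 --bfile "$geno" \
         --pheno "$pheno" \
         --covar "$covar" \
         --glm hide-covar \
         --out assoc
else
  echo "skipping: plink2 or input files not found"
fi
```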

Diandra Brkic

Feb 3, 2021, 6:31:42 PM

to plink2-users

Perfect, that sounds reasonable. Thank you so much, you just spared me a massive headache.

Will definitely try --linear/--logistic, then.
