treat chrX as autosome


Samuel Hollinger

May 1, 2024, 9:33:35 AM
to plink2-users

I'd like to run PLINK2 on a vcf file to do linkage pruning, followed by a PCA. The input vcf file is from a non-human organism, and I have already excluded the known sex chromosome from it. However, when I run PLINK2, I always get an error because it recognizes chrX as the sex chromosome. This happens even when I specify "--allow-extra-chr". How can I force PLINK2 to treat chrX as a normal autosome (which it is in my species)? PLINK2 makes a few suggestions for resolving the error (see below the error message), but none of them looks like a straightforward way to simply stop PLINK2 from treating chrX as the sex chromosome. Notably, when I run PLINK1.9 instead, I don't get this error.

Here is the command I run for linkage pruning:

plink2 --vcf ${FILE}.vcf --allow-extra-chr --set-all-var-ids @:# --indep-pairwise 50 10 0.3 --make-bed --out ${FILE}_pruningFact.0.3

Here is the error I'm getting:

Error: chrX is present in the input file, but no sex information was provided; rerun this import with --psam or --update-sex. --split-par may also be appropriate.

Christopher Chang

May 1, 2024, 9:45:54 AM
to plink2-users
You need to use --chr-set, not --allow-extra-chr, to specify a nonhuman chromosome set.

Your command line was just as wrong for PLINK 1.9 as it was for PLINK 2.0, but PLINK 2.0 has better sanity-checking.

Samuel Hollinger

Jun 2, 2024, 3:57:04 PM
to plink2-users
Thanks for your answer! Yet I'm still confused. I don't understand why "--allow-extra-chr" doesn't work, as it should just ignore the chromosome ID, no? At least in plink1.9 it seems to work fine, but it no longer does in plink2.0.

Also, I'm not sure how to use the "--chr-set" flag correctly. The description is rather cryptic to me. How do I use this flag if I want to specify that my data is a WGS SNP data set from a non-human species with 20 autosomes? And what would I need to do if I additionally wanted to run plink2 on all those autosomes plus the two known sex chromosomes in my species (the sex chromosomes are named "chrY" and "chrXIX" in my species)?

Thanks again for your help!

Christopher Chang

Jun 3, 2024, 1:02:58 PM
to plink2-users

1. "If none of the additional codes start with a digit, you can permit them with the --allow-extra-chr flag."

Emphasis added.  --allow-extra-chr does not change plink's default human-centric interpretation of numeric chromosome codes, where 23 is interpreted as chrX, 24 as chrY, 25 as pseudoautosomal, and 26 as chrM, unless you specify --chr-set.

(--allow-extra-chr is still useful for permitting "chrXIX" when it would otherwise cause plink to error out.  In this case, "chrXIX" will be treated much like an autosome.)

2. Please spell out EXACTLY what you find "cryptic" about the --chr-set documentation, after carefully reading https://www.cog-genomics.org/plink/2.0/general_usage#flag_usage if you haven't previously done so.

Samuel Hollinger

Jun 11, 2024, 10:51:27 AM
to plink2-users
Maybe to provide a bit of background: I know of many studies that used plink1.9 or plink2 to do an autosome-wide PCA in my target species, working around plink being tailored to the human genome by specifying --allow-extra-chr. I have also done this in the past and never got an error with plink1.9, but now I get an error saying that chrX is being interpreted as the sex chromosome.

According to the definition you provided for the --allow-extra-chr flag, this flag should indeed do the trick for me. To show this directly, I retrieved all chromosome names from the vcf-file I want to run my PCA on using grep -v '^#' MY.FILE.vcf | awk '{print $1}' | sort | uniq. This is what I get:
chrI
chrII
chrIII
chrIV
chrIX
chrV
chrVI
chrVII
chrVIII
chrX
chrXI
chrXII
chrXIII
chrXIV
chrXV
chrXVI
chrXVII
chrXVIII
chrXX
chrXXI

So, all these chromosomes are autosomes in my species (and should also be treated this way in my PCA). Furthermore, there are only 20 autosomes in total and none of them starts with a digit. Hence, I don't see why plink2.0 throws me an error when I run it on this file. Here again, just in case, the exact command I used for plink2.0:

plink2 --vcf ${FILE}.vcf --allow-extra-chr --set-all-var-ids @:# --indep-pairwise 50 10 0.3 --make-bed --out ${FILE}_pruningFact.0.3

And here again the error I'm getting:

Error: chrX is present in the input file, but no sex information was provided; rerun this import with --psam or --update-sex. --split-par may also be appropriate.

Chris Chang

Jun 11, 2024, 12:19:47 PM
to Samuel Hollinger, plink2-users
“chrX”, which you’re intending to mean “ten”, is also being interpreted by plink 1.9 as the X chromosome.  Again, your command was equally wrong there.  You need to switch to a naming scheme that doesn’t use “chrX”.
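
For anyone checking their own files, here is a minimal sketch (not part of plink) that flags contig names colliding with the special chromosome codes. The function name flag_reserved_contigs is made up for illustration, and the list of reserved names (X, Y, XY, MT, M, with an optional "chr" prefix, case-insensitive) is an assumption based on the codes discussed in this thread; the plink documentation is the authority on the exact set.

```shell
# Hypothetical helper: read one contig name per line on stdin and print the
# names that, after stripping an optional "chr" prefix, match a reserved
# non-autosome code. The reserved list below is an assumption.
flag_reserved_contigs() {
  awk '{
    name = toupper($1)
    sub(/^CHR/, "", name)
    if (name == "X" || name == "Y" || name == "XY" || name == "MT" || name == "M")
      print $1
  }'
}

# A few names from the listing earlier in this thread:
printf '%s\n' chrI chrIX chrX chrXI chrXX | flag_reserved_contigs
# prints: chrX
```

On a real file, the contig list from the grep/awk/sort/uniq pipeline shown above could be piped into the function instead of the printf line.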

--
You received this message because you are subscribed to the Google Groups "plink2-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to plink2-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/plink2-users/a2e85442-066f-45d3-a718-320dcd57fe92n%40googlegroups.com.

marius...@unibe.ch

Jun 11, 2024, 12:53:40 PM
to chrch...@gmail.com, plink2...@googlegroups.com
Ok! To be honest, I'm still somewhat surprised that plink1.9 does not throw an error with this same data set and command, while plink2.0 does.
Do you have a prediction for how the PCA result would be biased if chrX was erroneously interpreted as the sex chromosome? I think others also assumed that specifying --allow-extra-chr would make plink interpret chrX as not being the sex chromosome. See here, for example: https://speciationgenomics.github.io/pca/

So, if I understand the --chr-set flag correctly, I would need to run plink2.0 like this on my vcf file containing variants from 20 diploid autosomes only (i.e., no sex chromosomes). Is this correct?

plink2 --vcf ${FILE}.vcf --allow-extra-chr --set-all-var-ids @:# --chr-set 20 --indep-pairwise 50 10 0.3 --make-bed --out ${FILE}_pruningFact.0.3

If I now wanted to run the same command on a vcf file that includes the known sex (X) chromosome (which is chrXIX in my species), how would I do that? I guess the following would not work because plink assumes that the X chromosome is also called X in my species, right?

plink2 --vcf ${FILE}.vcf --allow-extra-chr --set-all-var-ids @:# --chr-set 20+1 --indep-pairwise 50 10 0.3 --make-bed --out ${FILE}_pruningFact.0.3

Thanks a lot for all your help!



Chris Chang

Jun 11, 2024, 1:14:36 PM
to marius...@unibe.ch, plink2...@googlegroups.com
- As mentioned earlier, plink 2.0 includes some additional sanity checks that address common plink 1.x usage mistakes, one of which is working with chrX without providing sex information.
- Fortunately, losing one random chromosome out of 20 is not likely to be a big deal for PCA.
- https://speciationgenomics.github.io/pca/ talks about --allow-extra-chr enabling chromosomes outside of human 1-22 + X.  It does not say anything about it making plink *not* interpret chrX as a sex chromosome.
- If the chromosomes are simply numbered 1, 2, ..., 20 or chr1, chr2, ..., chr20, and you want to treat every chromosome as an autosome, your first command-line is fine.
- If you want to treat 19 as X and 20 as Y, "--chr-set 18" is actually what you want.  (Again assuming that you have relabeled your chromosomes "1, 2, ..., 20" or "chr1, chr2, ..., chr20".)  From the --chr-set documentation: "Note that, when there are n autosome pairs, the X chromosome is assigned numeric code n+1, Y is n+2, ..."
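
To make the numbering rule concrete, here is a small sketch. The helper name chr_set_codes is hypothetical; it only does the arithmetic and does not call plink. The XY = n+3 and MT = n+4 entries are extrapolated from the human defaults mentioned earlier in the thread (23 = X, 24 = Y, 25 = pseudoautosomal, 26 = chrM for n = 22), so treat them as an assumption for other values of n.

```shell
# Hypothetical helper: list the numeric codes implied by "--chr-set n",
# using X = n+1 and Y = n+2 as quoted from the documentation, and assuming
# XY = n+3 and MT = n+4 by analogy with the human defaults.
chr_set_codes() {
  n=$1
  echo "autosomes: 1-$n"
  echo "X: $((n + 1))"
  echo "Y: $((n + 2))"
  echo "XY: $((n + 3))"
  echo "MT: $((n + 4))"
}

chr_set_codes 18
# prints:
# autosomes: 1-18
# X: 19
# Y: 20
# XY: 21
# MT: 22
```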

marius...@unibe.ch

Jun 11, 2024, 1:26:17 PM
to chrch...@gmail.com, plink2...@googlegroups.com
Ok, thanks. So there is no way I can run plink on the chromosomes as they are currently named? This is a bit annoying, as this is how the chromosomes are designated by default in my species, which makes files and results directly comparable and usable (and sometimes even mergeable) across studies. Also, I would need to figure out how to safely change the chromosome names in my vcf file, since it seems to be common practice not to mess around with vcf files...

Btw, you are right concerning this point: "https://speciationgenomics.github.io/pca/ talks about --allow-extra-chr enabling chromosomes outside of human 1-22 + X.  It does not say anything about it making plink *not* interpret chrX as a sex chromosome." I checked, and the file that they are using designates chromosomes as chr1, chr2, chr3, etc.

Chris Chang

Jun 11, 2024, 1:54:04 PM
to marius...@unibe.ch, plink2...@googlegroups.com
Correct, plink does not provide a direct way to disable treatment of "chrX" as chromosome X.

"bcftools annotate --update-chrs" provides one way to switch back and forth between plink-compatible naming and the usual convention for your species.

Chris Chang

Jun 11, 2024, 1:54:32 PM
to marius...@unibe.ch, plink2...@googlegroups.com
oops, that should be --rename-chrs, not --update-chrs.
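
A rough sketch of how a rename map for Roman-numeral names could be generated. The helper names roman_to_int and make_rename_map and the file names in the final comment are hypothetical; the only bcftools behaviour assumed is that --rename-chrs takes a file of whitespace-separated "old_name new_name" pairs, one per line.

```shell
# Hypothetical helper: convert a Roman numeral built from I, V, X, L to an integer.
roman_to_int() {
  echo "$1" | awk '
    BEGIN { v["I"] = 1; v["V"] = 5; v["X"] = 10; v["L"] = 50 }
    {
      total = 0
      n = length($0)
      for (i = 1; i <= n; i++) {
        cur = v[substr($0, i, 1)]
        nxt = (i < n) ? v[substr($0, i + 1, 1)] : 0
        # A smaller value before a larger one is subtracted (IV = 4, IX = 9).
        total += (cur < nxt) ? -cur : cur
      }
      print total
    }'
}

# Hypothetical helper: read names like "chrXIX" on stdin and print
# "old_name new_name" pairs such as "chrXIX chr19".
make_rename_map() {
  while read -r old; do
    numeral=${old#chr}
    echo "$old chr$(roman_to_int "$numeral")"
  done
}

printf '%s\n' chrI chrIX chrX chrXIX chrXXI | make_rename_map
# prints:
# chrI chr1
# chrIX chr9
# chrX chr10
# chrXIX chr19
# chrXXI chr21

# With the map saved to a file (file names are placeholders):
#   bcftools annotate --rename-chrs chr_map.txt input.vcf -o renamed.vcf
```

Swapping the two columns of the map gives the reverse mapping for switching back to the species' usual names. Plain numeric names are still subject to the --chr-set numbering rule, so the value of n has to be chosen with the renamed contigs in mind.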

marius...@unibe.ch

Jun 11, 2024, 1:56:36 PM
to chrch...@gmail.com, plink2...@googlegroups.com
Thanks, Chris. I'll give it a try!

marius...@unibe.ch

Jun 25, 2024, 4:54:17 AM
to chrch...@gmail.com, plink2...@googlegroups.com
Hi Chris,

Sorry, this took a while. I tried what you suggested and unfortunately, it still doesn't work. 

To explain what I did:

1) I renamed the chromosomes in my vcf file using bcftools annotate --rename-chrs. I then checked the chromosome names in the renamed vcf file:

grep -v '^#' ${PATH}/testFile.vcf | awk '{print $1}' | sort | uniq

chr01
chr02
chr03
chr04
chr05
chr06
chr07
chr08
chr09
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr20
chr21

It seems the renaming worked correctly! The vcf file contains data from 20 diploid (renamed) autosomes (note that chr19 is not included because it's the known sex chromosome in my species). 

2) I then ran plink2 like this:

plink2 \
--vcf ${PATH}/testFile.vcf \
--allow-extra-chr \
--double-id \
--set-missing-var-ids @:# \
--chr-set 20 \
--pca \
--make-bed \
--out ${PATH}/testFile

But, yet again, I get an error:

Start time: Tue Jun 25 10:44:45 2024
128235 MiB RAM detected, ~113996 available; reserving 64117 MiB for main
workspace.
Using up to 128 threads (change this with --threads).
--vcf: 2706651 variants scanned.
Error: chrX is present in the input file, but no sex information was provided;
rerun this import with --psam or --update-sex.  --split-par may also be
appropriate.
End time: Tue Jun 25 10:44:50 2024

I don't understand where plink2 detects chrX, since there is clearly no longer any chromosome with that name.

Sorry this appears to be such a hassle and thanks for your help!



Chris Chang

Jun 25, 2024, 8:44:47 AM
to marius...@unibe.ch, plink2...@googlegroups.com
With “--chr-set 20”, chr21 is treated as X.

marius...@unibe.ch

Jun 25, 2024, 8:49:46 AM
to chrch...@gmail.com, plink2...@googlegroups.com
So what do I need to set for "--chr-set", then? I don't understand why "chr21" is treated as "X", since neither the name nor the number matches.

Do I need to specify "--chr-set 21", although there are only 20 autosomes? Or do I need to rename the chromosomes such that the original name "chrXX" is changed to "chr19" and "chrXXI" to "chr20" (which becomes very confusing, though)?

Christopher Chang

Jun 26, 2024, 2:54:04 AM
to plink2-users

marius...@unibe.ch

Jun 26, 2024, 5:23:26 AM
to chrch...@gmail.com, plink2...@googlegroups.com
I don't know what I have to re-read. I will now just specify "--chr-set 21", although I only have 20 diploid autosomes and none of them is called anything like "X".

Thanks.

Chris Chang

Jun 26, 2024, 9:50:18 AM
to marius...@unibe.ch, plink2...@googlegroups.com
The last sentence of the linked comment says "From the --chr-set documentation: 'Note that, when there are n autosome pairs, the X chromosome is assigned numeric code n+1, Y is n+2, ...'"
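
Applied to the renamed file from this thread, the rule can be sketched as below. The helper name classify_contigs is hypothetical and the script only mimics the numbering rule, it does not call plink. The XY = n+3 and MT = n+4 branches are an assumption by analogy with the human defaults (25 and 26 for n = 22).

```shell
# Hypothetical helper: for numerically named contigs on stdin, report how the
# numbering rule for "--chr-set n" would classify each one.
classify_contigs() {
  n=$1
  awk -v n="$n" '{
    code = $1
    sub(/^chr/, "", code)
    code += 0                      # numeric value, so "01" becomes 1
    if (code >= 1 && code <= n)  label = "autosome"
    else if (code == n + 1)      label = "X"
    else if (code == n + 2)      label = "Y"
    else if (code == n + 3)      label = "XY"
    else if (code == n + 4)      label = "MT"
    else                         label = "other"
    print $1, label
  }'
}

printf '%s\n' chr18 chr20 chr21 | classify_contigs 20
# prints:
# chr18 autosome
# chr20 autosome
# chr21 X

printf '%s\n' chr18 chr20 chr21 | classify_contigs 21
# prints:
# chr18 autosome
# chr20 autosome
# chr21 autosome
```

So the relevant quantity is the highest numeric code that should count as an autosome, not the number of chromosomes actually present in the file.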