split-x command not removing pseudoautosomal regions?bug error??

259 views
Skip to first unread message

R Stephanie L

unread,
Mar 13, 2024, 10:30:30 AM3/13/24
to plink2-users
Dear all,

I am experiencing an issue with removing pseudoautosomal regions. I am trying to reduce the heterozygous haploid calls which occur in phenotypically male individuals. I realised that most of the SNPs in the .hh file are males and SNPs are on chromosome 23. I tried to remove the pseudoautosomal region using the command below but it is not working. even though a pseudoautosomal region is present according to the boundaries set in the documentation. Any help is greatly appreciated!

Options in effect:
  --bfile data
  --make-bed
  --out data_pseudo
  --split-x hg38

1031883 MB RAM detected; reserving 515941 MB for main workspace.
631417 variants loaded from .bim file.
825 people (398 males, 427 females) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 825 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Warning: 6994 het. haploid genotypes present (see
data_pseudo.hh ); many commands treat these as missing.
Total genotyping rate is 0.997418.
631417 variants and 825 people pass filters and QC.
Note: No phenotypes present.
Error: --split-x cannot be used when the dataset already contains an XY region.

The plink documentation states the following
  • 'b38'/'hg38': GRCh38/UCSC human genome 38, boundaries 2781479 and 155701383
These are some of the SNPs that are in the hh file which fall in the boundaries so I am not sure why it is saying that there is no XY region?


23 rs2306737 7.640728 2781927 G A
23 JHU_X.2720583 7.696064 2802543 A G
23 exm1625855 7.707273 2806719 A G
23 rs5939352 7.778135 2833119 A G
23 rs17330993 7.854872 2861708 A G
23 rs28935474 8.051249 2934870 A G
23 rs112504868 8.10601 2955272 C A
23 Variant15801 8.119789 2960405 D I
23 JHU_X.2879434 8.122443 2961394 G A
23 rs200082842.2 8.453199 3084620 A G
23 rs12687708 8.669928 3165364 C A
23 rs149323859 8.671158 3165822 G A
23 rs138910099 8.776537 3205082 G A
23 rs7060694 9.511957 3479069 A G
23 JHU_X.3458347 9.676329 3540307 A G
23 rs73440602 10.12976 3709235 G C
23 rs73625743 10.15232 3717643 C A
23 rs5961749 10.20902 3738764 G A
23 rs4595276 10.21871 3742374 A G
23 rs4826889 10.40125 3810384 G A
23 rs6655835 10.79183 3955898 A G
23 rs16979673 11.75091 4224219 A C
23 kgp22732185 9.667004 4371186 A G
23 rs142609224 20.79819 11177583 A G
23 rs139707133 21.25248 11425876 C A
23 JHU_X.11964126 21.92602 11946008 A C
23 kgp22812109 21.92783 11947458 A G
23 rs5979552 22.21473 12214226 C A
23 JHU_X.12272211 22.34571 12254093 G A
23 exm2268567 22.63091 12340899 A G
23 kgp22796335 23.15734 12501127 A G
23 rs149093089 23.23566 12524965 G A
23 rs148714605 23.53262 12615348 A G
23 rs7877276 23.83507 12707404 G A
23 rs139151624 23.87075 12718264 C A
23 rs143186909 24.43721 12890675 G A
23 rs7884668 24.45686 12896656 G A
23 exm1628707 24.93034 13040768 A G
23 rs17328294 25.52047 13220384 A G
23 rs143839244 25.53667 13225314 G A
23 rs151017313 25.54018 13226382 A G
23 rs6632972 26.27757 13497240 G A
23 rs104894948 26.86908 13716080 G A
23 seq-rs746032983 26.91784 13734121 A G
23 rs12556839 27.09703 13800416 G A
23 rs12115935 28.2 14063101 A G
23 rs16979405 28.2 14077427 G A
23 rs12009996 28.2 14087371 A G
23 rs147978069 28.2 14177602 A G
23 kgp22767708 28.2 14461178 G A
23

Christopher Chang

unread,
Mar 13, 2024, 12:38:49 PM3/13/24
to plink2-users
See the plink2 --merge-x documentation, and follow the steps described there.

The issue is that the pseudoautosomal regions are already split off in your input fileset... but using the plink 1.x method of assigning *both* pseudoautosomal regions the "XY" / "25" chromosome code.  The plink2 method of assigning "PAR1" and "PAR2" chromosome codes is cleaner because it doesn't force the variants to be resorted.

Stephanie L

unread,
Apr 7, 2024, 11:18:57 AM4/7/24
to plink2-users, chrch...@gmail.com
Many thanks for the suggestion. I did the step, which replaced the code 23 to X - Is there anything else that has happened, as the heterozygous haploid variants have increased dramatically ( from about 7000 to 87383) when I did this step and the variants for the X chromosome, as well? The variants in the previous versions were all on chromosome 23 so not sure how it's reading it differently other than renaming the code for all of them to X.
My understanding is that I had to recode the variants in this region so that PLINK would recognise it as a PAR region and expect males to have a lower homozygosity rate. Before performing the sex check with the recoded dataset, I set the haploid heterozygous calls to missing (is this appropriate?) and used this as input for the sex check. When I did this, the sex-phenotypic sex mismatches resolved (if I don't set the haploid heterozygous to missing, I have 410 mismatches). I just want to check, as I'm not sure I've understood completely what's happening under the scenes (i.e., the haploid heterozygous variants increasing, the number of X variants increasing); the sex check seems to have worked perfectly and the 3 issues that were found were slightly above the threshold for assigned gender (which I would class as correct). I would really appreciate your explanation on this. Thank you for your help.

I did the following:
PLINK v2.00a5LM 64-bit Intel (23 Sep 2023)

Options in effect:
  --bfile data
  --make-pgen
  --merge-x
  --out data_pseudo
  --sort-vars

Random number seed: 1710351740
1031883 MiB RAM detected, ~877038 available; reserving 515941 MiB for main
workspace.
Using up to 256 threads (change this with --threads).
825 samples (427 females, 398 males; 825 founders) loaded from
data.fam.
--merge-x: 867 chromosome codes changed.
631417 variants loaded from data.bim.
Note: No phenotype data present.
Writing data_pseudo.pvar ... done.
Writing data_pseudo.psam ... done.
Writing data_pseudo.pgen ... done.


plink --bfile data_pseudo --set-hh-missing --make-bed --out try2


825 people (398 males, 427 females) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 825 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Warning: 87383 het. haploid genotypes present (see try2.hh ); many commands

treat these as missing.
Total genotyping rate is 0.997418.
631417 variants and 825 people pass filters and QC.
Note: No phenotypes present.
--make-bed to try2.bed + try2.bim + try2.fam ... done.

Options in effect:
  --bfile try2
  --check-sex

Random number seed: 1710358767

1031883 MB RAM detected; reserving 515941 MB for main workspace.
631417 variants loaded from .bim file.
825 people (398 males, 427 females) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 825 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.99725.

631417 variants and 825 people pass filters and QC.
Note: No phenotypes present.
--check-sex: 2070 Xchr and 0 Ychr variant(s) scanned, 3 problems detected.
Report written to plink.sexcheck .

--
You received this message because you are subscribed to a topic in the Google Groups "plink2-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/plink2-users/5dd-NfcJcoc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to plink2-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/plink2-users/27a08b42-fdf5-42f1-a49b-dd7d902330a1n%40googlegroups.com.

Chris Chang

unread,
Apr 8, 2024, 2:20:18 AM4/8/24
to Stephanie L, plink2-users
1. Please reread the —merge-x documentation.
2. Also reread the plink 1.9 —check-sex documentation.
Reply all
Reply to author
Forward
0 new messages