Too many distinct nonstandard chromosome/contig names

Rohan Flint

Jul 26, 2021, 8:37:35 AM
to plink2-users
Hi,
I am trying to convert my VCF file, which contains only SNP variants, to PLINK format using the command:
plink --vcf my_file.vcf --allow-extra-chr --out my_file
The reference genome I am using contains 77390 contigs, and when I try to run this command, it returns the error 'Error: Too many distinct nonstandard chromosome/contig names.' 
I have tried using the '0' modifier for the --allow-extra-chr option but it still returns the same error.
Examples of the contig names are 'ScfyRBE_1.HRSCAF.2' for contig 1 and 'ScfyRBE_2.HRSCAF.32' for contig 2.
I have tried this on both PLINK v2.00a3 64-bit (17 Feb 2020) and PLINK v1.90b6.16 64-bit (17 Feb 2020) but both versions return this error.
I was wondering if there were any workarounds for this, as I am trying to perform other plink analyses on this data.
Thanks.

Christopher Chang

Jul 26, 2021, 12:24:01 PM
to plink2-users
I'd recommend the following:
1. If your variants don't have distinct IDs, you probably want to use a short shell script to give them unique position-based IDs (which include contig names).
2. Once you've done that, it should be OK to use a shell one-liner to zero out the chromosome column of the VCF.
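
A minimal sketch of both steps, which was not part of the original reply. It assumes an uncompressed, tab-delimited VCF and that variants without an ID carry "." in the ID column. The function name vcf_zero_chrom, the contig:position ID format, and the output file name are illustrative choices. Header lines, including any ##contig lines, are passed through unchanged. If several variants can share one contig and position, a longer ID (for example one that also includes REF and ALT) would be needed to keep IDs unique.

```shell
# Hypothetical helper: reads a VCF on stdin, writes the rewritten VCF on stdout.
# Assumes an uncompressed, tab-delimited VCF.
vcf_zero_chrom() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }             # pass header lines through unchanged
       {
         if ($3 == ".") $3 = $1 ":" $2  # missing ID becomes contig:position
         $1 = "0"                       # zero out the chromosome column
         print
       }'
}

# Usage (input file name follows the original post; output name is made up):
#   vcf_zero_chrom < my_file.vcf > my_file.chr0.vcf
#   plink --vcf my_file.chr0.vcf --out my_file
```

The ID is assigned before the chromosome column is overwritten, so the original contig name survives in the variant ID and can be recovered later.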

Christopher Chang

Jun 3, 2022, 12:00:42 PM
to plink2-users
For anyone who runs into this problem in the future: if you clone the source code from GitHub, uncomment "#define HIGH_CONTIG_BUILD" at the top of plink2_common.h, and then recompile, you'll get a plink2 build that should support up to about a million contigs. It's also straightforward for me to raise this limit for anyone who needs it.
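
A hedged sketch of the uncommenting step only. It assumes the define is commented out with a leading "//", which the thread does not show, so check the header before relying on it. The function name enable_high_contig_build is hypothetical, and the clone and recompile steps are left to the repository's own build instructions.

```shell
# Hypothetical helper: uncomment the HIGH_CONTIG_BUILD define in a header file.
# Assumes the line is commented out with a leading "//".
enable_high_contig_build() {
  header=$1
  tmp=$(mktemp) || return 1
  # Strip the leading "//" from the define; all other lines are left as-is.
  sed 's|^[[:space:]]*//[[:space:]]*\(#define HIGH_CONTIG_BUILD\)|\1|' "$header" > "$tmp" &&
    cat "$tmp" > "$header"              # write back in place, keeping file permissions
  status=$?
  rm -f "$tmp"
  return $status
}

# Usage, run from the directory that holds the header in your clone:
#   enable_high_contig_build plink2_common.h
#   grep -n 'HIGH_CONTIG_BUILD' plink2_common.h   # confirm the change
# then recompile plink2.
```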
