Invalid chromosome code

Courtney Gardiner

Feb 13, 2024, 9:35:09 AM
to plink2-users
Hi there

I am trying to convert a very large .vcf file to a .bed file using PLINK (I have tried both 1.90 and 2.0) so that I can use it in pcadapt later. This is the command I have been using:

plink --vcf ./BCFTools_MP/MP_filtered.recode.vcf --allow-extra-chr --double-id --make-bed --out ./BCFTools_MP/MP_filtered.recode

I receive the following error: 
Error: Invalid chromosome code '40' on line 88802 of .vcf file.
(This is disallowed for humans. Check if the problem is with your data, or if you forgot to define a different chromosome set with e.g. --chr-set.)

I thought that including "--allow-extra-chr" would get around this problem. I am working on a fish species, and the reference genome I'm using is highly fragmented and only at scaffold level. I have since gone back and added "chr" to each chromosome code, but it still doesn't work. It seems like a fairly simple issue, but I can't seem to resolve it.

Chris Chang

Feb 13, 2024, 9:49:25 AM
to Courtney Gardiner, plink2-users
--allow-extra-chr allows generic alphabetic chromosome codes (and is no longer necessary in recent plink2 development builds). --chr-set (https://www.cog-genomics.org/plink/2.0/input#chr_set) changes the interpretation of numeric chromosome codes; by default, 23 is chrX, and you don't want that.

Chris Chang

Feb 13, 2024, 9:51:57 AM
to Courtney Gardiner, plink2-users
Forgot to mention: "chr" prefixes are special (they're treated like no prefix at all), so you need to add a different prefix, such as "contig", when dealing with scaffolds.

Courtney Gardiner

Jun 3, 2024, 5:03:48 AM
to plink2-users
Hi, thank you so much for the reply, and apologies for the delay on my end.

As mentioned, I have a very fragmented, scaffold-level genome with 532,902 scaffolds, and my VCF file is extremely large (2.9 GB). I don't have any information on the number of chromosomes in my study species that I could specify with --chr-set. Do I understand correctly that the best workaround is simply to add "contig" as a prefix? Or are there other workarounds I should implement? What do you think the best course of action would be?

I am now using PLINK 2.0 for this conversion but am still running into problems. 

Christopher Chang

Jun 3, 2024, 1:06:09 PM
to plink2-users
Yes, you should add "contig" as a prefix, and then use a PLINK 2.0 build supporting a large number of contigs (see https://groups.google.com/g/plink2-users/c/pWGwFrE_ex0/m/dLpFBVQLAQAJ for a link to a compiled March 2024 build).
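A minimal sketch of the renaming step, assuming a POSIX shell with awk and an uncompressed (or decompressed-on-the-fly) VCF. The function name add_contig_prefix and the file names are hypothetical and not part of PLINK. It prepends "contig" to the CHROM column of every data line and to the ID of each ##contig header line, so a purely numeric scaffold name such as 40 becomes contig40. A name that already had "chr" added becomes contigchr41, which is also non-numeric. Prefixing unconditionally, rather than stripping "chr" first, avoids accidentally merging two distinct scaffold names.

```shell
#!/bin/sh
# Hypothetical helper (not part of PLINK): prefix every contig name in a VCF
# with "contig" so numeric scaffold IDs such as "40" are no longer parsed as
# chromosome numbers. Reads VCF text on stdin and writes it to stdout.
add_contig_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }
    # Header lines declaring contigs: rename the ID value.
    /^##contig=<ID=/ { sub(/^##contig=<ID=/, "##contig=<ID=contig"); print; next }
    # All other header lines (## meta lines and the #CHROM line) pass through.
    /^#/ { print; next }
    # Data lines: prepend "contig" to the CHROM column (field 1).
    { $1 = "contig" $1; print }'
}

# Example usage (file names are placeholders):
#   add_contig_prefix < MP_filtered.recode.vcf > MP_filtered.contig.vcf
# For a gzipped VCF:
#   gzip -dc MP_filtered.recode.vcf.gz | add_contig_prefix > MP_filtered.contig.vcf
```

The renamed file could then be converted with a plink2 build that supports a large number of contigs, e.g. `plink2 --vcf MP_filtered.contig.vcf --allow-extra-chr --double-id --make-bed --out MP_filtered.contig`.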