error chromosome code

Jessica Perez Alquicira

Sep 21, 2015, 8:36:35 PM
to plink2-users
Hello, I have a VCF file that I converted to PLINK format using vcftools. I tried to perform an MDS analysis but got this error:

Error: Invalid chromosome code '27' on line 263 of .map file.
(This is disallowed for humans.  Check if the problem is with your data, or if
you forgot to define a different chromosome set with e.g. --chr-set.).


The first column ("chromosome") holds values from 0 to 23000, which are scaffold numbers; I don't have chromosome numbers. I tried --allow-extra-chr but I still get the same error.


This is the head of my file:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CP20:C6A89ANXX:7:250395325
0 114368 S1_114368 A T . PASS DP=2392 GT:AD:DP:GQ:PL 0/0:28,0:28:99:0,84,255
0 114369 S1_114369 A G . PASS DP=2387 GT:AD:DP:GQ:PL 0/0:31,1:32:99:0,60,255



Any suggestions?


Thanks!

Christopher Chang

Sep 22, 2015, 3:34:52 AM
to plink2-users
There are two complications here.

1. PLINK 1.9 supports scaffold IDs, but expects them to start with a letter rather than a digit, and requires the --allow-extra-chr flag (or --aec for short) to be used with them.  This helps avoid improper handling of e.g. chromosome codes 23 and 24 (which normally correspond to X and Y).
2. By default, the number of scaffolds is limited to ~5000.  However, I can provide a build which raises the limit to ~64000, which sounds like enough for your purposes; send me an email to request this.
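
To see how these two points apply to a given dataset, a quick look at the first column of the .map file can help. This is only a sketch: demo.map is a made-up stand-in created here so the commands run as-is, and you would point them at your own .map file instead.

```shell
# Made-up stand-in for a vcftools-generated .map file (chrom, variant ID, cM, bp).
printf '0\tS1_114368\t0\t114368\n0\tS1_114369\t0\t114369\n27\tS27_10\t0\t10\n' > demo.map

# Number of distinct codes in column 1, to compare against the scaffold limit.
n_codes=$(awk '{ print $1 }' demo.map | sort -u | wc -l | tr -d ' ')
echo "distinct chromosome/scaffold codes: $n_codes"

# Codes that start with a digit; scaffold IDs are expected to start with a letter.
numeric_codes=$(awk '$1 ~ /^[0-9]/ { print $1 }' demo.map | sort -u | tr '\n' ' ')
echo "codes starting with a digit: $numeric_codes"
```

If the second list contains scaffold numbers, those are the codes that need renaming before --allow-extra-chr will help.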

Leandro Neves

Oct 6, 2016, 11:56:37 AM
to plink2-users
Dear Christopher Chang,

It seems I am having the same problem. What e-mail address should I send the request to for the build which raises the limit to ~64000? I tried pl...@chgr.mgh.harvard.edu but the message bounced.

Thanks in advance.
Leandro

Christopher Chang

Oct 6, 2016, 12:41:23 PM
to plink2-users
The mainline plink builds have supported 64000 contigs since March.

Leandro Gomide Neves

Oct 17, 2016, 10:52:11 AM
to Christopher Chang, plink2-users
Thank you, Christopher. I tested again and it works now.

Regards
Leandro



Santiago Sánchez

Jan 25, 2017, 1:16:03 PM
to plink2-users
Hi all,

I also had this issue using PLINK v1.9 with only ~1500 scaffolds. But after switching back to v1.07, PLINK ran smoothly.

After running PLINK v1.9 with the --allow-extra-chr flag, my bed file was generated. However, it seemed to cause problems with downstream applications such as ADMIXTURE, where I was getting an "Abort trap 6" error. Files generated with v1.07 do not trigger this error.

Cheers,
Santiago

Christopher Chang

Jan 25, 2017, 1:20:05 PM
to plink2-users
To avoid problems with downstream applications which can't handle arbitrary contig names, you can use "--allow-extra-chr 0".

kin onn chan

Apr 4, 2017, 12:16:51 PM
to plink2-users
Hello everyone,

I'm trying to convert ped/map to bed and ran into the same error. I generated my ped/map files using vcftools. My original vcf looks like this:

##fileformat=VCFv4.1
##fileDate=20150624
##source=pyRAD.v.3.0.5
##reference=common_allele_at_each_locus
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM POS ID REF ALT QUAL FILTER INFO     FORMAT 10056 20978 20979 20980
1 91 . G A 20 PASS NS=49;DP=5 GT 0|0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
2 24 . G T 20 PASS NS=56;DP=5 GT 0|0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
2 35 . G T 20 PASS NS=56;DP=5 GT 0|0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
2 50 . A G 20 PASS NS=56;DP=5 GT 0|0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
2 66 . G A,T 20 PASS NS=56;DP=5 GT 0|0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
3 48 . T A 20 PASS NS=57;DP=5 GT 0|0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
3 70 . A G 20 PASS NS=57;DP=5 GT 0|0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
4 3 . C T 20 PASS NS=58;DP=5 GT 1|0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
4 27 . A C 20 PASS NS=58;DP=5 GT 0|0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
4 81 . T C 20 PASS NS=58;DP=5 GT 0|0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
5 16 . A G 20 PASS NS=58;DP=5 GT 0|0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
5 17 . G C 20 PASS NS=58;DP=5 GT 0|0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
5 90 . C T 20 PASS NS=58;DP=5 GT 0|0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
6 52 . A G 20 PASS NS=90;DP=5 GT 0|0 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1
6 72 . T C 20 PASS NS=90;DP=5 GT 0|0 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1
6 88 . T C 20 PASS NS=90;DP=5 GT 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
6 91 . A G 20 PASS NS=90;DP=5 GT 0|0 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1
7 6 . C T 20 PASS NS=90;DP=5 GT 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
...
...
..

After conversion with vcftools I get a .ped file that looks like this (the first two columns are sample IDs):
10056 10056 0 0 0 0 G G G G G G A A
20978 20978 0 0 0 0 0 0 0 0 0 0 0 0
20979 20979 0 0 0 0 0 0 0 0 0 0 0 0
20980 20980 0 0 0 0 0 0 0 0 0 0 0 0
20981 20981 0 0 0 0 0 0 0 0 0 0 0 0
20982 20982 0 0 0 0 0 0 0 0 0 0 0 0
20983 20983 0 0 0 0 0 0 0 0 0 0 0 0
20984 20984 0 0 0 0 0 0 0 0 0 0 0 0
20985 20985 0 0 0 0 0 0 0 0 0 0 0 0
20986 20986 0 0 0 0 0 0 0 0 0 0 0 0
20987 20987 0 0 0 0 0 0 0 0 0 0 0 0
20988 20988 0 0 0 0 G G G G G G A A
20989 20989 0 0 0 0 G G G G G G A A
20990 20990 0 0 0 0 0 0 G G G G A A
21008 21008 0 0 0 0 0 0 0 0 0 0 0 0
21009 21009 0 0 0 0 0 0 0 0 0 0 0 0
21010 21010 0 0 0 0 G G G G G G A A
21011 21011 0 0 0 0 0 0 G G G G A A
21012 21012 0 0 0 0 G G G G G G A A

map file:
1 1:91 0 91
2 2:24 0 24
2 2:35 0 35
2 2:50 0 50
3 3:48 0 48
3 3:70 0 70
4 4:3 0 3
4 4:27 0 27
4 4:81 0 81
5 5:16 0 16
5 5:17 0 17
5 5:90 0 90
6 6:52 0 52
6 6:72 0 72
6 6:88 0 88
6 6:91 0 91
7 7:6 0 6
7 7:18 0 18
7 7:33 0 33
7 7:38 0 38

I hit the same error when plink reaches a line in the map file with chromosome number 27. I am using the latest build (1.9) and tried the flag --aec as well as --aec 0, but I always get the same error. I'm wondering whether this is a problem with my ped/map files. The VCF file comes from anonymous SNP data, so there is no chromosome information; I assume the #CHROM column refers to the SNP loci. I ran plink using:

plink --file myfile --aec --make-bed --out myoutfile
plink --map myfile.map --ped myfile.ped --aec --out myoutfile

Both returned the same error:

Error: Invalid chromosome code '27' on line 100 of .map file.
(This is disallowed for humans.  Check if the problem is with your data, or if
you forgot to define a different chromosome set with e.g. --chr-set.).



Hope someone can help. Thank you in advance.

Chan

Christopher Chang

Apr 4, 2017, 11:32:28 PM
to plink2-users
It looks like you have contigs instead of real chromosomes.  plink can handle up to ~65000 of them, but you need to give them not-purely-numeric names like "contig1", "contig2", etc., because numeric codes are required to refer to actual chromosomes (and 23 is normally interpreted as chrX with all the special handling that entails, etc.).
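
For a .map file that has already been generated, one possible way to do the renaming is with awk. This is a minimal sketch, assuming a tab-delimited four-column .map file; the file names demo_contigs.map and demo_contigs_renamed.map are made up for illustration.

```shell
# Made-up example input in .map layout: chrom, variant ID, cM, bp.
printf '1\t1:91\t0\t91\n2\t2:24\t0\t24\n27\t27:5\t0\t5\n' > demo_contigs.map

# Prefix purely numeric codes in column 1 with "contig"; other columns are untouched.
awk 'BEGIN { FS = OFS = "\t" } $1 ~ /^[0-9]+$/ { $1 = "contig" $1 } { print }' \
    demo_contigs.map > demo_contigs_renamed.map

cat demo_contigs_renamed.map
```

The .ped file has no chromosome column, so only the .map file needs to change. The renamed file can then be passed to plink together with the original .ped file via --ped and --map, with --aec added.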

CHROMOSOME KEYS

Sep 25, 2018, 7:36:11 AM
to plink2-users
Hi  everyone,

I am trying to conduct QC on my data and I am getting the same error. Please assist.

Thank you in advance

Error: Invalid chromosome code '27' on line 114094 of .map file.

Christopher Chang

Sep 25, 2018, 11:23:45 AM
to plink2-users
If these are actual chromosomes, for a nonhuman species with more than 22 autosomes, use --chr-set to specify the number of autosomes.

If these are scaffolds for a draft genome, give the scaffolds names which start with a letter, e.g. "scaffold_27", and use the "--allow-extra-chr" flag to make plink accept them.
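
As a rough illustration of the two cases, the sketch below builds and prints one command line for each, rather than running plink. The file prefixes and the autosome count of 29 are assumptions chosen for the example, not values taken from anyone's data.

```shell
# Hypothetical input/output prefixes; replace with your own.
in_prefix=mydata
out_prefix=mydata_qc

# Assumed example value: a species with 29 autosomes. Use the count for your species.
n_autosomes=29

# Case 1: real chromosomes of a nonhuman species.
cmd_chromosomes="plink --file $in_prefix --chr-set $n_autosomes --make-bed --out $out_prefix"

# Case 2: draft-genome scaffolds already renamed to start with a letter.
cmd_scaffolds="plink --file $in_prefix --allow-extra-chr --make-bed --out $out_prefix"

# Print both for review; paste the one that applies into your shell to run it.
echo "$cmd_chromosomes"
echo "$cmd_scaffolds"
```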

Heather Hemmingmoore

Jan 28, 2019, 6:51:29 AM
to plink2-users
Hello all,

I am getting the same issue. I understand the cause - I am using scaffolds instead of chromosomes.
I understand that I need to add text at the beginning of each scaffold number, but I have just not been able to figure out how to do that in the .ped and .map files.

I did it in the original vcf file by adding the word "test" to the beginning of each number in the #CHROM column.

However, when I converted it to Plink, I got the following message for every scaffold:
"Unrecognized values used for CHROM: test25939 - Replacing with 0."

Christopher or others, I'm sorry for the newbie question, but can you tell me how to add text, e.g. "scaffold_27", to the .ped and .map files?
I've been googling and trying different things for hours.

Thanks so much!

Heather Hemmingmoore

Jan 28, 2019, 7:04:00 AM
to plink2-users
Hi All,

Following my question, I would like to add that I tried the --allow-extra-chr flag, but because I am using vcftools to convert to plink file formats, it isn't recognized.

Thanks again! 

Kind regards,
Heather

Mario Ernst

Aug 20, 2019, 10:53:36 AM
to plink2-users
Hi Christopher,

I am working with scaffolds and I managed to add scaff_ at the beginning of each ID. The reason why I want to use plink is that I want to do LD pruning by using --indep-pairwise. The .prune.in and .prune.out files should contain variant IDs, but in my case they only contain dots as my VCF has all variant IDs set to ‘.’ if I use the  "--allow-extra-chr" flag. How can I bypass this?


Thnx in advance!

Christopher Chang

Aug 20, 2019, 11:00:55 AM
to plink2-users
Use plink 2.0's --set-missing-var-ids or --set-all-var-ids flag to assign position-based variant IDs.
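
A sketch of what such a call might look like, with hypothetical file names; it builds and prints the command line rather than running plink2. In the ID template, '@' stands for the chromosome code and '#' for the base-pair position.

```shell
# Hypothetical file names; replace with your own.
in_vcf=input.vcf
out_prefix=with_ids

# '@' expands to the chromosome code and '#' to the base-pair position,
# so a variant at position 114368 on scaff_12 would be named scaff_12:114368.
id_template='@:#'

cmd="plink2 --vcf $in_vcf --allow-extra-chr --set-missing-var-ids $id_template --make-bed --out $out_prefix"

# Print for review; paste into your shell to run it.
echo "$cmd"
```

Note that if several variants share the same position, IDs built from this template will collide, so a template with more fields may be needed in that case.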

Han Xiao

Oct 14, 2019, 7:51:48 AM
to plink2-users
Dear all,

I am new to plink and currently using plink 1.9. I get the same error because my CHROM IDs are purely numeric and go higher than the human chromosome codes.

My simple question is: how do I change, for example, "25" to "scaffold_25" in my VCF file? I know this is very basic, but I would really appreciate it if someone could help me fix it as soon as possible.

Thank you very much!

Best regards,
Han Xiao




On Tuesday, September 22, 2015 at 12:36:35 AM UTC, Jessica Perez Alquicira wrote:

Christopher Chang

Oct 14, 2019, 11:48:24 AM
to plink2-users
Two of the most useful tools for this type of text manipulation are the "sed" and "awk" Unix utilities.

sed's "s/<regular expression>/<thing to replace it with>/g" command is frequently used to perform an update to the beginning and/or end of many lines in a file, which is what you need to do here.  More precisely, you want to add "scaffold_" in front of every line that starts with a digit (VCF header lines start with "#", so they are left alone):
  sed -E 's/^([0-9])/scaffold_\1/g' old.vcf > new.vcf
should get the job done.  In that command, "^" refers to the beginning of the line, and the parentheses and "\1" are used to copy the digit instead of replacing it (we want to replace "25" with "scaffold_25", not "scaffold_5").  https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html describes a bunch of other things you can do with sed regular expressions.

For slightly more complicated jobs that don't correspond to just replacing all copies of a simple regular expression, take a look at awk.
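
As one illustration of the awk route, here is a sketch that prefixes the CHROM column and also renames numeric IDs in any ##contig header lines, which a substitution anchored on a leading digit does not touch. The miniature VCF and the file names demo_old.vcf and demo_new.vcf are made up so the commands run as-is.

```shell
# Made-up miniature VCF so the commands below run as-is.
printf '##fileformat=VCFv4.1\n##contig=<ID=25,length=1000>\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n25\t91\t.\tG\tA\t20\tPASS\tNS=49\n' > demo_old.vcf

awk 'BEGIN { FS = OFS = "\t" }
     # Header: rename numeric IDs in ##contig lines so they match the new names.
     /^##contig=<ID=[0-9]/ { sub(/ID=/, "ID=scaffold_"); print; next }
     # Other header lines pass through unchanged.
     /^#/ { print; next }
     # Data lines: prefix CHROM (column 1) when it starts with a digit.
     $1 ~ /^[0-9]/ { $1 = "scaffold_" $1 }
     { print }' demo_old.vcf > demo_new.vcf

cat demo_new.vcf
```

Keeping the header and the data lines consistent matters mostly for other tools that check ##contig entries against the CHROM column.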

Tatiana Feuerborn

Jun 19, 2020, 6:54:52 AM
to plink2-users
Hi,

I have a dataset that has 82,000 contigs. As I understand it, the maximum possible is around 65,000. Is there a recommended way to use my dataset in plink?

Thanks for the help

Christopher Chang

Jun 19, 2020, 8:22:53 AM
to plink2-users
If you're on Linux, there's a high-contig plink 1.9 build posted at http://s3.amazonaws.com/plink1-assets/plink_linux_high_contig_20200616.zip .

Bárbara Arévalo

Jan 10, 2023, 10:58:17 PM
to plink2-users
Hello everyone,
I'm new to plink. I'm working with plink2, but I'm having problems converting from ped/map to VCF.
When I run the command
plink2 --pedmap CIAT --recode vcf --out CIAT_corverted
I get:
Error: Invalid chromosome code '29' on line 5666 of .map file.
I am working with Phaseolus vulgaris data.
How can I solve this error?

Thank you!

Christopher Chang

Jan 11, 2023, 11:37:56 PM
to plink2-users
See the --chr-set flag.