Error: Variant names are limited to 16000 characters.

Abel Chang

Feb 24, 2016, 3:44:46 AM
to plink2-users
Hi guys,

I tried to convert a VCF file to bed/bim/fam format with plink2 (PLINK v1.90b3.31 64-bit, 3 Feb 2016).
The parameters are listed below:

plink \
    --vcf test.vcf \
    --keep-allele-order \
    --vcf-idspace-to _ \
    --allow-extra-chr 0 \
    --split-x b38 no-fail \
    --make-bed \
    --noweb \
    --out result

Unfortunately, plink2 stopped and reported this error:
Error: Variant names are limited to 16000 characters.

I guess some variants in my VCF file have very long names, because each variant name is composed of its POS and REF/ALT alleles, which may be too long.

Does anyone know how to solve this problem? I'd like a simple solution, such as a special plink parameter.

Christopher Chang

Feb 24, 2016, 10:18:59 AM
to plink2-users
Well, you'll need to limit the length of the variant IDs; this should be done before trying to convert the file to plink format, since plink does not yet have proper REF/ALT-based variant naming.

A simple approach is to take only the first ~20 characters of each allele code (plink's --set-missing-var-ids flag normally uses the first 23).

Abel Chang

Mar 2, 2016, 3:48:04 AM
to plink2-users
Actually, the long IDs are synthesized from REF and ALT to avoid duplicate IDs. If I limit the length of the variant IDs, plink reports this: Error: Duplicate ID '.'. In other words, it seems that plink is in a dilemma.
On Wednesday, February 24, 2016 at 11:18:59 PM UTC+8, Christopher Chang wrote:

Christopher Chang

Mar 2, 2016, 3:56:54 AM
to plink2-users
You only get "Duplicate ID '.'" if you don't change the variant IDs at all.  If you merely limit their lengths (e.g. take the first 23 letters of the REF and ALT) and make sure to include chromosome and position in your IDs as well, you will almost certainly avoid duplication.

Abel Chang

Mar 2, 2016, 4:06:47 AM
to plink2-users
But what if variants have exactly the same chromosome and position because they were split from one multi-allelic variant?

On Wednesday, March 2, 2016 at 4:56:54 PM UTC+8, Christopher Chang wrote:

Christopher Chang

Mar 2, 2016, 4:12:53 AM
to plink2-users
The ID should have chromosome, position, first-23-characters-of-REF, AND first-23-characters-of-ALT.

In the few situations where even that's not good enough, you may have to make up some arbitrary IDs. That's far better than permitting completely unreadable and untypeable IDs of more than 16000 characters.
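
As an illustration of the naming scheme described above, here is a minimal Python sketch that rewrites the ID column of a VCF before it is passed to plink. The function names, the ":" separator, and the assumption of a tab-delimited, uncompressed VCF with at least five columns are all choices made for this example; none of it is part of plink.

```python
def short_variant_id(chrom, pos, ref, alt, max_allele_len=23):
    """Build CHROM:POS:REF:ALT, keeping only the first max_allele_len
    characters of each allele string (the ':' separator is an assumption)."""
    return f"{chrom}:{pos}:{ref[:max_allele_len]}:{alt[:max_allele_len]}"


def rewrite_vcf_ids(lines, max_allele_len=23):
    """Yield VCF lines with the ID column (3rd field) replaced by a short ID.

    Header lines (starting with '#') pass through unchanged.
    Assumes tab-delimited data lines with at least CHROM, POS, ID, REF, ALT.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _old_id, ref, alt = fields[:5]
        fields[2] = short_variant_id(chrom, pos, ref, alt, max_allele_len)
        yield "\t".join(fields) + "\n"
```

To use it, open the input VCF for reading, pass the file object to `rewrite_vcf_ids`, and write the yielded lines to a new file that is then given to `--vcf`. Two variants at the same position whose alleles share their first 23 characters would still collide, which is the case where arbitrary IDs are needed.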

Abel Chang

Mar 2, 2016, 4:29:12 AM
to plink2-users
OK, thank you anyway. I'll try to solve this problem and report back if I fail.

On Wednesday, March 2, 2016 at 5:12:53 PM UTC+8, Christopher Chang wrote:

Biyao Wang

Jun 11, 2021, 12:26:15 PM
to plink2-users
Hi, I'm facing the same problem with super-long IDs. I tried to limit the ID length in plink, but it didn't work. You wrote "you'll need to limit the length of the variant IDs; this should be done before trying to convert the file to plink format". Would you mind elaborating on how exactly this should be done? Many thanks!

Christopher Chang

Jun 12, 2021, 11:11:18 AM
to plink2-users
I would write a short script which assigns IDs like "longindel1", "longindel2", ... to these super-long indels, while saving a table with the CHROM/POS/alleles for these new IDs.
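
One possible shape for such a script, as a hedged Python sketch: it assumes a tab-delimited, uncompressed VCF, treats any variant whose REF or ALT string is longer than 23 characters as "super-long" (the threshold is a guess borrowed from the earlier replies), and uses hypothetical function and parameter names.

```python
def assign_long_indel_ids(lines, max_allele_len=23, prefix="longindel"):
    """Give variants with very long alleles IDs like longindel1, longindel2, ...

    Returns (new_lines, table), where table holds one
    (new_id, chrom, pos, ref, alt) tuple per renamed variant so the
    original alleles can be looked up later. Other variants keep their IDs.
    The ALT column is measured as a whole string, commas included.
    """
    new_lines = []
    table = []
    counter = 0
    for line in lines:
        if line.startswith("#"):
            # Header lines are copied through unchanged.
            new_lines.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _old_id, ref, alt = fields[:5]
        if len(ref) > max_allele_len or len(alt) > max_allele_len:
            counter += 1
            new_id = f"{prefix}{counter}"
            fields[2] = new_id
            table.append((new_id, chrom, pos, ref, alt))
        new_lines.append("\t".join(fields) + "\n")
    return new_lines, table
```

The returned lines can be written to a new VCF for plink, and the table can be saved as a tab-separated file next to it. Variants that are not renamed keep whatever ID they had, so short variants with '.' as their ID still need IDs assigned separately.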