Error: Length-0 chromosome ID in .bgen file.


Matteo Sesia

Mar 2, 2018, 3:52:09 PM
to plink2-users
I'm trying to read a bgen 1.2 file with plink2, using

plink2 --bgen $GEN_FILE

but I keep getting the following error:

PLINK v2.00a2LM 64-bit Intel (26 Feb 2018)     www.cog-genomics.org/plink/2.0/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink2.log.
Options in effect:
  --bgen /scratch/PI/candes/biobank/haplotypes/ukb_hap_chr22_v2.bgen

Start time: Fri Mar  2 xxx 2018
xxx MB RAM detected; reserving xxx MB for main workspace.
Using up to 16 threads (change this with --threads).
--bgen: 10911 variants detected, format v1.2.
--bgen: 487409 sample IDs written to plink2.psam .

Error: Length-0 chromosome ID in .bgen file.
End time: Fri Mar  2 xxx 2018

The bgen file contains phased haplotypes (chromosome 22) from the UK Biobank and it is exactly as provided by the UK Biobank.

Any ideas?

What I'm trying to do is convert it to bgen v1.1 so that I can work on it with plink 1.9.


Christopher Chang

Mar 2, 2018, 4:40:03 PM
Try "plink2 --bgen $GEN_FILE snpid-chr"; it looks like that .bgen is not storing chromosome IDs in the usual place.
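To check which field a .bgen file actually uses for the chromosome, the identifying block of the first variant can be read directly. The sketch below is a minimal, non-authoritative illustration that assumes the layout-2 (v1.2) structure described in the BGEN specification: a 4-byte offset, a header whose last 4 bytes are the flags, then per variant a 2-byte-length-prefixed variant ID, rsID and chromosome, a 4-byte position, a 2-byte allele count and 4-byte-length-prefixed alleles. The function names are hypothetical and are not part of plink2 or bgenix. An empty chromosome string is presumably what plink2 reports as a "Length-0 chromosome ID".

```python
import io
import struct
import sys
from typing import BinaryIO, Dict, List, Union


def _read_exact(f: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes, failing loudly on truncated input."""
    data = f.read(n)
    if len(data) != n:
        raise ValueError("unexpected end of .bgen data")
    return data


def _u16(f: BinaryIO) -> int:
    return struct.unpack("<H", _read_exact(f, 2))[0]  # little-endian uint16


def _u32(f: BinaryIO) -> int:
    return struct.unpack("<I", _read_exact(f, 4))[0]  # little-endian uint32


def _str16(f: BinaryIO) -> str:
    """String preceded by a 2-byte length (variant ID, rsID, chromosome)."""
    return _read_exact(f, _u16(f)).decode("ascii", errors="replace")


def first_variant_ids(f: BinaryIO) -> Dict[str, Union[str, int, List[str]]]:
    """Hypothetical helper: identifying fields of the first variant in a
    layout-2 (.bgen v1.2) file. Assumes a seekable binary stream."""
    offset = _u32(f)                       # first variant block starts at byte 4 + offset
    header = _read_exact(f, _u32(f) - 4)   # header length LH counts its own 4 bytes
    flags = struct.unpack("<I", header[-4:])[0]  # flags are the last 4 header bytes
    layout = (flags >> 2) & 0xF
    if layout != 2:
        raise ValueError(f"expected layout 2 (v1.2), found layout {layout}")
    f.seek(4 + offset)                     # skip the optional sample-ID block
    snp_id = _str16(f)                     # first ID field ("variant identifier")
    rsid = _str16(f)                       # second ID field
    chrom = _str16(f)                      # empty string -> length-0 chromosome ID
    position = _u32(f)
    n_alleles = _u16(f)
    alleles = [_read_exact(f, _u32(f)).decode("ascii", errors="replace")
               for _ in range(n_alleles)]
    return {"snp_id": snp_id, "rsid": rsid, "chromosome": chrom,
            "position": position, "alleles": alleles}


def first_variant_ids_from_bytes(data: bytes) -> Dict[str, Union[str, int, List[str]]]:
    """Convenience wrapper for an in-memory prefix of a .bgen file."""
    return first_variant_ids(io.BytesIO(data))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        print(first_variant_ids(fh))
```

Run as `python inspect_bgen.py file.bgen` (hypothetical script name). Only the header and the first variant's identifying block are read, so this stays fast even on very large files.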

Christopher Chang

Mar 2, 2018, 4:48:04 PM
...though this issue may be moot because plink 2.0 can't read phased bgen-1.2 data yet.

This is overdue; I'll try to fix this within the next week.

Matteo Sesia

Mar 2, 2018, 5:33:27 PM
I had already tried

plink2 --bgen $GEN_FILE snpid-chr

which gives a different error:

PLINK v2.00a2LM 64-bit Intel (26 Feb 2018)     www.cog-genomics.org/plink/2.0/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink2.log.
Options in effect:
  --bgen /scratch/PI/candes/biobank/haplotypes/ukb_hap_chr22_v2.bgen snpid-chr

Start time: Fri Mar  2 14:30:40 2018
64216 MB RAM detected; reserving 32108 MB for main workspace.
Using up to 32 threads (change this with --threads).

--bgen: 10911 variants detected, format v1.2.
--bgen: 487409 sample IDs written to plink2.psam .

Error: Invalid chromosome code 'rs62224618' in --bgen file.
(Use --allow-extra-chr to force it to be accepted.)

I was under the impression that plink2 now supported phased data.
Can you please let me know when it does? By the way, do you know of any other tools that I could use to convert phased bgen 1.2 data into something usable?

Christopher Chang

Mar 2, 2018, 6:05:21 PM
It looks like I will need to make --oxford-single-chr usable on .bgen files, then; I'll combine this with the other .bgen import update.

You're correct that plink2 generally supports phased data; what isn't quite finished is phased-dosage support.  Since the bgen-1.2 format represents phased haplotypes in the same manner as phased dosages, I had decided to defer writing that part of the import code until plink2 could represent phased dosages.
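To illustrate why phased hard calls and phased dosages look the same in bgen-1.2: as I read the BGEN specification, layout 2 stores, for each haplotype of a phased sample, the probability of every allele except the last, as B-bit integers scaled by 2^B - 1. A hard-called haplotype is simply the case where that probability is exactly 0 or 1. The sketch below is restricted to biallelic, diploid samples with no missing data, ignores the bit packing and compression of the real genotype block, and uses hypothetical function names.

```python
from typing import List, Sequence, Tuple


def decode_probability(stored: int, bits: int = 8) -> float:
    """Layout-2 probabilities are B-bit integers scaled by 2**B - 1."""
    return stored / (2 ** bits - 1)


def phased_alt_dosages(first_allele_probs: Sequence[float]) -> Tuple[List[float], float]:
    """One biallelic, diploid, phased sample: given the probability that each
    haplotype carries the first allele, return the per-haplotype dosage of the
    second allele and the sample's total dosage."""
    per_hap = [1.0 - p for p in first_allele_probs]  # P(second allele) per haplotype
    return per_hap, sum(per_hap)


if __name__ == "__main__":
    # Hard call: first allele on haplotype 1, second allele on haplotype 2.
    print(phased_alt_dosages([decode_probability(255), decode_probability(0)]))
    # Imputed phased genotype: same encoding, fractional values.
    print(phased_alt_dosages([decode_probability(230), decode_probability(51)]))
```

The same two per-haplotype numbers carry both cases, so an importer that handles hard-called phased haplotypes in this encoding is already most of the way to handling phased dosages.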

As for alternative tools, have you tried using bgenix to convert to VCF?

Matteo Sesia

Mar 7, 2018, 1:07:44 PM
Hello Christopher,

thanks for your quick response! I see that there is a new version of plink 2 that should have fixed the issue. However, when I try:

plink2 --bgen $GEN_FILE --sample $SAM_FILE --oxford-single-chr 22

a segmentation fault occurs:

PLINK v2.00a2LM 64-bit Intel (5 Mar 2018)      www.cog-genomics.org/plink/2.0/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink2.log.
Options in effect:
  --bgen /scratch/PI/candes/biobank/haplotypes/ukb_hap_chr22_v2.bgen
  --oxford-single-chr 22
  --sample /scratch/PI/candes/biobank/genotypes/sample_app1372/ukb1372_hap_chr22_v2_s487406.sample

Start time: Wed Mar  7 10:04:32 2018
257672 MB RAM detected; reserving 128836 MB for main workspace.
Using up to 20 threads (change this with --threads).
--bgen: 10911 variants detected, format v1.2.
487409 samples imported from .sample file to plink2.psam .
Segmentation fault

I'm using the latest compiled binary for 64-bit Linux.

Any ideas what's going on?

Christopher Chang

Mar 7, 2018, 1:26:36 PM
Well, this isn't supposed to actually work yet (.bgen phased-haplotype import code isn't written yet), but it isn't supposed to segfault either.  I'll revisit this code within the next week or so anyway, but if you could provide a .bgen + .sample file pair which segfaults on your end, that would be great.

Matteo Sesia

Mar 7, 2018, 1:54:41 PM
Oh I see.
I'm afraid I can't provide an example. The data I'm using is restricted and I don't understand the bgen format well enough to easily make an artificial example.

Christopher Chang

Mar 7, 2018, 2:39:09 PM
Okay.  I'll let you know if I have trouble replicating the segfault when I do revisit the code.

Matteo Sesia

Mar 27, 2018, 12:41:36 AM
Hello Christopher,
any news about this?

Thanks!
Matteo

Christopher Chang

Mar 28, 2018, 12:13:08 PM
This should be ready later today.

Christopher Chang

Mar 28, 2018, 8:04:59 PM
Phased-haplotype-supporting builds are now posted; let me know if you notice any problems.

Matteo Sesia

Apr 3, 2018, 7:18:15 PM
Thanks!

Katie Lloyd

Feb 4, 2020, 6:04:02 AM
Hi both,

I am having difficulty with a similar issue: I am trying to read a .bgen v1.2 file into PLINK. I have tried all of the above suggestions (snpid-chr, --oxford-single-chr), but I still get the following error, which I think is related to the variant IDs being in Chr:Pos format:

PLINK v2.00aLM 64-bit Intel (16 Oct 2017)      www.cog-genomics.org/plink/2.0/
(C) 2005-2017 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /newhome/kl12444/Converted/Chr_02.log.
Options in effect:
  --bgen /newhome/kl12444/Genotype/data_02.bgen snpid-chr
  --make-bed
  --out /newhome/kl12444/Converted/Chr_02
  --sample /newhome/kl12444/datav2.sample

Start time: Tue Feb  4 10:47:12 2020
129152 MB RAM detected; reserving 64576 MB for main workspace.

Using up to 16 threads (change this with --threads).
--bgen: 3392238 variants detected, format v1.2.
17816 samples imported from .sample file to
/newhome/kl12444/Converted/Chr_02-temporary.psam .

Error: Invalid chromosome code '2:10597' in --bgen file.

(Use --allow-extra-chr to force it to be accepted.)

If you could offer any help, I would really appreciate it!

Many thanks

Christopher Chang

Feb 4, 2020, 11:54:03 AM
1. There have been multiple .bgen-import bugfixes since Oct 2017; you should update to a more recent build.
2. From the error message, it looks like snpid-chr is not correct for your dataset.
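Point 2 can be seen directly in the error. Judging from both errors in this thread, snpid-chr makes plink2 take the SNP ID field as the chromosome code, which only works when that field holds a bare code. Here it holds a Chr:Pos string ('2:10597'), just as Matteo's held an rsID ('rs62224618'). The sketch below is a simplified, hypothetical check: the accepted codes (1-26, X, Y, XY, MT, optionally prefixed with "chr") are an approximation and not plink2's exact rules, and the helper names are illustrative only.

```python
import re
from typing import Optional, Tuple

# Approximate set of human chromosome codes; plink2's real parser is broader.
_CHR_CODE = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-6]|X|Y|XY|MT)$")


def looks_like_chr_code(field: str) -> bool:
    """True if the whole field could plausibly be read as a chromosome code."""
    return _CHR_CODE.match(field) is not None


def split_chr_pos(snp_id: str) -> Optional[Tuple[str, int]]:
    """Split 'chr:pos' (or 'chr:pos:ref:alt') IDs; None if the ID has another shape."""
    parts = snp_id.split(":")
    if len(parts) < 2 or not looks_like_chr_code(parts[0]) or not parts[1].isdigit():
        return None
    return parts[0], int(parts[1])
```

Under these assumptions, looks_like_chr_code('2:10597') is False, so the whole field is rejected as a chromosome code, while split_chr_pos('2:10597') recovers ('2', 10597). In other words, the chromosome is embedded in the ID rather than stored as the entire ID, which is what snpid-chr expects.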