"split chromosome" after sorting with bcftools


David Condon

Jul 20, 2022, 10:56:29 AM
to plink2-users
I am running bcftools and plink in a series of commands.

First I run `bcftools sort`, and then

`plink --bcf tmp.sort.bcf --allow-extra-chr --make-bed --out merged.eigenstrat.34string.length.vcf.gz`

which works, and then

`plink --bcf tmp.sort.bcf --allow-extra-chr --recode --out merged.eigenstrat.34string.length.vcf.gz`

but then I get an error:

> Error: .bim file has a split chromosome.  Use --make-bed by itself to
> remedy this

This error should have been remedied by `bcftools sort`, but it wasn't.

If I instead run the `recode` step on the `.bed` file, I get this error:

    Options in effect:
      --allow-extra-chr
      --bed merged.eigenstrat.34string.length.vcf.gz
      --out merged.eigenstrat.34string.length.vcf.gz
      --recode
   
    Error: A full .bed + .bim + .fam fileset is required for this.
    For more information, try "plink --help <flag name>" or "plink --help | more".

How can I fix this error? `bcftools sort` didn't fix it, even though it should have.
I am running the most recent plink version.

Christopher Chang

Jul 20, 2022, 11:32:33 AM
to plink2-users
1. --out specifies a filename *prefix*.  Neither ".vcf" nor ".gz" should ever appear in its argument; those extensions will be attached when an actual compressed VCF file is generated.
2. If you want to generate a compressed VCF file from a BCF, you should use plink 2.0, not plink 1.9, for all commands whenever possible; otherwise you are likely to have incorrectly swapped REF/ALT alleles.  See https://www.cog-genomics.org/plink/1.9/data#ax_allele for more discussion.
3. With plink 2.0, use the --sort-vars flag in combination with --make-bed to sort the variants.
4. The flag for reading a .bed + .bim + .fam fileset is --bfile, not --bed.
5. The command for exporting a compressed VCF is "--export vcf bgz", not "--recode".  (Plink 2.0 will actually error out on plain --recode, since that previously corresponded to a horribly inefficient file format that was being exported >20x as often as it should have been.)
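As a rough sketch of how points 1 and 3 to 5 fit together, the two commands from the question could be rewritten for plink 2.0 along these lines. This is an untested illustration, not a verified recipe: it assumes the plink 2.0 binary is on the PATH as `plink2`, and the helper names `out_prefix` and `bcf_to_vcf_via_bed`, as well as the `PLINK2` override variable (set `PLINK2=echo` to print the commands instead of running them), are hypothetical. Because it goes through `.bed`, it also assumes the BCF contains no multiallelic variants.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching the advice above; not part of plink itself.
# PLINK2 defaults to "plink2"; set PLINK2=echo for a dry run that only prints commands.

# --out expects a filename prefix, so drop any .gz / .vcf / .bcf extensions
# that were pasted into the name; plink2 appends its own extensions.
out_prefix() {
  local name="$1"
  name="${name%.gz}"
  name="${name%.vcf}"
  name="${name%.bcf}"
  printf '%s\n' "$name"
}

# Usage: bcf_to_vcf_via_bed <input.bcf> <output name or prefix>
bcf_to_vcf_via_bed() {
  local bcf="$1" prefix
  prefix="$(out_prefix "$2")"
  local plink="${PLINK2:-plink2}"

  # Import the BCF and write a variant-sorted .bed/.bim/.fam fileset.
  "$plink" --bcf "$bcf" --allow-extra-chr --sort-vars --make-bed --out "$prefix" || return 1

  # Read the fileset back with --bfile (not --bed) and export a bgzipped VCF.
  "$plink" --bfile "$prefix" --allow-extra-chr --export vcf bgz --out "$prefix"
}
```

For example, `bcf_to_vcf_via_bed tmp.sort.bcf merged.eigenstrat.34string.length.vcf.gz` would use `merged.eigenstrat.34string.length` as the `--out` prefix for both steps, so plink2 itself attaches `.bed`/`.bim`/`.fam` and then `.vcf.gz`.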

Christopher Chang

Jul 20, 2022, 11:38:04 AM
to plink2-users
Also, since it looks like your BCF contains multiallelic variants, you should be aware that those are not supported by the .bed + .bim + .fam format.  The workaround is to use --make-pgen/--pfile instead of --make-bed/--bfile, since the newer .pgen file format does support them.
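A minimal, untested sketch of that workaround, assuming the plink 2.0 binary is on the PATH as `plink2` and keeping `--allow-extra-chr` from the original question. The function name `bcf_to_vcf_via_pgen` and the `PLINK2` override variable (set `PLINK2=echo` to print the commands instead of running them) are hypothetical. `--make-pgen` takes the place of `--make-bed` and `--pfile` the place of `--bfile`, while the `--sort-vars` and `--export vcf bgz` steps come from the earlier reply.

```shell
#!/usr/bin/env bash
# Hypothetical helper: route the conversion through .pgen/.pvar/.psam, which,
# unlike .bed/.bim/.fam, can represent multiallelic variants.
# PLINK2 defaults to "plink2"; set PLINK2=echo for a dry run that only prints commands.

# Usage: bcf_to_vcf_via_pgen <input.bcf> <output prefix, e.g. "merged">
bcf_to_vcf_via_pgen() {
  local bcf="$1" prefix="$2"
  local plink="${PLINK2:-plink2}"

  # --make-pgen replaces --make-bed; --sort-vars still sorts the variants.
  "$plink" --bcf "$bcf" --allow-extra-chr --sort-vars --make-pgen --out "$prefix" || return 1

  # --pfile replaces --bfile when reading the fileset back for VCF export.
  "$plink" --pfile "$prefix" --allow-extra-chr --export vcf bgz --out "$prefix"
}
```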

David Condon

Jul 20, 2022, 4:34:47 PM
to plink2-users
I'm trying out your suggestions now; it will take a while, since the file is large. Thanks for your help!