Genotyping support for very large chromosomes (which rely on .csi indexed vcfs)

Max Stammnitz

Mar 10, 2020, 11:25:06 AM
to Platypus Users
Hi Andy,

I am currently working on a genotyping setup that involves very large marsupial chromosomes (> 512 Mb).

This is the current workflow:

python Platypus.py callVariants \
--bamFiles=data.bam \
--refFile=ref.fa \
--output=out.vcf \
--source=listOfVariants.vcf.gz \
--minPosterior=0 \
--getVariantsFromBAMs=0

Here, listOfVariants.vcf.gz has a corresponding index file, listOfVariants.vcf.gz.tbi, previously generated with:

tabix -p vcf listOfVariants.vcf.gz


From my tests, it unfortunately looks like Platypus skips .tbi-indexed positions > 512 Mb, so variants at these positions cannot be genotyped.
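For anyone who wants to check whether their own variant list is affected, here is a minimal sketch that counts records beyond the .tbi coordinate limit. It assumes the limit is 2^29 bp (512 Mb), which follows from the fixed binning scheme of the .tbi format; the function name and the constant are my own and not part of Platypus or tabix.

```python
import gzip

# Largest coordinate a .tbi index can address: 2^29 = 536,870,912 bp (512 Mb).
TBI_MAX_POS = 2 ** 29


def count_positions_beyond_tbi_limit(vcf_path, limit=TBI_MAX_POS):
    """Return {chrom: n_records} for VCF records whose POS exceeds `limit`.

    Hypothetical helper for illustration; reads plain or gzip/bgzip VCFs.
    """
    counts = {}
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.split("\t", 2)
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            chrom, pos = fields[0], int(fields[1])
            if pos > limit:
                counts[chrom] = counts.get(chrom, 0) + 1
    return counts
```

Calling count_positions_beyond_tbi_limit("listOfVariants.vcf.gz") returns an empty dict if no record lies beyond the limit.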


In my view, the obvious fix would be to build a .csi index instead:

tabix -C -p vcf listOfVariants.vcf.gz


This generates listOfVariants.vcf.gz.csi. However, Platypus' genotyping can't cope with these index files as input; the error message reads:


Exception IOError: IOError('index `listOfVariants.vcf.gz.tbi` not found',) in 'variantcaller.callVariantsInRegion' ignored


Any idea whether .csi index support for large chromosomes would be possible in the foreseeable future? It feels like this wouldn't be too much of a hassle to implement, although simply replacing all occurrences of "tbi" with "csi" in the source code doesn't do the job (I just tried that).
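One likely reason the rename alone fails is that .tbi and .csi are different binary formats, not just different file extensions, so the index-reading code itself would need to understand .csi. As a rough illustration, the sketch below tells the two apart by their magic bytes. It assumes both index types are BGZF-compressed (readable with Python's gzip module) and start with "TBI\1" and "CSI\1" respectively; the function name is hypothetical and not taken from the Platypus source.

```python
import gzip


def index_format(index_path):
    """Return 'tbi', 'csi', or 'unknown' based on the index file's magic bytes.

    Hypothetical helper for illustration only.
    """
    # Both index types are BGZF files, which the gzip module can decompress.
    with gzip.open(index_path, "rb") as handle:
        magic = handle.read(4)
    if magic == b"TBI\x01":
        return "tbi"
    if magic == b"CSI\x01":
        return "csi"
    return "unknown"
```

A check like this is independent of the file name, so it would also catch a .csi index that has merely been renamed to .tbi.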


Happy to provide more detailed examples, log files, test dev branches, etc.


Many thanks,

Max
