Error while indexing


Brett Chapman

Oct 18, 2022, 3:23:15 AM
to RTG Users
Hi

I'm trying to use rtg vcfeval but get an error about indexing. My chromosomes are too large for TBI indexing, so I need to use CSI indexing. Is there a workaround, or are any updates planned that would fix this problem? I'm working with plant genomes.

I posted the issue on GitHub, based on where I was following a tutorial which used rtg vcfeval: https://github.com/pangenome/pggb/issues/225

Thanks for any help you can provide.

Cheers

Brett

Len Trigg

Oct 18, 2022, 5:12:06 AM
to Brett Chapman, RTG Users
Hi Brett,

We have not yet looked at adding CSI support to vcfeval or our other tools.

As a tip, if you work around this by splitting your large chromosomes into chunks for evaluation, you might find the vcf2rocplot command useful for combining the chunked evaluation outputs (your intermediate evaluations will need to use the annotated output mode).

Cheers,
Len.
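
The chunking workaround mentioned above depends on every chunk fitting within the TBI coordinate limit of about 2^29 bp (roughly 537 Mbp). As a minimal sketch, assuming 1-based inclusive coordinates, the chunk boundaries could be computed as follows. The function names and the 700 Mbp chromosome length are hypothetical, chosen only for illustration. How the VCF and reference are actually split and re-coordinated is left to whatever tools you already use, and variants that span a chunk boundary would need separate handling.

```python
# TBI (tabix) indexes cannot address coordinates beyond roughly 2**29 bp.
TBI_MAX_POSITION = 2**29  # 536,870,912


def chunk_chromosome(length, max_chunk=TBI_MAX_POSITION - 1):
    """Split [1, length] into evenly sized 1-based inclusive (start, end) chunks.

    Each chunk is at most max_chunk bases long, so positions expressed
    relative to the chunk start stay below the TBI limit.
    """
    if length < 1 or max_chunk < 1:
        raise ValueError("length and max_chunk must be positive")
    n_chunks = -(-length // max_chunk)  # ceiling division
    size = -(-length // n_chunks)       # even chunk size, rounded up
    chunks = []
    start = 1
    while start <= length:
        end = min(start + size - 1, length)
        chunks.append((start, end))
        start = end + 1
    return chunks


def to_chunk_coordinate(pos, chunk_start):
    """Convert a whole-chromosome position to a chunk-relative position."""
    return pos - chunk_start + 1


if __name__ == "__main__":
    # Hypothetical 700 Mbp chromosome, too long for a TBI index.
    for start, end in chunk_chromosome(700_000_000):
        print(start, end, end - start + 1)
```

Evenly sized chunks are a design choice here, not a requirement: any split works as long as each chunk stays under the limit and the baseline and calls are split at the same boundaries.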


Brett Chapman

Oct 19, 2022, 12:16:00 AM
to RTG Users, Len Trigg, RTG Users, Brett Chapman
Hi Len

Thanks for the advice. I'll take a look at vcf2rocplot for combining the outputs. I'll update again if I have any issues.

Cheers

Brett

Brett Chapman

Oct 28, 2022, 3:46:30 AM
to RTG Users, Brett Chapman, Len Trigg, RTG Users
Hi Len

I've been running vcfeval across all the chunked chromosomes (I split each chromosome into two segments).

I have the following output from vcfeval for each chromosome segment:
-rw-rw-r-- 1 ubuntu ubuntu   67 Oct 28 07:34 phasing.txt
-rw-rw-r-- 1 ubuntu ubuntu  420 Oct 28 07:34 allele_weighted_roc.tsv.gz
-rw-rw-r-- 1 ubuntu ubuntu  434 Oct 28 07:34 allele_snp_roc.tsv.gz
-rw-rw-r-- 1 ubuntu ubuntu  368 Oct 28 07:34 allele_non_snp_roc.tsv.gz
-rw-rw-r-- 1 ubuntu ubuntu  420 Oct 28 07:34 weighted_roc.tsv.gz
-rw-rw-r-- 1 ubuntu ubuntu  434 Oct 28 07:34 snp_roc.tsv.gz
-rw-rw-r-- 1 ubuntu ubuntu  368 Oct 28 07:34 non_snp_roc.tsv.gz
-rw-rw-r-- 1 ubuntu ubuntu  303 Oct 28 07:34 summary.txt
-rw-rw-r-- 1 ubuntu ubuntu 1.8M Oct 28 07:34 calls.vcf.gz
-rw-rw-r-- 1 ubuntu ubuntu  90K Oct 28 07:34 calls.vcf.gz.tbi
-rw-rw-r-- 1 ubuntu ubuntu 1.9M Oct 28 07:34 baseline.vcf.gz
-rw-rw-r-- 1 ubuntu ubuntu  70K Oct 28 07:34 baseline.vcf.gz.tbi
-rw-rw-r-- 1 ubuntu ubuntu   30 Oct 28 07:34 done
-rw-rw-r-- 1 ubuntu ubuntu 6.6K Oct 28 07:34 vcfeval.log
-rw-rw-r-- 1 ubuntu ubuntu  170 Oct 28 07:34 progress

Looking at the usage for vcf2rocplot, which parts of the vcfeval output do I pass in to consolidate the precision and recall calculations across the two segments, so that the result represents the whole chromosome in the way summary.txt does for a single run? Is it a particular file or the whole folder? Thanks.

Cheers

Brett

Len Trigg

Oct 28, 2022, 4:05:10 AM
to Brett Chapman, RTG Users
Hi Brett,

There is an example in the user manual. Basically, you just give vcf2rocplot all of the annotated baseline and calls VCF files, along with any other options for which ROCs etc. you want (e.g. if you want custom stratifications).

Cheers,
Len.
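
To illustrate why the per-segment summary.txt values cannot simply be averaged, here is a minimal sketch that pools per-chunk counts into whole-chromosome metrics. It assumes the count columns reported by vcfeval's summary.txt (true positives on the baseline and call side, false positives, false negatives) and the usual definitions of precision, sensitivity, and F-measure. The class name, function name, and the counts are made up for illustration. This is not a description of how vcf2rocplot works internally.

```python
from dataclasses import dataclass


@dataclass
class ChunkCounts:
    """Counts for one chunk, as read off that chunk's summary.txt."""
    tp_baseline: float
    tp_call: float
    false_pos: float
    false_neg: float


def pooled_metrics(chunks):
    """Sum counts across chunks, then compute the metrics once."""
    tp_b = sum(c.tp_baseline for c in chunks)
    tp_c = sum(c.tp_call for c in chunks)
    fp = sum(c.false_pos for c in chunks)
    fn = sum(c.false_neg for c in chunks)
    precision = tp_c / (tp_c + fp) if (tp_c + fp) else 0.0
    sensitivity = tp_b / (tp_b + fn) if (tp_b + fn) else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f_measure


if __name__ == "__main__":
    # Made-up counts for two segments of one chromosome.
    segments = [
        ChunkCounts(tp_baseline=90, tp_call=90, false_pos=10, false_neg=10),
        ChunkCounts(tp_baseline=10, tp_call=10, false_pos=90, false_neg=0),
    ]
    print(pooled_metrics(segments))
```

With these made-up counts the per-segment precisions are 0.9 and 0.1, and the pooled precision is 0.5. The two happen to average to the same value only because both segments contain the same number of calls; with unequal segment sizes the average and the pooled value diverge, which is why the counts are summed first.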