Error in using vcfeval with different with reference sequence

146 views
Skip to first unread message

Arti Tandon

unread,
Jul 27, 2018, 4:11:24 PM7/27/18
to RTG Users
Hi, I am running the following command to compare the Genome in a bottle VCF file, with the one I created for the same sample using my pipeline, and get the following error:

/rtg vcfeval -t hg19_SDF/ -b ../HG004_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X_CHROM1-22_v.3.3.2_highconf.vcf.gz -c GIAB.HG004.gatk.vcf.gz --bed-regions ../HG004_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X_CHROM1-22_v.3.3.2_highconf_noinconsistent.bed --sample HG004 -o HG004_unfilteredQual --squash-ploidy --ref-overlap --output-mode roc-only --vcf-score-field "QUAL"
Error: There were no sequence names in common between the supplied baseline and called variant sets. Check they use the same reference and are non-empty.

I understand it is because the references used are different, what is the best of getting around this to make this comparison?

Thanks,

Sean Irvine

unread,
Jul 29, 2018, 6:16:11 PM7/29/18
to Arti Tandon, RTG Users
Hi Arti,

You are correct that vcfeval requires the calls and baseline to be with respect to the same reference and chromosome naming convention.  The error suggests that it is the chromosome naming convention that is the problem.

The GIAB baseline you have is for build 37 of the human genome using the HGRC naming convention for chromosome (i.e., "1" not "chr1", etc.).

Assuming your calling was done against  build 37 it will be sufficient to change the naming convention of either the calls or baseline before running VCF so that they match.   Something like:

zcat GIAB.HG004.gatk.vcf.gz | sed 's/^chrM/MT/;s/^chr//;/^##contig/s/chr//' | rtg bgzip - >GIAB.HG004.gatk.hgrc.vcf.gz
rtg index GIAB.HG004.gatk.hgrc.vcf.gz

Alternatively, if your calling is against a different build (say build 38), then download the appropriate GIAB baseline reference for that build.  Otherwise you're in the murky land of doing a liftover of calls from one reference to another.

Sean.

--
You received this message because you are subscribed to the Google Groups "RTG Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rtg-users+unsubscribe@realtimegenomics.com.
Visit this group at https://groups.google.com/a/realtimegenomics.com/group/rtg-users/.

Gil Stelzer

unread,
Mar 15, 2022, 6:20:14 PM3/15/22
to RTG Users, Sean A. Irvine, RTG Users, arti....@gmail.com
Hi 

I am using hg19.sdf.zip (https://s3.amazonaws.com/rtg-datasets/references/hg19.sdf.zip) from the rtg website (https://www.realtimegenomics.com/news/pre-formatted-reference-datasets) where the chromosome names are "chr1", "chr2", etc.  I want to compare many vcf files (chromosome are named "1", "2" etc) to a vcf with a variant set of interest (chromosome are named "1", "2" etc).  Is there no way to set a parameter that determines the name of the chromosome (e.g. "chr17" vs "17") rather than having to change the chromosome name .   I looked in the manual and couldn't find such a parameter.  What am I missing?

Many thanks,
Gil

To unsubscribe from this group and stop receiving emails from it, send an email to rtg-users+...@realtimegenomics.com.

Sean Irvine

unread,
Mar 15, 2022, 6:31:53 PM3/15/22
to Gil Stelzer, RTG Users, arti....@gmail.com
 Hi Gil,

We don't provide any options for swapping between the representations of chromosome names because usually such problems are indicative of not properly matching up the reference to the mappings or calls.

Rather than using hg19.sdf you could probably instead use:


which has names in the form 1, 2, 3, etc.

This SDF is also linked to from the same place you got the hg19 SDF (https://www.realtimegenomics.com/news/pre-formatted-reference-datasets).

Sean.

Reply all
Reply to author
Forward
0 new messages