VCF evaluation using RTG.jar vcfeval

Behzad Moumbeini

Jul 20, 2020, 3:08:49 PM
to RTG Users

Hi there,

I am trying to benchmark my pipeline, which analyzes germline WES and WGS data and generates VCF files for SNVs and indels. To do so, I got the WES data for this sample from here:
https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/

and worked on these 2 datasets separately:

 
NIST7035_TAAGGCGA_L001_R1_001_trimmed.fastq.gz
NIST7035_TAAGGCGA_L001_R2_001_trimmed.fastq.gz
 

 
NIST7086_CGTACTAG_L002_R1_001_trimmed.fastq.gz
NIST7086_CGTACTAG_L002_R2_001_trimmed.fastq.gz


I also took the VCF file from the same link as my reference (gold standard):

project.NIST.hc.snps.indels.vcf


Then I tried the following command to evaluate my VCF file (generated from the above files by my pipeline):

java -Xmx4G -jar  RTG.jar vcfeval -t Homo_sapiens.GRCh37.GATK.illumina.SDF  -T 6 --baseline=[GIAB truth VCF] --calls=[SNV/INDEL VCF] --all-records --bed-regions=[Exome BED file]

I created the Homo_sapiens.GRCh37.GATK.illumina.SDF folder with this command:

rtg format --output  Homo_sapiens.GRCh37.GATK.illumina.SDF  hg19.fasta

As --baseline I used the VCF file above (the gold standard), and as --calls I used the VCF file that I made. I also got the BED file from the same link.
When I run RTG.jar with the command above, I get this error:

Error: No sample name provided but baseline is a multi-sample VCF.

Do you know how to fix the problem?

Thanks

Len Trigg

Jul 20, 2020, 5:29:17 PM
to Behzad Moumbeini, RTG Users
On Tue, 21 Jul 2020 at 07:08, Behzad Moumbeini <behzadm...@gmail.com> wrote:
java -Xmx4G -jar  RTG.jar vcfeval -t Homo_sapiens.GRCh37.GATK.illumina.SDF  -T 6 --baseline=[GIAB truth VCF] --calls=[SNV/INDEL VCF] --all-records --bed-regions=[Exome BED file]

I'm curious as to why you explicitly run using the jarfile instead of just using "rtg vcfeval ..." like you do for the "rtg format" command you listed below?
 
rtg format --output  Homo_sapiens.GRCh37.GATK.illumina.SDF  hg19.fasta

as --baseline I used above VCF file (the golden standard) and as --calls I used the VCF file that I made. I also got the bed file from the same link.
when I run the RTG.jar using the mentioned command I would get this error:

Error: No sample name provided but baseline is a multi-sample VCF.

do you know how to fix the problem?

When you have a multi-sample VCF (and vcfeval is indicating this is the case for your baseline VCF), you need to tell vcfeval which sample from the VCF you are wanting to compare. See the help output for --sample for more information.
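As a rough sketch of that step: the sample names in a VCF are the columns after FORMAT on the #CHROM header line, so they can be listed with standard tools and then passed to --sample. The helper names list_vcf_samples and run_vcfeval_for_samples are made up for illustration, and the -o output directory is an assumption that does not appear in the command quoted above. Giving --sample two comma-separated names selects different samples for the baseline and the calls.

```shell
# Print the sample names of a VCF, one per line.
# Samples are columns 10 and onward of the #CHROM header line.
list_vcf_samples() {
    # Read plain or gzip/bgzip-compressed input.
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Hypothetical wrapper around the vcfeval command from this thread.
# Arguments: baseline VCF, calls VCF, exome BED, baseline sample name,
#            calls sample name, output directory (an assumption).
run_vcfeval_for_samples() {
    rtg vcfeval -t Homo_sapiens.GRCh37.GATK.illumina.SDF -T 6 \
        --baseline="$1" --calls="$2" --bed-regions="$3" --all-records \
        --sample="$4,$5" -o "$6"
}
```

For example, run list_vcf_samples project.NIST.hc.snps.indels.vcf first, pick the name that matches the FASTQ files you processed, and use it as the baseline sample name; the calls sample name is whatever your own pipeline wrote into its VCF header.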

Cheers,
Len.




