ROC curves

Alexandra Vatsiou

Mar 24, 2020, 1:53:50 PM
to rtg-...@realtimegenomics.com
Hi all, 

I am running rtg-tools and want to generate ROC curves, but the ROC file seems to contain only a single data row.

ls 1888965_rtg_dragen_pass_snvs
done  fn.vcf  fp.vcf  non_snp_roc.tsv  phasing.txt  progress  snp_roc.tsv  summary.txt  tp-baseline.vcf  tp.vcf  vcfeval.log  weighted_roc.tsv

(base) [avatsiou@ukga-prd-brln02 rtg_output]$ cat 1888965_rtg_dragen_pass_snvs/snp_roc.tsv
#Version RTG Tools 3.10.1 / Core f9bc0ddb3e (2019-01-21), ROC output 1.2
#CL vcfeval -c 1888965_pass_snvs.vcf.gz -b truth.vcf.gz --bed-regions truth.bed --no-gzip -o 1888965_rtg_dragen_pass_snvs --decompose --all-records -t GRCh38Decoy.sdf/ --sample NA12878,dsample_Proband
#selection: SNP (baseline rescaled)
#total baseline variants: 1023601
#total call variants: 317554
#score field: GQ (FORMAT)
#score true_positives_baseline false_positives true_positives_call false_negatives precision sensitivity f_measure
None 317228.00 326.00 317228.00 706373.00 0.9990 0.3099 0.4731

This is how I am running rtg-tools:

'/home/avatsiou/rtg-tools-3.10.1/rtg vcfeval -c ' + output + id + '.vcf.gz' +\
             ' -b ' + truth_vcf +\
             ' --bed-regions ' + truth_bed +\
             ' --no-gzip -o ' + output +\
             ' --decompose ' +\
             ' --all-records ' +\
             ' -t ' + sdf_ref +\
             ' --sample NA12878,' + tumor


Any help would be appreciated!

Thanks
Alexandra

Len Trigg

Mar 24, 2020, 2:36:10 PM
to Alexandra Vatsiou, RTG Users

Was there any warning printed out when you ran it?

It looks like there was no score field in your input data.
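One way to check this is to tally which fields actually carry values in the calls VCF. The following is a minimal sketch, not part of RTG Tools: the function name `tally_score_fields` and its parameters are hypothetical, and it assumes a standard tab-separated VCF (plain or gzipped) with the usual fixed columns. If `FORMAT.GQ` never shows up for the evaluated sample, the default score field has nothing to rank by.

```python
import gzip
from collections import Counter


def tally_score_fields(vcf_path, sample=None, max_records=10000):
    """Count how often QUAL, each INFO key and each FORMAT key has a value.

    Hypothetical helper for inspection only; it reads at most max_records
    records and looks at one sample column (the first one by default).
    """
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    counts = Counter()
    sample_col = 9  # first sample column in a standard VCF
    seen = 0
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                if sample is not None:
                    sample_col = fields.index(sample)
                continue
            if fields[5] != ".":
                counts["QUAL"] += 1
            for entry in fields[7].split(";"):
                key = entry.split("=", 1)[0]
                if key and key != ".":
                    counts["INFO." + key] += 1
            if len(fields) > sample_col:
                keys = fields[8].split(":")
                values = fields[sample_col].split(":")
                for key, value in zip(keys, values):
                    if value not in (".", ""):
                        counts["FORMAT." + key] += 1
            seen += 1
            if seen >= max_records:
                break
    return seen, counts
```

Calling `tally_score_fields("calls.vcf.gz", sample="dsample_Proband")` (file name assumed) returns the number of records read and a counter of populated fields, which gives a quick list of candidates for the ROC score.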

Cheers,
Len.


--
You received this message because you are subscribed to the Google Groups "RTG Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rtg-users+...@realtimegenomics.com.
To view this discussion on the web visit https://groups.google.com/a/realtimegenomics.com/d/msgid/rtg-users/CACrn4_AM1x4AOqD9zGLfdgz30cAgGKNV7DYgRBRXgF2KFg3%3DXw%40mail.gmail.com.

Alexandra Vatsiou

Mar 24, 2020, 2:53:33 PM
to Len Trigg, rtg-...@realtimegenomics.com
Hi Len, 

I am quite new to RTG.
Is the score field supposed to be in the input VCF?
What exactly is that score?
Could I supply it myself?
There was no error.

Thanks,
Alexandra



Len Trigg

Mar 24, 2020, 3:09:18 PM
to Alexandra Vatsiou, Len Trigg, RTG Users
Hi Alexandra,

If you are new to RTG I would recommend running the demo-tools.sh script that comes with the install, and also reading the user manual section on vcfeval.

The ROC curve plots the accuracy trade-off with respect to some quality score that could be used for filtering. The score must be a field in the VCF, and by default vcfeval will use the GQ field. There is a command line option that you can use to select the field you are interested in (e.g. QUAL).
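To illustrate why a missing score collapses the curve to one row, here is a rough sketch of how ROC points can be accumulated from scored calls. It is not vcfeval's implementation: the function `roc_points` and the `(score, is_true_positive)` input format are assumptions made for this example. Calls are sorted by descending score and cumulative true/false positive counts are emitted at each distinct threshold; calls with no score cannot be ranked, so they all end up in a single final point.

```python
def roc_points(calls):
    """Return cumulative (threshold, true_positives, false_positives) points.

    calls: iterable of (score, is_true_positive); score may be None when the
    chosen field is absent. Illustrative only, not vcfeval's algorithm.
    """
    calls = list(calls)
    # Rank the scored calls from most to least confident.
    scored = sorted((c for c in calls if c[0] is not None),
                    key=lambda c: c[0], reverse=True)
    unscored = [c for c in calls if c[0] is None]

    points = []
    tp = fp = 0
    i = 0
    while i < len(scored):
        threshold = scored[i][0]
        # Consume every call that shares this score before emitting a point.
        while i < len(scored) and scored[i][0] == threshold:
            if scored[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((threshold, tp, fp))

    if unscored:
        # Unrankable calls are lumped together in one trailing point.
        tp += sum(1 for _, ok in unscored if ok)
        fp += sum(1 for _, ok in unscored if not ok)
        points.append((None, tp, fp))
    return points
```

With every score missing, the function returns a single `None` point carrying all the counts, which mirrors the shape of the one-row ROC file shown earlier in the thread.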

Cheers,
Len

Alexandra Vatsiou

Mar 24, 2020, 5:02:38 PM
to Len Trigg, Len Trigg, RTG Users
Hi Len,

Thanks for the quick reply. The VCF file is from a somatic analysis, and I think this might be why the scores are not reported. I will check the VCF and come back to you if I still have issues.
I have run vcfeval a few times, but this is the first time I am plotting the ROC.
Best,
Alexandra

Len Trigg

Mar 24, 2020, 5:53:57 PM
to Alexandra Vatsiou, RTG Users
Hi Alexandra,

Hopefully the somatic caller will have added some kind of confidence score associated with each call that you can use. For somatic evaluation, you might want to look at using the --squash-ploidy option, since you are more concerned with the discovery of the somatic variant than the exact genotype that the caller produced (which is likely to vary according to tumor purity, heterogeneity, CNVs etc).
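As a sketch of how these suggestions could be wired into the invocation from the first message, the helper below assembles the vcfeval arguments as a list. The function name `build_vcfeval_command` and its parameters are hypothetical. The score field option is assumed to be `--vcf-score-field`, as I understand it from the RTG Tools manual; please confirm with `rtg vcfeval --help` for your version. The `QUAL` value is only the example from this thread, so substitute whatever confidence field your somatic caller actually writes.

```python
def build_vcfeval_command(rtg, calls_vcf, truth_vcf, truth_bed, sdf_ref,
                          out_dir, sample_pair, score_field=None,
                          squash_ploidy=False):
    """Assemble an `rtg vcfeval` argument list (hypothetical helper).

    The fixed options mirror the command line shown earlier in the thread.
    """
    args = [
        rtg, "vcfeval",
        "-c", calls_vcf,
        "-b", truth_vcf,
        "--bed-regions", truth_bed,
        "-t", sdf_ref,
        "-o", out_dir,
        "--sample", sample_pair,
        "--no-gzip", "--decompose", "--all-records",
    ]
    if score_field:
        # Assumed option name; selects the field used to rank calls for the ROC.
        args += ["--vcf-score-field", score_field]
    if squash_ploidy:
        # Match on the presence of the variant rather than the exact genotype.
        args.append("--squash-ploidy")
    return args
```

Passing the list to `subprocess.run(args, check=True)` avoids the quoting problems that string concatenation invites, and a non-zero exit status from rtg then raises an exception instead of passing silently.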

Cheers,
Len.
