Baseline file for Tumour:Normal samples


Amjad Ali

Dec 31, 2022, 4:54:39 PM
to RTG Users
Hi, I have a quick question about the baseline file.

I have aligned/mapped both my normal and tumour whole-genome sequencing files to the human reference genome.

I have subsequently called variants from these mapped tumour.bam and normal.bam files to generate a calledvariants.vcf.gz file.

I am now attempting to evaluate the results, as follows:

rtg vcfeval \
-t referencegenome.sdf \
-b ??? \
-c calledvariants.vcf.gz \
-o results.vcfeval \
-f RFGQ \
-m annotate \
--ref-overlap

What should I use as my baseline variants file? For example, is there a 'truth' set available?

Thank you

Sean Irvine

Dec 31, 2022, 5:09:59 PM
to Amjad Ali, RTG Users
Hi Amjad,

Yes, the baseline file is usually the truth file.  Exactly what you should be using for the baseline will depend on what you are trying to do.  If this is a well-studied sample like NA12878, then (germline) truth files are available from GIAB:


If instead you want to compare the variants between your normal and tumor samples, it will depend on whether your calledvariants.vcf.gz has columns for both the normal and tumor samples. If it does, both samples are in the same VCF, so you might want something like:

... -b calledvariants.vcf.gz -c calledvariants.vcf.gz --sample NORMAL,TUMOR

(where NORMAL and TUMOR are whatever the corresponding samples are named in your VCF file).
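To find those sample names, one option is to read the #CHROM header line of the VCF: every column after FORMAT is a sample name. The function below is a minimal sketch, assuming a gzip- or bgzip-compressed VCF; list_vcf_samples is a hypothetical helper name, not an RTG command. If bcftools is installed, bcftools query -l calledvariants.vcf.gz prints the same list.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of RTG): print the sample names, i.e. columns
# 10 onwards of the #CHROM header line, from a gzip/bgzip-compressed VCF.
list_vcf_samples() {
  local vcf=$1
  gzip -dc "$vcf" |
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Example usage:
#   list_vcf_samples calledvariants.vcf.gz
```

Stopping awk at the #CHROM line avoids reading the variant records, which matters for whole-genome VCFs.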

If your caller has produced triploid (or higher-ploidy) calls, then you will likely also want --squash-ploidy.

Finally, if you want to compare with a population database of variants (e.g. COSMIC), then you should use that as your baseline file and specify --sample ALT,TUMOR to allow matching against any of the alleles in that database.
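The cases above differ only in the baseline file, the --sample pair, and whether --squash-ploidy is needed, so they can be sketched as one wrapper. This is a hypothetical helper, not part of RTG: it assumes bash, reuses referencegenome.sdf from the original question, and uses only vcfeval flags that appear in this thread. Setting DRY_RUN prints the command instead of running it. cosmic.vcf.gz in the usage comments is a placeholder file name.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of RTG) around rtg vcfeval.
#   $1 baseline VCF   $2 calls VCF   $3 BASELINE_SAMPLE,CALLS_SAMPLE   $4 output dir
# Set SQUASH_PLOIDY=1 to add --squash-ploidy; set DRY_RUN=1 to only print the command.
run_vcfeval() {
  local baseline=$1 calls=$2 samples=$3 out=$4
  local -a cmd=(rtg vcfeval -t referencegenome.sdf
                -b "$baseline" -c "$calls" --sample "$samples" -o "$out")
  # Only add --squash-ploidy when requested, e.g. for triploid or higher calls.
  if [ -n "${SQUASH_PLOIDY:-}" ]; then
    cmd+=(--squash-ploidy)
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"   # print the assembled command instead of running it
  else
    "${cmd[@]}"
  fi
}

# Example usage:
#   Normal vs tumour columns in the same VCF:
#     run_vcfeval calledvariants.vcf.gz calledvariants.vcf.gz NORMAL,TUMOR results.somatic
#   Population database (e.g. COSMIC) as baseline, allele-based matching:
#     run_vcfeval cosmic.vcf.gz calledvariants.vcf.gz ALT,TUMOR results.cosmic
```

Building the command as a bash array keeps file names with spaces intact, and the dry-run switch makes it easy to check the exact invocation before a long whole-genome run.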

Sean.



Amjad Ali

Jan 2, 2023, 8:26:51 AM
to RTG Users, Sean A. Irvine, Amjad Ali
Thank you Sean,

This is really useful to know. 

My caller reported both the normal and tumour variants, so I will specify both samples.

One more quick question: is there a benchmark file containing both small variants and structural variants, or do we have to analyse these separately?

Thank you

Sean Irvine

Jan 3, 2023, 2:24:36 PM
to Amjad Ali, RTG Users
Hi,

The vcfeval command only deals with small variants (i.e., cases which are fully resolved into nucleotides).

The evaluation of structural variants is harder for a variety of reasons (lack of well-characterized truth sets, differences in representation, lack of precision in the extent of such events, etc.). There are various open-source projects that can perform certain comparisons (SURVIVOR, Truvari, etc.), but they will all need an appropriate truth file.
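Because vcfeval only handles variants written out as explicit nucleotides, one way to prepare a mixed VCF is to separate records with symbolic or breakend ALT alleles (for example <DEL> or G]17:198982]) from the fully resolved ones, then run vcfeval on the resolved set only. The sketch below is an illustration under stated assumptions, not an RTG tool: split_resolved_vs_symbolic is a hypothetical name, it assumes a gzip/bgzip-compressed input, and it classifies records purely by how the ALT column is written, so a large deletion spelled out base by base still lands in the small-variant file. The outputs are uncompressed VCFs and would probably need to be recompressed with bgzip and indexed (e.g. tabix -p vcf) before being passed to vcfeval.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of RTG): split a gzip/bgzip-compressed VCF into
#   $2: records whose ALT alleles are fully resolved nucleotides
#   $3: records with symbolic or breakend ALT alleles
# Header lines are copied to both outputs.
split_resolved_vs_symbolic() {
  local in_vcf=$1 small_out=$2 sv_out=$3
  gzip -dc "$in_vcf" | awk -F'\t' -v small="$small_out" -v sv="$sv_out" '
    /^#/ { print > small; print > sv; next }
    # Resolved: "." or comma-separated alleles made of A/C/G/T/N (or the * allele).
    $5 == "." || $5 ~ /^[ACGTNacgtn*]+(,[ACGTNacgtn*]+)*$/ { print > small; next }
    # Everything else, e.g. <DEL>, <DUP:TANDEM>, G]17:198982] or G. breakends.
    { print > sv }
  '
}

# Example usage:
#   split_resolved_vs_symbolic calledvariants.vcf.gz small.vcf sv.vcf
```

Splitting on the ALT representation mirrors the distinction drawn above ("fully resolved into nucleotides"), rather than an arbitrary size cutoff, which would need a separately chosen threshold.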

Sean.
