SVProps Output Header Definitions?


David McBay

Feb 23, 2018, 10:53:37 AM
to delly-users

Hi, we're trying to understand the svprops output but can't find the column definitions anywhere. These are our best guesses. Can somebody please correct us and fill in the gaps?


chr chromosome
start start position
chr2 chromosome 2 (if applicable)
end end position
id unique SV id
size size of event in bp
vac variant allele count (reads)
vaf variant allele frequency
singleton N/A if event present in multiple samples or sample name if only present in one
missingrate ?
svtype type of SV (DEL, INS, INV, BND, DUP)
ct ?
precise 1 if event is included in split read?
ci ?
inslen insertion length?
homlen homology length?
ce ?
refgq reference genotype quality?
altgq alternative genotype quality?
gqsum ?
rdratio ?
medianrc ?
refratio ?
altratio ?
maxaltratio ?
PEsupport paired ends
SRsupport split reads
supportsum ?



Cheers, 
David

Tobias Rausch

Apr 6, 2018, 4:40:13 AM
to David McBay, delly-users
Hi David,

The primary purpose of svprops is to create a feature matrix for machine-learning approaches that filter germline SVs. For very large cohorts, such as 1000 Genomes or some of the large clinical resequencing cohorts, this is a viable alternative to delly's default germline SV filter if and only if a good training set is available.
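
To make the "feature matrix" idea concrete, here is a minimal Python sketch that turns a few svprops columns into numeric rows. It assumes the output is tab-delimited with a header line that uses the column names discussed in this thread. The two records in EXAMPLE are made up for illustration, and load_feature_matrix is a hypothetical helper, not part of delly.

```python
import csv
import io

# Made-up excerpt of svprops output; assumed to be tab-delimited with a header row.
EXAMPLE = (
    "chr\tstart\tend\tid\tsize\tvaf\tsvtype\tprecise\n"
    "chr1\t1000\t2000\tDEL0001\t1000\t0.25\tDEL\t1\n"
    "chr2\t5000\t5300\tDUP0002\t300\t0.01\tDUP\t0\n"
)


def load_feature_matrix(handle, numeric_columns):
    """Return (ids, rows); each row holds the chosen columns converted to float."""
    reader = csv.DictReader(handle, delimiter="\t")
    ids, rows = [], []
    for record in reader:
        ids.append(record["id"])
        rows.append([float(record[name]) for name in numeric_columns])
    return ids, rows


ids, matrix = load_feature_matrix(io.StringIO(EXAMPLE), ["size", "vaf", "precise"])
print(ids)
print(matrix)
```

With a real file you would pass an open file handle instead of io.StringIO, and pair each row with a label from your training set.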

Thus, svprops mostly generates site statistics across all samples. For such large cohorts we usually first recalibrate the GQs using:


./src/gq -g 30 -v output.vcf.gz input.vcf.gz

... then the output additionally includes HWE, inbreeding coefficients and some other metrics.
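
For readers unfamiliar with these metrics, the sketch below shows the textbook inbreeding coefficient computed from genotype counts at one site. This is only the generic definition, F = 1 - (observed heterozygotes / heterozygotes expected under HWE). Whether the tool uses exactly this estimator is an assumption, and inbreeding_coefficient is a hypothetical helper name.

```python
def inbreeding_coefficient(n_ref, n_het, n_alt):
    """Textbook F = 1 - observed het / expected het under Hardy-Weinberg.

    n_ref, n_het, n_alt are the numbers of 0/0, 0/1 and 1/1 samples.
    Returns None when F is undefined (no samples or a monomorphic site).
    """
    n = n_ref + n_het + n_alt
    if n == 0:
        return None
    p = (2 * n_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p                          # alternative allele frequency
    expected_het = 2 * p * q * n
    if expected_het == 0:
        return None
    return 1 - n_het / expected_het


print(inbreeding_coefficient(25, 50, 25))  # site in perfect HWE
print(inbreeding_coefficient(50, 0, 50))   # no heterozygotes at all
```

A value near 0 is consistent with HWE. Strongly negative values (heterozygote excess) often point to genotyping artifacts at a site.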

I am still updating some of the output fields of svprops but here is a brief explanation:


vac variant allele count  (across all samples)
vaf variant allele frequency (across all samples)
singleton N/A if present in multiple samples or sample name if only present in one
missingrate how many samples have a missing genotype (useful after GQ calibration)
ct Delly's INFO:CT
precise 1 if INFO:PRECISE
ci Delly's INFO:CIPOS
inslen insertion length
homlen homology length
ce Delly's INFO:CE 
refgq median reference genotype quality for all SV non-carriers (GT==0/0)
altgq median alternative genotype quality for het. SV carriers
gqsum total GQ sum (useful to flag repetitive sites that are poorly genotyped)
rdratio read-depth ratio of SV carriers to non-carriers (useful for filtering CNVs)
medianrc median coverage 
refratio median REF support for non-carriers
altratio median ALT support for carriers
maxaltratio max. ALT support for carriers
PEsupport total paired end support across all samples
SRsupport total split-read support across all samples
supportsum total depth across all samples
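
As a rough illustration of the first few site statistics above, here is a Python sketch that derives vac, vaf, singleton and missingrate from per-sample genotypes. It rests on several assumptions that are not confirmed in this thread: genotypes are diploid and biallelic, vaf is computed over called genotypes only, missingrate is a fraction rather than a count, and "N/A" is the placeholder string. site_stats is a hypothetical helper, not svprops code.

```python
def site_stats(genotypes):
    """genotypes: dict of sample name -> GT string ("0/0", "0/1", "1/1" or "./.")."""
    # Samples with a missing genotype are excluded from the allele counts.
    called = {s: gt for s, gt in genotypes.items() if "." not in gt}
    carriers = [s for s, gt in called.items() if "1" in gt]
    vac = sum(gt.count("1") for gt in called.values())
    n_alleles = 2 * len(called)  # assumes diploid genotypes
    n_missing = len(genotypes) - len(called)
    return {
        "vac": vac,
        "vaf": vac / n_alleles if n_alleles else 0.0,
        "singleton": carriers[0] if len(carriers) == 1 else "N/A",
        "missingrate": n_missing / len(genotypes) if genotypes else 0.0,
    }


stats = site_stats({"S1": "0/0", "S2": "0/1", "S3": "./.", "S4": "0/0"})
print(stats)
```

In this toy site only S2 carries the SV, so it is reported as the singleton, and one of four samples is missing.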


Best, Tobias


