hard filtered variants in split directory pass file


Jonathan Mitchell

Mar 20, 2018, 12:00:40 PM
to GotCloud
If I understood the gotcloud wiki correctly, the variants which fail a hard filter, and so are not marked as PASS in the chr*.hardfiltered.vcf.gz file (and the chr*.hardfiltered.sites.vcf file), should not appear in the chr*.filtered.PASS.vcf.gz file in the split directory.

However, I've run gotcloud snpcall (without errors) and the only variants which have been filtered out in the split directory vcf are due to svm. All the variants which do not PASS the hard filtering remain. If this behaviour is not expected, can you please suggest where I might be going wrong? If it is expected, can you advise how to run snpcall so that the hard-filtered variants don't appear as PASS in the split directory vcf?

Thanks for your help.

Mary Kate Wing

Mar 25, 2018, 11:48:13 PM
to Jonathan Mitchell, GotCloud
It is a little confusing, but it is expected that some variants that fail hard filters will end up as PASS after SVM filtering.  The others will be relabeled as failing SVM.
Let me try to explain.

GotCloud has a 2-step filtering approach.  
First, it uses the "hard" filters to identify sites that fail a set of thresholds. Since it is difficult to calibrate these thresholds to best identify "failed" sites, we follow up the hard filter step with SVM filtering.

The SVM filter treats the sites that fail multiple hard filters as likely false positives (negative examples) and combines those with some external information for positive examples to train itself to best identify which sites should be marked as PASS and which sites fail.

So the idea is that just because a site fails a single hard filter that is difficult to perfectly calibrate, it may not actually be a failure. The SVM filter re-marks each site as either PASS or as failing SVM based on all of the inputs it receives (from both external reference files and from the hard filtering information).
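
To make the two steps a bit more concrete, here is a rough Python sketch of the labeling logic as described above. It is for illustration only and is not GotCloud's actual code: the function names, the `min_failed=2` reading of "multiple", the precedence given to external positives, the score cutoff, and the "SVM" label string are all assumptions.

```python
def training_label(hard_filters_failed, in_external_truth, min_failed=2):
    """Pick a training label for one site (hypothetical helper).

    +1   : site appears in the external positive set (assumed to take precedence)
    -1   : site fails at least `min_failed` hard filters (assumed meaning of "multiple")
    None : site is not used for training
    """
    if in_external_truth:
        return 1
    if len(hard_filters_failed) >= min_failed:
        return -1
    return None


def final_filter(svm_score, cutoff=0.0):
    """Final FILTER value after the SVM step (cutoff and label are assumptions).

    Note that the hard-filter names do not appear here: the final label
    depends only on the SVM score.
    """
    return "PASS" if svm_score >= cutoff else "SVM"
```

Under this sketch, a site that failed one hard filter is not a negative training example, and it ends up as PASS whenever its SVM score clears the cutoff.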

Does that help clarify how GotCloud works and why some of the hard filtered sites appear as PASS?

As for your situation, are all of the hard-filtered sites in the PASS file or just some of them? Those that are not in the PASS file would be labeled as failing SVM rather than with the hard filters they failed in the hard filter step.
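
One way to check this is a small script along the following lines. This is a sketch rather than a GotCloud tool: it assumes both files are standard tab-delimited VCFs with FILTER in the seventh column, and the helper names are made up for the example.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def site_filters(lines):
    """Map (CHROM, POS, REF, ALT) -> FILTER for every record line."""
    filters = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t", 7)
        filters[(fields[0], fields[1], fields[3], fields[4])] = fields[6]
    return filters


def hard_failed_in_pass(hard_lines, pass_lines):
    """Return (sites failing a hard filter, how many of those are PASS after SVM)."""
    hard = site_filters(hard_lines)
    failed = {site for site, flt in hard.items() if flt != "PASS"}
    final = site_filters(pass_lines)
    passed = {site for site, flt in final.items() if flt == "PASS"}
    return len(failed), len(failed & passed)
```

To use it, open one chromosome's hardfiltered sites VCF and the matching filtered PASS VCF with `open_vcf` and pass both file handles to `hard_failed_in_pass`. If the second number equals the first, every hard-filtered site ended up as PASS; if it is smaller, only some did.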

If you have further questions, please let me know.

Mary Kate Wing

The GotCloud publication contains a more in-depth explanation of the methods.  If you are interested, here is a link: http://genome.cshlp.org/content/early/2015/04/14/gr.176552.114.abstract

--
You received this message because you are subscribed to the Google Groups "GotCloud" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gotcloud+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hyun Min Kang

Mar 26, 2018, 9:48:21 AM
to Mary Kate Wing, Jonathan Mitchell, GotCloud
The hard filter provides preliminary filtering results that are used for SVM filtering, and the SVM filtering results determine the final filtering outcome.
