Troubleshooting Germline Performance


nick.p...@personalis.com

Feb 20, 2018, 3:59:41 PM
to strelka-discuss
Hi, 

I'm evaluating Strelka2 for small variant calling on WES data, and I'm having trouble replicating results from the literature. I'm comparing Strelka2 to both HaplotypeCaller and Sentieon on known SNVs and indels. BQSR has been applied to the mapped reads, but no indel realignment has been performed.

I'm configuring the Strelka workflow with the `--exome` flag enabled. When I run the workflow, I obtain results that are approximately correct but fall short of both HaplotypeCaller and Sentieon. For example (~ marks an approximate value):

SNV
                   Sensitivity   Specificity
HaplotypeCaller:   ~.99          ~1
Sentieon:          ~.99          ~1
Strelka2:           .971         ~1


Indel
                   Sensitivity   Specificity
HaplotypeCaller:   ~.92          ~1
Sentieon:          ~.92          ~1
Strelka2:           .874         ~1

My results with Strelka2 are not completely incorrect, but they are not in the ballpark of published results, which appear to meet or exceed the performance of tools like HaplotypeCaller. Is this performance expected from Strelka2 on exome data? If not, are there any troubleshooting steps I could try to improve performance?

Thanks, 

Nick 

Saunders, Chris

Feb 23, 2018, 1:21:15 PM
to strelka...@googlegroups.com

Hi Nick,

 

Sorry for the delay. Note that we are better able to support queries through issues filed on the GitHub repo here:

 

https://github.com/Illumina/strelka/issues

 

Regarding your question, strelka2 is optimized for whole genome sequencing, and all of our client projects are WGS. We do support exome but have put less optimization effort into this case, so your result is not necessarily surprising, although the indel recall you report is certainly a bit lower than expected. It is hard to tell whether this is just a difference in PASS levels, because your message reports only specificity rather than precision, so the corresponding difference in FP counts between these tools is not obvious.
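To illustrate why specificity hides this, here is a minimal sketch computing recall, specificity and precision from raw counts. All counts are hypothetical and chosen only for illustration; they are not taken from this thread, and the caller names are placeholders. Because true negatives number in the tens of millions for an exome target, specificity rounds to ~1 for both callers even when their FP counts differ tenfold, while precision makes the difference visible.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return (recall, specificity, precision) from raw confusion-matrix counts."""
    recall = tp / (tp + fn)        # a.k.a. sensitivity
    specificity = tn / (tn + fp)   # dominated by the huge TN count
    precision = tp / (tp + fp)     # sensitive to the FP count
    return recall, specificity, precision


# Hypothetical counts for illustration only (assumptions, not real results).
TRUTH_VARIANTS = 40_000          # assumed number of truth-set variants
TARGET_NEGATIVES = 30_000_000    # assumed non-variant positions in the target

for name, tp, fp in [("caller_A", 39_600, 2_000), ("caller_B", 38_840, 200)]:
    fn = TRUTH_VARIANTS - tp
    tn = TARGET_NEGATIVES - fp
    recall, spec, prec = confusion_metrics(tp, fp, fn, tn)
    print(f"{name}: recall={recall:.3f} specificity={spec:.5f} precision={prec:.3f}")
```

Reporting precision (or the raw FP counts) alongside recall for each tool would show whether the recall gap is simply the result of a stricter PASS threshold.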

 

To optimize for exome, we would recommend training a scoring model specifically for the exome case. The scoring model training procedure for strelka2 is described here:

 

https://github.com/Illumina/strelka/blob/master/docs/userGuide/trainingGermlineEmpiricalScore.md

 

…we’ve done some quick prototypes on this for exome and seen good results, but when you use the `--exome` flag today for germline analysis, strelka disables its rescoring model and reverts to simple hard filters.
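One way to see how the hard filters affect a given run is to tally the FILTER column of the output VCF. The sketch below is a generic VCF reader, not part of strelka; the function name `tally_filters` is hypothetical, and the example path assumes the usual germline output location under the run directory.

```python
import gzip
from collections import Counter


def tally_filters(vcf_path):
    """Count FILTER values (VCF column 7) over all records in a VCF file."""
    # bgzip-compressed VCFs are readable with the standard gzip module.
    opener = gzip.open if vcf_path.endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 7:
                continue  # skip malformed or empty lines
            # A record can carry several semicolon-separated filters.
            for name in fields[6].split(";"):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Example (assumed path): python tally_filters.py results/variants/variants.vcf.gz
    if len(sys.argv) > 1:
        for name, n in tally_filters(sys.argv[1]).most_common():
            print(f"{name}\t{n}")
```

If a large share of truth-set variants appears in the VCF with a non-PASS filter, the recall gap is more likely a filtering threshold effect than a failure to detect the variants.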

 

Hope that helps,

 

-Chris


