STR genotyping cannot handle "N" bases



koh...@uw.edu


Aug 3, 2018, 7:26:40 PM

to lobSTR user group

I posted this on GitHub a few weeks ago, so I'm reposting it here to see if anyone has any ideas.

lobSTR seems to mis-genotype reads (or at least some reads) in which the STR (in this case, a poly-G region) has an N in the middle. Is there any way to fix or change this behavior?

Example:

The following read was given to lobSTR (the STR is the CCCCCCCCNGCCCC run near the middle of the read):
TCTCGGCATCAACATCCAGAGTTTAGGGACCATGTCCCAGTCTCTGTGAGGTGGATGGGAAGTCAACATTAGTTGACTGAGCACCACCTGCGTGGAAGATGCAGCCCCCCCCNGCCCCATCACTGGGAATACAGTGCTGAGCAGGACAGCACCTGATGTGCGAGGGGGAAGACAGACAACAAATACATAAGCAATGGAATGTACCTTTGGCAGGCCGAT
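To make the expected call concrete, here is a minimal sketch (not lobSTR's algorithm) that scans the read for the longest stretch of a mononucleotide motif, treating N as matching the motif and tolerating a small number of interrupting bases. The function name `longest_motif_run` and its `max_interruptions` parameter are hypothetical, chosen only for illustration; the motif `C` follows the `XR:Z:C` tag.

```python
# Hypothetical helper for illustration only; this is NOT how lobSTR genotypes reads.
READ = (
    "TCTCGGCATCAACATCCAGAGTTTAGGGACCATGTCCCAGTCTCTGTGAGGTGGATGGGAAGTCAACATTAGTTGACTG"
    "AGCACCACCTGCGTGGAAGATGCAGCCCCCCCCNGCCCCATCACTGGGAATACAGTGCTGAGCAGGACAGCACCTGAT"
    "GTGCGAGGGGGAAGACAGACAACAAATACATAAGCAATGGAATGTACCTTTGGCAGGCCGAT"
)


def longest_motif_run(read: str, motif: str = "C", max_interruptions: int = 1) -> str:
    """Return the longest substring that starts and ends with `motif`,
    counts 'N' as matching the motif, and contains at most
    `max_interruptions` other bases. The first longest run wins ties."""
    best = ""
    for start in range(len(read)):
        if read[start] != motif:
            continue  # a run must begin with the motif base itself
        interruptions = 0
        for end in range(start, len(read)):
            base = read[end]
            if base != motif and base != "N":
                interruptions += 1  # N is a wildcard; anything else interrupts
                if interruptions > max_interruptions:
                    break
            # Only accept windows that end on a real motif base.
            if base == motif and end - start + 1 > len(best):
                best = read[start:end + 1]
    return best


print(longest_motif_run(READ))                        # CCCCCCCCNGCCCC
print(longest_motif_run(READ, max_interruptions=0))   # CCCCCCCC
```

With `max_interruptions=0` the scan stops at the G after the N and returns only the first eight Cs, which shows how an allele can be truncated when the interruption is not tolerated.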

The tags attached to the read by lobSTR afterwards are:

XS:i:46990694
XE:i:46990706
XR:Z:C
XD:i:-5
XC:f:13
XG:Z:CCCCCCC
XX:i:1
XM:i:-1
XQ:i:41
RG:Z:lobSTR;s66;spike_in
NM:i:7

I believe this read is incorrectly genotyped; the tags should be:
XS:i:46990694
XE:i:46990706
XR:Z:C
XD:i:1
XC:f:13
XG:Z:CCCCCCCCNGCCCC
...
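The tags above are SAM optional fields in `TAG:TYPE:VALUE` form (`i` integer, `f` float, `Z` string). The sketch below, assuming only that format, parses lobSTR's tags and the proposed ones side by side so the called alleles can be compared. `parse_sam_tags` is a hypothetical helper, not part of lobSTR or any library.

```python
# Hypothetical helper: parses SAM-style optional fields (TAG:TYPE:VALUE).
def parse_sam_tags(fields):
    """Convert fields like 'XS:i:46990694' into {'XS': 46990694}.
    Only the i, f and Z types that appear in this post are handled."""
    converters = {"i": int, "f": float, "Z": str}
    tags = {}
    for field in fields:
        tag, typ, value = field.strip().split(":", 2)
        tags[tag] = converters[typ](value)
    return tags


lobstr = parse_sam_tags("""
XS:i:46990694 XE:i:46990706 XR:Z:C XD:i:-5 XC:f:13 XG:Z:CCCCCCC
XX:i:1 XM:i:-1 XQ:i:41 RG:Z:lobSTR;s66;spike_in NM:i:7
""".split())

expected = parse_sam_tags("""
XS:i:46990694 XE:i:46990706 XR:Z:C XD:i:1 XC:f:13 XG:Z:CCCCCCCCNGCCCC
""".split())

# Compare the called allele (XG) and its length difference (XD).
for name, tags in (("lobSTR", lobstr), ("expected", expected)):
    print(name, tags["XG"], len(tags["XG"]), tags["XD"])
```

Run as-is, this prints a 7-base allele for lobSTR's call and a 14-base allele, containing the N, for the proposed call.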

Is there an easy way to fix this behavior?

Note that we can't use HipSTR for this application: we need a genotype per read, and HipSTR does not give us the level of detail our studies require (effectively, pooled populations of cells of unknown population size).