SNPs during alignment?

Roman Hillje

Nov 25, 2017, 5:24:52 AM

to rna-star

Hi Alex,

I wanted to know whether STAR can be supplied with a list of genomic positions (such as SNPs) that will be ignored during alignment. This would be helpful because we are looking at SNPs that we know are not wild-type, and I suspect that not ignoring those positions during alignment will introduce a bias: each SNP is essentially a mismatch in addition to the ones introduced by sequencing. Therefore, reads containing the SNP are less likely to be uniquely aligned, right?

It would be great to hear back from you about this. Maybe this feature already exists and I completely missed it? Or maybe my concept is faulty and it works completely differently.

Thanks,

Roman

Alexander Dobin

Nov 27, 2017, 9:54:04 PM

to rna-star

Hi Roman,

I am not sure I fully understand the problem. Unless you have a diploid personal genome, there are always variants that differentiate the specific individual from the reference genome sequence.

Cheers

Alex

Roman Hillje

Dec 1, 2017, 7:57:33 AM

to rna-star

Yes, you're right that this is another issue. However, usually this can be ignored since it applies to all reads and is more or less random, right?

However, in scRNAseq analysis, assume that you have two cell populations, one that has a SNP and then another which doesn't. Wouldn't that introduce a bias towards discarding reads of the cell population that has the variant versus the population that agrees with the reference? With a mismatch threshold of 1 bp, we are essentially expecting all reads from the first cell population (variant) to have no technical errors whereas the other cells (wildtype) can have one technical error and still pass the filtering. Or am I missing something?
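To put rough numbers on this concern, here is a minimal sketch. It assumes independent per-base sequencing errors, and the read length (100 bp), error rate (0.5%) and the function name `pass_probability` are illustrative choices, not values from STAR or from this thread. It also ignores the rare case where a sequencing error at the SNP position restores the reference base.

```python
from math import comb


def pass_probability(read_len, err_rate, max_mismatch, fixed_mismatches=0):
    """Probability that a read passes a filter allowing at most `max_mismatch`
    mismatches, when `fixed_mismatches` positions already differ from the
    reference because of known variants (hypothetical helper)."""
    budget = max_mismatch - fixed_mismatches  # mismatches left for sequencing errors
    if budget < 0:
        return 0.0
    free = read_len - fixed_mismatches  # positions where an error adds a mismatch
    # Binomial tail: at most `budget` errors among the `free` positions.
    return sum(
        comb(free, k) * err_rate**k * (1 - err_rate) ** (free - k)
        for k in range(budget + 1)
    )


# Assumed values for illustration only.
wildtype = pass_probability(read_len=100, err_rate=0.005, max_mismatch=1)
variant = pass_probability(read_len=100, err_rate=0.005, max_mismatch=1,
                           fixed_mismatches=1)
print(f"wild-type reads passing: {wildtype:.3f}")
print(f"variant reads passing:   {variant:.3f}")
```

Under these assumptions the variant-carrying reads pass less often than the wild-type reads, which is the asymmetry described above.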

Alexander Dobin

Dec 3, 2017, 2:58:18 PM

to rna-star

Hi Roman,

I understand the question now. It is similar to the "reference bias" in the calculation of allele-specific expression.
I think the best approach to deal with it is to map simultaneously to all haplotypes, which you can construct by adding the SNPs to the reference sequence. Then unique mappers will be haplotype specific, but you will need to deal more carefully with the multi-mappers to both haplotypes.
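A minimal sketch of constructing the alternative haplotype by adding SNPs to the reference sequence. The helper `apply_snps`, the toy sequence, the `_ref`/`_alt` naming and the 0-based coordinates are all assumptions made for illustration; a real workflow would read the sequences from a FASTA file and the variants from a VCF.

```python
def apply_snps(sequence, snps):
    """Return a copy of `sequence` with each SNP's alternative base substituted.

    `snps` is an iterable of (position, ref_base, alt_base) with 0-based
    positions (hypothetical input format)."""
    bases = list(sequence)
    for pos, ref, alt in snps:
        # Guard against coordinate mix-ups: the reference base must match.
        if bases[pos].upper() != ref.upper():
            raise ValueError(f"reference base mismatch at position {pos}")
        bases[pos] = alt
    return "".join(bases)


# Toy data for illustration only.
reference = {"chr1": "ACGTACGTAC"}
snps = {"chr1": [(3, "T", "G")]}

# Keep both haplotypes so reads can be mapped to them simultaneously.
combined = {}
for name, seq in reference.items():
    combined[f"{name}_ref"] = seq
    combined[f"{name}_alt"] = apply_snps(seq, snps.get(name, []))

for name, seq in combined.items():
    print(f">{name}\n{seq}")
```

With both haplotypes in one index, a read that covers no SNP matches both copies equally well, which is why the multi-mappers need the extra care mentioned above.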

There are other approaches to deal with it without changing the reference. One is to replace the SNPs with Ns, and another, my favorite, is WASP (https://www.nature.com/articles/nmeth.3582), in which reads that overlap SNP loci are remapped with the alternative allele substituted.
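The two ideas can be sketched as follows. This is a simplified illustration, not the WASP implementation: the helpers `mask_snps` and `flip_alleles` are hypothetical, coordinates are assumed to be 0-based, and reads are assumed to align without gaps or splices.

```python
def mask_snps(sequence, positions):
    """Replace the base at each SNP position with N (0-based positions)."""
    bases = list(sequence)
    for pos in positions:
        bases[pos] = "N"
    return "".join(bases)


def flip_alleles(read_seq, read_start, snps):
    """Swap ref <-> alt at every SNP the read overlaps.

    `snps` maps position -> (ref_base, alt_base). Returns the flipped read,
    or None if the read carries no known allele at any overlapped SNP.
    Assumes an ungapped alignment starting at `read_start`."""
    bases = list(read_seq)
    changed = False
    for offset, base in enumerate(bases):
        alleles = snps.get(read_start + offset)
        if alleles is None:
            continue
        ref, alt = alleles
        if base == ref:
            bases[offset] = alt
            changed = True
        elif base == alt:
            bases[offset] = ref
            changed = True
    return "".join(bases) if changed else None


# Toy data for illustration only.
snps = {3: ("T", "G")}
print(mask_snps("ACGTACGTAC", snps))   # reference with the SNP masked
print(flip_alleles("GTAC", 2, snps))   # read carrying the reference allele
print(flip_alleles("GGAC", 2, snps))   # read carrying the alternative allele
```

In the remapping approach, the flipped read would then be aligned again and the original read kept only if the flipped version maps to the same location, so that both alleles are treated symmetrically.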

Cheers

Alex

Roman Hillje

Dec 11, 2017, 10:05:26 AM

to rna-star

Thank you for the feedback, Alex! I'll look into these options.

Keep up the good work!

Cheers,

Roman
