Large difference in mismatch rate per base between samples from the same experiment

Nathanael Walker-Hale

May 4, 2022, 5:53:03 PM

to rna-star

Hi Alex,

I'm using STAR to map plant RNA-seq data from a time course experiment with 5 time points, with control and treatment samples from each time point. I've observed that some samples (the first 26) have what seem to me to be concerning mismatch rates, e.g.

However, a little over halfway through the samples (samples 27-44), the mismatch rate improves dramatically, e.g.

I also noticed that for these later samples, the average mapped length is a little closer to the average input read length, and the proportion of uniquely mapped reads is perhaps very slightly higher, which suggests higher quality.
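To put numbers on this comparison across all 44 samples, the per-sample statistics mentioned above can be pulled out of STAR's Log.final.out files into one table. Below is a minimal, dependency-free Python sketch. It assumes each sample was run with its own --outFileNamePrefix so that every log ends in "Log.final.out" inside a single directory; that layout and the helper names (parse_star_log, to_number, summarize) are hypothetical, not part of STAR.

```python
from pathlib import Path

# Labels as they appear to the left of the '|' in STAR's Log.final.out.
FIELDS = [
    "Average input read length",
    "Average mapped length",
    "Uniquely mapped reads %",
    "Mismatch rate per base, %",
]

def parse_star_log(text):
    """Parse the 'label | value' lines of a STAR Log.final.out into a dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" have no '|'
        label, value = line.split("|", 1)
        stats[label.strip()] = value.strip()
    return stats

def to_number(value):
    """Convert strings such as '0.32%' or '148.76' to a float."""
    return float(value.rstrip("%"))

def summarize(log_dir):
    """Print one tab-separated row per sample (assumes *Log.final.out naming)."""
    print("sample", *FIELDS, sep="\t")
    for path in sorted(Path(log_dir).glob("*Log.final.out")):
        stats = parse_star_log(path.read_text())
        row = [to_number(stats[field]) for field in FIELDS]
        print(path.name.replace("Log.final.out", ""), *row, sep="\t")
```

For example, summarize("star_logs") (hypothetical directory) prints one row per sample, which can then be plotted against extraction day or sample number. Files are sorted by name, so zero-padded sample numbers (sample01, sample02, ...) keep the rows in order.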

I did all the extractions myself over about 5 days, so some samples stayed at -80 °C longer than others. However, if this were the issue, I would expect the later extractions to have higher mismatch rates, and this is not what I see. Our sequencing provider (BGI) is supposed to have processed them all together to avoid batch effects. My questions are: i) do you think this indicates a batch effect related to library quality, and ii) are the elevated mismatch rates of the first 26 samples a concern for alignment quality and read quantification for DEG analysis?

Many thanks for the help,

Best wishes,

Nathanael

Alexander Dobin

May 13, 2022, 2:32:58 PM5/13/22

to rna-star

Hi Nathanael,

A higher mismatch rate could indicate poorer sequencing quality or library prep issues, but, in principle, it could also be biological in nature.

You can look at the quality score distributions to see whether the libraries with the higher mismatch rate have lower scores, i.e. poorer sequencing quality.
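A minimal Python sketch of this check, assuming standard 4-line FASTQ records with Phred+33 quality encoding (usual for current BGI and Illumina output, but worth confirming with the provider). It computes the mean quality at each read position and the fraction of bases below Q20; the Q20 threshold is an arbitrary choice and the function names are hypothetical. Tools such as FastQC produce these plots more thoroughly, so this only illustrates what is being compared.

```python
import gzip

def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)

def quality_summary(lines, offset=33, low_q=20):
    """Return (mean quality at each read position, fraction of bases below low_q).

    Assumes 4-line FASTQ records and Phred+33 encoding (offset=33).
    """
    pos_sum, pos_n = [], []
    low = total = 0
    for i, line in enumerate(lines):
        if i % 4 != 3:  # only the 4th line of each record holds qualities
            continue
        for pos, ch in enumerate(line.rstrip("\n")):
            q = ord(ch) - offset
            if pos == len(pos_sum):  # longest read so far: extend the arrays
                pos_sum.append(0)
                pos_n.append(0)
            pos_sum[pos] += q
            pos_n[pos] += 1
            total += 1
            if q < low_q:
                low += 1
    per_position = [s / n for s, n in zip(pos_sum, pos_n)]
    return per_position, (low / total if total else 0.0)
```

For example, with open_fastq("sample01_1.fq.gz") as fh: per_pos, frac_low = quality_summary(fh) (hypothetical file name). To keep it fast on large files, pass itertools.islice(fh, 4_000_000) to look at only the first million reads. Comparing frac_low and the per-position curves for a few samples from 1-26 versus 27-44 would show whether the high-mismatch libraries also have lower base qualities.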

It's interesting that the other mapping statistics do not change significantly, which argues against poor sequencing quality as the cause.

Biological explanations require some imagination, e.g., some samples could have more expression from more variable regions of the genome.

Cheers

Alex