Jul 6, 2022, 7:21:04 AM
I have run into an issue and could really use some help. Some reads that are present in the raw read files disappear after alignment, i.e., I can't find the read identifier in the BAM, and I am also unable to find any pieces of those sequences (samtools view output.bam | grep "piece-of-that-sequence").
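For reference, here is a minimal sketch of that search, extended to also check the reverse complement. SAM/BAM stores the sequence of a read aligned to the reverse strand as its reverse complement, so a plain grep for the FASTQ sequence can miss such reads. The helper names `revcomp` and `count_hits` are hypothetical, and `output.bam` and the fragment are placeholders; it assumes samtools is on the PATH.

```shell
# Reverse-complement a DNA sequence.
# Reads aligned to the reverse strand appear reverse-complemented in SAM/BAM.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTNacgtn' 'TGCANtgcan'
}

# Count BAM records whose SEQ field (column 10) contains the fragment
# in either orientation. Hypothetical helper; assumes samtools is installed.
count_hits() {
    bam="$1"
    frag="$2"
    rc="$(revcomp "$frag")"
    samtools view "$bam" |
        awk -v a="$frag" -v b="$rc" \
            'index($10, a) || index($10, b) { n++ } END { print n + 0 }'
}

# Usage (placeholders, not run here):
# count_hits output.bam "piece-of-that-sequence"
```

If both orientations return zero, the reads are probably not in the BAM at all, rather than merely stored in the other orientation.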
Background: I have scRNA-seq data (Smart-seq2, paired-end, no spike-ins), aligned with STAR (2.7.10a). I was tracing why I had strong plate effects between some "replicates" but not others, so I went through the FastQC reports and found that the plates with issues had disproportionately many over-represented sequences. I grep-ed an arbitrary handful of these sequences in the raw FASTQ files, and they are indeed present. I then wanted to see what they end up mapping to in STAR, but I am unable to find these reads anywhere. Does STAR discard some reads without a trace? The final output log reports the correct number of input reads, but I still can't find these reads anywhere in the output.
I am hoping I am missing something obvious, but I can't figure out what.