Hi Alex (and anyone else),
I have a couple of questions about mapping single reads from paired-end data (for example, if one read maps more poorly or to a different locus, is too short after QC trimming, etc.). I've found three useful posts in this group, with slightly different options that I'm trying to reconcile (I've copied your responses below in quotes, out of context, for others' reference):
"you could get single-end alignments by reducing both --outFilterScoreMinOverLread --outFilterMatchNminOverLread to below 0.5, say to 0.4.
These parameters control the minimum read length normalized to the total read length (sum of the mates).
To output the unmapped read, you will need to use --outSAMunmapped Within , the unmapped mate will have the coordinate of the mapped one according to the standard SAM rules."
"By default, at least ~2/3 of the read length (sum of both mates) has to be mapped, which for untrimmed reads of equal length requires paired-end alignments.
However, if one of the reads is trimmed significantly before mapping, a single-end alignment may be allowed.
For instance, if you have 2x100 and the 2nd mate was trimmed to 20 bases, then an alignment longer than ~2/3*(100+20)=80 will be accepted.
To ensure that you only get paired alignments you can specify --outFilterMatchNmin <minMateLength+1> (e.g. 101 for 2x100)."
"you could get single-end alignments by reducing both --outFilterScoreMinOverLread --outFilterMatchNminOverLread to below 0.5, say to 0.4. This will work if your mates have the same length, if not, you need to set
--outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterScoreMin <minMateLength*0.8>"
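To check that I'm reading the quoted arithmetic correctly, here is a small Python sketch of the relative-length filter. It assumes the threshold is simply the ratio times the summed mate lengths, with 0.66 as the default ratio (the "~2/3" above). The function names are my own, and this is only my understanding of the rule, not STAR's actual code.

```python
def min_required_length(mate1_len, mate2_len, ratio=0.66):
    """Minimum matched bases needed, assuming threshold = ratio * (sum of mate lengths)."""
    return ratio * (mate1_len + mate2_len)


def single_end_can_pass(mapped_mate_len, mate1_len, mate2_len, ratio=0.66):
    """True if an alignment covering only one fully matched mate reaches the threshold."""
    return mapped_mate_len >= min_required_length(mate1_len, mate2_len, ratio)


# Untrimmed 2x100 at the default ratio: one mate alone is too short.
print(single_end_can_pass(100, 100, 100))             # False
# Second mate trimmed to 20 bases: the first mate alone can pass.
print(single_end_can_pass(100, 100, 20))              # True
# Ratio lowered to 0.4 for equal-length mates: one mate alone can pass.
print(single_end_can_pass(100, 100, 100, ratio=0.4))  # True
```

If this is right, it matches the first two quotes: the default needs both mates for untrimmed 2x100, while either heavy trimming of one mate or a ratio below 0.5 lets a single mate through.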
My experiment/processing steps: I have 2x101 HiSeq 2500 data from a polyA-enriched library using custom inline barcodes/oligo-dT adapters. R2 has the barcode and poly-T tract, which I force-trim from the left (using bbduk, to 78bp). R2 is, in general, of poorer quality than R1 (presumably due to poly-T tract interference on the sequencer), so I then quality-trim its right end to varying lengths (with bbduk, Q10). I trim R1 for read-through barcodes and poly-A tracts. This results in read pairs of different lengths, and some R2 reads of less than 25bp, which I would like to discard (ignore). However, I keep the two read files at the same number of lines and in the same order for paired-end analysis in STAR. Is there a way to ignore/filter a mate that might have a low mapping score and/or is too short?
In my case, would I want to use:
--outFilterScoreMinOverLread and --outFilterMatchNminOverLread ~0.4
--outFilterMatchNmin <minMateLength+1> (e.g. 79 for my trimmed R2) (in which case should --outFilterScoreMinOverLread and --outFilterMatchNminOverLread be 0 or 0.4?)
or
--outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMatchNmin <minMateLength*0.8> (e.g. 78*0.8≈62)?
or
--<none of the above, I don't understand what these mean>
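To make the comparison concrete for my read lengths, here is a rough Python sketch of how I currently understand the three candidate settings. It assumes an alignment passes only if its matched bases reach both the absolute minimum (--outFilterMatchNmin) and the relative minimum (ratio times the summed mate lengths), and it ignores the score filters entirely. The option labels and function names are my own shorthand, and the model may well be wrong, which is part of my question.

```python
# Shorthand for the three candidate settings listed above (labels are my own).
OPTIONS = {
    "A: OverLread 0.4": {"ratio": 0.4, "n_min": 0},
    "B: OverLread 0.4, MatchNmin 79": {"ratio": 0.4, "n_min": 79},
    "C: OverLread 0, MatchNmin 62": {"ratio": 0.0, "n_min": 62},
}


def passes(matched, r1_len, r2_len, ratio, n_min):
    """Assumed model: matched bases must reach both the absolute and relative minimum."""
    return matched >= n_min and matched >= ratio * (r1_len + r2_len)


def single_end_outcomes(r1_len, r2_len):
    """For each option, could R1 alone or R2 alone pass if that mate matched fully?"""
    return {
        name: {
            "R1_only": passes(r1_len, r1_len, r2_len, **opt),
            "R2_only": passes(r2_len, r1_len, r2_len, **opt),
        }
        for name, opt in OPTIONS.items()
    }


# A full-length pair (101 + 78) and a pair whose R2 was trimmed down to 25 bp.
for r2_len in (78, 25):
    print(f"R1=101, R2={r2_len}")
    for name, result in single_end_outcomes(101, r2_len).items():
        print(f"  {name}: {result}")
```

Under this simplified model, a 25bp R2 never passes on its own, while a fully matched 101bp R1 passes alone under all three settings, including the one with --outFilterMatchNmin 79.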
Thanks,
Anthony