--alignEndsType: "Local" versus "EndToEnd"


Shanrong Zhao

Dec 18, 2014, 5:08:26 PM

to rna-...@googlegroups.com

I want to disable soft clipping when running STAR by changing --alignEndsType from "Local" to "EndToEnd". I have two samples (from human whole blood) and asked BGI to do both stranded and non-stranded RNA sequencing for each sample, giving a total of 4 datasets. Below is the mapping summary from my analysis:

--alignEndsType="Local"

Sample             Total_reads  Uniq_Rate(%)  Multi_Rate(%)  Unmap_Rate(%)
00004_nonstranded  61617213     92.57         3.64           3.79
00005_nonstranded  63096109     93.02         3.53           3.45
00004_stranded     62893222     90.56         3.73           5.71
00005_stranded     61184208     90.17         3.66           6.17

--alignEndsType="EndToEnd"

Sample             Total_reads  Uniq_Rate(%)  Multi_Rate(%)  Unmap_Rate(%)
00004_nonstranded  61617213     89.44         3.40           7.16
00005_nonstranded  63096109     89.89         3.29           6.82
00004_stranded     62893222     85.53         3.37           11.10
00005_stranded     61184208     84.63         3.28           12.09

STAR itself is NOT strand-aware when mapping the reads to the reference genome. In general:

1. Stranded vs non-stranded sequencing: slightly more reads are uniquely mapped in non-stranded sequencing (accordingly, fewer reads are unmapped in non-stranded sequencing).

2. When "--alignEndsType" is changed from "Local" to "EndToEnd", the impact on stranded sequencing is much bigger than on non-stranded sequencing. For instance, for "uniquely mapped reads" in sample 00004:

a. Nonstranded: 92.57 (Local) → 89.44 (EndToEnd): ~3 percentage point difference

b. Stranded: 90.56 (Local) → 85.53 (EndToEnd): ~5 percentage point difference

Especially for the 2nd fact, I don't have a good explanation.
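To make the comparison easier to check across all four datasets, here is a minimal Python sketch that computes the change in each rate when going from Local to EndToEnd. The numbers are copied from the two summary tables; the names `rates` and `deltas` are just illustrative and not part of STAR.

```python
# Mapping rates (%) copied from the two summary tables in this post.
# Each tuple is (uniquely mapped, multi-mapped, unmapped).
rates = {
    "00004_nonstranded": {"Local": (92.57, 3.64, 3.79), "EndToEnd": (89.44, 3.40, 7.16)},
    "00005_nonstranded": {"Local": (93.02, 3.53, 3.45), "EndToEnd": (89.89, 3.29, 6.82)},
    "00004_stranded":    {"Local": (90.56, 3.73, 5.71), "EndToEnd": (85.53, 3.37, 11.10)},
    "00005_stranded":    {"Local": (90.17, 3.66, 6.17), "EndToEnd": (84.63, 3.28, 12.09)},
}


def deltas(sample):
    """Return EndToEnd minus Local, in percentage points, for each rate."""
    local = rates[sample]["Local"]
    end_to_end = rates[sample]["EndToEnd"]
    return tuple(round(e - l, 2) for l, e in zip(local, end_to_end))


print(f"{'Sample':<20}{'dUniq':>8}{'dMulti':>8}{'dUnmap':>8}")
for sample in rates:
    d_uniq, d_multi, d_unmap = deltas(sample)
    print(f"{sample:<20}{d_uniq:>8.2f}{d_multi:>8.2f}{d_unmap:>8.2f}")
```

With these numbers the uniquely mapped rate drops by about 3.1 points for both non-stranded datasets and by about 5.0 to 5.5 points for the stranded ones, and almost all of the loss shows up as unmapped reads rather than as multi-mapped reads.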

My command line looks like:

STAR --genomeDir /hpc/grid/shared/ngsdb/STAR/hg19_gencode19 \
  --readFilesIn /hpc/grid/ngsws/molmed/data/BGIpilotNov2014/CleanData/RC-140808-00004_stranded_1.fq.gz /hpc/grid/ngsws/molmed/data/BGIpilotNov2014/CleanData/RC-140808-00004_stranded_2.fq.gz \
  --readFilesCommand zcat \
  --runThreadN 8 \
  --alignSJDBoverhangMin 1 \
  --outReadsUnmapped Fastx \
  --alignEndsType EndToEnd \
  --outFilterMismatchNoverLmax 0.05 \
  --alignIntronMax 1000000 \
  --outSAMtype BAM SortedByCoordinate
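For anyone who wants to reproduce the comparison, here is a rough Python sketch that builds the same command for both settings so that the two runs differ only in --alignEndsType. The helper `star_command` and the output prefixes are hypothetical. The `--outFileNamePrefix` option is not in my original command; it is added here on the assumption that both runs are started from the same directory and should not overwrite each other's output. The sketch only prints the commands and does not run STAR.

```python
import shlex


def star_command(align_ends_type, read1, read2, genome_dir, prefix):
    """Build the STAR argument list from this post for one --alignEndsType value."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        "--readFilesCommand", "zcat",
        "--runThreadN", "8",
        "--alignSJDBoverhangMin", "1",
        "--outReadsUnmapped", "Fastx",
        "--alignEndsType", align_ends_type,
        "--outFilterMismatchNoverLmax", "0.05",
        "--alignIntronMax", "1000000",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # Assumption: separate prefixes keep the two runs' outputs apart.
        "--outFileNamePrefix", prefix,
    ]


genome_dir = "/hpc/grid/shared/ngsdb/STAR/hg19_gencode19"
data_dir = "/hpc/grid/ngsws/molmed/data/BGIpilotNov2014/CleanData"
sample = "RC-140808-00004_stranded"

commands = {
    mode: star_command(
        mode,
        f"{data_dir}/{sample}_1.fq.gz",
        f"{data_dir}/{sample}_2.fq.gz",
        genome_dir,
        f"{sample}_{mode}_",  # hypothetical output prefix
    )
    for mode in ("Local", "EndToEnd")
}

for mode, cmd in commands.items():
    # shlex.join quotes arguments safely for pasting into a shell.
    print(shlex.join(cmd))
```

Keeping the arguments as a list (rather than one long string) makes it easy to confirm that nothing but the end-alignment mode and the output prefix changes between the two runs.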
