SMART-SEQ scRNA alignment

User_new

Apr 23, 2020, 5:34:50 PM

to rna-star

I have SMART-Seq data generated as part of a single-cell experiment and used STAR to map the reads. After trimming the Nextera adapter sequences, a large fraction of the reads still did not map (this is not due to contamination).

STAR --runThreadN 8 --genomeDir $TRANS_DATA --readFilesIn <(gunzip -c ${names[${SLURM_ARRAY_TASK_ID}]}_R1_001_val_1.fq.gz) <(gunzip -c ${names[${SLURM_ARRAY_TASK_ID}]}_R2_001_val_2.fq.gz) --outSAMtype BAM SortedByCoordinate --outFileNamePrefix ${names[${SLURM_ARRAY_TASK_ID}]}_noextraparam --quantMode GeneCounts

Started job on | Apr 22 23:19:11

Started mapping on | Apr 22 23:19:39

Finished on | Apr 23 00:01:11

Mapping speed, Million of reads per hour | 38.02

Number of input reads | 26318495

Average input read length | 183

UNIQUE READS:

Uniquely mapped reads number | 8665045

Uniquely mapped reads % | 32.92%

Average mapped length | 178.45

Number of splices: Total | 37848

Number of splices: Annotated (sjdb) | 1071

Number of splices: GT/AG | 25908

Number of splices: GC/AG | 1456

Number of splices: AT/AC | 31

Number of splices: Non-canonical | 10453

Mismatch rate per base, % | 0.36%

Deletion rate per base | 0.06%

Deletion average length | 1.15

Insertion rate per base | 0.01%

Insertion average length | 1.18

MULTI-MAPPING READS:

Number of reads mapped to multiple loci | 291859

% of reads mapped to multiple loci | 1.11%

Number of reads mapped to too many loci | 45962

% of reads mapped to too many loci | 0.17%

UNMAPPED READS:

% of reads unmapped: too many mismatches | 0.00%

% of reads unmapped: too short | 65.42%

% of reads unmapped: other | 0.37%

CHIMERIC READS:

Number of chimeric reads | 0

% of chimeric reads | 0.00%

I went through a couple of posts and added the following parameters, which raised the uniquely mapped reads to 65%. How would this affect read counting? I would like to know before proceeding to downstream analysis.

STAR --runThreadN 8 --genomeDir $TRANS_DATA --readFilesIn <(gunzip -c ${names[${SLURM_ARRAY_TASK_ID}]}_R1_001_val_1.fq.gz) <(gunzip -c ${names[${SLURM_ARRAY_TASK_ID}]}_R2_001_val_2.fq.gz) --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMatchNmin 0 --outFilterMismatchNmax 2 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix ${names[${SLURM_ARRAY_TASK_ID}]}_extraparam --quantMode GeneCounts

Mapping speed, Million of reads per hour | 38.20

Number of input reads | 27081369

Average input read length | 182

UNIQUE READS:

Uniquely mapped reads number | 17671953

Uniquely mapped reads % | 65.26%

Average mapped length | 105.20

Number of splices: Total | 45787

Number of splices: Annotated (sjdb) | 1221

Number of splices: GT/AG | 17506

Number of splices: GC/AG | 2289

Number of splices: AT/AC | 93

Number of splices: Non-canonical | 25899

Mismatch rate per base, % | 0.55%

Deletion rate per base | 0.05%

Deletion average length | 1.16

Insertion rate per base | 0.01%

Insertion average length | 1.22

MULTI-MAPPING READS:

Number of reads mapped to multiple loci | 8788785

% of reads mapped to multiple loci | 32.45%

Number of reads mapped to too many loci | 502928

% of reads mapped to too many loci | 1.86%

UNMAPPED READS:

% of reads unmapped: too many mismatches | 0.00%

% of reads unmapped: too short | 0.00%

% of reads unmapped: other | 0.43%

CHIMERIC READS:

Number of chimeric reads | 0

% of chimeric reads | 0.00%

Alexander Dobin

Apr 27, 2020, 7:53:26 PM

to rna-star

Hi @User_new

In your 2nd run, you basically removed all "mapping quality" filtering, allowing alignments of any length.

The danger with such an approach is that many short alignments may be wrong, which may skew the quantification.
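To put a rough number on that, here is a small sketch of what the default filter would require. It assumes STAR's documented defaults of 0.66 for --outFilterScoreMinOverLread and --outFilterMatchNminOverLread, with the read length taken as the sum of both mates for paired-end data. The helper name min_matched_bases is made up for illustration.

```shell
# Hypothetical helper: minimum number of matched bases required by a
# relative filter such as --outFilterMatchNminOverLread.
#   $1: read length (sum of both mates for paired-end reads)
#   $2: fraction of the read length that must match (STAR default: 0.66)
min_matched_bases() {
    awk -v len="$1" -v frac="$2" 'BEGIN { printf "%.1f\n", len * frac }'
}

# Average input read lengths reported in the two Log.final.out files.
min_matched_bases 183 0.66   # prints 120.8
min_matched_bases 182 0.66   # prints 120.1
```

With the defaults, a read pair of about 182 bases would need roughly 120 matched bases to be reported. The average mapped length in the 2nd run is 105.20, which suggests that many of the newly accepted alignments are shorter than the default filter would allow.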

I think the best approach is to try to understand why the reads do not map. It looks like you checked for contamination.

Other probable causes are

(i) poor sequencing quality

(ii) presence of adapter sequences at the read ends - have you trimmed the adapter sequences before mapping?
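One quick way to check point (ii) is to count how many reads in the trimmed FASTQ files still contain the adapter. The sketch below assumes the Nextera transposase adapter starts with CTGTCTCTTATA (the prefix commonly used for Nextera trimming) and that the files are standard 4-line gzipped FASTQ. The function name count_adapter_reads is made up for illustration.

```shell
# Assumed Nextera adapter prefix; replace it if your kit uses another sequence.
NEXTERA_ADAPTER="CTGTCTCTTATA"

# Hypothetical helper: report how many reads contain the adapter anywhere.
#   $1: gzipped FASTQ file
#   $2: adapter sequence (optional, defaults to $NEXTERA_ADAPTER)
count_adapter_reads() {
    gunzip -c "$1" |
        awk -v a="${2:-$NEXTERA_ADAPTER}" '
            # Sequence lines are every 4th line, starting at line 2.
            NR % 4 == 2 { n++; if (index($0, a)) hits++ }
            END { printf "%d of %d reads contain %s\n", hits, n, a }'
}
```

For example, count_adapter_reads sample_R1_001_val_1.fq.gz (a placeholder file name) prints a single summary line. A large count after trimming would point to leftover adapter; this only finds exact matches, so it is a lower bound.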

Cheers

Alex

User_new

Apr 27, 2020, 11:38:03 PM

to rna-star

Yes, the Nextera sequences were trimmed before mapping. I may also check by mapping just one mate of each read pair on its own.
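A sketch of that single-mate check, reusing the options from the first run but passing only one FASTQ file to --readFilesIn. The function name map_single_mate, the DRY_RUN switch, and the example file names are made up for illustration; $TRANS_DATA is the genome directory variable from the commands above.

```shell
# Hypothetical wrapper: map one mate as single-end reads with default filters.
#   $1: gzipped FASTQ of one mate
#   $2: output file name prefix
# Set DRY_RUN=1 to print the STAR command instead of running it.
map_single_mate() {
    ${DRY_RUN:+echo} STAR --runThreadN 8 --genomeDir "$TRANS_DATA" \
        --readFilesIn "$1" --readFilesCommand gunzip -c \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix "$2" --quantMode GeneCounts
}
```

For example, map_single_mate sample_R1_001_val_1.fq.gz sample_R1_only_ would map read 1 alone. If each mate maps well on its own but the pair does not, the problem is more likely in the pairing than in the reads themselves.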

Alexander Dobin

Apr 30, 2020, 6:43:54 PM

to rna-star

That's a good check!
