Parameters and expected output using STAR for mapping microRNA


Yuquan Tong

Mar 30, 2021, 10:40:22 AM
to rna-star
Hi,

I found a 2013 email thread about using STAR to map small RNA-seq data and have tested it with my own dataset. Since the parameters from that email are rather old, I want to post them here to confirm whether I need to make any changes, and whether my outputs look good.

The parameters I used are below (partly from the previous email discussion and partly from the ENCODE project guidelines):

--sjdbGTFfile ~/STARfiles/GENCODE_miRNA_subset.gtf \
--readFilesCommand zcat \
--outFileNamePrefix ~/155micro/STARoutput/SA-463-01/463-01- \
--outSAMtype BAM SortedByCoordinate \
--outSAMattributes Standard \
--alignEndsType EndToEnd \
--outFilterMismatchNmax 1 \
--outFilterMultimapScoreRange 0 \
--quantMode TranscriptomeSAM GeneCounts \
--outReadsUnmapped Fastx \
--outFilterMultimapNmax 10 \
--outSAMunmapped Within \
--outFilterScoreMinOverLread 0 \
--outFilterMatchNminOverLread 0 \
--outFilterMatchNmin 16 \
--alignSJDBoverhangMin 1000 \
--alignIntronMax 1 

The input files were trimmed with cutadapt using --minimum-length 1.
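One thing worth checking with these settings: cutadapt keeps reads as short as 1 nt, while --outFilterMatchNmin 16 means a read shorter than 16 nt cannot reach the required number of matched bases, so such reads should end up unmapped. A minimal sketch for counting them is below. The function name short_read_fraction and the example file name are hypothetical, and it assumes a gzipped FASTQ with the standard four lines per record (consistent with --readFilesCommand zcat above).

```shell
# Hypothetical helper: report how many reads in a gzipped FASTQ are shorter
# than a threshold (default 16, to match --outFilterMatchNmin 16).
# Assumes a standard 4-line-per-record FASTQ.
short_read_fraction() {
    local fastq_gz="$1" min_len="${2:-16}"
    gzip -dc "$fastq_gz" | awk -v min="$min_len" '
        NR % 4 == 2 {                 # second line of each record is the sequence
            total++
            if (length($0) < min) short++
        }
        END {
            if (total == 0) { print "no reads found"; exit 1 }
            printf "%d of %d reads (%.1f%%) are shorter than %d nt\n",
                   short, total, 100 * short / total, min
        }'
}

# Example (file name is a placeholder):
#   short_read_fraction trimmed_sample.fastq.gz 16
```

If a large share of reads falls below the threshold, raising cutadapt's --minimum-length would mainly shrink the unmapped fraction reported by STAR rather than change which reads align.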

The Log.out report screenshot is below. 

Since this is my first time using STAR for small RNAs, may I ask whether my output looks reasonable? If not, are there any parameters I should tweak to improve it?

Thanks a lot!
Yuquan

Picture1.jpg



Alexander Dobin

Mar 31, 2021, 5:35:27 PM
to rna-star
Hi @yuquantong97

The parameters are still OK.
The results look good to me, with few unmappable reads and a nice proportion of unique mappers.
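For readers who want the same numbers without a screenshot, the unique, multi-mapper, and unmapped percentages can be pulled from STAR's Log.final.out summary. The sketch below assumes that file's usual "key | value" layout and that the screenshot statistics come from it. The function name star_mapping_summary is hypothetical, and the path in the example is abbreviated.

```shell
# Hypothetical helper: print the mapped/unmapped percentage lines from a STAR
# Log.final.out file as "value<TAB>label".
# Assumes the usual "label |<TAB>value" layout of that file.
star_mapping_summary() {
    awk -F'[|]' '
        NF == 2 && $1 ~ /%/ && $1 ~ /mapped/ {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim padding around the label
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim the tab before the value
            print val "\t" key
        }' "$1"
}

# Example (path is a placeholder):
#   star_mapping_summary /path/463-01-Log.final.out
```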

Cheers
Alex

Yuquan Tong

Apr 4, 2021, 1:51:24 AM
to rna-star
Sorry, I think I replied only by email and not on the website, so I am copying my reply below:

Thank you so much, Alex.

I have a follow-up question. To analyze the BAM files, I used featureCounts to generate a read count table for DESeq2 analysis.

Here is the command I ran for featureCounts, using a GFF file downloaded from miRBase:

featureCounts -a hsa.gff3 -t miRNA -g 'Name' -o 460-01_counts.txt /path/463-01-Aligned.sortedByCoord.out.bam

I attached a screenshot of my featureCounts result summary below. It seems that only 26.3% of reads were successfully assigned. Is this normal, because many reads mapped to the genome rather than to the miRNome?

If not normal, what can I try to improve the percentage of assigned reads?
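To see where the unassigned reads went, the .summary file that featureCounts writes next to its output table (here that would be the -o name with .summary appended) can be broken down by category. The sketch below assumes a summary with a single BAM column, and the function name featurecounts_breakdown is hypothetical. Two caveats, stated as my understanding rather than as fact about this run: the counts are per alignment record rather than per read, and multi-mapping reads (STAR allowed up to 10 loci here) are not counted by featureCounts unless -M is given, so they would show up as Unassigned_MultiMapping.

```shell
# Hypothetical helper: list each non-zero category in a featureCounts
# .summary file with its count and share of the total.
# Assumes a tab-separated file with one header line and a single BAM column.
featurecounts_breakdown() {
    awk -F'\t' '
        NR > 1 {                       # skip the "Status<TAB>file" header
            count[$1] = $2
            total += $2
            order[++n] = $1            # remember the original row order
        }
        END {
            if (total == 0) { print "no reads in summary"; exit 1 }
            for (i = 1; i <= n; i++)
                if (count[order[i]] > 0)
                    printf "%-32s %12d %6.1f%%\n",
                           order[i], count[order[i]], 100 * count[order[i]] / total
        }' "$1"
}

# Example:
#   featurecounts_breakdown 460-01_counts.txt.summary
```

A large Unassigned_NoFeatures share would fit reads landing outside the miRBase annotation, whereas a large Unassigned_MultiMapping share would point at the counting settings instead.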

Thanks!
Yuquan

unnamed.png

Alexander Dobin

Apr 8, 2021, 6:28:47 PM
to rna-star
Hi @yuquantong97

This is probably normal: it means that only ~26% of your reads are miRNA. The rest are probably other non-coding RNAs, or fragments of them.
A high percentage of miRNA is hard to achieve unless the protocol uses very careful size-selection steps that filter out RNAs longer than ~24 bases.

Cheers
Alex