Aligning tRNA and tRNA halves using STAR

509 views
Skip to first unread message

carla zijlstra

unread,
Jun 28, 2016, 11:01:35 AM6/28/16
to rna-star
Dear all,
 I would like to align Illumina reads from small RNA libraries from which I know they contain tmature RNAs (about 75 bp)  and tRNA-halves (~35bp). 
These RNAs are encoded by multiple genes, so I wondered what would be the best settings for STAR. I suppose that the filters --outFilterScoreMinoverLread and --outFiltermatchNminoverLread should be switched off, but what about the filter --outFilterMultimapNmax (default setting is10)??

Kind regards, Carla Zijlstra

Alexander Dobin

unread,
Jun 30, 2016, 6:00:59 PM6/30/16
to rna-star
Hi Carla,

these are the parameters that we use for ENCODE:


We do the trimming with STAR internal trimmer, but you can also use outside tools, I do not think it makes a lot of difference.
Our ENCODE parameters are as follows:
--outFilterMultimapNmax 20     
              you can increase this even more, it's easy to filter out the multimappers later if you decide to
--clip3pAdapterSeq TGGAATTCTC --clip3pAdapterMMp 0.1            
              trimming is definitely a must for the small RNA, as most reads will be longer than RNA molecules, so there will be adapter sequences at 3’. You can also use an outside trimmer
--outFilterMismatchNoverLmax 0.03 
              this sets the max number of mismatches at 3% of the mapped length, it's quite permissive
--outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMatchNmin 16 
              as you stated, the 1st and 2nd options switch off the filters of min length/score over read length, the 3rd introduces a min mapped length of 16b
--alignIntronMax 1
               this switches off splicing

Cheers
Alex

carla zijlstra

unread,
Jul 4, 2016, 5:32:31 AM7/4/16
to rna-star
Dear Alex, thank you for your quick reply! This is certainly very helpful.

I still have a question about the trimming; pre tRNAs are known to have genomic sequences at their 5' and 3'ends that are spliced of during maturation. However, our samples might contain pre-tRNAs and we would like to know whether the tRNAs in our sample still have these sequences or not. By using trimming they might be trimmed off??

Kind regards, Carla

Alexander Dobin

unread,
Jul 6, 2016, 4:41:24 PM7/6/16
to rna-star
Hi Carla,

trimming should only remove the adapter sequence, so it should not trim out any genomic sequences, unless by some unlucky chance the adapter sequence matches these specific genomic sequences.
If you expect "splicing" in your pre-tRNA sequences (i.e. pieces of read sequence coming from non-contiguous but subsequent regions of the genome), you should *not* turn off splicing with --alignIntronMax 1
but rather use an expected max intron size (which I guess is pretty short, a few hundred bases?), e.g. --alignIntronMax 200.

Cheers
Alex

carla zijlstra

unread,
Jul 8, 2016, 10:07:18 AM7/8/16
to rna-star
Thanks again! kind regards, Carla

carla zijlstra

unread,
Jul 11, 2016, 5:52:41 AM7/11/16
to rna-star
Dear Alex,

As you suggested I will clip the adapter sequence from 3p by indicating the string of this adapter (parameter --clip3pAdapterSeq), but should I also indicate the string for the 5p adapter somewhere? or only indicate the length of the 5p adapter (parameter --clip5pNbases)? Default setting for --clip5pNbases is 0...

kind regards, carla

On Friday, July 1, 2016 at 12:00:59 AM UTC+2, Alexander Dobin wrote:

Alexander Dobin

unread,
Jul 14, 2016, 11:24:20 AM7/14/16
to rna-star

Hi Carla


we are not supposed to see the 5’ adapter in the Illumina sequencing, since the sequencing primer hybridizes to it, and the sequencing has to start right after the last base of the adapter.

Because of that STAR does not have the 5’ adapter trimming option.

However, there may be rare cases where 5’ adapter appears in the sequencing - such as adapter dimers, i.e. 5’ adapter attaches to itself.

Such sequences have to be completely filtered out, since they are completely artificial - this needs to be done before mapping. 

However, I believe these cases are very rare and shall not affect the results significantly.


Cheers
Alex

carla zijlstra

unread,
Jul 15, 2016, 3:28:38 AM7/15/16
to rna-star
Dear Alex, in that case I will use the default setting(0) for --clip5pNbases.

Kind regards, carla

carla zijlstra

unread,
Aug 18, 2016, 7:59:16 AM8/18/16
to rna-star
Dear Alex, we did the alignment of our small RNAs again with the new settings and the % of reads unmapped too short has dropped from >90% to 0%, so that is great!. The number of input reads is a bit lower than with the previous settings, but now at least all the reads are mapped ( about 50% unique reads and 50% multimapping reads)
However, when I compare Idxstats of the bam files (using SAMtool from Galaxy site), the number of mapped reads is much higher than the number of input reads reported in the STAR log.out files. Why is this??

Kind regards, Carla

carla zijlstra

unread,
Aug 18, 2016, 8:30:29 AM8/18/16
to rna-star
I already figured out that it is because Idxstat counts all alignments in the SAM file (like Flagstat), such that multimappers are counted many times, while STAR counts reads in such a way that a multimapper is counted only once :)

Kind regards, carla

carla zijlstra

unread,
Sep 12, 2016, 4:21:59 AM9/12/16
to rna-star
Hi Alex,
 one of the files generated by STAR is called smallRNA read counts and lists for ENSG... IDs the read counts (very nice!). However, I cannot find transfer RNAs  and ribosomal RNAs in this list. How can I retrieve read counts for these RNAs from STAR files? I already used Galaxy tools, but that's list all alignments of these multimappers and not read counts.

Kind regards, carla

Alexander Dobin

unread,
Sep 14, 2016, 12:19:08 PM9/14/16
to rna-star
Hi Carla,

for ENCODE, this was done while mapping with the --quantMode GeneCounts and adding the tRNAs genes into the GTF annotations.
If you do not want to re-map, you can use htseq-count (slow) or featureCounts - they count unique mappers only.
Or you can filter the BAM files for unique mappers with samtools view -q 255 ... and then pass it to Galaxy tools.

Cheers
Alex
Reply all
Reply to author
Forward
0 new messages