STAR aligner for the 3'-biased (CEL-Seq) reads


M-F Maxwell Shih

Apr 26, 2021, 4:57:35 PM
to rna-star
Hello Alex,
I have been a big fan of STAR. Thanks for the great aligner tool.
My project uses the CEL-Seq2 library prep protocol to profile the fly transcriptome. This protocol is designed to preferentially sequence the very 3'-end of each transcript.
(ref:
Hashimshony, T. et al. CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq. Genome Biol 17, 77 (2016).
Hashimshony, T., Wagner, F., Sher, N. & Yanai, I. CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification. Cell Reports 2, 666–673 (2012). )

Recently, I found that running STAR with default settings gave me some spliced alignments with CIGAR strings like 8M236534N43M. That does not look real to me. Is there any parameter setting that can limit the spliced alignments to span only one intron? I have tried "--outSJfilterIntronMaxVsReadN" and "--alignIntronMax", but they seem to have failed (I still see big gaps within single alignment records).
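
To find alignments like the one above, the N operations in the CIGAR string can be extracted and compared with a cutoff. This is a minimal sketch, not part of STAR. The function names are hypothetical, and the cutoff is whatever the caller chooses.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_lengths(cigar):
    """Return the lengths of all N (skipped region) operations in a CIGAR string."""
    return [int(length) for length, op in CIGAR_OP.findall(cigar) if op == "N"]


def has_long_intron(cigar, max_intron):
    """True if any N operation in the CIGAR string is longer than max_intron."""
    return any(length > max_intron for length in intron_lengths(cigar))


print(intron_lengths("8M236534N43M"))  # [236534]
```

Applied to column 6 of each SAM record, has_long_intron() flags the reads whose gaps exceed the chosen cutoff.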

Thank you again! Looking forward to hearing back from you,

Maxwell

Alexander Dobin

Apr 26, 2021, 5:02:14 PM
to rna-star
Hi Maxwell,

If you use, say --alignIntronMax 50000, it should prohibit *unannotated* introns larger than 50kb. However, the annotated introns (i.e. those present in the GTF) will still be allowed.
You can check whether an intron is annotated by adding the jM tag to the --outSAMattributes list.
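
A minimal sketch of reading the jM tag, assuming the encoding described in the STAR manual: one value per junction, 0 for a non-canonical motif, 1 to 6 for the canonical motifs, 20 added when the junction is annotated, and -1 when the read has no junctions. Please verify this table against the manual for your STAR version. The function name decode_jm is hypothetical.

```python
# Motif codes as listed in the STAR manual (assumption: unchanged in your version).
MOTIFS = {
    0: "non-canonical",
    1: "GT/AG",
    2: "CT/AC",
    3: "GC/AG",
    4: "CT/GC",
    5: "AT/AC",
    6: "GT/AT",
}


def decode_jm(field):
    """Decode a SAM field such as 'jM:B:c,21,1' into (motif, annotated) pairs."""
    tag, typ, value = field.split(":", 2)
    if tag != "jM" or typ != "B":
        raise ValueError("not a jM array field: " + field)
    _subtype, *codes = value.split(",")
    junctions = []
    for code in map(int, codes):
        if code < 0:
            # -1 marks a read without splice junctions.
            continue
        annotated = code >= 20
        motif = MOTIFS.get(code - 20 if annotated else code, "unknown")
        junctions.append((motif, annotated))
    return junctions


print(decode_jm("jM:B:c,21,1"))  # [('GT/AG', True), ('GT/AG', False)]
```

A read whose large gap decodes as annotated is one that --alignIntronMax does not filter.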

Cheers
Alex

M-F Maxwell Shih

Apr 26, 2021, 8:32:51 PM
to rna-star
Hello Alex,
Thanks! That explains it. And thanks so much for the 'jM' tag trick!

Sincerely,

Maxwell

M-F Maxwell Shih

Apr 27, 2021, 2:53:55 PM
to rna-star
Hello Alex,
I am not sure whether this is a bug, but when I use code like this
STAR --genomeDir /gpfs/projects/DubnauGroup/ref/ --sjdbGTFfile /gpfs/projects/DubnauGroup/ref/dm6.refGene.CEL-Seq.gtf --readFilesIn DPM3_R2_exUMI_044_masked --alignIntronMax 5000 --outSAMattributes jM --soloStrand Forward --outFilterType BySJout --outFilterIntronMotifs RemoveNoncanonicalUnannotated --runThreadN 40 --readFilesCommand zcat --outReadsUnmapped Fastx --outFileNamePrefix ./debug.3_
It generated errors during the samtools steps such as:
[E::sam_parse1] hex field does not have an even number of digits
[W::sam_read1] Parse error at line 305169
[main_samview] truncated file.

I have tried reducing the number of threads, but that didn't prevent the error. Eventually, I took '--outSAMattributes jM' out of the command and have not gotten any errors since.
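
One way to narrow down a parse error like this is to check the optional fields of each SAM record against the value patterns in the SAM specification and report the first line that does not fit. This is only a sketch for locating the offending record; it does not explain why the record was written that way. The function names are hypothetical, and the line count includes header lines.

```python
import re

# Value patterns for optional-field types, following the SAM specification.
NUMBER = r"[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?"
VALUE_PATTERNS = {
    "A": re.compile(r"[!-~]"),
    "i": re.compile(r"[-+]?[0-9]+"),
    "f": re.compile(NUMBER),
    "Z": re.compile(r"[ !-~]*"),
    "H": re.compile(r"([0-9A-F][0-9A-F])*"),  # hex needs an even digit count
    "B": re.compile(r"[cCsSiIf](," + NUMBER + r")*"),
}
TAG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]")


def field_problem(field):
    """Return a description of what is wrong with one TAG:TYPE:VALUE field, or None."""
    parts = field.split(":", 2)
    if len(parts) != 3:
        return "not in TAG:TYPE:VALUE form"
    tag, typ, value = parts
    if not TAG_NAME.fullmatch(tag):
        return "bad tag name"
    pattern = VALUE_PATTERNS.get(typ)
    if pattern is None:
        return "unknown type " + repr(typ)
    if not pattern.fullmatch(value):
        return "value does not match type " + repr(typ)
    return None


def first_bad_line(lines):
    """Return (line_number, message) for the first malformed record, or None."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            return number, "fewer than 11 mandatory fields"
        for field in fields[11:]:
            problem = field_problem(field)
            if problem:
                return number, field + ": " + problem
    return None
```

For example, first_bad_line(open("Aligned.out.sam")) scans a file line by line. The file name here is an assumption based on STAR's default output name and would carry whatever prefix was given to --outFileNamePrefix.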

For my purposes, it is fine to ignore whether the introns are annotated or not. Could it be that I did not set the parameter up properly? Or maybe you want to look into it to see whether it's a bug.

Sincerely,

Maxwell

Alexander Dobin

Apr 27, 2021, 5:13:39 PM
to rna-star
Hi Maxwell,

Which version of samtools are you using? The *very* old versions did not support tags that hold arrays of numbers, which is what the jM tag is.

Cheers
Alex

M-F Maxwell Shih

Apr 27, 2021, 8:48:08 PM
to rna-star
Hello Alex,
Thanks for your instruction. I went back to check, and it's 1.10, which was released on Dec 6, 2019.
Not sure whether it's very old or not. In the future, if I have the authority to install samtools I will pay attention to the version issue.
Thanks again! Really appreciate your time and professional instructions!

Sincerely,

Maxwell