STAR parameters for splicing and splicing analysis


Raja Ishaq Nabi Khan

Oct 21, 2024, 12:09:57 AM
to rMATS User Group
Hi,

I plan to run gene expression and alternative splicing analyses on mouse FASTQ files. I will use STAR to generate the BAM files and then rMATS for the splicing analysis.

I have 150 bp paired-end reads. What would be the ideal STAR mapping parameters to produce BAM files that can be used for splicing analysis?
Can I use the same BAM files for gene expression analysis, or do I need to change any of the arguments in the STAR mapping command?


Thank you so much, 
Raja

kutsc...@gmail.com

Oct 21, 2024, 9:12:56 AM
to rMATS User Group
Here's the STAR command used by rMATS if run with fastq files: https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/rmats.py#L66
Some parameters can change depending on the rMATS parameters but it's essentially:
STAR --chimSegmentMin 2 --outFilterMismatchNmax 3 --twopassMode Basic --runThreadN 4 --outSAMstrandField intronMotif --outSAMtype BAM SortedByCoordinate --alignSJDBoverhangMin 1 --alignIntronMax 299999 --genomeDir /path/to/STAR/index --sjdbGTFfile /path/to/gtf --outFileNamePrefix bam_out/ --readFilesIn reads_1.fastq reads_2.fastq

The recommended STAR command from this paper (https://www.nature.com/articles/s41596-023-00944-2) is:
STAR --genomeDir /path/to/STAR/index --readFilesIn reads_1.fastq reads_2.fastq --outFileNamePrefix bam_out/ --outSAMunmapped Within --outSAMattributes NH HI AS NM MD XS --twopassMode Basic --alignSJDBoverhangMin 1 --alignSJoverhangMin 8 --alignEndsType EndToEnd --runThreadN 6 --outSAMtype BAM SortedByCoordinate --outSAMstrandField intronMotif

Either command should be fine. I think the BAM files from STAR should be fine to use for both gene expression and alternative splicing analysis. You could check the documentation for the differential gene expression tool that you plan to use to see if it has any suggestions about aligning the reads.

Eric

Raja Ishaq Nabi Khan

Oct 24, 2024, 3:53:36 PM
to kutsc...@gmail.com, rMATS User Group
Hi Eric, 
Many thanks for your email. While generating the sashimi plot with this command:

rmats2sashimiplot --b1 $bam1,$bam2,$bam3 --b2 $bam4,$bam5,$bam6 \
  --event-type SE -e sashimi_events.txt --l1 PC3E_rep --l2 GS689_rep \
  --exon_s 1 --intron_s 5 -o ./output --group-info sashimi_groupInfo.txt


I got an unexpected error:


The inclusion levels of Event 'chr17_34948374_34948472_-@chr17_34948156_34948239_-@chr17_34947739_34947871_-' contains 'NA' value,


Looking through my sashimi_events.txt file manually, I could not find any NA values. I used R to extract the events for some of the important genes from the SE.MATS.JC file into sashimi_events.txt:

se_sashmiplot <- SE.MATS.JC[SE.MATS.JC$geneSymbol %in% pcif_wt_s_filt_top_SE_ann$geneSymbol,]
 
Could you please suggest how to avoid this problem, or an alternative way to generate sashimi_events.txt?

Regards,
Raja Ishaq Nabi Khan




kutsc...@gmail.com

Oct 25, 2024, 10:02:40 AM
to rMATS User Group
It could be that your R code changed the format of the file. In this post (https://github.com/Xinglab/rmats2sashimiplot/issues/40), the issue was a filtering step that added quotes around some of the columns. If you have a similar issue with quotes being added, you could try modifying your R code or using the example Python script from that post.
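
As a rough illustration of filtering without changing the file format, here is a minimal Python sketch. It is not the script from the linked post. It assumes the rMATS event file (for example SE.MATS.JC.txt) is tab-separated with a geneSymbol column in its header; the function name filter_events, the file names, and the gene symbols are hypothetical placeholders. Each kept line is copied exactly as it was read, so no quotes are added or removed.

```python
def filter_events(in_path, out_path, gene_symbols):
    """Copy the header plus every event whose geneSymbol is in gene_symbols.

    Kept lines are written back exactly as read, so the quoting and
    column layout of the original rMATS file are preserved.
    Returns the number of events written.
    """
    wanted = set(gene_symbols)
    kept = 0
    # newline="" disables newline translation so lines round-trip unchanged
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        header = src.readline()
        # Strip quotes from column names only to locate geneSymbol
        columns = [c.strip('"') for c in header.rstrip("\r\n").split("\t")]
        gene_col = columns.index("geneSymbol")
        dst.write(header)
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            # Assumption: gene symbols may be wrapped in double quotes;
            # the quotes are stripped for the comparison only
            if fields[gene_col].strip('"') in wanted:
                dst.write(line)
                kept += 1
    return kept


# Hypothetical usage (file names and gene symbols are placeholders):
# n = filter_events("SE.MATS.JC.txt", "sashimi_events.txt", ["Gene1", "Gene2"])
# print(n, "events written")
```

Because the lines are never parsed and re-serialized, this avoids the kind of quoting change described above. Whether that is the cause of the 'NA' error in your case would still need to be checked against your own file.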

Eric

Raja Ishaq Nabi Khan

Oct 30, 2024, 2:03:54 PM
to kutsc...@gmail.com, rMATS User Group
Many thanks for your email. The Python script works. I have one more question. How can the average number of reads from the JC or JCEC file types be calculated?


Best, 
Raja


