RSEM-calculate-expression non determinsitic?

234 views

Skip to first unread message

Abhilash KANNAN

unread,

Jan 22, 2021, 11:25:33 AM1/22/21

to RSEM Users

Dear RSEM community,

I am having an issue of non-deterministic counts of alt/ref alleles in the RNA-seq analysis (I get different ref and alt allele counts from independent runs)

Here are the steps that I follow:

1) Preparing the Reference with RSEM and making use of STAR

rsem-prepare-reference --gtf GENOME_data/Homo_sapiens.GRCh38.90.gtf --star GENOME_data/Homo_sapiens.GRCh38.dna.genome.fa --star-path /data/NeoSELECT_prob/NeoCODE/applications/STAR-2.53a/bin -p28 GENOME_data/rsem 2> GENOME_data/error.txt 1> GENOME_data/output.txt &

2) Mapping with STAR

STAR --genomeDir GENOME_data --outSAMunmapped Within --outFilterType BySJout --outSAMattributes NH HI AS NM MD --outFilterMultimapNmax 20 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --sjdbScore 1 --runThreadN 28 --genomeLoad NoSharedMemory --outSAMtype BAM Unsorted SortedByCoordinate --quantMode TranscriptomeSAM --outSAMheaderHD \@HD VN:1.4 SO:unsorted --outFileNamePrefix RNASEQ_data/R1.temp/R1 --readFilesCommand zcat --readFilesIn ../R1.read1.cut.fastq.gz ../R1.read2.cut.fastq.gz

3) Before running RSEM, try to sort the R1Aligned.toTranscriptome.out.bam and use this Sorted.bam as an input to rsem

Command:

cat <( samtools view -H R1Aligned.toTranscriptome.out.bam ) <( samtools view R1Aligned.toTranscriptome.out.bam | awk '{printf "%s", $0 "~"; getline; print}' | sort -S 30G -T ./ | tr '~' '\n' ) | samtools view -bS - > Sorted.bam

(Such a sorted file should not change from run to run, and RSEM results should be exactly the same.) This was mentioned in: https://groups.google.com/g/rna-star/c/kQGfbQhezsU?pli=1

To be sure I check the statistics of the Sorted.bam files from two independent runs. They are identical. So till thins step, everything seems to be deterministic and reproducible.

4) Running rsem-calculate expression using the Sorted.bam files

Command:

rsem-calculate-expression --p 28 --bam --output-genome-bam --paired-end --forward-prob 0 --sort-bam-by-coordinate RNASEQ_data/R1.temp/Sorted.bam --seed 0 GENOME_data/rsem RNASEQ_data/R1.temp/R1 2> RNASEQ_data/error_rsem.txt 1> RNASEQ_data/output_rsem.txt &

5) After the above steps, I check the statistics for the BAM files generated from RSEM run using samtools stats.

Problem:

I find the that R1.transcript.bam (from two independent runs) and R1.transcript.sorted bam (from two independent runs) are identical and deterministic, while R1.genome.bam and R1.genome.sorted.bam vary slightly between independent runs.

Additionally, I also tried to run rsem-calculate-expression using STAR to aligned reads with the parameter (--sort-bam-by-read-name). However i get an error (listed below).

the command that i used was:

for preparing the reference

rsem-prepare-reference --gtf GENOME_data/Homo_sapiens.GRCh38.90.gtf --star GENOME_data/Homo_sapiens.GRCh38.dna.genome.fa --star-path /users/data/applications/STAR-2.5.3a/bin/ -p 28 GENOME_data/rsem_ref

for Quantification with RSEM

rsem-calculate-expression --p 28 --star --star-gzipped-file --star-path /users/data/applications/STAR-2.5.3a/bin --paired-end --seed 5 --output-genome-bam --sort-bam-by-read-name --forward-prob=1.0 ../R1.read1.fastq.gz ../R1.read2.fastq.gz GENOME_data/rsem_ref/ RNASEQ_data/R1

Now i get this error:

Read A00182:360:HJM25DRXX:1:1102:8703:35728: the two mates do not align to the same transcript. RSEM does not support discordant alignments.

"rsem-parse-alignments GENOME_data/rsem-ref RNASEQ_data/R1.temp/R1 RNASEQ_data/R1.stat RNASEQ_data/R1.temp/R1.sorted.bam 3 -t ag XM" failed. Please check if you provide correct parameters/options for the pipeline

Any help on this would be highly appreciated. Thank you very much.

Best Regards,

Abhilash Kannan

Reply all

Reply to author

Forward

0 new messages