Error in RSEM simulation.

45 views

Skip to first unread message

Derrick DeConti

unread,

Mar 4, 2020, 12:46:59 PM3/4/20

to RSEM Users

Hi,

I have been attempting to create a synthetic data set with a ground truth. I am basing this ground truth from real samples. To this end, I aligned some PE75 RNASeq libraries with STAR, and passing it through RSEM (version 1.3.3) for read simulation. However, at the read simulation step, I get the following error:

Refs.loadRefs finished!
rsem-simulate-reads: simulation.cpp:113: void simulate(char*, char*) [with ReadType = PairedEndReadQ; ModelType = PairedEndQModel]: Assertion `denom > EPSILON' failed.
Aborted

There were now previous errors in the STAR alignment nor the RSEM calculate expression steps.

To this effect, here is what I did:

# STAR alignment

STAR \
 --readFilesIn r1.fq.gz r2.fq.gz \
 --genomeDir ~/ref/hs \
 --outFileNamePrefix for_sim. \
 --twopassMode Basic \
 --runThreadN 16 \
 --readFilesCommand zcat \
 --sjdbGTFfile Homo_sapiens.GRCh38.95.gtf \
 --quantMode TranscriptomeSAM \
 --outFilterType BySJout \
 --outSAMtype BAM SortedByCoordinate \
 --outReadsUnmapped Fastx


# RSEM reference build

rsem-prepare-reference \
 --gtf Homo_sapiens.GRCh38.95.gtf \
 Homo_sapiens.GRCh38.dna.primary_assembly.fa \
 Homo_sapiens.GRCh38.dna.primary_assembly


# RSEM calculate expression to create a "ground truth" transcriptomic profile

rsem-calculate-expression \
 -p 4 \
 --paired-end \
 --bam \
 --no-bam-output \
 --estimate-rspd \
 --append-names \
 --calc-ci \
 --single-cell-prior \
 for_sim.Aligned.toTranscriptome.out.bam \
 Homo_sapiens.GRCh38.dna.primary_assembly \
 for_sim.Aligned.toTranscriptome;


# RSEM read simulation

rsem-simulate-reads \
 Homo_sapiens.GRCh38.dna.primary_assembly \
 for_sim.Aligned.toTranscriptome.stat/for_sim.Aligned.toTranscriptome.model \
 for_sim.Aligned.toTranscriptome.results \
 0.5 \
 10000000 \
 sim_10m \
 --seed 0

What exactly is the problem? I see in the code that denom accumulates theta, which (at 0.5) is greater than the EPSILON defined in utils.h at 1e-300. The only thing I could think of is that denom is initialized to 0, and the function is skipping the loop that increments denom with theta. But I'm not sure why it would be doing this.

Reply all

Reply to author

Forward

0 new messages