Hi,
I have been attempting to create a synthetic data set with a ground truth. I am basing this ground truth from real samples. To this end, I aligned some PE75 RNASeq libraries with STAR, and passing it through RSEM (version 1.3.3) for read simulation. However, at the read simulation step, I get the following error:
Refs.loadRefs finished!
rsem-simulate-reads: simulation.cpp:113: void simulate(char*, char*) [with ReadType = PairedEndReadQ; ModelType = PairedEndQModel]: Assertion `denom > EPSILON' failed.
Aborted
There were now previous errors in the STAR alignment nor the RSEM calculate expression steps.
To this effect, here is what I did:
# STAR alignment
STAR \
--readFilesIn r1.fq.gz r2.fq.gz \
--genomeDir ~/ref/hs \
--outFileNamePrefix for_sim. \
--twopassMode Basic \
--runThreadN 16 \
--readFilesCommand zcat \
--sjdbGTFfile Homo_sapiens.GRCh38.95.gtf \
--quantMode TranscriptomeSAM \
--outFilterType BySJout \
--outSAMtype BAM SortedByCoordinate \
--outReadsUnmapped Fastx
# RSEM reference build
rsem-prepare-reference \
--gtf Homo_sapiens.GRCh38.95.gtf \
Homo_sapiens.GRCh38.dna.primary_assembly.fa \
Homo_sapiens.GRCh38.dna.primary_assembly
# RSEM calculate expression to create a "ground truth" transcriptomic profile
rsem-calculate-expression \
-p 4 \
--paired-end \
--bam \
--no-bam-output \
--estimate-rspd \
--append-names \
--calc-ci \
--single-cell-prior \
for_sim.Aligned.toTranscriptome.out.bam \
Homo_sapiens.GRCh38.dna.primary_assembly \
for_sim.Aligned.toTranscriptome;
# RSEM read simulation
rsem-simulate-reads \
Homo_sapiens.GRCh38.dna.primary_assembly \
for_sim.Aligned.toTranscriptome.stat/for_sim.Aligned.toTranscriptome.model \
for_sim.Aligned.toTranscriptome.results \
0.5 \
10000000 \
sim_10m \
--seed 0
What exactly is the problem? I see in the code that denom accumulates theta, which (at 0.5) is greater than the EPSILON defined in utils.h at 1e-300. The only thing I could think of is that denom is initialized to 0, and the function is skipping the loop that increments denom with theta. But I'm not sure why it would be doing this.