rsem-run-em doesn't appear to be running multiple threads

Shawn Driscoll

Nov 3, 2012, 7:01:15 PM
to rsem-...@googlegroups.com
Hello,

I've just started testing RSEM for expression estimation on mouse data, using the available UCSC known gene database. Although I ran rsem-calculate-expression with -p 8, once the pipeline reaches rsem-run-em and begins iterating (after what looks like parsing the reads into 8 threads), the processor usage for rsem-run-em hovers around 100% instead of the 800% I'd expect for 8 threads. As far as the system is concerned, rsem-run-em doesn't appear to be running multiple parallel threads at all (unless for some reason each thread is only taxing its core at 12.5%).
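To put numbers on the observation above: tools such as ps and top report %CPU on a scale where 100 means one fully busy core, so a process keeping 8 threads busy should read close to 800. The helper below is only a minimal sketch; the function name thread_utilization is made up for illustration and is not part of RSEM.

```shell
# Convert a %CPU reading (100 = one fully busy core) into the number of
# busy cores and the share of the requested threads that is actually in use.
# Usage: thread_utilization <percent_cpu> <threads_requested>
thread_utilization() {
    local pcpu=$1 threads=$2
    awk -v p="$pcpu" -v t="$threads" 'BEGIN {
        printf "%.2f cores busy, %.1f%% of %d requested threads\n", p / 100, p / t, t
    }'
}

# Figures from the post: about 100% CPU observed with -p 8.
thread_utilization 100 8
```

For the figures in the post this prints "1.00 cores busy, 12.5% of 8 requested threads". On a live run, the reading could be taken with something like `ps -o pcpu= -p <pid>`, where <pid> is the process ID of rsem-run-em.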

My concern is that last night I started an RSEM run to quantify expression on a paired-end set of reads (approx 40 million alignments were generated). When I checked in on it today, the process was still running and was only on its 5th round of analysis. Am I to expect that this abundance estimation process is going to take more than a day per sample?

My system:
Mac Pro, dual Xeon 6-core processors, 32 GB RAM, OS X 10.7

b...@cs.wisc.edu

Nov 4, 2012, 6:26:37 AM
to rsem-...@googlegroups.com
Hi Shawn,

Something must be wrong. If you only have around 40 million
alignments and 6 cores, it should finish in around 1 hour. Can you
post the commands you used?

Best,
Bo

Shawn Driscoll

Nov 4, 2012, 7:12:33 AM
to rsem-...@googlegroups.com, b...@cs.wisc.edu
Hi Bo,

Here's the command:

rsem-calculate-expression --time -p 8 --no-bam-output --paired-end \
  <(gunzip -c $left_reads) <(gunzip -c $right_reads) ~/opt/rsem_references/mm9 $1

this line was in a loop in a bash script that was crawling through my paired read folders (each folder has a pair of files). it passed the left and right read file names as $left_reads and $right_reads and the output name was passed as $1. i didn't want to have to gunzip a copy of my reads so i passed them into the call as you see there. that didn't seem to cause any trouble for bowtie and i've done it like that in the past.  i built the mm9 reference exactly as outlined in the documentation on the RSEM site. mine was based on the UCSC known gene database.
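The loop described above might look roughly like the sketch below. The folder layout (one sub-folder per sample holding *_1.fastq.gz and *_2.fastq.gz), the function name quantify_all, and the DRY_RUN switch are all assumptions made for illustration; the original script was not posted. The sample name is taken from the folder name here rather than from $1. It needs bash because of the <( ) process substitution.

```shell
#!/usr/bin/env bash
# Sketch of a per-sample loop around the command above.
# Assumed layout (hypothetical): <reads_root>/<sample>/*_1.fastq.gz and *_2.fastq.gz
# Set DRY_RUN=1 to print each command instead of running it.
quantify_all() {
    local reads_root=$1 reference=$2
    local dir sample left right
    for dir in "$reads_root"/*/; do
        sample=$(basename "$dir")
        left=( "$dir"*_1.fastq.gz )
        right=( "$dir"*_2.fastq.gz )
        # Skip folders that do not contain the expected pair of files.
        [ -e "${left[0]}" ] && [ -e "${right[0]}" ] || continue
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "rsem-calculate-expression --time -p 8 --no-bam-output --paired-end <(gunzip -c ${left[0]}) <(gunzip -c ${right[0]}) $reference $sample"
        else
            # Reads are decompressed on the fly, so no unzipped copy is written.
            rsem-calculate-expression --time -p 8 --no-bam-output --paired-end \
                <(gunzip -c "${left[0]}") <(gunzip -c "${right[0]}") \
                "$reference" "$sample"
        fi
    done
}

# Example call (paths are placeholders):
# DRY_RUN=1 quantify_all ~/reads ~/opt/rsem_references/mm9
```

Running it once with DRY_RUN=1 is a cheap way to confirm that every folder is paired up as intended before committing to a long run.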

is it possible that although the compilation completed something may have not been built and i'm running a handicapped version? i'll try compiling it again and see if i catch any warnings.

i'm off to sleep for now. i plan to mess around with this a lot more in this next week. i'd like to start using RSEM as an alternative to cufflinks for isoform level expression as i've had many issues with cufflinks' quantification in the past. or, i should say, with every version of it since 2009.

shawn

Shawn Driscoll

Nov 4, 2012, 4:18:48 PM
to rsem-...@googlegroups.com
I recompiled RSEM last night and there were no warnings except for an unused variable in a few steps. Looks good from that end.

Shawn Driscoll

Nov 5, 2012, 3:03:25 PM
to rsem-...@googlegroups.com
i finally let one of the runs finish and it did. here's the time for EM:

Time Used for EM.cpp : 16 h 05 m 33 s

also the content of <sample_name>.time

Aligning reads: 2559 s.
Estimating expression levels: 62940 s.
Calculating credibility intervals: 0 s.
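The .time file reports plain seconds while the EM line uses hours, minutes and seconds, so the two are awkward to compare by eye. A small conversion helper, sketched here with a made-up function name (fmt_secs), puts them in the same format:

```shell
# Convert a duration in whole seconds to "H h MM m SS s".
fmt_secs() {
    local total=$1
    printf '%d h %02d m %02d s\n' \
        $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

fmt_secs 2559    # Aligning reads
fmt_secs 62940   # Estimating expression levels
```

This prints "0 h 42 m 39 s" and "17 h 29 m 00 s", so the EM step (16 h 05 m 33 s) accounts for the large majority of the total run time.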

Jerod Parsons

Mar 29, 2013, 1:37:59 PM
to rsem-...@googlegroups.com, b...@cs.wisc.edu
I have the same issue, also running a Mac Pro, OSX 10.8 + dual core i7.

I'm normally okay with the ~24h runtime, but if it should be finishing in an hour, that would be better!  Is there something I can do about this?

b...@cs.wisc.edu

Mar 29, 2013, 5:31:05 PM
to rsem-...@googlegroups.com
Hi Jerod,

Yes, we have realized there is a problem with parallelization in RSEM on
Mac OS systems. We plan to fix it soon.

Best,
Bo

Bo Li

Apr 15, 2013, 5:18:06 PM
to rsem-...@googlegroups.com, Colin Dewey
Hi Jerod,

Please try our newly released RSEM v1.2.4, which fixes a bug that led to poor parallelization performance in RSEM on Mac OS systems. I believe the new version will significantly improve the speed of RSEM on your system.

Best,
Bo

Bo Li

Apr 15, 2013, 5:25:38 PM
to rsem-...@googlegroups.com, Jerod Parsons, Shawn Driscoll, Colin Dewey
Hi Shawn and Jerod,

We have just released RSEM v1.2.4, which fixes a parallelization bug on Mac OS systems. It should work now, so please give it a try. You can find the newest version at

http://deweylab.biostat.wisc.edu/rsem/

Best,
Bo