how to reduce memory resources by sharing genome


Alex Lee

Nov 24, 2016, 12:45:44 PM
to STAR-Fusion
Hi, 
so as I'm running my samples, either from the command line or through a job scheduler, is there a way for samples running in parallel to share a genome that has already been loaded? Or is there a command I can use to preload the genome into memory first?

So for example, say I have 3 samples.

I run the first one, and one of the first steps is .... "loading genome", and I can see the memory usage go up.
If I run the 2nd sample from a fork, would it know to just wait for the same genome that is being loaded? Or say I run it 10 mins later, when the first sample has already loaded the genome: would the second sample be able to use it?

It gets more complicated when I submit these as jobs to something like an SGE cluster.

I guess what I'm trying to do is reduce the memory used when I run samples in parallel.

thanks! 
Alex

Brian Haas

Nov 24, 2016, 3:14:36 PM
to Alex Lee, STAR-Fusion, Dobin, Alexander
I've CC'd Alex Dobin for comment.   Based on his recommendations, we could certainly explore this.

~b

--
You received this message because you are subscribed to the Google Groups "STAR-Fusion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to star-fusion+unsubscribe@googlegroups.com.
To post to this group, send email to star-...@googlegroups.com.
Visit this group at https://groups.google.com/group/star-fusion.
To view this discussion on the web visit https://groups.google.com/d/msgid/star-fusion/a6683c60-e4a4-460f-993a-7501b25dd462%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Alex Lee

Nov 24, 2016, 5:04:24 PM
to STAR-Fusion
Thanks Brian, looking forward to it. It would be a great addition since I normally run a bunch of samples in one go. I figure if this works I could simply spin up something like a c4.8xlarge, with 36 cores but only 60 GB of memory, on AWS.
happy thanksgiving! 
Alex
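
A rough way to see what sharing would buy on a machine like that: without sharing, every running sample holds its own copy of the genome; with sharing, the genome is counted once. The sketch below only does that arithmetic. All three numbers are assumptions picked for illustration (none of them come from this thread), so substitute your own measurements.

```shell
#!/bin/sh
# Back-of-the-envelope memory estimate. All three numbers are
# assumptions for illustration, not measurements from this thread.
genome_gb=30      # assumed size of the loaded genome index
per_sample_gb=2   # assumed extra working memory per running sample
n_samples=8       # assumed number of samples running at once

# Without sharing, every process loads its own copy of the genome.
no_share_gb=$(( n_samples * (genome_gb + per_sample_gb) ))

# With sharing, the genome is loaded once; only the per-sample part scales.
shared_gb=$(( genome_gb + n_samples * per_sample_gb ))

echo "without sharing: ${no_share_gb} GB"
echo "with sharing:    ${shared_gb} GB"
```

With these assumed numbers that is 256 GB versus 46 GB, which is the difference between not fitting and fitting in 60 GB.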


Brian Haas

Nov 24, 2016, 6:56:55 PM
to Alex Lee, STAR-Fusion
Great point!   I think we'll be able to make this work.

Happy ThxGiving to you too!

~b


Alexander Dobin

Nov 28, 2016, 3:16:00 PM
to STAR-Fusion
Hi Alex, Brian,

It is possible to run STAR on multiple samples with the genome in shared memory. This is done by adding the --genomeLoad LoadAndKeep option to each run.

Cheers
Alex
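
For anyone reading along, here is a minimal sketch of what "adding --genomeLoad LoadAndKeep to each run" might look like when calling STAR directly. The genome directory, sample names, and fastq naming are hypothetical, and the RUN variable is just a dry-run helper invented for this example: by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: align several samples against one shared copy of the genome.
# GENOME_DIR, the sample names and the fastq layout are hypothetical.
GENOME_DIR=${GENOME_DIR:-/path/to/star_index}
RUN=${RUN-echo}   # prints the commands by default; set RUN= to execute them

align_shared() {
    # $1 is the sample name (prefix of the paired fastq files)
    $RUN STAR --genomeDir "$GENOME_DIR" \
        --genomeLoad LoadAndKeep \
        --readFilesIn "${1}_1.fq.gz" "${1}_2.fq.gz" \
        --readFilesCommand zcat \
        --outFileNamePrefix "${1}."
}

for sample in s1 s2 s3; do
    align_shared "$sample"
done
```

Every run names the same genome directory and passes the same LoadAndKeep option, so each sample uses the copy already in shared memory instead of loading its own.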

Brian Haas

Nov 28, 2016, 4:18:58 PM
to Alexander Dobin, STAR-Fusion
I'll just add this as an option and pass it through to the STAR run.  It'll be in the devel code shortly.

~b


Brian Haas

Nov 28, 2016, 4:27:46 PM
to Alexander Dobin, STAR-Fusion

If you pull the latest code from github:

   git clone --recursive https://github.com/STAR-Fusion/STAR-Fusion.git

then you can run STAR-Fusion with the new '--STAR_use_shared_memory' flag.

It'll go into the next release.

best,

~b




Alex Lee

Nov 28, 2016, 6:45:03 PM
to STAR-Fusion
Thanks Alex and Brian, 
I'll test this soon! By any chance, what happens if the second run is started before the 1st run finishes loading?

STAR-Fusion --genomeLoad LoadAndKeep ...blah blah blah
STAR-Fusion --genomeLoad LoadAndKeep ...blah blah blah
STAR-Fusion --genomeLoad LoadAndRemove ...blah blah blah

So in this example, if the 2nd and 3rd runs are started seconds after the 1st, before the genome has had time to load completely, will they "know" to wait, or do I have to wait until the 1st run has finished loading the genome?

thanks this is super duper !
Alex

Brian Haas

Nov 28, 2016, 7:19:07 PM
to Alex Lee, STAR-Fusion
I read that you can just spawn the processes in parallel (but leave at least a second between them) and the later processes will wait until the first one has finished loading the genome.

Once you're all done, just run:

   STAR --genomeLoad Remove

and that should pull it from memory.
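
Put together, the staggered launch and the final cleanup might look roughly like the sketch below. Paths and sample names are hypothetical, and RUN and STAGGER are helper variables invented for this example (commands are printed by default instead of executed). One assumption worth flagging: as far as I know, the Remove call also needs --genomeDir so STAR can tell which shared genome to unload, so the sketch includes it.

```shell
#!/bin/sh
# Sketch: start the runs a little apart, wait for all of them, then free
# the shared genome. Paths and sample names are hypothetical.
GENOME_DIR=${GENOME_DIR:-/path/to/star_index}
RUN=${RUN-echo}        # prints commands by default; set RUN= to execute
STAGGER=${STAGGER:-1}  # seconds between launches (at least 1, per the thread)

launch_all() {
    for sample in "$@"; do
        $RUN STAR --genomeDir "$GENOME_DIR" --genomeLoad LoadAndKeep \
            --readFilesIn "${sample}_1.fq.gz" "${sample}_2.fq.gz" \
            --readFilesCommand zcat --outFileNamePrefix "${sample}." &
        sleep "$STAGGER"
    done
    wait   # block until every background run has finished
    # Assumption: --genomeDir identifies which shared genome to unload.
    $RUN STAR --genomeDir "$GENOME_DIR" --genomeLoad Remove
}

launch_all s1 s2 s3
```

The wait matters: removing the genome while a run is still aligning is exactly what the cleanup step should avoid.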

Alex D. - please correct. :-)

best,

~b

Alex Lee

Nov 29, 2016, 2:23:45 PM
to STAR-Fusion
Thanks Brian, 
So I tried the --STAR_use_shared_memory option but I got the following error.

EXITING because of fatal PARAMETERS error: 2-pass method is not compatible with genomeLoad<<LoadAndKeep
SOLUTION: re-run STAR with --genomeLoad NoSharedMemory ; this is the only compatible option at the moment.s

Did I do something weird? I can see from the output that STAR was invoked with --genomeLoad LoadAndKeep  

My command was the following: 
STAR-Fusion --genome_lib_dir /alex/index/GRCh37_gencode_v19_CTAT_lib/ --left_fq /alex/data/s1_1.fq.gz --right_fq /alex/data/s2_2.fq.gz --output_dir /alex/data/star_fuse/s1 --STAR_use_shared_memory

thanks! 
Alex

Alexander Dobin

Nov 29, 2016, 2:30:07 PM
to STAR-Fusion
Hi Alex, Brian,

Sorry, I forgot that STAR-Fusion uses 2-pass alignment, which is incompatible with shared memory: in the 2nd pass the genome is modified specifically for each run, so it cannot be shared between samples.
Brian, would it be possible to make the 2-pass optional?

Cheers
Alex

Brian Haas

Nov 29, 2016, 2:53:12 PM
to Alexander Dobin, STAR-Fusion
I just updated the dev code so that twopass mode is disabled when shared mem is being used.  
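
To make the trade-off concrete, here is a small sketch of the kind of switch described: shared memory on means no 2-pass, shared memory off means 2-pass can stay. This is not STAR-Fusion's actual code; the function name star_mode_args is made up for illustration, and only the STAR options themselves (--genomeLoad, --twopassMode Basic) are real.

```shell
#!/bin/sh
# Sketch only, not STAR-Fusion's implementation: choose STAR options so
# that shared memory and 2-pass mode are never requested together.
star_mode_args() {
    if [ "$1" = "shared" ]; then
        # Genome stays in shared memory; 2-pass is left out.
        echo "--genomeLoad LoadAndKeep"
    else
        # Private genome copy per run; 2-pass is allowed.
        echo "--genomeLoad NoSharedMemory --twopassMode Basic"
    fi
}

star_mode_args shared
star_mode_args private
```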

I'm not sure how this will impact the fusion results, but hopefully it'll be minor.  I'll aim to look into it later on.  Alex L - you might want to run some tests w/ and w/o using shared mem to see if it makes much of a difference as far as your fusion predictions.

Just  'git pull' and it should give you the latest code.  Just rerun your earlier command and it'll hopefully work.
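
For the with/without comparison suggested above, something like this sketch could list the fusions that show up in only one of the two runs. It assumes each predictions file is tab-delimited, has a header line starting with '#', and carries the fusion name in the first column; check that against the output of your STAR-Fusion version. The function names are invented for this example.

```shell
#!/bin/sh
# Sketch: compare fusion calls from two runs of the same sample.
# Assumes tab-delimited files, '#' header line, fusion name in column 1.
fusion_names() {
    grep -v '^#' "$1" | cut -f1 | sort -u
}

compare_fusions() {
    a=$(mktemp)
    b=$(mktemp)
    fusion_names "$1" > "$a"
    fusion_names "$2" > "$b"
    echo "only in $1:"
    comm -23 "$a" "$b"   # names present in the first file only
    echo "only in $2:"
    comm -13 "$a" "$b"   # names present in the second file only
    rm -f "$a" "$b"
}

# Usage (hypothetical file names):
# compare_fusions with_shared_mem.tsv without_shared_mem.tsv
```

This only compares fusion names; differences in read support for fusions found by both runs would need a closer look at the other columns.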

best,

~brian


Alex Lee

Nov 29, 2016, 9:32:18 PM
to STAR-Fusion
Hi Brian, 
do you know how much smaller the second-pass genome load is? Would it be possible, then, to share the genome only during the initial STAR alignment and not during the second pass? Would that be a good compromise in terms of memory use?
also, thank you so much for your help in this. 
Alex  

Brian Haas

Nov 29, 2016, 10:07:03 PM
to Alex Lee, STAR-Fusion
Hi Alex,

I'm pretty sure the two-pass method is baked into STAR so it's not so easy to decouple them (Alex D can confirm or refute this).   We've been debating for a while on whether or not to even bother doing the two-pass approach as part of STAR-Fusion.  It'd be more efficient to just not do it...  I just never got around to exploring it (lack of time, not interest).   Most of our computes are split across a compute farm and so there isn't much use of shared-memory in that case, and unfortunately I don't think it's going to work with our FusionInspector tool either.  

~b
