Scratch directory on SGE


Charles Murphy

Apr 27, 2016, 9:55:51 AM
to Nextflow
I'm running Nextflow on an SGE cluster. I know there is the scratch option to have a process create its output files in $TMPDIR on the worker node, but is there a way to have it copy all input files to $TMPDIR before the process runs? I don't want to needlessly stress the I/O on the head node with many jobs. In my case, the worker nodes and the head node have different file systems. Thanks!
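(For context, the scratch option I mean is the process directive sketched below; process, channel, and tool names are made up for illustration:)

    process align {
        scratch true                  // run the task inside $TMPDIR on the worker node

        input:
        file reads from reads_ch      // inputs are still staged as links to the shared FS

        output:
        file 'out.bam' into bam_ch    // outputs are copied back when the task completes

        """
        my_aligner --in ${reads} --out out.bam
        """
    }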

Paolo Di Tommaso

Apr 27, 2016, 10:31:42 AM
to nextflow
Hi, 

No, this option is not provided out of the box. However, I'm wondering whether there's really a benefit in doing that: you would still have all the nodes accessing the input files in read mode in order to copy them to each task's scratch directory.

Cheers,
Paolo

Charles Murphy

Apr 27, 2016, 10:57:52 AM
to Nextflow
Well, I'm not too familiar with the technicalities of I/O. Is reading less computationally expensive than writing? But besides not overloading the head node, there might also be a cost associated with transferring the data over the network between the head node and the job node.

Either way, I'll give it a shot without it. Maybe you're right and there is no benefit to such a feature.

Paolo Di Tommaso

Apr 27, 2016, 11:37:05 AM
to nextflow
Surely, to copy the files you need to read them.

There's a benefit if your tasks access the input files randomly and intensively, but that is not a common usage scenario.

However, you quite surely will *not* overload the head node (unless you are running an ephemeral cluster in the cloud using tools such as StarCluster or ElastiCluster).

In any case, please report the results of your tests. It could make sense to add this feature to Nextflow.

Cheers,
Paolo

Charles Murphy

Apr 28, 2016, 10:56:40 AM
to Nextflow
Thanks, I think you are right. Among the several algorithms I'm using, I'm not quite sure off the top of my head which would access files in a random and intensive manner. I will do some timings and see how things go.

Charles Murphy

May 1, 2016, 4:47:15 PM
to Nextflow
So I did some timings with the STAR and Varscan tools, comparing two conditions: (1) all prerequisite files were first copied over to the $TMPDIR directory, and (2) the prerequisite files were NOT copied over. I ran STAR on a small sample of reads (1 million), and Varscan on a small (1 million reads) and a large (~90 million reads) dataset. A priori, I expected the two conditions to perform the same for STAR, but condition 2 to take longer for Varscan (because the BAM file is streamed into the tool via stdin). Note that I did not actually use Nextflow for my tests, because I expect condition 2 to be equivalent to what Nextflow does. All output files were saved to $TMPDIR.

Mean time (standard deviation):

                     Condition 1 (copied)    Condition 2 (not copied)
    STAR             49 min (SD 20)          69 min (SD 7)
    Varscan (small)  19.6 min (SD 2.15)      22.2 min (SD 1.18)
    Varscan (large)  60.5 min (SD 6.1)       62.5 min (SD 7.32)

I conclude that the differences in times here are practically negligible.

Charles Murphy

May 1, 2016, 4:48:48 PM
to Nextflow
Oops, those STAR timings are actually in seconds, not minutes.

Paolo Di Tommaso

May 2, 2016, 5:28:16 AM
to nextflow
Thanks a lot for this detailed benchmark. 

It's very interesting and confirms my assumption. Also, if a pipeline stages the input files in local storage for each step, you end up reading those files twice: once to copy them locally and a second time to carry out the task.

I wouldn't expect the time to double, because Linux provides a very efficient caching mechanism. However, it's reasonable to expect that this approach (copying the files locally and then processing them) is slower than processing the inputs directly from the shared file system.

Best,
Paolo

Charles Murphy

Jul 5, 2016, 8:07:56 AM
to Nextflow
So, to update this: I was recently running about 30-40 Varscan jobs, and my cluster admins shut down my jobs because I was over-taxing the shared file system. On the cluster I use, different file servers back $TMPDIR on the job nodes. And since each Varscan job needs as input one BAM file and one reference FASTA, some 60 files were being read simultaneously off the file system used by the main scheduler node for the duration of the jobs.

For now I'm just going to have the files copied over within the execution block (between the two """). But I guess my case is an example of a reason to include an option to copy files over to $TMPDIR.
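(Roughly like this, to sketch it; file, channel, and tool names are illustrative. Note the \$TMPDIR escape so the variable is expanded by bash on the worker node, not by Nextflow:)

    process varscan {
        input:
        file bam from bam_ch
        file ref from ref_ch

        output:
        file 'variants.vcf' into vcf_ch

        """
        # copy the inputs to node-local scratch once, then read them from there
        cp ${bam} ${ref} \$TMPDIR/

        samtools mpileup -f \$TMPDIR/${ref} \$TMPDIR/${bam} \\
            | java -jar VarScan.jar mpileup2snp --output-vcf 1 > variants.vcf
        """
    }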

Charles Murphy

Jul 5, 2016, 11:00:54 PM
to Nextflow
To add on to this: if such a feature were made available, I can see an additional one being useful, namely a rate limit on the number of jobs that can be submitted to SGE. For example, say I have 100 jobs. I know there is already a feature to have only, say, 10 jobs run concurrently; what I mean is a feature to submit 5 jobs, wait 2 minutes, submit 5 more, wait 2 minutes, and so on. The aim, again, is to not overtax the file system.

Without such a feature, if, for example, I submit 200 jobs and each copies over a large file to $TMPDIR, that is 200 different processes reading the same file simultaneously. A sketch of the configuration I have in mind is below.
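(In nextflow.config, something shaped like this; queueSize is the existing concurrency cap, while the rate-limit line is the requested feature, written here with an illustrative syntax and values:)

    // nextflow.config -- values are illustrative
    process.executor = 'sge'

    executor {
        queueSize = 10                  // existing: at most 10 jobs queued/running at once
        submitRateLimit = '5 / 2 min'   // requested: at most 5 submissions every 2 minutes
    }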

Paolo Di Tommaso

Jul 6, 2016, 6:17:10 AM
to nextflow
Thanks for the update on this. OK, please open a feature request for it.

Cheers,
Paolo

Paolo Di Tommaso

Jul 6, 2016, 6:17:55 AM
to nextflow
This sounds like a nice idea; please open a feature request for this one as well.

Cheers,
Paolo