java.lang.OutOfMemoryError: Requested array size exceeds VM limit


Matthew MacManes

Dec 20, 2014, 1:23:04 PM
to next...@googlegroups.com
I'm having an issue with a pipeline. It seems to be running out of memory, as per the subject line.

My config file is an attempt to increase heap space:

env {
  NXF_OPTS="-Xmx=200g"
}

and my error message is like this:

time nextflow test.nf --in long1-8.sam.gz -c /home/macmanes/nextflow.config
N E X T F L O W  ~  version 0.11.4
[warm up] executor > local
[03/699dd3] Submitted process > splitSequences (1)
Exception in thread "Thread-2" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.util.Arrays.copyOf(Arrays.java:2367)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
        at java.lang.StringBuilder.append(StringBuilder.java:204)
        at org.codehaus.groovy.runtime.IOGroovyMethods.getText(IOGroovyMethods.java:865)
        at org.codehaus.groovy.runtime.NioGroovyMethods.getText(NioGroovyMethods.java:420)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.codehaus.groovy.runtime.metaclass.ReflectionMetaMethod.invoke(ReflectionMetaMethod.java:51)
        at org.codehaus.groovy.runtime.metaclass.NewInstanceMetaMethod.invoke(NewInstanceMetaMethod.java:54)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:324)
        at groovy.lang.MetaClassImpl.getProperty(MetaClassImpl.java:1844)
        at groovy.lang.MetaClassImpl.getProperty(MetaClassImpl.java:3690)
        at groovy.lang.DelegatingMetaClass.getProperty(DelegatingMetaClass.java:128)
        at org.codehaus.groovy.runtime.InvokerHelper.getProperty(InvokerHelper.java:173)
        at org.codehaus.groovy.runtime.callsite.PojoMetaClassGetPropertySite.getProperty(PojoMetaClassGetPropertySite.java:33)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callGetProperty(AbstractCallSite.java:227)
        at nextflow.processor.TaskProcessor$_bindOutputs_closure16.doCall(TaskProcessor.groovy:878)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:324)
        at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:278)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1016)
        at groovy.lang.Closure.call(Closure.java:423)
        at org.codehaus.groovy.runtime.DefaultGroovyMethods.callClosureForMapEntry(DefaultGroovyMethods.java:4271)
        at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1408)


The .nextflow.log:

Dec-20 13:14:40.646 [main] DEBUG nextflow.cli.Launcher - $> /share/bin/nextflow test.nf --in long1-8.sam.gz -c /home/macmanes/nextflow.config
Dec-20 13:14:40.696 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 0.11.4
Dec-20 13:14:40.710 [main] DEBUG nextflow.script.ConfigBuilder - User config file: /home/macmanes/nextflow.config
Dec-20 13:14:40.712 [main] DEBUG nextflow.script.ConfigBuilder - Parsing config file: /home/macmanes/nextflow.config
Dec-20 13:14:41.059 [main] DEBUG nextflow.Session - Session uuid: 62f256a6-57fc-4162-bc58-01c5d8adc069
Dec-20 13:14:41.061 [main] DEBUG nextflow.Session - Executor pool size: 63
Dec-20 13:14:41.073 [main] DEBUG nextflow.cli.CmdRun -
  Version: 0.11.4 build 2496
  Modified: 16-12-2014 16:56 UTC (11:56 EDT)
  System: Linux 3.13.0-43-generic
  Runtime: Groovy 2.3.8 on OpenJDK 64-Bit Server VM 1.7.0_65-b32
  Encoding: UTF-8 (UTF-8)
  Address: davinci [127.0.1.1]

Dec-20 13:14:41.081 [main] DEBUG nextflow.Session - Session start > phaser register (session)
Dec-20 13:14:41.086 [main] DEBUG nextflow.processor.TaskDispatcher - Dispatcher > start
Dec-20 13:14:41.086 [main] DEBUG nextflow.script.ScriptRunner - > Script parsing
Dec-20 13:14:41.236 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Dec-20 13:14:41.317 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: null
Dec-20 13:14:41.318 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'local'
Dec-20 13:14:41.325 [main] DEBUG nextflow.executor.Executor - Initializing executor: local
Dec-20 13:14:41.327 [main] INFO  nextflow.executor.Executor - [warm up] executor > local
Dec-20 13:14:41.335 [main] DEBUG n.processor.TaskPollingMonitor - Creating task monitor for executor 'local' > capacity: 63; pollInterval: 100ms; dumpInterval: 5m
Dec-20 13:14:41.343 [main] DEBUG nextflow.processor.TaskDispatcher - Starting monitor: TaskPollingMonitor
Dec-20 13:14:41.344 [main] DEBUG n.processor.TaskPollingMonitor - >>> phaser register (scheduler)
Dec-20 13:14:41.351 [main] DEBUG nextflow.executor.Executor - Invoke register for executor: local
Dec-20 13:14:41.404 [main] DEBUG nextflow.script.BaseParam - output > channel unknown: records -- creating a new instance
Dec-20 13:14:41.428 [main] DEBUG n.processor.ParallelTaskProcessor - Creating operator > splitSequences -- maxForks: 63
Dec-20 13:14:41.440 [main] DEBUG nextflow.Session - >>> phaser register (process)
Dec-20 13:14:41.446 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: null
Dec-20 13:14:41.446 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'local'
Dec-20 13:14:41.447 [main] DEBUG nextflow.executor.Executor - Initializing executor: local
Dec-20 13:14:41.448 [main] DEBUG nextflow.script.BaseParam - output > channel unknown: out -- creating a new instance
Dec-20 13:14:41.460 [main] DEBUG n.processor.ParallelTaskProcessor - Creating operator > reverse -- maxForks: 63
Dec-20 13:14:41.461 [main] DEBUG nextflow.Session - >>> phaser register (process)
Dec-20 13:14:41.463 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: null
Dec-20 13:14:41.463 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'local'
Dec-20 13:14:41.463 [main] DEBUG nextflow.executor.Executor - Initializing executor: local
Dec-20 13:14:41.464 [main] DEBUG nextflow.script.BaseParam - output > channel unknown: flag -- creating a new instance
Dec-20 13:14:41.476 [main] DEBUG n.processor.ParallelTaskProcessor - Creating operator > flagstat -- maxForks: 63
Dec-20 13:14:41.476 [main] DEBUG nextflow.Session - >>> phaser register (process)
Dec-20 13:14:41.483 [main] DEBUG nextflow.script.ScriptRunner - > Await termination
Dec-20 13:14:41.525 [Actor Thread 3] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /mouse/Mya/abyss/k71/bin
Dec-20 13:14:41.564 [Actor Thread 3] INFO  nextflow.processor.TaskDispatcher - [6f/2db799] Submitted process > splitSequences (1)
Dec-20 13:19:13.634 [Thread-2] DEBUG n.processor.TaskPollingMonitor - <<< phaser de-register (scheduler)

Any ideas what is going on? I'm 100% sure that system memory is not being exhausted.

Paolo Di Tommaso

Dec 20, 2014, 1:32:56 PM
to nextflow
Can you share the splitSequences process' code that is raising that error?

Cheers, p 




Matthew MacManes

Dec 20, 2014, 1:35:06 PM
to next...@googlegroups.com
Sure, 

#!/usr/bin/env nextflow

params.in = "test.fa"
sequences = file(params.in)

/*
 * split a fasta file in multiple files
 */
process splitSequences {

    input:
    file 'input' from sequences

    output:
    stdout records

    """
    gzip -cd input
    """
}


Paolo Di Tommaso

Dec 20, 2014, 1:59:50 PM
to nextflow
Hi Matthew, 

OK, the problem is that when you declare "stdout" as an output, Nextflow tries to load the entire standard output produced by the process into memory, so it crashes when that output is very large.

You can solve this by simply using a file output, for example: 

process splitSequences {

    input:
    file 'input' from sequences

    output:
    file 'chunk' into records

    """
    gzip -cd input > chunk
    """
}

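As a rough shell analogy for the difference (only a sketch, not Nextflow's actual code; `sample.gz` is a made-up stand-in for the real input): capturing a command's output holds all of it in memory at once, while redirecting it streams the data to disk as it is produced.

```shell
# Hypothetical stand-in for the real compressed input.
printf 'read1\nread2\nread3\n' | gzip -c > sample.gz

# Capturing: the whole decompressed output is held in a shell variable,
# much as a 'stdout' output ends up in a single in-memory string.
captured=$(gzip -cd sample.gz)

# Redirecting: data is written to disk as it is produced, so memory use
# stays small no matter how large the decompressed data is.
gzip -cd sample.gz > chunk

# Downstream steps can then read the file incrementally.
line_count=$(wc -l < chunk | tr -d ' ')
echo "lines in chunk: $line_count"
```

With a multi-gigabyte input, only the redirect form stays cheap, which is why the file output avoids the crash.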

Also note that increasing the JVM memory will not solve the problem, because a Java string cannot exceed the maximum array size (about 2 billion characters) no matter how large the heap is. One more thing: the NXF_OPTS variable must be defined in the launcher environment to be effective. The env scope in the nextflow.config file is meant to set up the environment for the tools in your pipeline, not for Nextflow itself.
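A minimal sketch of setting the variable in the launching shell instead, assuming a bash-like shell; `4g` is an arbitrary example value, not a recommendation. Note that the JVM heap flag takes the size directly (`-Xmx4g`), with no `=` sign.

```shell
# Export in the shell that launches Nextflow, so the launcher's JVM sees it.
# The heap flag is written -Xmx<size>, e.g. -Xmx4g (not -Xmx=4g).
export NXF_OPTS="-Xmx4g"

# Confirm the variable is visible to child processes before launching.
sh -c 'echo "NXF_OPTS=$NXF_OPTS"'

# Then start the pipeline as before, for example:
#   nextflow test.nf --in long1-8.sam.gz
```

This only changes the memory available to Nextflow itself; it does not lift the string size limit described above, so the file output is still needed.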


Hope this helps. 


Thanks a lot for your feedback, 
Paolo