Seeking Advice on HERON Simulation (output.xml) Error with Parallelization


So-Bin Cho

Jul 20, 2023, 3:49:11 PM
to INL RAVEN Users Group
Hi Raven Community,

I am reaching out for advice on an error I encountered while running the HERON output.xml file with the <internalParallel> node. I ran it in <sweep> mode and noticed that the jobs terminated partway through the simulation.

I am not sure whether a specific combination of <grid> values triggered the error or whether the parallelization itself caused it. Strangely, the error seems to occur randomly, even when I submit the same output.xml file multiple times.

I understand that the current version of RAVEN supports the <parallelMethod> node. Would switching the parallelization library to ray or dask help avoid this error? For reference, I have attached the log files from the terminated jobs. Your guidance and insights are greatly appreciated.

Thank you for your time and support!

So-Bin
STDIN (1).o5578737
STDIN (1).o5578736
STDIN (1).e5578737
STDIN (1).e5578736

Paul W. Talbot

Jul 20, 2023, 3:52:34 PM
to So-Bin Cho, INL RAVEN Users Group

Hi So-Bin,

Joshua is out this week but may have a better answer to your question. For now, from what I understand, “dask” is the default parallelization library and is used when internalParallel is set to True. We’re migrating toward using parallelMethod instead of internalParallel. So far, “dask” has been more reliable than “ray” for nested parallel computations on our HPC.
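As a minimal sketch of that migration, the Python helper below adds (or updates) a `<parallelMethod>` node inside the `<RunInfo>` block of a RAVEN input such as the output.xml discussed here. It assumes the node is spelled `parallelMethod`, that it sits directly under `<RunInfo>`, and that "dask" and "ray" are the accepted values; these assumptions come from this thread, not from the RAVEN manual, so check the manual for your RAVEN version. It leaves `<internalParallel>` untouched, since whether both nodes are needed is not settled here. `set_parallel_method` and `EXAMPLE` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Values mentioned in this thread; assumed, not taken from the RAVEN manual.
SUPPORTED_METHODS = ("dask", "ray")


def set_parallel_method(xml_text: str, method: str = "dask") -> str:
    """Return a copy of a RAVEN input with <RunInfo><parallelMethod> set to `method`.

    <internalParallel> and every other RunInfo child is left as it was.
    """
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported parallel method: {method!r}")
    root = ET.fromstring(xml_text)
    run_info = root.find("RunInfo")  # direct child of the <Simulation> root
    if run_info is None:
        raise ValueError("no <RunInfo> block directly under the root element")
    node = run_info.find("parallelMethod")
    if node is None:  # add the node if the input does not have one yet
        node = ET.SubElement(run_info, "parallelMethod")
    node.text = method  # overwrite any previous value, so calls are idempotent
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal input, for illustration only.
EXAMPLE = """<Simulation>
  <RunInfo>
    <batchSize>4</batchSize>
    <internalParallel>True</internalParallel>
  </RunInfo>
</Simulation>"""
```

For example, `set_parallel_method(EXAMPLE, "dask")` returns the same input with `<parallelMethod>dask</parallelMethod>` appended to `<RunInfo>`. ElementTree's default parser discards XML comments, so write the result to a new file rather than overwriting a commented input.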

Thanks,

Paul

Joshua J. Cogliati

Jul 26, 2023, 2:37:34 PM
to INL RAVEN Users Group, So-Bin Cho
Hm, it looks like one of the runs failed:
Traceback (most recent call last):
  File "/home/chos/tes/raven/raven_framework.py", line 25, in <module>
    sys.exit(main(True))
  File "/home/chos/tes/raven/ravenframework/Driver.py", line 207, in main
    raven()
  File "/home/chos/tes/raven/ravenframework/Driver.py", line 160, in raven
    simulation.run()
  File "/home/chos/tes/raven/ravenframework/Simulation.py", line 901, in run
    self.executeStep(stepInputDict, stepInstance)
  File "/home/chos/tes/raven/ravenframework/Simulation.py", line 834, in executeStep
    stepInstance.takeAstep(stepInputDict)
  File "/home/chos/tes/raven/ravenframework/Steps/Step.py", line 317, in takeAstep
    self._localTakeAstepRun(inDictionary)
  File "/home/chos/tes/raven/ravenframework/Steps/MultiRun.py", line 178, in _localTakeAstepRun
    myLambda([finishedJob,outputs[outIndex]])
  File "/home/chos/tes/raven/ravenframework/Steps/MultiRun.py", line 109, in <lambda>
    self._outputCollectionLambda.append( (lambda x: inDictionary['Model'].collectOutput(x[0],x[1]), outIndex) )
  File "/home/chos/tes/raven/ravenframework/Models/Code.py", line 773, in collectOutput
    self._replaceVariablesNamesWithAliasSystem(evaluation, 'input',True)
  File "/home/chos/tes/raven/ravenframework/Models/Model.py", line 305, in _replaceVariablesNamesWithAliasSystem
    found = sampledVars.pop(whichVar,[notFound])
AttributeError: 'Error' object has no attribute 'pop'

but I didn't see the underlying error in the output from any of the runs, so I am not sure what is causing it. If you are using RAVEN-runs-RAVEN, there should be files named out~inner in the subdirectories that may contain a more informative error.
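One way to hunt for those files is a short recursive search. This sketch assumes the inner-run logs are plain-text files whose names contain `out~inner` somewhere below the working directory, and it flags lines containing "Traceback" or "Error"; the glob pattern, the keywords, and the `find_inner_errors` name are all assumptions to adjust to what is actually on disk.

```python
from pathlib import Path

# Assumed markers of a failure in an inner-run log; extend as needed.
KEYWORDS = ("Traceback", "Error", "ERROR")


def find_inner_errors(workdir, pattern="*out~inner*", keywords=KEYWORDS):
    """Yield (path, line_number, line) for each suspicious line in inner-run logs.

    Searches every subdirectory of `workdir` for files matching `pattern`.
    """
    for path in sorted(Path(workdir).rglob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log contains stray bytes
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(word in line for word in keywords):
                    yield path, number, line.rstrip("\n")
```

For example, `for path, number, line in find_inner_errors("path/to/workdir"): print(f"{path}:{number}: {line}")` lists each matching line with its file and line number, which makes it easier to spot the first inner failure rather than the later `AttributeError` in the outer traceback.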
Also, changing your Simulation node to <Simulation verbosity="debug"> will increase the amount of debugging information in the output.
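The same change can be scripted when many inputs need it. This sketch assumes the root element of the RAVEN input is `<Simulation>`, as in Joshua's example, and only sets its `verbosity` attribute; `enable_debug` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def enable_debug(xml_text: str) -> str:
    """Return a copy of a RAVEN input whose <Simulation> root has verbosity="debug"."""
    # Note: ElementTree's default parser drops XML comments from the result.
    root = ET.fromstring(xml_text)
    if root.tag != "Simulation":
        raise ValueError(f"expected a <Simulation> root, found <{root.tag}>")
    root.set("verbosity", "debug")  # adds the attribute or overwrites an existing value
    return ET.tostring(root, encoding="unicode")
```

Only the root attribute changes; all child nodes, including `<RunInfo>`, are written back as parsed.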

Joshua
