Headless is hanging


Charles

Mar 14, 2021, 4:37:50 PM
to netlogo-devel
I'm having a problem with running NetLogo headless (through netlogo-headless.sh as described in the BehaviorSpace documentation).  I'm running NL6.2.0 on a slurm grid, under Ubuntu 18.04, with OpenJDK 1.8.0 (and alternatively OpenJDK 11.0.10).  The BehaviorSpace runs complete as expected, but java never returns control to the shell that calls it; it just keeps on running.  NetLogo seems not to be signaling that the job is finished.  I wonder if this has been encountered before?

Thanks,
Charles

Seth Tisue

Mar 16, 2021, 7:04:25 PM
to netlogo-devel
Doesn't sound familiar.

This happens with any model? Or only with some model in particular? That's probably the single most important piece of information to have.

You could ask the JVM for a thread dump to find out where it's hanging.
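For reference, here is a minimal sketch of two common ways to get that dump, assuming a HotSpot/OpenJDK JVM on Linux. The helper name `dump_threads` and the default output file name are made up for illustration; `jstack` is only present when a full JDK is installed.

```shell
# Hypothetical helper: capture a thread dump from a running JVM.
# Usage: dump_threads PID [OUTFILE]
dump_threads() {
  pid="${1:-}"
  out="${2:-threads.txt}"
  if [ -z "$pid" ]; then
    echo "usage: dump_threads PID [OUTFILE]" >&2
    return 2
  fi
  if command -v jstack >/dev/null 2>&1; then
    # jstack ships with the JDK and must be run as the user who owns the JVM.
    jstack "$pid" > "$out"
  else
    # Fallback: SIGQUIT asks a HotSpot JVM to print the dump on its own
    # standard output; the JVM keeps running afterwards.
    kill -QUIT "$pid"
  fi
}

# Example (assumes the headless main class shows up in the process command line):
# dump_threads "$(pgrep -f org.nlogo.headless.Main | head -n 1)" stack.txt
```

With the `kill -QUIT` fallback the dump appears wherever the JVM's output is going (the terminal or a redirected log), not in `OUTFILE`.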

Seth

Charles

Mar 17, 2021, 1:07:47 PM
to netlogo-devel
With any model, Seth.  Even the most simple, and even when I issue the java command directly from the command line.  E.g.,

cstaelin@gridmgr:~/scratch/staelinlab/Research/Tests$ java -Xmx1024m -Dfile.encoding=UTF8 -cp /gridapps/software/NetLogo/6.1.0-64/app/netlogo-6.1.0.jar org.nlogo.headless.Main --model gridtest6.1.0.nlogo --experiment Test &

(The same problem with 6.2.0 as well.)

I'll investigate the thread dump.

Thanks,
Charles

jeremy...@northwestern.edu

Mar 17, 2021, 2:57:58 PM
to netlogo-devel
Hey Charles,

I did a quick test run on Ubuntu 20.04 with OpenJDK 1.8.0_282 using the command you gave with a different model (Wolf Sheep Predation with a simple experiment), and everything ran okay; it stopped as expected and created the output when I used the `--spreadsheet` option.  

I'd be curious if you see any difference in behavior if you use the `--threads 1` switch to keep things single-threaded.

-Jeremy

Seth Tisue

Mar 17, 2021, 4:50:16 PM
to netlogo-devel
Does it generate results and then hang, or does it hang before any runs even occur...?

Seth

Charles

Mar 18, 2021, 10:13:33 AM
to netlogo-devel
All the results are generated and the files (if any) are written.  Here is the java thread dump of the hanging process (text file attached).  It looks similar for all the models I've tried.  Clearly java is waiting for something to clean up and close.  I will try Jeremy's suggestion.

Is it possible that it is trying to read/write to a directory to which I don't have access?

Charles

stack10.txt

Charles

Mar 18, 2021, 1:27:42 PM
to netlogo-devel
Hi Jeremy,

Alas, it also works on my VirtualBox Ubuntu installation on my Windows machine.  There seems to be something specific to the grid that I am working on that does not affect any other java programs that I create and run.  I've tried NetLogo with and without writing to files and with just a single thread, and java always hangs after completing the run(s).  Luckily I can kill the java process without losing any of my NetLogo results, so I can make progress.  It sure would be nice to know why java hangs.  I don't have admin privs on this grid and, indeed, I don't have write access to the NetLogo installation, but that should not matter, should it?

Charles

Seth Tisue

Mar 18, 2021, 1:52:33 PM
to netlogo-devel
There are no NetLogo threads in the thread dump, so this is rather mystifying.

The thread dump says "Full thread dump OpenJDK 64-Bit Server VM (11.0.2+9 mixed mode)", is JDK 11 also what you're doing your other testing with...? (I'm grasping at straws.)

Seth

Charles

Mar 18, 2021, 3:54:59 PM
to netlogo-devel
I've run this under both JDK 11 and 1.8 (Java HotSpot(TM) 64-Bit Server VM (25.281-b09 mixed mode)), with the same results (attached).  I know nothing about reading these dumps, but the "DestroyJavaVM" thread is referencing java.util.prefs.FileSystemPreferences$4$1.  I'm wondering if running this (or having my system manager run it) with full root privs would be worth a try, just to make sure it's not a privilege problem.  Another straw!
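One way to narrow a dump like this down is to print only the threads whose stacks mention `java.util.prefs`. The sketch below assumes the usual `jstack` layout, where each thread's block is separated from the next by a blank line; the helper name `prefs_threads` is made up for illustration.

```shell
# Hypothetical helper: print every thread block in a saved dump that
# mentions java.util.prefs. awk's paragraph mode (RS="") treats each
# blank-line-separated block as one record.
prefs_threads() {
  awk 'BEGIN { RS = ""; ORS = "\n\n" } /java\.util\.prefs/' "$1"
}

# Example: prefs_threads stack5.txt
```

Printing the whole block keeps the thread's name and state next to the frames that matched, which a plain `grep` for the class name would drop.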

Charles

stack5.txt

jeremy...@northwestern.edu

Mar 18, 2021, 4:30:37 PM
to netlogo-devel
It could be some kind of access issue, but my expectation would be that it would just bomb out completely if it tried to open or write to something that you don't have permissions for.  But I can't say for sure.

I think you're onto something with the `FileSystemPreferences`.  Those are the default Preferences used by Java on Linux, and NetLogo does read and write to them for a few things.  It's possible to set a different Preferences handler with a Java switch, so I'll see if I can make up a dummy one you can try out.  If it doesn't hang, at least we'll know the culprit even if we don't know why.

-Jeremy

Charles

Mar 22, 2021, 11:33:40 AM
to netlogo-devel
Hi Jeremy.  New information: I was able to work with my system administrator this morning to run NetLogo with root privilege.  It worked!  So, it may well be, then, that there is something peculiar to our grid's configuration that is preventing Java from cleaning up after NetLogo completes its job.  We did notice that there was a .NFSxxxxx file that was locked while java was hanging and that it disappeared when he killed the job.  (That hang seemed to arise from NetLogo itself being unable to delete a user file due to a permissions problem, which we fixed.)  

Do you have any suggestions as to where we should look for what NetLogo is trying to do that java cannot clean up without root privilege?  Any other tests we might make?
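One quick check, assuming the .NFSxxxxx file is one of the placeholders an NFS client leaves behind when a file is deleted while some process still has it open: list such files under a directory while java is hanging and again after it is killed. The helper name `list_nfs_placeholders` is made up for illustration, and `-iname` is a GNU/BSD `find` extension rather than strict POSIX.

```shell
# Hypothetical helper: list NFS placeholder files (.nfsXXXX, matched
# case-insensitively) under a directory; defaults to the current directory.
list_nfs_placeholders() {
  find "${1:-.}" -type f -iname '.nfs*' 2>/dev/null
}

# Example (the directory is a guess at where the Java preferences live):
# list_nfs_placeholders "$HOME/.java"
```

If a placeholder shows up only while the process hangs, `lsof` or `fuser` on that file (where installed) can confirm which process is holding it.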

Thanks,
Charles

jeremy...@northwestern.edu

Mar 22, 2021, 4:11:20 PM
to netlogo-devel
Hey Charles,

One thing I found in poking around at `FileSystemPreferences` is that encrypted home directories can cause problems for them due to path length limits.  Do you know if your home directories are encrypted on the server you're using?  I believe it uses the `user.home` Java property as the location for the `.userPrefs/` folder (maybe under `.java/`), so if the home folder location is the problem, setting that property to a non-encrypted location might fix things.  https://stackoverflow.com/a/13847337/1246011
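A rough sketch of that `user.home` idea, with made-up names throughout: the helper `prefs_home_opt` and the directory layout are purely illustrative, and the base directory should be somewhere local that you can write to.

```shell
# Hypothetical helper: create a private directory under BASE (default:
# $TMPDIR or /tmp) and print the JVM option that points user.home at it.
prefs_home_opt() {
  base="${1:-${TMPDIR:-/tmp}}"
  dir="$base/netlogo-home-$(id -un)"
  mkdir -p "$dir" && printf '%s%s\n' '-Duser.home=' "$dir"
}

# Example (all paths are placeholders):
# java "$(prefs_home_opt /local/scratch)" -Xmx1024m -Dfile.encoding=UTF8 \
#   -cp /path/to/netlogo-6.2.0.jar org.nlogo.headless.Main \
#   --model gridtest.nlogo --experiment Test
```

Changing `user.home` affects anything else in the JVM that resolves paths against the home directory, so this is better suited to a diagnostic run than to a permanent setting.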

In either case, to test out whether or not the file system prefs are the issue I made a dummy one that doesn't do anything.  It shouldn't matter for what you're doing with BehaviorSpace headlessly.  You can download the file here: https://github.com/LaCuneta/DummyPreferences/blob/main/dummypreferences_2.13-0.0.1.jar, or if you prefer you can clone that repo and build it yourself using `sbt package` to avoid using a binary file from a stranger on the internet on your server :-) (it'll be in `target/scala-2.13/` after packaging).

To use it you'll need to add it to your classpath and then set the `java.util.prefs.PreferencesFactory` Java option.  Here is an example of what I ran to use it:

```
java -Xmx1024m -Dfile.encoding=UTF8 -Djava.util.prefs.PreferencesFactory=org.nlogo.DummyPreferencesFactory -cp "/path/to/NetLogo 6.2.0/app/netlogo-6.2.0.jar:/path/to/dummypreferences_2.13-0.0.1.jar" org.nlogo.headless.Main --model ~/WolfSheepPredation.nlogo --experiment energy --spreadsheet ~/WSP-energy.csv
```

To set the classpath I had to use colons to delimit the jars, surround the string in double quotes, and then use the full, absolute paths instead of the home shortcut, `~`, as it doesn't work inside the quotes (dunno if you were using that or not).
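The quoting point can be sketched in a few lines of shell. The helper name `join_classpath` and the directories are made up for illustration; the jar file names are the ones from the command above.

```shell
# Hypothetical helper: join jar paths with ':' (the classpath separator on Linux).
# The subshell keeps the IFS change from leaking into the caller.
join_classpath() {
  ( IFS=:; printf '%s\n' "$*" )
}

# "~" stays a literal tilde inside double quotes; "$HOME" is expanded.
literal="~/jars/dummypreferences_2.13-0.0.1.jar"
expanded="$HOME/jars/dummypreferences_2.13-0.0.1.jar"

# Placeholder locations; quoting each path keeps the space in "NetLogo 6.2.0" intact.
CP=$(join_classpath "/path/to/NetLogo 6.2.0/app/netlogo-6.2.0.jar" "$expanded")
printf '%s\n' "$CP"

# Example: java -cp "$CP" org.nlogo.headless.Main --model ... --experiment ...
```

Passing `"$CP"` in double quotes hands the whole classpath to `java` as a single argument, spaces included.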

If that helps, then we at least know the file prefs are the cause, even if we don't exactly know why.  It could be the encrypted home drive issue, or possibly something else.  In either case, let me know what you find.  Thanks.

-Jeremy

Charles

Mar 24, 2021, 12:39:08 PM
to netlogo-devel
Jeremy,

You are a genius!  That did the trick.  I've yet to test it on something really big, but all my small tests were successful.  Now, of course, the question is why.  (By the way, the directories on my Ubuntu installation are not encrypted.)

Let me know if I can help track down what the problem is with the file preferences, although I suspect that you have a few other things to do as well.

Many thanks,
Charles

jeremy...@northwestern.edu

Mar 25, 2021, 9:21:25 AM
to netlogo-devel
Nice, I'm glad we at least have a better workaround!  Debugging this would be tough as the code that's hanging is inside the `FileSystemPreferences`, not anything in the NetLogo code.  I'll make a note to look for any existing issues with it that sound similar, and I'll reach out if I need any tests done.  Thanks for the offer and for following up.

-Jeremy
