Basically I'm running a few hundred R programs in parallel on a computer cluster. Each program needs a single CPU, no shared memory, nothing really complicated. Input and output files are all on the local node. The only library used is sqldf, which runs queries over data frames generated within the program.
The issue I ran into is that as more instances of R run on the cluster, the programs all collectively grind to a halt, and an immense amount of traffic is generated to the head node. I tried installing R locally on the nodes, but this didn't make the problem go away. I used the Linux command `lsof` to check for open files on the head node, but nothing came up that (to me) indicated I/O from the nodes or libraries being repeatedly loaded.
I was able to finally fix it by updating the code to close any existing sqldf connections and create a new one:
    # close an old connection if it exists
    # (sqldf() with no arguments toggles the persistent connection:
    # it closes one if open, otherwise opens one)
    if (!is.null(getOption("sqldf.connection"))) sqldf()
    # create a new connection
    sqldf()
In the original code, no connection was initiated beyond just loading the library and then directly running queries on the data frames.
I'm not sure if what I encountered was expected behavior or not. The code runs fine on a single computer, and as I understand it, sqldf uses an in-memory SQLite database by default, so no network traffic should be necessary.
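For context, the queries in question look roughly like this (toy data frame standing in for the data generated within the program):

```r
library(sqldf)

# toy data frame in place of the real per-program data
df <- data.frame(grp = c("a", "a", "b"), val = c(1, 2, 3))

# sqldf copies df into a temporary in-memory SQLite database,
# runs the query there, and returns the result as a data frame --
# so in principle everything should stay on the local node
sqldf("SELECT grp, SUM(val) AS total FROM df GROUP BY grp")
```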