Hi Wade,
A few notes/suggestions:
1) With a small dataset of only a few thousand rows, training is more prone to harmful numerical effects from multithreaded race conditions (we use the "Hogwild!" approach). I suggest trying "reproducible=TRUE" to disable multi-threading, or "force_load_balance=FALSE" to limit it, or, alternatively, not downsampling to such a small dataset and keeping all threads active. One theory to explain the increased stability you observed: expanding your features into binary flags increased the number of first-hidden-layer weights, which reduces contention (and hence race conditions) between threads.
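A minimal sketch of the first option, assuming a running H2O cluster and placeholder names `train`, `predictors`, and `response` for your data:

```r
library(h2o)
h2o.init()

# reproducible=TRUE forces single-threaded training (slower, but
# deterministic and free of Hogwild! race conditions); the seed
# only takes effect when reproducible=TRUE
model <- h2o.deeplearning(
  x = predictors, y = response, training_frame = train,
  reproducible = TRUE,
  seed = 1234
)
```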
2) If 1) didn't help, I suggest adding "max_w2=10, l2=1e-5" to your arguments (to keep the "Rectifier" activation from exploding), or, alternatively, switching to the "Tanh" activation function, which is naturally bounded.
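For concreteness, a hedged sketch of those arguments (same placeholder names as above):

```r
# max_w2 caps the squared sum of incoming weights per neuron,
# and a small l2 penalty further discourages weight blow-up
model <- h2o.deeplearning(
  x = predictors, y = response, training_frame = train,
  activation = "Rectifier",
  max_w2 = 10,
  l2 = 1e-5
)

# ...or swap in the bounded activation instead:
# activation = "Tanh"
```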
3) Once it appears stable, I suggest also trying a "better" network, i.e., one with more hidden neurons and training for more epochs:
hidden = list(c(50,50), c(100,100), c(200,200), c(200,200,200)),
epochs = c(1,10,100,1000)
Note you'll still need a grid search over epochs even when doing cross-validation: all CV models train for the full specified number of epochs, and there is no early stopping based on validation error even if replace_with_best_model=TRUE. Otherwise, for small data, this can lead to overfitting on the validation set (if one is specified).
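The grid search over those values could look roughly like this (again assuming `train`, `predictors`, `response` stand in for your data):

```r
# grid search over network size and training length,
# with 5-fold cross-validation on each model
grid <- h2o.grid(
  "deeplearning",
  x = predictors, y = response, training_frame = train,
  nfolds = 5,
  hyper_params = list(
    hidden = list(c(50,50), c(100,100), c(200,200), c(200,200,200)),
    epochs = c(1, 10, 100, 1000)
  )
)
```

You can then sort the resulting models by cross-validation error to pick the architecture/epoch combination that generalizes best.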
Hope this helps, please let me know if you have questions,
Arno