1) Not sure on that one; I have not yet ported my large nonlinear models over to JuMP for detailed benchmarking. I know reducing the setup time and memory consumption for nonlinear models is on the JuMP team's to-do list, but the second-best way to help with that is to provide reproducible example code. Can you replace any sensitive data with made-up inputs, just to exercise the solver in a way that you could share? The best solution would be patches that address the problem, but that's much harder.
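By a shareable reproducible example, I mean something along these lines: keep the model structure and dimensions, but substitute random placeholder data. This is only a hypothetical sketch (the variable names and sizes are made up, and the syntax assumes a recent JuMP release; older versions use `Model(solver=IpoptSolver())` and `solve(model)` instead):

```julia
# Minimal shareable benchmark sketch: same structure as the real model,
# but with randomly generated stand-in data of the same dimensions.
using JuMP, Ipopt

n = 1_000            # made-up problem size matching the real instance
c = rand(n)          # fake coefficients standing in for sensitive inputs

model = Model(Ipopt.Optimizer)
@variable(model, 0 <= x[1:n] <= 1)
@objective(model, Min, sum(c[i] * x[i]^2 for i in 1:n))
@constraint(model, sum(x) >= n / 4)
optimize!(model)
```

As long as the placeholder data produces a model of comparable size and sparsity, the setup-time and memory behavior should be representative of the real instance.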
2) Are you watching the output from Ipopt? Does it start the optimization process at all, or does the setup itself seem to take a long time? Can you provide a breakdown of setup time vs. Ipopt linear solver time vs. function callback time, perhaps on smaller instances of your problem that are able to run? There are too many unknowns with large nonlinear optimization models, so you'll need to provide more information about the behavior you're seeing. I recommend setting the Ipopt option "print_timing_statistics" to "yes"; this gives you a detailed timing breakdown, but only if the optimization actually terminates (successfully, or via "max_iter" or "max_cpu_time").
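For reference, setting those Ipopt options from JuMP looks roughly like this (assuming a recent JuMP version; the attribute-setting function has gone by different names across releases):

```julia
using JuMP, Ipopt

model = Model(Ipopt.Optimizer)
# Print Ipopt's detailed timing table at the end of the solve:
set_optimizer_attribute(model, "print_timing_statistics", "yes")
# Cap the runtime so the timing table is printed even if convergence is slow;
# the 600-second limit here is an arbitrary example value:
set_optimizer_attribute(model, "max_cpu_time", 600.0)
```

The timing table separates function evaluation time (objective, constraints, gradients, Jacobian, Hessian) from time spent in the linear solver, which is exactly the breakdown needed to diagnose where your problem is spending its time.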
If you want to solve a single very large problem, the version of Ipopt interfaced by Julia is not capable of parallelizing across multiple nodes of a distributed-memory cluster. There is an experimental MPI branch in the Ipopt repository, but to my knowledge it has not been hooked up to Julia, and the scalability results even from C++ or AMPL were not very encouraging. However, if you use a linear solver other than MUMPS, you can parallelize the Newton step computation at each Ipopt iteration using shared-memory multithreading. But we'd have to know whether the Newton step is actually the bottleneck for your problems. In C++ or AMPL it often is, but your models may be taxing the JuMP auto-differentiation implementation to an unusual extent.
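Switching the linear solver is a single Ipopt option. A sketch, assuming you have built Ipopt against one of the multithreaded HSL solvers (HSL codes such as MA86/MA97 require a separate license and are not bundled with the default Ipopt build):

```julia
using JuMP, Ipopt

model = Model(Ipopt.Optimizer)
# Use a multithreaded sparse linear solver instead of the default MUMPS.
# Valid values depend on how your Ipopt binary was compiled;
# "ma97" is one of the HSL options, "pardiso" is another possibility.
set_optimizer_attribute(model, "linear_solver", "ma97")
```

The number of threads the linear solver uses is typically controlled through the OMP_NUM_THREADS environment variable rather than through Ipopt itself.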
You can set the Ipopt option "hessian_approximation" to "limited-memory" to test whether the second derivatives are dramatically more expensive than the first derivatives. This quasi-Newton approximation typically comes at the cost of more iterations to converge than with exact Hessian information, and sometimes fails to converge at all, but for some problem types the Hessian is so expensive to compute that the trade-off is worth it.
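Concretely, that experiment is one option change (again assuming recent JuMP syntax):

```julia
using JuMP, Ipopt

model = Model(Ipopt.Optimizer)
# Replace exact second derivatives with an L-BFGS approximation;
# compare per-iteration time and iteration count against the default.
set_optimizer_attribute(model, "hessian_approximation", "limited-memory")
```

If the per-iteration time drops sharply with this setting, the Hessian evaluation (rather than the linear solver) is likely your bottleneck.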
-Tony