Yep, sounds about right.
To make things run faster, you can either do no drawing, do less complex drawing, have fewer nodes to draw, or implement it in C++. I've banged my head on this a few times as well, and I wish there was a better answer. :(
With regard to parallel evaluation: that doesn't apply here because (1) drawing always happens in the main thread and (2) Python nodes cannot be parallelised. In fact, if you include a Python DG node somewhere in a dependency graph, Maya will kindly go ahead and disable parallel evaluation for that branch.
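If you want to check which evaluation mode your scene is actually running in, here's a rough sketch. It relies on the `evaluationManager` command from `maya.cmds`; the helper names `evaluation_mode` and `is_parallel` are just made up for illustration, and the `cmds` module is passed in as an argument so the functions can be read (and tried) outside of Maya too. Note that this only reports the scene-wide mode; it won't tell you what happens to an individual branch containing a Python node.

```python
def evaluation_mode(cmds):
    """Return Maya's current evaluation mode as a string.

    `cmds` is expected to be the `maya.cmds` module (passed in so this
    sketch does not import Maya itself). Typical values are 'off' (plain
    DG evaluation), 'serial' and 'parallel'.
    """
    # In query mode the command returns a list with one entry, e.g. ['parallel'].
    modes = cmds.evaluationManager(query=True, mode=True)
    return modes[0] if modes else "unknown"


def is_parallel(cmds):
    """True only if the scene-wide evaluation mode is 'parallel'."""
    return evaluation_mode(cmds) == "parallel"
```

Inside Maya's script editor you would call it along the lines of `import maya.cmds as cmds` followed by `print(evaluation_mode(cmds))`.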