TL;DR: Most of the CPU time in my app is still spent outside netflix-graph, in input parsing, splitting, and so on. Since I can't sort the input data beforehand, I think my best bet for a speed-up is to build the graph from multiple threads. I can make some gains by parallelizing my splitting and parsing code, but until netflix-graph is thread-safe I am stuck building the graph from a single thread.
Also TL;DR: Given the load, I'm very impressed with the graph-building performance. While profiling it I found it to be very, very efficient. Good work.
I've finished profiling. At the end of the graph-building process there is still more time spent in addConnection than in contains. NFBuildGraphNodeCache.get ends up near the top for both addConnection and getConnectionSet. I think I was mistaken in believing that the few highly connected nodes were the problem; it may instead be the node count leading to very large hash maps. I imagine there's not much to be done about that.
method, time, %, own time, invocation count
addConnection, 47615, 24%, 1, 906018
getConnectionSet, 27746, 14%, 3, 1202328
contains, 6805, 3%, 6805, 1201423
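
For anyone in the same spot, here is a rough sketch of the split described in the TL;DR: parse and split on several threads, but feed the graph from one thread only. Everything in it is an assumption made for illustration. `EdgeSink` is a hypothetical stand-in for the non-thread-safe builder (it is not the netflix-graph API), `Edge` is a made-up parsed record, and the "from,to" line format is invented. It also does not preserve input order, which I'm assuming is acceptable since the input is unsorted anyway.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelParseSingleWriter {

    /** Hypothetical parsed record: one connection between two node ordinals. */
    public static final class Edge {
        final int from;
        final int to;
        Edge(int from, int to) { this.from = from; this.to = to; }
    }

    /** Hypothetical stand-in for a graph builder that is NOT thread-safe. */
    public interface EdgeSink {
        void addConnection(int from, int to);
    }

    // Sentinel each parser enqueues when it is done; compared by identity.
    private static final Edge DONE = new Edge(-1, -1);

    // Assumed input format for this sketch: "from,to" on each line.
    static Edge parse(String line) {
        String[] parts = line.split(",");
        return new Edge(Integer.parseInt(parts[0].trim()),
                        Integer.parseInt(parts[1].trim()));
    }

    /**
     * Parses lines on parserThreads worker threads and hands every parsed
     * edge to the sink from the calling thread only. Returns the edge count.
     */
    static int build(List<String> lines, EdgeSink sink, int parserThreads)
            throws InterruptedException {
        // Bounded queue so the parsers cannot run arbitrarily far ahead.
        BlockingQueue<Edge> queue = new ArrayBlockingQueue<>(1024);
        ExecutorService parsers = Executors.newFixedThreadPool(parserThreads);

        for (int t = 0; t < parserThreads; t++) {
            final int offset = t;
            parsers.execute(() -> {
                try {
                    try {
                        // Each worker takes every parserThreads-th line.
                        for (int i = offset; i < lines.size(); i += parserThreads) {
                            queue.put(parse(lines.get(i)));
                        }
                    } finally {
                        // Always signal completion, even if parsing failed,
                        // so the consumer loop below cannot wait forever.
                        queue.put(DONE);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        parsers.shutdown();

        // Single consumer: the only thread that ever touches the sink.
        int finished = 0;
        int added = 0;
        while (finished < parserThreads) {
            Edge e = queue.take();
            if (e == DONE) {
                finished++;
            } else {
                sink.addConnection(e.from, e.to);
                added++;
            }
        }
        return added;
    }
}
```

This only helps as long as parsing, rather than the builder, is the bottleneck. Once the single consumer thread is saturated, the remaining cost is the addConnection time shown in the table above.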