I've used VisualVM locally, which is a cheap way of doing it, but you probably couldn't profile startup effectively, and even then it's only your local dev startup. You could in theory drop a breakpoint in your code, connect, then continue, to see what's within your control (i.e. your application startup, as opposed to resource fetching, container spin-up etc., which would be different in the cloud than locally).
I've also done some coarse manual profiling while running on App Engine, that is, recording timings yourself - but again, that only covers your own code rather than anything else. I'm not sure about Scala, but if you had some way of adding interceptors/AOP to whatever the Servlet is (even if you have to wrap it), you could time that. Beyond that, I guess it's a matter of comparing your timings against the log output for startup.
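
To make the "record timings yourself" idea concrete, here is a rough sketch in plain Java (it should be callable from Scala as well). Everything in it is hypothetical: `StartupTimer`, the step labels and the fake startup steps are names I made up for illustration, not part of App Engine or any framework. It only uses `System.nanoTime()` and the standard library, so it measures your own code and nothing else.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: wraps a startup step and records how long it took.
public class StartupTimer {
    // Label -> accumulated elapsed nanoseconds, kept in insertion order.
    private final Map<String, Long> nanosByLabel = new LinkedHashMap<>();

    // Runs the step, records its duration even if it throws, returns its result.
    public <T> T time(String label, Supplier<T> step) {
        long start = System.nanoTime();
        try {
            return step.get();
        } finally {
            record(label, System.nanoTime() - start);
        }
    }

    private synchronized void record(String label, long nanos) {
        nanosByLabel.merge(label, nanos, Long::sum);
    }

    // Copy of the recorded timings, safe to read while steps are still running.
    public synchronized Map<String, Long> snapshot() {
        return new LinkedHashMap<>(nanosByLabel);
    }

    // One line per step, in milliseconds, suitable for writing to the log.
    public String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : snapshot().entrySet()) {
            sb.append(e.getKey()).append(": ")
              .append(e.getValue() / 1_000_000.0).append(" ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StartupTimer timer = new StartupTimer();
        // Stand-ins for real startup work (config loading, context creation, ...).
        String config = timer.time("loadConfig", () -> "fake-config");
        timer.time("buildContext", () -> config.length());
        System.out.println(timer.report());
    }
}
```

In a real app you would call `time(...)` around whatever your wrapping Servlet or filter does in its init, then log `report()` so the numbers can be lined up against the platform's own startup log entries.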
In conversations with Google previously (a year ago now), they suggested that Spring was a culprit for slow startups they'd observed in deployed apps - it's probably a combination of classpath scanning and jar size, but I wasn't able to get any more info than that. I also wasn't talking to people whose expertise would match ours in the nuances of these technologies, so I'd take it with a grain of salt.