I am not aware of any way to profile the innards of a byte-code
compiled function (which doesn't mean that there isn't a way…).
But if you compile to C (not byte code), technically you could use a
C profiler to get some information about what needs to be sped up.
I expect it'll take some work to figure out how to profile such a
function effectively, as the code is generated and loaded into the
kernel by default. You could split the function off and call it from a
separate program (not the kernel) to profile it more easily. I have
never done this personally.