Typically, native code is not actually much faster at run time; rather, the start-up time is shorter. Even AOT-compiled C# or F# code still performs array bounds checks and the like. When you really want raw performance on things like inner loops that are called hundreds of millions of times, you are a bit out of luck with a managed language.
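To make the bounds-check point concrete, here is a minimal sketch of such an inner loop (the function name and shape are illustrative, not from the original):

```fsharp
// A tight inner loop over a managed array. The runtime guarantees every
// arr.[i] access is bounds-checked. The JIT can elide the check in the
// canonical "0 .. arr.Length - 1" pattern, but in less regular access
// patterns the check stays, which is the overhead native code avoids.
let sum (arr: float[]) =
    let mutable acc = 0.0
    for i in 0 .. arr.Length - 1 do
        acc <- acc + arr.[i]
    acc
```

The check is cheap per iteration, but over hundreds of millions of calls it is one of the costs you cannot fully opt out of in a managed language.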
That said, if you use structs properly instead of classes, you can already avoid a lot of GC overhead and indeterminism. For the majority of cases you'll find F# easily fast enough; we do HPC on GPUs and distributed grids, and it is virtually never an issue that F# itself is too slow, especially since you can parallelise it quite easily. When we feel some particular piece of code really needs raw performance, we tend to write the algorithm on the GPU (from within F#); all the surrounding data transformation is still done in idiomatic F# code without mutable state, and we never have performance issues there.
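A minimal sketch of the struct-versus-class point (the type and function here are illustrative, not from the original):

```fsharp
// A small value type: marked [<Struct>], so instances live inline
// (on the stack or packed contiguously inside an array) rather than
// as individual heap objects the GC has to allocate and trace.
[<Struct>]
type Vec2 = { X: float; Y: float }

// An array of Vec2 is one contiguous allocation. Had Vec2 been a
// class (a reference type), each element would be a separate heap
// object, adding allocation pressure and GC indeterminism.
let dots (a: Vec2[]) (b: Vec2[]) =
    Array.map2 (fun u v -> u.X * v.X + u.Y * v.Y) a b
```

For small, frequently allocated data in hot paths, this is often the single cheapest win before reaching for the GPU.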
The specially designed GPU code is usually very different, of course, and much, much harder to write and maintain than normal F# code.