Hey David,
I'm seeing something odd on another processor: when I run AES under Keystone, the runtime + initialization cycles come in just a tad under the cycle count for the same run on base Linux. For the runtime alone that's a big gap, with AES running significantly faster under Keystone, but I don't think it's due to paging. When I reported this result in August, I went back and changed the timebase frequency in the DTS to the value used in the Rocket Core DTS in the SiFive/Freedom repo: 1000000. That value is wrong for the processor I'm working on (the actual value, based on when the hardware updates the cycle register, is 6250000 for comparison). This is a processor running at 100MHz, FYI.
After changing it to the lower value just to see, the processor reported what I believe is the correct cycle count, even though the Linux date/time is now completely off.
Here are some comparisons:
AES with 6250000 Timebase Frequency (correct timebase based on the hardware):
Base AES Runtime: 14920622241
Base Real Time: 2m29.226s
Keystone AES Runtime: 8720936538
Keystone Real Time: 2m25.808s
Keystone Initialization: 5524916405
AES with 1000000 Timebase Frequency (incorrect timebase):
Base Real Time: 18m46.314s
Keystone AES Runtime: 14962720786
Keystone Real Time: 21m51.782s
Keystone Initialization: 5657233091
Ignoring the Real Time, which is off, the run with the incorrect DTS looks more like what I would expect to see when comparing base against Keystone. Keystone on its own is a bit faster, but adding the initialization makes the total slightly longer. This lines up with what you've described and what I've seen.
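One way to cross-check the correct-timebase numbers is to convert the cycle counts to seconds and compare them with the reported Real Time. This is only a rough sketch: it assumes the cycle counter advances at the 100MHz core clock, which is an assumption and not something measured here. The helper names are made up for illustration.

```python
# Rough sanity check of the correct-timebase run.
# ASSUMPTION: the cycle counter ticks at the 100 MHz core clock.
CLOCK_HZ = 100_000_000


def wall_seconds(minutes, seconds):
    """Convert a 'XmY.YYYs' style reading into seconds."""
    return minutes * 60 + seconds


def cycles_to_seconds(cycles, clock_hz=CLOCK_HZ):
    """Convert a raw cycle count into seconds at the assumed clock rate."""
    return cycles / clock_hz


# Numbers copied from the correct-timebase run above.
base_cycles = 14920622241
keystone_runtime_cycles = 8720936538
keystone_init_cycles = 5524916405
keystone_total_cycles = keystone_runtime_cycles + keystone_init_cycles

base_wall = wall_seconds(2, 29.226)
keystone_wall = wall_seconds(2, 25.808)

print(f"base:     {cycles_to_seconds(base_cycles):.3f}s from cycles, "
      f"{base_wall:.3f}s real time")
print(f"keystone: {cycles_to_seconds(keystone_total_cycles):.3f}s from cycles "
      f"(runtime + init), {keystone_wall:.3f}s real time")
```

Under that assumption, both cycle totals land within a few seconds of their Real Time readings, so the open question is how the Keystone cycles split between initialization and AES runtime, not whether the totals are plausible.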
Would you happen to have any idea what would cause the large cycle difference with Keystone in the run with the correct timebase frequency? Or could the performance gap truly be that large?