There's a paper that describes a few things on performance here:
It's from last year, though, and several things have been optimized and improved in the meantime. Besides, we couldn't push it to its limits in every scenario for lack of resources. As a quick summary: in general we found that CPU and memory usage was minimal, and the real bottleneck was bandwidth.
If you run more tests of your own and are willing to share the results, that would be great.
Lorenzo