Have you tried putting logs all over the place (use a ServiceLayerDecorator that overrides every method to add some logs, and override the RequestFactoryServlet's doPost to log before and after) to try to see where the sluggishness comes from?
Do you have any servlet filter (e.g. managing transactions) that could explain that?
There are a few instances of "synchronized(cache)" blocks which might benefit from a "2-step check", i.e. wrap "synchronized(cache) { foo = cache.get(...); if (foo == null) { foo = ...; cache.put(..., foo); } }" within a "foo = cache.get(...); if (foo == null) { /* previous block here */ }" to avoid locking the cache when the looked-up value is already in it. People have complained that such constructs with "synchronized" can slow down GWT-RPC, so it could very well slow down RF too:
http://code.google.com/p/google-web-toolkit/issues/detail?id=6740
(BTW, no, I've never experienced this myself, but I'm still using an old pre-2.3 version, and I'm not using AppEngine.)
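
To make the "2-step check" idea concrete, here is a rough sketch of what I mean. It is not the actual RequestFactory code: TwoStepCache and its loader parameter are names I made up for illustration, and I'm assuming the cache is a ConcurrentHashMap (or some other thread-safe map), because the first lookup happens outside the lock and that would not be safe with a plain HashMap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper for illustration only; not part of GWT or RequestFactory.
class TwoStepCache<K, V> {
  // Assumption: a thread-safe map, so the unlocked read in step 1 is safe.
  private final Map<K, V> cache = new ConcurrentHashMap<>();

  // The loader must not return null (ConcurrentHashMap rejects null values).
  V get(K key, Function<K, V> loader) {
    // Step 1: look the value up without taking the lock.
    V value = cache.get(key);
    if (value == null) {
      synchronized (cache) {
        // Step 2: re-check under the lock, since another thread may have
        // filled the entry between step 1 and acquiring the lock.
        value = cache.get(key);
        if (value == null) {
          value = loader.apply(key);
          cache.put(key, value);
        }
      }
    }
    return value;
  }
}
```

With this shape, threads only contend on the lock for keys that are not cached yet; once a value is in the map, lookups never enter the synchronized block.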