It is basically random which connections/queries go to which secondaries relative to the working set of data; the reads could be spread out differently. Currently the goal is to route based on proximity to the client, but having an algorithm that used the chunk (shard key) ranges could provide better cache efficiency across the replica set (when reading from secondaries).
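To make the idea concrete, here is a minimal sketch of what range-aware routing could look like. This is purely illustrative, not how mongos works today: the class name, the chunk bounds, and the secondary names are all made up for the example. The point is that pinning each shard-key range to one secondary keeps that range's working set warm in a single node's cache.

```python
# Hypothetical sketch of chunk-range-based read routing for cache
# affinity. All names and values here are illustrative assumptions.
import bisect

class RangeAffinityRouter:
    """Pin each shard-key chunk range to one secondary for cache locality."""

    def __init__(self, chunk_bounds, secondaries):
        # chunk_bounds: sorted upper bounds of the chunk ranges,
        # e.g. [100, 200, 300] means chunks (-inf,100), [100,200), [200,300), [300,inf)
        self.chunk_bounds = chunk_bounds
        self.secondaries = secondaries

    def pick_secondary(self, shard_key):
        # Find which chunk the key falls into, then map chunks onto
        # secondaries round-robin, so the same range always hits the
        # same node and stays warm in its cache.
        chunk = bisect.bisect_right(self.chunk_bounds, shard_key)
        return self.secondaries[chunk % len(self.secondaries)]

router = RangeAffinityRouter([100, 200, 300], ["sec-a", "sec-b"])
print(router.pick_secondary(42))   # chunk 0 -> sec-a
print(router.pick_secondary(150))  # chunk 1 -> sec-b
print(router.pick_secondary(199))  # same chunk as 150, same secondary
```

Note the trade-off mentioned below: if one range is much hotter than the others, this scheme sacrifices even load balancing for cache efficiency.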
One way to get something like the effect you want is to run 4 shards; you could run multiple mongod processes on each box and thus get redundancy as well. Then you would not use slaveOk, and in a read-heavy application your caches would be reasonably well segmented. There are other pros and cons to this approach; I just wanted to point it out.
If you can get by with 2 replicas and an arbiter, you'd pick up even more efficiency.
But yes, different mongos behavior could achieve better caching, though perhaps at the expense of an evenly balanced load in some cases.
-- Max
There is no clear way to consistently spread the load across all the
replicas, and if you depend on this behavior, then losing a single
replica may degrade your application's performance considerably
and/or horribly. The goal of replica sets is to be reliable and to
provide redundancy; spreading reads this way might support that goal
in some ways, but it undermines it in others.
Sharding is the solution to the problem you have described.