Bigger caches are of limited value once the working set is larger than the cache[1]. In the low-latency space we often do crazy things to keep the entire application in cache, but this is not the mainstream. Moving to large pages, and having L2 support for these large pages, can be more significant than actual cache size, as we can now see for some large-memory applications running on Haswell.
Rather than going parallel, it is often more productive to move to cache-friendly or cache-oblivious (actually very cache-friendly) data structures. It is easy to argue that if a small proportion of the effort that went into Fork-Join and parallel streams had been spent on providing better general-purpose data structures that are cache friendly, i.e. Maps and Trees, then mainstream applications would benefit more than they do from FJ and parallel streams, which are supposedly targeted at "solving the multi-core problem". It is not that FJ and parallel streams are bad; they are a really impressive engineering effort. However, it is all about opportunity cost. When we choose to optimise, we should choose what gives the best return for the investment.
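To make "cache friendly" a little more concrete, here is a minimal sketch of an int-to-int map that uses open addressing with linear probing over flat primitive arrays, so a lookup walks adjacent memory instead of chasing pointers to boxed entries. The class name `IntIntOpenMap`, the fixed capacity, the reserved key, and the hash mixing constant are all assumptions made for illustration; it is not a proposal for what a platform library should ship.

```java
import java.util.Arrays;

/**
 * Illustrative sketch only: fixed-capacity int-to-int map using open
 * addressing with linear probing. No resizing and no removal.
 */
final class IntIntOpenMap {
    /** Returned by get() when the key is absent. */
    static final int MISSING = Integer.MIN_VALUE;
    /** Reserved key value that marks a free slot. */
    private static final int EMPTY = Integer.MIN_VALUE;

    private final int[] keys;   // keys stored contiguously
    private final int[] values; // values[i] belongs to keys[i]
    private final int mask;
    private int size;

    IntIntOpenMap(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo <= 0 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        keys = new int[capacityPowerOfTwo];
        values = new int[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
        Arrays.fill(keys, EMPTY);
    }

    void put(int key, int value) {
        if (key == EMPTY) {
            throw new IllegalArgumentException("key is reserved");
        }
        int i = mix(key) & mask;
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & mask; // linear probe to the neighbouring slot
        }
        if (keys[i] == EMPTY) {
            // Always keep one slot free so that probing terminates.
            if (size == mask) {
                throw new IllegalStateException("map is full");
            }
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int get(int key) {
        int i = mix(key) & mask;
        while (keys[i] != EMPTY) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & mask;
        }
        return MISSING;
    }

    int size() {
        return size;
    }

    /** Cheap bit mixing so that sequential keys spread across the table. */
    private static int mix(int k) {
        int h = k * 0x9E3779B9;
        return h ^ (h >>> 16);
    }
}
```

Whether this beats `java.util.HashMap<Integer, Integer>` for a given workload is something to measure rather than assume; the sketch only shows the layout idea.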
A lot can be achieved with more coarse-grained parallelism/concurrency. The Servlet model is a good example of this, as is the way the likes of PHP can scale on the server side. Beyond this, pipelining is often a more intuitive model that is well understood and practised extensively by our hardware friends.
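As a rough illustration of pipelining in software, the sketch below connects stages with bounded queues, and each stage is owned by exactly one thread. The names (`PipelineSketch`, `stage`, `run`), the queue capacity of 16, and the use of an empty `Optional` as the end-of-stream marker are assumptions for the example. Error handling is deliberately omitted: if a step throws, the downstream stages would wait forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

final class PipelineSketch {

    /** Starts one stage: a single thread that applies its step and forwards the result. */
    static <I, O> Thread stage(BlockingQueue<Optional<I>> in,
                               BlockingQueue<Optional<O>> out,
                               Function<I, O> step) {
        Thread t = new Thread(() -> {
            try {
                for (Optional<I> msg = in.take(); msg.isPresent(); msg = in.take()) {
                    out.put(Optional.of(step.apply(msg.get())));
                }
                out.put(Optional.empty()); // pass end-of-stream downstream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    /** Parses each line to an int, squares it, and returns the results in input order. */
    static List<Integer> run(List<String> lines) {
        BlockingQueue<Optional<String>> raw = new ArrayBlockingQueue<>(16);
        BlockingQueue<Optional<Integer>> parsed = new ArrayBlockingQueue<>(16);
        BlockingQueue<Optional<Integer>> squared = new ArrayBlockingQueue<>(16);

        Thread parse = stage(raw, parsed, s -> Integer.parseInt(s.trim()));
        Thread square = stage(parsed, squared, n -> n * n);

        // Feed from a separate thread so the caller can drain the bounded queues concurrently.
        Thread feed = new Thread(() -> {
            try {
                for (String line : lines) {
                    raw.put(Optional.of(line));
                }
                raw.put(Optional.empty());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        feed.start();

        List<Integer> results = new ArrayList<>();
        try {
            for (Optional<Integer> msg = squared.take(); msg.isPresent(); msg = squared.take()) {
                results.add(msg.get());
            }
            feed.join();
            parse.join();
            square.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining the pipeline", e);
        }
        return results;
    }
}
```

Because every stage has a single owner thread and the queues are FIFO, ordering is preserved and no state is shared between stages beyond the queues themselves. The bounded queues also give back pressure for free: a slow stage stalls its upstream neighbour instead of letting work pile up.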
When talking about concurrent access to data structures, it is very important to separate query from mutation. If data structures are immutable, or support concurrent non-blocking reads, then they can scale very well in parallel and can be reasoned about. Concurrent update to any remotely interesting data structure, let alone a full model, is very, very complex and difficult to manage. Period. Leaving aside the complexity, any concurrent update from multiple writers to a shared model/state is fundamentally limited, as shown by the Universal Scalability Law (USL)[2]. As an industry we are kidding ourselves if we think it is a good idea to have concurrent update from multiple writers to any model when we expect it to scale in our increasingly multi-core world. The good thing is that the majority of application code that needs to be developed is queries against models that do not mutate that often.
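To put some numbers on the USL point, the usual form of the law gives relative capacity as C(N) = N / (1 + σ(N − 1) + κN(N − 1)), where σ models contention and κ models coherency cost, and capacity peaks at N = sqrt((1 − σ) / κ). The sketch below just evaluates those two formulas. The class and method names are made up for illustration, and the coefficients in the usage note are invented values, not measurements of any real system.

```java
/** Illustrative sketch: evaluates the Universal Scalability Law for given coefficients. */
final class UslSketch {

    /** Relative capacity C(N) = N / (1 + sigma*(N-1) + kappa*N*(N-1)). */
    static double capacity(int n, double sigma, double kappa) {
        return n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1));
    }

    /** Concurrency level at which capacity peaks: sqrt((1 - sigma) / kappa). */
    static double peak(double sigma, double kappa) {
        return Math.sqrt((1.0 - sigma) / kappa);
    }
}
```

With, say, σ = 0.05 and κ = 0.001 (again, invented figures), `UslSketch.peak(0.05, 0.001)` is a little under 31, and `UslSketch.capacity(64, 0.05, 0.001)` comes out lower than `UslSketch.capacity(31, 0.05, 0.001)`: past the peak, adding writers makes throughput go backwards.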
A nasty consequence of our industry's desire to have concurrent access to shared state is that we also do it synchronously, and that spreads to a distributed context. We need to embrace the world as being asynchronous as a way to avoid the latency constraints in our approaches to design and algorithms. By being asynchronous we can be non-blocking and set our applications free to perform better and be more resilient due to enforced isolation. Bandwidth will continue to improve at a great rate, but latency improvements are levelling off.
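As a small sketch of what non-blocking composition can look like in Java today, the example below starts two independent lookups and combines their results when both complete, without either caller thread waiting on the other. `priceOf` and `stockOf` are hypothetical stand-ins for remote calls and simply compute a value from the item name; only `CompletableFuture` itself is a real JDK API here.

```java
import java.util.concurrent.CompletableFuture;

final class AsyncSketch {

    /** Hypothetical remote lookup; here it just derives a number from the name. */
    static CompletableFuture<Integer> priceOf(String item) {
        return CompletableFuture.supplyAsync(() -> item.length() * 10);
    }

    /** Hypothetical remote lookup; here it just derives a number from the name. */
    static CompletableFuture<Integer> stockOf(String item) {
        return CompletableFuture.supplyAsync(() -> item.length());
    }

    /** Both lookups run independently; the combiner fires once both have completed. */
    static CompletableFuture<String> describe(String item) {
        return priceOf(item).thenCombine(stockOf(item),
                (price, stock) -> item + ": price=" + price + ", stock=" + stock);
    }
}
```

A caller would normally chain further work with `thenAccept` or similar rather than block; calling `AsyncSketch.describe("widget").join()` is only useful in a test or at the very edge of a program.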
My New Year's wish list to platform providers would be the infrastructure to enable the development of more cache-friendly, immutable, and append-only data structures; better support for pipeline concurrency; non-blocking APIs (e.g. JDBC); language extensions to make asynchronous programming easier to reason about (e.g. better support for state machines and continuations); and language extensions for writing declarative queries (e.g. LINQ[3] for C#, which could be even better). Oh yes, and don't be so shy about allowing lower-level access from the likes of Java; we are well beyond writing applets that run in browser sandboxes these days!
Martin...