I don't believe there would be any major differences between the options from a memory-performance point of view. More experienced Redis developers might know better, but I've never seen any guidance about optimizing on that dimension. On the other hand, each Redis instance primarily runs on a single core, with the potential to use a second core for the persist-to-disk work. As a result, you might get more throughput by sharding enough to make use of more CPU cores on the machine. If your use cases end up being bandwidth- or I/O-bound, though, the extra CPU headroom might not make much of an impact anyway.
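If you do go the multi-instance route, client-side sharding is simple enough to sketch. Here's a rough Python example using redis-py; the local ports and the CRC32-modulo hash are just illustrative assumptions, not a recommendation:

```python
import zlib
import redis

# Assumed setup: one Redis instance per port, e.g. one per CPU core
# you want to keep busy. Adjust ports to match your deployment.
PORTS = [6379, 6380, 6381, 6382]
shards = [redis.Redis(host="localhost", port=p) for p in PORTS]

def shard_for(key: str) -> redis.Redis:
    # A stable hash of the key decides which instance owns it, so
    # reads and writes for the same key always land on the same shard.
    return shards[zlib.crc32(key.encode()) % len(shards)]

shard_for("user:42").set("user:42", "some value")
value = shard_for("user:42").get("user:42")
```

Note that modulo hashing means resharding moves most keys; consistent hashing is the usual fix if you expect to change the instance count later.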
If you were to push it even further and did 24 x 4gb, you could get away with using a 32-bit build of Redis. That's noticeably more memory-efficient when dealing with lots of small keys, where the pointers themselves are a sizable portion of the footprint. I'm not sure how that would affect overall performance in your use case, though, or what the trade-offs of running more Redis instances than CPU cores would be. If you're more RAM-bound than CPU-bound, I could see it being worth it in some cases in theory, but you'd want to verify with testing that matches your actual workload.
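If you want to check whether the 32-bit build actually buys you anything for your data, you can measure per-key overhead directly against each build. A rough sketch, again with redis-py; the key count and key shape here are placeholders, so swap in something that looks like your real keys:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Sample used_memory before and after inserting a batch of small keys
# to estimate the per-key overhead. Run this against a 32-bit and a
# 64-bit instance and compare the two numbers.
N = 100_000
before = r.info("memory")["used_memory"]

pipe = r.pipeline(transaction=False)
for i in range(N):
    pipe.set(f"k:{i}", "1")
pipe.execute()

after = r.info("memory")["used_memory"]
print(f"~{(after - before) / N:.0f} bytes per key")
```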