The problem with all of these scaling questions is that the answer really is "it depends". Until you see the type of data, the true amount of data, and how operators use Moloch, you don't really know. If you have long-term budget or planning issues and it's hard to get machines, then order more now rather than later; otherwise you can order as you see the need. More disk will allow you to keep more days of data, and to handle Moloch using more disk per day in the future. More memory will make things faster and allow longer SPI view queries.
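For a rough feel of the disk side of this, here is a minimal back-of-the-envelope sketch in Python. It is not anything Moloch provides: the function names, the constant daily write rate, and the `growth_factor` headroom multiplier are all assumptions made for illustration, and you would plug in your own measured numbers.

```python
def retention_days(total_disk_tb: float, daily_usage_tb: float) -> float:
    """Days of data that fit on disk, assuming a constant daily write rate."""
    if daily_usage_tb <= 0:
        raise ValueError("daily_usage_tb must be positive")
    return total_disk_tb / daily_usage_tb


def disk_needed_tb(target_days: float, daily_usage_tb: float,
                   growth_factor: float = 1.0) -> float:
    """Disk needed to keep target_days of data.

    growth_factor is a hypothetical headroom multiplier (for example 1.5
    if you expect daily usage to grow by 50%).
    """
    if target_days < 0 or daily_usage_tb < 0 or growth_factor <= 0:
        raise ValueError("inputs must be non-negative, growth_factor positive")
    return target_days * daily_usage_tb * growth_factor


if __name__ == "__main__":
    # Made-up example numbers; measure your own daily usage first.
    print(retention_days(total_disk_tb=100, daily_usage_tb=2))
    print(disk_needed_tb(target_days=30, daily_usage_tb=2, growth_factor=1.5))
```

The daily usage figure is the part you only learn by running against real traffic for a while, which is why the estimate stays rough until then.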
Yes, running 2 or 3 nodes on 128G machines is still the way to go.