If you only have stores in the 3+TB range, then it looks like you're getting decent sizes per chunk, and you're probably fine.
The tradeoffs of having a large number of partitions in RO & BnP are the following:
1) In its current implementation, BnP cannot run fewer than one reducer per partition. In your case, spinning up 7000 reducer tasks may be quite a big overhead, but if you're not feeling it, then there's probably no reason to worry too much about it.
2) In HDFS, it is generally preferable to have a few big files rather than many small files. Since each partition produces at least two files (index + data), a large partition count causes extra load and memory pressure on the Hadoop NameNode.
3) On the V servers, every open file consumes a file descriptor. This is not a big deal, since you can just raise the OS limits, but it is yet another resource to keep track of.
4) The above factors result in a deployment that does not scale *down* gracefully to small store sizes if you have too many partitions. If you have use cases with small amounts of data (few records and/or small records), it's really not worth throwing all of the above resources at the problem; see the back-of-envelope sketch after this list. Since V clusters have a statically defined number of partitions shared by all stores hosted on them, this is a bit inflexible if you want to support a wide range of store sizes. This is particularly important for us at LinkedIn: we have hundreds of stores, with new ones every week, and they range from a handful of records all the way up to billions of records, from kilobytes to terabytes.
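To make those overheads concrete, here is a rough back-of-envelope sketch (plain Java; the constants are assumptions for illustration, not measurements from any real cluster):

```java
// Rough back-of-envelope math for the per-store overhead of a high
// partition count. All constants are illustrative assumptions.
public class PartitionOverhead {
    public static void main(String[] args) {
        int partitions = 7000;     // partition count from the question
        int filesPerPartition = 2; // at least index + data per partition

        // 1) BnP must spin up at least one reducer per partition:
        System.out.println("Minimum BnP reducers: " + partitions);

        // 2) Files the NameNode has to track for a single store version
        //    (chunking and replication can multiply this further):
        long files = (long) partitions * filesPerPartition;
        System.out.println("Files per store version: " + files);

        // 3) Every open store file also consumes a file descriptor on
        //    the servers, so the same figure feeds into the OS fd limits.
        System.out.println("File descriptors per store: " + files);
    }
}
```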
On the other hand, the main advantage of over-partitioning is to accommodate future expansion. Arun's figure of 20 partitions per server allows you to grow maybe 4x comfortably (bringing you to 5 partitions per host, around which point skewed data distributions may cause hot nodes and become problematic) and theoretically up to 20x if your data is perfectly distributed. In our case, we have chosen to build new clusters when we fill up the old ones, rather than relying exclusively on expanding them, so this amount of over-partitioning works out ok for us.
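As a quick illustration of that headroom math (the initial cluster size is an assumed figure for the example):

```java
// How over-partitioning buys room for future expansion. The initial
// cluster size below is an assumption for illustration.
public class ExpansionHeadroom {
    public static void main(String[] args) {
        int partitionsPerServer = 20; // Arun's figure
        int initialServers = 20;      // assumed
        int totalPartitions = partitionsPerServer * initialServers; // 400

        // Growing the cluster 4x spreads the same partitions thinner:
        System.out.println("Partitions/host after 4x growth: "
                + totalPartitions / (initialServers * 4)); // 5

        // Theoretical ceiling: one partition per host, i.e. 20x growth,
        // but only if data is perfectly distributed across partitions:
        System.out.println("Max growth factor: " + partitionsPerServer);
    }
}
```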
The small number of nodes per cluster (20-60) in turn allows us to run with a replication factor of just 2 on most clusters, which would probably not be sufficient if we tried to deploy clusters with hundreds of nodes each, since correlated failures would be much more common in that scenario.
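To see why larger clusters strain a replication factor of 2, here is a toy model (independent failures only; real correlated failures, e.g. a shared rack or switch, make the picture worse):

```java
// Toy model: with replication factor 2 and independent per-node failure
// probability p, the expected number of partitions with BOTH replicas
// down at once grows linearly with cluster size. Correlated failures
// (shared rack, switch, etc.) make this worse. Numbers are assumptions.
public class ReplicationRisk {
    public static void main(String[] args) {
        double p = 0.01;            // assumed chance a node is down
        int partitionsPerNode = 20; // assumed

        for (int nodes : new int[] {20, 60, 300}) {
            int partitions = nodes * partitionsPerNode;
            // A partition is unavailable only if both of its replicas
            // (on two distinct nodes) are down simultaneously:
            double expectedUnavailable = partitions * p * p;
            System.out.printf("%d nodes -> ~%.2f partitions at risk%n",
                    nodes, expectedUnavailable);
        }
    }
}
```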
Another aspect is that for large stores, BnP can achieve higher throughput the more files it generates. Fortunately, BnP supports chunks in order to cope with this: large stores are automatically (or via configs) chunked more aggressively, which can increase the parallelism of both the build and push phases.
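A rough sketch of that scaling (the chunk size and store/partition figures below are assumptions, not actual BnP defaults):

```java
// How chunking scales parallelism with store size: bigger partitions get
// split into more chunks, and each chunk is a unit of parallel work in
// the build and push phases. All figures here are assumptions.
public class ChunkingMath {
    public static void main(String[] args) {
        long chunkSizeBytes = 1L << 30;        // assumed ~1 GB per chunk
        long storeSizeBytes = 3L * (1L << 40); // a 3 TB store
        int partitions = 400;                  // assumed partition count

        long bytesPerPartition = storeSizeBytes / partitions; // ~7.7 GB
        // Ceiling division, with at least one chunk per partition:
        long chunksPerPartition = Math.max(1,
                (bytesPerPartition + chunkSizeBytes - 1) / chunkSizeBytes);
        System.out.println("Chunks per partition: " + chunksPerPartition); // 8
        System.out.println("Parallel units for build/push: "
                + chunksPerPartition * (long) partitions); // 3200
    }
}
```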
Therefore, I think it is best to choose the number of partitions solely based on cluster size and future expandability, and to let chunking take care of increasing the throughput for large stores. But that doesn't mean 7000 partitions can't work. We simply haven't tried it (:
I hope that helps.