Hi Scott!
I'm very fond of the key/value (k/v) design pattern, where you represent arbitrary keys and values using an array of documents {k:'keyvalue', v:'valuevalue'}. I'm glad that you like it too. I'm having a hard time with some of your questions, because I'm finding them a bit ill-defined.
First, though -- an important point. You can *combine* the k/v index with other indexes. For example, I worked with a customer building a tenanted system: some fields in the documents would always be present since they were used by the overall system (think: "CustomerID" and "ApplicationID"; also "DateCreated" and "DateUpdated"), and then there was a set of fields that were created by the individual customer and which would vary wildly [0].
My overall advice was to store the customer fields in a 'custfields' array, using the k/v design, and to build a *compound* index, with the mandatory fields first and the 'custfields' array last. IIRC, they knew that every query would always use CustomerID, ApplicationID, and one of the dates, so the overall index was {CustomerID:1, ApplicationID:1, DateCreated:1, custfields:1}. One good thing about this design was that {CustomerID:1, ApplicationID:1, DateCreated:1} made a *very* nice shard key.
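To make the layout concrete, here is a minimal sketch of what such a document and index spec could look like. The helper name toKeyValueArray, the collection name 'items', and all field values are made up for illustration; only the field names and the index come from the description above.

```javascript
// Hypothetical helper: turn a flat object of customer-defined fields
// into the k/v array form, one {k, v} document per field.
function toKeyValueArray(fields) {
  return Object.keys(fields).map(function (key) {
    return { k: key, v: fields[key] };
  });
}

// Example document: mandatory system fields at the top level,
// customer-defined fields in the 'custfields' array.
// All values here are illustrative.
var doc = {
  CustomerID: 42,
  ApplicationID: 7,
  DateCreated: new Date("2014-01-15T00:00:00Z"),
  DateUpdated: new Date("2014-02-01T00:00:00Z"),
  custfields: toKeyValueArray({ color: "blue", size: "XL" })
};

// Compound index spec: mandatory fields first, the k/v array last.
var indexSpec = {
  CustomerID: 1,
  ApplicationID: 1,
  DateCreated: 1,
  custfields: 1
};

// In the mongo shell (the collection name 'items' is an assumption):
//   db.items.insert(doc);
//   db.items.ensureIndex(indexSpec);
```

The first three keys of indexSpec are the same prefix that served as the shard key.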
They wanted to do searching and sorting based on the values in the 'custfields' array. Searching is easy: you just have to use $elemMatch -- "{ custfields: {$elemMatch: {k:'color', v:'blue'}} }" (or whatever). Sorting is more challenging: you have to use the aggregation framework, do something like this [1] to move the sorted fields to the top level, and then apply $sort to the generated fields.
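As a rough sketch of both operations, using the 'custfields' array name from the index above: the helper names, the 'sortValue' field, and the exact pipeline stages are my own assumptions for illustration, and the approach in [1] may well differ. The idea is just to show one way of lifting a k/v value to the top level before sorting on it.

```javascript
// Hypothetical helper: match documents whose 'custfields' array has
// a single element carrying both the given key and the given value.
function custFieldQuery(key, value) {
  return { custfields: { $elemMatch: { k: key, v: value } } };
}

// Hypothetical helper: build a pipeline that sorts by the value
// stored under one key. 'baseMatch' should hold the mandatory fields
// (CustomerID, ApplicationID, a date range) so the first stage can
// use the compound index.
function sortByCustFieldPipeline(baseMatch, key, direction) {
  return [
    { $match: baseMatch },
    // One output document per k/v pair.
    { $unwind: "$custfields" },
    // Keep only the pair we want to sort on.
    { $match: { "custfields.k": key } },
    // Lift the value to a top-level field ('sortValue' is an assumed name).
    { $project: {
        CustomerID: 1,
        ApplicationID: 1,
        DateCreated: 1,
        sortValue: "$custfields.v"
    } },
    { $sort: { sortValue: direction } }
  ];
}

// In the mongo shell (the collection name 'items' is an assumption):
//   db.items.find(custFieldQuery("color", "blue"));
//   db.items.aggregate(
//     sortByCustFieldPipeline({ CustomerID: 42, ApplicationID: 7 }, "color", 1));
```

Note that this particular pipeline drops documents that have no entry for the sort key, and its output carries only the projected fields, so you would need to adjust the $project stage (or look the documents up again by _id) if you want the rest of each document back.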
This was using 2.2, so they had to make sure that the result sets would fit in a single document and that no single $sort stage used too much memory. If you use 2.6, where aggregation returns a cursor and the {allowDiskUse:1} parameter is available, neither of these is an issue any more.
Going through your questions:
It depends [2].
I'm having a hard time understanding this question. The obvious answer is "yes, if you design the schema badly, this can be a problem". :-) If you use the k/v section as part of a larger index (see above), then you'll get decent index locality. Note that you *will* need to have enough RAM to keep the entire index in memory, since entries can be scattered anywhere, but that's no different from any other MongoDB index that isn't right-balanced or left-balanced.
The other question is "compared to what?". It's generally a good idea to have as few indexes as possible. If you can replace 8 or 10 indexes with a single k/v index, it's a net win on everything: on disk usage, on RAM usage, on disk I/O, on the time required to build and select a query plan.
On the other hand, if you have a schema where all of your queries can be satisfied with only 2-3 "conventional" compound indexes, those will be more efficient than a k/v index.
Where's the crossover point? It depends on the details of your application: you'd have to test and measure.