Hi Aman,
There are a few ways to do this. One thing to be aware of, however, is "index selectivity": if you proceed with option 1, make sure to analyze the distribution of values in your fields first. (Google "index selectivity" for more details.)
Let's take it back to the Cars analogy :)
Say Ford is a manufacturer, and within the set of cars manufactured by Ford there are types of car such as "Estate", "Pickup" etc. You could say that the types manufactured by Ford are a subset of all Ford cars, but other manufacturers might also manufacture cars of the same types. So you do indeed need to model the subset relationship.
If you build HashIndexes on Car.manufacturer and Car.type, then great, they will reduce the candidate set somewhat. But the "selectivity" of both of those indexes would be fairly poor. Say you had a collection of 100000 cars, of which 50000 (50%) are manufactured by Ford and 50000 (50%) are manufactured by Honda. Let's say 50% of Ford's fleet are Pickup trucks (25000 cars), but 50% of Honda's fleet are also Pickup trucks (25000 cars).
Given a query "and(equal(Car.manufacturer, "Ford"), equal(Car.type, "Pickup"))", CQEngine will have a choice between those two indexes, but neither index alone uniquely identifies "Ford Pickup" cars. The index on manufacturer ("Ford") will narrow the candidate set to 50000, and the index on type ("Pickup") will narrow the candidate set to 50000 also (i.e. Ford manufactured 25000 Pickups, and Honda manufactured another 25000). The candidate set from either of those indexes will be large at 50000 cars, 50% of the collection, requiring a non-trivial amount of filtering. So this is where you might run into problems with option 1.
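To make that concrete, here's a rough sketch of the single-attribute approach. The Car class and the MANUFACTURER/TYPE attribute definitions below are just assumptions for illustration (and the exact SimpleAttribute signature depends on which CQEngine version you're using):

    import static com.googlecode.cqengine.query.QueryFactory.*;

    import com.googlecode.cqengine.ConcurrentIndexedCollection;
    import com.googlecode.cqengine.IndexedCollection;
    import com.googlecode.cqengine.attribute.Attribute;
    import com.googlecode.cqengine.attribute.SimpleAttribute;
    import com.googlecode.cqengine.index.hash.HashIndex;
    import com.googlecode.cqengine.query.option.QueryOptions;
    import com.googlecode.cqengine.resultset.ResultSet;

    public class Car {
        final String manufacturer;
        final String type;

        Car(String manufacturer, String type) {
            this.manufacturer = manufacturer;
            this.type = type;
        }

        // Attributes expose the fields to CQEngine...
        public static final Attribute<Car, String> MANUFACTURER = new SimpleAttribute<Car, String>("manufacturer") {
            public String getValue(Car car, QueryOptions queryOptions) { return car.manufacturer; }
        };
        public static final Attribute<Car, String> TYPE = new SimpleAttribute<Car, String>("type") {
            public String getValue(Car car, QueryOptions queryOptions) { return car.type; }
        };

        public static void main(String[] args) {
            IndexedCollection<Car> cars = new ConcurrentIndexedCollection<Car>();
            cars.addIndex(HashIndex.onAttribute(MANUFACTURER)); // ~50% of cars per value -> poor selectivity
            cars.addIndex(HashIndex.onAttribute(TYPE));         // ~50% of cars per value -> poor selectivity

            cars.add(new Car("Ford", "Pickup"));
            cars.add(new Car("Honda", "Pickup"));

            // Either index narrows the candidates to about half the collection;
            // the remaining non-matching cars have to be filtered out.
            ResultSet<Car> fordPickups = cars.retrieve(
                    and(equal(MANUFACTURER, "Ford"), equal(TYPE, "Pickup")));
            for (Car car : fordPickups) {
                System.out.println(car.manufacturer + " " + car.type);
            }
        }
    }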
However, this is often where CompoundIndex comes in. If you "addIndex(CompoundIndex.onAttributes(Car.manufacturer, Car.type))" then CQEngine will build an index on the *combinations* of those two attributes. This way it can jump immediately to "Ford Pickup" cars, a much smaller candidate set at 25% of the collection, requiring much less filtering.
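Continuing the same sketch, the compound index is a one-liner (CompoundIndex lives in the com.googlecode.cqengine.index.compound package):

    import com.googlecode.cqengine.index.compound.CompoundIndex;

    // Indexes the combinations ("Ford"+"Pickup", "Honda"+"Estate", ...),
    // so the query above can jump straight to the ~25000 Ford Pickups.
    cars.addIndex(CompoundIndex.onAttributes(Car.MANUFACTURER, Car.TYPE));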
Ultimately, if you analyze the "selectivity" of field values in your collection, you might find that it's not worth adding indexes on certain fields at all, for example if any particular value of Car.type matches a large fraction of the entire collection. To make retrieval faster in those cases, you can use CompoundIndex to build indexes over combinations of fields instead.
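If it helps, a quick-and-dirty way to eyeball selectivity is to check what fraction of the collection shares the most common value of each field. This is just a sketch in plain Java (nothing CQEngine-specific), and it assumes the Car class from the sketch above:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.Function;

    // The closer the result is to 1.0, the poorer the selectivity of an index on that field.
    static double largestBucketFraction(List<Car> cars, Function<Car, String> field) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        int largest = 0;
        for (Car car : cars) {
            int count = counts.merge(field.apply(car), 1, Integer::sum);
            if (count > largest) largest = count;
        }
        return cars.isEmpty() ? 0.0 : (double) largest / cars.size();
    }

    // e.g. largestBucketFraction(allCars, car -> car.manufacturer) would be ~0.5 in the example above,
    // whereas a combined manufacturer+type key would come out at ~0.25.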
So it sounds like CompoundIndex might allow you to model your isSubsetOf relationship in an indirect way?
An alternative to CompoundIndex is to define a virtual attribute which is a combination of the two fields; you can then build any type of index on top of the virtual attribute. This may allow you to model isSubsetOf relationships directly, and might be closer to your option 2.
A simple way with string fields is to concatenate them in the virtual attribute, such that the attribute returns "Ford Pickup", "Ford Estate" etc. even though no such field really exists in the objects. For non-string fields, you can define an attribute which returns "CarType" objects encapsulating the two fields manufacturer and type, and implement equals(), hashCode() (and Comparable.compareTo() for NavigableIndex) as normal on that value object. You can then query the collection by supplying a CarType object encapsulating the sought combination of those fields (this is like Hibernate's "query by example"). That approach is for advanced cases though; CompoundIndex exists to automate boilerplate like that.
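Again just a sketch reusing the assumed Car class from above, the string-concatenation flavour of a virtual attribute might look something like this:

    import static com.googlecode.cqengine.query.QueryFactory.equal;

    import com.googlecode.cqengine.attribute.Attribute;
    import com.googlecode.cqengine.attribute.SimpleAttribute;
    import com.googlecode.cqengine.index.hash.HashIndex;
    import com.googlecode.cqengine.query.option.QueryOptions;
    import com.googlecode.cqengine.resultset.ResultSet;

    // A "virtual" attribute: no such field exists on Car, the value is derived on the fly.
    public static final Attribute<Car, String> MANUFACTURER_AND_TYPE =
            new SimpleAttribute<Car, String>("manufacturerAndType") {
        public String getValue(Car car, QueryOptions queryOptions) {
            return car.manufacturer + " " + car.type; // e.g. "Ford Pickup"
        }
    };

    // Any type of index can be built on it...
    cars.addIndex(HashIndex.onAttribute(MANUFACTURER_AND_TYPE));

    // ...and it can be queried like a real field.
    ResultSet<Car> fordPickups = cars.retrieve(equal(MANUFACTURER_AND_TYPE, "Ford Pickup"));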
With both CompoundIndex and indexes built on virtual attributes, you might also wish to control the tradeoff between retrieval time and the memory required by the index. Take a look at Quantizers, which can transparently group objects into the same buckets inside indexes, trading retrieval speed for reduced overall size of the indexes.
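For example, I believe the quantizer docs show something along these lines for NavigableIndex (assuming a numeric Car.CAR_ID attribute, which isn't defined in the sketches above):

    import com.googlecode.cqengine.index.navigable.NavigableIndex;
    import com.googlecode.cqengine.quantizer.IntegerQuantizer;

    // Group ids into buckets of 5 inside the index, shrinking the index
    // at the cost of some extra filtering at query time.
    cars.addIndex(NavigableIndex.withQuantizerOnAttribute(
            IntegerQuantizer.withCompressionFactor(5), Car.CAR_ID));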
I'm not sure but I hope that helps,
Niall