My guess is that OrientDB indexes in Lucene whatever data lands on a given node in the cluster. If every node is a complete replica of all the data, then the local Lucene index should have everything you need. If, however, the data is sharded across multiple machines in the cluster (as Solr does), then your local Lucene index will contain only part of the data. I haven't really studied the OrientDB distributed data model, so I can't say for sure, but based on what I read yesterday about
Hazelcast (the technology that underlies OrientDB's multi-master distributed database capability, and excellent technology in its own right), I would guess that each node is a complete replica. Again, though, please confirm this for yourself.
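
To make the difference concrete, here is a toy Python sketch of the two cases. It is only an illustration of the reasoning above, not OrientDB, Lucene, or Hazelcast code: the `Node` class, the `replicate` and `shard` helpers, the round-robin placement, and the sample documents are all made up for the example.

```python
from collections import defaultdict


class Node:
    """A toy cluster node that indexes only the documents stored on it."""

    def __init__(self, name):
        self.name = name
        self.index = defaultdict(set)  # term -> ids of locally stored documents

    def store(self, doc_id, text):
        # Index each whitespace-separated term of the document locally.
        for term in text.lower().split():
            self.index[term].add(doc_id)

    def search(self, term):
        # A local search can only see what this node has indexed.
        return sorted(self.index.get(term.lower(), set()))


def replicate(nodes, docs):
    """Full replication: every node stores (and indexes) every document."""
    for node in nodes:
        for doc_id, text in docs.items():
            node.store(doc_id, text)


def shard(nodes, docs):
    """Sharding: each document lives on exactly one node (round-robin, an assumption)."""
    for i, (doc_id, text) in enumerate(sorted(docs.items())):
        nodes[i % len(nodes)].store(doc_id, text)


docs = {
    1: "graph database",
    2: "document database",
    3: "lucene index",
    4: "distributed database",
}

replicated = [Node("a"), Node("b")]
replicate(replicated, docs)

sharded = [Node("a"), Node("b")]
shard(sharded, docs)

# Searching a single node in each setup.
replicated_hits = replicated[0].search("database")
sharded_hits = sharded[0].search("database")

# In the sharded case, a complete answer requires merging every node's results.
merged_hits = sorted(set().union(*(n.index.get("database", set()) for n in sharded)))

print("replicated, node a only:", replicated_hits)
print("sharded, node a only:   ", sharded_hits)
print("sharded, all nodes:     ", merged_hits)
```

With full replication, querying any one node returns every match. With sharding, a single node returns only the matches it happens to hold, so the query has to fan out to all nodes and merge the results. Which of the two cases applies to OrientDB is exactly the thing to confirm.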