Hi Peter,
"In general, the idea [behind the VSM] is the more times a query term appears in a document relative to the number of times the term appears in all the documents in the collection, the more relevant that document is to the query." (from http://lucene.apache.org/core/3_6_0/scoring.html)
A query can be viewed as a short document itself, and Lucene finds the documents that are most similar (relevant) to it. Each document is represented as an N-dimensional vector, where N is the number of unique terms across all documents. The value in each dimension is a weight for that term: it grows with how often the term occurs in the document and shrinks with how many documents in the collection contain the term (the usual tf-idf weighting). Similarity between two documents is measured by the angle between their vectors; in practice the cosine of that angle is used, so a smaller angle (cosine closer to 1) means more similar documents. More details are in
http://en.wikipedia.org/wiki/Vector_Space_Model.
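To make the vector-space idea concrete, here is a minimal Python sketch (not Lucene's code) that builds tf-idf vectors over a toy collection and ranks documents by cosine similarity to a query. It assumes a naive lowercase/whitespace tokenizer and the plain idf = log(numDocs / docFreq); the function names (`tokenize`, `tfidf_vector`, `cosine`, `rank`) are hypothetical, chosen for illustration. Lucene's real weighting differs in the details.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase + whitespace split (Lucene uses analyzers instead).
    return text.lower().split()


def build_vocabulary(docs):
    # One vector dimension per unique term found across all documents.
    return sorted({term for doc in docs for term in tokenize(doc)})


def idf(term, docs_tokens):
    # Inverse document frequency: terms contained in fewer documents get larger weights.
    df = sum(1 for tokens in docs_tokens if term in tokens)
    return math.log(len(docs_tokens) / df) if df else 0.0


def tfidf_vector(text, vocab, docs_tokens):
    # Each dimension = (term count in this text) * idf(term); Counter yields 0 for absent terms.
    counts = Counter(tokenize(text))
    return [counts[term] * idf(term, docs_tokens) for term in vocab]


def cosine(u, v):
    # Cosine of the angle between two vectors; 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    # Treat the query as a short document and score every document against it.
    docs_tokens = [set(tokenize(d)) for d in docs]
    vocab = build_vocabulary(docs)
    q_vec = tfidf_vector(query, vocab, docs_tokens)
    scores = [(cosine(q_vec, tfidf_vector(d, vocab, docs_tokens)), i)
              for i, d in enumerate(docs)]
    # Highest cosine (smallest angle) first.
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "quick quick fox jumps"]
    for score, i in rank("quick fox", docs):
        print(f"{score:.3f}  {docs[i]}")
```

Note that a term occurring in every document gets idf = 0 here and so contributes nothing, which is the "relative to all the documents in the collection" part of the quote above.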
Full details of how Lucene computes similarity, with formulas, are in
http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Similarity.html.
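For quick reference, my summary of the "practical scoring function" on that page is below; please check the page itself for boosts and norm encoding. It departs from pure cosine similarity, for example by using a per-field length norm instead of the full document vector length. With DefaultSimilarity, roughly: tf = sqrt(freq), idf(t) = 1 + log(numDocs / (docFreq + 1)), coord = (query terms matched) / (query terms total), queryNorm = 1 / sqrt(sumOfSquaredWeights), and norm(t,d) includes a lengthNorm of 1 / sqrt(number of terms in the field).

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^{2} \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \Big)
```

idf appears squared because it weights the term once on the query side and once on the document side.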
Thanks,
--Sit