Exactly. LSA and LRA implement the transitive property of semantics. They allow you to infer relations like rain-wet from observed pairs like rain-water and water-wet. This would allow a compressor to predict "wet" in context "rain" even though the pair was not previously observed.
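Here is a toy sketch in Python (my own example; the tiny corpus and the rank-2 truncation are made up for illustration) of how a truncated SVD fills in the unobserved rain-wet association from the observed rain-water and water-wet pairs:

import numpy as np

terms = ["rain", "water", "wet", "sun", "dry"]
docs = ["rain water", "water wet", "sun dry"]    # tiny corpus, one document per string

# Term-document count matrix
X = np.zeros((len(terms), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        X[terms.index(word), j] += 1

# Observed association: rain and wet never appear in the same document
assoc = X @ X.T
print("observed rain-wet:", assoc[terms.index("rain"), terms.index("wet")])   # 0.0

# Keep the top k=2 latent dimensions and reconstruct
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Reconstructed association: rain-wet is now positive because both words
# load on the same latent dimension as "water"
assoc_k = Xk @ Xk.T
print("inferred rain-wet:", round(assoc_k[terms.index("rain"), terms.index("wet")], 3))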
In paq8hp12 (and probably durilca and decmprs8), words like "rain", "wet", and "water" would be grouped during dictionary preprocessing so that when they are assigned a 2-byte dictionary symbol, one byte is the same. This improves compression because it allows models that ignore the other context byte.
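A minimal sketch of the grouping idea (this is not the actual paq8hp12 dictionary format; the groups here are hand-picked rather than learned):

groups = [
    ["rain", "wet", "water", "snow"],    # weather/moisture group
    ["run", "walk", "jump", "crawl"],    # motion group
]

code = {}
for g, words in enumerate(groups):
    for i, w in enumerate(words):
        code[w] = bytes([0x80 + g, 0x80 + i])    # (group byte, member byte)

# "rain" and "water" share their first byte, so a model that conditions only
# on that byte treats them as the same context
print(code["rain"], code["water"])
print(code["rain"][0] == code["water"][0])       # True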
LSA could be used to automatically construct such dictionaries. Each word is assigned a context vector. Then we find a short path through this space and output the words in that order. For an ordinary word-word association matrix, the context vector would have one component for each other word in the vocabulary. These vectors could be reduced to a few hundred dimensions by LSA.
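One simple way to find such a path, assuming the reduced context vectors from LSA are already available, is a greedy nearest-neighbor walk (other orderings would work too; this is just an illustration):

import numpy as np

def linear_order(words, vectors):
    # Greedy walk: always step to the nearest unvisited word
    remaining = set(range(len(words)))
    order = [0]                       # start arbitrarily at the first word
    remaining.remove(0)
    while remaining:
        last = vectors[order[-1]]
        nxt = min(remaining, key=lambda i: np.linalg.norm(vectors[i] - last))
        order.append(nxt)
        remaining.remove(nxt)
    return [words[i] for i in order]

# vectors would come from the truncated SVD, e.g. U[:, :k] * s[:k]
words = ["rain", "water", "wet", "sun", "dry"]
vecs = np.array([[1.0, 0.1], [0.9, 0.2], [0.8, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(linear_order(words, vecs))      # related words come out adjacent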
In Gorrell's online LSA implementation, a 3-layer neural network is trained by backpropagation to predict nearby words in a text corpus. There is one input and one output neuron for each word in the vocabulary. Neurons are gradually added to the hidden layer, up to a few hundred. The context vectors are then given by the weights between the input and hidden layers.
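A simplified sketch of that kind of network (not Gorrell's actual algorithm; the corpus, learning rate, and growth schedule are arbitrary, and a real vocabulary would be much larger):

import numpy as np

rng = np.random.default_rng(0)
corpus = "rain water wet water rain wet sun dry sun dry".split()
vocab = sorted(set(corpus))
V = len(vocab)
idx = {w: i for i, w in enumerate(vocab)}

max_hidden, lr = 8, 0.1
W1 = 0.01 * rng.standard_normal((V, max_hidden))    # input -> hidden weights
W2 = 0.01 * rng.standard_normal((max_hidden, V))    # hidden -> output weights
hidden = 1                                           # start with one hidden neuron

for epoch in range(200):
    if epoch % 50 == 49 and hidden < max_hidden:
        hidden += 1                                  # gradually grow the hidden layer
    for t in range(len(corpus) - 1):
        x, y = idx[corpus[t]], idx[corpus[t + 1]]    # predict the next word
        h = np.tanh(W1[x, :hidden])                  # hidden activations (one-hot input)
        z = W2[:hidden].T @ h
        p = np.exp(z - z.max()); p /= p.sum()        # softmax over the vocabulary
        grad_z = p.copy(); grad_z[y] -= 1.0          # cross-entropy gradient
        grad_h = W2[:hidden] @ grad_z * (1 - h * h)
        W2[:hidden] -= lr * np.outer(h, grad_z)
        W1[x, :hidden] -= lr * grad_h

# The context vector for each word is its row of input-to-hidden weights
for w in vocab:
    print(w, np.round(W1[idx[w], :hidden], 2))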
I think you might get better compression by predicting words directly rather than going through an offline dictionary. Constructing a linear dictionary loses information about the original distances between words. It would probably be slower, though, because training the network requires a lot of computation that would otherwise be done outside the decompressor's model.
-- Matt Mahoney,
matma...@yahoo.com