How is your corpus represented? Typically, we work with corpora stored in a single file, where each line of the file is treated as a separate document. If your corpus is stored this way, you can use it with LSA as follows:
import java.io.*;
import java.util.*;

import edu.ucla.sspace.util.*;
import edu.ucla.sspace.lsa.*;

public class Example {

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: java Example my-corpus.txt");
            return;
        }
        File corpusFile = new File(args[0]);

        // Create your LSA object
        LatentSemanticAnalysis lsa = new LatentSemanticAnalysis();

        // Read each line from the file, which we are going to consider as a
        // separate document
        for (String document : new LineReader(corpusFile)) {
            // Wrap the document as a BufferedReader. It's not a great design
            // in this case, but it's what has to be done.
            lsa.processDocument(new BufferedReader(new StringReader(document)));
        }

        // Instruct LSA to perform the SVD
        lsa.processSpace(System.getProperties());

        // Your LSA object is now ready to go!
    }
}
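If your corpus is instead a directory with one document per file, one option is to flatten it into the one-document-per-line format first. Here is a minimal sketch using only the JDK. CorpusFlattener and its method names are hypothetical and not part of the S-Space package, and the sketch assumes UTF-8 text files sitting directly in the input directory:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of S-Space): turns a directory of text
// files into a single corpus file with one document per line.
public class CorpusFlattener {

    // Collapse every run of whitespace, including line breaks, into a
    // single space so the whole document fits on one line.
    public static String flattenDocument(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    // Write each regular file directly inside inputDir as one line of
    // outputFile. Files are processed in sorted path order, and files
    // that contain only whitespace are skipped. Returns the number of
    // documents written.
    public static int writeCorpus(Path inputDir, Path outputFile)
            throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(inputDir)) {
            files = entries.filter(Files::isRegularFile)
                           .sorted()
                           .collect(Collectors.toList());
        }
        int written = 0;
        try (BufferedWriter out =
                 Files.newBufferedWriter(outputFile, StandardCharsets.UTF_8)) {
            for (Path file : files) {
                // Assumption: every input file is UTF-8 encoded text.
                String text = new String(Files.readAllBytes(file),
                                         StandardCharsets.UTF_8);
                String line = flattenDocument(text);
                if (line.isEmpty()) {
                    continue;
                }
                out.write(line);
                out.newLine();
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println(
                "usage: java CorpusFlattener input-dir my-corpus.txt");
            return;
        }
        int count = writeCorpus(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Wrote " + count + " documents");
    }
}
```

The resulting file can then be passed to the Example program above. Write the output file somewhere outside the input directory so that a rerun does not pick up the previous output as a document.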
Hopefully this helps clear things up, but if you run into problems, or if your corpus is in a different format, please let us know!