How to give corpus to the models

Narges Safi

Aug 31, 2014, 2:01:02 PM
to s-space-re...@googlegroups.com
Dear developers,
I have begun working with S-Space for the first time and I'm very confused. I want to run the LSA model, but I don't know how to give my corpus to it. Where should I define my corpus, and how should I pass it to the code?
I know that my question is very simple, but I'd be very grateful if you could help me.

Thanks, Saafi.

David Jurgens

Sep 3, 2014, 2:44:35 PM
to s-space-re...@googlegroups.com
Hi Saafi,

  How is your corpus represented?  Typically, we work with corpora that are in a file where each line in the file is considered a separate document.  If your corpus is stored this way, you can use it with LSA like the following:

import java.io.*;
import java.util.*;
import edu.ucla.sspace.util.*;
import edu.ucla.sspace.lsa.*;

public class Example {

    public static void main(String[] args) throws Exception {

        if (args.length != 1) {
            System.out.println("usage: java Example my-corpus.txt");
            return;
        }

        File corpusFile = new File(args[0]);

        // Create your LSA object
        LatentSemanticAnalysis lsa = new LatentSemanticAnalysis();

        // Read each line from the file, which we are going to consider as a
        // separate document
        for (String document : new LineReader(corpusFile)) {
            // Wrap the document as a BufferedReader.  It's not a great design
            // in this case, but it's what has to be done.
            lsa.processDocument(new BufferedReader(new StringReader(document)));
        }

        // Instruct LSA to perform the SVD
        lsa.processSpace(System.getProperties());

        // Your LSA object is now ready to go!
    }
}

Hopefully this helps clear things up, but if you run into problems, or if your corpus is in a different format, please let us know!

  Thanks,
  David
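
For a corpus that is instead stored as a directory of plain-text files, one file per document, the following is a minimal sketch of how it might be converted into the one-document-per-line format described above. The class name CorpusFlattener and its methods are hypothetical and not part of S-Space; the sketch uses only the Java standard library and assumes the files are UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of S-Space): turns a directory of text files
// into a single corpus file with one document per line.
public class CorpusFlattener {

    // Collapse every run of whitespace, including line breaks, into a single
    // space so that a whole document fits on one line.
    static String flatten(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Write each regular file in inputDir as one line of outputFile and
    // return the number of documents written.  Empty documents are skipped.
    static int writeCorpus(Path inputDir, Path outputFile) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.list(inputDir)) {
            // Sort so that the document order is reproducible across runs
            files = paths.filter(Files::isRegularFile)
                         .sorted()
                         .collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path file : files) {
            String raw = new String(Files.readAllBytes(file),
                                    StandardCharsets.UTF_8);
            String document = flatten(raw);
            if (!document.isEmpty()) {
                lines.add(document);
            }
        }
        Files.write(outputFile, lines, StandardCharsets.UTF_8);
        return lines.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println(
                "usage: java CorpusFlattener input-dir my-corpus.txt");
            return;
        }
        int count = writeCorpus(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("wrote " + count + " documents");
    }
}
```

The output file should be written outside the input directory, since otherwise a later run would read the previous output as if it were a document. The resulting file can then be passed as my-corpus.txt to the Example program above.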
