How to give a corpus to the models

Narges Safi

Aug 31, 2014, 2:01:02 PM

to s-space-re...@googlegroups.com

Dear developers,

I have just started working with S-Space and I'm quite confused. I want to run the LSA model, but I don't know how to give my corpus to it. Where should I define my corpus, and how should I pass it to the code?

I know my question is very simple, but I'd be very grateful if you could help me.

Thanks, Saafi.

David Jurgens

Sep 3, 2014, 2:44:35 PM

to s-space-re...@googlegroups.com

Hi Saafi,

How is your corpus represented? Typically, we work with corpora stored in a single file where each line is treated as a separate document. If your corpus is stored this way, you can use it with LSA like this:

```java
import java.io.*;
import java.util.*;

import edu.ucla.sspace.util.*;
import edu.ucla.sspace.lsa.*;

public class Example {

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: java Example my-corpus.txt");
            return;
        }
        File corpusFile = new File(args[0]);

        // Create your LSA object
        LatentSemanticAnalysis lsa = new LatentSemanticAnalysis();

        // Read each line from the file, which we are going to consider as a
        // separate document
        for (String document : new LineReader(corpusFile)) {
            // Wrap the document as a BufferedReader. It's not a great design
            // in this case, but it's what has to be done.
            lsa.processDocument(new BufferedReader(new StringReader(document)));
        }

        // Instruct LSA to perform the SVD
        lsa.processSpace(System.getProperties());

        // Your LSA object is now ready to go!
    }
}
```
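If you would rather not depend on `edu.ucla.sspace.util.LineReader`, the same one-document-per-line loop can be written with only the JDK. This is a minimal sketch, assuming the corpus is UTF-8 text; the `DocumentProcessor` interface and the `feedDocuments` method are hypothetical names introduced here, shaped so that a method taking a `BufferedReader` (like the `lsa.processDocument` call in the example) can be plugged in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LinePerDocumentFeeder {

    /** Hypothetical callback type; any method taking a BufferedReader fits. */
    @FunctionalInterface
    public interface DocumentProcessor {
        void process(BufferedReader document) throws IOException;
    }

    /**
     * Streams the corpus file line by line (assumed UTF-8), treats each line
     * as one document, and returns how many documents were handed on.
     */
    public static int feedDocuments(Path corpus, DocumentProcessor processor)
            throws IOException {
        int count = 0;
        try (BufferedReader in = Files.newBufferedReader(corpus, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Wrap the single line so it can be consumed as a reader.
                processor.process(new BufferedReader(new StringReader(line)));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java LinePerDocumentFeeder my-corpus.txt");
            return;
        }
        // Stand-in processor: print each document's text.
        int n = feedDocuments(Paths.get(args[0]),
                doc -> System.out.println(doc.readLine()));
        System.out.println(n + " documents read");
    }
}
```

With the S-Space example, the loop could then become something like `feedDocuments(corpusFile.toPath(), lsa::processDocument);`, assuming `processDocument` keeps the `BufferedReader` parameter used there.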

Hopefully this helps clear things up, but if you run into problems, or if your corpus is in a different format, please let us know!
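If your corpus is currently stored as one document per file, here is a minimal sketch of converting it into the one-document-per-line format described above. It assumes UTF-8 text files in a single directory, and that collapsing each document's internal whitespace and line breaks into single spaces is acceptable for your model. `OneDocumentPerLine`, `toSingleLine`, and `convertDirectory` are hypothetical names, not part of S-Space.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class OneDocumentPerLine {

    /** Collapses every whitespace run (including line breaks) into one space. */
    public static String toSingleLine(String document) {
        return document.replaceAll("\\s+", " ").trim();
    }

    /**
     * Reads every regular file in inputDir as one document (sorted by name
     * for a stable order) and writes them to outputFile, one per line.
     * Returns the number of documents written.
     */
    public static int convertDirectory(Path inputDir, Path outputFile) throws IOException {
        List<Path> files = new ArrayList<>();
        try (Stream<Path> entries = Files.list(inputDir)) {
            entries.filter(Files::isRegularFile).sorted().forEach(files::add);
        }
        List<String> lines = new ArrayList<>();
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            lines.add(toSingleLine(text));
        }
        Files.write(outputFile, lines, StandardCharsets.UTF_8);
        return lines.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java OneDocumentPerLine input-dir my-corpus.txt");
            return;
        }
        int n = convertDirectory(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("wrote " + n + " documents");
    }
}
```

For example, `java OneDocumentPerLine my-docs/ my-corpus.txt` would produce a file that can be passed to the `Example` program. Sorting by file name keeps the document order reproducible; an empty input file still yields an empty line, so it is still counted as a document.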

Thanks,

David

--
You received this message because you are subscribed to the Google Groups "Semantic Space Research - Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to s-space-research...@googlegroups.com.
To post to this group, send email to s-space-re...@googlegroups.com.
Visit this group at http://groups.google.com/group/s-space-research-dev.
For more options, visit https://groups.google.com/d/optout.
