LSA for dissertation

max nethercott

Dec 1, 2017, 9:36:57 AM
to s-spac...@googlegroups.com, s-space-re...@googlegroups.com
Hi, I came across your project on GitHub, and it looks very interesting.

I am currently in my third year studying computer science. My project title is "Automating the semantic analysis of Social Media conversations to assist in child safeguarding".

Would it be possible to use your API to perform latent semantic analysis on text messages and alert parents to certain suspect messages?

I am finding it quite hard to get my head around how your API works, so I would love a reply with some feedback on this.


Thank you

Max

David Jurgens

Dec 1, 2017, 9:44:43 AM
to s-spac...@googlegroups.com, s-space-re...@googlegroups.com
Hi Max,

  Yup, that idea sounds pretty feasible with the package. You'll want to create a new LatentSemanticAnalysis object and run your corpus through it to get the dimensionally reduced representations of your documents. The package doesn't retain the document space by default, so when you create the LatentSemanticAnalysis object, be sure to set the retainDocumentSpace argument to true. If you have new documents that you want to project into the reduced space, there's a project() function that will do that.
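To make that workflow concrete, here is a minimal pure-Python sketch of what LSA computes. It is not the package's Java API: the names `build_matrix`, `jacobi_eigen`, `lsa`, `project`, and `cosine` are hypothetical. It also makes simplifying assumptions: raw term counts instead of the term weighting (such as log-entropy or TF-IDF) that LSA setups usually apply, whitespace tokenization, and an SVD obtained from a dense eigen-decomposition of AᵀA, which is only practical for toy corpora. `doc_space` plays the role of the retained document space (rows of V_k Σ_k), and `project()` folds a new message in as U_kᵀ q, which is directly comparable to those rows.

```python
import math

# Hypothetical helper names throughout: a from-scratch sketch of LSA,
# not the S-Space Java API.


def build_matrix(docs):
    """Term-by-document count matrix A: rows are terms, columns are documents."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in doc.lower().split():
            A[index[w]][j] += 1.0
    return A, index


def jacobi_eigen(S, max_sweeps=100, tol=1e-20):
    """Eigenvalues and eigenvectors (columns of V) of a small symmetric matrix."""
    n = len(S)
    a = [row[:] for row in S]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(n):  # A <- A J (columns p and q)
                    arp, arq = a[r][p], a[r][q]
                    a[r][p], a[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(n):  # A <- J^T A (rows p and q)
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r], a[q][r] = c * apr - s * aqr, s * apr + c * aqr
                for r in range(n):  # V <- V J accumulates the eigenvectors
                    vrp, vrq = v[r][p], v[r][q]
                    v[r][p], v[r][q] = c * vrp - s * vrq, s * vrp + c * vrq
    return [a[i][i] for i in range(n)], v


def lsa(docs, k):
    """Rank-k LSA. Returns (index, U_k, doc_space); doc_space[j] = row j of V_k Sigma_k."""
    A, index = build_matrix(docs)
    n_terms, n_docs = len(A), len(docs)
    # Gram matrix A^T A: its eigenvectors are the right singular vectors of A.
    G = [[sum(A[t][i] * A[t][j] for t in range(n_terms)) for j in range(n_docs)]
         for i in range(n_docs)]
    evals, evecs = jacobi_eigen(G)
    top = sorted(range(n_docs), key=lambda i: evals[i], reverse=True)[:k]
    sigmas = [math.sqrt(max(evals[i], 0.0)) for i in top]
    if k < 1 or len(top) < k or min(sigmas) < 1e-9:
        raise ValueError("k exceeds the rank of the term-document matrix")
    # Left singular vectors: u_i = A v_i / sigma_i.
    U = [[sum(A[t][d] * evecs[d][i] for d in range(n_docs)) / s
          for i, s in zip(top, sigmas)] for t in range(n_terms)]
    # The retained document space: one k-dimensional row per training document.
    doc_space = [[evecs[j][i] * s for i, s in zip(top, sigmas)] for j in range(n_docs)]
    return index, U, doc_space


def project(text, index, U):
    """Fold a new document into the reduced space as U_k^T q (unseen words ignored)."""
    q = [0.0] * len(U)
    for w in text.lower().split():
        if w in index:
            q[index[w]] += 1.0
    return [sum(U[t][i] * q[t] for t in range(len(U))) for i in range(len(U[0]))]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0
```

For instance, after `index, U, doc_space = lsa(docs, k=2)`, calling `project(new_message, index, U)` and comparing the result with each row of `doc_space` via `cosine` ranks the stored messages by similarity in the latent space. Projecting a training document reproduces its own row of `doc_space`, because U_kᵀA = Σ_k V_kᵀ.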

  All of this said, LSA is old technology, and you might want to try something more modern. Latent Dirichlet Allocation (LDA) could work well (Python's gensim and Java's MALLET packages both implement it). Another simple approach would be to grab some off-the-shelf word vectors (e.g., word2vec or GloVe) and treat each message as the sum of its words' vectors. That summation works fairly well as a document representation, especially for short documents like the ones you'll see in social media. The large pretrained word2vec vectors released by Google also include many spelling variants (e.g., "cool", "cooool"), so they are robust to some of the variation you'll see in social media.
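The sum-of-word-vectors idea can be sketched in a few lines of pure Python. The vectors in `TOY_EMBEDDINGS` are made-up 3-dimensional values for illustration only, not real word2vec or GloVe entries, and `tokenize`, `message_vector`, and `cosine` are hypothetical helper names. In practice you would replace the toy dictionary with pretrained vectors, for example loaded through gensim's `KeyedVectors.load_word2vec_format`.

```python
import math
import re

# Hypothetical 3-dimensional toy vectors for illustration only; real word2vec
# or GloVe vectors have hundreds of dimensions and come from a pretrained file.
TOY_EMBEDDINGS = {
    "secret": [0.9, 0.1, 0.0],
    "parents": [0.7, 0.2, 0.1],
    "tell": [0.6, 0.3, 0.0],
    "homework": [0.0, 0.9, 0.3],
    "school": [0.1, 0.8, 0.4],
    "cool": [0.2, 0.1, 0.9],
    "cooool": [0.25, 0.1, 0.85],
}


def tokenize(text):
    """Lowercase and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def message_vector(text, embeddings):
    """Represent a message as the sum of the vectors of its known words."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for word in tokenize(text):
        vec = embeddings.get(word)
        if vec is None:  # out-of-vocabulary words are skipped
            continue
        total = [t + x for t, x in zip(total, vec)]
    return total


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0
```

One way to turn this into alerts would be to compare each incoming message's vector against vectors of example messages you already consider worrying and flag it when the cosine similarity passes a threshold; both the examples and the threshold are assumptions you would have to choose and validate on your own data.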

  Hope this helps!

  David



You received this message because you are subscribed to the Google Groups "S-Space Package Users" group.
