Running Doc2Vec on a CSV Corpus


ML.TN

May 30, 2015, 3:41:38 AM
to gen...@googlegroups.com
Hi,

I have a corpus file with two fields: sentences and their labels.
I want to run Doc2Vec directly on that file.

Can I do that?
If it's possible, I'd appreciate any kind of help.
This is my first time using a machine learning framework in Python :)

Christopher S. Corley

May 30, 2015, 11:11:01 PM
to gensim
Yup. Gensim makes it fairly easy to do that.

If I were you, I'd write a function something like this:

from gensim.models.doc2vec import LabeledSentence
import csv

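def make_labeled_csv(filename):
    # Yield one LabeledSentence per CSV row.
    with open(filename) as f:
        r = csv.reader(f)
        for row in r:
            label = row[...]       # the column that holds the label
            sentence = row[...]    # the column that holds the sentence text
            words = sentence.split()    # or any other preprocessing
            yield LabeledSentence(words, labels=[label])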

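Or, at least, that's the general idea. You can hand off the result of make_labeled_csv to Doc2Vec, but wrap it in list(...) first: a generator is exhausted after a single pass, while Doc2Vec iterates over the corpus more than once (to build the vocabulary and then to train).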

Chris.

--
You received this message because you are subscribed to the Google Groups "gensim" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gensim+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ML.TN

May 31, 2015, 3:58:51 AM
to gen...@googlegroups.com
Thank you, sir, for your help.
Your code helped me a lot in completing the rest of the training.

Pavan Kancharala

Oct 29, 2016, 11:10:16 AM
to gensim

I am new to machine learning and Python. I am working on a similar problem (two columns: text and labels). Could you please send me your code, showing how you:

imported the CSV file and converted it to labeled sentences (Doc2Vec)
cleaned the text
split the data into training and test sets
trained the model
evaluated the model

Feel free to email me at kvn...@gmail.com.

Thanks

Lev Konstantinovskiy

Oct 30, 2016, 6:24:59 AM
to gensim
Hi Pavan,

This tutorial uses doc2vec and many other techniques to achieve the same goal that you have: https://github.com/RaRe-Technologies/movie-plots-by-genre

Let me know if you have more questions,
Lev

Pavan Kancharala

Nov 1, 2016, 7:59:16 AM
to gensim
Hi Lev Konstantinovskiy,

Thank you very much for sharing the link; it solved my problem.