Replacing tweets with synonyms

afre

Oct 4, 2017, 11:55:08 AM
to nltk-users
Hi all, 
I am new to NLP and NLTK.
I have a small set of Twitter data and want to enlarge it by replacing words with synonyms.
I want the generated tweets to stay meaningful. WordNet can supply synonyms, but the results are not readable.
Can anybody advise me on how to handle this and how to do it with Python and NLTK?
Or point me to a tutorial for beginners?
Thanks in advance for an early response.
Regards
Afre 
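
For readers following the thread, here is a minimal sketch of the kind of synonym replacement described above. It is an illustration, not anyone's posted code. The names `augment`, `toy_lookup`, `toy_synonyms` and `max_swaps` are made up for this example. `wordnet_synonyms` uses NLTK's `wordnet.synsets()` and `lemma_names()` and assumes the WordNet corpus has been downloaded (for example with `nltk.download("wordnet")`); the demonstration at the bottom uses a small hand-written dictionary instead, so it runs without NLTK.

```python
import random


def wordnet_synonyms(word):
    """Single-word WordNet synonyms of `word` (needs nltk and its 'wordnet' corpus)."""
    from nltk.corpus import wordnet  # imported lazily so the rest runs without NLTK

    names = []
    for synset in wordnet.synsets(word):
        for name in synset.lemma_names():
            # Skip multi-word lemmas (joined with "_") and the word itself.
            if "_" not in name and name.lower() != word.lower() and name not in names:
                names.append(name)
    return names


def augment(tweet, lookup, max_swaps=1, rng=None):
    """Return a copy of `tweet` with up to `max_swaps` words replaced by synonyms.

    `lookup` is any function mapping a word to a list of synonyms
    (hypothetical interface, chosen for this sketch).
    """
    if rng is None:
        rng = random.Random()
    tokens = tweet.split()
    # Only plain alphabetic tokens are candidates, so hashtags, @mentions
    # and URLs are left untouched.
    candidates = [i for i, tok in enumerate(tokens) if tok.isalpha() and lookup(tok)]
    rng.shuffle(candidates)
    for i in candidates[:max_swaps]:
        tokens[i] = rng.choice(lookup(tokens[i]))
    return " ".join(tokens)


# Toy synonym table (an assumption for the demo, not WordNet output).
toy_synonyms = {"happy": ["glad", "cheerful"], "movie": ["film"]}


def toy_lookup(word):
    return toy_synonyms.get(word.lower(), [])


example = augment("so happy about this movie #friday", toy_lookup,
                  max_swaps=2, rng=random.Random(0))
print(example)
```

To try it with WordNet, pass `wordnet_synonyms` as `lookup`. One likely reason for unreadable results is that WordNet returns synonyms for every sense of a word, and this sketch makes no attempt to pick the sense that fits the tweet; limiting the number of swaps per tweet only reduces the damage.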

Dimitriadis, A. (Alexis)

Oct 5, 2017, 11:34:53 PM
to nltk-...@googlegroups.com
Hi Afre,

Replacing one word with another won’t give you a bigger dataset; it will give you the same small dataset plus manipulated variants of it. It will not have the variation that a real dataset would have. If you want a bigger dataset, you must get more tweets: Twitter makes it possible to download tweets for a particular tag or user.

If you have a transformation implemented and “the results are not readable”, as you write, you can post the relevant code and someone can probably help you improve it. But I advise against using the approach you describe.

Best,

Alexis


Dr. Alexis Dimitriadis | Assistant Professor and Senior Research Fellow | Utrecht Institute of Linguistics OTS | Utrecht University | Trans 10, 3512 JK Utrecht, room 2.33 | +31 30 253 65 68 | a.dimi...@uu.nl | www.hum.uu.nl/medewerkers/a.dimitriadis

afre

Oct 8, 2017, 2:42:01 AM
to nltk-users
Hi Dr. Alexis, 
Thank you for your reply. 
I have tweets, but the dataset is severely imbalanced (0.45% vs 99.55%), so I think I should oversample the minority class.
I eventually got the replacement working. However, plain replacement does not work very well.
Which oversampling technique would you suggest in this scenario, i.e. when the dataset is imbalanced and plain synonym replacement doesn't give good results?

Regards
Afre
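
The thread has no answer to this question. As a point of reference only, the simplest baseline is random oversampling: duplicating minority-class examples until the classes are the same size. The sketch below uses only the standard library; `random_oversample` and the sample data are hypothetical names and values made up for this illustration, and it is not a recommendation from anyone in the thread.

```python
import random
from collections import Counter


def random_oversample(texts, labels, rng=None):
    """Duplicate examples of the smaller classes until every class
    has as many examples as the largest one."""
    if rng is None:
        rng = random.Random()
    # Group the texts by their class label.
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Draw the missing examples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Made-up data: four majority-class tweets (label 0), one minority tweet (label 1).
texts = ["common a", "common b", "common c", "common d", "rare"]
labels = [0, 0, 0, 0, 1]
balanced_texts, balanced_labels = random_oversample(texts, labels, rng=random.Random(0))
print(Counter(balanced_labels))
```

Duplicates add no new information, which is the same objection raised earlier in the thread about synonym variants. If this is used at all, it is usually applied to the training split only, so that copies of one tweet do not end up in both the training and the evaluation data.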