Replacing tweets with synonyms

afre

Oct 4, 2017, 11:55:08 AM

to nltk-users

Hi all,

I am new to NLP and NLTK.

I have a small set of Twitter data, and I want to enlarge it by replacing words with synonyms.

I want the generated data to stay meaningful. WordNet can be used for this, for example, but the results are not readable.

Can anybody guide me on how to deal with this situation and tell me how to do it with Python and NLTK?

Or point me towards a tutorial for beginners.

Thanks in advance for an early response.

Regards

Afre
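
For readers following the thread, the kind of WordNet-based replacement described above might look roughly like the sketch below. It is only an illustration, not the poster's code: the names `wordnet_synonyms`, `replace_with_synonyms`, `TOY_SYNONYMS` and the `rate` parameter are made up for this example. The WordNet lookup assumes NLTK is installed and the `wordnet` corpus has been downloaded (`nltk.download('wordnet')`); the small hand-written table lets the sketch run without it.

```python
import random


def wordnet_synonyms(word, pos=None):
    """Collect single-word WordNet synonyms for `word`.

    Assumes NLTK and its 'wordnet' corpus are available; the import is
    done lazily so the rest of this sketch runs without NLTK.
    """
    from nltk.corpus import wordnet

    names = set()
    for synset in wordnet.synsets(word, pos=pos):
        for lemma in synset.lemmas():
            name = lemma.name()
            # Skip multi-word lemmas (joined with "_") and the word itself.
            if "_" not in name and name.lower() != word.lower():
                names.add(name)
    return sorted(names)


# Hypothetical hand-written table, used here instead of WordNet.
TOY_SYNONYMS = {"happy": ["glad", "cheerful"], "movie": ["film"]}


def toy_lookup(word):
    return TOY_SYNONYMS.get(word.lower(), [])


def replace_with_synonyms(tokens, lookup, rate=0.3, rng=None):
    """Return a copy of `tokens` with some words swapped for synonyms.

    `lookup` maps a word to a list of candidate synonyms; `rate` is the
    probability of replacing a word that has at least one candidate.
    """
    rng = rng or random.Random()
    result = []
    for token in tokens:
        # Leave hashtags, mentions and links untouched.
        if token.startswith(("#", "@", "http")):
            result.append(token)
            continue
        candidates = lookup(token)
        if candidates and rng.random() < rate:
            result.append(rng.choice(candidates))
        else:
            result.append(token)
    return result


tweet = ["so", "happy", "about", "the", "movie", "#oscars"]
print(replace_with_synonyms(tweet, toy_lookup, rate=1.0, rng=random.Random(0)))
```

Passing `wordnet_synonyms` as `lookup` pulls lemmas from every sense of a word, which is one likely reason the output reads badly: many of those senses do not fit the context of the tweet.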

Dimitriadis, A. (Alexis)

Oct 5, 2017, 11:34:53 PM

to nltk-...@googlegroups.com

Hi Afre,

Replacing one word with another won’t give you a bigger dataset; it will give you a small dataset with manipulated variations, which will not have the variation that a real dataset would have. If you want a bigger dataset, you must get more tweets: Twitter makes it possible to download tweets for a particular tag or user.

If you have a transformation implemented and “the results are not readable”, as you write, you can post the relevant code and someone can probably help you improve it. But I advise against using the approach you describe.

Best,

Alexis


afre

Oct 8, 2017, 2:42:01 AM

to nltk-users

Hi Dr. Alexis,

Thank you for your reply.

I have tweets, but the dataset is heavily imbalanced (0.45% vs 99.55%), so I think I should oversample the minority class.

I finally got the replacement working. However, plain replacement does not give very good results.

Which oversampling technique would you suggest in this scenario, i.e. when the dataset is imbalanced and plain synonym replacement doesn't give proper results?
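
As a point of reference for the question, the simplest form of oversampling is to duplicate randomly chosen minority-class examples until the classes are closer in size. The sketch below shows that baseline only. It is not a recommendation from the thread, and the function name `random_oversample` and its parameters are hypothetical.

```python
import random


def random_oversample(examples, labels, minority_label, target_ratio=1.0, seed=0):
    """Duplicate minority-class examples, sampled with replacement.

    `target_ratio` is the desired minority/majority size ratio after
    oversampling (1.0 means fully balanced).
    """
    rng = random.Random(seed)
    minority = [x for x, y in zip(examples, labels) if y == minority_label]
    if not minority:
        raise ValueError("no examples carry the minority label")
    majority_count = sum(1 for y in labels if y != minority_label)

    # Number of extra minority copies needed to reach the target ratio.
    wanted = int(majority_count * target_ratio)
    extra = max(0, wanted - len(minority))

    sampled = [rng.choice(minority) for _ in range(extra)]
    return list(examples) + sampled, list(labels) + [minority_label] * extra


# Toy data: 1 minority tweet against 9 majority tweets.
tweets = ["t%d" % i for i in range(10)]
classes = [1] + [0] * 9
new_tweets, new_classes = random_oversample(tweets, classes, minority_label=1)
print(new_classes.count(1), new_classes.count(0))
```

Duplicates like these add no new information, which is the same limitation raised earlier in the thread about synonym replacement. Any oversampling should also be applied to the training split only, so that copies of one tweet do not end up in both training and test data.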