Compare 2 strings token by token.


Vivek Panwar

Oct 4, 2016, 7:41:21 AM
to nltk-users
Hello,

I want to compare the two strings below token by token in Python. One is the ORG string and the other is the OUT string. I want to check how many words in the OUT string match and how many extra words were inserted in OUT; for example, "Canteen" becomes "wants en" after typing.
What is the best method to convert the strings into tokens and then compare them token by token?

ORG: Steal from office Canteen and Eat that in Dinner

OUT: Steel from office wants en and Eat that night Dinner

Thanks
Vivek

Constantin Orăsan

Oct 4, 2016, 10:19:58 AM
to nltk-users
Hello,

If I understand correctly, you want a kind of edit distance measure. You can easily adapt the code from https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python so that it compares tokens instead of characters. There is no need to use NLTK, unless you want to use the tokeniser from NLTK (see http://www.nltk.org/book/ch03.html for how to tokenise texts).
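
As an illustration of that suggestion, here is a minimal sketch of a Levenshtein alignment run over tokens rather than characters. It is not the Wikibooks code itself; the function name align_tokens, the whitespace tokenisation via str.split(), and the case-sensitive comparison are all assumptions made for the example. Besides the distance, it counts how many tokens matched, were substituted, inserted, or deleted.

```python
def align_tokens(org_tokens, out_tokens):
    """Token-level Levenshtein alignment (hypothetical helper).

    Returns (distance, counts), where counts tallies the edit
    operations on one optimal path from org_tokens to out_tokens.
    """
    n, m = len(org_tokens), len(out_tokens)

    # dist[i][j] = edits needed to turn org_tokens[:i] into out_tokens[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if org_tokens[i - 1] == out_tokens[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # token deleted from ORG
                dist[i][j - 1] + 1,         # token inserted in OUT
                dist[i - 1][j - 1] + cost,  # match or substitution
            )

    # Walk back from the bottom-right corner to count the operations.
    counts = {"match": 0, "substitute": 0, "insert": 0, "delete": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if org_tokens[i - 1] == out_tokens[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                counts["match" if cost == 0 else "substitute"] += 1
                i, j = i - 1, j - 1
                continue
        if j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            counts["insert"] += 1
            j -= 1
        else:
            counts["delete"] += 1
            i -= 1
    return dist[n][m], counts


# Assumption: plain whitespace splitting stands in for a real tokeniser.
org = "Steal from office Canteen and Eat that in Dinner".split()
out = "Steel from office wants en and Eat that night Dinner".split()

distance, counts = align_tokens(org, out)
print(distance)  # 4
print(counts)    # {'match': 6, 'substitute': 3, 'insert': 1, 'delete': 0}
```

For the two example strings this reports six matching words, three substituted words ("Steal", "Canteen", "in") and one inserted word. Lower-casing the tokens before comparing would be a reasonable variation if case differences should not count as errors.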

Hope it helps,

Constantin


Dimitriadis, A. (Alexis)

Oct 4, 2016, 10:38:02 AM
to nltk-...@googlegroups.com
Google “nltk edit distance”. Sure, you can implement it yourself, but NLTK already provides a Levenshtein distance function.

Alexis

Dr. Alexis Dimitriadis | Assistant Professor and Senior Research Fellow | Utrecht Institute of Linguistics OTS | Utrecht University | Trans 10, 3512 JK Utrecht, room 2.33 | +31 30 253 65 68 | a.dimi...@uu.nl | www.hum.uu.nl/medewerkers/a.dimitriadis