superfluous letter in freq distrib list - nltk

I Ozkaragoz

Jul 9, 2016, 11:13:55 PM
to nltk-...@googlegroups.com
I downloaded the latest NLTK release from the website to try some of the exercises on p. 17 of the original NLTK book. I am using Python 2.7.10 with IDLE.
Can someone please tell me why I am getting the "u" in front of each item in the frequency distribution output below? I tried both text1 and text2, which I downloaded as part of NLTK. Is there a way to eliminate them? Is it my Python version? Everything else seems to work fine.
Thanks very much.

>>> fdist1= FreqDist(text2)
>>> fdist1
FreqDist({u',': 9397, u'to': 4063, u'.': 3975, u'the': 3861, u'of': 3565, u'and': 3350, u'her': 2436, u'a': 2043, u'I': 2004, u'in': 1904, ...})
>>> vocab1=fdist1.keys()
>>> vocab1[:50]
[u'succour', u'four', u'woods', u'hanging', u'woody', u'conjure', u'looking', u'eligible', u'scold', u'unsuitableness', u'meadows', u'stipulate', u'leisurely', u'bringing', u'disturb', u'internally', u'hostess', u'mohrs', u'persisted', u'Does', u'succession', u'tired', u'cordially', u'pulse', u'elegant', u'second', u'sooth', u'shrugging', u'abundantly', u'errors', u'forgetting', u'contributed', u'fingers', u'increasing', u'exclamations', u'hero', u'leaning', u'Truth', u'here', u'china', u'hers', u'natured', u'substance', u'unwillingness', u'pretensions', u'reports', u'NOT', u'NOW', u'projection', u'sweetest']
>>>

Alex Rudnick

Jul 10, 2016, 2:38:00 AM
to nltk-...@googlegroups.com
Hey there,

The u in front of the strings means that those are Unicode strings in
Python 2 (as opposed to regular strings, which are sequences of bytes
rather than sequences of Unicode characters). So everything's working
exactly as expected :)

If you were to use Python 3 instead, the handling of Unicode strings
might be a bit more intuitive.

For more background: https://docs.python.org/2/howto/unicode.html
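
To make the point above concrete, here is a minimal sketch (not from the original posts) showing that the u prefix belongs to how Python 2 displays a Unicode string, not to the string itself. The short `words` list is a hypothetical stand-in for a few of the FreqDist keys; the code runs on Python 2.7 and on Python 3.3+.

```python
# Hypothetical stand-in for a few FreqDist keys.
# u'' literals are accepted by Python 2 and by Python 3.3+.
words = [u'succour', u'four', u'woods']

# The prefix is not stored in the string: lengths count only the letters.
lengths = [len(w) for w in words]

# Echoing a list at the prompt shows each item's repr(), which in Python 2
# includes the u prefix. Printing the text itself shows no prefix.
joined = u' '.join(words)
print(joined)
```

So in Python 2 the prefix only appears when the interpreter shows a representation of the object, for example when a list is echoed at the >>> prompt.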

--
-- alexr

Steven Bird

Jul 10, 2016, 3:47:30 AM
to nltk-...@googlegroups.com
Note that the online version of the book is updated for Python 3:

I Ozkaragoz

Jul 10, 2016, 4:51:48 AM
to nltk-...@googlegroups.com
Thank you very much, Alex and Steven, for your replies. Glad to hear that it's "normal" for my version.
Best, Inci

Alexis

Jul 13, 2016, 5:42:59 AM
to nltk-...@googlegroups.com
Alex and Steven have given the narrow answer to your question, but let me follow up with a suggestion: upgrade to Python 3 immediately. Why are you using Python 2? There is no reason to be learning an outdated and inferior version of Python. You don’t need that kind of distraction.
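
One thing to expect after upgrading, sketched here under some assumptions: in Python 3, `keys()` returns a view that cannot be sliced, so `vocab1[:50]` from the session above would need a `list(...)` around it. The example uses `collections.Counter` as a stand-in for `FreqDist` (NLTK 3's `FreqDist` is a `Counter` subclass), and the token list is made up for illustration.

```python
from collections import Counter

# Made-up tokens standing in for text2; Counter stands in for FreqDist.
tokens = "the cat and the dog and the bird".split()
fdist = Counter(tokens)

# In Python 3, keys() is a view, so convert it to a list before slicing.
vocab = list(fdist.keys())
first_two = vocab[:2]

# most_common(n) returns the n most frequent items with their counts.
top = fdist.most_common(2)
print(top)
```

The strings come back without any prefix in Python 3, because all `str` objects there are Unicode.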

Alexis

I Ozkaragoz

Jul 13, 2016, 10:35:23 PM
to nltk-...@googlegroups.com
Yes, thank you, Alexis. I will do that soon...