''' Read this text file and return the top three most occurring words '''
inFile = r'E:\ProfessionalDevelopment\python\Introduction to Python Scripting in Maya\week4\EinsteinCredo.txt'
wordList=[]
occurences=[]
with open(inFile, 'r') as fin:
    # removes the punctuation and splits the words into a list
    for line in fin:
        punct = ["'","?",".","!","?",",","\r\n","-"]
        for p in punct:
            line = line.replace(p,"").upper()
        line = line.split()
        for word in line:
            wordList.append(word)
# make a word count list
for x in wordList:
    occurences.append(wordList.count(x))
# make a dictionary of both the wordList and occurences
wordFrequencey = dict(zip(wordList,occurences))
# find the top three most occurring words
order = list(set(sorted(wordFrequencey.values())))
topThree = order[-3:]
# print the results
for k, v in wordFrequencey.items():
    if v in topThree:
        print 'the word " %s " occurred %s times' % (k,v)
I am trying to list the top 3 most frequently occurring words in a text document. I have managed to distill the counts down to the top three: [13, 22, 24]. But for some reason my final print statement gives me four words, and not even in numerical order: [22, 22, 24, 13]. Could someone show me why this is happening? I have attached the text file that I am reading from, EinsteinCredo.txt.
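A likely explanation, sketched here with made-up counts rather than the real EinsteinCredo.txt: topThree holds the three highest distinct count values, not three words, so every word whose count equals one of those values is printed. If two words both occur 22 times, four words pass the `v in topThree` test. The printing order follows the dictionary's iteration order, which is unrelated to the counts. The words and numbers below are hypothetical, the dictionary is spelled wordFrequency here, and the print uses the function-call form so it runs on Python 2 and 3 alike.

```python
# Hypothetical counts standing in for the real text file.
wordFrequency = {'OF': 22, 'THE': 22, 'AND': 24, 'I': 13, 'TO': 9}

# Sort *after* removing duplicates; a set has no guaranteed order,
# so list(set(sorted(...))) is not reliably sorted.
order = sorted(set(wordFrequency.values()))
topThree = order[-3:]  # the three highest distinct counts

# Every word whose count is one of those values passes the test,
# so a tie at 22 yields four words instead of three.
matches = [(k, v) for k, v in wordFrequency.items() if v in topThree]
for k, v in matches:
    print('the word "%s" occurred %s times' % (k, v))
```

Picking the top three words, rather than the top three counts, avoids the extra match.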
--
You received this message because you are subscribed to the Google Groups "Python Programming for Autodesk Maya" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python_inside_m...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/python_inside_maya/0655c287-af16-4657-a808-e77d56d65ed3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
"It would be better to build up a dictionary directly within that word loop. That way you have a unique mapping of words to their occurrences. Then you can use sorted(words, key=words.get, reverse=True) to sort the words by their counts, highest first. Slicing off the first three items of the resulting list gives you the top three words, and you can look up each count in the dictionary. You will no longer have issues with managing separate key/value lists. Let me know if you want an example."
Thanks for your help Justin. I would like an example.
occurences = {}
punct = set(["'", "?", ".", "!", ",", "\r\n", "-"])
with open(inFile, 'r') as fin:
    for line in fin:
        # strip the punctuation from the line
        for p in punct:
            line = line.replace(p, "")
        # count each word, creating its entry the first time it is seen
        for word in line.upper().split():
            try:
                occurences[word] += 1
            except KeyError:
                occurences[word] = 1
# sort the words by their counts, highest first
ordered = sorted(occurences, key=occurences.get, reverse=True)
topThree = ordered[:3]
for k in topThree:
    v = occurences[k]
    print 'the word " %s " occurred %s times' % (k,v)
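As an alternative to counting by hand, the standard library's collections.Counter keeps the same kind of tally, and its most_common method returns the highest counts directly. This is a sketch of a swapped-in technique, not what was posted above; the helper name top_words, the PUNCT list, and the sample lines are made up for illustration.

```python
from collections import Counter

# Same punctuation characters as in the example above.
PUNCT = ["'", "?", ".", "!", ",", "-"]

def top_words(lines, n=3):
    """Return the n most frequent words in lines as (word, count) pairs."""
    counts = Counter()
    for line in lines:
        for p in PUNCT:
            line = line.replace(p, "")
        # update() adds one to the count of every word in the list
        counts.update(line.upper().split())
    return counts.most_common(n)

# Hypothetical sample text in place of the real file.
sample = ["The cat and the dog,", "and the bird. The end!", "And so on?"]
for word, count in top_words(sample):
    print('the word "%s" occurred %s times' % (word, count))
```

With a real file, the open file object can be passed in place of the sample list, since iterating over it yields one line at a time.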
Regarding the zipping of two lists to make a dictionary: I haven't noticed any disassociation of keys and values in the process. The list did shrink, but only because building the dictionary removes duplicate keys. So instead of the word AND appearing 24 times, it appears only once in the zipped dictionary, like this: {'AND': 24}
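To illustrate the point about zipping with a made-up word list: a dictionary holds each key once, so when dict(zip(...)) meets a repeated word it simply overwrites the earlier entry. The pairing still comes out right because wordList.count(x) returns the same number for every copy of a given word. The dictionary is spelled wordFrequency in this sketch.

```python
# Hypothetical word list with a repeated word.
wordList = ['AND', 'THE', 'AND', 'OF', 'AND']

# One count per list position, so repeated words repeat their count.
occurences = [wordList.count(x) for x in wordList]

# Later pairs overwrite earlier ones that share a key, leaving one entry per word.
wordFrequency = dict(zip(wordList, occurences))
print(wordFrequency)
print(len(wordList), len(wordFrequency))
```

Note that wordList.count(x) rescans the whole list for every word, which is one reason counting directly into a dictionary scales better on longer texts.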
This is great, thanks. I see that in this section you are building the dictionary, assigning both keys and values, but I have some questions; see my comments on each line:
for word in line.upper().split():  # I get this, you split the words into a list
    try:
        occurences[word] += 1  # Looks like you are giving the dict keys. But I am not sure how you are also inserting values.
    except KeyError:  # Not sure what this is doing!
        occurences[word] = 1  # or this
It looks like you are sorting the occurences dict into descending order based on the values? So how are you telling sorted to look at the values to sort by?
ordered = sorted(occurences, key=occurences.get, reverse=True)
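A sketch of what those lines do, using a made-up word list. Looking up a key that a dictionary does not contain raises KeyError, so the first time a word appears the `+= 1` fails and the except branch creates the entry with a count of 1; on every later appearance the `+= 1` succeeds. For the sort, iterating over a dictionary yields its keys, and key=occurences.get tells sorted to call occurences.get(word) for each key and compare the results, which are the counts.

```python
occurences = {}
# Hypothetical words standing in for one line of the file.
for word in ['AND', 'THE', 'AND', 'OF', 'AND', 'THE']:
    try:
        # Works only if the word is already a key:
        # read its count, add 1, and store it back.
        occurences[word] += 1
    except KeyError:
        # First sighting: the lookup failed, so create the key with a count of 1.
        occurences[word] = 1

# sorted() walks over the keys and ranks each one by occurences.get(key),
# which is that key's count; reverse=True puts the largest count first.
ordered = sorted(occurences, key=occurences.get, reverse=True)
print(ordered)  # ['AND', 'THE', 'OF']
```

An equivalent way to write the counting step without try/except is `occurences[word] = occurences.get(word, 0) + 1`, where 0 is the default returned for a word not yet in the dictionary.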