List of Words

John Barton

Oct 3, 2003, 2:26:46 PM
Does anyone know how to get Word (or something else) to
give me a list of the words found in a Word document?

I'd like to take the list, identify the key words in it, and
put those words in a file that I can then use as a
concordance file in Word.
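For context: Word's AutoMark feature (Insert > Reference > Index and Tables, Index tab, in Word 2003) marks index entries from a concordance file, which is a Word document containing a two-column table, with the text to find in the first column and the index entry in the second. The sketch below assumes the key words have already been picked out into a Python list; `concordance_rows` and its `entries` mapping are hypothetical names, not part of Word. It only produces tab-separated lines, which could then be pasted into a Word document and turned into a table with Table > Convert > Text to Table.

```python
from __future__ import annotations


def concordance_rows(keywords: list[str], entries: dict[str, str] | None = None) -> str:
    """Build tab-separated concordance lines: text to find <TAB> index entry.

    Hypothetical helper: a keyword with no entry in `entries` is
    indexed under itself.
    """
    entries = entries or {}
    return "\n".join(f"{word}\t{entries.get(word, word)}" for word in keywords)


if __name__ == "__main__":
    # "macro" will be indexed under "Macros"; "concordance" under itself.
    print(concordance_rows(["concordance", "macro"], {"macro": "Macros"}))
```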

Thanks,
John Barton

Chris Worth

Oct 4, 2003, 1:28:12 AM
As far as I know, there isn't a way to get at that
information through the user interface. However, you can
generate this list with a Visual Basic for Applications
(VBA) macro, written in the Visual Basic Editor
(Tools > Macro > Visual Basic Editor).

Are you looking for a list with frequencies ("Abracadabra,
15 times") or just the unique words ("Abracadabra is
present")?
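The two options Chris describes differ only in whether occurrences are counted. A minimal Python sketch, assuming a "word" is simply a run of letters a-z after lowercasing (so "don't" would split into "don" and "t"); `words_in`, `unique_words` and `word_frequencies` are hypothetical names used only for illustration:

```python
from __future__ import annotations

import re
from collections import Counter


def words_in(text: str) -> list[str]:
    """Lowercased runs of letters: an assumed, simplified notion of 'word'."""
    return re.findall(r"[a-z]+", text.lower())


def unique_words(text: str) -> list[str]:
    """'Abracadabra is present': each distinct word once, alphabetically."""
    return sorted(set(words_in(text)))


def word_frequencies(text: str) -> list[tuple[str, int]]:
    """'Abracadabra, 15 times': (word, count) pairs, most frequent first."""
    return Counter(words_in(text)).most_common()


if __name__ == "__main__":
    sample = "Abracadabra! The magician said abracadabra twice."
    print(unique_words(sample))
    print(word_frequencies(sample))
```

The unique list is just the keys of the frequency count, so producing counts costs little extra.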

John

Oct 6, 2003, 6:51:45 PM
Chris,

Ultimately I'm just looking for a list of the unique words.
I presume I'll start with a plain list of words. However,
if a list of unique words is possible, with or without a
count of uses, that would be great.

Thanks,
John

Greg Maxey

Oct 6, 2003, 9:46:49 PM
John,

The following macro might be useful:

Sub WordFrequency()

Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words allowed
Dim Words(maxwords) As String 'Array to hold unique words
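Dim Freq(maxwords) As Long 'Frequency counter for unique words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim j As Integer, k As Integer, l As Integer 'Loop counters
Dim Temp As Long 'Temporary variable for swapping frequencies
Dim IngWordCount As Long 'Total non-excluded words in document
Dim NonWordObjects As Long 'Punctuation, numbers, paragraph marks, etc.
Dim AllWordObjects As Long 'All items in the document's Words collection
Dim TotalWords As Long 'Excluded and non-excluded words in document
Dim tword As String 'Temporary variable for swapping words
Dim Ans As String 'Sort order entered by the user
Dim aword As Range 'Current item in the Words collection
Dim tmpName As String 'Template used for the results document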

'Set up excluded words
'Excludes = "[pickleloaf][gruntbutter]"
'Excludes = Excludes & InputBox$("The following words are excluded by default: " & Excludes & _
'    ". Enter additional words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "")
Excludes = InputBox$("Enter words that you wish to exclude. Place each word " & _
    "within square brackets [ ]. Example: [is][a].", "Excluded Words", "")
Excludes = LCase$(Excludes) 'Words from the document are compared in lowercase

'Find out how to sort
ByFreq = True
Ans = InputBox$("Default sort order is word frequency. To sort " & _
    "alphabetically by word, type Word in the field below.", "Sort order", "FREQ")
If Ans = "" Then Exit Sub 'Cancelled or left blank
If UCase(Ans) = "WORD" Then
ByFreq = False
End If

Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count
'AllWordObjects = ActiveDocument.Words.Count
'TotalWords = NonWordObjects

'Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(LCase(aword))
'Out of range? Test only the first character: comparing the whole string
'against "z" would wrongly reject words such as "zebra" (since "zebra" > "z").
If Left$(SingleWord, 1) < "a" Or Left$(SingleWord, 1) > "z" Then
SingleWord = ""
NonWordObjects = NonWordObjects + 1
End If
'SingleWord = Trim(aword)
'If SingleWord < "A" Or SingleWord > "z" Then SingleWord = "" 'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = "" 'On exclude list?
If Len(SingleWord) > 0 Then
IngWordCount = IngWordCount + 1
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum > maxwords - 1 Then
j = MsgBox("The maximum array size has been exceeded. Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum
Next aword

'Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) < Words(k)) Or (ByFreq And Freq(l) > Freq(k)) Then k = l
Next l
If k <> j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j

AllWordObjects = ActiveDocument.Words.Count
TotalWords = AllWordObjects - NonWordObjects

'Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) & vbCrLf
Next j
End With
ActiveDocument.Range.Select
Selection.ConvertToTable
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Unique Words"
ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Number of Occurrences"
ActiveDocument.Tables(1).Columns(2).Select
Selection.ParagraphFormat.Alignment = wdAlignParagraphRight
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows(1).Shading.BackgroundPatternColor = wdColorGray20
ActiveDocument.Tables(1).Columns(1).PreferredWidth = InchesToPoints(4.75)
ActiveDocument.Tables(1).Columns(2).PreferredWidth = InchesToPoints(1.9)

ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Summary"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore "Total"
ActiveDocument.Tables(1).Rows(ActiveDocument.Tables(1).Rows.Count).Shading.BackgroundPatternColor = wdColorGray20


ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Number of Unique Words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore Trim(Str(WordNum))
ActiveDocument.Tables(1).Rows(ActiveDocument.Tables(1).Rows.Count).Shading.BackgroundPatternColor = wdColorAutomatic

ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Number of Non-Excluded Words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore (IngWordCount)

ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 1).Range.InsertBefore "Number of Words (Excluded and Non-Excluded) in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Tables(1).Rows.Count, 2).Range.InsertBefore (TotalWords)
System.Cursor = wdCursorNormal

MsgBox "This document contains " & Trim(Str(WordNum)) & " unique words. "
MsgBox "This document contains " & IngWordCount & " non-excluded words. "
MsgBox "This document contains a total of " & TotalWords & " (excluded and non-excluded) words. "
MsgBox "For more statistics on this document, use Tools>Word Count in the original document. "

Selection.HomeKey wdStory

End Sub
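For readers who want to check the macro's logic outside Word, here is a minimal Python sketch of the same steps: parse the `[word]` exclusion list, drop items that do not start with a letter, count the rest, and sort by frequency or alphabetically. It is not part of Greg's macro. The regular expression is an assumed approximation of how Word's Words collection splits text into words and punctuation, and `parse_excludes` and `word_frequency` are hypothetical names. The three totals it returns correspond to the macro's WordNum, IngWordCount and TotalWords.

```python
from __future__ import annotations

import re
from collections import Counter


def parse_excludes(spec: str) -> set[str]:
    """'[is][a]' -> {'is', 'a'}, lowercased like the words being counted."""
    return {w.lower() for w in re.findall(r"\[([^\]]*)\]", spec)}


def word_frequency(text: str, excludes: str = "", by_freq: bool = True):
    """Return (rows, unique, non_excluded, total), like the macro's summary table."""
    excluded = parse_excludes(excludes)
    # Assumed stand-in for Word's Words collection: each word and each
    # punctuation mark becomes a separate item.
    items = re.findall(r"\w+|[^\w\s]", text.lower())
    # Same filter as the macro: keep items whose first character is a-z.
    words = [w for w in items if "a" <= w[0] <= "z"]
    kept = [w for w in words if w not in excluded]
    counts = Counter(kept)  # hash lookups instead of scanning an array per item
    if by_freq:
        # Most frequent first; ties broken alphabetically (a choice made here).
        rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    else:
        rows = sorted(counts.items())
    return rows, len(counts), len(kept), len(words)


if __name__ == "__main__":
    rows, unique, non_excluded, total = word_frequency(
        "The cat sat. The cat ran, the end.", excludes="[the]")
    for word, n in rows:
        print(f"{word}\t{n}")
    print(unique, non_excluded, total)
```

The main design difference is the counter: the macro searches its Words() array linearly for every item, which slows down as the number of unique words grows, while a dictionary lookup does not. In VBA the analogous choice would be a Scripting.Dictionary object.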


--
Greg Maxey
A peer in "peer to peer" support
Rockledge, FL
Remove the obvious (wham...m) to reply in e-mail

John

Oct 31, 2003, 12:52:22 PM
Thanks for the macro, Greg. It worked great.

John
