MicrosoftExcel is a fantastic tool for working with numbers, but it can become challenging when it comes to counting specific words in your spreadsheets. If you've ever struggled with Excel's complex formulas for this task, there's an effective solution - creating a custom function that makes text frequency analysis a breeze.
Excel offers various approaches to retrieve a list of unique values and get the count of specific text within the target range. However, these methods often require the use of intricate Excel functions, which might not be intuitive and user-friendly for everyone.
Tip. If you plan to use this function frequently, consider storing it in an add-in file rather than your current workbook. This way, you can access the function from any Excel file. Here are the detailed instructions on creating and using add-ins to store custom functions in Excel.
Tip. Unfortunately, Excel doesn't display tooltips for custom function arguments like it does for built-in functions. However, there's a little trick to display argument names. Simply enter the name of the custom function and then press the Ctrl + Shift + A shortcut. For more details, see how to display tooltips for custom VBA functions in Excel.
In Dynamic Array Excel, FreqWord behaves like a dynamic array function, i.e. you enter a formula in one cell and it returns an array of values into neighboring cells.Just make sure there are enough empty cells down and at least one empty column to the right to output the results, otherwise you may get a #SPILL error.
Using FreqWords function in Excel 2019 - 2007In pre-dynamic versions of Excel, you need to use the FreqWords function as an array formula, which requires a slightly different approach:
For more information, see How to enter an array formula in Excel.Now that you understand how to use the FreqWords function, let's look at some real-life scenarios where it can simplify your Excel data work.
How to count frequency of words in ExcelLet's assume you have a list of addresses in A2:A25. To find out which words most often occur in this list, enter this formula in C3:
After hitting Enter, you will receive a list of all unique words and the number of times each word appears in the source range. The words will be displayed in column C, and their corresponding frequencies will be in column D.In Excel 365 and Excel 2021, the formula returns an array of values in multiple cells. However, in Excel 2019 and earlier versions, a normal formula will return just a single value. To obtain a list of values, be sure to enter it as an array formula as explained above.
How to do word frequency analysis with custom delimiterOften, when importing data into Excel from other programs or CSV files, values are not separated by spaces but by special characters such as commas, semicolons, colons, slashes, and more.
The result of this calculation is a frequency table of words based on your specified delimiter:How to count frequency of text within one cellIn case you need to count text frequencies within a single cell, simply supply that cell address for the first argument of the FreqWords function:
Let's consider a scenario where you have two lists with similar items but in different orders. Moreover, the items in these lists may have different word arrangements, such as "chocolate milk" and "milk chocolate". The question is: are these lists identical or different?The solution involves two simple steps.
By applying these formulas, you obtain the word count for each list. Examining the results, as shown in the screenshot below, you can conclusively determine that the two lists are different.Step 2: Compare word frequencies in two lists
To accomplish this task, you need to place the word frequencies side by side. In our example, since List 2 contains more unique words, we'll consider it the main list and pull the frequencies from List 1 to the adjacent column using the XLOOKUP function.
To make the differences even more apparent, you can add labels such as "match" and "don't match" in a separate column. This can be achieved using the following IF statement that compares the word counts in columns I and J:
As a result, you'll get the same word frequencies highlighted in both lists, allowing you to visually distinguish matches and differences.To wrap it up, analyzing text frequencies in Excel doesn't have to be tough anymore. With the custom FreqWords function and the Compare Tables tool, it becomes super simple! Whether you're working on textual analysis, data mining, or simply trying to figure out the patterns in your spreadsheet, these tools together will make your tasks smoother and more understandable. Just give them a try and see the difference they can make.
Alexander. In your FreqWords() function you only remove special character in the If Delimiter = "" Or Delimiter = " " Then Delimiter = " " portion of the IF statement but not the ELSE portion. I don't understand why special characters would be dependent on the delimiter or is there something I'm not understanding.
Thanks
Dear Alexander, thanks for the post. It works perfectly for 1 word analysis, but in my case, I analyse ingredient lists.
I have 1 ingredient per cell. Each ingredient can be 1 word (i.e. salt) or 2 words (i.e. white sugar).
Would it be possible to do this without removing the space between "white" and "sugar"? I am just guessing that if I delete spaces between words and prepare the table without spaces (i.e. "WhiteSugar", it should work. But the result is less user-friendly.
Threshold: My moving the slider from left to right, you can determine that for instance only words with a frequency of at least 10 should be displayed. The number of the left-hand side shows the lowest, the number on the right-hand side the highest occurring frequency.
Stop / Go Lists: To exclude particular words, you can select stop lists. If only certain words should be displayed, you can create a Go List. See the section on "Stop and Go Lists" below.
Show Inflected Forms: The plural forms of nouns, the past tense, past participle, and present participle forms of verbs, and the comparative and superlative forms of adjectives and adverbs are known as inflected forms. If you activate this options, the word cloud only shows the basic form of the word, e.g. building but not buildings.
There are limits in terms of how much data can be meaningfully processed, or moreover handled by and displayed in Excel. Assuming there are 2000 unique words in a document, and you process 100 documents, this results in an Excel table consisting of 200.000 cells.
Character Filter: In this section you can enter special characters that you do not want to be counted and displayed in the word cloud. By default, the following characters are excluded:
So far, I've been able to create the list of words. (by searching the forum.)The list of words is generated in column B, can anyone help me with the code so it also generate the count of occurrence in column C?
you are welcome and thank you. i stumbled upon wiktionary when i was looking for word lists and i know its not easy to find many free / extensive sources. since i only frequented English Wiktionary, i only posted links on the english version of the page.
well my overall subtitle corpus for all languages was 53 GB compressed archive. I unfortunately deleted all except the original archive. Let me open it and i can give you an idea on the number of files at least. Based on my tests, frequency lists generated using decent amount of data should be comparable. I am assure you that there were lot more entries than 50k i used and provided for download
I suggest you lemmatize your wordlists, and not only present them as wordforms (group verbforms walks, walked, under walk_V, and noun forms a walk, the walks under walk_N), and similarily for the other languages. Here is an overview of software to do so: _Grammar
well i can do that but the word lists i consume are for a keyboard app and i need raw words to match user input. infact when i started i came across a few word lists and i could not use them for my requirement purely because i depending upon user input i would want to show walked in my app and lemmatised word lists would make loading a lot slower.
well the format i used for word list is sort of generic one i found around. it goes like this
word1 wordfrequency1
word2 wordfrequency2
word3 wordfrequency3
word comes first, and is followed by word frequency which is a number, with space in between.
you were correct. I re-ran the word list generator a couple of times and I found the mistake i made in computing the total count. The other details are correct however the total word count came up to 765703147 and not 690788712769
Total word count is the total count used for frequency list.
Overall word count was the actual word count. some words has junk character or at length of 1 which are ignored. Hence Total word count
Luigi,
The word lists i have generated ignore 1 letter words like a and i. Its difficult to validate a single char word across multiple languages unless you know the language or can spend time tuning the rules per language. I know a bit about it as i have done something similar for accents across various latin based european languages. If you really want one, i can generate a one off and email it to you.
Having said that i will try to generate torrent files, one that references all the 50kzip and another one that references all full zips. Once i generate these, i will udpate this page with the torrent files.
well i used to have an excellent package which would give me tons of bandwidth and allow me to host couple of gigs of data however i was not using it.. i dont even know if it still works (actually i will check in a bit).. eventually i moved my email hosting to microsoft live a while back and moved hosting there as well.. worse is wordpress.. they allow you to upload tons of things including movies but not zipped files..
3a8082e126