Hi Richard,
Once you have clicked the "Facet by choice counts" link you should get a numeric facet with a slider at each end. If you move the right-hand slider down to the number you want (e.g. 5) then you'll be filtering the rows in the data grid down to only those rows that contain "low count words" (i.e. words that appear in 5 or fewer rows in the column).
In this screenshot I've done this, and you can see that for my data set this includes 112 rows out of a total of 113 (i.e. all but one row contains a "low count word").
I think Tom was suggesting using the GREL facetCount(value.split(' '), "value.split(' ')", "YOUR COLUMN NAME") directly only if creating a Word Facet and then selecting "Facet by choice counts" was a problem (e.g. if the Word Facet turned out to be so large that it led to performance issues). However, if you want to use facetCount directly, this is how I'd do it:
In a column dropdown menu select "Facet" -> "Custom text facet"
In the expression editor put the GREL:
filter(facetCount(value.split(' '),"value.split(' ')", "YOUR COLUMN NAME"),v,v<=5).length()>0
This should give you a true/false value where "true" means that the cell contains a word that occurs in 5 or fewer rows.
facetCount(value.split(' '),"value.split(' ')", "YOUR COLUMN NAME") creates an array for a cell which has one integer value for each word in the cell. Each integer is the number of cells/rows in the specified column that the word appears in.
In the screenshot you can see that the first row contains just "by", which gets [ 63 ], i.e. the word "by" appears in 63 cells in this column. In the second row we have "A Question of Holy Writ ..." and [ 24, 1, 56, 1, 1, ... ]. That is, "A" appears in 24 cells, "Question" in 1 cell, "of" in 56 cells, "Holy" in 1 cell, "Writ" in 1 cell, and so on.
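If it helps to see the counting logic outside OpenRefine, here is a rough Python sketch of what I understand the facetCount expression to be doing. It is an emulation, not OpenRefine code: the function name word_row_counts and the three-row column are made up for illustration, and I'm assuming each word is counted once per cell even if it is repeated within that cell.

```python
from collections import Counter

def word_row_counts(cells):
    """For each cell, return one integer per word: the number of cells containing that word."""
    rows_containing = Counter()
    for cell in cells:
        # Assumption: a word counts once per cell, however often it repeats in that cell.
        rows_containing.update(set(cell.split(' ')))
    # Build the per-cell arrays, one count per word in the order the words appear.
    return [[rows_containing[word] for word in cell.split(' ')] for cell in cells]

# Hypothetical column, not the data from the screenshot.
column = ["by", "A Question of Holy Writ", "A Tale of Two Cities"]
counts = word_row_counts(column)
for cell, cell_counts in zip(column, counts):
    print(cell, cell_counts)
```

In this toy column "A" and "of" each appear in two cells and every other word in only one, so the arrays have the same shape as the ones in the screenshot, just with smaller numbers.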
The filter keeps only the integers in that array that are 5 or less, and removes any that are larger than 5.
So now in the first line we have just [ ] (an empty array) - because it only had one number, which was "63" because "by" is a commonly occurring word. In the second line we can see that the larger numbers (24, 56 etc) are gone (being larger than 5) but there are still plenty of low numbers because the second row contains a mixture of commonly occurring words, and uncommon words (Question, Holy, Writ, etc.)
Finally we can now test the length of the array. If it is zero, then all the words in the original cell occurred in 6 or more cells in the column (as per our first row), so when we test whether the length of the array is greater than zero, we get "false". But the other rows, because they had one or more "low count" words (ones that appear in 5 or fewer cells), still have some integers in the array we created, so the length of the array is >0 and we get "true".
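The filter and length test can be sketched in Python as well. Again this is only an illustration of the logic, not what OpenRefine runs: has_low_count_word and its threshold parameter are names I've made up, and the two input arrays are the values quoted from the screenshot (the second one cut to its first five numbers).

```python
def has_low_count_word(counts, threshold=5):
    """Mimic filter(counts, v, v <= threshold).length() > 0."""
    # Keep only the counts at or below the threshold, as the GREL filter does.
    kept = [v for v in counts if v <= threshold]
    # A non-empty result means at least one word in the cell is a low-count word.
    return len(kept) > 0

first_row = has_low_count_word([63])                 # only "by", a common word
second_row = has_low_count_word([24, 1, 56, 1, 1])   # mix of common and rare words
print(first_row, second_row)
```

The first call gives False because nothing survives the filter, and the second gives True because the 1s survive.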
The final outcome of this is a facet of true/false where "true" means that the cell/row contains at least one word that appears in 5 or fewer rows, and "false" means that the cell/row contains only words that appear in 6 or more rows.
So if I want to delete rows, then I can select "true" and remove all those rows, knowing that they each contain at least one "low-count word".