A few months ago, Paul announced the availability of a data set containing scores per player per round, going back to KECKLE. Exploring that data had been on my to-do list for a while, but it wasn't until this week that I was able to carve out some time to dig into it.
I'm having some fun exploring various statistics based on the scores alone, but one thing Paul said the data could not do (on its own, at least) was evaluate the words themselves. Well, if you join it with the list of used words, you absolutely can!
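Here is a minimal sketch of that join in Python, using only the standard library. It assumes both data sets share a round number that can serve as the join key. The field names (`round`, `player`, `correct`, `dealer`, `word`) and the player rows are hypothetical and are not the actual format of Paul's files; only the round/dealer/word pairs come from the tables below.

```python
# Hypothetical score rows: the real data set's columns may differ.
scores = [
    {"round": 408, "player": "player_a", "correct": True},
    {"round": 408, "player": "player_b", "correct": False},
    {"round": 838, "player": "player_a", "correct": False},
    {"round": 9999, "player": "player_b", "correct": True},  # no matching word
]

# Used-words list, keyed by round (pairs taken from the short-word table).
used_words = [
    {"round": 408, "dealer": "Froma Bessel", "word": "OY"},
    {"round": 838, "dealer": "Russ Heimerson", "word": "CA"},
]


def join_on_round(scores, used_words):
    """Inner join: attach each round's dealer and word to every score row."""
    words_by_round = {w["round"]: w for w in used_words}
    joined = []
    for row in scores:
        word_row = words_by_round.get(row["round"])
        if word_row is None:
            continue  # skip scores for rounds missing from the word list
        joined.append({**row, "dealer": word_row["dealer"], "word": word_row["word"]})
    return joined


joined = join_on_round(scores, used_words)
for row in joined:
    print(row["round"], row["player"], row["word"], row["correct"])
```

An inner join drops rounds that appear in only one of the two data sets, which seems like the safer default when the goal is to score the words.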
One statistic of interest is word length. Some dealers like short words.
Round  Dealer           Word  Length (chars)
3214   Shani Naylor     E     1
408    Froma Bessel     OY    2
838    Russ Heimerson   CA    2
892    Dave Cunningham  QU    2
1164   Dave Cunningham  ZA    2
1690   Chris Carson     EA    2
2100   Chris Carson     FY    2
3207   Shani Naylor     OG    2
Some like long words.
Round  Dealer          Word                            Length (chars)
2906   Judy Madnick    EELLOGOFUSCIOUHIPOPPOKUNURIOUS  30
175    Steve Dixon     FLOCCINAUCINIHILIPILIFICATION   29
3013   Debbie Embler   PRISENCOLINENSINAINCIUSOL       25
23     R.H. Ingalls    MOTHER CAREY'S CHICKENS         23
2817   Mike Shefler    NINA WITH HER HAIR DOWN         23
6      Joshua Poulson  HUMUHUMUNUKUNUKUAPUAA           21
1336   Dan Widdis      BAD-I-SAD-O-BIST-ROZ            20
2583   Mike Shefler    KAWAIOLAONAPUKANILEO            20
3096   Shani Naylor    CHITTERIE-CHATTERIE             19
2875   Dan Widdis      CROOCHIE-PROOCHLES              18
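Judging by the rows above, the length column counts every character, including spaces, hyphens, and apostrophes: MOTHER CAREY'S CHICKENS is listed at 23 even though it has only 20 letters. The sketch below computes both measures and ranks a few rows copied from the tables; the function names are illustrative, not taken from the original analysis.

```python
# (round, dealer, word) rows copied from the tables above.
rounds = [
    (3214, "Shani Naylor", "E"),
    (408, "Froma Bessel", "OY"),
    (2906, "Judy Madnick", "EELLOGOFUSCIOUHIPOPPOKUNURIOUS"),
    (23, "R.H. Ingalls", "MOTHER CAREY'S CHICKENS"),
    (1336, "Dan Widdis", "BAD-I-SAD-O-BIST-ROZ"),
]


def word_length(word):
    """Character count, so spaces, hyphens, and apostrophes are included."""
    return len(word)


def letter_count(word):
    """Alternative measure that counts letters only."""
    return sum(ch.isalpha() for ch in word)


# Rank rounds by character length, shortest first and longest first.
shortest = sorted(rounds, key=lambda r: word_length(r[2]))[:2]
longest = sorted(rounds, key=lambda r: word_length(r[2]), reverse=True)[:2]

for rnd, dealer, word in longest:
    print(rnd, dealer, word, word_length(word), letter_count(word))
```

Which measure to use matters mostly for multi-word entries like NINA WITH HER HAIR DOWN; for single words the two agree.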
But most words fall somewhere in the middle. There does seem to be a slight “shorter is better” effect once we get to five letters or more. (The “Guessability” measure is the percentage of players who voted correctly.)
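As a sketch of how guessability could be tabulated by word length: the words and vote tallies below are invented for illustration, and using the plain character count as the length is an assumption.

```python
from collections import defaultdict

# Invented tallies: (word, players who voted correctly, total players voting).
rounds = [
    ("OX", 4, 20),
    ("QI", 6, 20),
    ("FLIGHT", 10, 25),
    ("BANANA", 5, 25),
    ("GARDEN", 3, 30),
]


def guessability(correct, total):
    """Percentage of players who voted for the correct definition."""
    return 100.0 * correct / total if total else 0.0


def mean_guessability_by_length(rounds):
    """Average guessability for each word length, ordered by length."""
    buckets = defaultdict(list)
    for word, correct, total in rounds:
        buckets[len(word)].append(guessability(correct, total))
    return {length: sum(vals) / len(vals) for length, vals in sorted(buckets.items())}


for length, pct in mean_guessability_by_length(rounds).items():
    print(f"{length} letters: {pct:.1f}% guessed correctly")
```

Averaging per-round percentages weights every round equally regardless of how many players voted; pooling the raw counts per length would be a reasonable alternative.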
But do certain dealers favor shorter or longer words? It seems we do, and dealers who pick mostly long or mostly short words don't do as well as those who pick in the 7-8 letter range.
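One way to sort dealers into short, middle, and long pickers is to average the lengths of the words each has dealt. The sketch below assumes a simple list of (dealer, word) pairs with hypothetical dealer names, and treats the 7-8 letter range from the text as the "middle" band; the analysis above may have grouped dealers differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (dealer, word) pairs; real dealers have many more rounds.
deals = [
    ("Dealer A", "OX"), ("Dealer A", "QI"), ("Dealer A", "ZAX"),
    ("Dealer B", "PLANTAIN"), ("Dealer B", "KESTREL"),
    ("Dealer C", "FLOCCINAUCINIHILIPILIFICATION"),
    ("Dealer C", "HUMUHUMUNUKUNUKUAPUAA"),
]


def dealer_mean_lengths(deals):
    """Mean word length (in characters) for each dealer."""
    lengths = defaultdict(list)
    for dealer, word in deals:
        lengths[dealer].append(len(word))
    return {dealer: mean(vals) for dealer, vals in lengths.items()}


def classify(mean_len, low=7, high=8):
    """Label a dealer by mean word length; the 7-8 band mirrors the text."""
    if mean_len < low:
        return "short"
    if mean_len > high:
        return "long"
    return "middle"


for dealer, avg in dealer_mean_lengths(deals).items():
    print(dealer, round(avg, 2), classify(avg))
```

A dealer's mean can hide a mix of very short and very long words, so looking at the spread (for example with `statistics.stdev`) alongside the mean may be worthwhile.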
I'm still exploring a few other relationships:
- The fewer unique letters in a word, the better.
- The uniqueness ratio (unique letters as a share of all letters) favors a middle ground: words with low uniqueness, like BANANA (50% unique letters), and words whose letters are all unique, like FLIGHT, don't do as well as a mix.
- Words with lower Scrabble scores are harder to guess than ones with Q's, Z's, J's, and other high-value letters.
- Entropy is similar to the unique-letter count but also accounts for how the letters are distributed.
- There's no obvious relationship for rare letters (compared with word frequency in the dictionary).
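The post doesn't spell out how these measures are computed, so the following is a hedged sketch of one reasonable definition for each: letters only (spaces and punctuation ignored), the uniqueness ratio as unique letters divided by total letters (which gives BANANA its 50%), standard English Scrabble tile values, and Shannon entropy in bits over the word's letter counts. The rare-letter measure is left out because its exact definition isn't given, and the function name is illustrative.

```python
import math
from collections import Counter

# Standard English Scrabble tile values.
SCRABBLE = {
    **dict.fromkeys("AEILNORSTU", 1),
    **dict.fromkeys("DG", 2),
    **dict.fromkeys("BCMP", 3),
    **dict.fromkeys("FHVWY", 4),
    "K": 5,
    **dict.fromkeys("JX", 8),
    **dict.fromkeys("QZ", 10),
}


def letter_features(word):
    """Letter-based features for a word, ignoring spaces and punctuation."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        raise ValueError("word contains no letters")
    counts = Counter(letters)
    n = len(letters)
    # Shannon entropy in bits: sum of p * log2(1/p) over each letter's share.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return {
        "unique_letters": len(counts),
        "uniqueness_ratio": len(counts) / n,
        # Letters outside A-Z (e.g. accented ones) score 0 by assumption.
        "scrabble_score": sum(SCRABBLE.get(ch, 0) for ch in letters),
        "entropy_bits": entropy,
    }


for w in ("BANANA", "FLIGHT"):
    print(w, letter_features(w))
```

Entropy is highest when every letter is distinct (log2 of the letter count) and drops as a few letters dominate, which is why it tracks the unique-letter count but is not identical to it.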
Anyway, I tried to use at least some of this information in selecting this round's word. Let's see how well it works.