[OT] Word Length and Guessability

Daniel B Widdis
Jan 2, 2026, 1:30:27 AM
to 'Mike Shefler' via Dixonary
A few months ago, Paul announced the availability of a data set containing scores per player per round, going back to KECKLE.  Exploring that data had been on my to-do list for a while, but it wasn’t until this week that I was able to carve out some time to dig into it.

I am having some fun exploring various statistics based just on the scores, but one of the things Paul said the data could not do (on its own, at least) was evaluate the words themselves. Well, if you join it with the used-words list, you absolutely can evaluate the words!
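
For anyone wanting to try this, the join itself is just a match on round number. Here's a minimal Python sketch. The record types (ScoreRow, WordRow) and the per-player voted_correctly flag are hypothetical stand-ins, not the actual columns in Paul's files:

```python
from dataclasses import dataclass


# Hypothetical record shapes; the real files may use different columns.
@dataclass(frozen=True)
class ScoreRow:
    round_no: int
    player: str
    voted_correctly: bool  # assumption: recoverable from the per-round scores


@dataclass(frozen=True)
class WordRow:
    round_no: int
    dealer: str
    word: str


def join_on_round(scores: list[ScoreRow], words: list[WordRow]) -> list[tuple[ScoreRow, WordRow]]:
    """Inner-join each score row to the used-words entry for the same round."""
    words_by_round = {w.round_no: w for w in words}
    joined = []
    for s in scores:
        w = words_by_round.get(s.round_no)
        if w is not None:  # skip rounds missing from the words list
            joined.append((s, w))
    return joined
```

An inner join like this silently drops rounds that are missing from either list, so it's worth counting the unmatched rounds before trusting any averages.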

One statistic of interest is word length. Some dealers like short words.

round  dealer           word  word_length
3214   Shani Naylor     E     1
408    Froma Bessel     OY    2
838    Russ Heimerson   CA    2
892    Dave Cunningham  QU    2
1164   Dave Cunningham  ZA    2
1690   Chris Carson     EA    2
2100   Chris Carson     FY    2
3207   Shani Naylor     OG    2

Some like long words.

round  dealer          word                            word_length
2906   Judy Madnick    EELLOGOFUSCIOUHIPOPPOKUNURIOUS  30
175    Steve Dixon     FLOCCINAUCINIHILIPILIFICATION   29
3013   Debbie Embler   PRISENCOLINENSINAINCIUSOL       25
23     R.H. Ingalls    MOTHER CAREY'S CHICKENS         23
2817   Mike Shefler    NINA WITH HER HAIR DOWN         23
6      Joshua Poulson  HUMUHUMUNUKUNUKUAPUAA           21
1336   Dan Widdis      BAD-I-SAD-O-BIST-ROZ            20
2583   Mike Shefler    KAWAIOLAONAPUKANILEO            20
3096   Shani Naylor    CHITTERIE-CHATTERIE             19
2875   Dan Widdis      CROOCHIE-PROOCHLES              18
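
Judging by the tables, word_length appears to be the raw character count with spaces, hyphens, and apostrophes included (MOTHER CAREY'S CHICKENS comes out at 23). A sketch that reproduces that count and pulls out the extremes. The (round_no, dealer, word) tuple shape is an assumption, and so is breaking ties by round number, though that matches the order shown above:

```python
def word_length(word: str) -> int:
    # Raw character count: spaces, hyphens and apostrophes all count.
    return len(word)


def shortest_and_longest(rows, n=10):
    """rows: iterable of (round_no, dealer, word) tuples.

    Returns (shortest, longest), each a list of up to n rows,
    with ties broken by ascending round number.
    """
    rows = list(rows)
    shortest = sorted(rows, key=lambda r: (word_length(r[2]), r[0]))[:n]
    longest = sorted(rows, key=lambda r: (-word_length(r[2]), r[0]))[:n]
    return shortest, longest
```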

But most words are in the middle. It does seem that there's a bit of a "shorter is better" effect once we get to 5 letters or more. (The "Guessability" measure is the percentage of players who voted correctly.)
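
Since guessability is just the share of players who voted correctly, grouping it by word length takes only a few lines. A sketch, assuming each round has already been reduced to a hypothetical (word, correct_votes, total_votes) tuple. Averaging the per-round percentages, rather than pooling all votes, is also a choice made here for illustration:

```python
from collections import defaultdict
from statistics import mean


def guessability(correct_votes: int, total_votes: int) -> float:
    """Percentage of players who voted for the correct definition."""
    return 100.0 * correct_votes / total_votes


def guessability_by_length(rounds):
    """rounds: iterable of (word, correct_votes, total_votes) tuples.

    Returns {word_length: mean guessability across rounds of that length}.
    """
    buckets = defaultdict(list)
    for word, correct, total in rounds:
        if total == 0:  # no votes cast, so guessability is undefined
            continue
        buckets[len(word)].append(guessability(correct, total))
    return {length: mean(values) for length, values in sorted(buckets.items())}
```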


But do certain dealers favor shorter or longer words? It seems we do, and dealers who pick mostly long or mostly short words don't do as well as those who pick in the 7-8 letter range.
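
A per-dealer summary can be built the same way by averaging each dealer's word length and guessability. The (dealer, word, guessability) input shape and the min_rounds cutoff for dealers with only a handful of rounds are both assumptions made for this sketch:

```python
from collections import defaultdict
from statistics import mean


def dealer_profile(rounds, min_rounds=5):
    """rounds: iterable of (dealer, word, guessability_pct) tuples.

    Returns {dealer: (mean word length, mean guessability, rounds dealt)},
    skipping dealers with fewer than min_rounds rounds.
    """
    by_dealer = defaultdict(list)
    for dealer, word, pct in rounds:
        by_dealer[dealer].append((len(word), pct))

    profile = {}
    for dealer, rows in by_dealer.items():
        if len(rows) < min_rounds:
            continue  # too few rounds for a meaningful average
        lengths, pcts = zip(*rows)
        profile[dealer] = (mean(lengths), mean(pcts), len(rows))
    return profile
```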



I’m still exploring a few other relationships.


  • The fewer unique letters in a word, the better.
  • Uniqueness ratio favors a middle ground: low uniqueness (50% unique letters, like BANANA) and all unique letters (like FLIGHT) aren't as good as a mix.
  • Words that score lower in Scrabble are harder to guess than the ones with Q's, Z's, J's, etc.
  • Entropy is similar to unique letters but also accounts for how the letters are distributed.
  • There's not really an obvious relationship for rare letters (compared to word frequency in the dictionary).
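
The letter-based measures in that list can be computed per word along these lines. This is a sketch under a few assumptions: it keeps only alphabetic characters (so spaces and hyphens are ignored), uses the standard English Scrabble tile values, and takes Shannon entropy in bits over the letter counts. None of these details are confirmed to match the exact definitions behind the charts.

```python
import math
from collections import Counter

# Standard English Scrabble tile values.
SCRABBLE_POINTS = {
    **dict.fromkeys("AEILNORSTU", 1),
    **dict.fromkeys("DG", 2),
    **dict.fromkeys("BCMP", 3),
    **dict.fromkeys("FHVWY", 4),
    "K": 5,
    **dict.fromkeys("JX", 8),
    **dict.fromkeys("QZ", 10),
}


def letters(word: str) -> str:
    # Keep letters only; spaces, hyphens and apostrophes are dropped.
    return "".join(c for c in word.upper() if c.isalpha())


def unique_letters(word: str) -> int:
    return len(set(letters(word)))


def uniqueness_ratio(word: str) -> float:
    # Fraction of letters that are distinct: BANANA -> 0.5, FLIGHT -> 1.0.
    ls = letters(word)
    return len(set(ls)) / len(ls) if ls else 0.0


def scrabble_score(word: str) -> int:
    # Letters outside A-Z (if any) score zero.
    return sum(SCRABBLE_POINTS.get(c, 0) for c in letters(word))


def letter_entropy(word: str) -> float:
    """Shannon entropy (bits) of the word's letter distribution."""
    ls = letters(word)
    if not ls:
        return 0.0
    n = len(ls)
    return -sum((k / n) * math.log2(k / n) for k in Counter(ls).values())
```

Entropy rises both with the number of distinct letters and with how evenly they are used, which is why it tracks unique letters without being identical to it.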

Anyway, I tried to use at least some of this information in selecting this round's word. Let's see how well it works.