My Lingua Franca colleague Anne Curzan recently published a post about the latest additions to the official Scrabble dictionary, of which there have been a surprisingly large number. (I guess that’s the way they’ve found to keep on selling Scrabble dictionaries.)
Naturally Anne didn’t object to this horde of new arrivals. We linguists always seem to be on the side of change, diversity, exoticness, and immigration, don’t we? But some people don’t like to see new additions. A few not only recoil at the lexicographical recognition of specific new words, but see new words as endangering the fun of the game itself.
In fact there may be some who believe that if you keep adding flocks of new words like this, the game will become too easy, because almost anything will count as a word.
At the risk of being somewhat nerdy, let me just make it clear that such fears would be wildly far off the mark.
It is very easy to use your computer to check how many strings of a given length are actual words. I often make reference to a standard machine-searchable word list of 25,000 words (a reasonable approximation to an ordinary person’s active vocabulary) that is supplied with many Unix-based systems (like Linux and Mac OS X) in a file called /usr/share/dict/words. Unfortunately many systems these days supply a less useful file 10 times bigger, stuffed with rarities and sillinesses. (The small file has 177 words ending in ation; the big one has 4,520, including ridiculous objects like counterexpostulation. Run a Google search on that and you’ll see why it’s ridic.) But let’s use both files to answer this question: How many letter strings of lengths between 1 and 7 are actual words? Here are the figures:
LENGTH | POSSIBLE | ACTUAL (small file) | ACTUAL (big file) |
1 | 26^1 = 26 | 26 (100%) | 26 (100%) |
2 | 26^2 = 676 | 59 (8.7%) | 121 (17.9%) |
3 | 26^3 = 17576 | 576 (3.3%) | 1134 (6.5%) |
4 | 26^4 = 456976 | 1778 (0.4%) | 4347 (0.95%) |
5 | 26^5 = 11881376 | 2415 (0.02%) | 8494 (0.07%) |
6 | 26^6 = 308915776 | 2891 (0.001%) | 15066 (0.005%) |
7 | 26^7 = 8031810176 | 3102 (0.00004%) | 20551 (0.00026%) |
As you can see, regardless of the file choice, as string lengths go up, the percentage of the possible strings that happen to be lower-case words heads exponentially down toward zero. Even at length 7 (the number of tiles you are allowed to hold on a Scrabble rack) it’s very small indeed.
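For readers who want to rebuild a table like this from their own word list, here is a minimal sketch in Python. The function names are mine, and the rule for what counts as a word (a distinct entry made up only of the letters a to z) is an assumption about how the counts were made. Different versions of /usr/share/dict/words will give different figures.

```python
import re
from collections import Counter

ALPHABET_SIZE = 26                       # lower-case a to z
LOWERCASE_WORD = re.compile(r"[a-z]+")   # skips capitals, hyphens, apostrophes


def count_by_length(words, max_len=7):
    """Count distinct all-lower-case words of each length from 1 to max_len."""
    distinct = {w.strip() for w in words}
    counts = Counter(
        len(w) for w in distinct
        if len(w) <= max_len and LOWERCASE_WORD.fullmatch(w)
    )
    return {n: counts.get(n, 0) for n in range(1, max_len + 1)}


def density_rows(words, max_len=7):
    """Return (length, possible strings, actual words, percentage) tuples."""
    rows = []
    for n, actual in count_by_length(words, max_len).items():
        possible = ALPHABET_SIZE ** n
        rows.append((n, possible, actual, 100.0 * actual / possible))
    return rows


def print_density(path="/usr/share/dict/words"):
    """Print one line per length for the word list stored at `path`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for n, possible, actual, pct in density_rows(handle):
            print(f"{n} | {possible} | {actual} ({pct:.5f}%)")
```

Calling print_density() with no argument reads the file named in the text; pass another path to compare a bigger list.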
I amused myself writing a little program to see how many times you’d have to draw 7 tiles at random from the bag in order to accidentally hit on a word. Writing such programs is easy these days. Typing this command in the Terminal application on a Mac produces 20 random 7-letter strings, different ones every time:
jot -r -c 140 a z | rs -g 0 7
And this command will check whether the string bostide is in the file /usr/share/dict/words:
grep '^bostide$' /usr/share/dict/words
Piecing together such components, it is child’s play (literally: a 12-year-old could do it) to construct a little script that keeps testing random letter-strings to see if they’re in the word list, going back for more if not, and keeping track of how many have been checked. My Mac tested nearly a million random 7-letter strings before it finally happened on delouse.
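The script itself is not reproduced here, but a loop of that kind might look roughly like this in Python. This is a sketch under my own assumptions: each letter is equally likely, and draws_until_word, with its max_tries cutoff, is a name I made up for illustration.

```python
import random
import string

LETTERS = string.ascii_lowercase


def draws_until_word(words, length=7, rng=None, max_tries=None):
    """Draw uniformly random strings until one is in `words`.

    Returns (number of strings tested, word found or None if max_tries ran out).
    """
    rng = rng or random.Random()
    # Keep only the all-lower-case entries of the requested length.
    targets = {w for w in words if len(w) == length and set(w) <= set(LETTERS)}
    if not targets:
        raise ValueError("no lower-case words of that length to find")
    tries = 0
    while max_tries is None or tries < max_tries:
        tries += 1
        candidate = "".join(rng.choices(LETTERS, k=length))
        if candidate in targets:
            return tries, candidate
    return tries, None
```

To try it on a real list, read the file into a set first, for instance {line.strip() for line in open("/usr/share/dict/words")}, and pass that as words. The count will differ on every run.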
So the answer to what proportion of letter-strings happen to be English lower-case words is: virtually none of them. Words get rarer and rarer in the space of letter-strings as length increases.
Of course, you can arrange your 7 Scrabble tiles in different orders, so your chances of getting an actual word when you draw 7 random tiles are much better than getting them using the order in which the tiles came out of the bag. Given 7 tiles, the number of ways in which you can sequence them is 7 × 6 × 5 × 4 × 3 × 2 × 1 = 5,040, so each drawing of 7 tiles gives you up to 5,040 strings to try rather than one. Going by the small file, a single random string has about one chance in 2.6 million of being a word (8,031,810,176 possible strings divided by 3,102 actual words); with 5,040 orderings per rack, that improves to roughly one chance in 514. That would mean you should get a seven-letter word on your rack only about once in each 500 games!
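The reordering step is easy to try out. The sketch below (function names are mine) lists every distinct ordering of a rack and picks out the ones that are words. A rack with a repeated letter has fewer than 5,040 distinct orderings, because swapping two identical tiles changes nothing.

```python
import math
from itertools import permutations


def orderings(rack):
    """All distinct strings obtainable by reordering every tile on the rack."""
    return {"".join(p) for p in permutations(rack)}


def words_on_rack(rack, words):
    """The subset of `words` that some ordering of the full rack spells."""
    return orderings(rack) & set(words)


# math.factorial(7) gives the 5,040 sequences counted in the text.
# len(orderings("abcdefg")) is also 5,040, but a rack with two E tiles,
# such as the letters of "delouse", yields only half as many distinct strings.
```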
Except for one thing: I assumed a bag containing an infinite supply of tiles for each letter of the alphabet. Within real Scrabble there are tight limitations on the tile supply: You can’t draw Z Z Z Z Z Z Z out of the bag because there aren’t that many Z tiles in the bag. I’ll leave it as an exercise for the statistically inclined Scrabblophilic reader to compute the actual probability of finding a 7-letter word already on your rack at the start of a game. The only point I want to make here is that the density of words drops off extremely fast as string length goes up. Adding a few thousand new words to the ones that officially count as playable will scarcely change things at all: Scrabble will continue to be very hard. Hardly any Scrabble rack will spell out a 7-letter word.
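For anyone tempted by the exercise, one possible approach is simulation rather than exact calculation: draw 7 tiles without replacement many times and count how often they can be rearranged into a word. The sketch below makes assumptions that go beyond this post. The letter counts are the ones I understand the standard English-language set to use, the two blank tiles are ignored, and the result depends entirely on which word list you feed in. All the names are made up for illustration.

```python
import random

# Assumed letter counts for the standard English-language set (98 lettered
# tiles); the two blanks are left out to keep the sketch simple.
TILE_COUNTS = {
    "a": 9, "b": 2, "c": 2, "d": 4, "e": 12, "f": 2, "g": 3, "h": 2, "i": 9,
    "j": 1, "k": 1, "l": 4, "m": 2, "n": 6, "o": 8, "p": 2, "q": 1, "r": 6,
    "s": 4, "t": 6, "u": 4, "v": 2, "w": 2, "x": 1, "y": 2, "z": 1,
}


def make_bag():
    """A full bag as a list with one entry per lettered tile."""
    return [letter for letter, n in TILE_COUNTS.items() for _ in range(n)]


def signature(letters):
    """Letters in sorted order, so anagrams share one signature."""
    return "".join(sorted(letters))


def estimate_opening_rate(words, trials=100_000, rng=None):
    """Estimate the share of opening racks whose 7 tiles anagram to a word.

    `trials` must be positive.
    """
    rng = rng or random.Random()
    bag = make_bag()
    targets = {signature(w) for w in words if len(w) == 7}
    # rng.sample draws without replacement, like taking tiles from a real bag.
    hits = sum(signature(rng.sample(bag, 7)) in targets for _ in range(trials))
    return hits / trials
```

Comparing anagram signatures avoids running through all the orderings of every rack, so even a few hundred thousand trials finish quickly.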
So play on. And may you find yourself looking at E Q U A L I Z with 7 free squares above an accessible E. Running through a triple word-score square.