The image above was created using AI. More specifically, this was the third image generated by Stable Diffusion, when given the prompt “Word Direct Intersection”.
As a quick recap: in the last post, I covered how I found a digital dictionary in text-file format, split it into individual word strings, filtered out every word without exactly 5 letters, and counted the remaining 5-letter words (15,920, which is on par with most relevant dictionaries).
But finding the “best” starting word is much more convoluted than it may seem on the surface. For uniformity, I’ll use the term “Direct Intersection” to refer to letters that are identical and sit in the same position in both words; comparing the words “apple” and “angle” results in three direct intersections: “a__le”. In addition, I’ll use the term “Indirect Intersection” to refer to letters that appear in both words, regardless of position; comparing the words “intel” and “tenet”, we have indirect intersections for “n”, 2 “t”, and 2 “e”. Notice that although one “e” is a direct intersection, it is also counted as an indirect intersection. All direct intersections are also indirect intersections, but not necessarily vice versa.
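To make the two definitions concrete, here is a minimal sketch of how they could be checked in Python. The function names direct_intersections and indirect_intersections are my own illustrative choices, not part of the project code.

```python
def direct_intersections(w, v):
    """Return w with every non-matching position replaced by '_'."""
    return "".join(a if a == b else "_" for a, b in zip(w, v))


def indirect_intersections(w, v):
    """For each letter of w, count how many times it occurs anywhere in v."""
    counts = {}
    for letter in w:
        hits = v.count(letter)
        if hits:
            counts[letter] = counts.get(letter, 0) + hits
    return counts


print(direct_intersections("apple", "angle"))    # a__le
print(indirect_intersections("intel", "tenet"))  # {'n': 1, 't': 2, 'e': 2}
```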
To even begin tackling this question, we need to develop a strategy and therefore define the question more precisely. For starters, we need to consider whether to value direct intersections more than indirect intersections. Do we need to assign point values to indirect and direct intersections? Will that make the data harder to analyze? Will that actually reflect how “good” the word is? If we do, how much more should direct intersections be worth? 2x? 5x? 10x? In addition, we also need to think about special cases: if a word with a repeated letter, like “apple”, is compared to the word “pines”, should the indirect intersection on “p” be counted twice? Why? Why not? Which is better?
These are all valid sub-questions when it comes to answering the big question. My philosophy when starting to tackle a challenge, however, is that simpler is better. For that reason, I started the project with the following set of rules:
- Indirect intersections count as 1 “point”
- Direct intersections are not treated any differently: they also count as 1 “point”, but not in addition to the indirect intersection already counted
- All intersections are counted, even if duplicate
- The word with the highest score is the best word (for now)
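As a rough illustration, one way to read these rules is "one point for every pair of matching letters between the two words". The helper name pair_score is hypothetical and only used for this sketch.

```python
def pair_score(w, v):
    """One point for every (letter of w, letter of v) pair that matches.

    Duplicates are counted, and a letter in the same position earns no
    extra point on top of the match itself.
    """
    return sum(1 for a in w for b in v if a == b)


print(pair_score("apple", "pines"))  # 3: both p's in "apple" match the p in "pines", plus e
print(pair_score("intel", "tenet"))  # 5: n once, t twice, e twice
```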
First things first, I needed to create a dictionary with an entry for each word, and have each entry hold a score. To create this dictionary, I simply used the line:
dict = {}
This gave me a framework and repository to store and compare information about each word. Now, I needed a way to compare every word to every other word. Because I could not type 253,446,400 lines of code (15,920 × 15,920) to test every pair of words, I set up a system of two nested loops that would select two words from words_5 in sequence. These loops were:
for w in words_5:
    for v in words_5:
        pass  # compare w with v here
Here, it would select the first word from words_5 (w), then compare it to every word from words_5 (v). It would then select the second word from words_5, then compare it to every word from words_5 (v), repeating this process until every possible combination of words w and v had been compared.
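To see what the two loops visit, here is a small sketch using itertools.product, which yields the same ordered pairs as a pair of nested for loops. The three-word list sample is a hypothetical stand-in for words_5.

```python
from itertools import product

sample = ["apple", "angle", "pines"]  # hypothetical stand-in for words_5

# Every ordered pair (w, v), including each word paired with itself.
pairs = list(product(sample, repeat=2))

print(len(pairs))  # 9, i.e. 3 * 3
print(pairs[:3])   # [('apple', 'apple'), ('apple', 'angle'), ('apple', 'pines')]
print(15920 ** 2)  # 253446400 pairs for the full word list
```

Note that each word is also compared with itself, so every word starts with some points from its own letters.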
I then needed a way to test for indirect intersections. To do this I used a similar system of two nested loops to compare each of the five letters in word w to each of the five letters in word v. I did this with the following block of code.
for w in words_5:
    for v in words_5:
        for o in range(5):      # letter indices 0-4 in w
            for m in range(5):  # letter indices 0-4 in v
                if w[o] == v[m]:
                    pass  # a shared letter was found
This is effective because, in addition to going through every possible pair of words w and v, it also goes through every pair of letter positions within those words, by comparing every combination of the indices 0, 1, 2, 3, and 4. This means that, for instance, if the letter at index 3 of word w matches the letter at index 2 of word v, anything under if w[o] == v[m]: will run. It’s rather important to note that since indexing in Python starts at 0 rather than 1, the five letters sit at indices 0-4 instead of 1-5. Because range excludes its upper bound, covering all five indices takes range(0, 5), or simply range(5); range(0, 4) stops at index 3 and never looks at the fifth letter.
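A quick way to check which letter positions a given range actually covers is to print the indices it produces. This is only a sanity-check sketch on a single example word.

```python
word = "apple"

four = list(range(0, 4))
five = list(range(5))

print(four)  # [0, 1, 2, 3]
print(five)  # [0, 1, 2, 3, 4]

# Letters reached by each range:
print([word[i] for i in four])  # ['a', 'p', 'p', 'l']
print([word[i] for i in five])  # ['a', 'p', 'p', 'l', 'e']
```

The upper bound of range is exclusive, so range(0, 4) skips the last letter of a 5-letter word.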
Now, to count the number of intersections, I simply needed to add 1 to a word’s score every time it intersected indirectly with other words, which I could do with the following code:
for w in words_5:
    dict[w] = 0  # start each word's score at 0
    for v in words_5:
        for o in range(5):
            for m in range(5):
                if w[o] == v[m]:
                    dict[w] += 1  # one point per matching letter pair
Using the line dict[w] = 0, I set the score for each word to 0 before its intersections were counted.
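Since the full word list means hundreds of millions of word pairs, it may be worth noting that the same totals can be reached much faster. A word's score under this counting is just the sum, over its letters, of how often each letter occurs across the whole list. The sketch below is an alternative I am suggesting, not the method used in this post; score_words, score_words_nested, and sample are hypothetical names.

```python
from collections import Counter


def score_words(words):
    """Score each word from letter totals instead of comparing every pair."""
    letter_totals = Counter()
    for v in words:
        letter_totals.update(v)  # count every letter occurrence in every word
    return {w: sum(letter_totals[ch] for ch in w) for w in words}


def score_words_nested(words):
    """Reference version: the four nested loops, covering every letter."""
    scores = {}
    for w in words:
        scores[w] = 0
        for v in words:
            for o in range(len(w)):
                for m in range(len(v)):
                    if w[o] == v[m]:
                        scores[w] += 1
    return scores


sample = ["apple", "angle", "pines"]  # hypothetical stand-in for words_5
print(score_words(sample))         # {'apple': 13, 'angle': 10, 'pines': 10}
print(score_words_nested(sample))  # same result
```

Both functions give identical scores; the first one only passes over the word list twice.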
Finally, to display the data, I printed the dictionary sorted by score (ascending, so the highest scores end up at the bottom) with the following lines of code:
for w in sorted(dict, key=dict.get, reverse=False):
    print(w, dict[w])
Resulting in this:

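And upon scrolling all the way down, I found that the 5-letter words with the most intersections were the ones listed below. Every word in the list has the letters a, r, e, a in its first four positions and exactly the same score, which is consistent with a run that used range(0, 4) and therefore compared only the first four letters of each word.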
- Aread 24400
- Areae 24400
- Areal 24400
- Arean 24400
- Arear 24400
- Areas 24400
- Reaal 24400
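To avoid scrolling to the bottom of a long printout, one option is to sort in descending order and keep only the first few entries. The scores below are made-up example values, and the names scores and top are hypothetical.

```python
scores = {"apple": 13, "angle": 10, "pines": 10}  # hypothetical example scores

# Sort (word, score) pairs from highest to lowest score and keep the top 2.
top = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:2]

for word, score in top:
    print(word, score)
```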
