Jim Vellenga logo

Palindromic Pangram

Search Space Reduction


Search Space Reduction

Since the problem is almost certainly of exponential complexity, I sought to reduce the size of the search space by some preliminary checks, both for the original file WORD.LST and for the augmented file WORDAI.LST.

Substring Reversibility

An observation:  If a character sequence appears in a palindrome, its reverse must also appear.  For example, in “lid off a daffodil”, because “offada” appears in the forward direction, “adaffo” must also appear.  The sequence and its reverse may or may not be overlapping.

This means that we can eliminate a word from the word list if it contains a sequence that cannot be constructed in reverse.  For example, the sequence “auv” occurs in the sample word list in words such as “fauve” or “mauve”.  However, no words in the word list contain the sequence “vua”.  Nor can “vua” be constructed from a juxtaposition of two words from the word list, since no words end in “vu” (in this list) and no words begin with “ua”.   For this reason, we can eliminate the words “fauve” and “mauve” from the set of words that can appear in a palindrome.

One- and Two-Letter Sequences

To check for reversibility, the program first finds all distinct one- and two-letter sequences that can appear at the beginning or end of phrases. For the two-letter sequences, the native pairs are those that occur at the beginning or end of words in the list, while constructed pairs also include one-letter words followed by a legal initial letter or preceded by a legal final letter.

File WORDAI.LST contains 2 one-letter words.

No. of distinct initial chars: 26

No. of distinct final chars: 26

No. of distinct letters in file: 26

 

Out of a possible 676 two-letter sequences...

No. of distinct two-letter words: 96

No. of native initial pairs: 318

No. of native final pairs: 361

No. of constructed initial pairs: 324

No. of constructed final pairs: 363

Three-Letter Sequence Checks

The next step was to find all three-letter sequences that were found natively in words or could be constructed from the beginnings and endings of words, and then eliminate those words that contained three-letter sequences for which the reverse could not be constructed.  The results for WORDAI.LST:

Out of a possible 17576 three-letter sequences...

No. of contiguous triples: 7047

No. of constructed triples: 12317

After removing nonreversible triples...

Number of reversible triples: 10772

 

Now removing words containing nonreversible triples...

Word list now contains 165857 words.

This a 4.4% reduction in the number of words that can appear.

Four-Letter Sequence Checks

For four-letter sequences, the program first constructs a character tree containing all the four-letter sequences that appear in single words.

It then constructs the sets of three-letter sequences, initTriples and finalTriples, containing the starting and ending triples of all words with three or more letters. It also uses the one- and two-letter initial and final sequences found above.

Then, in order to add constructible quads to the tree, the program

  • adds to initTriples:

    -        one-letter words followed by initial pairs.

    -        two-letter words followed by initial characters.

  • adds to finalTriples

    -        final characters followed by two-letter words

    -        final pairs followed by one-letter words

  • adds to the character tree as forward sequences

    -        final characters followed by members of initTriples

    -        final pairs followed by initial pairs

    -        members of finalTriples followed by initial characters

  • invalidates each four-letter sequence that is not reversible (that is, for which the reverse sequence is not also in the character tree

  • removes from the word list any words that contain a nonreversible four-letter sequence.

Results for WORDAI.LST:

Now removing words containing nonreversible quads...

Word list now contains 138244 words.

Omitting Composite Words

We can also reduce the size of the dictionary considerably by removing composite words.  We can do this because any palindrome that contains the word “mahogany,” for example, can be constructed equally well with the three-word sequence “ma hog any.”  This is particularly powerful for the words in the dictionary furnished as part of the problem because WORD.LST contains such unusual “words” as “es” and “ya,” which do not occur as such in most standard English-language dictionaries.

Effects of Word List Reductions

Initially:

File name

WORDAI.LST

WORD.LST

Number of words in list

173530

173528

Percent of original number

100%

100%

Maximum word length

28

29

Average word length

9.1

9.1

Most frequent word length

8

8

After removing nonreversible 3-letter sequences:

File name

WORDAI.LST

WORD.LST

Number of words in list

165857

165855

Percent of original number

96%

96%

Maximum word length

27

27

Average word length

9.1

9.0

Most frequent word length

8

8

After removing nonreversible 3-letter and 4-letter sequences:

File name

WORDAI.LST

WORD.LST

Number of words in list

138249

138202

Percent of original number

80%

80%

Maximum word length

25

25

Average word length

8.9

8.9

Most frequent word length

8

8

After removing nonreversible 3-letter and 4-letter sequences, as well as composites:

File name

WORDAI.LST

WORD.LST

Number of words in list

54245

60843

Percent of original number

31%

35%

Maximum word length

22

22

Average word length

8.2

8.3

Most frequent word length

8

8

A copy of the reduced word list for the original file, WORD.LST, can be found as ReducedWordList.txt. I have done a certain amount of testing of the algorithms and inspection of the results, but let me know if you find any mistakes.