A minimal pair is a pair of words differing in only one phoneme, such as bat and pat, or more and bore. A more complicated example would be the oft-quoted “tom-ay-to, tom-ah-to” (although these are really the same word, drawn from different dialects). The goal of the minimal pairs application is to be able to quickly locate examples of these for two specified phonemes.
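As a rough illustration of that goal, the search can be sketched in Python. This is a minimal sketch rather than the application's actual code: it assumes each word has already been reduced to a list of phoneme symbols, and the names `find_minimal_pairs` and `lexicon` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_minimal_pairs(
    lexicon: Dict[str, List[str]], p1: str, p2: str
) -> List[Tuple[str, str]]:
    """Return (word_with_p1, word_with_p2) pairs whose pronunciations
    are identical except for one position holding p1 versus p2."""
    if p1 == p2:
        return []

    # Index words by pronunciation so each candidate is a single lookup.
    by_pron = defaultdict(list)
    for word, phonemes in lexicon.items():
        by_pron[tuple(phonemes)].append(word)

    pairs = []
    for pron, words in by_pron.items():
        for i, phoneme in enumerate(pron):
            if phoneme != p1:
                continue
            # Swap one occurrence of p1 for p2 and see if that is a word.
            swapped = pron[:i] + (p2,) + pron[i + 1:]
            for other in by_pron.get(swapped, []):
                for word in words:
                    pairs.append((word, other))
    return sorted(pairs)


# Hypothetical toy lexicon, using the examples from the text.
lexicon = {
    "bat": ["b", "æ", "t"],
    "pat": ["p", "æ", "t"],
    "more": ["m", "ɔ", "r"],
    "bore": ["b", "ɔ", "r"],
}
print(find_minimal_pairs(lexicon, "b", "p"))
```

Indexing by pronunciation avoids comparing every word against every other word, which matters once the lexicon holds thousands of entries.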
The data is drawn from the CMU Pronouncing Dictionary, maintained by Carnegie Mellon University in Pennsylvania, USA. The entire database is available for download, and contains words with their corresponding pronunciations, written as sequences of phonemes drawn from the ARPAbet. The database has entries like the following:
TOMATO T AH0 M EY1 T OW2
TOMATO(1) T AH0 M AA1 T OW2
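Here we can see the two different pronunciations of “tomato”; the “(1)” suffix marks an alternative pronunciation of the same word. The digit following each vowel records the stress placed on that vowel when saying the word: 0 for no stress, 1 for primary stress and 2 for secondary stress.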
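To make the entry format concrete, here is a small sketch of how one such line could be parsed in Python. The function names `parse_entry` and `split_stress` are hypothetical, and the sketch assumes each entry is a word followed by whitespace-separated phonemes, as in the example above.

```python
import re
from typing import List, Optional, Tuple

# Matches a trailing variant marker such as "(1)" on the word.
_VARIANT = re.compile(r"\(\d+\)$")


def parse_entry(line: str) -> Tuple[str, List[str]]:
    """Split a dictionary line into (word, phonemes), dropping any
    variant marker so all pronunciations share the same word."""
    word, *phonemes = line.split()
    return _VARIANT.sub("", word), phonemes


def split_stress(phoneme: str) -> Tuple[str, Optional[int]]:
    """Separate a phoneme from its stress digit, if it has one."""
    if phoneme[-1].isdigit():
        return phoneme[:-1], int(phoneme[-1])
    return phoneme, None


word, phonemes = parse_entry("TOMATO(1) T AH0 M AA1 T OW2")
print(word, phonemes)
print([split_stress(p) for p in phonemes])
```

Only vowels carry a stress digit, so consonants come back with `None` as their stress.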
The actual database contains over 130,000 words, which is far too many to be useful for our purposes. It also contains swear words and other undesirable entries. I filtered the dictionary against a wordlist of the 10,000 most common English words with rude words removed. I also stripped the stress digits from the vowels and translated the ARPAbet phonemes into (roughly) equivalent IPA, which is more useful to Australians.
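The three clean-up steps (filtering, stripping stress, translating to IPA) might look something like the following in Python. This is only a sketch under stated assumptions: the mapping table is a small illustrative subset rather than the table the application uses, and `ARPABET_TO_IPA`, `strip_stress`, `to_ipa` and `filter_entries` are hypothetical names.

```python
from typing import Iterable, List, Set, Tuple

# Illustrative subset only; a full table would cover every ARPAbet symbol.
ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "EY": "eɪ", "OW": "oʊ",
    "B": "b", "K": "k", "L": "l", "M": "m", "P": "p", "T": "t",
}


def strip_stress(phoneme: str) -> str:
    """Remove a trailing stress digit (0, 1 or 2) from a phoneme."""
    return phoneme.rstrip("012")


def to_ipa(phonemes: Iterable[str]) -> List[str]:
    """Translate ARPAbet phonemes to IPA, ignoring stress.
    Symbols missing from the table are passed through unchanged."""
    bare = [strip_stress(p) for p in phonemes]
    return [ARPABET_TO_IPA.get(p, p) for p in bare]


def filter_entries(
    entries: Iterable[Tuple[str, List[str]]], allowed: Set[str]
) -> List[Tuple[str, List[str]]]:
    """Keep only entries whose word is in the allowed wordlist,
    compared case-insensitively, and convert them to IPA."""
    allowed_lower = {w.lower() for w in allowed}
    return [(w, to_ipa(p)) for w, p in entries if w.lower() in allowed_lower]


entries = [
    ("TOMATO", ["T", "AH0", "M", "EY1", "T", "OW2"]),
    ("TOMATO", ["T", "AH0", "M", "AA1", "T", "OW2"]),
]
print(filter_entries(entries, {"tomato"}))
```

Dropping the stress digit means an unstressed AH0 maps to the same symbol as a stressed AH1 here, which is one reason the translation is only roughly equivalent.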
Unfortunately, I can’t find a pronunciation dictionary for Australian or British English, so the results include many erroneous pairs that would not be minimal pairs in Australian pronunciation, for example caught and lot.